Stat 209B-- Lectures, Course Files, and Readings

Week 0
Course introduction (slides and audio posted on main page)
Background readings (not required, but of interest if you haven't seen these before)
1.   Correlation and Causation: A Comment, Stephen Stigler Perspectives in Biology and Medicine, volume 48, number 1 supplement (winter 2005)
2.    Secret to Winning a Nobel Prize? Eat More Chocolate  (Time)
    Publication: Chocolate Consumption, Cognitive Function, and Nobel Laureates Franz H. Messerli, M.D. N Engl J Med 2012; 367:1562-1564 October 18, 2012
3.  David Freedman chapters.
   From Association to Causation: Some Remarks on the History of Statistics;  
   Statistical Models for Causation: A critical review    
   Statistical Models and Shoe Leather, Sociological Methodology, Vol. 21. (1991), pp. 291-313. JStor link

Week 1

Lecture slides, week 1 (pdf)
Audio companion, week 1
parta   partb   partc
1. Encouragement Designs: example of potential outcomes formulation.

Lecture Topics
 Illustration using encouragement design representation in Holland (1988).    copies of selected overheads.
 Encouragement Designs. Potential outcomes formulation and IV parameter estimation in Holland (1988).    Estimation handout
 Do regression methods (path analysis) identify causal effects? Demonstrations of failure for Holland's encouragement design.    class handout    Encouragement design slides

Primary Readings
Paul Holland, Causal Effects and Encouragement Designs. Causal Inference, Path Analysis, and Recursive Structural Equations Models
Paul W. Holland Sociological Methodology, Vol. 18. (1988), pp. 449-484. (Encouragement design results; sections 3-5)
Holland Appendix (esp pp. 475-480) presents the potential outcomes formulation.
    Abstract   Rubin's model for causal inference in experiments and observational studies is enlarged to analyze the problem of "causes causing causes" and is compared to path analysis and recursive structural equations models.
A special quasi-experimental design, the encouragement design, is used to give concreteness to the discussion by focusing on the simplest problem that involves both direct and indirect causation.
It is shown that Rubin's model extends easily to this situation and specifies conditions under which the parameters of path analysis and recursive structural equations models have causal interpretations.

Encouragement Design research examples:
   Sesame Street evaluation
Gelman-Hill text sec 10.5; Data Analysis Using Regression and Multilevel/Hierarchical Models
   Salt and Blood Pressure clinical trial
Publication: Feasibility and efficacy of sodium reduction in the Trials of Hypertension Prevention, phase I. Trials of Hypertension Prevention Collaborative Research Group. S K Kumanyika, P R Hebert, J A Cutler, V I Lasser, C P Sugars, L Steffen-Batey, A A Brewer, MI. Hypertension 1993;22:502-512. doi: 10.1161/01.HYP.22.4.502

2. Mediating (process) variables

Lecture Topics
 Historical (Baron-Kenny) methods  David Kenny web page
 R-implementations: mediating variables         data analysis example    data file
    Baron-Kenny method via sobel function in the multilevel package.
    More extensive implementation (incl BCa bootstrapping) function mediation in package MBESS Ken Kelley;
    power and sample size calculations in package powerMediation
    mediation package: takes the topic to a much higher level of complexity and capability

Primary Readings
Vignette for mediation package   Causal Mediation Analysis Using R   
Mediation Analysis David P. MacKinnon, Amanda J. Fairchild, and Matthew S. Fritz Department of Psychology, Arizona State University, Tempe, Arizona 85287-1104; Annu. Rev. Psychol. 2007. 58:593-614

Mediation research examples:
  Framing experiment
Brader T, Valentino NA, Suhay E (2008). "What Triggers Public Opposition to Immigration? Anxiety, Group Cues, and Immigration Threat." American Journal of Political Science, 52(4), 959-978.  jstor link
Data in mediation package; data description and analyses in mediation package vignette (linked below)
  Bench Science vs Path Analysis: Exercise and Alzheimer's
The irisin bench-science mediation example is discussed at the beginning of Week 2 lecture for recap and because I couldn't find it at the time.
NYTimes: How Exercise May Help Keep Our Memory Sharp.
Publication: Exercise-linked FNDC5/irisin rescues synaptic plasticity and memory defects in Alzheimer's models   Nature Medicine volume 25, pages 165-175 (2019)
  Mediated moderation?
   Stanford Medicine     Common opioids less effective for patients on SSRI antidepressants    Publication: Predicting inadequate postoperative pain management in depressed patients: A machine learning approach Arjun Parthipan,Imon Banerjee,Keith Humphreys,Steven M. Asch,Catherine Curtin,Ian Carroll ,Tina Hernandez-Boussard Published: February 6, 2019
   New Yorker. December 23, 2013. The Power of the Hoodie-Wearing C.E.O.    Publication: The Red Sneakers Effect: Inferring Status and Competence from Signals of Nonconformity Author(s): Silvia Bellezza, Francesca Gino, and Anat Keinan Source: Journal of Consumer Research

Additional Resources
Mediators and Moderators of Treatment Effects in Randomized Clinical Trials  Helena Chmura Kraemer; G. Terence Wilson; Christopher G. Fairburn; W. Stewart Agras Arch Gen Psychiatry. 2002;59:877-883
additional technical papers. Causal Mediation Analysis Using R K. Imai, L. Keele, D. Tingley, and T. Yamamoto    American Political Science Review Vol. 105, No. 4 November 2011 Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83-104.
      Useful expositions Using R
Chapter 14: Mediation and Moderation  Alyssa Blair
Mediation and Moderation Analyses with R - OSF  presentation slides

Week 1 Review Questions

Question 1. Mediating Variable Computations: Class example continued
The data set shown in class example ss423 is linked above and in the legacy directory.
Use predictor (IV) 'belong', outcome 'depress', and (potential) mediating variable 'master'. The class example showed you the Baron-Kenny analysis using functions from the multilevel and MBESS packages.
Here just use 'lm' basic regression and the recipes from the class handout to recreate point estimates, asymptotic standard errors, and significance tests for the mediating variable effect.
Compare your result with the class example posting.
Extra: also try out the more 'sophisticated' functions in the mediation package.
Solution for question 1
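
A minimal sketch of the three-regression recipe with plain lm, for orientation only. The data are simulated (this is not the ss423 file), the names x, m, y and the path values 0.5, 0.4, 0.2 are made up for illustration, and the standard error is the first-order (Sobel) delta-method form; check it against the class handout before relying on it.

```r
# Simulated stand-in data; x, m, y play the roles of 'belong', 'master', 'depress'
set.seed(209)
n <- 500
x <- rnorm(n)                       # predictor
m <- 0.5 * x + rnorm(n)             # mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)   # outcome

fit_total    <- lm(y ~ x)           # regression 1: total effect c
fit_mediator <- lm(m ~ x)           # regression 2: path a
fit_outcome  <- lm(y ~ x + m)       # regression 3: path b and direct effect c'

a    <- coef(fit_mediator)["x"]
b    <- coef(fit_outcome)["m"]
se_a <- coef(summary(fit_mediator))["x", "Std. Error"]
se_b <- coef(summary(fit_outcome))["m", "Std. Error"]

indirect <- unname(a * b)                                # mediated (indirect) effect
se_sobel <- unname(sqrt(a^2 * se_b^2 + b^2 * se_a^2))    # first-order SE
z_stat   <- indirect / se_sobel
p_value  <- 2 * pnorm(-abs(z_stat))

# With OLS on the same cases, total = direct + indirect holds exactly
c_total  <- unname(coef(fit_total)["x"])
c_direct <- unname(coef(fit_outcome)["x"])
```

Swapping in the class data only requires replacing the three simulated vectors with the corresponding columns of the data frame.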

Question 2. Potential Outcomes, Encouragement Design Estimation and (Causal) Mediation
Task 1. Create a potential outcomes dataset following the first ALICE specification in the posted slides (week 3): ALICE example beta = 3, rho = 3, tau = 1, delta = 3. (I did n = 400; larger would be better, so I redid with n = 6400.)

Task 2. Use the artificial data to show the results for the mediation (indirect) effect: by hand doing the 3 regressions; using the multilevel package (sobel); using the MBESS package; using the causal mediation estimate (ACME) from the mediation package. Compare with rho*beta.

Task 3. Estimate beta by the Wald estimator (assuming tau = 0) and estimate the mediation effect.

Solution for question 2

Question 3. Sesame Street: Encouragement Design research example
Sesame Street research setting and data description given pdf p.30 of Lecture 1 (also Gelman text).
For this exercise use postnumb : posttest on numbers (0-54), along with the measures encour and regular from the class example in Lecture 1.
Use the encouragement design formulation to estimate the effect on child cognitive development (postnumb here) of watching more Sesame Street.
What assumption is necessary for the IV estimation in this design?
Obtain a point and interval estimate for the effect of viewing (use ivreg as in class example).
From simple descriptives reproduce this instrumental variables estimate (Wald estimator).
The second approach (path analysis) analyzed by Holland requires what assumption?
Obtain the path analyses (regression) estimate for the effect on child cognitive development (postnumb here) of watching more Sesame Street.
Compare with the IV estimate (which employs different assumptions).
Solution for question 3
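
A rough sketch of how the Wald estimator falls out of simple group means, set next to the regression estimate. Everything here is simulated: z, x, y stand in for encour, regular, and postnumb, and the unobserved trait u and all numeric values are assumptions made up for the illustration, not the Sesame Street data. The simulation is built so that encouragement reaches the outcome only through viewing.

```r
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.5)                                 # randomized encouragement
u <- rnorm(n)                                          # unobserved trait (hypothetical)
x <- as.numeric(0.8 * z + 0.8 * u + rnorm(n) > 0.5)    # regular viewing, 0/1
y <- 25 + 5 * x + 4 * u + rnorm(n, sd = 3)             # outcome; effect of x set to 5

# Wald estimator from descriptives: ratio of two differences in means
itt_y <- mean(y[z == 1]) - mean(y[z == 0])   # encouragement effect on outcome
itt_x <- mean(x[z == 1]) - mean(x[z == 0])   # encouragement effect on viewing
wald  <- itt_y / itt_x

# Equivalent covariance form of the IV estimate (identical for binary z)
iv_cov <- cov(y, z) / cov(x, z)

# Regression (path analysis style) estimate, which ignores u
ols <- unname(coef(lm(y ~ x))["x"])
```

Printing wald and ols side by side shows how far the two sets of assumptions can pull the estimates apart in a constructed case like this one.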

Week 2

Moderating Variables in experimental studies (heterogeneous treatment effects)

Lecture slides, week 2 (pdf)
Audio companion, week 2
parta  partb   partc
Lecture topics
0. Moderation, mediation recap slide
1. Review: formulation and purposes of analysis of covariance
    basic (old) ancova exposition slides           ancova and extensions, math notes
   High School and Beyond (observational study) school means data example HSB ancova handout (ascii version)      data for HSB ancova     HSB ancova, scanned pdf
2. Moderating variables, Heterogeneous Treatment Effects (CATE).
      Analyzing treatment effects as a function of covariate(s)
     CNRL, including Johnson-Neyman technique   cnrl data   cnrl analysis (extended)

Primary Readings
Ancova and extensions   
Rogosa, D. R. (1980). Comparing nonparallel regression lines.   Psychological Bulletin, 88, 307-321. [a better quality scan from the APA site]
R resources (below).

Moderation research examples:
       Gender differences in effectiveness of aspirin.
 Aspirin may be less effective heart treatment for women than men
Publication:      Aspirin Resistance in Patients with Stable Coronary Artery Disease, in the Annals of Pharmacotherapy April 2007
     Moderating variables can be your friend (statistics is the only friend you need)           music: I've got friends in low places
Wash Post: Why smart people are better off with fewer friends.
Publication: Country roads, take me home... to my friends: How intelligence, population density, and friendship affect modern happiness.   British Journal of Psychology 2016
    ATI research
Snow R.E. (1978) Aptitude-Treatment Interactions in Educational Research. In: Pervin L.A., Lewis M. (eds) Perspectives in Interactional Psychology. Springer, Boston, MA.
     Family SES as a moderating variable in nature/nurture:
Why Rich Parents Don't Matter  UTexas press release: Being Poor Can Suppress Children's Genetic Potentials    Publication: Emergence of a Gene x Socioeconomic Status Interaction on Infant Mental Ability Between 10 Months and 2 years DOI: 10.1177/0956797610392926 Psychological Science published online 17 December 2010 Elliot M. Tucker-Drob, Mijke Rhemtulla, K. Paige Harden, Eric Turkheimer and David Fask

R implementations and Resources
package probemod    manual
package interactions    intro     vignette: Exploring interactions with continuous predictors in regression models    manual

Additional Resources,  Ancova and extensions
Improving Present Practices in the Visual Display of Interactions Advances in Methods and Practices in Psychological Science
      analysis of covariance: Background/historical papers:
Covariance Adjustment in Randomized Experiments and Observational Studies Paul R. Rosenbaum Statistical Science, Vol. 17, No. 3. (Aug., 2002), pp. 286-304.   Jstor
Some Aspects of Analysis of Covariance, A Biometrics Invited Paper with Discussion. D. R. Cox; P. McCullagh Biometrics, Vol. 38, No. 3, (Sep., 1982), pp. 541-561.   Jstor
Analysis of Covariance: Its Nature and Uses William G. Cochran Biometrics, Vol. 13, No. 3, Special Issue on the Analysis of Covariance. (Sep., 1957), pp. 261-281. Jstor
The Use of Covariance in Observational Studies W. G. Cochran Applied Statistics, Vol. 18, No. 3. (1969), pp. 270-275. Jstor
Estimation of the Slope and Analysis of Covariance when the Concomitant Variable is Measured with Error James S. Degracie; Wayne A. Fuller Journal of the American Statistical Association, Vol. 67, No. 340. (Dec., 1972), pp. 930-937. Jstor
Deep background Neter-Wasserman text (Applied linear statistical models. Neter, Kutner, Nachtsheim and Wasserman 1996. Fifth edition. Homewood IL: Irwin, Inc.) chapters 22 and 8.
     Johnson-Neyman technique and aptitude-treatment interaction (ATI)
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. Irvington
Regions of Significant Criterion Differences in Aptitude-Treatment-Interaction Research Leonard S. Cahen; Robert L. Linn American Educational Research Journal, Vol. 8, No. 3. (May, 1971), pp. 521-530. Jstor
Identifying Regions of Significance in Aptitude-by-Treatment-Interaction Research Ronald C. Serlin; Joel R. Levin American Educational Research Journal, Vol. 17, No. 3. (Autumn, 1980), pp. 389-399. Jstor
Defining Johnson-Neyman Regions of Significance in the Three-Covariate ANCOVA Using Mathematica Steve Hunka; Jacqueline Leighton Journal of Educational and Behavioral Statistics, Vol. 22, No. 4. (Winter, 1997), pp. 361-387.  Jstor
discussion of substantive issues: Trait-Treatment Interaction and Learning David C. Berliner; Leonard S. Cahen Review of Research in Education, Vol. 1. (1973), pp. 58-94. Jstor

Week 2 Review Questions

Question 1. Background: standard analysis of covariance (no moderating variable).

A researcher is studying the effect of an incentive on the retention of subject matter and is also interested in the role of time devoted to study.
Subjects are randomly assigned to two groups, one receiving (C3 = 1) and the other not receiving (C3 = 0) an incentive. Within these groups, subjects are randomly assigned to 5, 10, 15, or 20 minutes of study (C2) of a passage specifically prepared for the experiment. At the end of the study period, a test of retention is administered.
Treat the study time as a covariate for investigating the differential effects of the incentive.   Does using the covariate improve precision in estimating the effect of incentive?
Does the ancova assumption of a constant treatment effect at levels of StudyMin appear reasonable?
full data are in file retention.dat  formerly located at
note:  As of January 2022 the Statistics Dept. servers were eliminated; files linked at [file] or [file] now reside at [file].
Linked materials resolve seamlessly, but reading data files into R requires using the new file location.
update: statweb file locations will read.table in R successfully; the older equivalent www-stat almost surely will not.
Solution for question 1

Question 2. Revisit High School and Beyond ancova from Week 2 lecture

In the class example we used school level (mean, gradient) outcomes and used school mean ses as a covariate. Investigate the usefulness of that covariate by comparing the ancova in class example with just a simple t-test (sector) on these school level outcomes. What is the difference in precision between using the covariate or not? As this is not an RCT (revisit in Unit 2), also look at differences in the estimate of the sector effect (bias?).
Solution for question 2

Question 3. Comparing Regressions (demonstration data, not an RCT)

Let's give recognition to the guys who made S (and R) and take some data from Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. Third Edition. Springer (now up to 4th edition). Chap 6 section 1 considers analysis of the data set whiteside (available as part of the MASS subset of the VR package). To access:
> library(MASS) # do need to load library; MASS ships with R as a recommended package
> data(whiteside)
> ?whiteside
Mr Derek Whiteside of the UK Building Research Station recorded the weekly gas consumption and average external temperature at his own house in south-east England for two heating seasons, one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption.
Format: The whiteside data frame has 56 rows and 3 columns:
Insul A factor, before or after insulation.
Temp Purportedly the average outside temperature in degrees Celsius. (These values are far too low for any 56-week period in the 1960s in South-East England. It might be the weekly average of daily minima.)
Gas The weekly gas consumption in 1000s of cubic feet.
Source. A data set collected in the 1960s by Mr Derek Whiteside of the UK Building Research Station. Reported by Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993) A Handbook of Small Data Sets. Chapman & Hall, p. 69.

Carry out a comparing regressions analysis with Insul as the group variable, Gas as outcome, and Temp as within-group predictor.
Construct a 95% confidence interval for the effect of Insul on Gas at Temp = 4 (pick-a-point procedure).
For what values of Temp does there appear to be an effect of Insul on Gas (simultaneous region of significance)?
Solution for question 3
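
One way to set up the pick-a-point step in base R, offered as a sketch rather than the posted solution. It fits separate lines with an Insul by Temp interaction, then recentres Temp at 4 so that the Insul coefficient is the vertical distance between the two lines at Temp = 4. The interval is an ordinary single-point 95% interval; the simultaneous region of significance needs the Johnson-Neyman machinery and is not attempted here. The column name Temp4 is introduced only for this illustration.

```r
library(MASS)        # whiteside data come with MASS
data(whiteside)

# Separate regression lines for Before and After insulation
fit <- lm(Gas ~ Insul * Temp, data = whiteside)

# Recentre Temp at 4: the InsulAfter coefficient is then the
# After-minus-Before difference in fitted Gas at Temp = 4
whiteside$Temp4 <- whiteside$Temp - 4
fit4 <- lm(Gas ~ Insul * Temp4, data = whiteside)

est_at_4 <- unname(coef(fit4)["InsulAfter"])
ci_at_4  <- confint(fit4, "InsulAfter", level = 0.95)

# Same point estimate from the uncentred fit, as a check
b      <- coef(fit)
direct <- unname(b["InsulAfter"] + 4 * b["InsulAfter:Temp"])
```

Changing the constant 4 in the recentring step gives the difference at any other temperature of interest.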

Question 4.   R packages interactions and probemod
In lecture there was short mention of these two R packages, whose main functions carry out the pick-a-point and Johnson-Neyman calculations, which are developed in Rogosa (1980).
Try out these functions using the cnrl dataset (also from Rogosa, 1980), which we worked out in the lecture materials.
Solutions spoiler alert: no joy from these packages.
Solution for question 4

Week 3

Lecture slides, week 3 (pdf)
week 3, part a (pdf)
week 3, part b (pdf)
Audio companion, week 3
parta   partb
1. Compliance in RCT

Lecture topics
1. Compliance background: Intent-to-treat analyses, CACE estimators, research examples
2. Compliance and Dose-response data analysis (Efron-Feldman)
3. Rubin-Holland approach via Booil Jo presentation: Potential Outcomes Approach: A Brief Introduction
    Class handouts:   Compliance examples     Compliance overview     Compliance math notes     Little-Rubin Ann Rev Pub Health formulation

Primary Readings
Compliance Background: Intent-to-Treat (ITT), the FDA mandate.    simple definitions: wiki    Encyclopedia of epidemiology, Volume 1  (google books)
Potential outcomes formulation (CACE): Causal Effects in Clinical and Epidemiological Studies Via Potential Outcomes: Concepts and Analytical Approaches Roderick J. Little and Donald B. Rubin, Annual Review of Public Health, Vol. 21: 121-145, May 2000.
Epidemiology exposition:   An introduction to instrumental variables for epidemiologists, Sander Greenland, International Journal of Epidemiology 2000;29:722-729

Compliance research examples.
     Clofibrate in Coronary Drug Project
Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. New England Journal of Medicine Volume 303:1038-1041 October 30, 1980 Number 18
    Vitamin A in Central America
  An introduction to instrumental variables for epidemiologists, Sander Greenland, International Journal of Epidemiology 2000;29:722-729
    Cholestyramine in Cholesterol trial (measured compliance)
Compliance as an Explanatory Variable in Clinical Trials. B. Efron; D. Feldman Journal of the American Statistical Association, Vol. 86, No. 413. (Mar., 1991), pp. 9-17. Jstor
    Draft Lottery and Vietnam Service
Joshua D. Angrist; Guido W. Imbens; Donald B. Rubin "Identification of Causal Effects Using Instrumental Variables" Journal of the American Statistical Association, Vol. 91, No. 434. (Jun., 1996), pp. 444-455. JStor

Additional resources
Compliance as an Explanatory Variable in Clinical Trials. B. Efron; D. Feldman Journal of the American Statistical Association, Vol. 86, No. 413. (Mar., 1991), pp. 9-17. Jstor
David Freedman on Compliance Adjustments:      Statistical Models for Causation: What Inferential Leverage Do They Provide?  Evaluation Review 2006; 30: 691-713.       On regression adjustments to experimental data  Advances in Applied Mathematics vol. 40 (2008) pp. 180-93.
Intent-to-treat Analysis of Randomized Clinical Trials Michael P. LaValley Boston University ACR/ARHP Annual Scientific Meeting Orlando 10/27/2003
Intention to treat--who should use ITT? J. A. Lewis and D. Machin Br J Cancer. 1993 October; 68(4): 647-650.   
Compliance analyses, R-implementations: Imai experiment package     Package icsw,   Inverse Compliance Score Weighting
What is meant by intention to treat analysis? Survey of published randomised controlled trials Sally Hollis and Fiona Campbell British Medical Journal 1999;319;670-674
Booil Jo, Dept of Psychiatry   Estimation of Intervention Effects with Noncompliance Journal of Educational and Behavioral Statistics
   Compliance Publications based on Neyman-Rubin causal models:
Direct and Indirect Causal Effects via Potential Outcomes Donald B. Rubin Scandinavian Journal of Statistics Volume 31, Issue 2, Page 161-170, Jun 2004 .
Imbens GW and Rubin DB (1997) Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance The Annals of Statistics, 25, 305-327.
Principal Stratification in Causal Inference  Constantine E. Frangakis and Donald B. Rubin, Biometrics, 2002, 58, 21–29.
Addressing Complications of Intention-to-Treat Analysis in the Combined Presence of All-or-None Treatment-Noncompliance and Subsequent Missing Outcomes. Constantine E. Frangakis; Donald B. Rubin Biometrika, Vol. 86, No. 2. (Jun., 1999), pp. 365-379. Jstor link
    Additional Case Studies
Principal Stratification Approach to Broken Randomized Experiments: A Case Study of School Choice Vouchers in New York City Barnard, Frangakis, Hill, and Rubin Journal of the American Statistical Association June 2003, Vol. 98, No. 462, Applications and Case Studies
The British Journal of Psychiatry (2003) 183: 323-331 Estimating psychological treatment effects from a randomised controlled trial with both non-compliance and loss to follow-up Graham Dunn and Mohammad Maracy

2. Regression Discontinuity Designs (systematic assignment)

Lecture Topics
Non-random assignment on the basis of the covariate, such as regression discontinuity designs.
    Regression Discontinuity handout     Example from rdd manual    ascii version

Primary Readings
Regression Discontinuity Designs  Useful primers by Wm Trochim: William Trochim's Knowledge Base
Rubin, D. B., (1977), "Assignment to a Treatment Group on the Basis of a Covariate", Journal of Educational Statistics, 2, 1-26.   Jstor link

Regression Discontinuity Research Examples
    The original: PSAT and National Merit
Thistlethwaite, D., and D. Campbell (1960): "Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment," Journal of Educational Psychology, 51, 309-317.
    Class size, Maimonides' Rule
In Rosenbaum, Design of Observational Studies (linked on main page).    sections 1.3, 3.2, 5.2.3, 5.3 DOS text
Angrist-Lavy Maimonides (class size) data   Angrist and Lavy, 1999.               read data ang = read.dta("")

R implementations and Resources
R-package--rdd;   Regression Discontinuity Estimation Author Drew Dimmery
Also Package rdrobust Title Robust data-driven statistical inference in Regression-Discontinuity designs
   RJournal for rdrobust,   rdrobust: An R Package for Robust Nonparametric Inference in Regression-Discontinuity Designs

Additional Resources: Regression Discontinuity Designs
Journal of Econometrics (special issue) Volume 142, Issue 2, February 2008, The regression discontinuity design: Theory and applications      Regression discontinuity designs: A guide to practice, Guido W. Imbens, Thomas Lemieux
    Also from Journal of Econometrics (special issue) Volume 142, Issue 2, February 2008, The regression discontinuity design: Theory and applications  Waiting for Life to Arrive: A history of the regression-discontinuity design in Psychology, Statistics and Economics, Thomas D Cook
the original paper: Thistlethwaite, D., and D. Campbell (1960): "Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment," Journal of Educational Psychology, 51, 309-317.
Trochim W.M. & Cappelleri J.C. (1992). "Cutoff assignment strategies for enhancing randomized clinical trials." Controlled Clinical Trials, 13, 190-212.  pubmed link
Capitalizing on Nonrandom Assignment to Treatments: A Regression-Discontinuity Evaluation of a Crime-Control Program Richard A. Berk; David Rauma Journal of the American Statistical Association, Vol. 78, No. 381. (Mar., 1983), pp. 21-27. Jstor
Berk, R.A. & de Leeuw, J. (1999). "An evaluation of California's inmate classification system using a generalized regression discontinuity design." Journal of the American Statistical Association, 94(448), 1045-1052.  Jstor
 another econometric treatment

Week 3 Review Questions

Regression Discontinuity

Question 1. Regression Discontinuity, classic "Sharp" design.

Replicate the package rdd toy example: cutpoint = 0, sharp design, with treatment effect of 3 units (instead of 10). Try out the analysis of covariance (Rubin 1977) estimate and compare with rdd output and plot. Pick off the observations used in the Half-BW estimate and verify using t-test or wilcoxon.
Extra: try out also the rdrobust package for this sharp design.       
 Solution for Review Question 1
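
A base-R sketch of the analysis of covariance step only, for a sharp design with cutpoint 0 and a jump of 3. The data-generating model (intercept, slopes, the extra covariate cov1, error variance) is an assumption chosen for illustration and is not copied from the rdd manual; the rdd and rdrobust fits are left to the exercise.

```r
set.seed(42)
n     <- 1000
x     <- runif(n, -1, 1)             # assignment (running) variable
cov1  <- rnorm(n)                    # additional covariate (hypothetical)
treat <- as.numeric(x >= 0)          # sharp rule: treated exactly when x >= 0
y     <- 3 + 2 * x + 3 * cov1 + 3 * treat + rnorm(n)   # jump of 3 at the cutpoint

# Analysis of covariance: adjust for the assignment variable
ancova <- lm(y ~ x + cov1 + treat)
est    <- unname(coef(ancova)["treat"])
ci     <- confint(ancova, "treat")
```

Because the outcome here is exactly linear in x, the global ancova fit is correctly specified; with curvature in x the comparison with local estimators becomes more interesting.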

Question 2. Systematic Assignment, "fuzzy design". Probabilistic assignment on the basis of the covariate.

i. Create artificial data with the following specification. 10,000 observations; premeasure (Y_uc in my session) gaussian mean 10 variance 1. Effect of intervention (rho) if in the treatment group is 2 (or close to 2) and uncorrelated with Y_uc. Probability of being in the treatment group depends on Y_uc but is not a deterministic step-function ("sharp design"): Pr(treatment|Y_uc) = pnorm(Y_uc, 10,1) . Plot that function.
ii. Try out analysis of covariance with Y_uc as covariate. Obtain a confidence interval for the effect of the treatment.
iii. Try out the fancy econometric estimators (using finite support) as in the rdd package. See if you find that they work poorly in this very basic fuzzy design example.
Extra: try out also the rdrobust package for this fuzzy design.       
 Solution for Review Question 2
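
A sketch of parts i and ii under stated assumptions. The specification above fixes the premeasure distribution, the assignment probability, and an effect near 2; the way the observed outcome Y is built from Y_uc (additive effect plus a little independent noise) and the spread of rho are assumptions added here so the example runs.

```r
set.seed(209)
n     <- 10000
Y_uc  <- rnorm(n, mean = 10, sd = 1)       # premeasure
rho   <- rnorm(n, mean = 2, sd = 0.1)      # unit effects near 2, independent of Y_uc
p_trt <- pnorm(Y_uc, 10, 1)                # Pr(treatment | Y_uc), smooth rather than a step
G     <- rbinom(n, 1, p_trt)               # probabilistic assignment
Y     <- Y_uc + rho * G + rnorm(n, sd = 0.5)   # ASSUMED outcome model

# Analysis of covariance with Y_uc as covariate
ancova   <- lm(Y ~ Y_uc + G)
est      <- unname(coef(ancova)["G"])
ci       <- confint(ancova, "G")

# Unadjusted comparison of group means, for contrast
naive <- mean(Y[G == 1]) - mean(Y[G == 0])
```

The assignment rule itself can be drawn with curve(pnorm(x, 10, 1), from = 6, to = 14).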

Question 3. Controlled Assignment (class example)

From Rubin, D. B., (1977), "Assignment to a Treatment Group on the Basis of a Covariate", linked on course page

From page 16 Rubin
              7. A SIMPLE EXAMPLE

Table I presents the raw data from an evaluation of a computer-
aided program designed to teach mathematics to children in fourth 
grade. There were 25 children in Program 1 (the computer-aided 
program) and 47 children in Program 2 (the regular program). All 
children took a Pretest and Posttest, each test consisting of 20 
problems, a child's score being the number of problems correctly 
solved. These data will be used to illustrate the estimation 
methods discussed in Sections 4, 5, and 6. We do not attempt a 
complete statistical analysis nor do we question the assumption 
of no interference between units.


Raw Data for 25 Program 1 Children and 47 Program 2 Children
Pretest              Posttest Scores
         Program 1             Program 2
10            15                 6,7
9             16                 7,11,12
8             12                 5,6,9,12
7            8,11,12             6,6,6,6,7,8
6        9,10,11,13,20           5,5,6,6,6,6,6,6,6,8,8,8,9,10
5          5,6,7,16              3,5,5,6,6,7,8
4           5,6,6,12             4,4,4,5,7,11
3           4,7,8,9,12           0,5,7
2             4                   4
1              -                   -
0              -                   7

Does assignment appear to be random, or does this appear to be Assignment on the Basis of Pretest?
Try to estimate the assignment rule, presuming it is based on pretest. How does this differ from a regression discontinuity design (simplest version)?
Assuming that assignment to Program 1 or Program 2 was solely on the basis of pretest (plus perhaps a probabilistic component) estimate the effect of program (new vs regular).
note data in table 1 exist in a more convenient form in file hw5rubin.dat and data file included in the solutions
 Solution for Review Question 3

Compliance in RCT

Question 4 Non-compliance. Class example week 3.

Adapted from (linked on class page): An introduction to instrumental variables for epidemiologists, Sander Greenland, International Journal of Epidemiology 2000;29:722-729
Additional Reference: Sommer and Zeger (1991). On Estimating Efficacy from Clinical Trials. Statistics in Medicine

Greenland discusses randomized trials with non-compliance where Z indicates treatment assignment, which is randomized; X indicates treatment received, which is affected but not fully determined by assignment Z.

To illustrate, Greenland presents in his Table 1 individual one-year mortality data from a cluster-randomized trial of vitamin A supplementation in childhood. Of 450 villages, 229 were assigned to a treatment in which village children received two oral doses of vitamin A; children in the 221 control villages were assigned none. This protocol resulted in 12,094 children assigned to the treatment (Z = 1) and 11,588 assigned to the control (Z = 0). Only children assigned to treatment received the treatment; that is, no one had Z = 0 and X = 1. Unfortunately, 2419 (20%) of those assigned to the treatment did not receive the treatment (had Z = 1 and X = 0), resulting in only 9675 receiving treatment (X = 1). Class handout has depiction and Greenland's table of results. Use as the outcome measure Y, the Deaths per 100,000 within one year (labeled Risk in Greenland's Table 1).

Part 1, using data summary from class handout
a. Give the ITT (intent-to-treat) estimate of the effect of vitamin A on Risk
b. What is the compliance rate in the treatment group (Z=1)? In the control group (Z=0)?
c. What is the instrumental variables estimate (following Angrist Imbens Rubin) of the effect of vitamin A on Risk?
What interpretation is given to this estimate (cf. Booil Jo presentation)? Compare with part (a) result and comment.
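
The arithmetic behind parts a through c can be sketched as a ratio of two intent-to-treat contrasts. The counts below are the ones stated in the paragraph above; the two risk values are placeholders invented for the illustration and are NOT the entries of Greenland's Table 1, so substitute the handout values before reading anything into the result. The function name iv_estimate is also just a label chosen here.

```r
# Counts given in the text
n_assigned_trt <- 12094                          # Z = 1
n_received_trt <- 9675                           # Z = 1 and X = 1
p_x_z1 <- n_received_trt / n_assigned_trt        # share receiving treatment when Z = 1
p_x_z0 <- 0                                      # no one with Z = 0 received treatment

# IV (Wald) ratio: ITT contrast in Y over ITT contrast in treatment received
iv_estimate <- function(risk_z1, risk_z0, p_x_z1, p_x_z0) {
  (risk_z1 - risk_z0) / (p_x_z1 - p_x_z0)
}

# PLACEHOLDER risks (deaths per 100,000); replace with the class handout values
risk_z1 <- 400
risk_z0 <- 600

itt <- risk_z1 - risk_z0
iv  <- iv_estimate(risk_z1, risk_z0, p_x_z1, p_x_z0)
```

Since no control child received treatment, the ratio reduces to the ITT contrast divided by the share of the Z = 1 group that actually received vitamin A.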

Don Rubin has a great overview talk For Objective Causal Inference, Design Trumps Analysis Don Rubin, posted at
Starting pdf page 21 Rubin takes up noncompliance using the Vitamin A data (slightly different tabulated values than in the Greenland paper handout)
d. Recreate the calculations (ITT, As-treated, Per Protocol) shown on pdf p.23; refer to Booil Jo handout
e. also CACE estimate pdf p.24
The Bayesian estimates (Imbens and Rubin 1997) pdf page 25 onward are implemented in part in the experiment package (Imai) mentioned in class and class materials.

Solution for question 4
Question 5
From the Booil Jo presentation slides in lecture, consider the JHU PIRC Intervention Study: N=284
Estimate Intervention Effects With Noncompliance
The Johns Hopkins Public School Preventive Intervention Study was conducted by the Johns Hopkins University Preventive Intervention Research Center (JHU PIRC) in 1993-1994 (Ialongo et al., 1999). The study was designed to improve academic achievement and to reduce early behavioral problems of school children. Teachers and first-grade children were randomly assigned to intervention conditions. The control condition and the Family-School Partnership Intervention condition are compared in this example. In the intervention condition, parents were asked to implement 66 take-home activities related to literacy and mathematics over a six-month period. One of the major outcome measures in the JHU PIRC preventive trial was the TOCA-R (Teacher Observation of Classroom Adaptation).
• Completed at least 45 activities = compliers.
• Outcome: change score (baseline - followup) of anti-social behavior.
From the means and compliance data given in the class materials (also linked Booil talk) compute treatment effect estimate of change in anti-social behavior: give ITT estimate and CACE estimate

Solution for question 5

Question 6   Broken RCT: Compliance, measured or binary

Compliance as a measured variable. In Stat209 week 3 we examine compliance adjustments: both those based on a dichotomous compliance variable and the much more common measured compliance (often unwisely dichotomized to match the Rubin formulation). The Efron-Feldman study (handout description) used a continuous compliance measure. An artificial data set (a data frame containing Compliance, Group, and Outcome) for Stat209 is constructed so that the ITT for cholesterol reduction is about 20 (compliance .6) and the effect of cholestyramine for perfect compliance is about 35.
Try out some IV estimators for CACE. Obtain ITT estimate of group (treatment) effect with a confidence interval. Try using G as an instrument for the Y ~ comp regression. What does that produce?
Alternatively use the Rubin formulation with a dichotomous compliance indicator defined as TRUE for compliance > .8 in these data. What is your CACE estimate? What assumptions did you make? Compare with ITT estimate. In this problem the ivreg function from the AER package is used for IV estimation.       
 Solution for Review Question 6
More Question 6   1. Compliance data, IV analysis, imitating Efron-Feldman cholestyramine trial. Solution showed you the widely used ivreg function from the AER package. Redo the ivreg analyses using functions from the ivmodel package.       
 Solution for more Review Question 6

Week 4