STAT266/CHPR266/HRP292 -- Course Files, Readings, Examples


Week 1 -- Course Introduction; Matching Methods part 1 (intro and theory)

Lecture Topics             Lecture 1 slide deck (companion audio)
1. Course outline and logistics
2. A matched observational study (DOS, Chap 7)
3. Study design versus inference
4. Basic tools of multivariate matching (DOS, Secs 8.1-8.4)

Text Readings
Rosenbaum DOS: Chapters 7 and 8 (8.1-8.4)
Additional Resources
Observational Studies according to Donald B. Rubin
   Rubin, D.B. (2008). For objective causal inference, design trumps analysis. Annals of Applied Statistics, Volume 2, Number 3, 808-840.    Rubin talk.
   Another Rubin overview of matching: Stuart, E.A. and Rubin, D.B. (2007). Best Practices in Quasi-Experimental Designs: Matching Methods for Causal Inference. Chapter 11 (pp. 155-176) in Best Practices in Quantitative Social Science. J. Osborne (Ed.). Thousand Oaks, CA: Sage Publications.

Computing Corner: see 2019 course website

Week 1 Review Questions
From Computing Corner
1.  In the Week 1 Computing Corner with the Lalonde data (effect of job training on earnings), we started out (see R-session) by showing the ubiquitous [epidemiology to economics] analysis for observational data: an analysis of covariance, aka tossing the treatment variable and all the confounders into a regression equation predicting outcome and hoping for the best (cf. 2016 Week 1 in-the-news analyses: mom fish consumption on child cognition). The statement made in class (technical details in Week 1 of STAT209) is that regression does not "control" for confounders; instead, the coefficient of treatment (the putative causal effect) is obtained from a straight-line regression of outcome on the residuals from a prediction of treatment by all the other predictors in the regression. Demonstrate that equivalence using the ancova in CC1.
 Solution for Review Question 1
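The residual-regression equivalence stated in RQ1 (the Frisch-Waugh-Lovell theorem) can be sketched in base R. Simulated stand-ins are used here in place of the course's Lalonde file, so the variable names are illustrative only:

```r
# Frisch-Waugh-Lovell sketch: the ancova coefficient on treatment equals
# the slope from regressing the outcome on the residuals of treatment
# predicted by the other covariates. Simulated data, not the course file.
set.seed(266)
n     <- 200
age   <- rnorm(n, 30, 8)
educ  <- rnorm(n, 12, 2)
treat <- rbinom(n, 1, plogis(-0.1 * (age - 30)))
re78  <- 1000 * treat + 50 * age + 200 * educ + rnorm(n, 0, 500)

# (1) Ancova: outcome on treatment plus confounders
b_ancova <- coef(lm(re78 ~ treat + age + educ))["treat"]

# (2) FWL: residualize treatment on the other predictors, then regress
#     the outcome on those residuals alone
t_resid <- resid(lm(treat ~ age + educ))
b_fwl   <- coef(lm(re78 ~ t_resid))["t_resid"]

all.equal(unname(b_ancova), unname(b_fwl))  # the two slopes agree
```

Running the same two fits on the CC1 ancova (with the actual Lalonde covariates on the right-hand side) is the demonstration the question asks for.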
2. RQ1 uses the Week 1 Computing Corner Lalonde data (effect of job training on earnings) analysis of covariance: tossing the treatment variable and all the confounders into a regression equation predicting outcome and hoping for the best. Compare that ancova with an ancova that uses just the significant predictors of re78. Also compare with an ancova that uses the single available covariate/confounder having the highest correlation with the outcome. Are these analyses consistent?
 Solution for Review Question 2
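The comparison in RQ2 amounts to fitting the same ancova with nested predictor sets and inspecting the treatment coefficient. A base-R sketch, again with simulated stand-ins for the Lalonde variables (here re75 is constructed to be the covariate most correlated with the outcome):

```r
# Compare treatment-effect estimates across ancovas with different
# covariate sets. Simulated data; true effect is 800 by construction.
set.seed(3)
n     <- 300
age   <- rnorm(n, 30, 8)
re75  <- rnorm(n, 2000, 1500)
treat <- rbinom(n, 1, plogis(-0.0004 * (re75 - 2000)))
re78  <- 800 * treat + 0.8 * re75 + 20 * age + rnorm(n, 0, 1000)

full  <- lm(re78 ~ treat + age + re75)   # all confounders
top1  <- lm(re78 ~ treat + re75)         # single most correlated covariate
crude <- lm(re78 ~ treat)                # no adjustment at all

sapply(list(full = full, top1 = top1, crude = crude),
       function(m) coef(m)["treat"])
```

With the real data, the question is whether the three estimates tell a consistent story or move around as confounders enter and leave the equation.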
From Lecture
3. We will be working a lot with matching-based techniques. One of the best thinkers/writers on the topic of matching is Elizabeth Stuart from Johns Hopkins. For this problem, take a look at her paper: "Matching Methods for Causal Inference: A Review and a Look Forward." In lecture 01 you were introduced to "balance tables" (a.k.a. "Table 1"), which summarize the covariate distributions of the treatment groups. A handful of questions: (a) as concisely as possible, state why we focus on balance assessments as part of our argumentation when attempting to perform causal inference, (b) in addition to a balance table, name other tools used to report balance, (c) why do we use standardized mean differences instead of p-values to assess the quality of a match design?, and (d) why is it kinda weird to use p-values on the covariates in a randomized trial to assess balance?
 Solution for Review Question 3
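For part (c), it helps to see the standardized mean difference as a computation. A minimal base-R sketch on simulated data (the pooled-SD denominator below is one common convention; some packages use the treated-group SD instead):

```r
# Standardized mean difference for one covariate: the group mean gap
# in pooled-standard-deviation units. Unlike a p-value, it does not
# shrink mechanically as the sample size grows.
smd <- function(x, z) {
  # z is the 0/1 treatment indicator
  m1 <- mean(x[z == 1]); m0 <- mean(x[z == 0])
  s  <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)  # pooled SD
  (m1 - m0) / s
}

set.seed(1)
z <- rbinom(500, 1, 0.4)
x <- rnorm(500, mean = 30 + 2 * z, sd = 8)  # deliberately imbalanced
smd(x, z)
```

A common rule of thumb flags covariates with |SMD| above roughly 0.1 or 0.2 as imbalanced; because the statistic is sample-size-free, it describes the design rather than a hypothesis test.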
4. In lecture 1 we quickly outlined some of the big challenges to causal inference when using observational data (see slide 41, "There should be strong effort to show the two groups are similar..."). These challenges include: inclusion/exclusion of observations, observational units that may be completely missing (censoring, survival bias), missing data, imbalances in observed data, and imbalances in unobserved data. We'll address each of these at different points in the course. But let's focus on the decision to include/exclude observations. What we're doing when matching -- i.e., removing observations that do not have adequate counterparts in the contrast group -- may seem a bit subversive. The intuition is: why "throw away" data? I think there are two reasons people worry about "throwing away data." First, it seems like by limiting the kinds of observations in our study we lose the ability to generalize our conclusions to a wider swath of the population. The counter to that is: yes, we are trading off the ability to generalize (i.e., external validity) for the ability to make stronger claims about a candidate causal effect (i.e., internal validity). The second concern is that it seems like more data is better. Formulate a response to this concern. (Note: OMG, this question seems so nebulous. Yup. That's how this works; you're playing Big Kid academics now. We made sure to mention this argument during lecture 01, so you know it. It's a common statistical argument nowadays. If you want to read your way out of this one... here's a good paper.)
 Solution for Review Question 4
From Computing Corner
5. Exercise in pair matching. In DOS Sec 2.1, Rosenbaum works with the randomized experiment data from NSW. In the Weeks 1-2 Computing Corner we used the constructed observational study version of these data. Use the observational study data to do a version of the 1:1 matching in DOS Section 2.1. Compare the balance improvement achieved from nearest-neighbor matching with the full matching results in the Weeks 1-2 Computing Corner.
 Solution for Review Question 5
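One way to set up RQ5 is through MatchIt, which ships a version of the Lalonde observational data. A hedged sketch (the course files may use a slightly different copy of the data, and older MatchIt releases code race as separate black/hispan indicators):

```r
# 1:1 nearest-neighbor matching vs. full matching on the lalonde data
# bundled with MatchIt; compare the before/after standardized mean
# differences that summary() reports.
library(MatchIt)
data("lalonde", package = "MatchIt")

covs <- treat ~ age + educ + race + married + nodegree + re74 + re75

m_nn   <- matchit(covs, data = lalonde, method = "nearest", ratio = 1)
m_full <- matchit(covs, data = lalonde, method = "full")

summary(m_nn,   standardize = TRUE)   # balance for the 1:1 design
summary(m_full, standardize = TRUE)   # balance for the full-match design
```

Nearest-neighbor matching discards unmatched controls, while full matching keeps everyone in variable-sized subclasses; the balance tables make that trade-off concrete.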
6. For the fullmatch analysis done in the Lalonde class presentation in weeks 1 and 2, the outcome comparison was carried out using lmer to average the treatment effects over the 104 subclasses. A hand-wavy analogy to the paired t-test here would be to use the mean difference within each subclass. Show that (because some of the subclasses are large) this simplified analysis does not replicate the lmer results well.
 Solution for Review Question 6
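The "hand-wavy" estimator in RQ6 is easy to write down: within each subclass, take the treated-minus-control mean difference, then average the differences. A base-R sketch on simulated subclasses (the subclass sizes and effect here are made up for illustration):

```r
# Simple subclass-difference estimator: one treated unit per subclass,
# variable numbers of controls, true treatment effect = 2.
set.seed(2)
K   <- 20
sub <- rep(1:K, times = sample(2:12, K, replace = TRUE))  # subclass ids
trt <- ave(sub, sub, FUN = function(s) {
  z <- rep(0, length(s)); z[1] <- 1; z })                 # 1 treated each
y   <- 5 + 2 * trt + rnorm(length(sub))

# Within-subclass treated-minus-control mean differences
d_k <- tapply(seq_along(y), sub, function(i)
  mean(y[i][trt[i] == 1]) - mean(y[i][trt[i] == 0]))

mean(d_k)  # unweighted average over subclasses
```

The unweighted average gives every subclass the same vote; lmer's random-intercept fit effectively weights and partially pools subclasses by size and precision, which is why the two disagree when subclass sizes vary a lot.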
7. optmatch package, fullmatch, lalonde.
MatchIt uses the optmatch package's fullmatch function for its "full" option, as in the class example. Using optmatch directly (without the MatchIt wrapper) allows additional specifications and controls for full or optimal matching.
For the lalonde data, try out optmatch full matching and compare the resulting subclasses and balance with the class example that used optmatch through MatchIt.
 Solution for Review Question 7
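A starting point for RQ7, calling fullmatch directly on a propensity-score distance. This is a sketch, not the class script: the propensity model and the min.controls/max.controls values below are illustrative choices, and they are exactly the kind of "additional controls" the wrapper does not expose by default.

```r
# Raw optmatch full matching on the lalonde data (borrowed from MatchIt
# so the example is self-contained).
library(optmatch)
data("lalonde", package = "MatchIt")

# Propensity-score distance from a logistic model (illustrative spec)
dist <- match_on(glm(treat ~ age + educ + re74 + re75,
                     family = binomial, data = lalonde),
                 data = lalonde)

# Restrict subclass shapes: at most 4 controls per treated unit and
# at most 4 treated units per control
fm <- fullmatch(dist, data = lalonde,
                min.controls = 1/4, max.controls = 4)

table(matched = !is.na(fm))  # who ended up in a subclass
nlevels(fm)                  # number of subclasses, to compare with class
```

From here, compare nlevels(fm) and the post-match balance with the class example's 104 subclasses, e.g. by re-running a balance table on the matched sets.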