Statistics 209 / HRP 239 / Education 260A
                                        Winter 2019

Statistical Methods for Group Comparisons and Causal Inference


David Rogosa

Lecture: WF, 2:30 - 4  Sequoia 200

course web page at http://rogosateaching.com/stat209/


                To see full course materials from Winter 2018 go here

Instructor. David Rogosa, Sequoia 224,  rag {AT} stanford {DOT} edu .
                   Office hours W,F 4 - 4:45.
TA   Claire Donnat     cdonnat {AT} stanford {DOT} edu
       Office Hours Th 4:30 - 6PM in Sequoia Hall Library (first floor)

Registrar's Information
STATS 209: Statistical Methods for Group Comparisons and Causal Inference (EDUC 260A, HRP 239)
Description
Critical examination of statistical methods in social science and life sciences applications, especially for cause and effect determinations. 
Topics: mediating and moderating variables, potential outcomes framework, encouragement designs, multilevel models, heterogeneous treatment effects,
matching and propensity score methods, analysis of covariance, instrumental variables, compliance, path analysis and graphical models, 
group comparisons with longitudinal data. 
Prerequisite: intermediate-level statistical methods.
Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit

2018-2019 Winter
STATS 209 | 3 units | Class # 28885 | Section 01 | Grading: Letter or Credit/No Credit | LEC | 
01/07/2019 - 03/15/2019 Wed, Fri 2:30 PM - 4:20 PM at Sequoia Hall 200 with Rogosa, D. (PI) 
Instructors: Rogosa, D. (PI)
Course Overview
For students who have had intermediate-level instruction in statistical methods, including multiple regression, logistic regression, and log-linear models.
At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods.
The goal is also to instill some introspection and critical analysis of the uses of statistical methods common in social science and medical applications, for experimental and observational studies.
The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings.

Quick Course Outline
Week 1. Course Introduction;  properties of regression models
Week 2. Experiments vs observational studies; Neyman-Rubin-Holland formulation; encouragement designs
Week 3. Path analysis and causal modeling, multiple regression with pictures. Graphical models.
Week 4. Multilevel data. Contextual effects, aggregation bias, mixed effects models
Week 5. The many uses and forms of analysis of covariance, including heterogeneous treatment effects and regression discontinuity designs
Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects
Week 7. Compliance and experimental protocols;  intent to treat and compliance adjustments
Week 8. Matching and propensity score methods
Week 9. Time-1, Time-2 group comparisons for experimental designs and observational studies
Dead Week. Overflow and course summary. 
Course Readings, Files and Examples

Relevant Texts (optional).
  Statistical Models: Theory and Practice David Freedman (2005) Revised edition (2009).
This course was created in 2005 around David Freedman's text, and covers that material using auxiliary texts and online materials.
One intent of this course is for students to read some statistical literature and actual research reports to augment the texts (on that theme, Freedman's text includes reprints of four published empirical research papers, which are also available through JSTOR).
A primary resource for R and data analysis.
  Data analysis and graphics using R (2007) J. Maindonald and J. Braun, Cambridge 2nd edition 2007. 3rd edition 2010    short draft version in CRAN 
     Text resource page           R-packages for Text Data Sets etc    R-Package DAAG    R-Package DAAGxtras  
Primary additional resources.
Design of observational studies. Rosenbaum, Paul R. New York : Springer, c2010. Stanford access
Causal Inference in Statistics, Social and Biomedical Sciences: An Introduction, Guido Imbens and Don Rubin, 1st Edition (Cambridge University Press)   Stanford access
And more.
Data analysis and regression: A second course in statistics. Mosteller, F. and Tukey, J. W. (1977) (the green book)
Matched Sampling for Causal Effects, Donald B. Rubin Cambridge University Press 2006
Observational Studies Paul R. Rosenbaum, Publisher: Springer; 2 edition (January 8, 2002)
David Freedman Statistical Models and Causal Inference Cambridge 2010 ISBN 978-0-521-19500-3
Regression Analysis : A Constructive Critique  Richard A Berk (2003). Table of contents
     Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique"  


Grading, Homework and Exams.
Weekly homework assignments following class content will be posted, along with solutions. Homeworks are not graded.
Assessment. Two take home problem sets will be scheduled:
TH1 covering content weeks 1-4.
TH2 covering content weeks 5-8.
In-class exam: Exam 3, scheduled by the Registrar during exam week. My best reading of the Registrar's chart indicates Monday March 18 2019 at 3:30 PM (in our classroom). If needed, Exam 3 can be taken remotely.
See also
Class Calendar

Course Assignments Page

Note to auditors. We should have plenty of room in Sequoia 200 for auditors.
The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status   

Statistical computing
Class presentation will be in R, and students are encouraged to use R (with occasional reference to SAS, Mathematica, and Matlab).
We have a set of 4 computer labs to supplement lecture materials (weeks 2, 4, 6, 8).
Lab 1. Multiple regression basics  Lab1 posted 1/17/19
Lab 2. Multilevel analysis (mixed-effects models) High School and Beyond example.          Lab 2 posted 1/29/19
Lab 2 has evolved in three pieces.
a.   Lab2, exposition and commands provides a full write up (annotated) of the analyses
b.    Lab 2, Rogosa R-session (nlme legacy version)
c.    Lab2 (priority Rogosa session) redone using lme4, lmer  (with additional plots)   Lecture slide, lme lmer for Bryk data
    For those who are strapped for time or otherwise saturated, I provide a full single Bryk dataset that skips over the data manipulation portion of the activity
Additional materials for HSB analyses are posted in Week 4 Lecture topics, sec 3(iii)
Lab 3. Instrumental Variables.
  Lab3, exposition and commands     
  Lab 3, Rogosa R-session        Mroz87 data description     Lab3
Note: I triple-checked; the dataset is where the description indicates, and read.table("http://statweb.stanford.edu/~rag/stat209/Mroz87.dat", header = TRUE) reads in the 753 cases.
Lab 4. Matching and propensity scores. Lalonde job training data
This lab is arranged in pieces
a.   Lab4, exposition and commands   
b.   Lab 4, Rogosa R-session, Base (sections 1-3)  
c.   Lab 4, Rogosa R-session, additional matching exercises (incl secs 4-6)  
d.   Lab 4, Rogosa R-session: not done until ancova is run  
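
The Lab 3 note above about reading Mroz87.dat can be sketched as a small check in R. This is only an illustration, not part of the lab materials: read_cases() is a hypothetical helper name, and the commented usage assumes network access and that the file is still hosted at the URL quoted in the note.

```r
# Sketch: read a whitespace-delimited data file whose first line holds the
# variable names, and warn if the number of cases differs from what is expected.
# read_cases() is a hypothetical helper, not something the lab files define.
read_cases <- function(src, expected_n = NULL) {
  dat <- read.table(src, header = TRUE)   # header = TRUE: first line is names
  if (!is.null(expected_n) && nrow(dat) != expected_n) {
    warning("expected ", expected_n, " cases but read ", nrow(dat))
  }
  dat
}

# Usage (needs network access; URL and case count are taken from the note above):
# mroz <- read_cases("http://statweb.stanford.edu/~rag/stat209/Mroz87.dat", 753)
# str(mroz)   # inspect variable names and types before starting the lab
```

Checking the case count right after reading is a cheap way to catch a truncated download or a misread header line before any modeling starts.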

Current version of R is R version 3.5.2 (Eggshell Igloo) released on 2018-12-20. (I'm currently running 3.4.4 I see). For references and software: The R Project for Statistical Computing   Closest download mirror is Berkeley
The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. Also of interest are CRAN Task View: Psychometric Models and Methods and CRAN Task View: Design of Experiments (DoE) and Analysis of Experimental Data
A handbook of statistical analyses using R (second edition). Brian Everitt, Torsten Hothorn CRC Press, Index of book chapters   Stanford access      Data sets etc Package 'HSAUR2' August 2014, Title A Handbook of Statistical Analyses Using R (2nd Edition).   There is now a third edition of HSAUR, but full text not yet available in crcnetbase.com.    CRAN HSAUR3 page  with Vignettes (chapter pieces) and data in reference manual
In prior fall quarters I taught a short 5-week intro R course intended for users of other statistical packages; see Ed401 page
Among the infinite number of introduction-to-R resources is John Verzani's page. A good R primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li.   Another version
Even more stuff:   According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page."
Wm. Revelle, who develops the psych package, also has a draft text which covers standard statistics plus specialized measurement topics (plus other R intros)
For those with a life sciences background a useful resource may be the book Analysis of epidemiological data using R and Epicalc and the Epicalc package.