Statistical Methods for Group Comparisons and Causal Inference

Lecture: WF, 2:30 - 4 Sequoia 200

course web page at http://rogosateaching.com/stat209/

To see full course materials from Winter 2016 go here

Instructor. David Rogosa, Sequoia 224, rag {AT} stanford {DOT} edu.

Office hours W,F 4 - 4:45.

TA Michael Sklar sklarm {AT} stanford {DOT} edu

Office Hours TH 2 - 3:30 in Sequoia Hall 220

STATS 209: Statistical Methods for Group Comparisons and Causal Inference (EDUC 260A, HRP 239)

Description: Critical examination of statistical methods in social science and life sciences applications, especially for cause and effect determinations. Topics: mediating and moderating variables, potential outcomes framework, encouragement designs, multilevel models, heterogeneous treatment effects, matching and propensity score methods, analysis of covariance, instrumental variables, compliance, path analysis and graphical models, group comparisons with longitudinal data.

Prerequisite: intermediate-level statistical methods.

Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit

2017-2018 Winter: STATS 209 | 3 units | Class # 31306 | Section 01 | Grading: Letter or Credit/No Credit | LEC | 01/08/2018 - 03/16/2018 Wed, Fri 2:30 PM - 4:20 PM at Sequoia Hall 200 with Rogosa, D. (PI)

Instructors: Rogosa, D. (PI)

For students who have had intermediate-level instruction in statistical methods including multiple regression, logistic regression, log-linear models.

At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods.

The goal is also to instill some introspection and critical analysis for the uses of statistical methods common in social science and medical applications, for experimental and observational studies.

The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings.

Week 1. Course introduction; properties of regression models.

Week 2. Experiments vs observational studies; Neyman-Rubin-Holland formulation; encouragement designs.

Week 3. Path analysis and causal modeling, multiple regression with pictures. Graphical models.

Week 4. Multilevel data. Contextual effects, aggregation bias, mixed effects models.

Week 5. The many uses and forms of analysis of covariance, including heterogeneous treatment effects and regression discontinuity designs.

Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects.

Week 7. Compliance and experimental protocols; intent to treat and compliance adjustments.

Week 8. Matching and propensity score methods.

Week 9. Time-1, Time-2 group comparisons for experimental designs and observational studies.

Dead Week. Overflow and course summary.

This course was created in 2005 around David Freedman's text, and covers that material using auxiliary texts and online materials.

One intent of this course is for students to read some statistical literature and actual research reports to augment the texts. On that theme, Freedman's text includes reprints of four published empirical research papers, which are also available through JSTOR.

A Primary resource for R and data analysis.

Text resource page; R-packages for text data sets etc.: R-Package DAAG, R-Package DAAGxtras

Additional resources:

Causal Inference in Statistics, Social and Biomedical Sciences: An Introduction, Guido Imbens and Don Rubin, 1st Edition (Cambridge University Press) Stanford access

David Freedman

Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique"

Weekly homework assignments following class content will be posted, along with solutions. Homeworks are not graded.

TH1 covering content weeks 1-4.

TH2 covering content weeks 5-8.

In class exam,

See also

Course Assignments Page

The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status

Class presentation will be in R, and students are encouraged to use it (with occasional reference to SAS, Mathematica, and Matlab).

1/7/09. NYTimes endorses R: Data Analysts Captivated by R's Power

We have a set of 4 computer labs to supplement lecture materials (weeks 2, 4, 6, 8).

Lab 2 has evolved in three pieces.

a. Lab2, exposition and commands provides a full write up (annotated) of the analyses

b. Lab 2, Rogosa R-session (nlme legacy version)

c. Lab2 (priority Rogosa session) redone using lme4, lmer (with additional plots). Lecture slide: lme, lmer for Bryk data

For those who are strapped for time or otherwise saturated, I provide a full single Bryk dataset that skips over the data manipulation portion of the activity

Additional materials for HSB analyses are posted in Week 4 Lecture topics, sec 3(iii)
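For orientation only, here is a minimal sketch of how the same random-intercept model is written in the legacy nlme syntax (lme) and in the lme4 syntax (lmer). It is not the lab's actual session. It assumes the MathAchieve data frame bundled with the nlme package (a version of the Bryk and Raudenbush High School and Beyond data) rather than the course's own Bryk data file, and the model shown (MathAch on SES with a school-level random intercept) is chosen purely for illustration.

```r
# Illustrative sketch: one random-intercept model, two packages.
# Assumption: nlme's bundled MathAchieve data stand in for the lab's Bryk file.
library(nlme)   # lme(), MathAchieve
library(lme4)   # lmer()

data(MathAchieve, package = "nlme")

# nlme (legacy) syntax: random effects are given in a separate 'random' argument
fit_lme <- lme(MathAch ~ SES, random = ~ 1 | School, data = MathAchieve)

# lme4 syntax: random effects are written inside the model formula
fit_lmer <- lmer(MathAch ~ SES + (1 | School), data = MathAchieve)

# Both fits use REML by default, so the fixed-effect estimates should agree closely
fe_lme  <- nlme::fixef(fit_lme)
fe_lmer <- lme4::fixef(fit_lmer)
print(rbind(lme = fe_lme, lmer = fe_lmer))
```

The main practical difference is where the grouping structure goes: lme takes it as random = ~ 1 | School, while lmer takes it as the formula term (1 | School).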

Lab3, exposition and commands

Lab 3, Rogosa R-session Mroz87 data description Lab3 posted 2/15/18

note: I triple-checked and the dataset is where the description indicates and

This lab is arranged in pieces

a. Lab4, exposition and commands posted

b. Lab 4, Rogosa R-session, Base (sections 1-3) posted

c. Lab 4, Rogosa R-session, additional matching exercises (incl secs 4-6) posted

d. Lab 4, Rogosa R-session: not done until ancova is run posted

The current version of R is 3.4.3 (Kite-Eating Tree), released on 2017-11-30 (I see I'm currently running 3.3.3). For references and software: The R Project for Statistical Computing. The closest download mirror is Berkeley.

The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. Also of interest are CRAN Task View: Psychometric Models and Methods and CRAN Task View: Design of Experiments (DoE) and Analysis of Experimental Data.

In prior fall quarters I taught a short 5-week intro R course intended for users of other statistical packages; see the Ed401 page.

Among the infinite number of introduction-to-R resources is John Verzani's page. A good R-primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li. Another version

Even more stuff: According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page." And a remarkably useful set of R-resources from Murray State

Wm. Revelle, who develops the psych package, also has a draft text that covers standard statistics plus specialized measurement topics (plus other R intros).

For those with a life sciences background a useful resource may be the book Analysis of epidemiological data using R and Epicalc and the Epicalc package.

For categorical data, especially if you've had a course using Agresti, the lengthy guide by Laura Thompson has more than you want to know.