Statistical Methods for Group Comparisons and Causal Inference

previous title: Understanding Statistical Models and their Social Science Applications

rag {AT} stat {DOT} stanford {DOT} edu

course web page at http://web.stanford.edu/~rag/stat209/

To see full course materials from Winter 2014 go here

Instructor. David Rogosa, Sequoia 224, rag {AT} stanford {DOT} edu.

Office hours T 2:30-3:15.

TA. Wenfei Du, office hours Fri 10-12:00, room 221, wdu {AT} stanford {DOT} edu

Description. Critical examination of statistical methods in social science and life sciences applications, especially for cause and effect determinations. Topics include: matching and propensity score methods, analysis of covariance, instrumental variables, compliance, path analysis, multilevel models, longitudinal data, mediating and moderating variables. Prerequisite: intermediate-level statistical methods.

The course is intended for students who have had intermediate-level instruction in statistical methods, including multiple regression, logistic regression, and log-linear models.

At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods.

The goal is also to instill some introspection about, and critical analysis of, the uses of statistical methods common in social science and medical applications, especially for observational studies.

The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings.

Week 1. Course introduction; properties of regression models
Week 2. Experiments vs observational studies; Neyman-Rubin-Holland formulation
Week 3. Path analysis and causal modeling, multiple regression with pictures. Graphical models.
Week 4. Multilevel data. Contextual effects, aggregation bias, random effects models
Week 5. The many uses and forms of analysis of covariance (including regression discontinuity designs)
Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects
Week 7. Compliance and experimental protocols; encouragement designs; intent to treat
Week 8. Matching and propensity score methods
Week 9. Time-1, Time-2 group comparisons for experimental and non-experimental designs
Dead Week. Overflow and course summary.

The course was created around David Freedman's text, and covers that material using auxiliary texts and online materials.

One intent of this course is for students to read some statistical literature and actual research reports to augment the texts. On that theme, Freedman's text includes reprints of four published empirical research papers, which are also available through JSTOR.

Primary resource for R and data analysis.

Text resource page; UCLA DAAG page; R-packages for Text Data Sets etc.: R-Package DAAG, R-Package DAAGxtras

Auxiliary texts, also on reserve at Math/CS library.

Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique"

David Freedman

Weekly homework assignments following class content will be posted, with solutions posted the next class cycle. Homeworks are not graded.

TH1 covering content weeks 1-4.

TH2 covering content weeks 5-8.

In class exam,

See also class calendar

The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status

Class presentation will be in R, and students are encouraged to use it (with occasional reference to SAS, Mathematica, and Matlab).

1/7/09. NYTimes endorses R: Data Analysts Captivated by R's Power

We have a set of 4 computer labs to supplement lecture materials (weeks 2, 4, 6, 8).

Lab 2 has evolved in three pieces.

a. Lab2, exposition and commands: provides a full, annotated write-up of the analyses

b. Lab 2, Rogosa R-session (nlme legacy version)

c. Lab2 (abbreviated version) using lme4, lmer (with additional plots). Lecture slide: lme, lmer for Bryk data

For those who are strapped for time or otherwise saturated, I provide a single, complete Bryk dataset that lets you skip the data manipulation portion of the activity.

Lab 2 posted 1/30/15

Lab3, exposition and commands

Lab 3, Rogosa R-session; Mroz87 data description. Lab3 posted 2/14/15

note: I triple-checked and the dataset is where the description indicates and

Lab 4 is arranged in four pieces.

a. Lab4, exposition and commands posted 2/27/15

b. Lab 4, Rogosa R-session, Base (sections 1-3) posted 2/27/15

c. Lab 4, Rogosa R-session, additional matching exercises (incl secs 4-6) posted 2/27/15

d. Lab 4, Rogosa R-session: not done until ancova is run posted 2/27/15

The current version of R is R 3.1.2 (Pumpkin Helmet), released October 31, 2014. For references and software: The R Project for Statistical Computing. The closest download mirror is Berkeley.

The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. Also of interest are CRAN Task View: Psychometric Models and Methods, and CRAN Task View: Design of Experiments (DoE) and Analysis of Experimental Data.

This past fall quarter I taught a short 5-week introductory R course intended for users of other statistical packages; see the Ed401 page.

Among the infinite number of introduction-to-R resources is John Verzani's page. A good R-primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron and Yuelin Li. Another version

Even more stuff: According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page." And a remarkably useful set of R-resources from Murray State

Wm. Revelle, who develops the psych package, also has a draft text that covers standard statistics plus specialized measurement topics (plus other R intros).

For those with a life sciences background a useful resource may be the book Analysis of epidemiological data using R and Epicalc and the Epicalc package.

An additional R resource that is efficient if you are experienced with another statistical package is the presentation An Introduction to R, John Verzani. For categorical data, especially if you've had a course using Agresti, the lengthy guide by Laura Thompson has more than you want to know.