Statistics 196A, Education 401D Spring 2022
Multilevel Modeling Using R
Spring 2022 Flipped Instruction
Instructor. David Rogosa
Sequoia 224, rag{AT}stanford{DOT}edu
Course web page: http://rogosateaching.com/stat196//
Course Welcome and Logistics (first day stuff, posted in February, call it Week0)
Lecture slides, week 0 (pdf)
Audio companion, week 0
For recreation of in-classroom experience, linked below are youtube versions of the music I play
before starting lecture and after lecture concludes. Some may wish to reverse that ordering.
To see full course materials from Spring 2021 go here
From explorecourses
STATS 196A (EDUC 401D): Multilevel Modeling Using R
See http://rogosateaching.com/stat196/ . Multilevel data analysis examples using R.
Topics include: two-level nested data, growth curve modeling, generalized linear models for counts and categorical data,
nonlinear models, three-level analyses.
Terms: Spr | Units: 1
STATS 196A | 1 units | Class # 15678 | Section 01 | Grading: Satisfactory/No Credit
03/28/2022 - 06/01/2022 Wed 2:45 PM - 4:45 PM at Sequoia Hall 200 with Rogosa, D. (PI)
Course Schedule and Structure, Flipped Instruction
Four (2hr) class lecture presentations to be provided as posted pdf slides plus audio companions.
Weekly postings scheduled for the beginning of each week.
Topics for Weeks 1 - 4.
a. Introduction: Basic analyses for two-level nested data, normal models (UK Exam data)
b. Additional two-level (normal) models: experimental designs (Dyestuff), longitudinal data (growth curves, sleepstudy),
observational data (High School and Beyond)
c. Generalized linear mixed models for counts and categorical outcomes
d. Three-level analyses (nested data and longitudinal data)
e. Specialized applications (as time permits): regression diagnostics, power calculations and design, ecological inference, survival analysis,
nonlinear functional forms, mediation analysis, propensity scores and matching, imputation, item response theory
In person component starting Wednesday April 6, 2:45 onward in Sequoia 200.
Health restrictions permitting, I will be in class on Wednesdays starting April 6
for recap and discussion of the weekly material and discussion of relevant student research activities and/or plans for the "Week 5" presentations.
Week 5 scheduled for May 4 in Sequoia 200. Student presentations of multilevel data analyses.
For the 1-unit enrollment in this mini-course, students are expected to engage in (i.e. consume) the four presentation class sessions.
For the fifth session, held this year on May 4 in Seq200, each student makes a short (~10 min) presentation.
Most often that presentation content is a relevant data analysis the student has conducted.
In the remote asynchronous years of Spring 2021 and Spring 2020, students instead submitted the small project online (via rpubs or google drive).
Core Sources
Many of the examples presented in this short course are described in Examples from Multilevel Software Comparative Reviews, Douglas Bates.
Code version of MlmSoftRev
R-package containing mlmRev data examples. Bates talk on mlmRev U Bristol documentation
Additional examples in core package lme4
Another set of examples: lmer for SAS PROC MIXED Users Douglas Bates Department of Statistics University of Wisconsin Madison
Data sets from SAS System for Mixed Models
Overviews and additional examples from Doug Bates:
lme4: Mixed-effects modeling with R February 17, 2010 Springer (book chapters).
A merged updated version of Bates book lme4: Mixed-effects modeling with R May 2020
R Journal intro Fitting linear mixed models in R Using the lme4 package Douglas Bates (pp.27-30)
Collection of all Doug Bates lme4 talks
lme4 vignette: Douglas Bates Martin Machler Ben Bolker. Fitting linear mixed-effects models using lme4, Journal of Statistical Software
Technical topics: Mixed models in R using the lme4 package Part 4: Theory of linear mixed models
HSB and growth curve examples in John Fox lme tutorial
Another nice lmer exposition with life sciences examples: Mixed-effects models, Remko Duursma, Jeff Powell
Hawkesbury Institute for the Environment, Western Sydney University. September 2016. HIE Datasets
Current version of R is R version 4.1.2 (Bird Hippie) released on 2021-11-01.
For references and software: The R Project for Statistical Computing
The Berkeley mirror is no longer available; choose a mirror from the main R page (first link, I use TN).
A recent text (potentially) provides more infrastructure for this short course, but sadly it has many shortcomings.
This text has free access at Stanford via crcnetbase.com
Multilevel Modeling Using R http://www.crcpress.com/product/isbn/9781466515857
Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences. Published: June 23, 2014 by CRC Press
Journal of Statistical Software Book Review Book website, including data
2022 Content, Flipped Instruction
Week 1
1. Introductory Example. Nested data, two-levels. Goldstein Exam Data.
Exam {mlmRev} UK HS data. Subset: Coed schools exam data, mixed schools, cleaned
a. Introductory descriptive approaches for gender gap analysis (Smart First Year Student analyses using lmList, additional plots).
b. Various lmer analyses for gender gap.
Gender gap data analysis: scanned class handout
Rogosa R-session basic plots models used
2020 isSingular fix
c. Residual plots, add-on regression diagnostics: packages HLMdiag , influence.ME
Rogosa session with Exam data (week 1) (ascii) resulting plots
d. more P-values, tests add-ons to lmer.
afex package with Exam data ggaplmer2
Faraway text addendums: Inferential Methods for Linear Mixed Models
e. Plus Plots for random, fixed effects.
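A minimal sketch of the two approaches in items a and b, assuming the Exam data frame from mlmRev with its normexam, sex, school and type columns. The object names (coed, per_school, fm_gap) and the particular model formula are illustrative choices, not the course session code; the linked Rogosa sessions are the authoritative versions. A singular-fit message from lmer is possible here and is not an error.

```r
library(lme4)    # lmList, lmer
library(mlmRev)  # Exam data

# Keep only the mixed (coed) schools and drop the unused school levels
coed <- droplevels(subset(Exam, type == "Mxd"))

# Descriptive first pass: a separate regression of normexam on sex in each school
per_school <- lmList(normexam ~ sex | school, data = coed)
gap_by_school <- coef(per_school)[, 2]   # school-specific gender coefficients
summary(gap_by_school)

# Mixed model: average gender gap (fixed effect) plus school-to-school
# variation in intercept and gap (random effects)
fm_gap <- lmer(normexam ~ sex + (sex | school), data = coed)
fixef(fm_gap)
VarCorr(fm_gap)
```

The lmList coefficients show the spread of the gap across schools before any pooling; the lmer fit summarizes that spread with variance components.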
2. Matrix Formulation for Mixed Effects Models (growth curves and nested data).
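The matrix form of the linear mixed model is y = X beta + Z b + e, with X the fixed-effects design matrix and Z the random-effects design matrix. The sketch below pulls those pieces out of a fitted lmer object with getME and rebuilds the fitted values by hand. The sleepstudy data and this model are used only for illustration (an assumption, since the course materials for this topic are not reproduced here).

```r
library(lme4)
library(Matrix)  # sparse-matrix methods used by lme4

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

X    <- getME(fm, "X")   # n x p fixed-effects design matrix
Z    <- getME(fm, "Z")   # n x q sparse random-effects design matrix
beta <- fixef(fm)        # estimated fixed effects
b    <- getME(fm, "b")   # conditional modes of the random effects

dim(X)   # 180 observations, 2 fixed-effect columns
dim(Z)   # 180 observations, 36 columns (18 subjects x intercept and slope)

# Fitted values rebuilt as X beta + Z b
fitted_by_hand <- as.vector(X %*% beta) + as.vector(as.matrix(Z %*% b))
all.equal(fitted_by_hand, unname(fitted(fm)))
```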
Week 2
1. Recap Introductory Example. Nested data, two-levels. Goldstein Exam Data.
merMod objects from lmer
Add-on package merTools merTools vignette
prediction with lmer : predict with lme4 predictInterval from merTools prediction vignette
Rogosa session. plots
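A small sketch of prediction from a merMod object, assuming the sleepstudy fit from lme4 as a stand-in for the course examples. It contrasts population-level predictions (re.form = NA, fixed effects only) with subject-specific predictions that include the estimated random effects. The data frame new_days is made up for illustration. Interval estimates are the job of predictInterval in merTools, which is not called here.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Hypothetical new rows: subject 308 at days 0, 5 and 9
new_days <- data.frame(
  Days    = c(0, 5, 9),
  Subject = factor("308", levels = levels(sleepstudy$Subject))
)

pop  <- predict(fm, newdata = new_days, re.form = NA)  # fixed effects only
subj <- predict(fm, newdata = new_days)                # adds subject 308's random effects

cbind(new_days, population = pop, subject = subj)
```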
2. Common/canonical two-level examples (measured outcome)
A. Growth Curve models and analysis. Bates Sleepstudy example (week2 Stat222).
Chap. 4 Bates book [more Doug Bates Slides (pdf pages 8-28) ]
Sleepstudy class handout, pdf scan Sleepstudy, 2018 clean ascii Individual plots (frame-by-frame) Plot of straight-line fits
Reduced/constrained models: growth curve example
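A sketch of the sleepstudy growth curve comparison, along the lines of Chap. 4 of the Bates book: a full model with correlated random intercepts and slopes against a reduced model that constrains that correlation to zero. The object names are illustrative, and both fits use maximum likelihood so the likelihood ratio comparison is direct.

```r
library(lme4)

# Full model: random intercept and slope per subject, correlation estimated
fm_full <- lmer(Reaction ~ Days + (Days | Subject),
                data = sleepstudy, REML = FALSE)

# Reduced model: random intercept and slope constrained to be uncorrelated
fm_red <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
               data = sleepstudy, REML = FALSE)

# Likelihood ratio comparison (one parameter difference)
anova(fm_red, fm_full)
```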
B. Two-level nested (normal) data recap. Brief overview of HSB (High School and Beyond) analysis (from Stat209): plots and model.
a full single Bryk dataset (longform) (abbreviated) Rogosa R-session Bryk data plots, Rogosa R-session
Caution from a prior year: side-by-side boxplot creation and lmList subset issue
3. Growth Curve Modeling exercise, Brain Volume Data Analysis.
analyses from "Variation in longitudinal trajectories of regional brain volumes of healthy men and women (ages 10 to 85 years) measured with atlas-based parcellation of MRI" cartoon plot of Lateral Ventricles data; actual data plot of Lateral Ventricles data; development of lmer (mixed effect) growth models
4. Data from designed experiments (basics).
a. Dyestuff data, Bates book, Chapter 1 (sec 1.2, 1.3) Rogosa Dyestuff session
b. Penicillin data (also Pastes, ratbrain), Bates book, Chapter 2.
  From Doug Bates presentation Rogosa R-session
Random effects anova recap (see Bates book Chap1, Chap2).
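A short sketch of the random effects anova fits for the designed-experiment data, using the Dyestuff and Penicillin data frames that ship with lme4. These are the simplest intercept-only specifications, in the spirit of Chapters 1 and 2 of the Bates book; the linked sessions may go further.

```r
library(lme4)

# One-way random effects: 6 batches, 5 yields per batch
fm_dye <- lmer(Yield ~ 1 + (1 | Batch), data = Dyestuff)
VarCorr(fm_dye)   # between-batch and residual standard deviations

# Crossed random effects: every sample is assayed on every plate
fm_pen <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample), data = Penicillin)
VarCorr(fm_pen)   # plate, sample and residual standard deviations
ngrps(fm_pen)     # number of levels of each grouping factor
```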
Week 3
Main topic: Generalized Linear Mixed Models: counts and proportions.
nice overview: Generalized Linear Mixed Models from Encyclopedia of Statistics in Behavioral Science.
Github GLMM Faq (Ben Bolker)
1. Dichotomous outcomes, glmer analysis examples.
a. Respiratory clinical trial from HSAUR. lmList does logistic, introducing glmer lmList, glmer for respiration data (placebo group)
b. Contraception (Bangladesh) use from Bates review Rogosa R-session glmer model slide
c. Test scores (pass/fail outcome) from Ch 8, Multilevel Modeling Using R. Rogosa R-session
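For a dichotomous outcome, a minimal glmer sketch with the Contraception (Bangladesh) data from mlmRev. The formula follows the kind of model in the Bates review (age, age squared, urban, number of living children, random intercept for district); treat it as one plausible specification, not necessarily the model on the course slide.

```r
library(lme4)
library(mlmRev)  # Contraception data

# Logistic mixed model: contraception use (N/Y) with a district random intercept
fm_con <- glmer(use ~ age + I(age^2) + urban + livch + (1 | district),
                data = Contraception, family = binomial)

fixef(fm_con)     # log-odds scale coefficients
VarCorr(fm_con)   # between-district standard deviation
```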
2. Count outcome, GLMM poisson models.
a. Count data: Contagious bovine pleuropneumonia, data(cbpp) in lme4.
Rogosa R-session herd plots
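A sketch of a Poisson GLMM for the cbpp counts. The offset log(size) is my assumption for turning incidence into a rate per animal at risk; the lme4 help page for cbpp instead fits a binomial model to cbind(incidence, size - incidence), and the course session may use either form.

```r
library(lme4)

# Poisson mixed model: cases per herd and period, herd size as exposure
fm_cbpp <- glmer(incidence ~ period + offset(log(size)) + (1 | herd),
                 data = cbpp, family = poisson)

fixef(fm_cbpp)    # log rate ratios for periods 2 to 4 relative to period 1
VarCorr(fm_cbpp)  # between-herd standard deviation on the log scale
```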
b. Factorial design, Count outcome. From HIE Sydney. EucFACE ground cover data Rogosa glmer session.
c. Another count data example from mlmRev package,
data(Mmmec): Malignant Melanoma Mortality in the European Community associated with the impact of UV radiation exposure.
Rogosa glmer session more Rogosa Mmmec session
glmer.nb Fits a generalized linear mixed-effects model (GLMM) for the negative binomial family, building on glmer, and initializing via theta.ml from MASS.
3. hglm -- a different package for fitting hierarchical generalized linear models.
R Journal December 2010. manual vignette
Addendum. A little more on Overdispersion in Generalized Linear Mixed Models
pdf slides companion audio Rogosa R-session
Resources: Basic Overdispersion Overview: Overdispersion, and how to deal with it in R
Overdispersion in Github GLMM Faq (Ben Bolker)
R-package RVAideMemoire function overdisp.glmer man page 1 2
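A rough overdispersion check, in the spirit of the resources above: the sum of squared Pearson residuals divided by the residual degrees of freedom, where values well above 1 suggest overdispersion. The helper name pearson_ratio is hypothetical (it is not a function from lme4 or RVAideMemoire), and the cbpp Poisson fit is only an illustrative model.

```r
library(lme4)

fm <- glmer(incidence ~ period + offset(log(size)) + (1 | herd),
            data = cbpp, family = poisson)

# Hypothetical helper: Pearson chi-square over residual degrees of freedom
pearson_ratio <- function(model) {
  rp  <- residuals(model, type = "pearson")
  rdf <- df.residual(model)
  c(chisq = sum(rp^2), rdf = rdf, ratio = sum(rp^2) / rdf)
}

od <- pearson_ratio(fm)
od
```

If the ratio is large, options covered in the linked material include an observation-level random effect or a negative binomial fit with glmer.nb.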
Week 4
1. Three-level (and above) lmer examples.
Measured outcomes
a. Achieve data from Multilevel Modeling Using R book. Rogosa R-session
b. example from mlmRev package, data(Chem97): Scores on A-level Chemistry in 1997. Rogosa R-session
Count outcome.
c. data(Mmmec): Malignant Melanoma Mortality in the European Community associated with the impact of UV radiation exposure.   Rogosa 3-level session
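A minimal three-level sketch for a measured outcome, assuming the Chem97 data from mlmRev: students nested in schools nested in local education authorities (lea). The intercept-only formula is the simplest variance-components version and is an illustrative choice; the Rogosa session adds predictors.

```r
library(lme4)
library(mlmRev)  # Chem97 data

# Students within schools within local education authorities
fm3 <- lmer(score ~ 1 + (1 | lea/school), data = Chem97)

VarCorr(fm3)   # variance components: school within lea, lea, residual
ngrps(fm3)     # number of schools and of leas
```

The lea/school syntax expands to (1 | lea) + (1 | lea:school), which is safe whether or not school labels are unique across authorities.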
2. Missing Data and Imputation Methods for Multilevel Data Analysis
Vignette: Analyzing Imputed Data with Multilevel Models and merTools
Rogosa R-session for vignette
Missing data wide-form imputation: mice multiple regression example, nhanes data in package mice R-session using mice package
Missing data background. Multiple Imputation.
Nhanes data example (mice primer) in van Buuren S and Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67.
See also Flexible Imputation of Missing Data, Stef van Buuren, Chapman and Hall/CRC 2012: Chapter 9, Longitudinal Data; Sec 3.8, Multilevel data. He is the originator of mice. Book extras
R resources. Missing Data Task View, esp packages mice
New package: hmi: hierarchical multiple imputation, vignette
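A sketch of the basic multiple imputation workflow with mice on the nhanes data, following the impute, analyze, pool pattern of the mice primer. The seed and the number of imputations are arbitrary choices made here for reproducibility. This is the wide-form single-level case; multilevel imputation needs the additional methods described in the resources above.

```r
library(mice)

# Step 1: create m = 5 completed data sets by chained equations
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Step 2: fit the same regression in each completed data set
fit <- with(imp, lm(chl ~ age + bmi))

# Step 3: pool the estimates across imputations (Rubin's rules)
pooled <- pool(fit)
summary(pooled)
```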
Addendum. Study Design (power) for Mixed Models
Most activity and resources are for longitudinal designs (cf. Week 2, topic 2A): How often and how many subjects?
In my Stat222 course longitudinal experimental design is taken up in Week 5, Lecture topic 4.
Audio for the topic starts at partc that week, with the lecture content for design starting at pdf page 93.
Resources for that Stat222, Week5 content: Power Calculations for Longitudinal Group Comparisons.
R-package longpower. Vignettes found by browseVignettes(package = "longpower"). Functions in the MBESS package: ss.power.pcm.
R-package: powerlmm
Background pubs: Power for linear models of longitudinal data
with applications to Alzheimer's Disease Phase II study design Michael C. Donohue, Steven D. Edland, Anthony C. Gamst
Sample Size Planning for Longitudinal Models:
Accuracy in Parameter Estimation for Polynomial Change Parameters Ken Kelley Notre Dame Joseph R. Rausch Psychological Methods 2011
Additional Resources:
basic R analogues, power.t.test power.anova.test
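For orientation, a sketch of the two base R functions just named, which handle the single-level analogues of the mixed-model power questions. The effect size, variances and target power below are made-up inputs for illustration.

```r
# Two-sample t-test: sample size per group for a half-SD difference
pw <- power.t.test(delta = 0.5, sd = 1, power = 0.80, sig.level = 0.05)
ceiling(pw$n)   # about 64 per group

# One-way anova: sample size per group for 4 groups
pa <- power.anova.test(groups = 4, between.var = 1, within.var = 3,
                       power = 0.80)
ceiling(pa$n)
```

Leaving n unspecified asks each function to solve for it; supplying n and omitting power solves for power instead.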
Power calculations and Design packages:
Package pamm Title Power Analysis for Random Effects in Mixed Models
Package simr Title Power Analysis for Generalised Linear Mixed Models by Simulation.
Package MultiRR Title Bias, Precision, and Power for Multi-Level Random Regression
more powerlmm
intro: Introducing 'powerlmm' an R package for power calculations for longitudinal multilevel models
vignette: Power Analysis for Two-level Longitudinal Models with Missing Data
more simulation approaches to mixed-model power calculations:
Power for Multilevel Analysis Power Analysis in R for Multilevel Models
Simulation approaches for mixed-model power calculations