Statistics 222,   Education 351A  Autumn 2018
    Statistical Methods for Longitudinal Research

David Rogosa Sequoia 224,   rag{AT}stat{DOT}stanford{DOT}edu   
       Office Hours after class (plus additional TBA)
Course web page: http://rogosateaching.com/stat222//



                To see full course materials from Autumn 2017 go here
Registrar's information
STATS 222 (Same as EDUC 351A): Statistical Methods for Longitudinal Research   Units: 2-3
Lecture  Th 3:00PM - 5:15PM  Sequoia 200
Rogosa Office Hour: 5:15 - 5:50PM,  Sequoia 224 
Grading Basis: Letter or Credit/No Credit

Course Description:
 STATS 222: Statistical Methods for Longitudinal Research (EDUC 351A)
Research designs and statistical procedures for time-ordered (repeated-measures) data. 
The analysis of longitudinal panel data is central to empirical research on learning, development, aging, and the effects of interventions. 
Topics include: measurement of change, growth curve models, analysis of durations including survival analysis, 
experimental and non-experimental group comparisons, reciprocal effects, stability. 
See http://rogosateaching.com/stat222/. Prerequisite: intermediate statistical methods
Terms: Aut | Units: 2-3 | Grading: Letter or Credit/No Credit
Instructors: Rogosa, D. (PI) 


Preliminary Course Outline
    Week 1. Course Overview, Longitudinal Research; Analyses of Individual Histories and Growth Trajectories
    Week 2. Introduction to Data Analysis Methods for assessing Individual Change for Collections of Growth Curves (mixed-effects models)
    Week 3. Analysis of Collections of growth curves: linear, generalized linear and non-linear mixed-effects models
    Week 4. Special case of time-1, time-2 data; Traditional measurement of change for individuals and group comparisons
    Week 5. Assessing Group Growth and Comparing Treatments: Traditional Repeated Measures Analysis of Variance and Linear Mixed-effects Models
    Week 6. Comparing group growth continued: Power calculations, Cohort Designs, Cross-over Designs, Methods for missing data, Observational studies.
    Week 7. Analysis of Durations: Introduction to Survival Analysis and Event History Analysis
    Weeks 8-9. Further topics in analysis of durations: Diagnostics and model modification; Interval censoring, Time-dependence, Recurrent Events, Frailty Models, Behavioral Observations and Series of Events (renewal processes)
    Dead Week. Assorted Special Topics (enrichment) and Overflow (weeks 1-8): Assessments of Stability (including Tracking), Reciprocal Effects, (mis)Applications of Structural Equation Models, Longitudinal Network Analysis

Texts and Resources for Course Content
1. Garrett M. Fitzmaurice, Nan M. Laird, James H. Ware. Applied Longitudinal Analysis (Wiley Series in Probability and Statistics), 2nd ed., 2011.
  Text Website   second edition website     Text lecture slides   
2. Judith D. Singer and John B. Willett. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press, March 2003.
  Text web page    Text data examples at UCLA IDRE    Powerpoint presentations   A good, gentle introduction to modelling collections of growth curves (and survival analysis) is Willett and Singer (1998).
3. Douglas M. Bates. lme4: Mixed-effects modeling with R. Springer, February 17, 2010 (chapters). A merged version of the Bates book, lme4: Mixed-effects modeling with R (January 11, 2010), has been found again.
Manual for R-package lme4    and   mlmRev, Bates-Pinheiro book datasets.    
    Additional Doug Bates materials. Collection of all Doug Bates lme4 talks      Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions Douglas Bates 8th International Amsterdam Conference on Multilevel Analysis 2011-03-16    another version
Original Bates-Pinheiro text (2000).  Mixed-Effects Models in S and S-PLUS (Stanford access). Appendix C has non-linear regression models.
Fitting linear mixed-effects models using lme4, Journal of Statistical Software. Douglas Bates, Martin Mächler, Ben Bolker.       Technical topics: Mixed models in R using the lme4 package Part 4: Theory of linear mixed models
4. A Handbook of Statistical Analyses Using R (second edition). Brian Everitt, Torsten Hothorn. CRC Press.   Index of book chapters   Stanford access   Longitudinal chapters: Chap 11, Chap 12, Chap 13. Data sets etc.: Package 'HSAUR2' (August 2014), A Handbook of Statistical Analyses Using R (2nd Edition)
   There is now a third edition of HSAUR, but the full text is not yet available in crcnetbase.com.    CRAN HSAUR3 page with Vignettes (chapter pieces) and data in reference manual
5. Peter Diggle, Patrick Heagerty, Kung-Yee Liang, Scott Zeger. Analysis of Longitudinal Data, 2nd ed., 2002.
   Amazon page     Peter Diggle home page    Book data sets
     A Short Course in Longitudinal Data Analysis Peter J Diggle, Nicola Reeve, Michelle Stanton (School of Health and Medicine, Lancaster University), June 2011     earlier version    associated exercises:  Lab 1  Lab2  Lab3
6. Longitudinal and Panel Data: Analysis and Applications for the Social Sciences by Edward W. Frees (2004). Full book available    and book data and programs (mostly SAS).
7. Growth Curve Analysis and Visualization Using R. Daniel Mirman Chapman and Hall/CRC 2014 Print ISBN: 978-1-4665-8432-7    Stanford Access       Mirman web page (including data links).
8. Longitudinal Data Analysis. Edited by Geert Verbeke, Marie Davidian, Garrett Fitzmaurice, and Geert Molenberghs. Chapman and Hall/CRC 2008.   online supplement for LDA book
9. Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer.  Extended presentation: Introduction to Longitudinal Data Analysis.  A shorter exposition: Methods for Analyzing Continuous, Discrete, and Incomplete Longitudinal Data
10. Survival analysis Rupert G. Miller. Available as Stanford Tech Report
11. Event History Analysis with R (Stanford access). Göran Broström. CRC Press 2012. R-package eha
12. John D. Kalbfleisch, Ross L. Prentice. The Statistical Analysis of Failure Time Data, 2nd Ed.
  Amazon page    online from Wiley
13. Advanced survival analysis topics.
   Interval-Censored Time-to-Event Data: Methods and Applications. Chapman and Hall/CRC 2012 (esp. Chap 14--glrt).
   Recurrent Events: Chapter 9 of Kalbfleisch and Prentice (2nd edition), "Modeling and Analysis of Recurrent Event Data".
      Cook, R. J. and Lawless, J. F. (2007). The Statistical Analysis of Recurrent Events (Stanford access). Springer, New York.
    Joint Models for Longitudinal and Time-to-Event Data. With Applications in R. Dimitris Rizopoulos. Chapman and Hall/CRC 2012 (Stanford access)    Book website

Additional Specialized Resources
Harvey Goldstein. The Design and Analysis of Longitudinal Studies: Their Role in the Measurement of Change (1979). Elsevier
  Amazon page    Goldstein Chap 6 Repeated measures data      Multilevel Statistical Models by Harvey Goldstein with data sets   
David Roxbee Cox, Peter A. W. Lewis. The Statistical Analysis of Series of Events. Chapman and Hall, 1966.
  Google books    Poisson process computing program
David J Bartholomew. Stochastic Models for Social Processes, 3rd edition. Chichester: John Wiley and Sons.
   David J Bartholomew web page


Grading, Exams, and Credit Units
Stat222/Ed351A is listed as Letter or Credit/No Credit grading (Stat MS students should check whether S/NC is a viable option for their degree program.)
Grading (for the 2-unit base) will be based on two components:
  Each week I will post a few exercises for that week's content--towards the end of the qtr I'll identify a subset of those exercises to be turned in.
  During the Autumn qtr exam period we will have an in-class (all materials available, "open" everything) exam.
My reading of the Registrar's chart indicates Tuesday, December 11, 2018, 3:30-6:30 p.m., Location: Sequoia 200 (Statistics).
           see Class Calendar for details
The Registrar requires clear identification of the requirements for incremental units. The additional requirement for a 3-unit registration (the one unit above 2-units) is satisfied by a student presentation: a mini-lecture, approximately 15 minutes with handout. These are done with Rogosa in Sequoia 224, which has worked out well. Good topics would include empirical longitudinal research, such as a data set or set of studies you are involved with, or an extension of class lecture topics such as preparing an additional data analysis example or a report on some technical readings. Discussion with Rogosa is encouraged.

Note to auditors. The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status   

Course Problem Set 2018    to be posted xxx
  
Cumulative Collection of Course Handouts 2018 to be posted Dec 2018

Statistical computing
Class presentation will be in, and students are encouraged to use, R (occasionally, some references to SAS and Mathematica).
Current version of R is R version 3.5.1 (Feather Spray) released 2018-07-02.
    For references and software: The R Project for Statistical Computing   Closest download mirror is Berkeley
The CRAN Task View: Statistics for the Social Sciences provides an overview of some relevant R packages. Also the new CRAN Task View: Psychometric Models and Methods and CRAN Task View: Survival Analysis and CRAN Task View: Computational Econometrics.
A good R-primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires Jonathan Baron, Yuelin Li.   Another version
A Stat209 text: Data Analysis and Graphics Using R, J. Maindonald and J. Braun, Cambridge, 2nd edition 2007 (3rd edition 2010); a short version is available on CRAN.
According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page."



                  Course Content: Files, Readings, Examples

9/27. First class: Course Overview, Analysis of Individual Trajectories.
In the news
1. Now they tell us.... Daily aspirin may be harmful for healthy older adults, large study finds     Publication: Effect of Aspirin on Disability-free Survival in the Healthy Elderly   and related articles. The New England Journal of Medicine nejm.org September 2018.
2. From 2017   Sedentary behavior can cause death (Daily Mail).  Publication: Patterns of Sedentary Behavior and Mortality in U.S. Middle-Aged and Older Adults: A National Cohort Study. Ann Intern Med. 2017.


Lecture Topics
A.    Initial meet-and-greet. Class logistics and longitudinal research overview
B.     Examples, illustrations for longitudinal research overview, taken from course resources above:
         Laird, Ware (#1) slides 1-16;    Diggle (#5) slides 4-14, 22-28;    Verbeke (#9) slides from Ch 2 and Sec 3.3
C.     Data Analysis Examples of Model Fitting for Individual Trajectories and Histories.
    Motto: Individual trajectories are the proper starting point for longitudinal data analysis
         ascii version of class handout     annotated version       pdf version with plots     datasets
               Starting up R-addendum: installing packages and obtaining data (sleepstudy in lme4)
  Additional materials for the trajectory examples
            For Count Data (glm) example. Link functions for generalized linear mixed models (GLMMs), Bates slides (pdf pages 11-18)
     AIDS in Belgium example, (from Simon Wood) single trajectory, count data using glm. Rogosa R session for aids data
        additional expositions of AIDS data, Poisson regression:  Duke   Kentucky
    A very comprehensive introduction to analysis of count data Regression Models for Count Data in R Achim Zeileis Christian Kleiber Simon Jackman (Stanford University)
        Non-linear models, esp logistic. From week 1, also week 3 Self-Starting Logistic model      SSlogis help page, do ?SSlogis   post of annotated logistic curve with SSlogis arguments   
           Trend in Proportions: College fund raising example     prop.trend.test help page ?prop.trend.test in R-session.
          Trend in proportions, group growth, Cochran-Armitage test. Expository paper: G. Salanti and K. Ulm (2003): Tests for Trend in Binary Response (SU access)
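Base R's prop.trend.test carries out the chi-squared test for trend in proportions (the Cochran-Armitage-type test discussed by Salanti and Ulm). A minimal sketch follows. The counts are placeholders borrowed from the smokers example on the ?prop.test help page, not the college fund-raising data, and the scores 1:4 assume equally spaced years.

```r
# Placeholder counts (from the ?prop.test smokers example, NOT the class
# fund-raising data): successes out of trials in each of 4 ordered years
successes <- c(83, 90, 129, 70)
trials    <- c(86, 93, 136, 82)

# Chi-squared test for a linear trend in proportions (Cochran-Armitage type);
# 'score' supplies the ordering/spacing of the groups, here years 1 to 4
trend <- prop.trend.test(successes, trials, score = 1:4)
print(trend)

# Observed proportions, to see the direction of any trend
round(successes / trials, 3)
```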


WEEK 1 Review Questions
1. Consider the straight-line (constant rate of change) fit for subject 372 in the sleepstudy data. Obtain a confidence interval for the rate of change from the OLS fit. Now compare the OLS fit with the day-to-day differences: under the constant-rate-of-change model these 9 day-to-day differences also estimate the rate of change. Obtain an estimate of the mean and a confidence interval for the rate of change from these first differences. Compare with the OLS results.
Solution for question 1
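A minimal sketch of the comparison in question 1, under stated assumptions: the reaction times below are placeholders, not subject 372's data (with lme4 loaded, subset(sleepstudy, Subject == "372") gives the real values). It also shows a property worth noticing: the first differences telescope, so their mean uses only the first and last days.

```r
# Placeholder reaction times for one subject on days 0-9 (NOT subject 372;
# with lme4 loaded, use subset(sleepstudy, Subject == "372") instead)
days <- 0:9
rt   <- c(270, 275, 290, 285, 300, 310, 305, 320, 330, 335)

# OLS straight-line fit: slope (rate of change) and its 95% interval
ols    <- lm(rt ~ days)
ols_ci <- confint(ols)["days", ]

# The 9 day-to-day (first) differences each estimate the rate of change
d      <- diff(rt)
fd_ci  <- t.test(d)$conf.int

# Telescoping: the mean of the first differences is (last - first) / 9
fd_mean <- mean(d)
rbind(OLS = c(coef(ols)[["days"]], ols_ci),
      first_diff = c(fd_mean, fd_ci))
```

Under the straight-line model with independent errors, (y10 - y1)/9 has variance 2*sigma^2/81, roughly twice the OLS slope variance sigma^2/82.5 for days 0-9.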
2. Revisit the Belgium AIDS data example (counts of new cases by year). Use the parameter estimates for am2 (the quadratic-in-time glm fit) to compute by hand (or calculator) the values of the glm fit at year = 5 and year = 9. Compare those values with results from the model am2 using predict.
Solution for question 2
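Computing the glm fit by hand means evaluating the linear predictor at the new year and then undoing the log link with exp(). A sketch, assuming placeholder counts (not the class data) and years coded 1 to 13; predict(..., type = "response") should agree.

```r
# Placeholder yearly counts of new cases (substitute the class AIDS counts);
# years coded 1, 2, ..., 13 by assumption
year  <- 1:13
cases <- c(10, 15, 30, 48, 65, 80, 118, 140, 170, 200, 240, 250, 245)

# Poisson glm with a quadratic trend in time (log link is the default)
am2 <- glm(cases ~ year + I(year^2), family = poisson)
b   <- coef(am2)

# "By hand": linear predictor, then invert the log link with exp()
x_new   <- c(5, 9)
by_hand <- exp(b[["(Intercept)"]] + b[["year"]] * x_new + b[["I(year^2)"]] * x_new^2)

# Same values from predict on the response (count) scale
from_predict <- predict(am2, newdata = data.frame(year = x_new), type = "response")
cbind(year = x_new, by_hand, from_predict)
```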
3. Paul Rosenbaum has a little data set on growth in vocabulary that I grabbed from his Wharton coursesite. Following the chicks class example, plot these data and try to fit a logistic growth curve to these data. What is the estimate of the final vocab level (asymptote)? Compare the data and the fits from the logistic growth curve.
For reference,       Self-Starting Logistic model      SSlogis help page, do ?SSlogis   post of annotated logistic curve with SSlogis arguments       additional tools in the grofit package
Solution for question 3
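A sketch of an SSlogis fit on simulated data (assumed, not Rosenbaum's vocabulary data): ages 1 to 6 in half-year steps and a true asymptote of 2500 words. SSlogis is self-starting, so nls needs no start values, and the Asym coefficient is the estimated final level.

```r
set.seed(42)
# Simulated vocabulary sizes (hypothetical): logistic curve plus noise
age   <- seq(1, 6, by = 0.5)
vocab <- 2500 / (1 + exp((3.5 - age) / 0.6)) + rnorm(length(age), sd = 20)

# SSlogis model: Asym / (1 + exp((xmid - age) / scal)); self-starting
fit      <- nls(vocab ~ SSlogis(age, Asym, xmid, scal))
asym_hat <- coef(fit)[["Asym"]]      # estimated final vocabulary level

# Data versus fitted values from the logistic growth curve
cbind(age, observed = vocab, fitted = fitted(fit))
```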
4. More on autocorrelation [extension/enrichment]. In standard regression courses you may have seen, in addition to the Durbin-Watson test for AR(1) (dwtest()), versions of the Cochrane-Orcutt procedure for remediation. It uses a (quasi-)first-difference transformation of the data based on an estimate of the autocorrelation, and is therefore hopeless when you have only 3, 4, or 5 observations per unit. To illustrate the statements made in class and the similarity to the OLS results, the solution to this problem redoes the straight-line and polynomial examples from the Week 1 class handout using the R package orcutt.
Solution for question 4
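A one-step version of the Cochrane-Orcutt idea in base R, on a simulated 30-point series (hypothetical): estimate the lag-1 residual autocorrelation from OLS, quasi-difference y and x, and refit. The orcutt package iterates this; the sketch stops after one pass. It also makes visible why very short series are hopeless: the first observation is lost and rho rests on a handful of residuals.

```r
set.seed(7)
n <- 30
x <- 1:n
# Hypothetical series: straight line plus AR(1) errors with rho = 0.6
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 10 + 0.5 * x + e

# Step 1: OLS fit and lag-1 autocorrelation of its residuals
ols <- lm(y ~ x)
r   <- residuals(ols)
rho <- sum(r[-1] * r[-n]) / sum(r^2)

# Step 2: quasi-difference y and x (first observation is lost) and refit
ystar <- y[-1] - rho * y[-n]
xstar <- x[-1] - rho * x[-n]
co    <- lm(ystar ~ xstar)

# Back-transform: the quasi-differenced intercept estimates b0 * (1 - rho)
b0 <- coef(co)[[1]] / (1 - rho)
b1 <- coef(co)[[2]]
rbind(OLS = coef(ols), CochraneOrcutt = c(b0, b1))
```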
WEEK 1 Exercises
1. Straight-line fits for NC Fem data: North Carolina Achievement Data (see Williamson, Applebaum, Epanchin, 1991). These education data are eight yearly observations on achievement test scores in math (Y), for 277 females each followed from grade 1 to grade 8, with a verbal ability background measure (W).
North Carolina, female math performance (also in Rogosa-Saner)    North Carolina data (wide format);         NC data (long)
a. Here we will use the 8 yearly observations on female ID 705810, which you can obtain from either the long form or wide form of these data.
For that female, what is the rate of improvement over grades 1 through 8? Compare the observed improvement for grades 1 through 8 (the difference score) with the amount of improvement indicated by the model fit. Obtain a 95% confidence interval for each (if possible).
b. More on OLS and the difference score. Refer to an old publication: A growth curve approach to the measurement of change. Rogosa, David; Brandt, David; Zimowski, Michele. Psychological Bulletin. 1982 Nov Vol 92(3) 726-748   APA record   direct link. Equation 4, page 728, shows a useful form for the OLS slope (actually, reading the first three pages of that publication is a decent introduction to the growth curve topic). For equally spaced data, Eq. (4) gives a useful equivalence between difference scores (amounts of change) and OLS slopes (multiply rates of change by the time interval). For the NC data in part a, show that the OLS slope can be expressed as a weighted sum of the four differences {8-1, 7-2, 6-3, 5-4}, that is, {score at time 8 minus score at time 1; score at time 7 minus score at time 2; ...} and so forth.
Separately, consider three observations taken at equally spaced time intervals: what is a simple expression for the OLS slope (rate of change)?
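A numerical check of the weighted-difference identity, using only the usual OLS formula (this is not a verbatim reproduction of Eq. 4 in Rogosa, Brandt and Zimowski). The scores are placeholders; substitute the eight observations for ID 705810.

```r
# Placeholder math scores for one student, grades 1-8 (substitute ID 705810)
y  <- c(31, 38, 44, 47, 55, 58, 64, 70)
tt <- 1:8

ols_slope <- coef(lm(y ~ tt))[["tt"]]

# OLS slope = sum((tt - mean(tt)) * y) / sum((tt - mean(tt))^2). Pairing time
# points symmetrically around mean(tt) = 4.5 turns the numerator into a
# weighted sum of the four differences {8-1, 7-2, 6-3, 5-4}.
d   <- c(y[8] - y[1], y[7] - y[2], y[6] - y[3], y[5] - y[4])
w   <- c(3.5, 2.5, 1.5, 0.5)          # tt - mean(tt) for the later time in each pair
sxx <- sum((tt - mean(tt))^2)         # 42 for 8 equally spaced points
diff_slope <- sum(w * d) / sxx

# Three equally spaced points: the middle observation gets weight zero
y3 <- c(10, 17, 19)
t3 <- 1:3
slope3 <- coef(lm(y3 ~ t3))[["t3"]]
c(ols_slope = ols_slope, diff_slope = diff_slope, slope3 = slope3)
```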

2. Revisit the Berkeley Growth Data example from the week 1 lecture. Consider the quadratic (polynomial degree 2) fit to these data, and also an (inappropriate?) constant-rate-of-change (straight-line) fit to these data. Then refer to Seigel, D. G. Several approaches for measuring average rates of change for a second degree polynomial. The American Statistician, 1975, 29, 36-37 (JStor link), for equivalences between the slope of the straight-line fit and an average rate of change for the quadratic fit. Compare Seigel 'Approach 3' to 'Approach 1'.
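A sketch on simulated quadratic data (hypothetical, 11 equally spaced occasions) comparing three quantities: the straight-line OLS slope, the quadratic's derivative at the mean time, and the average rate between the end points of the quadratic fit. For time points symmetric about their mean the three coincide. How these map onto Seigel's numbered approaches is left to the reading, and if the ages in the class example are not symmetric about their mean, the identity need not hold exactly there.

```r
set.seed(1)
tt <- 0:10                                   # equally spaced, symmetric about mean(tt) = 5
y  <- 5 + 2 * tt - 0.15 * tt^2 + rnorm(length(tt), sd = 0.5)   # hypothetical data

quad <- lm(y ~ tt + I(tt^2))                 # quadratic (degree 2) fit
b  <- coef(quad)[["tt"]]
cc <- coef(quad)[["I(tt^2)"]]

line_slope    <- coef(lm(y ~ tt))[["tt"]]    # constant-rate (straight-line) fit
deriv_at_mean <- b + 2 * cc * mean(tt)        # quadratic's instantaneous rate at mean time
ends <- predict(quad, newdata = data.frame(tt = c(0, 10)))
endpoint_rate <- unname(ends[2] - ends[1]) / 10   # average rate over the whole span

c(line_slope = line_slope, deriv_at_mean = deriv_at_mean, endpoint_rate = endpoint_rate)
```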


10/4. Analysis of collections of growth curves (Mixed-effects Models, lmer)   Constant rate of change models

Lecture Topics. Analyses of collections of growth curves.
1. Plots, description and SFYS (smart first year student) analyses.
2. Mixed effects models using lmer .     Growth modelling handout

Class Examples
1. Data frame sleepstudy available in lme4 package.
           Music to accompany long-distance truck driver data: 1971 The Flying Burrito Brothers "Six Days on the Road"
   a. Published Treatments, Sleepstudy example
Source Publication: Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M., & Balkin, T. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1-12.
Sleepstudy data analysis from Doug Bates lme4 book lme4: Mixed-effects modeling with R February 17, 2010 (draft chapters) Chapter 4: Sleepstudy example or a set of Bates slides for the sleepstudy example
Why lmer (lme4) does not provide p-values for fixed effects: Doug Bates    lmer, p-values and all that    There are a number of add-on packages (see Review Question 1).
   b. Class Materials, Sleepstudy example
     Individual plots (frame-by-frame)     Plot of straight-line fits     Initial descriptive analyses (SFYS)
        Sleepstudy, Bates Ch 4, lme4 analyses, ascii      Sleepstudy class handout, pdf scan
     more Doug Bates Slides (pdf pages 8-28)    
2.    North Carolina, female math performance (also in Rogosa-Saner)
   North Carolina data (wide format);     NC data (long)    plots for NC data
       Data formatting: wide to long    North Carolina data (wide format);     making the "Long" version   
     North Carolina example. wide-form descriptives, background, plots         Initial SFYS analyses of NC data, ascii
     Model Comparisons for North Carolina, female math performance     ascii version      NC class handout, pdf scan      model ncCon2 without redundant model term      NC bootstrap results (SAS)

3. Brain Volume Data, in-class modeling exercise: analyses from "Variation in longitudinal trajectories of regional brain volumes of healthy men and women (ages 10 to 85 years) measured with atlas-based parcellation of MRI"     cartoon plot of Lateral Ventricles data;     actual data plot of Lateral Ventricles data;    development of lmer (random effect) growth models


Background and Resources
Technical Formulation and extensions
Estimation in lmer.
Fitting linear mixed-effects models using lme4, Journal of Statistical Software. Douglas Bates, Martin Mächler, Ben Bolker    also Rnews_2005 pp.27-30
Bates book, Chapter 5, Computational Methods.   Bates talk slides:    Mixed models in R using the lme4 package Part 4: Theory of linear mixed models
Extensions and Alternatives, lmer.
Plots and diagnostics:   Package Influence.me    RJournal intro    Package merTools  An Introduction to merTools   Also, Prediction Intervals
Non-Gaussian modelling. Hierarchical Generalized Linear Models, Package hglm   Hierarchical Generalized Linear Models, R Journal December 2010.
Extensions of lme4 modeling: Package npmlreg Nonparametric Maximum Likelihood (NPML) estimation;
Package robustlmm: An R Package for Robust Estimation of Linear Mixed-Effects Models
    Package RLRsim Title Exact (Restricted) Likelihood Ratio Tests for Mixed and Additive Models

Data Examples
Douglas Bates class resource item #3, Texts and Resources. Other Doug Bates materials: Three packages, "SASmixed", "mlmRev" and "MEMSS" with examples and data sets for mixed effect models
North Carolina Data also in (with full development of the modelling) Longitudinal Data Analysis Examples with Random Coefficient Models. David Rogosa; Hilary Saner . Journal of Educational and Behavioral Statistics, Vol. 20, No. 2, Special Issue: Hierarchical Linear Models: Problems and Prospects. (Summer, 1995), pp. 149-170. Jstor    Data sets for Rogosa-Saner
Additional talk materials: An Assortment of Longitudinal Data Analysis Examples and Problems 1/97, Stanford biostat.      Overview and Implementation for Basic Longitudinal Data Analysis CRESST Sept '97.    Another (short) version of the expository material is from the Timepath '97 (old SAS programs) site: Growth Curve models;    Data Analysis and Parameter Estimation;    Derived quantities for properties of collections of growth curves and bootstrap inference procedures

WEEK 2 Review Questions
1. More sleepstudy. Confidence interval and p-values. Add on, extension to class example.
I start by fitting the lmer model for the collection of growth curves: sleeplmer = lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy).
Then try out confint from lme4 (see the manual: likelihood-profile or bootstrap methods).
Then look at the pvalues entry in the manual and try out add-on packages, esp for p-values for the fixed effects.       
 Solution for Review Question 1       2017 redo/update using 3.3.3 (barebones)
2. Ramus Data example. Example consists of 4 longitudinal observations on each of 20 cases. The measurement is the height of the mandibular ramus bone (in mm) for boys each measured at 8, 8.5, 9, 9.5 years of age. These data, which have been used by a number of authors (e.g., Elston and Grizzle 1962), can be found in Table 4.1 of Goldstein (1979).      Ramus data example      long form for Ramus data   tutorial on creating long form data manipulation   and   2017 redo/check of widetolong.     Use lmList to obtain the 20 OLS fits, with the initial time set to 8 years of age, i.e. intercepts are fits for the time of initial measurement (not t=0). Fit the lmer model for the collection of growth curves (using initial time = 8); verify that fixed effects are the sample means (over persons) of the lmList intercepts and slopes. Verify that the random effects variance for "age" (i.e. slopes) is the method-of-moments estimate for Var(theta). Compare the random effect estimates (ranef) which borrow strength for each subject with the OLS estimates from lmList (c.f. Bates Chap 4 discussion of sleepstudy data)       
 Solution for Review Question 2
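A sketch of the method-of-moments side of question 2, on simulated data with the Ramus design (20 boys, ages 8 to 9.5, time origin at 8); all person-level values are hypothetical. With a common time design, the variance of the OLS slopes equals Var(theta) plus sigma^2/sxx, so subtracting the pooled error part estimates Var(theta). The lmer comparison itself needs lme4 and is not shown.

```r
set.seed(2018)
n_subj <- 20
age <- c(8, 8.5, 9, 9.5)
tt  <- age - 8                          # time origin at the first measurement
# Hypothetical person-level intercepts (at age 8) and slopes, plus error
int_p   <- rnorm(n_subj, mean = 48, sd = 2.5)
slope_p <- rnorm(n_subj, mean = 1.9, sd = 0.6)
ramus_sim <- data.frame(subj = rep(seq_len(n_subj), each = length(tt)),
                        tt   = rep(tt, times = n_subj))
ramus_sim$y <- int_p[ramus_sim$subj] + slope_p[ramus_sim$subj] * ramus_sim$tt +
  rnorm(nrow(ramus_sim), sd = 0.4)

# 20 separate OLS fits (what lmList(y ~ tt | subj, ramus_sim) gives in lme4)
fits  <- lapply(split(ramus_sim, ramus_sim$subj), function(d) lm(y ~ tt, data = d))
coefs <- t(sapply(fits, coef))          # 20 x 2 matrix: intercept, slope
colMeans(coefs)                         # compare with the lmer fixed effects

# Pooled within-person residual variance (4 obs - 2 parameters per person)
sigma2 <- sum(sapply(fits, function(f) sum(residuals(f)^2))) /
  (nrow(ramus_sim) - 2 * n_subj)
sxx <- sum((tt - mean(tt))^2)           # same time design for every person

# Var(estimated slope) = Var(true slope) + sigma^2 / sxx: subtract the error part
var_theta_mom <- var(coefs[, "tt"]) - sigma2 / sxx
var_theta_mom
```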
3.   Artificial data example (used in Myths chapter to illustrate time-1, time-2 data analysis)    Two-part artificial data example. The bottom frame (the X's) is 40 subjects, each with three equally spaced time observations (here in wide form). These are the fallible "X" measurements, constructed by adding noise to the Xi measurements. Follow the class examples 'wide-to-long' and obtain the plot showing each subject's data and straight-line fit. Use lmList to obtain the 40 slopes for the straight-line fits.
 Solution for Review Question 3
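Base R's reshape() does the wide-to-long step, and per-subject lm() fits reproduce what lmList would return. A sketch with four hypothetical subjects and assumed column names X1 to X3 (the class file may name them differently).

```r
# Hypothetical wide-form data: 4 subjects, scores X1-X3 at times 1-3
wide <- data.frame(id = 1:4,
                   X1 = c(50, 42, 61, 55),
                   X2 = c(58, 47, 60, 63),
                   X3 = c(64, 55, 66, 70))

# Wide to long: one row per subject-time combination
long <- reshape(wide, direction = "long",
                varying = c("X1", "X2", "X3"), v.names = "X",
                timevar = "time", times = 1:3, idvar = "id")
long <- long[order(long$id, long$time), ]

# Per-subject straight-line slopes (lmList(X ~ time | id, long) in lme4)
slopes <- sapply(split(long, long$id),
                 function(d) coef(lm(X ~ time, data = d))[["time"]])
slopes
```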
4. More with North Carolina data
a. Identify the fastest and slowest growth among the 277 females. Compare the median growth rate for females with verbal ability (Z) at or above 106 with that for females with verbal ability below 106. Show side-by-side boxplots.
b. In the class handout version of the NC analyses (and other postings, but not all) the first step was to make the 'time' variable have initial value 0, so that the intercept of a straight-line fit corresponds to level at the initial time: i.e. 1 to 8 becomes 0 to 7. Obtain lmList results and fit the ncUnc lmer model (straight-line growth, no Z) using time 1 to 8. Comment on differences between these analyses and those using timeInt in the class handout. In particular, look at the correlation of change and initial status. The correlation between observed change and observed initial status using timeInt was .279 from lmer (Correlation of Fixed Effects) and also from lmList (you should confirm that). What result do you obtain using time rather than timeInt? The MLE of the correlation of 'true' change and 'true' initial status is .651 using timeInt. What do you obtain using time?
 Solution for Review Question 4
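Why the time origin matters, sketched on simulated straight-line data (hypothetical students, not the NC data): shifting time by one unit leaves every slope unchanged but adds one slope to every intercept, so the intercept-slope correlation changes.

```r
set.seed(11)
n <- 50
# Hypothetical student-level level (at grade 1) and yearly gain
lev  <- rnorm(n, mean = 40, sd = 5)
gain <- rnorm(n, mean = 6, sd = 1.5)
nc_sim <- data.frame(id = rep(seq_len(n), each = 8), time = rep(1:8, times = n))
nc_sim$y <- lev[nc_sim$id] + gain[nc_sim$id] * (nc_sim$time - 1) +
  rnorm(nrow(nc_sim), sd = 3)
nc_sim$timeInt <- nc_sim$time - 1       # 0..7: intercept = level at grade 1

# Per-student OLS intercept and slope for a chosen time variable
per_person <- function(tvar) {
  t(sapply(split(nc_sim, nc_sim$id), function(d) {
    coef(lm(y ~ tm, data = data.frame(y = d$y, tm = d[[tvar]])))
  }))
}
c_time <- per_person("time")            # intercept extrapolated to grade 0
c_int  <- per_person("timeInt")         # intercept at grade 1

# Same slopes, shifted intercepts, different intercept-slope correlation
c(time = cor(c_time[, 1], c_time[, 2]), timeInt = cor(c_int[, 1], c_int[, 2]))
```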

5. xyplot with large sample sizes.
The North Carolina data have 277 subjects, so a frame-by-frame display of individuals requires subsampling. Construct a plot of the data trajectories for 24 (arbitrarily chosen) individuals.
 Solution for Review Question 5


WEEK 2 Exercises
1. Tolerance data [note: 10/12/17 data location updated]
A subsample of data from the National Youth Survey is obtained in long form by
read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/tolerance1_pp.txt", sep = ",", header = TRUE)
and in wide form by
read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/tolerance1.txt", sep = ",", header = TRUE)
Yearly observations from ages 11 to 15 on the tolerance measure (tolerance of deviant behavior, e.g. cheat, drug, steal, beat; larger values indicate more tolerance, on a 1-to-4 scale). Also in this data set are gender (is_male) and an exposure measure obtained at age 11 (self-report of close friends' involvement in deviant behaviors). Note: the time measure is age - 11.
i. obtain individual OLS fits (tolerance over time) and plot the collection of those straight-lines. Provide descriptive statistic summaries for the rate of change in tolerance and initial level.
ii. fit a mixed effects model for tolerance over time (unconditional) for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed effects estimates correspond to quantities obtained in part i. Explain.
iii. Investigate whether the exposure measure is a useful predictor of level or rate of change in tolerance. What appears to be the best fitting mixed model for these data using these measures? Show specifics.
2. lmer/lme vs lm
 Consider the sleepstudy and Ramus examples, collections of growth trajectories with no exogenous variable. Ramus Data example. Example consists of 4 longitudinal observations on each of 20 cases. The measurement is the height of the mandibular ramus bone (in mm) for boys each measured at 8, 8.5, 9, 9.5 years of age. These data, which have been used by a number of authors (e.g., Elston and Grizzle 1962), can be found in Table 4.1 of Goldstein (1979).      Ramus data example      long form for Ramus data   tutorial on creating long form data manipulation .   
  Fitting the lmer models with Formula: Reaction ~ 1 + Days + (1 + Days | Subject) or Formula: ramus ~ I(age - 8) + (age | subj) has motivated the student question: what is going on here beyond what lm would do? So let's look at what lm would do in these examples. Verify (or disprove) the assertion that the fixed effects from lmer (which we have seen are the averages of the individual-fit parameter estimates from lmList, and therefore the coefficients of the average growth curve) are identical to the coefficients of the fit from lm, which ignores the existence of individual trajectories. Compare the results of the lm and lmer analyses for these two data sets.
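For a balanced design (everyone measured at the same times), the average of the individual OLS coefficients equals the pooled lm coefficients exactly, because the pooled normal equations reduce to a regression on the mean trajectory. A sketch on simulated data with the sleepstudy layout (18 subjects, days 0 to 9; all values hypothetical). The point estimates agree; the lm standard errors treat all 180 rows as independent, which is where lm and lmer part ways.

```r
set.seed(3)
n_subj <- 18
days <- 0:9                                 # balanced: same days for everyone
sim <- data.frame(subj = rep(seq_len(n_subj), each = length(days)),
                  days = rep(days, times = n_subj))
lev  <- rnorm(n_subj, mean = 250, sd = 25)  # hypothetical person-level parameters
rate <- rnorm(n_subj, mean = 10, sd = 6)
sim$y <- lev[sim$subj] + rate[sim$subj] * sim$days + rnorm(nrow(sim), sd = 25)

# Average of the individual OLS fits (lmList-style) ...
indiv <- t(sapply(split(sim, sim$subj), function(s) coef(lm(y ~ days, data = s))))
avg_curve <- colMeans(indiv)
# ... versus one pooled lm that ignores individual trajectories
pooled <- lm(y ~ days, data = sim)

rbind(average_of_fits = avg_curve, pooled_lm = coef(pooled))
summary(pooled)$coefficients[, "Std. Error"]   # treats all 180 rows as independent
```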
3. Early Education data (From Bates and Willett-Singer).
Data on early childhood cognitive development described in Doug Bates talk materials (pdf pages 49-52). Obtain these data from the R-package "mlmRev" or the Willett-Singer book site (in our week 1 intro links). Data are in long form and consist of 3 observations on each of 58 treatment and 45 control children; see the Early entry in the mlmRev package docs. Produce the plot of individual trajectories shown on pdf p.49 of the Bates talk (note: Bates connects the dots, we have done straight-line fits; your choice). Show five-number summaries of rates of improvement in cognitive scores for treatment and control groups. Develop and fit the fm12 lmer model shown in Bates pdf p.50 (note: fm12 allows trt to affect rates of improvement but not level). Interpret results. Note: this moves us into the comparing-groups topics, where the individual attribute is group membership.
4.   "Standardizing is always a bad idea" is a good motto for life, especially with longitudinal data.
   Artificial data example from Review Question 3 (used in Myths chapter to illustrate time-1, time-2 data analysis). Start out with the "X" data, and standardize (i.e. transform to mean 0, variance 1) at each of the 3 time points. Note that scale() will do this for you (in wide form). For the standardized data obtain the plot showing each subject's data and straight-line fit. What do you have here? Compare the results of the mixed-effects models fitting the collection of straight-line growth curves for the measured and the standardized data.
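A sketch of what standardizing does, on simulated three-occasion data (hypothetical, not the Myths chapter data): after scale() each time point has mean 0, so the mean trajectory is flat and the average straight-line slope is zero, whatever the true average growth.

```r
set.seed(5)
n <- 40
# Hypothetical true initial status and rate, three equally spaced occasions
true0 <- rnorm(n, mean = 50, sd = 8)
rate  <- rnorm(n, mean = 5, sd = 2)
X <- sapply(0:2, function(k) true0 + rate * k + rnorm(n, sd = 3))  # fallible scores
colnames(X) <- c("X1", "X2", "X3")

Z <- scale(X)                      # mean 0, sd 1 separately at each time point

tt <- 1:3
slopes_of <- function(m) apply(m, 1, function(y) coef(lm(y ~ tt))[[2]])
sX <- slopes_of(X)
sZ <- slopes_of(Z)
c(mean_slope_measured = mean(sX), mean_slope_standardized = mean(sZ))
```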


10/11.  Collections of growth curves continued: linear and non-linear mixed-effects models

In the news
Study confirms link between violent video games and physical aggression      Publication: Metaanalysis of the relationship between violent video game play and physical aggression over time.   PNAS October 2, 2018 115 (40) 9882-9888.

Lecture Topics
1. Review model formulation, North Carolina example (week 2 handouts),
a. plotting residuals,   ascii session.
b. lm and gee alternatives (ignore individual growth). ascii session
2. General formulation of mixed effects model in terms of growth trajectories pdf pages 7-8, handout in An Assortment of Longitudinal Data Analysis Examples and Problems , Stanford biostat (pp.7-8).   Also Ware-Laird ALA slides 234-240.    also resource item #3, Douglas M. Bates lme4: Mixed-effects modeling with R, section 1.4
3. Individual effects: fixed and random (and BLUPs).    sleepstudy Rsession    session plots
4. lmList does logistic (respiration data week5); introducing glmer      lmList, glmer for respiration data
5. Brain Volume Data, in-class modeling exercise: analyses from "Variation in longitudinal trajectories of regional brain volumes of healthy men and women (ages 10 to 85 years) measured with atlas-based parcellation of MRI"     cartoon plot of Lateral Ventricles data;     actual data plot of Lateral Ventricles data;    development of lmer (random effect) growth models
6.   Beyond Straight-line Growth: Polynomial and Non-linear Models.
Polynomial examples: The book by Mirman, resource item 7   Growth Curve Analysis and Visualization Using R   not surprisingly has some good data examples; see RQ#4.
Logistic Curve Example: Orange Tree growth.     Data from MEMSS package Data sets and sample analyses from Pinheiro and Bates, Mixed effects Models in S and S-PLUS (Springer, 2000).
   Plots and nlmer analysis, Orange tree data
   Doug Bates Slides Orange trees analysis (pdf pages 8-16), Logistic SS (pdf p.6), pharmacokinetics ex (pdf pages 7, 17-24)   Bates NLMM.Rnw      From week 1 SSlogis (Self-Starting Logistic model)  links and materials.        another analysis of Orange Trees in the ASReml package manual section 8.9
Also LDA book (resource item #8), Chapter 5: Non-linear mixed-effects models, Marie Davidian.
    additional tools in the grofit package and nlmeODE package Title Non-linear mixed-effects modelling in nlme using differential equations

WEEK 3 Review Questions
1. Constrained models.
A somewhat common practice (which I'm not that fond of) is to constrain individual differences in an lmer model, e.g. force all individuals to have the same rate of change. For purposes here, use the sleepstudy data to fit a mixed model with all individuals having the same time gradient. Compare to the model from class allowing slopes and levels to differ.
 Solution for Review Question 1
2. Orange tree extras. Take the fixed effects from the orange tree nlmer model, "m1" in the class materials, as the parameters of the "average" growth curve for this group of trees. Plot that logistic growth curve (either use a formula for the logistic, or use the simple function in the grofit package). Compare the fixed effects from nlmer to the results from nls for these data. More challenging: try to superimpose the group logistic curve (above) onto the plots of the individual tree trajectories (you may want to refer to the week 1 plots of the AIDS data).       
 Solution for Review Question 2
3. Asymptotic regression, SSasymp slide (pdf p. 5 of Bates slides, Nonlinear mixed models talk linked in Week 3, Topic 4). Data are from the Neter-Wasserman text in file CH13TA04.txt. The outcome variable is manufacturing relative efficiency (RelEff) over 90 weeks' duration for two different locations. Plot the RelEff outcome against week for the two locations. Use the SSasymp function for an nlmer fit (or nls if needed) to see whether the asymptote differs for the two locations.       
 Solution for Review Question 3
4. Quadratic (polynomial) Trends.   The book by Mirman (resource item 7),   Growth Curve Analysis and Visualization Using R,   not surprisingly has some good data examples (primarily psychological learning experiments). Here we use the Chapter 3 data set (sec 3.4), Word Learning. Data at http://www.danmirman.org/gca/WordLearnEx.txt. Use the subset TP == Low. How many subjects are in that subset? How many observations on each? Accuracy is the outcome measure; the time-ordered measure is Block (see Fig 3.7). Investigate a linear trend versus a quadratic trend using mixed-effects models.       
 Solution for Review Question 4
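For RQ 3, the SSasymp model is asymptotic regression in the parameterization Asym + (R0 - Asym) * exp(-exp(lrc) * input): Asym is the asymptote, R0 the response at input 0, and lrc the natural log of the rate constant. A short Python sketch of that mean function follows; the two "locations" and all parameter values are hypothetical and are not fitted to CH13TA04.txt. Asking whether the asymptote differs between locations amounts to letting Asym differ while the other parameters are shared, which is what the two curves below mimic.

```python
import math

def asymptotic_regression(x, asym, r0, lrc):
    """SSasymp mean function: asym + (r0 - asym) * exp(-exp(lrc) * x)."""
    return asym + (r0 - asym) * math.exp(-math.exp(lrc) * x)

# Hypothetical locations sharing r0 and lrc but differing in asymptote.
R0, LRC = 60.0, math.log(0.05)
for location, asym in (("A", 100.0), ("B", 110.0)):
    curve = [round(asymptotic_regression(week, asym, R0, LRC), 1)
             for week in (0, 10, 30, 90)]
    print(location, curve)
```

The gap to the asymptote halves every log(2)/exp(lrc) units of x (about 13.9 weeks for these made-up values), which is one way to read a fitted lrc.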

WEEK 3 Exercises
1. Teenage drinking. [note: data location updated 10/12/17]
The UCLA data archive has a comma-delimited file (access it with read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")).
Measurements on 82 adolescents (initial age 14) included 3 time-ordered observations on alcohol use and two background (exogenous) variables: dichotomous coa (child of an alcoholic) and measured variable peer (alcohol use by target's peers). Describe the collection of time trajectories in alcohol use. Fit an unconditional mixed model to this collection of time-trajectories and obtain interval estimates for the random and fixed effects. Show a plot for the random effects (subjects) and interpret the fixed effects. Now consider the two exogenous variables. Using conditional models, identify the best fitting model. Interpret the fixed effects for the best fitting model.
2.  Vocabulary learning data from test results on file in the Records Office of the Laboratory School of the University of Chicago. Source: R. D. Bock, MSMBR. "The data consist of scores, obtained from a cohort of pupils at the eighth through eleventh grade levels, on alternative forms of the vocabulary section of the Cooperative Reading Tests." There are 64 students in all, 36 male and 28 female (ordered), each with four equally spaced observations (test scores). Wide-form data are in BOCKwide.dat, and I kindly also made a long-form version, BOCKlong.dat. Construct the usual collection of individual trajectory displays (either connect-the-dots or compare to a straight line). Obtain the means (over persons) and plot the group growth curve. Does there appear to be curvature (i.e., deceleration in vocabulary skill growth)?
a. Construct an lmer model with the individual growth curve a quadratic function of grade (year); it is most convenient to use the uncorrelated predictors grade - mean(grade) and (grade - mean(grade))^2. Fit the lmer model and interpret the fixed and random effects you obtain. Compare the results with an lmer model in which the individual trajectories are straight lines. Use the anova model comparison functionality in R (e.g., anova(modLin, modQuad)) to test whether the quadratic function for individual growth produces a better model fit.
b. Investigate (via lmer models) gender differences (isMale) in vocabulary growth. Fit appropriate lmer models and interpret the results.
3. Data on the growth of chicks on different diets. Hand and Crowder (1996), Table A.2, p. 172: Hand, D. and Crowder, M. (1996), Practical Longitudinal Data Analysis, Chapman and Hall, London. The dataset is available as a .R file; it is easiest to bring this page down to your machine and then load it into your R-session (or try to load it remotely). Here we consider the 20 chicks on Diet 1 (select these). Construct the plots analogous to those for the class example Orange trees: individual chicks frame-by-frame and all chicks on one plot. Fit an nlmer model that allows final weight (asymptote) to differ over chicks (other params fixed). Use ranef (individual estimates) to identify the largest and smallest asymptote values. Plot the "average" growth curve under diet 1. Compare that nlmer model with a model that does not allow asymptotes to differ. What is your conclusion? Also compare with an nls model that ignores the repeated measurements structure (i.e., ignores individual chicks). Compare the average growth curves.
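Why are grade - mean(grade) and (grade - mean(grade))^2 "uncorrelated predictors" in Exercise 2a? A small Python check, assuming the four waves are coded as grades 8, 9, 10, 11: centering makes the linear and quadratic terms orthogonal for this balanced, equally spaced design, while raw grade and grade^2 are nearly collinear. The orthogonality after centering relies on the time points being symmetric about their mean.

```python
# Four equally spaced waves, coded as grades 8 through 11 (assumed coding).
grades = [8, 9, 10, 11]
mean_g = sum(grades) / len(grades)            # 9.5
linear = [g - mean_g for g in grades]         # grade - mean(grade)
quadratic = [c ** 2 for c in linear]          # (grade - mean(grade))^2

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

raw_squared = [g ** 2 for g in grades]
print("raw grade vs grade^2:", correlation(grades, raw_squared))
print("centered linear vs centered quadratic:", correlation(linear, quadratic))
```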



10/18. Special case of time-1, time-2 data; Traditional measurement of change and more

Lecture Topics
1. Properties of Collections of Growth Curves. class handout
2. Time-1, time-2 data. (paired data)
     The R-package PairedData has some interesting plots and statistical summaries for "before and after" data;
          Here is a McNeil plot for Xi.1, Xi.5 in the data example.
     Paired dichotomous data, McNemar's test (in R, mcnemar.test {stats}), Agresti (2nd ed) sec 10.1
      Also see R-package PropCIs       Prime Minister example
3. Issues in the Measurement of Change. Class lecture covers Myths 1-6+.
     Slides from Myths talk.    Class Handout, Companion for Myths talk
4. Examples for Exogenous Variables and Correlates of Change (use of lagged dependent variables)
   Time-1,time-2 data analysis examples    Measurement of change: time-1,time-2 data
      data example for handout    scan of regression handout      ascii version of data analysis handout    
   Extra material for Correlates and predictors of change: time-1,time-2 data
    Rogosa R-session to replicate handout, demonstrate wide-to-long data set conversion, and descriptive fitting of individual growth curves. Some useful plots from Rogosa R-session
        Technical results: Section 3.2.2, especially Equation 27, in Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228.      Talk slides
5. Comparing groups on time-1, time-2 measurements: repeated measures anova vs lmer OR the t-test
Comparative Analyses of Pretest-Posttest Research Designs, Donna R. Brogan; Michael H. Kutner, The American Statistician, Vol. 34, No. 4. (Nov., 1980), pp. 229-232.   JSTOR link
     urea synthesis, BK data       data, long-form
    BK plots (by group)     BK overview
    2017 Analysis handout     Extended BK lmer analysis
Additional stuff
     BK repeated measures analysis      pdf version
    Stat141 analysis
    archival example analyses. SAS and minitab
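McNemar's test for paired dichotomous data (Lecture topic 2 above, and Week 4 Exercise 3b) uses only the discordant pairs. A minimal Python sketch of the statistic and of the Wald interval for the difference between dependent proportions (Agresti, sec 10.1); the continuity-correction default is meant to mirror R's mcnemar.test(correct = TRUE), and the counts in the usage lines are hypothetical.

```python
import math

def mcnemar(b, c, correct=True):
    """McNemar chi-square (1 df) from the two discordant cell counts b and c
    of a paired 2x2 table. Returns (statistic, p_value)."""
    diff = abs(b - c)
    if correct:                                  # continuity correction
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))     # upper tail of chi-square(1)
    return stat, p_value

def dependent_prop_ci(b, c, n, z=1.959964):
    """Wald CI for the difference between dependent proportions, (b - c) / n,
    where n is the total number of pairs."""
    d = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return d - z * se, d + z * se

# Hypothetical: 40 subjects, 12 went FAIL -> PASS, 4 went PASS -> FAIL.
print(mcnemar(12, 4, correct=False))
print(mcnemar(12, 4))
print(dependent_prop_ci(12, 4, 40))
```

With b counting FAIL to PASS and c counting PASS to FAIL, (b - c)/n is the time-2 pass rate minus the time-1 pass rate; the concordant cells drop out of both the test and the estimate.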

Background Readings and Resources
Myths Chapter. Rogosa, D. R. (1995). Myths and methods: "Myths about longitudinal research," plus supplemental questions. In The analysis of change, J. M. Gottman, Ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 3-65.
Myths Talk. Rogosa, D. R. (1983)
More stuff (if you don't like the way I said it)   
I noticed John Gottman did a pub rewriting the myths: Journal of Consulting and Clinical Psychology 1993, Vol. 61, No. 6, 907-910, The Analysis of Change: Issues, Fallacies, and New Ideas
Also John Willett did a rewrite of the Myths 'cuz I didn't want to reprint it again (or write a new version): Questions and Answers in the Measurement of Change REVIEW OF RESEARCH IN EDUCATION 1988 15: 345
Reliability Coefficients: Background info. Short primer on test reliability    Informal exposition in Shoe Shopping and the Reliability Coefficient    extensive technical material in Chap 7 Revelle text
A growth curve approach to the measurement of change. Rogosa, David; Brandt, David; Zimowski, Michele Psychological Bulletin. 1982 Nov Vol 92(3) 726-748 APA record   direct link
Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228.
available from John Willett's pub page
Demonstrating the Reliability of the Difference Score in the Measurement of Change. David R. Rogosa; John B. Willett Journal of Educational Measurement, Vol. 20, No. 4. (Winter, 1983), pp. 335-343. Jstor
Maris, Eric. (1998). Covariance Adjustment Versus Gain Scores--Revisited. Psychological Methods, 3(3) 309-327. apa link  
A good R-primer on repeated measures (and lots else). Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li.   Another version
Multilevel package   has behavioral sciences applications, including estimates of within-group agreement and routines using random group resampling (RGR) to detect group effects.
More repeated measures resources: Background primer on analysis of variance (with R); see sections 6.8, 6.9 of Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li.   Pdf version    The ez package provides extended anova capabilities.   Examples (blog notes): Repeated measures ANOVA with R (functions and tutorials)   Repeated Measures ANOVA using R    Obtaining the same ANOVA results in R as in SPSS - the difficulties with Type II and Type III sums of squares
Application publications, time-1, time-2 Experimental Group Comparisons:
a.  Mere Visual Perception of Other People's Disease Symptoms Facilitates a More Aggressive Immune Response Psychological Science, April 2010   Pre-post data and difference scores (see Table 1)
b. Guns and testosterone. Guns Up Testosterone, Male Aggression
Guns, Testosterone, and Aggression: An Experimental Test of a Mediational Hypothesis Klinesmith, Jennifer; Kasser, Tim; McAndrew, Francis T,   Psychological Science. Vol 17(7), Jul 2006, pp. 568-571.


WEEK 4 Review Questions
1. Time1-time2 regressions; Class example
Repeat the handout demonstration regressions using the fallible measures (the X's) from the bottom half of the linked data page. The X's are simply error-in-variable versions of the Xi's: X = Xi + error, with error having mean 0 and variance 10. Compare 5-number summaries for the amount of change from the earliest time "1" to the final observation "5" using the "Xi" measurements (upper frame) and the fallible "X" observations (lower frame).       
 Solution for Review Question 1
2. (more challenging). Use mvrnorm to construct a second artificial data example (n=100) mirroring the week 4 myths data class handout BUT with the correlation between true individual rate of change and W set to .7 instead of 0. Carry out the corresponding regression demonstration.        
 Solution for Review Question 2
3. Reliability versus precision demonstration
  Consider a population with true change between time1 and time2 distributed Uniform [99,101] and measurement error Uniform [-1, 1]. If you used discrete Uniform in this construction then you could say measurement of change is accurate to 1 part in a hundred.
Calculate the reliability of the difference score.
Also try error Uniform [-2,2], accuracy one part in 50.
A similar demonstration can be found in my Shoe Shopping and the Reliability Coefficient
      
 Solution for Review Question 3
4. Revisit Brogan-Kutner data analysis.
a. Demonstrate the Brogan-Kutner Section 5 equivalences (from paper, shown in class) for repeated measures anova and/or BK lmer analyses.
b. Is the amount of gain/decline related to initial status? For the 8 new-procedure patients and for the 13 old-procedure patients, separately, estimate the correlation between change and initial status and obtain a confidence interval if possible.
c. Analysis of Covariance. For the Brogan-Kutner data carry out an analysis of covariance (using premeasure as covariate) for the relative effectiveness of the surgery methods. Compare with class analyses.
Slides 203-204 in the Laird-Ware text materials purport to demonstrate that analysis of covariance produces a more precise treatment effect estimate than difference scores (repeated measures anova). What very limiting assumption is slipped into their analysis? Can you create a counter-example to their assertion/proof?       
 Solution for Review Question 4
                                                 part c. Solution Notes on the ALA (Laird-Ware) assertion
5.  Repeat the Brogan-Kutner lmer analyses from lecture. Just another repetition of the BK class handout.
Use lmer (or lme) to determine the comparative efficacy of the surgical methods on liver function. Investigate whether a model allowing for pretest differences is helpful.       
 Solution for Review Question 5
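Review Question 3 turns on the definition reliability = Var(true change) / Var(observed change). A Python sketch with both the closed form and a quick simulation, assuming (one reading of the question) that the Uniform error is attached to the observed difference itself rather than separately to each of the two measurements. Since Var(Uniform on an interval of width w) = w^2/12, true change Uniform[99, 101] has variance 1/3.

```python
import random

def uniform_var(width):
    """Variance of a continuous Uniform distribution on an interval of this width."""
    return width ** 2 / 12

def reliability_closed_form(change_width, error_width):
    """Var(true change) / (Var(true change) + Var(error)), errors independent."""
    true_var = uniform_var(change_width)
    return true_var / (true_var + uniform_var(error_width))

def pvar(xs):
    """Population variance of a list."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def reliability_simulated(n, change_lo, change_hi, err_half, seed=222):
    """True change ~ U[change_lo, change_hi]; observed = true + U[-err_half, err_half]."""
    rng = random.Random(seed)
    true_change = [rng.uniform(change_lo, change_hi) for _ in range(n)]
    observed = [d + rng.uniform(-err_half, err_half) for d in true_change]
    return pvar(true_change) / pvar(observed)

for err_half in (1, 2):
    print(err_half,
          reliability_closed_form(2, 2 * err_half),
          round(reliability_simulated(50_000, 99, 101, err_half), 3))
```

The differences here are measured to within about one part in a hundred (or fifty), yet the reliability is modest, because reliability reflects how little the true changes vary across persons as much as it reflects measurement precision.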


WEEK 4 Exercises
1. Captopril and Blood pressure
The file captopril.dat contains the data shown in Section 2.2 of Verbeke, Introduction to Longitudinal Data Analysis, slides. Captopril is an angiotensin-converting enzyme inhibitor (ACE inhibitor) used for the treatment of hypertension.
a. Smart First Year Student analyses. Use the before and after Spb measurements to examine the improvement (i.e. decrease) in blood pressure. Obtain a five-number summary for observed improvement. What is the correlation between change and initial blood pressure measurement? Obtain a confidence interval for the correlation and show the corresponding scatterplot. What special challenges are present in this analysis?
b. lmer analyses. Try to obtain a good confidence interval for the amount of decline. Obtain a point and interval estimate for the correlation between initial status and change in Spb.
2. Regression toward the mean? Galton's data on the heights of parents and their children
The "HistData" and "psych" packages contain the "galton" dataset, the primordial regression-toward-the-mean example.
Description: Galton (1886) presented these data in a table, showing a cross-tabulation of 928 adult children born to 205 fathers and mothers, by their height and their mid-parent's height. A data frame with 928 observations on the following 2 variables. parent Mid Parent heights (in inches) child Child Height. Details: Female heights were adjusted by 1.08 to compensate for sex differences. (This was done in the original data set)
Consider "parent" as time-1 data and "child" as time-2 data. Do these data indicate regression toward the mean according to either definition (metric or standardized)? Refer to Section 4 of the Myths chapter supplement (pagination 61-63) for an assessment of regression toward the mean (i.e., counting the number of subjects satisfying regression toward the mean).
Aside: if you like odd plots, look at the sunflowerplot code in the docs for the galton data.
3. Paired and unpaired samples, continuous vs categorical measurements.
Let's use again the 40 subjects in the Review Question 1 "X" data.
a. Measured data. Take the time1 and time5 observations and obtain a 95% confidence interval for the amount of change. Compare the width of that interval with a confidence interval for the difference between the time5 and time1 means if we were told a different group of 40 subjects was measured at each of the time points (data no longer paired).
b. Dichotomous data. Instead look at these data with the criterion that a score of 50 or above is a "PASS" and below that is "FAIL". Carry out McNemar's test for the paired dichotomous data, and obtain a 95% CI for the difference between dependent proportions. Compare that confidence interval with the "unpaired" version (different group of 40 subjects was measured at each of the time points) for independent proportions.
4. Beat the Blues from Chap 12 of HSAUR 2nd ed (resource # 4).
Data in wide form: data("BtheB", package = "HSAUR2"). Chap. 12 describes the cognitive behavioural program and conducts various analyses. We will use the pretest and the two-month followup (additional followups have lots of missing data).
Investigate the effectiveness of Beat the Blues from these 2-wave data. Follow the various descriptive and modelling strategies shown in the BK class example.
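Several questions this week (Exercise 1a, RQ 4b) ask for the correlation between change and initial status with a confidence interval. A Python sketch using the Fisher z transformation, atanh(r) ± 1.96/sqrt(n - 3), back-transformed with tanh; the blood-pressure-like numbers are hypothetical. Keep the Myths material in mind: measurement error in the time-1 score enters the difference with the opposite sign, so the observed correlation is biased in the negative direction.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def change_initial_corr_ci(time1, time2, z=1.959964):
    """Correlation between initial status (time1) and change (time2 - time1),
    with an approximate 95% CI from the Fisher z transformation."""
    change = [t2 - t1 for t1, t2 in zip(time1, time2)]
    r = pearson(time1, change)
    half = z / math.sqrt(len(time1) - 3)
    zr = math.atanh(r)
    return r, math.tanh(zr - half), math.tanh(zr + half)

# Hypothetical before/after systolic readings for 10 patients.
before = [180, 165, 172, 190, 158, 175, 168, 185, 170, 160]
after = [165, 160, 160, 170, 155, 162, 160, 168, 163, 154]
print(change_initial_corr_ci(before, after))
```

Here change is defined as time2 - time1, so a decrease in blood pressure is negative; if you define improvement as time1 - time2 instead, the correlation and interval simply flip sign.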


10/25.  Experimental Protocols and Comparing Group Growth


Crossover Designs in the news
1. Does nutrition science know anything?     Is white or whole wheat bread 'healthier?' Depends on the person    Publication: Bread Affects Clinical Parameters and Induces Gut Microbiome-Associated Personal Glycemic Responses Cell Metabolism, Korem et al DOI: 10.1016/j.cmet.2017.05.002
2. This time with 3 conditions   For Exercise, Nothing Like the Great Outdoors   Publication: Niedermeier M, Einwanger J, Hartl A, Kopp M (2017) Affective responses in mountain hiking-- randomized crossover trial focusing on differences between indoor and outdoor activity. PLoS ONE 12(5): e0177719. https://doi.org/10.1371/journal.pone.0177719



Lecture Topics
1. Cross-over designs (usually time-1, time-2). Laird-Ware text slides (pdf pages 135-150). Crossover design data from slide 137,    anova for crossover design ex       ascii version, anova for crossover design ex   
   R-resources for crossover designs. package Crossover    Crossover vignette     package crossdes   see Rnews Vol. 5/2, November 2005         also see slides 5-14 Repeated Measures Design Mark Conaway
2. Multi-wave growth example: Bock Vocabulary data. Historical note: Repeated Measures anova (with linear, quadratic, cubic contrasts): class example. (note Mirman text uses orthogonal polynomials)
3.  Group Comparisons for Longitudinal Experimental Designs. Group growth and experimental comparisons for count and dichotomous outcomes (examples from HSAUR 2nd Ed., Ch. 13).
Link functions for generalized linear mixed models (GLMMs), Bates slides (pdf pages 11-18)
A Handbook of Statistical Analyses Using R, Second Edition Torsten Hothorn and Brian S . Everitt Chapman and Hall/CRC 2009. Analysing Longitudinal Data II -- Generalised Estimation Equations and Linear Mixed Effect Models: Treating Respiratory Illness and Epileptic Seizures (Stanford access)
     Data sets etc Package 'HSAUR2' August 2014, Title A Handbook of Statistical Analyses Using R (2nd Edition)
  A.    Analysis of Count data.      Epilepsy example, group comparisons, collection of individual trajectories. HSAUR chap 13    Rogosa R-session using gee and lmer     class handout
   Recap Group Comparisons, Epilepsy example. Comparison of lmer models
For SAS (and GEE) fans, another analysis
  B.    Binary Response, dichotomous outcomes. Respiratory Illness Data from HSAUR package. Data and description also at the ALA (Laird-Ware) site   Rogosa R-session using lmer     class handout
        
4. Study Design: Power Calculations for Longitudinal Group Comparisons.
   R-package longpower Vignettes found by "browseVignettes(package = "longpower")" .    Functions in MBESS package--ss.power.pcm.
   Background pubs:  Power for linear models of longitudinal data with applications to Alzheimer's Disease Phase II study design Michael C. Donohue, Steven D. Edland, Anthony C. Gamst
Sample Size Planning for Longitudinal Models: Accuracy in Parameter Estimation for Polynomial Change Parameters. Ken Kelley (Notre Dame) and Joseph R. Rausch, Psychological Methods 2011
        basic R analogues, power.t.test   power.anova.test
5. Missing Data Concerns.
   Nontechnical overviews:
  Phil Lavori et al. Psychiatric Annals, Volume 38, Issue 12, December 2008 Missing Data in Longitudinal Clinical Trials, Part A    Part B
   Robin Henderson,   Missing Data in Longitudinal Studies pdf pages 89-93
Technical review: Missing data methods in longitudinal studies: a review   Joseph G. Ibrahim and Geert Molenberghs
More on missing data and imputation, including mice, is a week 10 topic.     Flexible Imputation of Missing Data. Stef van Buuren, Chapman and Hall/CRC 2012: Chapter 9, Longitudinal Data; Sec 3.8, Multilevel data. He is the originator of mice.
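As a back-of-the-envelope companion to the basic R analogues (power.t.test), the normal-approximation sample size per group for a two-sided, two-group comparison of means is n = 2 (z_{1-alpha/2} + z_{1-beta})^2 (sd/delta)^2. A Python sketch using the standard library's NormalDist; it is an approximation (power.t.test, which uses the t distribution, gives a slightly larger n) and is not the longpower or MBESS method. The sd_of_change helper illustrates one longitudinal point: if the outcome is the time-1, time-2 difference, with equal variances at both waves and correlation rho between them, its sd is sd * sqrt(2(1 - rho)).

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation n per group, two-sided two-sample comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

def sd_of_change(sd, rho):
    """SD of (time2 - time1) when both waves have SD `sd` and correlation rho."""
    return sd * math.sqrt(2 * (1 - rho))

print(n_per_group(0.5, 1.0))                      # single posttest outcome
print(n_per_group(0.5, sd_of_change(1.0, 0.75)))  # change scores, rho = 0.75
```

With these illustrative inputs the change-score design needs about half as many subjects per group, because with rho = 0.75 the variance of the difference is half the single-wave variance.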

WEEK 5 Review Questions
1. Power (sample size) calculations for experimental group comparisons.
a. Longpower package (vignette). Reconstruct the sample size calculation for the Alzheimer's disease trial (7 waves) on p.4 of the vignette.
b. MBESS package. Recreate the sample size calculation for width of confidence interval for differential growth using ss.aipe.pcm function in the example used in Kelley and Rausch appendix (and MBESS manual)       
 Solution for Review Question 1
2. Revisit Respiration example.
a. Try to do lmList on these data to get odds(good) for each of the 111 subjects. Investigate the effectiveness of the treatment.
b. Use lmer analyses to compare treatment and placebo. Obtain a confidence interval for the effectiveness of the treatment. Investigate gender differences in response to the intervention (i.e., the treatment).
c. Extend the lmer model in part b by adding the age and baseline measurements to the level 2 model. Compare with part b results.       
 Solution for Review Question 2
3. Revisit Epilepsy example.
To supplement the full models for the epilepsy data in the longitudinal texts (HSAUR, ALA, etc.), let's try to build up the analysis from a basic description comparing placebo vs drug up through some basic glmer models.
A somewhat similar effort was made in the second class posting "Recap group comparisons (epcomp)" linked above. In this exercise treat period as a time measurement (1,2,3,4) rather than an ordered factor.
How many subjects in placebo and drug groups? Use lmList to obtain slopes and intercepts for fits of time trends to seizures for each subject and compare drug and placebo groups.
Fit glmer models with treatment as the only level 2 predictor (for the intercept), without and with a time trend, and compare them.
Add the baseline to the glmer models above (in the level 2 model for the intercept); is the effect of the drug significant (use confint)? Does adding age help this model?       
 Solution for Review Question 3
4. Extensions for the epilepsy example: residuals, diagnostics       
 Solution for Review Question 4
5. Revisit the cross-over design, class example, Lecture item 1. The class example used repeated measures analysis of variance to estimate the effect of the drug in the dialysis example (I messed up the medical context in class). Repeat that analysis using lmer and show results identical to the class example analysis. Also examine the effectiveness (the increase in precision) that results from each subject functioning as their own comparison, rather than having two separate (randomly assigned) treatment and control groups.       
 Solution for Review Question 5
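For the two-period, two-sequence crossover (Lecture topic 1, RQ 5), a standard analysis works with each subject's period difference d = period 1 - period 2. Assuming no carryover, E[d] is (tau_A - tau_B) + (pi_1 - pi_2) in the AB sequence and -(tau_A - tau_B) + (pi_1 - pi_2) in the BA sequence, so half the difference of the two sequence means estimates the treatment effect with the period effect cancelled, and each subject serves as their own control. A Python sketch with hypothetical data (not the slide 137 data):

```python
import math

def crossover_2x2(seq_ab, seq_ba):
    """Two-period, two-sequence crossover, assuming no carryover.
    seq_ab: (period1, period2) responses for subjects given A then B.
    seq_ba: (period1, period2) responses for subjects given B then A.
    Returns (effect of A minus B, standard error, t statistic, df)."""
    d_ab = [p1 - p2 for p1, p2 in seq_ab]
    d_ba = [p1 - p2 for p1, p2 in seq_ba]
    n1, n2 = len(d_ab), len(d_ba)
    m1, m2 = sum(d_ab) / n1, sum(d_ba) / n2
    # pooled variance of the period differences across the two sequences
    ss = sum((d - m1) ** 2 for d in d_ab) + sum((d - m2) ** 2 for d in d_ba)
    pooled_var = ss / (n1 + n2 - 2)
    effect = (m1 - m2) / 2
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2)) / 2
    return effect, se, effect / se, n1 + n2 - 2

# Hypothetical responses for three subjects in each sequence.
ab = [(15, 11), (25, 20), (30, 27)]
ba = [(10, 16), (20, 27), (12, 17)]
print(crossover_2x2(ab, ba))
```

Because the large between-subject differences (responses from 10 to 30) cancel inside each d, the standard error depends only on within-subject variation, which is the precision gain RQ 5 asks about.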

WEEK 5 Exercises
1. We use a subset of the Baumann data from the car package, which I was nice enough to put in long form at http://rogosateaching.com/stat222/readlongdat .
These data are from a study of reading from Purdue. We use the data to compare two methods: Basal, traditional method of teaching; DRTA, an innovative method; coded 1 and 2 respectively in the data. Random assignment placed twenty-two students in each group; reading test measures were obtained pre and post instruction.
The Directed Reading Thinking Activity (DRTA) is a strategy that guides students in asking questions about a text, making predictions, and then reading to confirm or refute their predictions. The DRTA process encourages students to be active and thoughtful readers, enhancing their comprehension.
Use descriptive and inferential statistical methods to assess the relative efficacy of the DRTA method.
2. Treatment of Lead Exposed Children (TLC) Trial. Data (wide form) and description: data here
Start out by just using the subset of the longitudinal data Lead Level Week 0 and Week 6. Carry out the repeated measures anova for the relative effectiveness of chelation treatment with succimer or placebo (A,P). Show the three equivalences in the Brogan-Kutner paper between the repeated measures anova results and simple t-tests for these data. Next compare with an lmer fit following the B-K class example (posted). Finally use all 4 longitudinal measures (weeks 0, 1, 4, 6) for an Active vs Placebo comparison using lmer. Compare with the results that use only 2 observations.
3. Crossover Design. The dataset consists of safety data from a crossover trial on the disease cerebrovascular deficiency. The response variable is not a trial endpoint but rather a potential side effect. In this two-period crossover trial, comparing the effects of active drug to placebo, 67 patients were randomly allocated to the two treatment sequences, with 34 patients receiving placebo followed by active treatment, and 33 patients receiving active treatment followed by placebo. The response variable is binary, indicating whether an electrocardiogram (ECG) was abnormal (Y=1) or normal (Y=0). Each patient has a bivariate binary response vector.
Data set is available at http://www.hsph.harvard.edu/fitzmaur/ala/ecg.txt (it needs to be cut and pasted into an editor). Carry out the basic analysis of variance for this crossover design following Week 5 Lecture topic 1. You may want to use glm to take into account the binary outcome. Does the treatment increase the probability of an abnormal ECG? Give a point estimate and significance test for the treatment effect.
4. Data on Amenorrhea from Clinical Trial of Contracepting Women. Source: Table 1 (page 168) of Machin et al. (1988). Reference: Machin D, Farley T, Busca B, Campbell M and d'Arcangues C. (1988). Assessing changes in vaginal bleeding patterns in contracepting women. Contraception, 38, 165-179.
Data in long form  and   a wide-form version
Description: The data are from a longitudinal clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection.
Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a woman experienced amenorrhea, the absence of menstrual bleeding for a specified number of days. A total of 1151 women completed the menstrual diaries, and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced amenorrhea in the four successive three-month intervals.
In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in the four successive three month intervals. A feature of this clinical trial is that there was substantial dropout. More than one third of the women dropped out before the completion of the trial. In the linked data, missing data are designated by "."  [note: in the week 6 terminology consider the dropouts to be missing at random, not necessarily a correct assumption.]
The purpose of this analysis is to assess the influence of dosage on the risk of amenorrhea and any individual differences in the risk of amenorrhea.
Show your model for these data and the results. Provide significance tests and/or interval estimates for the odds of amenorrhea as a function of dose. Display and interpret individual differences in response by showing the random effects within each experimental group.
5. Chick Data, finale. One more use of the chick data (week 3, exercise 3; week 1 class lecture). Use the data for all 4 diets to construct an nlmer model that allows asymptotes to differ across the four diets. Do the diets produce significantly different results? Which diet produces the heaviest 'mature' chick weight?
6. Missing Data. Wide-form longitudinal data
   Artificial data example from week 2 RQ3 and Week 4 Lecture item 4 (used in Myths examples to illustrate time-1, time-2 data analysis).    Two-part artificial data example.   The top frame (the Xi's) is 40 subjects, each with three equally spaced time observations (here in wide form). For these perfectly measured "Xi" measurements, each subject's observations fall on a straight line.
   a. Use data set W6prob1a , for which about 15% of the observations have been made missing. Use these data (with lm) to recreate the multiple regression demonstration in Week 4 lecture, part 4: "Correlates and predictors of change: time-1,time-2 data" . Compare with the results for the full data on 40 subjects. What does lm do with missing data?
   b. Repeat part a with data set W6prob1b. Can you find any reason to doubt a "missing at random" assumption for this data set?
Note: if we don't get to it in Week 5, then in Week 10 (DW) we will demonstrate multiple imputation procedures (mice) for wide-form data, at least.
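One of the Brogan-Kutner equivalences (Exercise 2, and the BK class example) is that in a two-group, two-wave design the group-by-time interaction F from the repeated measures anova equals the square of the pooled two-sample t statistic on gain scores. A Python sketch of that identity with hypothetical pre/post data: in a two-wave design both the interaction and the within-subject error sums of squares are half of a sum of squared gain deviations, so the F can be written entirely in terms of gains.

```python
import math

def gains(pairs):
    """Gain scores (post - pre) from a list of (pre, post) pairs."""
    return [post - pre for pre, post in pairs]

def gain_score_t(group1, group2):
    """Pooled two-sample t statistic comparing mean gains."""
    g1, g2 = gains(group1), gains(group2)
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss_within = sum((g - m1) ** 2 for g in g1) + sum((g - m2) ** 2 for g in g2)
    pooled_var = ss_within / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def group_by_time_F(group1, group2):
    """Group x time interaction F (1 and N - 2 df) from the two-wave
    repeated measures anova, with both sums of squares written via gains."""
    g1, g2 = gains(group1), gains(group2)
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    grand = (sum(g1) + sum(g2)) / (n1 + n2)
    ss_interaction = (n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2) / 2
    ss_error = (sum((g - m1) ** 2 for g in g1) + sum((g - m2) ** 2 for g in g2)) / 2
    return ss_interaction / (ss_error / (n1 + n2 - 2))

# Hypothetical (pre, post) scores for two groups.
group_a = [(10, 14), (12, 15), (9, 14), (11, 13)]
group_p = [(10, 11), (13, 13), (8, 10), (12, 12), (11, 13)]
t = gain_score_t(group_a, group_p)
print(t ** 2, group_by_time_F(group_a, group_p))
```

The two printed numbers agree (up to rounding), which is the equivalence Exercise 2 asks you to demonstrate with the TLC Week 0 and Week 6 data.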



11/1. Comparing Group Growth, continued. Observational Studies, Cohort Designs.