Statistical Methods for Longitudinal Research

Office Hours after class (plus additional TBA)

Course web page: http://rogosateaching.com/stat222//

To see full course materials from Autumn 2017 go here

Registrar's information

STATS 222 (Same as EDUC 351A): Statistical Methods for Longitudinal Research

Units: 2-3

Lecture: Th 3:00PM - 5:15PM, Sequoia 200

Rogosa Office Hour: 5:15 - 5:50PM, Sequoia 224

Grading Basis: Letter or Credit/No Credit

Course Description: Research designs and statistical procedures for time-ordered (repeated-measures) data. The analysis of longitudinal panel data is central to empirical research on learning, development, aging, and the effects of interventions. Topics include: measurement of change, growth curve models, analysis of durations including survival analysis, experimental and non-experimental group comparisons, reciprocal effects, stability. See http://rogosateaching.com/stat222/.

Prerequisite: intermediate statistical methods

Terms: Aut | Units: 2-3 | Grading: Letter or Credit/No Credit

Instructors: Rogosa, D. (PI)

Week 1. Course Overview, Longitudinal Research; Analyses of Individual Histories and Growth Trajectories

Week 2. Introduction to Data Analysis Methods for assessing Individual Change for Collections of Growth Curves (mixed-effects models)

Week 3. Analysis of Collections of growth curves: linear, generalized linear and non-linear mixed-effects models

Week 4. Special case of time-1, time-2 data; Traditional measurement of change for individuals and group comparisons

Week 5. Assessing Group Growth and Comparing Treatments: Traditional Repeated Measures Analysis of Variance and Linear Mixed-effects Models

Week 6. Comparing group growth continued: Power calculations, Cohort Designs, Cross-over Designs, Methods for missing data, Observational studies.

Week 7. Analysis of Durations: Introduction to Survival Analysis and Event History Analysis

Weeks 8-9. Further topics in analysis of durations: Diagnostics and model modification; Interval censoring, Time-dependence, Recurrent Events, Frailty Models, Behavioral Observations and Series of Events (renewal processes)

Dead Week. Assorted Special Topics (enrichment) and Overflow (weeks 1-8): Assessments of Stability (including Tracking), Reciprocal Effects, (mis)Applications of Structural Equation Models, Longitudinal Network Analysis

1. Garrett M. Fitzmaurice, Nan M. Laird, James H. Ware. Applied Longitudinal Analysis (Wiley Series in Probability and Statistics; 2nd ed 2011)

Text Website second edition website Text lecture slides

2. Judith D. Singer and John B. Willett. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press, March, 2003.

Text web page Text data examples at UCLA IDRE Powerpoint presentations good gentle intro to modelling collections of growth curves (and survival analysis) is Willett and Singer (1998)

3. Douglas M. Bates. lme4: Mixed-effects modeling with R February 17, 2010 Springer (chapters). A merged version of Bates book: lme4: Mixed-effects modeling with R January 11, 2010 has been refound

Manual for R-package lme4 and mlmRev, Bates-Pinheiro book datasets.

Additional Doug Bates materials. Collection of all Doug Bates lme4 talks Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions Douglas Bates 8th International Amsterdam Conference on Multilevel Analysis 2011-03-16 another version

Original Bates-Pinheiro text (2000). Mixed-Effects Models in S and S-PLUS (Stanford access). Appendix C has non-linear regression models.

Fitting linear mixed-effects models using lme4,

4. A handbook of statistical analyses using R (second edition). Brian Everitt, Torsten Hothorn CRC Press, Index of book chapters Stanford access Longitudinal chapters: Chap11 Chap12 Chap13. Data sets etc Package 'HSAUR2' August 2014, Title A Handbook of Statistical Analyses Using R (2nd Edition)

There is now a third edition of HSAUR, but full text not yet available in crcnetbase.com. CRAN HSAUR3 page with Vignettes (chapter pieces) and data in reference manual

5. Peter Diggle, Patrick Heagerty, Kung-Yee Liang, Scott Zeger. Analysis of Longitudinal Data 2nd Ed, 2002

Amazon page Peter Diggle home page Book data sets

A Short Course in Longitudinal Data Analysis Peter J Diggle, Nicola Reeve, Michelle Stanton (School of Health and Medicine, Lancaster University), June 2011 earlier version associated exercises: Lab 1 Lab2 Lab3

6. Longitudinal and Panel Data: Analysis and Applications for the Social Sciences by Edward W. Frees (2004). Full book available and book data and programs (mostly SAS).

7. Growth Curve Analysis and Visualization Using R. Daniel Mirman Chapman and Hall/CRC 2014 Print ISBN: 978-1-4665-8432-7 Stanford Access Mirman web page (including data links).

8.

9. Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New-York: Springer. Extended presentation: Introduction to Longitudinal Data Analysis A shorter exposition: Methods for Analyzing Continuous, Discrete, and Incomplete Longitudinal Data

10. Survival analysis Rupert G. Miller. Available as Stanford Tech Report

11. Event History Analysis with R (Stanford access). Goran Brostrom CRC Press 2012. R-package

12. John D. Kalbfleisch, Ross L. Prentice. The Statistical Analysis of Failure Time Data 2nd Ed

Amazon page online from Wiley

13. Advanced survival analysis topics.

Interval-Censored Time-to-Event Data Methods and Applications Chapman and Hall/CRC 2012 (esp Chap 14--glrt).

Recurrent Events: Chapter 9 of Kalbfleisch and Prentice (2nd edition), "Modeling and Analysis of Recurrent Event Data".

Cook, R. J. and Lawless, J. F. (2007). The Statistical Analysis of Recurrent Events. (Stanford access) Springer, New York.

Joint Models for Longitudinal and Time-to-Event Data. With Applications in R. Dimitris Rizopoulos. Chapman and Hall/CRC 2012(Stanford access) Book website

Additional Specialized Resources

Harvey Goldstein. The Design and Analysis of Longitudinal Studies: Their Role in the Measurement of Change (1979). Elsevier

Amazon page Goldstein Chap 6 Repeated measures data Multilevel Statistical Models by Harvey Goldstein with data sets

David Roxbee Cox, Peter A. W. Lewis The statistical analysis of series of events. Chapman and Hall, 1966

Google books Poisson process computing program

David J Bartholomew. Stochastic Models for Social Processes, Chichester 3rd edition: John Wiley and Sons.

David J Bartholomew web page

Stat222/Ed351A is listed as Letter or Credit/No Credit grading (Stat MS students should check whether S/NC is a viable option for their degree program.)

Grading (for the 2-unit base) will be based on two components:

Each week I will post a few exercises for that week's content--towards the end of the qtr I'll identify a subset of those exercises to be turned in.

During the Autumn qtr exam period we will have an in-class (all materials available, "open" everything) exam.

My reading of the Registrar's chart indicates Tuesday, December 11, 2018 3:30-6:30 p.m. Location: Sequoia 200 (Statistics).

see Class Calendar for details

The Registrar requires clear identification of the requirements for incremental units. The additional requirement for a 3-unit registration (the one unit above 2-units) is satisfied by a student presentation: a mini-lecture, approximately 15 minutes with handout. These are done with Rogosa in Sequoia 224, which has worked out well. Good topics would include empirical longitudinal research, such as a data set or set of studies you are involved with, or an extension of class lecture topics such as preparing an additional data analysis example or a report on some technical readings. Discussion with Rogosa is encouraged.

Course Problem Set 2018 to be posted xxx

Cumulative Collection of Course Handouts 2018 to be posted Dec 2018

Class presentation will be in, and students are encouraged to use, R (occasionally, some references to SAS and Mathematica).

Current version of R is R version 3.5.1 (Feather Spray) released 2018-07-02.

For references and software: The R Project for Statistical Computing Closest download mirror is Berkeley

The CRAN Task View: Statistics for the Social Sciences provides an overview of some relevant R packages. Also the new CRAN Task View: Psychometric Models and Methods and CRAN Task View: Survival Analysis and CRAN Task View: Computational Econometrics.

A good R-primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires Jonathan Baron, Yuelin Li. Another version

A Stat209 text: Data Analysis and Graphics Using R, J. Maindonald and J. Braun, Cambridge, 2nd edition 2007. The 3rd edition (2010) has a short version available on CRAN.

According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page."

1. Now they tell us.... Daily aspirin may be harmful for healthy older adults, large study finds Publication: Effect of Aspirin on Disability-free Survival in the Healthy Elderly and related articles. The New England Journal of Medicine nejm.org September 2018.

2.

A. Initial meet-and-greet. Class logistics and longitudinal research overview

B. Examples, illustrations for longitudinal research overview, taken from course resources above:

Laird, Ware (#1) slides 1-16; Diggle (#5) slides 4-14, 22-28; Verbeke (#9) slides from Ch 2 and Sec 3.3

C. Data Analysis Examples of Model Fitting for Individual Trajectories and Histories.

ascii version of class handout annotated version pdf version with plots datasets

Starting up R-addendum: installing packages and obtaining data (sleepstudy in lme4)

For Count Data (glm) example. Link functions for generalized linear mixed models (GLMMs), Bates slides (pdf pages 11-18)

AIDS in Belgium example, (from Simon Wood) single trajectory, count data using glm. Rogosa R session for aids data

Additional expositions of AIDS data, Poisson regression: Duke Kentucky
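For readers who want to see what a Poisson glm fit to a single count trajectory is doing, here is a minimal sketch of Newton-Raphson (equivalently, Fisher scoring) for the log-linear model log E[y] = b0 + b1*t. It is written in Python with only the standard library rather than the R used in class. The counts are made-up illustrative values, not the AIDS data, and `fit_poisson_loglinear` is a hypothetical helper name.

```python
import math

def fit_poisson_loglinear(t, y, iters=50, tol=1e-10):
    """Newton-Raphson for the Poisson model log(mu_i) = b0 + b1 * t_i."""
    n = len(t)
    # Starting values: least-squares line through log(y + 0.5).
    z = [math.log(v + 0.5) for v in y]
    tbar, zbar = sum(t) / n, sum(z) / n
    b1 = (sum((ti - tbar) * (zi - zbar) for ti, zi in zip(t, z))
          / sum((ti - tbar) ** 2 for ti in t))
    b0 = zbar - b1 * tbar
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * ti) for ti in t]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(ti * (yi - mi) for ti, yi, mi in zip(t, y, mu))
        # Information matrix X' diag(mu) X = [[a, b], [b, c]]
        a = sum(mu)
        b = sum(ti * mi for ti, mi in zip(t, mu))
        c = sum(ti * ti * mi for ti, mi in zip(t, mu))
        det = a * c - b * b
        # Newton step: inverse information times score
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

t = list(range(1, 9))
y = [3, 5, 9, 12, 22, 31, 48, 70]   # made-up yearly counts, for illustration only
b0, b1 = fit_poisson_loglinear(t, y)
print(f"estimated multiplicative change per time unit: {math.exp(b1):.3f}")
```

At convergence the score equations hold, so the fitted means sum to the observed total. That is a handy check against the output of any glm routine.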

A

Non-linear models, esp logistic. From week 1, also week 3 Self-Starting Logistic model SSlogis help page, do
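As a small reference for the curve itself, the sketch below writes out the logistic growth function in the parameterization documented for R's SSlogis, Asym / (1 + exp((xmid - input) / scal)). It is in Python rather than R, `sslogis` is a hypothetical name chosen to mirror the R function, and the parameter values are arbitrary illustrations.

```python
import math

def sslogis(x, asym, xmid, scal):
    """Logistic growth curve: asym / (1 + exp((xmid - x) / scal)).

    asym: upper asymptote; xmid: time at which the curve reaches asym / 2;
    scal: scale parameter (larger values give slower approach to the asymptote).
    """
    return asym / (1.0 + math.exp((xmid - x) / scal))

# Arbitrary illustrative parameter values
for x in (4, 10, 16):
    print(x, round(sslogis(x, asym=200.0, xmid=10.0, scal=3.0), 2))
```

The curve passes through half the asymptote at x = xmid, which gives xmid a direct interpretation when comparing individuals.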

Trend in Proportions: College fund raising example prop.trend.test help page

Trend in proportions, group growth, Cochran-Armitage test. Expository paper: G. Salanti and K. Ulm (2003): Tests for Trend in Binary Response (SU access)

1. For the straight-line (constant rate of change) fit example to subj 372 in the sleepstudy data: obtain a confidence interval for the rate of change from the OLS fit. Now compare the OLS fit with day-to-day differences. Under the constant rate of change model these 9 day-to-day differences also estimate the rate of change. Obtain an estimate of the mean and a confidence interval for the rate of change from these first differences. Compare with the OLS results.
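A minimal sketch of the two estimators being compared, in Python rather than the R used in class. The reaction times are made-up values, not the sleepstudy data for subject 372 or anyone else, and `ols_slope` is a hypothetical helper. One thing the sketch makes visible: the mean of consecutive differences telescopes to (last - first) / 9, so it uses only the two endpoint observations.

```python
import statistics

def ols_slope(t, y):
    """Least-squares slope of y on t."""
    tbar, ybar = statistics.fmean(t), statistics.fmean(y)
    sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    sxx = sum((ti - tbar) ** 2 for ti in t)
    return sxy / sxx

days = list(range(10))
# Made-up reaction times (ms); NOT the sleepstudy values for any subject.
reaction = [250.0, 262.0, 259.0, 281.0, 290.0, 287.0, 305.0, 318.0, 312.0, 330.0]

diffs = [later - earlier for earlier, later in zip(reaction, reaction[1:])]
mean_diff = statistics.fmean(diffs)                      # average day-to-day change
naive_se = statistics.stdev(diffs) / len(diffs) ** 0.5   # treats the 9 differences as independent
slope = ols_slope(days, reaction)

print(f"OLS slope: {slope:.3f}")
print(f"mean first difference: {mean_diff:.3f} (naive SE {naive_se:.3f})")
print(f"(last - first) / 9: {(reaction[-1] - reaction[0]) / 9:.3f}")
```

If the errors around the straight line are independent, adjacent first differences share an error term and are negatively correlated, so the naive standard error above should be read with caution.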

For reference, Self-Starting Logistic model SSlogis help page, do

North Carolina, female math performance (also in Rogosa-Saner) North Carolina data (wide format); NC data (long)

For that female, what is the rate of improvement over grades 1 through 8? Compare the observed improvement for grades 1 through 8 (the

Separately, consider three observations taken at equally spaced time intervals: What is a simple expression for the OLS slope (rate of change)?

Music to accompany long-distance truck driver data: 1971 The Flying Burrito Brothers "Six Days on the Road"

a.

Source Publication: Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M., & Balkin, T. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1-12.

Sleepstudy data analysis from Doug Bates lme4 book lme4: Mixed-effects modeling with R February 17, 2010 (draft chapters) Chapter 4: Sleepstudy example or a set of Bates slides for the sleepstudy example

Why lmer (lme4) does not provide p-values for fixed effects: Doug Bates lmer, p-values and all that. There are a number of add-on packages (see Review Question 1).

Individual plots (frame-by-frame) Plot of straight-line fits Initial descriptive analyses (SFYS)

Sleepstudy, Bates Ch 4, lme4 analyses, ascii Sleepstudy class handout, pdf scan

more Doug Bates Slides (pdf pages 8-28)

North Carolina data (wide format); NC data (long) plots for NC data

Data formatting:

North Carolina example. wide-form descriptives, background, plots Initial SFYS analyses of NC data, ascii

Model Comparisons for North Carolina, female math performance ascii version NC class handout, pdf scan model ncCon2 without redundant model term NC bootstrap results (SAS)

Estimation in lmer.

Fitting linear mixed-effects models using lme4, Journal of Statistical Software Douglas Bates Martin Machler Ben Bolker also Rnews_2005 pp.27-30

Bates book, Chapter 5, Computational Methods. Bates talk slides: Mixed models in R using the lme4 package Part 4: Theory of linear mixed models

Extensions and Alternatives, lmer.

Plots and diagnostics: Package

Non-Gaussian modelling. Hierarchical Generalized Linear Models, Package

Extensions of lme4 modeling: Package

Package

Package

Douglas Bates class resource item #3, Texts and Resources. Other Doug Bates materials: Three packages, "SASmixed", "mlmRev" and "MEMSS" with examples and data sets for mixed effect models

North Carolina Data also in (with full development of the modelling) Longitudinal Data Analysis Examples with Random Coefficient Models. David Rogosa; Hilary Saner. Journal of Educational and Behavioral Statistics, Vol. 20, No. 2, Special Issue: Hierarchical Linear Models: Problems and Prospects. (Summer, 1995), pp. 149-170. Jstor Data sets for Rogosa-Saner

Additional talk materials: An Assortment of Longitudinal Data Analysis Examples and Problems 1/97, Stanford biostat. Overview and Implementation for Basic Longitudinal Data Analysis CRESST Sept '97. Another version (short) of the expository material is from the Timepath '97 (old SAS programs) site: Growth Curve models; Data Analysis and Parameter Estimation; Derived quantities for properties of collections of growth curves and bootstrap inference procedures

I start by fitting the lmer model for the collection of growth curves:

Then try out

Then look at the

a. identify the fastest and slowest growth among the 277 females. Compare medians of growth rates for females with verbal ability (Z) at or above 106 with that for females with verbal ability below 106. Show side-by-side boxplots.

b. In the class handout version of the NC analyses (and other postings, but not all) the first thing to do was make the 'time' variable have initial value = 0 (making the intercept of a straight-line fit correspond to level at initial time): i.e. 1 to 8 becomes 0 to 7. Obtain lmList results and fit the ncUnc lmer model (straight-line growth, no Z) using time 1 to 8. Comment on differences of these analyses with those using timeInt in the class handout. In particular, look at the correlation of change and initial status. The correlation between observed change and observed initial status using timeInt was .279 from lmer (
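To illustrate why the choice of time origin matters for the change/initial-status correlation, here is a small sketch using simulated data (not the North Carolina data) and individual OLS fits rather than lmer. It is in Python rather than R; `ols_line` and `pearson` are hypothetical helpers and all simulation settings are arbitrary assumptions. Recoding time from 1..8 to 0..7 leaves each slope unchanged but moves each intercept by one slope, so the intercept/slope correlation changes.

```python
import random
import statistics

def ols_line(t, y):
    """Return (intercept, slope) of the least-squares line of y on t."""
    tbar, ybar = statistics.fmean(t), statistics.fmean(y)
    slope = (sum((a - tbar) * (b - ybar) for a, b in zip(t, y))
             / sum((a - tbar) ** 2 for a in t))
    return ybar - slope * tbar, slope

def pearson(x, y):
    """Pearson correlation coefficient."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(222)
time_1to8 = list(range(1, 9))
time_0to7 = [t - 1 for t in time_1to8]

# Simulated subjects (arbitrary settings): level at first occasion, rate, noise.
subjects = []
for _ in range(50):
    level, rate = rng.gauss(50, 5), rng.gauss(3, 1)
    subjects.append([level + rate * (t - 1) + rng.gauss(0, 2) for t in time_1to8])

fits_1to8 = [ols_line(time_1to8, y) for y in subjects]
fits_0to7 = [ols_line(time_0to7, y) for y in subjects]

r_1to8 = pearson([f[0] for f in fits_1to8], [f[1] for f in fits_1to8])
r_0to7 = pearson([f[0] for f in fits_0to7], [f[1] for f in fits_0to7])
print(f"corr(intercept, slope), time 1..8: {r_1to8:.3f}")
print(f"corr(intercept, slope), time 0..7: {r_0to7:.3f}")
```

With time coded 1..8 the intercept is the fitted level one time unit before the first observation, not the initial status, which is the point of the recoding in the class handout.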

The North Carolina data have 277 subjects, so a frame-by-frame display of individuals requires subsampling. Construct a plot of the data trajectories for 24 (arbitrary) individuals.

A subsample of data from the National Youth Survey is obtained in long-form by

and in wide form by

Yearly observations from ages 11 to 15 on the tolerance measure (tolerance to deviant behavior e.g. cheat, drug, steal, beat; larger values indicate more tolerance on a 1-to-4 scale). Also in this data set are gender (is_male) and an

i. obtain individual OLS fits (tolerance over time) and plot the collection of those straight-lines. Provide descriptive statistic summaries for the rate of change in tolerance and initial level.

ii. fit a mixed effects model for tolerance over time (unconditional) for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed effects estimates correspond to quantities obtained in part i. Explain.

iii. Investigate whether the
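As a hint toward the correspondence asked about in part ii, the sketch below checks a simpler fact numerically: when every individual is observed at the same occasions with no missing data, the averages of the individual OLS intercepts and slopes equal the intercept and slope of a single OLS line fit to the pooled long-form data. The scores are made-up values on a 1-to-4 scale, not the National Youth Survey data; the code is Python rather than R, and `ols_line` is a hypothetical helper. Whether and why the same quantities match the lmer fixed effects is left to the exercise.

```python
import statistics

def ols_line(t, y):
    """Return (intercept, slope) of the least-squares line of y on t."""
    tbar, ybar = statistics.fmean(t), statistics.fmean(y)
    slope = (sum((a - tbar) * (b - ybar) for a, b in zip(t, y))
             / sum((a - tbar) ** 2 for a in t))
    return ybar - slope * tbar, slope

ages = [11, 12, 13, 14, 15]
time = [a - 11 for a in ages]          # time 0 corresponds to age 11

# Wide form: one row of made-up scores per (hypothetical) subject.
wide = {
    "s1": [1.2, 1.3, 1.5, 1.6, 1.9],
    "s2": [1.0, 1.0, 1.2, 1.1, 1.4],
    "s3": [2.0, 2.3, 2.2, 2.8, 3.1],
    "s4": [1.5, 1.4, 1.8, 2.0, 2.1],
}

# Part i style: individual OLS fits, then descriptive averages.
fits = [ols_line(time, scores) for scores in wide.values()]
mean_intercept = statistics.fmean([f[0] for f in fits])
mean_slope = statistics.fmean([f[1] for f in fits])

# Long form: one (time, score) pair per subject-occasion, then one pooled fit.
long_time = [t for _ in wide for t in time]
long_score = [v for scores in wide.values() for v in scores]
pooled_intercept, pooled_slope = ols_line(long_time, long_score)

print(f"mean of individual fits: {mean_intercept:.4f}, {mean_slope:.4f}")
print(f"pooled OLS fit:          {pooled_intercept:.4f}, {pooled_slope:.4f}")
```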

Consider the sleepstudy and Ramus examples, collections of growth trajectories with no exogenous variable. Ramus Data example. Example consists of 4 longitudinal observations on each of 20 cases. The measurement is the height of the mandibular ramus bone (in mm) for boys each measured at 8, 8.5, 9, 9.5 years of age. These data, which have been used by a number of authors (e.g., Elston and Grizzle 1962), can be found in Table 4.1 of Goldstein (1979). Ramus data example long form for Ramus data tutorial on creating long form data manipulation .

Fitting the lmer models with

Data on early childhood cognitive development described in Doug Bates talk materials (pdf pages 49-52). Obtain these data from the R-package "mlmRev" or the Willett-Singer book site (in our week 1 intro links). Data are in long form and consist of 3 observations on each of 58 treatment and 45 control children; see the Early entry in the mlmRev package docs. Produce the plot of individual trajectories shown on pdf p.49 of the Bates talk (note: Bates does connect-the-dots, we have done straight-line fits; your choice). Show five-number summaries of rates of improvement in cognitive scores for treatment and control groups. Develop and fit the

Artificial data example from Review Question 3 (used in Myths chapter to illustrate time-1,time-2 data analysis). Start out with the "X" data, and standardize (i.e. transform to mean 0, var 1) at each of the 3 time points. Note "scale" will do this for you (in wide form). For the standardized data obtain the plot showing each subject's data and straight-line fit. What do you have here? Compare the results of the mixed-effects models fitting the collection of straight-line growth curves for the measured and standardized data.

Study confirms link between violent video games and physical aggression Publication: Metaanalysis of the relationship between violent video game play and physical aggression over time. PNAS October 2, 2018 115 (40) 9882-9888.

a. plotting residuals, ascii session.

b. lm and gee alternatives (ignore individual growth). ascii session

Logistic Curve Example: Orange Tree growth. Data from MEMSS package Data sets and sample analyses from Pinheiro and Bates, Mixed effects Models in S and S-PLUS (Springer, 2000).

Plots and nlmer analysis, Orange tree data

Doug Bates Slides Orange trees analysis (pdf pages 8-16), Logistic SS (pdf p.6), pharmacokinetics ex (pdf pages 7, 17-24) Bates NLMM.Rnw From week 1 SSlogis (Self-Starting Logistic model) links and materials. another analysis of Orange Trees in the ASReml package manual section 8.9

Also LDA book Chapter 5. Non-linear mixed-effects models, Marie Davidian

additional tools in the grofit package and nlmeODE package Title Non-linear mixed-effects modelling in nlme using differential equations

A somewhat common practice (which I'm not that fond of) is to constrain individual differences in an lmer model--e.g. force all individuals to have the same rate of change. For purposes here use the

The UCLA data archive has a comma delimited file (access by

Measurements on 82 adolescents (initial age 14) included 3 time-ordered observations on alcohol use and two background (exogenous) variables: dichotomous

a. Construct an lmer model with the individual growth curve a quadratic function of grade (year), most convenient to use uncorrelated predictors

b. Investigate (via lmer model) gender differences (isMale) in vocabulary growth. Fit appropriate lmer models and interpret results,

1. Properties of Collections of Growth Curves. class handout

2. Time-1, time-2 data. (paired data)

The R-package PairedData has some interesting plots and statistical summaries for "before and after" data;

here is a McNeil plot for Xi.1, Xi.5 in data example

Paired dichotomous data, McNemar's test (in R, mcnemar.test {stats}), Agresti (2nd ed) sec 10.1

Also see R-package

3. Issues in the Measurement of Change. Class lecture covers Myths 1-6+.

Slides from Myths talk. Class Handout, Companion for Myths talk

4. Examples for Exogenous Variables and Correlates of Change (use of lagged dependent variables)

Time-1,time-2 data analysis examples Measurement of change: time-1,time-2 data

data example for handout scan of regression handout ascii version of data analysis handout

Extra material for Correlates and predictors of change: time-1,time-2 data

Rogosa R-session to replicate handout, demonstrate wide-to-long data set conversion, and descriptive fitting of individual growth curves. Some useful plots from Rogosa R-session

Technical results: Section 3.2.2 esp Equation 27 in Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228. Talk slides

5. Comparing groups on time-1, time-2 measurements: repeated measures anova vs lmer OR the t-test

Comparative Analyses of Pretest-Posttest Research Designs, Donna R. Brogan; Michael H. Kutner,

urea synthesis, BK data data, long-form

BK plots (by group) BK overview

2017 Analysis handout Extended BK lmer analysis

Additional stuff

BK repeated measures analysis pdf version

Stat141 analysis

archival example analyses. SAS and minitab

Myths Chapter. Rogosa, D. R. (1995). Myths and methods: "Myths about longitudinal research," plus supplemental questions. In The analysis of change, J. M. Gottman, Ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 3-65.

Myths Talk. Rogosa, D. R. (1983)

I noticed John Gottman did a pub rewriting the myths: Journal of Consulting and Clinical Psychology 1993, Vol. 61, No. 6,907-910 The Analysis of Change: Issues, Fallacies, and New Ideas

Also John Willett did a rewrite of the Myths 'cuz I didn't want to reprint it again (or write a new version): Questions and Answers in the Measurement of Change REVIEW OF RESEARCH IN EDUCATION 1988 15: 345

Reliability Coefficients: Background info. Short primer on test reliability Informal exposition in

A growth curve approach to the measurement of change. Rogosa, David; Brandt, David; Zimowski, Michele Psychological Bulletin. 1982 Nov Vol 92(3) 726-748 APA record direct link

Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228.

available from John Willett's pub page

Demonstrating the Reliability of the Difference Score in the Measurement of Change. David R. Rogosa; John B. Willett Journal of Educational Measurement, Vol. 20, No. 4. (Winter, 1983), pp. 335-343. Jstor

Maris, Eric. (1998). Covariance Adjustment Versus Gain Scores--Revisited.

A good R-primer on repeated measures (and lots else). Notes on the use of R for psychology experiments and questionnaires Jonathan Baron, Yuelin Li. Another version

Multilevel package has behavioral sciences applications including estimates of within-group agreement, and routines using random group resampling (RGR) to detect group effects.

Application publications, time-1, time-2 Experimental Group Comparisons:

a. Mere Visual Perception of Other People's Disease Symptoms Facilitates a More Aggressive Immune Response

b. Guns and testosterone. Guns Up Testosterone, Male Aggression

Guns, Testosterone, and Aggression: An Experimental Test of a Mediational Hypothesis Klinesmith, Jennifer; Kasser, Tim; McAndrew, Francis T,

Repeat the handout demonstration regressions using the fallible measures (the X's) from the bottom half of the linked data page. The X's are simply error-in-variable versions of the Xi's: X = Xi + error, with error having mean 0 and variance 10. Compare 5-number summaries for the amount of change from the earliest time "1" to the final observation "5" using the "Xi" measurements (upper frame) and the fallible "X" observations (lower frame).

Consider a population with true change between time1 and time2 distributed Uniform [99,101] and measurement error Uniform [-1, 1]. If you used discrete Uniform in this construction then you could say measurement of change is accurate to 1 part in a hundred.

Calculate the reliability of the difference score.

Also try error Uniform [-2,2], accuracy one part in 50.
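A small sketch of the arithmetic involved, in Python with exact fractions. It relies on the standard result that a Uniform[a, b] variable has variance (b - a)^2 / 12, and on the usual definition of reliability as true-score variance over observed-score variance. The helper names are hypothetical. The key assumption, which the exercise text does not settle, is whether the stated error applies to the observed change itself or independently to each of the two measurements; the `per_occasion` flag covers both readings.

```python
from fractions import Fraction

def uniform_var(lo, hi):
    """Variance of a continuous Uniform[lo, hi] variable: (hi - lo)^2 / 12."""
    return Fraction((hi - lo) ** 2, 12)

def difference_score_reliability(true_change, error, per_occasion=False):
    """Reliability = var(true change) / (var(true change) + var(error in observed change)).

    true_change, error: (lo, hi) endpoints of Uniform distributions.
    per_occasion=False assumes the error applies to the observed change itself.
    per_occasion=True assumes independent errors at time 1 and time 2, so the
    error variance of the difference is doubled.
    """
    true_var = uniform_var(*true_change)
    err_var = uniform_var(*error) * (2 if per_occasion else 1)
    return true_var / (true_var + err_var)

for err in [(-1, 1), (-2, 2)]:
    for per_occasion in (False, True):
        rel = difference_score_reliability((99, 101), err, per_occasion)
        print(f"error Uniform{list(err)}, per_occasion={per_occasion}: reliability = {rel}")
```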

A similar demonstration can be found in my

a. Demonstrate the Brogan-Kutner Section 5 equivalences (from paper, shown in class) for repeated measures anova and/or BK lmer analyses.

b. Is amount of gain/decline related to initial status? For the 8 new procedure patients and for the 13 old procedure patients, separately, estimate the correlation between change and initial status and obtain a confidence interval if possible.

c. Analysis of Covariance. For the Brogan-Kutner data carry out an analysis of covariance (using premeasure as covariate) for the relative effectiveness of the surgery methods. Compare with class analyses.
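For part b, one common way to get an approximate confidence interval for a correlation is the Fisher z transformation. The sketch below shows it in Python rather than R. This is one option under a bivariate-normal assumption, not necessarily the approach intended in class, and it is rough with only 8 cases. The pre and post values are made up, not the Brogan-Kutner data, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def pearson(x, y):
    """Pearson correlation coefficient."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_interval(r, n, level=0.95):
    """Approximate CI for a correlation: atanh(r) +/- z / sqrt(n - 3), back-transformed."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Made-up pre and post measurements for 8 hypothetical patients.
pre = [51.0, 35.0, 66.0, 40.0, 39.0, 46.0, 52.0, 42.0]
post = [48.0, 55.0, 60.0, 35.0, 36.0, 43.0, 46.0, 54.0]
change = [b - a for a, b in zip(pre, post)]

r = pearson(pre, change)
lower, upper = fisher_z_interval(r, len(pre))
print(f"r(change, initial) = {r:.3f}, approx 95% CI ({lower:.3f}, {upper:.3f})")
```

Because the observed initial status and the observed change share the time-1 measurement error, this observed correlation is not the same thing as the correlation between true change and true initial status.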

Slides 203-204 in the Laird-Ware text materials purport to demonstrate that analysis of covariance produces a more precise treatment effect estimate than difference scores (repeated measures anova). What

Use lmer (or lme) to determine the comparative efficacy of the surgical methods on liver function. Investigate whether a model allowing for pretest differences is helpful.

The file captopril.dat contains the data shown in Section 2.2 of Verbeke, Introduction to Longitudinal Data Analysis, slides. Captopril is an angiotensin-converting enzyme inhibitor (ACE inhibitor) used for the treatment of hypertension.

a. Smart First Year Student analyses. Use the before and after Spb measurements to examine the improvement (i.e. decrease) in blood pressure. Obtain a five-number summary for observed improvement. What is the correlation between change and initial blood pressure measurement? Obtain a confidence interval for the correlation and show the corresponding scatterplot. What special challenges are present in this analysis?

b. lmer analyses. Try to obtain a good confidence interval for the amount of decline. Obtain a point and interval estimate for the correlation between initial status and change in Spb.

In the "HistData" or "psych" packages resides the "galton" dataset, the primordial regression toward the mean example.

Description: Galton (1886) presented these data in a table, showing a cross-tabulation of 928 adult children born to 205 fathers and mothers, by their height and their mid-parent's height. A data frame with 928 observations on the following 2 variables. parent Mid Parent heights (in inches) child Child Height. Details: Female heights were adjusted by 1.08 to compensate for sex differences. (This was done in the original data set)

Consider "parent" as time1 data and "child" as time2 data and investigate whether these data indicate

Aside: if you like odd plots, look at the

Let's use again the 40 subjects in the Review Question 1 "X" data.

a. Measured data. Take the time1 and time5 observations and obtain a 95% Confidence Interval for the amount of change. Compare the width of that interval with a confidence interval for the difference between the time5 and time1 means if we were told a different group of 40 subjects was measured at each of the time points (data no longer paired).

b. Dichotomous data. Instead look at these data with the criterion that a score of 50 or above is a "PASS" and below that is "FAIL". Carry out McNemar's test for the paired dichotomous data, and obtain a 95% CI for the difference between dependent proportions. Compare that confidence interval with the "unpaired" version (different group of 40 subjects was measured at each of the time points) for independent proportions.
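For part b, a minimal sketch of the calculations for paired dichotomous data, in Python rather than R's mcnemar.test. It computes the continuity-corrected McNemar chi-square from the two discordant counts and a Wald-type interval for the difference between dependent proportions. The 2x2 counts are made up, not the Review Question data, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def mcnemar(n_pass_fail, n_fail_pass, correct=True):
    """McNemar chi-square (1 df) from the two discordant counts, with p-value."""
    gap = abs(n_pass_fail - n_fail_pass)
    if correct:
        gap = max(gap - 1, 0)                 # continuity correction
    stat = gap ** 2 / (n_pass_fail + n_fail_pass)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value

def paired_prop_diff_ci(n_pass_fail, n_fail_pass, n, level=0.95):
    """Wald interval for p(pass at time 5) - p(pass at time 1) from paired data."""
    diff = (n_fail_pass - n_pass_fail) / n
    se = math.sqrt(n_pass_fail + n_fail_pass - (n_fail_pass - n_pass_fail) ** 2 / n) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Made-up table for 40 subjects: rows = time 1 (PASS/FAIL), columns = time 5.
pass_pass, pass_fail, fail_pass, fail_fail = 10, 3, 15, 12
n = pass_pass + pass_fail + fail_pass + fail_fail

stat, p_value = mcnemar(pass_fail, fail_pass)
lower, upper = paired_prop_diff_ci(pass_fail, fail_pass, n)
print(f"McNemar chi-square = {stat:.3f}, p = {p_value:.4f}")
print(f"difference in proportions passing: 95% CI ({lower:.3f}, {upper:.3f})")
```

Only the discordant pairs enter the test statistic; the concordant counts matter for the interval only through n.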

Data in wide form:

Investigate the effectiveness of Beat the Blues from these 2-wave data. Follow the various descriptive and modelling strategies shown in the BK class example.

1. Does nutrition science know anything? Is white or whole wheat bread 'healthier?' Depends on the person Publication: Bread Affects Clinical Parameters and Induces Gut Microbiome-Associated Personal Glycemic Responses Cell Metabolism, Korem et al DOI: 10.1016/j.cmet.2017.05.002

2. This time with 3 conditions For Exercise, Nothing Like the Great Outdoors Publication: Niedermeier M, Einwanger J, Hartl A, Kopp M (2017) Affective responses in mountain hiking-- randomized crossover trial focusing on differences between indoor and outdoor activity. PLoS ONE 12(5): e0177719. https://doi.org/10.1371/journal.pone.0177719

R-resources for crossover designs. package

Link functions for generalized linear mixed models (GLMMs), Bates slides (pdf pages 11-18)

A Handbook of Statistical Analyses Using R, Second Edition Torsten Hothorn and Brian S . Everitt Chapman and Hall/CRC 2009. Analysing Longitudinal Data II -- Generalised Estimation Equations and Linear Mixed Effect Models: Treating Respiratory Illness and Epileptic Seizures (Stanford access)

Data sets etc Package 'HSAUR2' August 2014, Title A Handbook of Statistical Analyses Using R (2nd Edition)

Recap Group Comparisons, Epilepsy example. Comparison of lmer models

For SAS (and GEE) fans another analysis

R-package

Background pubs: Power for linear models of longitudinal data with applications to Alzheimer's Disease Phase II study design Michael C. Donohue, Steven D. Edland, Anthony C. Gamst

Sample Size Planning for Longitudinal Models: Accuracy in Parameter Estimation for Polynomial Change Parameters Ken Kelley Notre Dame Joseph R. Rausch

basic R analogues,

Nontechnical overviews:

Phil Lavori et al. Psychiatric Annals, Volume 38, Issue 12, December 2008 Missing Data in Longitudinal Clinical Trials, Part A Part B

Robin Henderson, Missing Data in Longitudinal Studies pdf pages 89-93

Technical review: Missing data methods in longitudinal studies: a review. Joseph G. Ibrahim and Geert Molenberghs

More on Missing data and imputation, including

a. Longpower package (vignette). Reconstruct the sample size calculation for the Alzheimer's disease trial (7 waves) on p.4 of the vignette.

b. MBESS package. Recreate the sample size calculation for width of confidence interval for differential growth using

a. Try to do lmList on these data to get odds(good) for each of the 111 subjects. Investigate effectiveness of treatment.

b. Use lmer analyses to compare treatment and placebo. Obtain a confidence interval for effectiveness of treatment. Investigate gender differences in response to the intervention (i.e. the treatment).

c. Extend the lmer model in part b by adding the age and baseline measurements to the level 2 model. Compare with part b results.

To supplement the full models for the epilepsy data in the longitudinal texts (HSAUR, ALA etc), let's try to build up the analysis from basic description comparing placebo vs drug up through some basic glmer models.

A somewhat similar effort was made in the second class posting "Recap group comparisons (epcomp)" linked above. In this exercise treat period as a time measurement (1,2,3,4) rather than an ordered factor.

How many subjects in placebo and drug groups? Use lmList to obtain slopes and intercepts for fits of time trends to seizures for each subject and compare drug and placebo groups.

Fit and compare glmer models with treatment as the only level 2 predictor (for intercept) without and with a time trend. Compare.

Add the baseline to the glmer models above (in the level 2 model for intercept); is the effect of the drug significant (use confint)? Does adding age help this model?

These data are from a study of reading from Purdue. We use the data to compare two methods: Basal, traditional method of teaching; DRTA, an innovative method; coded 1 and 2 respectively in the data. Random assignment placed twenty-two students in each group; reading test measures were obtained pre and post instruction.

The Directed Reading Thinking Activity (DRTA) is a strategy that guides students in asking questions about a text, making predictions, and then reading to confirm or refute their predictions. The DRTA process encourages students to be active and thoughtful readers, enhancing their comprehension.

Use descriptive and inferential statistical methods to assess the relative efficacy of the DRTA method.

Start out by just using the subset of the longitudinal data Lead Level Week 0 and Week 6. Carry out the repeated measures anova for the relative effectiveness of chelation treatment with succimer or placebo (A,P). Show the three equivalences in the Brogan-Kutner paper between the repeated measures anova results and simple t-tests for these data. Next compare with an lmer fit following the B-K class example (posted). Finally use all 4 longitudinal measures (weeks 0,1,4,6) for an Active vs. Placebo comparison using lmer. Compare with the results that use only 2 observations.
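One equivalence of this kind can be checked numerically: with two groups and two occasions, the F statistic for the group-by-time interaction in the repeated measures anova equals the square of the two-sample (pooled-variance) t statistic computed on the change scores. The sketch below is a minimal illustration in plain Python, assuming equal group sizes. The scores and function names are hypothetical, not the lead data, and it covers only this one equivalence rather than all three discussed in the paper.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy (time-1, time-2) scores, NOT the lead data; equal n per group
groups = {
    "A": [(10, 6), (12, 9), (9, 4), (11, 7)],
    "P": [(10, 9), (11, 11), (12, 10), (9, 9)],
}

def interaction_F(groups):
    """Group x time F from a two-group, two-occasion repeated measures anova."""
    n = len(next(iter(groups.values())))            # common group size
    all_rows = [row for rows in groups.values() for row in rows]
    grand = mean(v for row in all_rows for v in row)
    time_means = [mean(row[t] for row in all_rows) for t in (0, 1)]
    ss_gt = 0.0    # group x time sum of squares
    ss_err = 0.0   # time x subjects-within-groups sum of squares
    for rows in groups.values():
        g_mean = mean(v for row in rows for v in row)
        cell = [mean(row[t] for row in rows) for t in (0, 1)]
        for t in (0, 1):
            ss_gt += n * (cell[t] - g_mean - time_means[t] + grand) ** 2
        for row in rows:
            s_mean = mean(row)
            for t in (0, 1):
                ss_err += (row[t] - s_mean - cell[t] + g_mean) ** 2
    df_err = len(all_rows) - len(groups)            # (N - G)(T - 1) with T = 2
    return ss_gt / (ss_err / df_err)                # interaction df is 1 here

def change_score_t(groups):
    """Pooled-variance two-sample t statistic on change scores (time 2 - time 1)."""
    a, b = [[post - pre for pre, post in rows] for rows in groups.values()]
    ss = sum((d - mean(a)) ** 2 for d in a) + sum((d - mean(b)) ** 2 for d in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))

F = interaction_F(groups)
t = change_score_t(groups)
print("interaction F:", round(F, 4), " t squared:", round(t * t, 4))
```

The agreement is algebraic rather than a feature of these particular numbers, so the same check should hold for the Week 0 and Week 6 lead levels when group sizes are equal.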

The data set is available at http://www.hsph.harvard.edu/fitzmaur/ala/ecg.txt (it needs to be cut and pasted into an editor). Carry out the basic analysis of variance for this crossover design following week 5 Lecture topic 2. You may want to use glm to take into account the binary outcome. Does the treatment increase the probability of abnormal ECG? Give a point estimate and significance test for the treatment effect.

Data in long form and a wide-form version

Description: The data are from a longitudinal clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection.

Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a woman experienced amenorrhea, the absence of menstrual bleeding for a specified number of days. A total of 1151 women completed the menstrual diaries, and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced amenorrhea in the four successive three-month intervals.

In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in the four successive three month intervals. A feature of this clinical trial is that there was substantial dropout. More than one third of the women dropped out before the completion of the trial. In the linked data, missing data are designated by "." [note: in the week 6 terminology consider the dropouts to be

The purpose of this analysis is to assess the influence of dosage on the risk of amenorrhea and any individual differences in the risk of amenorrhea.

Show your model for these data and the results. Provide significance tests and/or interval estimates for the odds of amenorrhea as a function of dose. Display and interpret individual differences in response by showing the random effects within each experimental group.

Artificial data example from week 2 RQ3 and Week 4 Lecture item 4 (used in Myths examples to illustrate time-1,time-2 data analysis). This is a two-part artificial data example. The top frame (the Xi's) is 40 subjects, each with three equally spaced time observations (here in wide form). For these perfectly measured "Xi" measurements, each subject's observations fall on a straight line.

a. Use data set W6prob1a, for which about 15% of the observations have been made missing. Use these data (with lm) to recreate the multiple regression demonstration in Week 4 lecture, part 4: "Correlates and predictors of change: time-1,time-2 data". Compare with the results for the full data on 40 subjects. What does

b. Repeat part a with data set W6prob1b. Can you find any reason to doubt a "missing at random" assumption for this data set?

Note: if we don't get to it in Week 5, then in Week 10 (DW) we will demonstrate multiple imputation procedures (