Course Problem Set 2020 (partial posting: oct 30; complete posting Nov 10)

Due November 20, 2020 

Usual Honor Code procedures: You may use any of your own inanimate resources--no collaboration or assistance from others. This work is done under Stanford's Honor Code.

Please ask ( about issues of question interpretation, and especially in regard to any materials you feel you need but don't have access to.
Any issues that come up (wording, interpretation) I will post a note here, so it would be good to check this page intermittently.

Submission of work.
Given that these problems are untimed, some care should be taken in presentation, clarity, format.
Especially important is to give full and clear answers to questions, not just to submit unannotated computer output, although relevant output should be included.
PLEASE check that you have answered all the parts and subparts. Please start each problem on a new page and keep all material for a problem contiguous.
In prior years solutions for these problems sets were submitted in hard-copy form at the final class meeting.
Things are different this year.
I certainly will accept hard copy either mailed to my Palo Alto home or left on my front porch (as a student did last spring).
I expect most students will seek some form of electronic submision.
Last spring quarter, students used a variety of methods: posting to their Google drive (e.g. pdf files) and giving me access, or mailing me rendered html files. Most popular, posting an html file on rpubs was great for a project course, but doesn't provide adequate security for an exam file.
We will address submission options more towards end of quarter. I am quite interested in any useful suggestions.
Of course, and I probably don't need to say this, do not mail me a Word binary file (you shouldn't do that to anyone).

Course Problem Set, Item 1.
   Change over time, measured outcome
Week 2, Exercise 1
Tolerance data
A small subsample of data (16 respondents) from the National Youth Survey is obtained in long-form by
read.table("", sep=",", header=T)
and in wide form by
read.table("", sep=",", header=T)
Yearly observations from ages 11 to 15 on the tolerance measure (tolerance to deviant behavior e.g. cheat, drug, steal, beat; larger values indicates more tolerance on a 1to4 scale). Also in this data set are gender (is_male) and an exposure measure obtained at age 11 (self report of close friends involvement in deviant behaviors). note: the time measure is age - 11.
part a. obtain individual OLS fits (tolerance over time) and plot the collection of those straight-lines. Provide descriptive statistic summaries for the rate of change in tolerance and initial level.
part b. fit a mixed effects model for tolerance over time (unconditional) for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed effects estimates correspond to quantities obtained in part i. Explain.
part c. Investigate whether the exposure measure is a useful predictor of level or rate of change in tolerance. What appears to be the best fitting mixed model for these data using these measures? Show specifics.

Course Problem Set, Item 2.
   Time1-Time2 Clinical Trial
Week 4, Exercise 5
5. From 2017 In the news
The 3 billion dollar (and counting) change score
(items below clipped from 2017 various press reports; we do not have the data)

Sage Therapeutics (NASDAQ:SAGE) surged in response to its announcement of positive results from a Phase 2 clinical trial assessing SAGE-217 for the treatment of adult patients with moderate-to-severe major depressive disorder (MDD), a Fast Track indication in the U.S.[2020 note: name, Zuranolone]
It is estimated that there are around 16 million people in the United States with MDD.
SAGE-217, a neuroactive steroid, is next-generation GABA modulator. The GABA system, the major inhibitory signaling pathway in the brain and central nervous system (CNS), plays a key role in regulating CNS function. The company intends to advance the program into Phase 3 development.
The phase 2 looked at the effect of the positive allosteric modulator of the gamma-aminobutyric acid (GABA) receptor as compared to placebo in 89 patients with MDD.

About the Placebo-controlled Phase 2 trial of SAGE-217 in MDD:
In the randomized, double-blind, parallel-group, placebo-controlled trial, 89 eligible patients (with a minimum total score of 22 on the Hamilton Rating Scale for Depression) were stratified based on use of antidepressant treatment (current/stable or not treated/withdrawn >= 30 days) and randomized in a 1:1 ratio to receive SAGE-217 Capsules (30mg) (n=45) or matching placebo (n=44). All doses of study drug were administered at night with food. The study consisted of a 14-day treatment period, and a 4-week follow-up period. The mean HAM-D total scores at baseline were 25.2 for the SAGE-217 group and 25.7 for the placebo group (overall range 22-33), representing patients with moderate to severe MDD. Approximately 90 percent of patients in each group completed the study.
Sage Therapeutics (NASDAQ: SAGE), a clinical-stage biopharmaceutical company developing novel medicines to treat life-altering central nervous system (CNS) disorders, today announced positive top-line results from the Phase 2, double-blind, placebo-controlled clinical trial of SAGE-217 in the treatment of 89 adult patients with moderate to severe major depressive disorder (MDD). In the trial, treatment for 14 days with SAGE-217 was associated with a statistically significant mean reduction in the Hamilton Rating Scale for Depression (HAM-D) 17-Item total score from baseline to Day 15 (the time of the primary endpoint) of 17.6 points for SAGE-217, compared to 10.7 for placebo (p<0.0001). Statistically significant improvements were observed in the HAM-D compared to placebo by the morning following the first dose through Week 4 and the effects of SAGE-217 remained numerically greater than placebo through the end of follow-up at Week 6. At Day 15, 64 percent of patients who received SAGE-217 achieved remission, defined as a score of 7 or less on the HAM-D scale, compared with 23 percent of patients who received placebo (p=0.0005).
The 89-subject study met its primary endpoint of a statistically significant average reduction in HAM-D score from baseline to day 15 (p<0.0001) versus placebo. HAM-D is a rating scale for depression. At day 15, 64% of patients in the treatment group achieved remission compared to 23% for placebo (p=0.0005).
There were a total of 89 patients recruited into the study who were either given SAGE-217 or a placebo compound. Patients were treated for a 14 day period and were then measured for clinical outcome using the Hamilton Rating Scale for Depression or HAM-D 17-item total score from baseline. It was shown that SAGE-217 achieved a statistically significant improvement over placebo according to the HAM-D scale. Patients that took SAGE-217 were shown to achieve a 17.6 point improvement at day 15, compared to only a 10.7 point improvement for placebo. That meant that the drug achieved a statistically significant p-value of p < 0.0001. It was also noted that 64% of patients who took SAGE-217 achieved MDD remission, compared to only 23% of placebo patients. MDD remission was classified as patients having a HAM-D score of 7 or less. This was the secondary endpoint of the study, which was also achieved.
Investigators saw a statistically significant improvement in SAGE-217 patients on a depression scale the day after the first dose. By the time the two-week treatment period came to an end, the mean score in the SAGE-217 arm had dropped 17.6 points, as compared to a 10.7 point decline in the control group. That seven-point placebo-adjusted improvement was enough for the trial to hit its primary endpoint with a p value of less than 0.0001.
The positive results continued beyond the end of the treatment period. The mean reduction on the depression scale in the treatment arm remained statistically superior to that of the placebo group two weeks after dosing stopped.
Questions Exercise 5
Consider the remission outcome (secondary) at day 15 (after 14 days of dose).
part a. For these time1-time2 dichotomous data (remission or not), explain what I did below to approximate the results reported by SAGE.
part b. In week 4 (time1-time2 data) materials we introduced some more advanced capabilities for time1-time2 dichotomous data, such as mcnemar.test from base R and from package PropCIs. Comment on the applicability of those functions to the remission study and whether those are preferable here to the basic analysis in part a.
> sage2 = matrix(c(29, 10, 16,  34), nr=2)  # remission counts for the two groups
> sage2
     [,1] [,2]
[1,]   29   16
[2,]   10   34
> prop.test(sage2)
        2-sample test for equality of proportions with continuity correction
data:  sage2
X-squared = 14.078, df = 1, p-value = 0.0001754
alternative hypothesis: two.sided
95 percent confidence interval:
 0.2079003 0.6264431
sample estimates:
   prop 1    prop 2 
0.6444444 0.2272727 

> chisq.test(sage2)
        Pearson's Chi-squared test with Yates' continuity correction
data:  sage2
X-squared = 14.078, df = 1, p-value = 0.0001754
part c.  Consider the primary outcome, change in depression score (HAM-D).
In weeks 4 and 5 we conducted analysis of time1- time2 (and multiwave) outcome data for comparisons of experimental groups. For the SAGE study pretend we have long form data, with time coded 0 for baseline and 1 for Day 15 endpoint, and outcome HAM-D score at the timepoints (0,1) and group indicating T/P. So we have 178 rows, and columns HAM-D group time subj.
If we fit the model in R syntax
sagelmer = lmer(outcome ~ time + time:group + (time|subj), data = sage, control = lmerControl(check.nobs.vs.nRE = "warning"),
from the information you have, give the point estimate for the fixed effects, time and time:group .
Write out the level 1, level 2 model corresponding to the combined model in the lmer statement.

Course Problem Set, Item 3.

   Comparison of experimental groups--placebo-controlled, randomized study
Week 5, Exercise 2
      note: problem text reformatted for clarity
Treatment of Lead Exposed Children (TLC) Trial. Data (wide form) and description reside at Laird-Ware text site
Just wide-form data with no column headers
part a.Start out by just using the subset of the longitudinal data: Lead Level Week 0 and Week 6 (i.e. two-wave data).
(i) Carry out a statistical analysis for estimating the relative effectiveness of chelation treatment (succimer) compared with placebo (A,P) using mixed-effects models or repeated measures anova.
(ii) Show the the equivalence from the Brogan-Kutner paper between the simple t-test on improvement(change) and your analysis in part (i).
part b. Finally use all 4 longitudinal measures (weeks 0,1,4,6) for a Active vs Placebo comparison using lmer. Compare with the results in part (a) that use only 2 observations.

Course Problem Set, Item 4.
   Basic Survival analysis, right censoring.    
Week 7 Exercise 2 continued as Week 8 Exercise 2       note: problem here slightly edited (some deletions) from weekly version.

2. Melanoma data. In package ISwR data melanom {ISwR} Survival after malignant melanoma Description: The melanom data frame has 205 rows and 7 columns. It contains data relating to the survival of patients after an operation for malignant melanoma, collected at Odense University Hospital by K.T. Drzewiecki.
 > str(melanom)
'data.frame':   205 obs. of  6 variables:
 $ no    : int  789 13 97 16 21 469 685 7 932 944 ...
 $ status: int  3 3 2 3 1 1 1 1 3 1 ...
 $ days  : int  10 30 35 99 185 204 210 232 232 279 ...
 $ ulc   : int  1 2 2 2 1 1 1 1 1 1 ...
 $ thick : int  676 65 134 290 1208 484 516 1288 322 741 ...
 $ sex   : int  2 2 2 1 2 2 2 2 1 1 ... , 
We are interested in
days: time on study after operation for malignant melanoma
status: the patient's status at the end of study
Documentation shows the possible values of status are: 1: dead from malignant melanoma 2: alive at end of study 3: dead from other causes. Consider 'dead from other causes' as censored (along with alive). Thus you can either recode status as (1,0) or use a logical for status vector to be status == 1 and the survival object is Surv(days, status == 1) (to do some of the problem for you).
a. How many survival times are censored? Obtain an estimate of the survival curve at each event time (along with CI) using the Kaplan-Meier estimate and plot the survival curve and confidence interval.
b. Does survival differ in men and women? Plot the male and female survival curves. Compare asymptotic (log-rank) and exact tests [see note below] for gender differences. Compare the exact test with a bootstrap approximation.
note part b. Exact tests and memory limits.
In Week 7 class examples (aml/leukemia data from miller) and in Week 7 RQ 2 (rat data) we showed the use of an exact test for 2-group survival comparisons. These were relatively small data sets (rat has 40 subjects). The melanoma data is larger-- 205 subjects, not a giant data set. But depending on the size of your machine (such as a laptop) you may well not have enough memory to carry out the exact test. Even with a moderately large machine, the default maximum memory limit set in R may be too low (easy enough to change, but maybe not worth the trouble for this exercise). So if you run into problems conducting the exact test, it is entirely adequate to show the memory failure and then revert to the approximate bootstrap option shown in the class example and in Week 7 RQ2 (that's kind of what it is there for). Try the bootstrap approximation option with 1000 and then 10000 replications and see if the results match.
c. Use Cox regression to carry out the gender comparison of the survival curves in part b. Obtain a confidence interval for the effect of gender on the hazard.

continued with week 8 problem
2. melanom data from week 7, exercise 2. Define the censoring as was done in that problem. I found it useful to make a 0,1 variable isMale from the integer sex designation and make a 0,1 variable isUlcer from the ulceration variable (careful there).
a. Repeat the gender comparison in parts b or c in Ex 2, week 7, stratifying on ulceration of the tumor (or not). Compare with the result in Ex 2 week 7 and interpret.
b. Carry out a Cox regression using predictors log(thick) and the gender indicator, stratifying on ulceration. Interpret the results. Check the viability of the proportional hazards assumption for this Cox model.

Course Problem Set, Item 5.
Week 8 Exercise 6
6.   Home town data. I live on Channing Ave so these data are irresistible.
Channing House Data. The channing data frame has 462 rows and 5 columns.
Channing House is a retirement centre in Palo Alto, California. These data were collected between the opening of the house in 1964 until July 1, 1975. In that time 97 men and 365 women passed through the centre. For each of these, their age on entry and also on leaving or death was recorded. A large number of the observations were censored mainly due to the resident being alive on July 1, 1975 when the data was collected. Over the time of the study 130 women and 46 men died at Channing House. Differences between the survival of the sexes, taking age into account, was one of the primary concerns of this study.
      sex   A factor for the sex of each resident ("Male" or "Female").
      entry The residents age (in months) on entry to the centre
      exit  The age (in months) of the resident on death, leaving the centre or 
                 July 1, 1975   whichever event occurred first.
      time Length of time (in months) that the resident spent at Channing
      cens Indicator of right censoring. 1 indicates that the resident died at Channing
                   House, 0 indicates that they left the house prior to July 1, 1975 or 
                   that they were still alive and living in the centre at that date.
> head(channing)
   sex entry exit time cens
1 Male   782  909  127    1
2 Male  1020 1128  108    1
3 Male   856  969  113    1
4 Male   915  957   42    1
5 Male   863  983  120    1
6 Male   906 1012  106    1
> 780/12 [1] 65 # in years just for reference

#some quick descriptives
> attach(channing)
> table(cens, sex)
cens Female Male
   0    235   51
   1    130   46
> tapply(time, sex, quantile)
$Female                            $Male                       
  0%  25%  50%  75% 100%             0%  25%  50%  75% 100%  
   0   36   87  137  137              0   33   72  120  137  
> tapply(entry, sex, quantile)
$Female                            $Male                   
  0%  25%  50%  75% 100%             0%  25%  50%  75% 100%
 733  851  899  948 1140            751  863  919  967 1073

#Kaplan-Meier fits and tests
> kmfit = survfit(Surv(time, cens) ~ sex, data = channing)
> kmfit
Call: survfit(formula = Surv(time, cens) ~ sex, data = channing)
           records n.max n.start events median 0.95LCL 0.95UCL
sex=Female     365   365     365    130     NA     124      NA
sex=Male        97    97      97     46    108      82      NA
> plot(kmfit,                                         ## SEE PLOT  

> survdiff(Surv(time, cens) ~ sex, data = channing)
Call: survdiff(formula = Surv(time, cens) ~ sex, data = channing)
             N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female 365      130    143.1      1.19      6.41
sex=Male    97       46     32.9      5.18      6.41
 Chisq= 6.4  on 1 degrees of freedom, p= 0.0113 
> surv_test(Surv(time, cens) ~ sex, data = channing, distribution=approximate(B = 4000))
        Approximative Logrank Test
data:  Surv(time, cens) by sex (Female, Male)
Z = -2.4251, p-value = 0.015
alternative hypothesis: two.sided

#Cox regressions
> coxfit1 = coxph(Surv(time, cens) ~ sex, data = channing)
> coxfit1
Call: coxph(formula = Surv(time, cens) ~ sex, data = channing)
         coef exp(coef) se(coef)    z     p
sexMale 0.432      1.54    0.172 2.52 0.012
Likelihood ratio test=5.89  on 1 df, p=0.0153  n= 462, number of events= 176 
> summary(coxfit1)
Call: coxph(formula = Surv(time, cens) ~ sex, data = channing)
  n= 462, number of events= 176 
          coef exp(coef) se(coef)     z Pr(>|z|)  
sexMale 0.4321    1.5406   0.1718 2.516   0.0119 *
        exp(coef) exp(-coef) lower .95 upper .95
sexMale     1.541     0.6491       1.1     2.157

Concordance= 0.541  (se = 0.016 )
Rsquare= 0.013   (max possible= 0.985 )
Likelihood ratio test= 5.89  on 1 df,   p=0.01526
Wald test            = 6.33  on 1 df,   p=0.01187
Score (logrank) test = 6.43  on 1 df,   p=0.01123

> coxfit2 = coxph(Surv(time, cens) ~ sex + entry, data = channing)
> coxfit2 Call: coxph(formula = Surv(time, cens) ~ sex + entry, data = channing)
           coef exp(coef) se(coef)    z       p
sexMale 0.38004      1.46  0.17191 2.21 2.7e-02
entry   0.00724      1.01  0.00105 6.90 5.4e-12
Likelihood ratio test=50.9  on 2 df, p=8.67e-12  n= 462, number of events= 176 

> summary(coxfit2)
Call: coxph(formula = Surv(time, cens) ~ sex + entry, data = channing)
  n= 462, number of events= 176 
            coef exp(coef) se(coef)     z Pr(>|z|)    
sexMale 0.380045  1.462350 0.171914 2.211   0.0271 *  
entry   0.007239  1.007266 0.001050 6.896 5.36e-12 ***
        exp(coef) exp(-coef) lower .95 upper .95
sexMale     1.462     0.6838     1.044     2.048
entry       1.007     0.9928     1.005     1.009

Concordance= 0.65  (se = 0.023 )
Rsquare= 0.104   (max possible= 0.985 )
Likelihood ratio test= 50.94  on 2 df,   p=8.671e-12
Wald test            = 53.05  on 2 df,   p=3.02e-12
Score (logrank) test = 54.2  on 2 df,   p=1.703e-12

> anova(coxfit1, coxfit2)
Analysis of Deviance Table
 Cox model: response is  Surv(time, cens)
 Model 1: ~ sex
 Model 2: ~ sex + entry
   loglik  Chisq Df P(>|Chi|)    
1 -972.57                        
2 -950.04 45.056  1 1.915e-11 ***

> cox.zph(coxfit2)
            rho chisq     p
sexMale -0.0304 0.162 0.687
entry    0.0481 0.400 0.527
GLOBAL       NA 0.553 0.759
> plot(cox.zph(coxfit2))

> k = 1:462   > channing$ID = k
> head(channing)
   sex entry exit time cens ID
1 Male   782  909  127    1  1
2 Male  1020 1128  108    1  2
3 Male   856  969  113    1  3
4 Male   915  957   42    1  4
5 Male   863  983  120    1  5
6 Male   906 1012  106    1  6

> coxfit4 = coxme(Surv(time, cens) ~ sex + entry + (1|ID), data = channing)
> print(coxfit4)
Cox mixed-effects model fit by maximum likelihood
  Data: channing
  events, n = 176, 462
  Iterations= 100 503 
                    NULL Integrated    Fitted
Log-likelihood -975.5084  -950.0293 -947.0135

                  Chisq   df          p   AIC   BIC
Integrated loglik 50.96 3.00 4.9937e-11 44.96 35.45
 Penalized loglik 56.99 4.97 4.8849e-11 47.05 31.29

Model:  Surv(time, cens) ~ sex + entry + (1 | ID) 
Fixed coefficients
               coef exp(coef)    se(coef)    z       p
sexMale 0.382989266  1.466662 0.173129158 2.21 2.7e-02
entry   0.007286551  1.007313 0.001058856 6.88 5.9e-12

Random effects
 Group Variable  Std Dev    Variance  
 ID    Intercept 0.13172664 0.01735191

> fixed.effects(coxfit4)                         > exp(fixed.effects(coxfit4))   
    sexMale       entry                           sexMale    entry               
0.382989266 0.007286551                          1.466662 1.007313               
> stem(exp(ranef(coxfit4)[[1]]))
  The decimal point is 2 digit(s) to the left of the |
   96 | 34
   96 | 8
   97 | 1234
   97 | 6689
   98 | 01111222244
   98 | 5555566666666677777778888888999999999999
   99 | 00000000000011111111111111112222222222222222223333333333333333333444+4
   99 | 55555555555555555666666666666666666667777777777777777777777778888888+33
  100 | 00000000000000000000000000000000000011222223334444
  100 | 555666677777778888999999999
  101 | 00000000111111111111111222222222222222233333333333333344444444444444
  101 | 5555555555555555556666666666666667777777777777777777777777

> quantile(exp(ranef(coxfit4)[[1]]))
       0%       25%       50%       75%      100% 
0.9633244 0.9928323 0.9986529 1.0106724 1.0174679 
a. Compare median survival times for males and females obtained by ignoring censoring, with median survival times from Kaplan-Meier. Comment on similarities or differences. Why does KM show NA for female median survival time?
b. Are gender differences in survival statistically significant? Can differences in survival be attributed to gender or Channing House?
c. For the Cox regressions, cox2 adds entry age to the model in cox1. Is entry age a significant predictor in this model?
d. Does the proportional hazards assumption appear to be satisfactory for cox2?
e. What is is the point estimate of the proportional increase or decrease in hazard (give direction) for an increase of 1 year in entry age in the cox2 model?
f. Explain what I am doing in the formulation of the cox4 model.
g. Does cox4 appear to improve the fit compared to cox2? Give a rough test statistic.
h. For the cox4 fit, what is the magnitude of the individual effects--i.e. how much do individual effects impact the estimated hazards?

end 2020 Course Problem Set