Course Problem Set 2015

Due in class Dec 3, 2015 

Usual Honor Code procedures: you may use any of your own inanimate resources, but no collaboration with or assistance from others is allowed. This work is done under Stanford's Honor Code.
Solutions for these problems are to be submitted in hard copy. Because these problems are untimed, some care should be taken with presentation, clarity, and format. It is especially important to give full and clear answers to the questions rather than just submitting unannotated computer output, although relevant output should be included.
PLEASE check that you have answered all the parts and subparts. There are just three problems in this problem set; please start each problem on a new page and keep all material for a problem contiguous. It's fine, for example, to blend notebook paper with printed output; just keep it all together.
Please email questions (rag@stanford.edu) about interpreting the problems, and especially about any materials you feel you need but don't have access to.
If any issues come up (wording, interpretation), I will post a note here, so please check this page periodically.


  
Cumulative Collection of Course Handouts


11/30   Item 2.
      For Week 5, Exercise 4, the long-form link is all you need. I added the link (now) labelled wide(r)-form to give another view of the data; since you don't need to use it, I somewhat regret including it, as it is superfluous and may be a distraction. That link would be better labelled "wider-form", as it puts two occasions in each row for each ID (in a somewhat complex way).

Course Problem Set, Item 1.
Week 2, Exercise 1       
1. Tolerance data
A subsample of data from the National Youth Survey is obtained in long-form by
read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1_pp.txt")
and in wide form by
read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
The data are yearly observations, from ages 11 to 15, on the tolerance measure (tolerance of deviant behavior, e.g., cheating, drugs, stealing, beating; larger values indicate more tolerance, on a 1-to-4 scale). The data set also contains gender (is_male) and an exposure measure obtained at age 11 (self-report of close friends' involvement in deviant behaviors). Note: in the long-form data, the time measure is age - 11.
a. Obtain individual OLS fits (tolerance over time) and plot the collection of those straight lines. Provide descriptive summary statistics for the rate of change in tolerance and for the initial level.
b. Fit a mixed effects model for tolerance over time (unconditional) for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed effects estimates correspond to quantities obtained in part a. Explain.
c. Investigate whether the exposure measure is a useful predictor of level or rate of change in tolerance. What appears to be the best fitting mixed model for these data using these measures? Show specifics.
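A minimal R sketch of the mechanics for parts a to c, assuming the long-form columns are named id, time, tolerance and exposure (check with names() after reading; the function names below are hypothetical). It uses base R for the individual fits and the nlme package, which ships with R, for the mixed models. For part b, colMeans() of the part-a intercepts and slopes can be set beside fixef() of the unconditional model.

```r
library(nlme)  # recommended package distributed with R

# (a) Per-individual OLS fits of tolerance on time (time = age - 11).
individual_ols <- function(df) {
  fits <- lapply(split(df, df$id),
                 function(d) coef(lm(tolerance ~ time, data = d)))
  est <- do.call(rbind, fits)                  # one row per individual
  data.frame(id = names(fits),
             initial = est[, "(Intercept)"],   # fitted level at age 11
             rate = est[, "time"],             # fitted yearly change
             row.names = NULL)
}

# Overlay the individual fitted lines on the 1-to-4 tolerance scale.
plot_individual_lines <- function(ols, times = 0:4) {
  plot(NA, xlim = range(times), ylim = c(1, 4),
       xlab = "time (age - 11)", ylab = "tolerance")
  for (i in seq_len(nrow(ols)))
    abline(a = ols$initial[i], b = ols$rate[i], col = "grey40")
}

# (b) Unconditional growth model: random intercept and random slope per id.
fit_unconditional <- function(df, method = "REML") {
  lme(tolerance ~ time, random = ~ time | id, data = df, method = method)
}

# (c) Does exposure predict level (main effect) or rate (time:exposure)?
# Both models are fit by ML so that models with different fixed parts can be compared.
compare_exposure <- function(df) {
  m0 <- fit_unconditional(df, method = "ML")
  m1 <- lme(tolerance ~ time * exposure, random = ~ time | id,
            data = df, method = "ML")
  anova(m0, m1)  # AIC, BIC and a likelihood-ratio test
}
```

Typical use, after tol <- read.csv() with the long-form URL above: ols <- individual_ols(tol); summary(ols[, c("initial", "rate")]); plot_individual_lines(ols); m <- fit_unconditional(tol); intervals(m) for interval estimates of the fixed and random effects; compare_exposure(tol) for part c.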



Course Problem Set, Item 2.
Week 5, Exercise 4       
4. Data on Amenorrhea from Clinical Trial of Contracepting Women. Source: Table 1 (page 168) of Machin et al. (1988). Reference: Machin D, Farley T, Busca B, Campbell M and d'Arcangues C. (1988). Assessing changes in vaginal bleeding patterns in contracepting women. Contraception, 38, 165-179.
Web pages containing these data in long form and in a wide(r)-form version also contain descriptions and other useful notes that are not data. Also, the missing-data indicator is "." rather than NA (the files are SAS-based). The best recourse is to modify the file in your editor, or is.na may work out for you.
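One alternative to hand-editing, sketched under the assumption that the data portion is whitespace-delimited with a single header row: the na.strings argument of read.table converts "." to NA on input, so numeric columns stay numeric. The function name is hypothetical.

```r
# Read a file whose missing-value code is "." (SAS convention).
# na.strings = "." turns those entries into NA instead of forcing the
# column to character. Assumes whitespace-delimited data with a header row;
# for a comma-separated file, read.csv(file, na.strings = ".") does the same.
read_dot_missing <- function(file, ...) {
  read.table(file, header = TRUE, na.strings = ".", ...)
}
```

If the saved page has description text above the data, the skip argument (passed through ...) can drop those lines, for example read_dot_missing("amenorrhea.txt", skip = 10), where the file name and line count are placeholders.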
Description: The data are from a longitudinal clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection.
Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a woman experienced amenorrhea, the absence of menstrual bleeding for a specified number of days. A total of 1151 women completed the menstrual diaries, and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced amenorrhea in the four successive three-month intervals.
In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and it is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in each of the four successive three-month intervals. A feature of this clinical trial is that there was substantial dropout: more than one third of the women dropped out before the completion of the trial. In the linked data, missing data are designated by ".". [Note: in the week 6 terminology, consider the dropouts to be missing at random, which is not necessarily a correct assumption.]

The purpose of this analysis is to assess the influence of dosage on the risk of amenorrhea and any individual differences in the risk of amenorrhea.
Show your model for these data and the results. Provide significance tests and/or interval estimates for the odds of amenorrhea as a function of dose. Try to display and interpret individual differences in response by showing the random effects within each experimental group.
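A minimal sketch of one possible model, not the required one: a random-intercept logistic regression with dose, time and their interaction as fixed effects, fit by penalized quasi-likelihood with MASS::glmmPQL (MASS and nlme ship with R). The column names id, dose, time and amenorrhea and the function names are assumptions; rename to match the file. PQL can be biased for binary outcomes with few observations per subject, so lme4::glmer with adaptive quadrature is a common alternative if that package is available.

```r
library(MASS)  # glmmPQL
library(nlme)  # fixef() and ranef() methods for the fitted lme object

# Random-intercept logistic model. Rows with a missing response are dropped,
# so each woman contributes whichever intervals she completed.
# Column names id, dose, time, amenorrhea are assumed, not taken from the file.
fit_amenorrhea <- function(df) {
  df <- df[complete.cases(df[, c("id", "dose", "time", "amenorrhea")]), ]
  df$dose <- factor(df$dose)
  glmmPQL(amenorrhea ~ dose * time, random = ~ 1 | id,
          family = binomial, data = df, verbose = FALSE)
}

# Wald intervals for the fixed effects, exponentiated to the odds-ratio scale.
odds_ratio_ci <- function(fit, level = 0.95) {
  est <- fixef(fit)
  se <- sqrt(diag(fit$varFix))  # approximate covariance of the fixed effects
  z <- qnorm(1 - (1 - level) / 2)
  exp(cbind(OR = est, lower = est - z * se, upper = est + z * se))
}

# Predicted random intercepts split by dose group, e.g. for boxplot().
random_effects_by_dose <- function(fit, df) {
  re <- ranef(fit)
  dose_by_id <- tapply(as.character(df$dose), df$id, function(x) x[1])
  split(re[, "(Intercept)"], dose_by_id[rownames(re)])
}
```

With the interaction in the model, the dose odds ratio refers to time = 0, so centering time (or dropping the interaction if it is not needed) changes what that row means. boxplot(random_effects_by_dose(fit, df)) gives one display of individual differences within each dose group.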



Course Problem Set, Item 3.
Week 8, Exercise 1.       
1. Data frame pbc in the survival package: Mayo Clinic Primary Biliary Cirrhosis Data, a randomized placebo-controlled trial of the drug D-penicillamine. Refer to the documentation. As the help file tells you: "The first 312 cases in the data set participated in the randomized trial and contain largely complete data. The additional 106 cases [which have value NA for the trt variable] did not participate in the clinical trial...". So pick out the 312 cases that make up the D-penicillamine and placebo groups. See the pbc documentation via ?pbc.
For these data, status is coded 0 = censored, 1 = liver transplant, 2 = death; so status = 2 marks an observed death time, and all other values are treated as censored.
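A short R sketch of the data preparation just described, using the pbc data frame supplied by the survival package; the names pbc312, death and y are illustrative.

```r
library(survival)  # provides the pbc data frame

# Keep the 312 randomized cases; trt is NA for the 106 non-trial cases.
pbc312 <- subset(pbc, !is.na(trt))

# status: 0 = censored, 1 = liver transplant, 2 = death.
# Only death counts as an event; transplant is treated as censored.
pbc312$death <- as.integer(pbc312$status == 2)
table(trt = pbc312$trt, status = pbc312$status)

# Survival response for use with survfit() and coxph().
y <- with(pbc312, Surv(time, death))
```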
a. Use Kaplan-Meier methods to carry out a simple two-group comparison of the effectiveness of the drug, along with any useful plots.
b. Extend the two-group comparison with a Cox regression using additional predictors (chosen as you wish) from age, edema, log(bili), log(protime), log(albumin). Identify a useful model and interpret the results. Make sure you check the proportional hazards assumption for your model.
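A sketch of the tools for parts a and b in the survival package: survfit() for Kaplan-Meier curves, survdiff() for the log-rank test, coxph() for the Cox model, and cox.zph() for the proportional hazards check. The covariate set shown is just one candidate, not a recommended final model; trt is relabelled from its 1/2 coding as D-penicillamine/placebo, following ?pbc.

```r
library(survival)

pbc312 <- subset(pbc, !is.na(trt))
pbc312$trt <- factor(pbc312$trt, levels = 1:2,
                     labels = c("D-penicillamine", "placebo"))

# (a) Kaplan-Meier curves by treatment and a log-rank test.
km <- survfit(Surv(time, status == 2) ~ trt, data = pbc312)
print(km)  # events and median survival per group
lr <- survdiff(Surv(time, status == 2) ~ trt, data = pbc312)
print(lr)  # log-rank chi-square test
plot(km, lty = 1:2, xlab = "Days since registration",
     ylab = "Survival probability")
legend("bottomleft", legend = levels(pbc312$trt), lty = 1:2)

# (b) One candidate Cox model; the covariate set is illustrative only.
cox <- coxph(Surv(time, status == 2) ~ trt + age + edema +
               log(bili) + log(protime) + log(albumin), data = pbc312)
print(summary(cox))  # hazard ratios exp(coef) with 95% intervals
ph <- cox.zph(cox)   # Schoenfeld-residual tests of proportional hazards
print(ph)
plot(ph)             # scaled residuals vs time, one plot per term
```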


end 2015 Course Problem Set