Usual Honor Code procedures: you may use any of your own inanimate resources; no collaboration with or assistance from others is permitted. This work is done under Stanford's Honor Code.

Solutions for these problems are to be submitted in hard-copy form. Because these problems are untimed, take some care with presentation, clarity, and format. It is especially important to give full and clear answers to the questions rather than submit unannotated computer output, although relevant output should be included.

PLEASE check that you have answered all the parts and subparts. There are just three problems in this problem set; please start each problem on a new page and keep all material for a problem contiguous. It's fine, for example, to blend notebook paper with printed output; just keep it all together.

If any issues come up (wording, interpretation), I will post a note here, so it would be good to check this page intermittently.

For Week 5, Exercise 4, the long-form link is all you need. Regarding the link (now) labelled wide(r)-form, which I put in to give another view of the data: as you don't

Course Problem Set, Item 1.

A subsample of data from the National Youth Survey is obtained in long-form by

and in wide form by

Yearly observations from ages 11 to 15 on the tolerance measure (tolerance of deviant behavior, e.g. cheating, drugs, stealing, beating; larger values indicate more tolerance, on a 1-to-4 scale). Also in this data set are gender (is_male) and an

a. Obtain individual OLS fits (tolerance over time) and plot the collection of those straight lines. Provide descriptive summary statistics for the rate of change in tolerance and for the initial level.
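A minimal sketch of part a in plain Python, assuming the long-form data have been read into `(id, age, tolerance)` tuples (that row layout and the helper names `ols_line`, `individual_fits`, `describe` are hypothetical). Time is coded as age minus 11, so each intercept is the fitted tolerance at age 11 (the initial level) and each slope is the yearly rate of change. The toy rows below are made up, not survey data.

```python
from statistics import mean, median, stdev

def ols_line(times, values):
    """Least-squares (intercept, slope) for one individual's trajectory."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def individual_fits(long_rows, baseline_age=11):
    """Fit a separate straight line per id from (id, age, tolerance) rows.

    Time is age - baseline_age, so the intercept is the fitted level at age 11.
    """
    by_id = {}
    for pid, age, tol in long_rows:
        by_id.setdefault(pid, []).append((age - baseline_age, tol))
    fits = {}
    for pid, obs in by_id.items():
        times, values = zip(*obs)
        fits[pid] = ols_line(times, values)
    return fits

def describe(xs):
    """Descriptive summary of a collection of per-person estimates."""
    xs = list(xs)
    return {"n": len(xs), "mean": mean(xs), "median": median(xs),
            "sd": stdev(xs), "min": min(xs), "max": max(xs)}

# Hypothetical toy rows: person 1 rises 0.5 per year, person 2 stays flat.
rows = [(1, a, 1.0 + 0.5 * (a - 11)) for a in range(11, 16)] + \
       [(2, a, 2.0) for a in range(11, 16)]
fits = individual_fits(rows)
print(describe(f[0] for f in fits.values()))  # initial levels (age 11)
print(describe(f[1] for f in fits.values()))  # rates of change per year
```

Each `(intercept, slope)` pair defines one line to draw over ages 11 to 15 for the requested plot.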

b. Fit an unconditional mixed-effects model for tolerance over time for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed-effects estimates correspond to quantities obtained in part a, and explain why.
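One common way to write the unconditional growth model is shown below; the symbols are our choice, and TIME is assumed to be age minus 11 as in part a. The second line states a standard identity: if every individual is observed at the same ages (a balanced design), all subjects share one design matrix X and one marginal covariance V = X Sigma X' + sigma^2 I, and the GLS fixed-effects estimate reduces to the average of the individual OLS intercepts and slopes. The random-effect variances, by contrast, are smaller than the raw variances of the OLS estimates, because the latter also carry within-person sampling error.

```latex
\begin{aligned}
Y_{ij} &= \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2),
  \qquad \mathrm{TIME}_{ij} = \mathrm{AGE}_{ij} - 11,\\
\pi_{0i} &= \gamma_{00} + \zeta_{0i}, \qquad
\pi_{1i} = \gamma_{10} + \zeta_{1i}, \qquad
\begin{pmatrix}\zeta_{0i}\\ \zeta_{1i}\end{pmatrix}
  \sim N\!\left(\mathbf{0},
  \begin{pmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{01} & \sigma_1^2\end{pmatrix}\right),\\[4pt]
\hat\gamma_{00} &= \frac{1}{n}\sum_{i=1}^{n} \hat\pi_{0i}^{\mathrm{OLS}},
  \qquad
\hat\gamma_{10} = \frac{1}{n}\sum_{i=1}^{n} \hat\pi_{1i}^{\mathrm{OLS}}
  \qquad \text{(balanced design).}
\end{aligned}
```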

c. Investigate whether the

Course Problem Set, Item 2.

Web pages containing these data in long form and a wide(r)-form version

also contain descriptions and other useful but non-data notes. Also, the missing-data indicator is "." rather than NA (the files are SAS-based). The best recourse is to modify the file in your editor; or
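If you would rather convert the "." codes programmatically than edit by hand, a minimal sketch follows. It assumes whitespace-delimited numeric records with no header line; the column names passed in and the helper `read_dotted` are hypothetical, and the sample records are made up.

```python
def read_dotted(text, columns, missing="."):
    """Parse whitespace-delimited numeric records, mapping the SAS code '.' to None.

    `columns` is a user-supplied (hypothetical) list of column names;
    every non-missing field is assumed to be numeric.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append({name: (None if f == missing else float(f))
                     for name, f in zip(columns, fields)})
    return rows

# Made-up records: the second interval's response is missing.
sample = "1 0 1 0\n1 0 2 .\n"
parsed = read_dotted(sample, ["id", "dose", "interval", "amen"])
print(parsed)
```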

Description: The data are from a longitudinal clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection.

Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a woman experienced amenorrhea, the absence of menstrual bleeding for a specified number of days. A total of 1151 women completed the menstrual diaries, and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced amenorrhea in the four successive three-month intervals.

In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and it is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in each of the four successive three-month intervals. A feature of this clinical trial is that there was substantial dropout: more than one third of the women dropped out before completing the trial. In the linked data, missing data are designated by "." [note: in the week 6 terminology consider the dropouts to be

The purpose of this analysis is to assess the influence of dosage on the risk of amenorrhea and any individual differences in the risk of amenorrhea.

Show your model for these data and the results. Provide significance tests and/or interval estimates for the odds of amenorrhea as a function of dose. Try to display and interpret individual differences in response by showing the random effects within each experimental group.
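One candidate model (an assumption, not a prescribed answer) is a random-intercept logistic regression, logit Pr(Y_ij = 1 | b_i) = beta0 + beta1 time_ij + beta2 dose_i + b_i with b_i ~ N(0, sigma_b^2), possibly with a dose-by-time term. Under it, exp(beta2) is a subject-specific (conditional) odds ratio for the higher versus the lower dose. The sketch below only converts a fitted logit coefficient and its standard error into that odds ratio, a 95% Wald interval, and a two-sided Wald p-value; the fit itself comes from your mixed-model software, and the function name is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def dose_odds_ratio(beta, se, z=Z_975):
    """Odds ratio, Wald interval, and two-sided Wald p-value for a logit coefficient."""
    lower, upper = beta - z * se, beta + z * se
    # Two-sided p = 2 * (1 - Phi(|beta/se|)) = erfc(|beta/se| / sqrt(2)).
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), (math.exp(lower), math.exp(upper)), p_value

# Illustrative (made-up) estimate and standard error, not results from these data.
or_hat, ci, p = dose_odds_ratio(0.3, 0.12)
print(or_hat, ci, p)
```

Exponentiating the interval endpoints, rather than building an interval on the odds scale directly, keeps the interval inside (0, infinity) and consistent with the Wald test on the logit scale.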

Course Problem Set, Item 3.

For these data, 'status' is coded 0 = censored, 1 = liver transplant, 2 = death; so status = 2 represents observed values of

a. Use Kaplan-Meier methods to carry out a simple two-group comparison of the effectiveness of the drug, along with any useful plots.
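A minimal stdlib sketch of the two-group comparison, assuming death (status = 2) is the event and that both censoring (0) and transplant (1) are treated as censored. That is a modeling choice; a competing-risks analysis is the alternative. Treatment groups are assumed recoded to 0/1, and all function names are hypothetical. `kaplan_meier` returns the product-limit curve at each distinct death time (plot it as a step function per group), and `logrank` gives the usual 1-df log-rank chi-square.

```python
import math

def event_indicator(status):
    """Death (status 2) is the event; censored (0) and transplant (1) are censored."""
    return 1 if status == 2 else 0

def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # pool ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored at t were still at risk at t
    return curve

def logrank(times, events, groups):
    """Two-group log-rank chi-square (1 df) and p-value; groups coded 0/1."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = var = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)

# Made-up example: convert status codes, then compare two groups.
status = [2, 0, 2, 1, 2, 2]
times = [3, 5, 7, 8, 2, 4]
groups = [0, 0, 0, 1, 1, 1]
events = [event_indicator(s) for s in status]
print(kaplan_meier(times, events), logrank(times, events, groups))
```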

b. Extend the two-group comparison with a Cox regression using additional predictors (chosen as you wish).
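For reference, the proportional-hazards model and the partial likelihood it is fitted by are sketched below. Here x_i holds the drug indicator plus whichever additional predictors you choose. delta_i = 1 if subject i is observed to die (status = 2, assuming transplant is treated as censoring). R(t_i) is the risk set at subject i's death time. The product as written assumes no tied death times; software handles ties with the Breslow or Efron approximations. Each exp(beta_k) is a hazard ratio holding the other predictors fixed.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right),
\qquad
L(\boldsymbol\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(\mathbf{x}_i^\top \boldsymbol\beta)}
       {\sum_{j \in R(t_i)} \exp(\mathbf{x}_j^\top \boldsymbol\beta)}
```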