Usual Honor Code procedures: You may use any of your own inanimate resources--no collaboration or assistance from others. This work is done under Stanford's Honor Code.

Solutions for these problems are to be submitted in hard-copy form. Because these problems are untimed, some care should be taken with presentation, clarity, and format. It is especially important to give full and clear answers to the questions, not just to submit unannotated computer output, although relevant output should be included.

PLEASE check that you have answered all the parts and subparts. There are just three problems in this problem set; please start each problem on a new page and keep all material for a problem contiguous. It's fine, for example, to blend notebook paper with printed output; just keep it all together.

If any issues come up (wording, interpretation), I will post a note here, so it would be good to check this page periodically.

Sidenote. For those of you enrolled for 3 units we need to set up presentations. I will ask on Thurs about the date 12/4 3:30 onward.

Course Problem Set, Item 1.

Change over time, measured outcome

A small subsample of data (16 respondents) from the National Youth Survey is obtained in long-form by

and in wide form by

Yearly observations from ages 11 to 15 on the tolerance measure (tolerance to deviant behavior, e.g. cheat, drug, steal, beat; larger values indicate more tolerance on a 1 to 4 scale). Also in this data set are gender (is_male) and an

i. Obtain individual OLS fits (tolerance over time) and plot the collection of those straight lines. Provide descriptive summary statistics for the rate of change in tolerance and the initial level.
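As a rough sketch of part i, one could fit a separate straight line per respondent and then summarize the resulting intercepts and slopes. Because the course data files are not reproduced here, the code below uses the Orthodont data that ships with R's nlme package purely as a stand-in (a different outcome, measured at ages 8 to 14). The names d, time, fits, and coefs are illustrative choices; with the tolerance data you would substitute your own columns and center age at 11.

```r
library(nlme)  # used here only for the Orthodont example data (a stand-in, not the NYS data)

d <- as.data.frame(Orthodont)   # columns: distance, age (8, 10, 12, 14), Subject, Sex
d$time <- d$age - 8             # center at the first wave so the intercept is the initial level

# One OLS straight line per subject.
fits  <- lapply(split(d, d$Subject), function(s) lm(distance ~ time, data = s))
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("initial_level", "rate_of_change")

# Descriptive summaries of the per-subject intercepts and slopes.
summary(coefs)
apply(coefs, 2, sd)
cor(coefs)[1, 2]                # correlation between initial level and rate of change

# Plot the collection of fitted straight lines on one set of axes.
plot(NULL, xlim = range(d$time), ylim = range(d$distance),
     xlab = "time since first wave", ylab = "outcome")
for (i in seq_len(nrow(coefs))) abline(a = coefs[i, 1], b = coefs[i, 2])
```

Centering time at the first wave is a design choice: it makes each intercept the fitted level at the start of the study rather than an extrapolation to age 0.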

ii. Fit a mixed-effects model for tolerance over time (unconditional) for this collection of individuals. Obtain interval estimates for the fixed and random effects. Show that the fixed effects estimates correspond to quantities obtained in part i. Explain.
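One possible way to approach part ii is a random-intercept, random-slope model fit with lme from the nlme package. The sketch below again uses the Orthodont data from nlme as a stand-in for the tolerance data (an assumption for illustration only), and the object names are hypothetical. With balanced data, where every subject is observed at the same occasions, the fixed-effects estimates coincide with the averages of the per-subject OLS intercepts and slopes, which the last lines check numerically.

```r
library(nlme)

d <- as.data.frame(Orthodont)   # stand-in data: distance, age, Subject, Sex
d$time <- d$age - 8             # center at the first wave

# Unconditional growth model: random intercept and random slope per subject.
fit <- lme(distance ~ time, random = ~ time | Subject, data = d)

# Interval estimates for the fixed effects.
intervals(fit, which = "fixed")

# Intervals for the random-effects standard deviations and correlation.
# This call can fail when the estimated covariance matrix is close to
# singular, so it is wrapped in try().
try(intervals(fit, which = "var-cov"))

# Compare with the averages of the individual OLS coefficients.
ols <- t(sapply(split(d, d$Subject),
                function(s) coef(lm(distance ~ time, data = s))))
cbind(mixed_model = fixef(fit), mean_of_ols = colMeans(ols))
```

The agreement depends on balance: with unequal numbers of observations or different measurement times per person, the mixed-model estimates weight individuals differently and need not equal the simple averages.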

iii. Investigate whether the

Course Problem Set, Item 2.

Comparison of experimental groups--placebo-controlled, randomized study

note: problem text reformatted for clarity

Treatment of Lead Exposed Children (TLC) Trial. Data (wide form) and description reside at the Laird-Ware text site

Just wide-form data with no column headers

(i) Carry out a statistical analysis for estimating the relative effectiveness of chelation treatment (succimer) compared with placebo (A,P) using mixed-effects models or repeated measures ANOVA.

(ii) Show the equivalence from the Brogan-Kutner paper between the simple t-test on improvement (change) and your analysis in part (i).
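The equivalence in part (ii) can be illustrated numerically for a two-arm, pre/post design: the F statistic for the group-by-time interaction in a repeated measures ANOVA equals the square of the equal-variance two-sample t statistic on the change scores. The sketch below uses made-up simulated data, not the TLC measurements, and every name and number in it is an assumption chosen for illustration.

```r
# Made-up pre/post data for two arms (A = active, P = placebo); all numbers are
# illustrative and are not the TLC measurements.
set.seed(2)
n     <- 20                                             # subjects per arm
id    <- factor(rep(seq_len(2 * n), each = 2))
group <- factor(rep(c("A", "P"), each = 2 * n))
time  <- factor(rep(c("pre", "post"), times = 2 * n), levels = c("pre", "post"))
subj  <- rnorm(2 * n, sd = 5)[as.integer(id)]           # subject-level effect
y     <- 26 + subj - 6 * (group == "A" & time == "post") + rnorm(4 * n, sd = 3)
long  <- data.frame(id, group, time, y)

# (1) Two-sample t-test on the change scores, equal variances assumed.
change <- long$y[long$time == "post"] - long$y[long$time == "pre"]
arm    <- long$group[long$time == "pre"]
tt     <- t.test(change ~ arm, var.equal = TRUE)

# (2) Repeated measures ANOVA; the group-by-time interaction carries the
#     treatment effect on change.
rm_fit     <- summary(aov(y ~ group * time + Error(id), data = long))
within_tab <- rm_fit[["Error: Within"]][[1]]
rownames(within_tab) <- trimws(rownames(within_tab))
F_int      <- within_tab["group:time", "F value"]

# The two numbers agree up to rounding error.
c(t_squared = unname(tt$statistic)^2, F_interaction = F_int)
```

With the TLC data the same comparison would use the baseline and one follow-up week for the two treatment arms; with more than two occasions the interaction has several degrees of freedom and no longer reduces to a single t statistic.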

Course Problem Set, Item 3.

Basic Survival analysis, right censoring.

> str(melanom)
'data.frame': 205 obs. of 6 variables:
 $ no    : int 789 13 97 16 21 469 685 7 932 944 ...
 $ status: int 3 3 2 3 1 1 1 1 3 1 ...
 $ days  : int 10 30 35 99 185 204 210 232 232 279 ...
 $ ulc   : int 1 2 2 2 1 1 1 1 1 1 ...
 $ thick : int 676 65 134 290 1208 484 516 1288 322 741 ...
 $ sex   : int 2 2 2 1 2 2 2 2 1 1 ...

We are interested in

days: time on study after operation for malignant melanoma

status: the patient's status at the end of study

Documentation shows the possible values of status are: 1: dead from malignant melanoma; 2: alive at end of study; 3: dead from other causes. Consider 'dead from other causes' as censored (along with alive). Thus you can either recode status as (1,0) or use a logical status vector, status == 1, and the survival object is

a. How many survival times are censored? Obtain an estimate of the survival curve at each event time (along with CI) using the Kaplan-Meier estimate and plot the survival curve and confidence interval.
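To make the censoring rule and the Kaplan-Meier step concrete, here is a minimal sketch using only the first ten rows of days and status exactly as they appear in the str(melanom) output above. It is a ten-row excerpt, so the counts and the curve are not the answer for the full 205 subjects; the data frame name first10 is hypothetical.

```r
library(survival)

# First ten rows of days and status, copied from the str(melanom) output.
first10 <- data.frame(
  days   = c(10, 30, 35, 99, 185, 204, 210, 232, 232, 279),
  status = c(3, 3, 2, 3, 1, 1, 1, 1, 3, 1)
)

# Event = death from melanoma (status 1); statuses 2 and 3 are censored.
event <- first10$status == 1
sum(!event)                     # censored observations in this excerpt: 5

# Kaplan-Meier estimate with the default pointwise confidence intervals.
km <- survfit(Surv(days, status == 1) ~ 1, data = first10)
summary(km)                     # one row per event time: n.risk, survival, CI

# Step-function plot of the estimate with its confidence band.
plot(km, conf.int = TRUE, xlab = "days after operation",
     ylab = "estimated survival probability")
```

For the full analysis the same three calls would be applied to the complete melanom data frame.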

b. Does survival differ in men and women? Plot the male and female survival curves. Compare asymptotic (log-rank) and exact tests [see note below] for gender differences. Compare the exact test with a bootstrap approximation.

--------------------------------

In Week 7 class examples (aml/leukemia data from Miller) and in Week 7 RQ 2 (rat data) we showed the use of an exact test for 2-group survival comparisons. These were relatively small data sets (rat has 40 subjects). The melanoma data set is larger--205 subjects, though not a giant data set. But depending on the size of your machine (such as a laptop), you may well not have enough memory to carry out the exact test. Even with a moderately large machine, the default maximum memory limit set in R may be too low (easy enough to change, but maybe not worth the trouble for this exercise). So if you run into problems conducting the exact test, it is entirely adequate to show the memory failure and then revert to the approximate bootstrap option shown in the class example and in Week 7 RQ 2 (that's kind of what it is there for). Try the bootstrap approximation option with 1000 and then 10000 replications and see if the results match.
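The idea behind the approximation can be sketched by hand. The code below is a Monte Carlo permutation version of the log-rank test built from log-rank scores; it is not necessarily the function or option used in the class example, and it assumes the aml data frame that ships with the survival package (time, status, x). The helper names H, score, and perm_p are hypothetical.

```r
library(survival)

# Asymptotic log-rank test for reference.
sd_fit <- survdiff(Surv(time, status) ~ x, data = aml)
pchisq(sd_fit$chisq, df = 1, lower.tail = FALSE)

# Log-rank score for each subject: event indicator minus the Nelson-Aalen
# cumulative hazard evaluated at that subject's time.
ev_t <- sort(unique(aml$time[aml$status == 1]))
d_j  <- sapply(ev_t, function(t) sum(aml$time == t & aml$status == 1))
n_j  <- sapply(ev_t, function(t) sum(aml$time >= t))
H    <- function(t) sum((d_j / n_j)[ev_t <= t])
score <- aml$status - sapply(aml$time, H)

# Observed statistic: sum of scores in the first group (equals observed
# minus expected events for that group in the log-rank test).
in_g1 <- aml$x == levels(aml$x)[1]
U_obs <- sum(score[in_g1])

# Shuffle group membership B times; two-sided p-value from |U|.
perm_p <- function(B) {
  U <- replicate(B, sum(sample(score, sum(in_g1))))
  mean(abs(U) >= abs(U_obs) - 1e-12)
}

set.seed(7)
res <- c(B_1000 = perm_p(1000), B_10000 = perm_p(10000))
res
```

The two replication counts should give similar p-values, with the larger run having a smaller Monte Carlo error. A permutation argument of this kind implicitly assumes that the censoring pattern does not differ between the groups.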

------------------------------

c. Use Cox regression to carry out the gender comparison of the survival curves in part b. Obtain a confidence interval for the effect of gender on the hazard.
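For part c, a minimal sketch of a two-group Cox fit and an interval for the hazard ratio, again using the aml data from the survival package as a stand-in (an assumption for illustration; with melanom the model would use the recoded status and the sex variable).

```r
library(survival)

# Two-group Cox model; x is the group factor in the aml stand-in data.
fit <- coxph(Surv(time, status) ~ x, data = aml)
summary(fit)                    # coefficient, hazard ratio, and test statistics

# Hazard ratio with a 95% Wald confidence interval, obtained by
# exponentiating the coefficient and its interval on the log-hazard scale.
hr <- exp(cbind(HR = coef(fit), confint(fit)))
hr
```

The score test reported by summary(fit) is closely related to the log-rank test from part b, so the two p-values are expected to be similar.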

a. Repeat the gender comparison in parts b or c in Ex 2, week 7, stratifying on ulceration of the tumor (or not). Compare with the result in Ex 2 week 7 and interpret.

b. Carry out a Cox regression using predictors log(thick) and the gender indicator, stratifying on ulceration. Interpret the results. Check the viability of the proportional hazards assumption for this Cox model.
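The stratified analyses in the last two parts can be sketched as follows. The data frame sim is simulated and entirely made up; it only borrows the column names from the str(melanom) output (days, ulc, thick, sex) so that the model formulas look like the ones you would use. None of the printed numbers say anything about the real melanoma data.

```r
library(survival)

# Simulated stand-in data with melanom-like column names (all values made up).
set.seed(3)
n   <- 200
sim <- data.frame(
  sex   = sample(1:2, n, replace = TRUE),
  ulc   = sample(1:2, n, replace = TRUE),
  thick = pmax(1, round(exp(rnorm(n, mean = 5, sd = 0.8))))
)
rate    <- 1e-4 * exp(0.5 * (sim$sex == 2) + 0.6 * (log(sim$thick) - 5)) *
           ifelse(sim$ulc == 1, 2, 1)
t_event <- rexp(n, rate)                 # latent event times
t_cens  <- runif(n, 1500, 5500)          # latent censoring times
sim$days  <- pmin(t_event, t_cens)
sim$event <- t_event <= t_cens

# Stratified log-rank test for sex, stratifying on ulceration.
survdiff(Surv(days, event) ~ sex + strata(ulc), data = sim)

# Stratified Cox model: separate baseline hazard per ulceration stratum.
fit <- coxph(Surv(days, event) ~ log(thick) + factor(sex) + strata(ulc),
             data = sim)
summary(fit)$conf.int                    # hazard ratios with 95% intervals

# Proportional hazards check based on scaled Schoenfeld residuals.
zph <- cox.zph(fit)
zph
plot(zph)                                # one panel per covariate
```

Small p-values in the cox.zph table, or a clear trend in the residual plots, would suggest that a covariate's effect changes over time and that the proportional hazards assumption is questionable for that term.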