Stat209/Ed260 D Rogosa 2/15/18
Assignment 5. The many uses and forms of analysis of covariance
---------------------------------------
1. Background: categorical predictors
A study of several hundred professors' salaries in a large
American university in 1969 (AER, 1973, p.469) yielded the following
prediction equation: S = 1900 + 230*B + 18*A + 100*E + 490*D + 190*Y
+ 50*T - 2400*X where S is annual salary, B is number of books
written, A number of ordinary articles, E number of excellent
articles, D number of Ph.D.'s supervised, Y years experience, T = 1
if student evaluations above median, 0 otherwise, X = 1 if female, 0
otherwise.
For a professor with B = A = E = D = X = 1 and Y = 5, what is the
expected change in salary if she goes from very good (T = 1) to poor
(T = 0) student evaluations?
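A minimal sketch for checking the arithmetic. It plugs the stated profile into the prediction equation twice, once with T = 1 and once with T = 0. Treating "very good" as T = 1 and "poor" as T = 0 is an assumption, and the function name `salary` is hypothetical.

```python
def salary(B, A, E, D, Y, T, X):
    """Prediction equation from the 1969 salary study (AER, 1973)."""
    return 1900 + 230 * B + 18 * A + 100 * E + 490 * D + 190 * Y + 50 * T - 2400 * X

# Same professor, differing only in the student-evaluation indicator T
good = salary(B=1, A=1, E=1, D=1, Y=5, T=1, X=1)
poor = salary(B=1, A=1, E=1, D=1, Y=5, T=0, X=1)
print("expected change:", poor - good)
```

Because the equation is additive with no interactions, the change depends only on the coefficient of T. The other values in the profile do not affect it.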
Mean salaries were $16,100 for males and $11,200 for females.
What is the value of the slope from a simple S on X regression?
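A small sketch of the general fact behind this question. When the only predictor is a 0/1 indicator, the least-squares slope equals the difference between the two group means, and the intercept equals the mean of the group coded 0. The salary values below are hypothetical, chosen only so that the group means match the $16,100 and $11,200 in the text.

```python
import numpy as np

# Hypothetical salaries: only the group means (16100, 11200) come from the text
S_male = np.array([15100.0, 16100.0, 17100.0])
S_female = np.array([10200.0, 11200.0, 12200.0])
S = np.concatenate([S_male, S_female])
X = np.concatenate([np.zeros(3), np.ones(3)])  # X = 1 if female, 0 otherwise

# Simple regression of S on X; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(X, S, 1)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
```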
----------------------------------------
2. Background: standard analysis of covariance.
A researcher is studying the effect of an incentive on the
retention of subject matter and is also interested in the role of
time devoted to study. Subjects are randomly assigned to two groups,
one receiving (C3 = 1) and the other not receiving (C3 = 0) an
incentive. Within these groups, subjects are randomly assigned to 5,
10, 15, or 20 minutes of study (C2) of a passage specifically
prepared for the experiment. At the end of the study period, a test
of retention is administered. Treat study time as a
covariate for investigating the differential effects of the
incentive. Full data are in file retention.dat:
http://www-stat.stanford.edu/~rag/stat209/retention.dat
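A minimal sketch of a standard ANCOVA in NumPy, offered as an illustration rather than the course's R solution. It assumes retention.dat supplies a retention score plus C2 (study time) and C3 (incentive, 0/1). The exact column layout is not stated here, so loading the file is left to the reader. The helper names `_fit` and `ancova` are hypothetical. The function first fits separate slopes to check homogeneity of regression, then reports the common-slope incentive effect.

```python
import numpy as np

def _fit(y, X):
    """OLS coefficients and their estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return beta, cov

def ancova(y, group, covariate):
    """Group effect adjusted for a covariate, plus a slope-homogeneity check."""
    y, g, x = (np.asarray(a, dtype=float) for a in (y, group, covariate))
    one = np.ones_like(y)
    # Separate slopes: y = b0 + b1*g + b2*x + b3*g*x; t for b3 tests parallelism
    b_int, cov_int = _fit(y, np.column_stack([one, g, x, g * x]))
    t_slope_diff = b_int[3] / np.sqrt(cov_int[3, 3])
    # Common slope: y = b0 + b1*g + b2*x; b1 is the covariate-adjusted effect
    b, cov = _fit(y, np.column_stack([one, g, x]))
    return {"effect": b[1], "se": np.sqrt(cov[1, 1]),
            "common_slope": b[2], "t_slope_diff": t_slope_diff}
```

Usage (hypothetical column arrays): `ancova(retention, C3, C2)`. If `t_slope_diff` is large, a single adjusted effect is misleading, and the comparing-regressions approach of problem 4 is more appropriate.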
----------------------------------------
3. Revisit High School and Beyond (HSB, lab 2) ancova
a. In the week 5 example we used school-level (mean, gradient)
outcomes and school mean ses as a covariate.
Investigate the usefulness of that covariate by comparing
the week 5 ancova with a simple t-test (sector) on these school-level
outcomes. What is the difference in precision (and bias)
between using the covariate and not using it?
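A sketch of the precision comparison for one school-level outcome. A regression on the sector dummy alone reproduces the pooled-variance two-sample t-test, so comparing its standard error with the ancova standard error shows what the covariate buys. The array names (`outcome`, `sector`, `meanses`) and helper names are hypothetical stand-ins for the HSB school-level variables.

```python
import numpy as np

def ols(y, *predictors):
    """OLS with an intercept; returns coefficients and their standard errors."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, se

def compare_sector_effect(outcome, sector, meanses):
    """Sector coefficient and SE without and with school mean ses as covariate."""
    b_t, se_t = ols(outcome, sector)           # same as pooled-variance t-test
    b_a, se_a = ols(outcome, sector, meanses)  # ancova adjusting for mean ses
    return {"t-test": (b_t[1], se_t[1]), "ancova": (b_a[1], se_a[1])}
```

The two sector estimates can also differ in value, not just in precision. That difference reflects how far the sector groups differ on mean ses, which is the bias side of the question.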
b. Back to Lab 2 and week 4. Compare the mixed-effects model that does not
use school mean ses in the level 2 model (intercept and slope models)
with the model used in Lab 2 (and class presentations).
-----------------------------------------------------
4. Comparing Regressions
Let's give recognition to the guys who made S (and R)
and take some data from
Venables, W. N. and Ripley, B. D. (1999) Modern Applied
Statistics with S-PLUS. Third Edition. Springer
(now up to 4th edition). Chapter 6, Section 1 considers
analysis of the data set whiteside (available in package MASS,
part of the VR bundle).
To access:
> library(MASS) # MASS ships with R as a recommended package but must be loaded
> data(whiteside)
> ?whiteside
Description
Mr Derek Whiteside of the UK Building Research Station recorded
the weekly gas consumption and average external temperature at
his own house in south-east England for two heating seasons, one
of 26 weeks before, and one of 30 weeks after cavity-wall
insulation was installed. The object of the exercise was to
assess the effect of the insulation on gas consumption.
Format
The whiteside data frame has 56 rows and 3 columns:
Insul A factor, before or after insulation.
Temp Purportedly the average outside temperature in degrees
Celsius. (These values are far too low for any 56-week period in
the 1960s in South-East England. It might be the weekly average
of daily minima.)
Gas The weekly gas consumption in 1000s of cubic feet.
Source
A data set collected in the 1960s by
Mr Derek Whiteside of the UK Building Research Station.
Reported by
Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993)
A Handbook of Small Data Sets. Chapman & Hall, p. 69.
Carry out a comparing-regressions analysis with Insul as the group variable,
Gas as the outcome, and Temp as the within-group predictor.
Construct a 95% confidence interval for the effect of Insul on Gas
at Temp = 4 (pick-a-point procedure).
For what values of Temp does there appear to be an effect of Insul
on Gas (simultaneous region of significance)?
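A minimal sketch of the comparing-regressions computations in NumPy, offered as an illustration alongside the R analysis rather than in place of it. It assumes Insul has been recoded as g = 1 for After and 0 for Before (a hypothetical coding). Under that coding, the model Gas = b0 + b1*g + b2*Temp + b3*g*Temp gives an estimated Insul effect at Temp = x0 of d(x0) = b1 + b3*x0. The critical value is passed in. For the pick-a-point interval, use t(.975, n - 4). For the simultaneous region, use sqrt(2 F(.95; 2, n - 4)), Potthoff's simultaneous version of the Johnson-Neyman technique. All function names are hypothetical.

```python
import numpy as np

def fit_interaction(x, g, y):
    """OLS fit of y = b0 + b1*g + b2*x + b3*g*x, with g coded 0/1."""
    x, g, y = (np.asarray(a, dtype=float) for a in (x, g, y))
    X = np.column_stack([np.ones_like(x), g, x, g * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return beta, cov, df

def group_difference(beta, cov, x0):
    """Estimated (g=1) - (g=0) difference at x0 and its standard error."""
    c = np.array([0.0, 1.0, 0.0, x0])
    return c @ beta, np.sqrt(c @ cov @ c)

def pick_a_point_ci(beta, cov, x0, crit):
    """Interval d(x0) +/- crit * se(d(x0)) at a single chosen x0."""
    d, se = group_difference(beta, cov, x0)
    return d - crit * se, d + crit * se

def significance_bounds(beta, cov, crit):
    """Real x where |d(x)| / se(d(x)) equals crit (Johnson-Neyman boundaries)."""
    k = crit ** 2
    # d(x)^2 - k * var(d(x)) = a*x^2 + b*x + c
    a = beta[3] ** 2 - k * cov[3, 3]
    b = 2.0 * (beta[1] * beta[3] - k * cov[1, 3])
    c = beta[1] ** 2 - k * cov[1, 1]
    roots = np.roots([a, b, c])
    return np.sort(roots[np.isreal(roots)].real)
```

With the whiteside columns as arrays, `scipy.stats.t.ppf(0.975, df)` supplies the pick-a-point critical value and `np.sqrt(2 * scipy.stats.f.ppf(0.95, 2, df))` the simultaneous one. Whether the region of significance lies between or outside the returned boundaries depends on the sign of the quadratic's leading coefficient. Evaluating `group_difference` at one Temp value on each side of a boundary settles it.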
-----------------------------
5. Assignment based on covariate, potential outcomes
HW2, problem 4 (for solutions see "aside for week 5") demonstrates, for the
artificial data example, that even with non-random assignment (on
the covariate), an ancova using that covariate recovers the
treatment effect. Revisit and work through that example.
---------------------------------------------------
6. Controlled Assignment (class example)
From Rubin, D. B., (1977),
"Assignment to a Treatment Group on the Basis of a Covariate",
linked on course page, readings week 5
page 16
16 Rubin
7. A SIMPLE EXAMPLE
Table I presents the raw data from an evaluation of a computer-
aided program designed to teach mathematics to children in fourth
grade. There were 25 children in Program 1 (the computer-aided
program) and 47 children in Program 2 (the regular program). All
children took a Pretest and Posttest, each test consisting of 20
problems, a child's score being the number of problems correctly
solved. These data will be used to illustrate the estimation
methods discussed in Sections 4, 5, and 6. We do not attempt a
complete statistical analysis nor do we question the assumption
of no interference between units.
TABLE I
Raw Data for 25 Program 1 Children and 47 Program 2 Children

Pretest   Posttest scores      Posttest scores
score     Program 1            Program 2
10        15                   6,7
9         16                   7,11,12
8         12                   5,6,9,12
7         8,11,12              6,6,6,6,7,8
6         9,10,11,13,20        5,5,6,6,6,6,6,6,6,8,8,8,9,10
5         5,6,7,16             3,5,5,6,6,7,8
4         5,6,6,12             4,4,4,5,7,11
3         4,7,8,9,12           0,5,7
2         4                    4
1         -                    -
0         -                    7
Does assignment appear to be random, or does it appear to be
assignment on the basis of pretest?
Try to estimate the assignment rule, presuming it is based on pretest.
How does this differ from a regression discontinuity design
(simplest version)?
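A crude first look at the assignment rule, using only the counts read off Table I. It tabulates the observed proportion of children in Program 1 at each pretest score, along with each group's mean pretest. In the simplest (sharp) regression discontinuity design, this proportion would jump from 0 to 1 at a single cutoff score. Intermediate proportions point instead to a probabilistic rule. The dictionary and function names are hypothetical.

```python
# (Program 1 count, Program 2 count) at each pretest score, read off Table I
COUNTS = {10: (1, 2), 9: (1, 3), 8: (1, 4), 7: (3, 6), 6: (5, 14),
          5: (4, 7), 4: (4, 6), 3: (5, 3), 2: (1, 1), 1: (0, 0), 0: (0, 1)}

def assignment_summary(counts):
    """Proportion in Program 1 at each pretest score, group sizes, pretest means."""
    props = {pre: n1 / (n1 + n2) for pre, (n1, n2) in counts.items() if n1 + n2 > 0}
    tot1 = sum(n1 for n1, _ in counts.values())
    tot2 = sum(n2 for _, n2 in counts.values())
    mean1 = sum(pre * n1 for pre, (n1, _) in counts.items()) / tot1
    mean2 = sum(pre * n2 for pre, (_, n2) in counts.items()) / tot2
    return props, (tot1, tot2), (mean1, mean2)

props, totals, means = assignment_summary(COUNTS)
for pre in sorted(props, reverse=True):
    print(f"pretest {pre:2d}: proportion in Program 1 = {props[pre]:.2f}")
print("group sizes:", totals, "mean pretest:", [round(m, 2) for m in means])
```

With so few children at each score, these proportions are noisy. Fitting a smooth model, such as a logistic regression of program on pretest, is a natural next step.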
Assuming that assignment to Program 1 or Program 2 was solely
on the basis of pretest (plus perhaps a probabilistic component),
estimate the effect of program (new vs. regular).
Note: the data in Table I exist in a more convenient form in file hw5rubin.dat
http://www-stat.stanford.edu/~rag/stat209/hw5rubin.dat
(the file is also included in the hw5 solutions).
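A sketch of the parallel-slopes ancova estimate of the program effect, using the 72 observations transcribed from Table I. The layout of hw5rubin.dat is not assumed here. If assignment depends only on pretest and the posttest-pretest relation is linear with a common slope, the program coefficient estimates the effect. Both conditions are assumptions worth checking, for example by also fitting separate slopes. Variable names are hypothetical.

```python
import numpy as np

# Table I (Rubin, 1977): pretest score -> list of posttest scores, per program
PROGRAM1 = {10: [15], 9: [16], 8: [12], 7: [8, 11, 12],
            6: [9, 10, 11, 13, 20], 5: [5, 6, 7, 16],
            4: [5, 6, 6, 12], 3: [4, 7, 8, 9, 12], 2: [4]}
PROGRAM2 = {10: [6, 7], 9: [7, 11, 12], 8: [5, 6, 9, 12],
            7: [6, 6, 6, 6, 7, 8],
            6: [5, 5, 6, 6, 6, 6, 6, 6, 6, 8, 8, 8, 9, 10],
            5: [3, 5, 5, 6, 6, 7, 8], 4: [4, 4, 4, 5, 7, 11],
            3: [0, 5, 7], 2: [4], 0: [7]}

def long_form(table, group):
    """One row (pretest, posttest, group) per child."""
    rows = [(pre, post, group) for pre, posts in table.items() for post in posts]
    return np.array(rows, dtype=float)

data = np.vstack([long_form(PROGRAM1, 1), long_form(PROGRAM2, 0)])
pre, post, prog = data[:, 0], data[:, 1], data[:, 2]

# Parallel-slopes ancova: post = b0 + b1*prog + b2*pre
X = np.column_stack([np.ones_like(pre), prog, pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
raw = post[prog == 1].mean() - post[prog == 0].mean()
print(f"raw posttest difference: {raw:.2f}")
print(f"ancova program effect (parallel slopes): {beta[1]:.2f}")
```

The ancova coefficient equals the raw posttest difference minus the pooled within-group slope times the pretest mean difference. Comparing the two printed numbers therefore shows how much the pretest imbalance matters.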
--------------------------------------
7. From lecture materials, Regression Adjustments with Non-equivalent Groups
Week 5 Handout
Show that the Belson adjustment procedure (using the control-group slope)
is equivalent to evaluating the vertical distance between the
within-group regression fits (CNRL) at the mean of the treatment group.
Nowadays economists call that quantity ATT (average treatment effect
on the treated).
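A sketch of the key algebraic step, in notation introduced here for illustration: group means \bar{x}_C, \bar{y}_C, \bar{x}_T, \bar{y}_T and within-group OLS slopes b_C, b_T. Each least-squares line passes through its own group's means. The treatment-group fit evaluated at \bar{x}_T is therefore \bar{y}_T whatever b_T is, and only the control-group slope enters the difference.

```latex
\begin{aligned}
\hat{y}_C(x) &= \bar{y}_C + b_C\,(x - \bar{x}_C), \qquad
\hat{y}_T(x) = \bar{y}_T + b_T\,(x - \bar{x}_T) \\
\hat{y}_T(\bar{x}_T) - \hat{y}_C(\bar{x}_T)
  &= \bar{y}_T - \bigl[\bar{y}_C + b_C\,(\bar{x}_T - \bar{x}_C)\bigr]
  = \hat{\Delta}_{\text{Belson}}
\end{aligned}
```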
---------------------------------
END HW5 assignment 2018