Stat209/Ed260 D Rogosa   2/10/19

Assignment 5.  The many uses and forms of analysis of covariance


---------------------------------------
1. Background: categorical predictors

A study of several hundred professors' salaries in a large
American university in 1969 (AER, 1973, p.469) yielded the following
prediction equation:  S = 1900 + 230*B + 18*A + 100*E + 490*D + 190*Y
+ 50*T - 2400*X  where S is annual salary, B is number of books
written, A number of ordinary articles, E number of excellent
articles, D number of Ph.D.'s supervised, Y years experience, T = 1
if student evaluations above median, 0 otherwise, X = 1 if female, 0
otherwise.  

For a professor with B=A=E=D=X=1 and Y=5, what is the expected change
in salary if she goes from very good to poor student evaluations
(T changes from 1 to 0)?

Mean salaries were $16,100 for males and $11,200 for females.
What is the value of the slope from a simple regression of S on X?
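
A minimal sketch in R of the two ideas in this problem, taking the
prediction equation above at face value. The function name pred_salary
and the toy vectors y and x are hypothetical, introduced only for
illustration. The first part evaluates the equation at two values of T;
the second part checks, on made-up numbers, that the slope from
regressing an outcome on a single 0/1 dummy equals the difference in
group means.

```r
# Prediction equation from the 1969 salary study (coefficients as quoted above).
pred_salary <- function(B, A, E, D, Y, T, X) {
  1900 + 230*B + 18*A + 100*E + 490*D + 190*Y + 50*T - 2400*X
}

# Change in predicted salary when T goes from 1 to 0, other predictors fixed.
change_T <- pred_salary(1, 1, 1, 1, 5, T = 0, X = 1) -
            pred_salary(1, 1, 1, 1, 5, T = 1, X = 1)

# Toy data (made up): the slope on a 0/1 dummy equals the difference in group means.
y <- c(10, 12, 14, 7, 9)
x <- c(0, 0, 0, 1, 1)
slope     <- unname(coef(lm(y ~ x))["x"])
mean_diff <- mean(y[x == 1]) - mean(y[x == 0])
```

Because T enters the equation additively, change_T does not depend on
the values chosen for the other predictors.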
----------------------------------------

2. Background: standard analysis of covariance.

 A researcher is studying the effect of an incentive on the
retention of subject matter and is also interested in the role of
time devoted to study. Subjects are randomly assigned to two groups,
one receiving (C3 = 1) and the other not receiving (C3 = 0) an
incentive.  Within these groups, subjects are randomly assigned to 5,
10, 15, or 20 minutes of study (C2) of a passage specifically
prepared for the experiment. At the end of the study period, a test
of retention is administered. Treat the study time as a
covariate for investigating the differential effects of the
incentive. Full data are in the file retention.dat:
http://www-stat.stanford.edu/~rag/stat209/retention.dat
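
A hedged sketch of the analysis in R, run on simulated data rather than
on retention.dat. The outcome column name (score), the cell size of 5,
and the generating coefficients are all assumptions made for
illustration; only C2 and C3 follow the description above. The sketch
compares the incentive effect with and without study time as a
covariate.

```r
# Simulated stand-in for retention.dat; `score` is a hypothetical outcome name.
set.seed(209)
design <- expand.grid(rep = 1:5, C2 = c(5, 10, 15, 20), C3 = c(0, 1))
design$score <- 2 + 0.5 * design$C2 + 3 * design$C3 + rnorm(nrow(design))

fit_ttest  <- lm(score ~ C3, data = design)        # ignores study time
fit_ancova <- lm(score ~ C3 + C2, data = design)   # study time as covariate

# Incentive effect and its standard error under each model.
est <- c(ttest  = coef(fit_ttest)[["C3"]],
         ancova = coef(fit_ancova)[["C3"]])
se  <- c(ttest  = summary(fit_ttest)$coefficients["C3", "Std. Error"],
         ancova = summary(fit_ancova)$coefficients["C3", "Std. Error"])
```

In this balanced simulated design C2 and C3 are uncorrelated, so the
two point estimates coincide and the covariate only changes the
standard error. Whether the real data are balanced in this way should
be checked before expecting the same pattern.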

----------------------------------------
3. Revisit High School and Beyond (HSB, lab 2) ancova

a. In the week 5 example we used school-level (mean, gradient)
outcomes, with school mean ses as a covariate.
Investigate the usefulness of that covariate by comparing
the week 5 ancova with a simple t-test (on sector) for these
school-level outcomes. How do precision and bias differ
between using the covariate and not using it?
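
One way to set up the comparison, sketched in R on simulated
school-level data (not the HSB data). The variable names sector,
meanses, and outcome, the 0/1 coding of sector, and all generating
values are assumptions for illustration. Here the groups differ on the
covariate, so omitting it affects both the estimate and its standard
error.

```r
# Simulated school-level data; names and coding are hypothetical.
set.seed(260)
n_schools <- 160
sector  <- rep(c(0, 1), each = n_schools / 2)          # assumed 0/1 sector coding
meanses <- 0.5 * sector + rnorm(n_schools, sd = 0.4)   # sectors differ on the covariate
outcome <- 12 + 1 * sector + 2 * meanses + rnorm(n_schools, sd = 0.5)
schools <- data.frame(sector, meanses, outcome)

fit_t      <- lm(outcome ~ sector, data = schools)            # pooled-variance t-test as a regression
fit_ancova <- lm(outcome ~ sector + meanses, data = schools)  # adjusts for school mean ses

# Sector estimate and standard error, side by side.
compare <- rbind(
  ttest  = summary(fit_t)$coefficients["sector", c("Estimate", "Std. Error")],
  ancova = summary(fit_ancova)$coefficients["sector", c("Estimate", "Std. Error")]
)
```

The simulated sector effect is 1, so the Estimate column of compare can
be read against that value; with real data the true effect is unknown
and only the two estimates and standard errors can be compared.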

b. Back to Lab 2 and week 4. Compare the mixed-effects model that does not
use school mean ses in the level 2 models (intercept and slope)
with the model used in Lab 2 (and class presentations).


-----------------------------------------------------
4. Comparing Regressions

Let's give recognition to the guys who made S (and R)
and take some data from
Venables, W. N. and Ripley, B. D. (1999) Modern Applied 
Statistics with S-PLUS. Third Edition. Springer

(now up to 4th edition). Chap 6, section 1 considers
analysis of the data set whiteside (available as part
of MASS, a subset of the VR package).
To access:
> library(MASS) # MASS ships with R as a recommended package, but must be loaded
> data(whiteside)
> ?whiteside


Description

Mr Derek Whiteside of the UK Building Research Station recorded 
the weekly gas consumption and average external temperature at 
his own house in south-east England for two heating seasons, one 
of 26 weeks before, and one of 30 weeks after cavity-wall 
insulation was installed. The object of the exercise was to 
assess the effect of the insulation on gas consumption.

Format

The whiteside data frame has 56 rows and 3 columns:

Insul  A factor, before or after insulation.

Temp   Purportedly the average outside temperature in degrees
       Celsius. (These values are far too low for any 56-week period in
       the 1960s in South-East England. It might be the weekly average
       of daily minima.)

Gas    The weekly gas consumption in 1000s of cubic feet.

Source

A data set collected in the 1960s by 
Mr Derek Whiteside of the UK Building Research Station.
Reported by
Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993) 
A Handbook of Small Data Sets. Chapman & Hall, p. 69.

Carry out a comparing-regressions analysis with Insul as the group variable,
Gas as the outcome, and Temp as the within-group predictor.
 Construct a 95% confidence interval for the effect of Insul on Gas
at Temp = 4 (pick-a-point procedure).
 For what values of Temp does there appear to be an effect of Insul
on Gas (simultaneous region of significance)?
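
A minimal sketch of the pick-a-point step in R, assuming a single
linear model with separate intercepts and slopes and a pooled residual
variance across the two seasons. Centering Temp at 4 is one convenient
way to read the difference at that point off a single coefficient; it
is not the only way to do this.

```r
library(MASS)   # provides the whiteside data frame

# Separate intercepts and slopes for the Before and After seasons.
fit <- lm(Gas ~ Insul * Temp, data = whiteside)

# Pick-a-point at Temp = 4: centre Temp at 4 so that the Insul coefficient is
# the After-minus-Before difference in predicted Gas at that temperature.
fit_at4  <- lm(Gas ~ Insul * I(Temp - 4), data = whiteside)
diff_at4 <- coef(fit_at4)[["InsulAfter"]]
ci_at4   <- confint(fit_at4, "InsulAfter", level = 0.95)

# Same point estimate from the uncentred fit, as a check.
b <- coef(fit)
diff_check <- b[["InsulAfter"]] + 4 * b[["InsulAfter:Temp"]]
```

The interval in ci_at4 is a pointwise interval for one chosen
temperature. A simultaneous region of significance covers all
temperatures at once and therefore needs a larger critical value than
the t quantile used by confint.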

-----------------------------
5. Assignment based on covariate, potential outcomes

HW2, problem 4 (for solutions see "aside for week 5") demonstrates, for the
artificial data example, that even with non-random assignment (on
the covariate) ancova using that covariate will recover
the treatment effect. Revisit and work through that example.

---------------------------------------------------
6. Controlled Assignment (class example)

From Rubin, D. B., (1977), 
"Assignment to a Treatment Group on the Basis of a Covariate",
linked on course page, readings week 5

page 16

              7. A SIMPLE EXAMPLE

Table I presents the raw data from an evaluation of a computer-
aided program designed to teach mathematics to children in fourth 
grade. There were 25 children in Program 1 (the computer-aided 
program) and 47 children in Program 2 (the regular program). All 
children took a Pretest and Posttest, each test consisting of 20 
problems, a child's score being the number of problems correctly 
solved. These data will be used to illustrate the estimation 
methods discussed in Sections 4, 5, and 6. We do not attempt a 
complete statistical analysis nor do we question the assumption 
of no interference between units.

TABLE I

Raw Data for 25 Program 1 Children and 47 Program 2 Children

Pretest    Posttest Scores
Score      Program 1             Program 2
  10       15                    6,7
   9       16                    7,11,12
   8       12                    5,6,9,12
   7       8,11,12               6,6,6,6,7,8
   6       9,10,11,13,20         5,5,6,6,6,6,6,6,6,8,8,8,9,10
   5       5,6,7,16              3,5,5,6,6,7,8
   4       5,6,6,12              4,4,4,5,7,11
   3       4,7,8,9,12            0,5,7
   2       4                     4
   1       -                     -
   0       -                     7


Does assignment appear to be random, or does it appear to be
assignment on the basis of pretest?
Try to estimate the assignment rule, presuming it is based on pretest.
How does this differ from a regression discontinuity design
(simplest version)?

Assuming that assignment to Program 1 or Program 2 was solely
on the basis of pretest (plus perhaps a probabilistic component)
estimate the effect of program (new vs regular).

Note: the data in Table I exist in a more convenient form in the file hw5rubin.dat
http://www-stat.stanford.edu/~rag/stat209/hw5rubin.dat
(the file is also included in the hw5 solutions).
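
A hedged starting point in R, with the Table I values typed in by hand
rather than read from hw5rubin.dat (whose column layout is not
described here). The column names program, pre, post, and new and the
helper to_df are hypothetical. The sketch tabulates the share of
children in Program 1 at each pretest score and fits one simple
covariance adjustment; it is one possible estimator, not necessarily
the one Rubin recommends.

```r
# Posttest scores from Table I, keyed by pretest score (transcribed by hand).
p1 <- list(`10` = 15, `9` = 16, `8` = 12, `7` = c(8, 11, 12),
           `6` = c(9, 10, 11, 13, 20), `5` = c(5, 6, 7, 16),
           `4` = c(5, 6, 6, 12), `3` = c(4, 7, 8, 9, 12), `2` = 4)
p2 <- list(`10` = c(6, 7), `9` = c(7, 11, 12), `8` = c(5, 6, 9, 12),
           `7` = c(6, 6, 6, 6, 7, 8),
           `6` = c(5, 5, 6, 6, 6, 6, 6, 6, 6, 8, 8, 8, 9, 10),
           `5` = c(3, 5, 5, 6, 6, 7, 8), `4` = c(4, 4, 4, 5, 7, 11),
           `3` = c(0, 5, 7), `2` = 4, `0` = 7)

# Expand one program's list into long format: one row per child.
to_df <- function(x, program) {
  data.frame(program = program,
             pre  = rep(as.numeric(names(x)), lengths(x)),
             post = unlist(x, use.names = FALSE))
}
rubin <- rbind(to_df(p1, 1), to_df(p2, 2))

# Proportion of children in Program 1 at each pretest score.
prop_p1 <- tapply(rubin$program == 1, rubin$pre, mean)

# Covariance adjustment on pretest; `new` = 1 for Program 1, 0 for Program 2.
rubin$new <- as.numeric(rubin$program == 1)
fit <- lm(post ~ new + pre, data = rubin)
```

Printing prop_p1 gives a first look at how assignment varies with
pretest, and summary(fit) gives the adjusted program difference under a
common-slope assumption.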
--------------------------------------
7. From lecture materials, Regression Adjustments with Non-equivalent groups
Week 5 Handout 
Show the Belson adjustment procedure (using control group slope)
is equivalent to evaluating the vertical distance between the
within-group regression fits (CNRL) at the mean of the treatment group.
Nowadays economists call that quantity ATT (average treatment effect
on the treated).
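
A numerical check of the claimed equivalence in R, on simulated data.
All variable names and generating values are made up for illustration,
and the Belson estimate is written here as the mean difference adjusted
with the control-group slope, which is the form assumed from the
handout description. A numerical check is not a proof, but it shows
which quantities have to match.

```r
# Simulated non-equivalent groups with different slopes (values are made up).
set.seed(5)
n  <- 40
xc <- rnorm(n, mean = 10, sd = 2); yc <- 3 + 0.8 * xc + rnorm(n)   # control
xt <- rnorm(n, mean = 12, sd = 2); yt <- 1 + 1.1 * xt + rnorm(n)   # treatment

fit_c <- lm(yc ~ xc)   # within-group fit, control
fit_t <- lm(yt ~ xt)   # within-group fit, treatment

# Belson-style adjustment: mean difference corrected with the control-group slope.
b_c    <- coef(fit_c)[["xc"]]
belson <- (mean(yt) - mean(yc)) - b_c * (mean(xt) - mean(xc))

# Vertical distance between the two fitted lines at the treatment-group mean of X.
at  <- mean(xt)
gap <- (coef(fit_t)[["(Intercept)"]] + coef(fit_t)[["xt"]] * at) -
       (coef(fit_c)[["(Intercept)"]] + coef(fit_c)[["xc"]] * at)
```

The two quantities agree to rounding error. The fact to use in the
algebraic argument is that each least-squares line passes through its
own group's point of means.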
---------------------------------
Additional Systematic Assignment (RD) exercises at
http://rogosateaching.com/somgen290/exs2018.html
Week 8 materials,
Review questions 1 (sharp design) and 2 (fuzzy);
solutions linked there.





END HW5 assignment 2019