Stat209/Ed260 D Rogosa   3/3/18

Assignment 8 (part 2).   Matching and propensity score methods

   *******************************************
   cf. Lab 4 for additional matching exercises
   *******************************************
Also note the first part of the week 8 HW, a randomized block review.


----------------------------------

Problem 1. 
Recreate the matching demonstration for Ben Hansen's "gender equity"
example (done in the week 8 class handout, posted, not handed out as hard copy),
an example of optimal full matching. There is only one matching variable.
This is Example 2 in Hansen's talk, about p. 48 in the linked pdf.
Here are the data in cut-and-paste form:
> geneq
  Grant gender
1   5.7      W
2   4.0      W
3   3.4      W
4   3.1      W
5   5.5      M
6   5.3      M
7   4.9      M
8   4.9      M
9   3.9      M
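
One possible starting point, as a sketch only: enter the data, build the
women-by-men matrix of absolute differences in Grant, and hand that matrix to
optmatch. Treating the women as the "treatment" group is an assumption made
here for illustration, and the optmatch step runs only if that package is
installed.

```r
# Data from the listing above
geneq <- data.frame(
  Grant  = c(5.7, 4.0, 3.4, 3.1, 5.5, 5.3, 4.9, 4.9, 3.9),
  gender = c(rep("W", 4), rep("M", 5))
)

# Assumption: women play the role of the treatment group
w <- geneq$gender == "W"

# Absolute Grant differences: rows are women, columns are men
d <- abs(outer(geneq$Grant[w], geneq$Grant[!w], "-"))
dimnames(d) <- list(rownames(geneq)[w], rownames(geneq)[!w])
print(round(d, 1))

# Optimal full matching on that distance matrix (needs optmatch)
if (requireNamespace("optmatch", quietly = TRUE)) {
  fm <- optmatch::fullmatch(d, data = geneq)
  print(data.frame(geneq, matched_set = fm))
}
```

Compare the matched sets with the ones shown in the class handout.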

---------------------------------------------


Problem 2. Multivariate matching
The example shown in lecture, from Anderson et al.

Example 6.5 Multivariate caliper matching: Consider a hypothetical 
study comparing two therapies effective in reducing blood 
pressure, where the investigators want to match on three 
variables: previously measured diastolic blood pressure (DBP), age, and 
sex. Such confounding variables can be divided into two types: 
categorical variables, such as sex, for which the investigators 
may insist on a perfect match (e = 0); and numerical variables, 
such as age and blood pressure, which require a specific value of 
the caliper tolerances. Let the blood pressure tolerance be 
specified as 5 mm Hg and the age tolerance as 5 years. The data 
contains measurements of these three confounding variables. (The 
subjects are grouped by sex to make it easier to follow the 
example.) 

Data with columns DBP age sex and Grp (Treatment Group or 
Comparison Reservoir)
http://www-stat.stanford.edu/~rag/stat209/matchex.dat

Table 6.6 Hypothetical Measurements on Confounding Variables   
Treatment Group                               Comparison Reservoir
Subject Diastolic Blood                  Subject  Diastolic Blood
Number  Pressure (mm Hg) Age Sex         Number    Pressure (mm Hg)   Age Sex
1          94             39  F            1              80           35  F
2          108            56  F            2              120          37  F
3          100            50  F            3              85           50  F
4          92             42  F            4              90           41  F
5          65             45  M            5              90           47  F
6          90             37  M            6              90           56  F
                                           7              108          53  F
                                           8              94           46  F
                                           9              78           32  F
                                           10             105          50  F
                                           11             88           43  F
                                           12             100          42  M
                                           13             110          56  M
                                           14             100          46  M
                                           15             100          54  M
                                           16             110          48  M
                                           17             85           60  M
                                           18             90           35  M
                                           19             70           50  M
                                           20             90           49  M

a. show preexisting difference between comparison and treatment,
no matching.

b. try to do a match by hand, finding a best match for each of the
treatment subjects. 

c. use the 3 confounding variables to compute a propensity
score (for membership in the treatment); match subjects on the
propensity scores (i.e. nearest comparison to each treatment subject)
by hand, or use optmatch functions to do optimal matching either 1:1
or 1:2. See which provides better (less bad) balance in the covariates.
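
A sketch of parts a-c in base R, under some assumptions: the data are typed in
from Table 6.6 rather than read from the posted file, Grp is coded 1 for
treatment and 0 for the comparison reservoir (the coding in matchex.dat may
differ), and the propensity match is nearest neighbor with replacement on the
logit scale, which is only one of several reasonable choices.

```r
# Table 6.6, typed in by hand (Grp coding is an assumption)
trt <- data.frame(
  DBP = c(94, 108, 100, 92, 65, 90),
  age = c(39, 56, 50, 42, 45, 37),
  sex = c("F", "F", "F", "F", "M", "M"),
  Grp = 1
)
cmp <- data.frame(
  DBP = c(80, 120, 85, 90, 90, 90, 108, 94, 78, 105, 88,
          100, 110, 100, 100, 110, 85, 90, 70, 90),
  age = c(35, 37, 50, 41, 47, 56, 53, 46, 32, 50, 43,
          42, 56, 46, 54, 48, 60, 35, 50, 49),
  sex = rep(c("F", "M"), c(11, 9)),
  Grp = 0
)
bp <- rbind(trt, cmp)

# a. preexisting differences, no matching
print(aggregate(cbind(DBP, age, male = sex == "M") ~ Grp, data = bp, FUN = mean))

# b. caliper candidates: same sex, DBP within 5 mm Hg, age within 5 years
candidates <- lapply(seq_len(nrow(trt)), function(i) {
  which(cmp$sex == trt$sex[i] &
        abs(cmp$DBP - trt$DBP[i]) <= 5 &
        abs(cmp$age - trt$age[i]) <= 5)
})
names(candidates) <- paste0("T", seq_len(nrow(trt)))
print(candidates)

# c. propensity score from the three confounders, on the logit scale
fit <- glm(Grp ~ DBP + age + sex, family = binomial, data = bp)
lp  <- predict(fit)                 # linear predictor (logit of the score)
lp_t <- lp[bp$Grp == 1]
lp_c <- lp[bp$Grp == 0]

# nearest comparison subject for each treatment subject (with replacement)
ps_match <- sapply(lp_t, function(p) which.min(abs(lp_c - p)))

# covariate balance: treatment group vs. its propensity-matched comparisons
print(rbind(treatment = colMeans(trt[, c("DBP", "age")]),
            matched   = colMeans(cmp[ps_match, c("DBP", "age")])))
```

The candidate lists from part b are the raw material for the hand match. For
optimal 1:1 or 1:2 matching, the propensity distances can be passed to the
optmatch functions instead of using the nearest-neighbor step.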


---------------------------------------------
Problem 3. Extended Example.
Propensity scores versus regression adjustment, single confounder

Artificial data construction 
1. start with 10000 subjects-- outcome measure Y 
2. subjects belong to groups (G=0,1) based on probabilistic assignment
   on a single unobserved variable X, which is Normal with mean 10 and variance 4:
   Prob(G = 1|X) = 1 - (1/(1 + 1/exp(-5 + .5*X)) )
3. Outcome measure Y is also highly correlated with X.
   Y = 1.2*G + X + u   (u is Normal, mean 0, variance 1.69)
   The treatment effect is built in as 1.2 (about half a sd).
4. Besides Y and G, the observable that is available is a version
   of X obscured by measurement error; let Z be a fallible version
   of X with reliability about .72 (i.e. correlation about .85).
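
The construction in steps 1-4 might be coded along these lines. The seed is
arbitrary, and the measurement-error variance is an assumption derived from the
stated reliability: with var(X) = 4 and reliability .72, Z = X + e needs
var(e) = 4*(1 - .72)/.72, about 1.56.

```r
set.seed(209)                        # arbitrary seed, for reproducibility
n <- 10000

X  <- rnorm(n, mean = 10, sd = 2)    # unobserved confounder, variance 4
pG <- 1 - (1/(1 + 1/exp(-5 + .5*X))) # assignment probability from step 2
G  <- rbinom(n, 1, pG)               # group membership
Y  <- 1.2*G + X + rnorm(n, 0, sqrt(1.69))   # built-in effect of 1.2

# fallible version of X: reliability = var(X) / (var(X) + var(e)) = .72
var_e <- 4 * (1 - .72) / .72
Z <- X + rnorm(n, 0, sqrt(var_e))

# quick checks on the construction
print(c(cor_XZ = cor(X, Z), reliability = cor(X, Z)^2, prop_G1 = mean(G)))
```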

a. compare group differences on Z (preexisting diffs)
b. try out regression adjustment estimate for treatment effect--
   Use observable Z as covariate. Compare with using X as covariate.
c. use Z to compute propensity score for each of the 10000 subjects.
   Stratify into quintiles on propensity (as in Rubin, Arch Int Med,
   from lecture), and compute a treatment/control comparison within
   each of the 5 propensity strata. Also get overall comparison from
   main effect in the 2x5 anova.
d. repeat part c using the unobservable X. Does X give better results?
e. which works better in the 1-dimensional case, propensity matching or
   regression adjustment?
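
A sketch of parts a-d. It regenerates the artificial data so that it runs on
its own; the seed and the error variance for Z (chosen to give reliability .72)
are assumptions, and strat_est is a helper name made up for this illustration.
The "main effect" is taken from an additive fit of Y on G and the quintile
factor, which is one reading of the 2x5 anova.

```r
set.seed(209)                                   # arbitrary seed
n <- 10000
X <- rnorm(n, 10, 2)
G <- rbinom(n, 1, 1 - (1/(1 + 1/exp(-5 + .5*X))))
Y <- 1.2*G + X + rnorm(n, 0, sqrt(1.69))
Z <- X + rnorm(n, 0, sqrt(4 * (1 - .72) / .72)) # reliability about .72

# a. preexisting group difference on the observable Z
print(tapply(Z, G, mean))

# b. regression adjustment: fallible Z vs. the true confounder X
b_Z <- coef(lm(Y ~ G + Z))["G"]
b_X <- coef(lm(Y ~ G + X))["G"]
print(c(adjust_Z = unname(b_Z), adjust_X = unname(b_X)))

# c, d. propensity quintiles (strat_est is a hypothetical helper)
strat_est <- function(covariate) {
  ps <- fitted(glm(G ~ covariate, family = binomial))   # propensity score
  q  <- cut(ps, quantile(ps, 0:5/5), include.lowest = TRUE, labels = 1:5)
  m  <- tapply(Y, list(q, G), mean)                     # 5 x 2 cell means
  list(within  = m[, "1"] - m[, "0"],                   # per-stratum diffs
       overall = unname(coef(lm(Y ~ G + q))["G"]))      # additive main effect
}
res_Z <- strat_est(Z)
res_X <- strat_est(X)
print(res_Z)
print(res_X)
```

For part e, compare how far each of the four estimates falls from the built-in
effect of 1.2.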

---------------------------------------------------------
Problem 4
Cochran subclassification for a confounding variable

Week 8 class example, age as a confounder on effects of (cig) smoking.

Use lalonde data (Lab4) as play data to show a simple implementation
of subclassification adjustment with re78 as outcome, treatment group comparison,
and just consider age as the confounder.

> library(MatchIt)
> data(lalonde)
> head(lalonde)
     treat age educ black hispan married nodegree re74 re75       re78
NSW1     1  37   11     1      0       1        1    0    0  9930.0460
NSW2     1  22    9     0      1       0        1    0    0  3595.8940
NSW3     1  30   12     1      0       0        0    0    0 24909.4500
NSW4     1  27   11     1      0       0        1    0    0  7506.1460
NSW5     1  33    8     1      0       0        1    0    0   289.7899
NSW6     1  22    9     1      0       0        1    0    0  4056.4940
> attach(lalonde)
> table(treat) # 185 in job training
treat
  0   1 
429 185

> tapply(re78, treat, mean) # oh my, seems better off with no job training, can the Republicans be right?
       0        1 
6984.170 6349.144 

> tapply(age, treat, mean) # there is a mean age diff
       0        1 
28.03030 25.81622 
> tapply(age, treat, fivenum) # same medians, but some controls older
$`0`
[1] 16 19 25 35 55

$`1`
[1] 17 20 25 29 48
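
One way the subclassification might go from here, as a sketch only.
subclass_adjust is a helper name made up for this illustration. The choices
built into it are assumptions, not the class handout's recipe: five subclasses
cut at the age quintiles, and stratum differences weighted by total stratum
size. The lalonde call runs only if MatchIt is installed.

```r
# Hypothetical helper: subclassify on one confounder, then average the
# within-stratum treatment/control differences, weighted by stratum size
subclass_adjust <- function(y, treat, conf, n_strata = 5) {
  # unique() guards against tied quantiles in a coarse variable like age
  breaks  <- unique(quantile(conf, probs = seq(0, 1, length.out = n_strata + 1)))
  stratum <- cut(conf, breaks = breaks, include.lowest = TRUE)
  m1 <- tapply(y[treat == 1], stratum[treat == 1], mean)  # treated means
  m0 <- tapply(y[treat == 0], stratum[treat == 0], mean)  # control means
  n  <- as.vector(table(stratum))                         # stratum sizes
  diffs <- as.vector(m1 - m0)
  ok <- !is.na(diffs)               # drop strata missing one of the groups
  list(by_stratum = data.frame(stratum = levels(stratum), n = n, diff = diffs),
       adjusted   = sum(diffs[ok] * n[ok]) / sum(n[ok]))
}

if (requireNamespace("MatchIt", quietly = TRUE)) {
  data("lalonde", package = "MatchIt")
  res <- with(lalonde, subclass_adjust(re78, treat, age))
  print(res$by_stratum)
  print(c(unadjusted = with(lalonde, mean(re78[treat == 1]) - mean(re78[treat == 0])),
          adjusted   = res$adjusted))
}
```

Weighting by the number of treated subjects in each stratum is another
reasonable choice; see how much the answer moves.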

==========================
Lab 4 has extended matching, propensity score activities
===========================
end homework 8 2018