Stat209/Ed260 D Rogosa 3/4/19
Assignment 8 (part 2). Matching and propensity score methods
*******************************************
cf Lab 4 for additional matching exercises
*******************************************
also note first part week 8 HW, randomized block review
----------------------------------
Problem 1.
Recreate the matching demonstration for Ben Hansen's "gender equity"
example (done in the week 8 class handout, posted, not hard copy), an example of
optimal full matching with only one matching variable.
This is Example 2 in Hansen's talk, about p.48 in the linked pdf.
here's the data in cut-and-paste form
> geneq
Grant gender
1 5.7 W
2 4.0 W
3 3.4 W
4 3.1 W
5 5.5 M
6 5.3 M
7 4.9 M
8 4.9 M
9 3.9 M
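As a starting point, here is a base-R sketch that enters the geneq data, shows the preexisting gender difference, and builds the one-variable distance matrix that optimal full matching works from; the final fullmatch() call is left as a comment since it assumes the optmatch package is installed.

```r
# gender equity data from Hansen's Example 2 (W = treated group, M = comparison)
geneq <- data.frame(Grant  = c(5.7, 4.0, 3.4, 3.1, 5.5, 5.3, 4.9, 4.9, 3.9),
                    gender = c("W", "W", "W", "W", "M", "M", "M", "M", "M"))

# preexisting difference in grant funding by gender
tapply(geneq$Grant, geneq$gender, mean)

# distance matrix on the single matching variable: |Grant_W - Grant_M|
d <- abs(outer(geneq$Grant[geneq$gender == "W"],
               geneq$Grant[geneq$gender == "M"], "-"))
dimnames(d) <- list(paste0("W", 1:4), paste0("M", 1:5))
d

# optimal full matching on this distance (assumes optmatch is installed):
# library(optmatch); fullmatch(d)
```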
---------------------------------------------
Problem 2. Multivariate matching
The example shown in lecture, from Anderson et al.
Example 6.5 Multivariate caliper matching: Consider a hypothetical
study comparing two therapies effective in reducing blood
pressure, where the investigators want to match on three
variables: previously measured diastolic blood pressure (DBP), age, and
sex. Such confounding variables can be divided into two types:
categorical variables, such as sex, for which the investigators
may insist on a perfect match (e = 0); and numerical variables,
such as age and blood pressure, which require a specific value of
the caliper tolerances. Let the blood pressure tolerance be
specified as 5 mm Hg and the age tolerance as 5 years. The data
contains measurements of these three confounding variables. (The
subjects are grouped by sex to make it easier to follow the
example.)
Data with columns DBP age sex and Grp (Treatment Group or
Comparison Reservoir)
http://www-stat.stanford.edu/~rag/stat209/matchex.dat
Table 6.6 Hypothetical Measurements on Confounding Variables
         Treatment Group                        Comparison Reservoir
Subject  Diastolic Blood                   Subject  Diastolic Blood
Number   Pressure (mm Hg)  Age  Sex        Number   Pressure (mm Hg)  Age  Sex
1 94 39 F 1 80 35 F
2 108 56 F 2 120 37 F
3 100 50 F 3 85 50 F
4 92 42 F 4 90 41 F
5 65 45 M 5 90 47 F
6 90 37 M 6 90 56 F
7 108 53 F
8 94 46 F
9 78 32 F
10 105 50 F
11 88 43 F
12 100 42 M
13 110 56 M
14 100 46 M
15 100 54 M
16 110 48 M
17 85 60 M
18 90 35 M
19 70 50 M
20 90 49 M
a. show preexisting difference between comparison and treatment,
no matching.
b. try to do a match by hand, finding a best match for each of the
treatment subjects.
c. use the 3 confounding variables to compute a propensity
score (for membership in the treatment); match subjects on the
propensity scores (i.e. nearest comparison to each treatment subject)
by hand, or use optmatch functions to do optimal matching either 1:1
or 1:2. See which provides better (less bad) balance in the covariates.
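A possible sketch for parts a and c, with the Table 6.6 data transcribed directly so it runs without the .dat file. The logistic propensity model is the standard choice here; matching each treatment subject to its nearest comparison with replacement is a simplification of the by-hand step (optmatch would do it without replacement).

```r
# Table 6.6 data: 6 treatment subjects, then 20 comparison-reservoir subjects
trt <- data.frame(DBP = c(94, 108, 100, 92, 65, 90),
                  age = c(39, 56, 50, 42, 45, 37),
                  sex = c("F", "F", "F", "F", "M", "M"))
res <- data.frame(
  DBP = c(80, 120, 85, 90, 90, 90, 108, 94, 78, 105, 88,
          100, 110, 100, 100, 110, 85, 90, 70, 90),
  age = c(35, 37, 50, 41, 47, 56, 53, 46, 32, 50, 43,
          42, 56, 46, 54, 48, 60, 35, 50, 49),
  sex = c(rep("F", 11), rep("M", 9)))
dat <- rbind(cbind(trt, Grp = 1), cbind(res, Grp = 0))

# part a: preexisting differences, no matching
tapply(dat$DBP, dat$Grp, mean)
tapply(dat$age, dat$Grp, mean)

# part c: logistic-regression propensity score for treatment membership
ps <- glm(Grp ~ DBP + age + sex, family = binomial, data = dat)$fitted

# nearest comparison subject for each treatment subject
# (with replacement, for simplicity)
near <- sapply(1:6, function(i) which.min(abs(ps[7:26] - ps[i])))
near   # reservoir row numbers of the matched comparisons
```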
---------------------------------------------
Problem 3. Extended Example.
Propensity scores versus regression adjustment, single confounder
Artificial data construction
1. start with 10000 subjects-- outcome measure Y
2. subjects belong to groups (G=0,1) based on probabilistic assignment
on a single unobserved variable X normal mean 10 variance 4
Prob(G = 1|X) = 1 - (1/(1 + 1/exp(-5 + .5*X)) )
3. Outcome measure Y also highly correlated with X.
Y = 1.2*G + X + u (u is Normal, mean 0, variance 1.69)
treatment effect is built in as 1.2 (about half a sd)
4. Besides Y and G, the observable that is available is a version
of X obscured by measurement error; let Z be a fallible version
of X with reliability about .72 (i.e. correlation about .85).
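The construction in steps 1-4 can be sketched in R; the seed is an arbitrary assumption, and the error standard deviations are chosen to hit the stated variances (u variance 1.69, reliability of Z about .72).

```r
set.seed(209)                         # seed is an arbitrary assumption
n <- 10000
X <- rnorm(n, mean = 10, sd = 2)      # step 2: unobserved confounder, variance 4
pG <- 1 - 1/(1 + 1/exp(-5 + .5 * X)) # Prob(G = 1 | X) as given
G <- rbinom(n, 1, pG)                 # probabilistic group assignment
Y <- 1.2 * G + X + rnorm(n, 0, 1.3)   # step 3: u has variance 1.69
# step 4: fallible Z with reliability var(X)/var(Z) = .72,
# so error variance = 4 * (1 - .72)/.72
Z <- X + rnorm(n, 0, sqrt(4 * (1 - .72)/.72))
cor(X, Z)                             # should be near sqrt(.72), about .85
```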
a. compare group differences on Z (preexisting diffs)
b. try out regression adjustment estimate for treatment effect--
Use observable Z as covariate. Compare with using X as covariate.
c. use Z to compute propensity score for each of 10000 subjects.
stratify into quintiles on propensity (as in Rubin, Arch Int Med,
from lecture). And compute a treatment/control comparison within
each of the 5 propensity strata. Also get overall comparison from
main effect in the 2x5 anova.
d. repeat part c using the unobservable X. Does X give better results?
e. which works better in the 1-dimensional case, propensity matching or
regression adjustment?
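One way to line up parts b and c side by side (the data construction from steps 1-4 is repeated so this sketch is self-contained; the seed and error SDs are assumptions):

```r
set.seed(209)                                   # assumed seed
n <- 10000
X <- rnorm(n, 10, 2)
G <- rbinom(n, 1, 1 - 1/(1 + 1/exp(-5 + .5 * X)))
Y <- 1.2 * G + X + rnorm(n, 0, 1.3)
Z <- X + rnorm(n, 0, sqrt(4 * (1 - .72)/.72))   # reliability .72

# part b: regression adjustment with fallible Z vs. true X
coef(lm(Y ~ G + Z))["G"]   # biased: measurement error undercorrects
coef(lm(Y ~ G + X))["G"]   # should recover about 1.2

# part c: propensity from Z, quintile strata, within-stratum comparisons
ps <- glm(G ~ Z, family = binomial)$fitted
strat <- cut(ps, quantile(ps, 0:5/5), include.lowest = TRUE, labels = 1:5)
within <- sapply(levels(strat), function(s)
  mean(Y[G == 1 & strat == s]) - mean(Y[G == 0 & strat == s]))
within                         # treatment/control gap in each stratum
coef(lm(Y ~ G + strat))["G"]   # overall G main effect in the 2x5 anova
```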
---------------------------------------------------------
Problem 4
Cochran subclassification for a confounding variable
Week 8 class example, age as a confounder on effects of (cig) smoking.
Use lalonde data (Lab4) as play data to show a simple implementation
of subclassification adjustment with re78 as outcome, treatment group comparison,
and just consider age as the confounder.
> library(MatchIt)
> data(lalonde)
> head(lalonde)
treat age educ black hispan married nodegree re74 re75 re78
NSW1 1 37 11 1 0 1 1 0 0 9930.0460
NSW2 1 22 9 0 1 0 1 0 0 3595.8940
NSW3 1 30 12 1 0 0 0 0 0 24909.4500
NSW4 1 27 11 1 0 0 1 0 0 7506.1460
NSW5 1 33 8 1 0 0 1 0 0 289.7899
NSW6 1 22 9 1 0 0 1 0 0 4056.4940
> attach(lalonde)
> table(treat) # 185 in job training
treat
0 1
429 185
> tapply(re78, treat, mean) # oh my, seems better off with no job training, can the Republicans be right?
0 1
6984.170 6349.144
> tapply(age, treat, mean) # there is a mean age diff
0 1
28.03030 25.81622
> tapply(age, treat, fivenum) # same medians, but some controls older
$`0`
[1] 16 19 25 35 55
$`1`
[1] 17 20 25 29 48
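A minimal sketch of the subclassification adjustment, assuming the MatchIt package is installed for the lalonde data; five age strata and treated-count weights are arbitrary choices for illustration.

```r
library(MatchIt)            # for the lalonde data used in Lab 4
data(lalonde)

# five age subclasses at the quintiles (unique() guards duplicate breaks)
brks <- unique(quantile(lalonde$age, 0:5/5))
strat <- cut(lalonde$age, brks, include.lowest = TRUE)

# within-stratum treatment/control differences on re78
within <- sapply(split(lalonde, strat), function(d)
  mean(d$re78[d$treat == 1]) - mean(d$re78[d$treat == 0]))

# combine across strata, weighting by number of treated in each stratum
wts <- tapply(lalonde$treat == 1, strat, sum)
raw <- with(lalonde, mean(re78[treat == 1]) - mean(re78[treat == 0]))
c(unadjusted = raw, age.subclassified = sum(within * wts) / sum(wts))
```

Compare the two estimates: the raw comparison is confounded by the age difference shown above, and the subclassified estimate removes the part of that difference explained within age strata.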
==========================
Lab 4 has extended matching, propensity score activities
===========================
end homework 8 2019