Week 9 RQ1 (first part) Solutions Stat266
Artificial data in the image of Efron-Feldman
# cholesterol reduction outcome measure
> library(AER)
> efdat = read.table(file = "http://web.stanford.edu/~rag/stat209/hw7efdata", header = T)
> attach(efdat)
> head(efdat)
comp G Y
1 0.528 0 -9.57
2 0.862 1 58.50
3 0.980 0 10.10
4 0.673 1 33.90
5 0.660 0 3.60
6 0.551 1 29.10
> table(G)
G
0 1
150 150
###### got data
> t.test(Y ~ G) # ITT estimate about 20, (not using compliance info)
# approximating E-F paper analysis
Welch Two Sample t-test
data: Y by G
t = -14.9477, df = 275.264, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-22.52955 -17.28584
sample estimates:
mean in group 1 mean in group 2
9.663234 29.570932
# try an IV with compliance as a measured variable
# and random assignment as instrument (analog to CACE)
# Doesn't seem to work well, but see below
> caceIV = ivreg(Y ~ comp|G)
> confint(caceIV)
2.5 % 97.5 %
(Intercept) -4262.740 2386.813
comp -3991.215 7220.521 # pretty wide
> summary(caceIV)
Call:
ivreg(formula = Y ~ comp | G)
Residuals:
Min 1Q Median 3Q Max
-634.86 -202.04 22.26 196.10 840.55
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -938 1696 -0.553 0.581
comp 1615 2860 0.565 0.573
Residual standard error: 305.4 on 298 degrees of freedom
Multiple R-Squared: -399.6, Adjusted R-squared: -400.9
Wald test: 0.3187 on 1 and 298 DF, p-value: 0.5728
# but as you probably presumed that's not really the right analysis
# the research question is, what is the effect of the drug?
# the correct IV picture is as displayed in the Greenland handout
# in the treatment group dose of the drug is confounded with compliance
# in the control group dose of the drug is 0 (regardless of compliance level)
# so the correct analog to AIR is
> caceIV2 = ivreg(Y ~ I(comp*G)|G)
> summary(caceIV2)
Call:
ivreg(formula = Y ~ I(comp * G) | G)
Residuals:
Min 1Q Median 3Q Max
-28.5645 -6.1078 -0.2812 6.5362 29.6355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6645 0.8059 11.99 <2e-16 ***
I(comp * G) 33.2189 1.9021 17.46 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.871 on 298 degrees of freedom
Multiple R-Squared: 0.5814, Adjusted R-squared: 0.58
Wald test: 305 on 1 and 298 DF, p-value: < 2.2e-16
> # not far from the E-F paper estimate of 34.5 for ("CACE") effect (cholesterol reduction) for perfect compliers
(these data are an imitation not exact recreation of E-F)
> confint(caceIV2)
2.5 % 97.5 %
(Intercept) 8.084849 11.24406
I(comp * G) 29.490842 36.94686
> # nice, tight interval
################################################
# Now go back to conventional Rubin approach with compliance as 0,1
> efdat$compT = efdat$comp > .8 # comp indicator
# calculate simpler form of CACE with traditional .8 cutpoint,
# yields with low compliance, 15%, maybe not the best choice of cut
> mean(efdat$compT)
[1] 0.15
> 20/.15 # divide ITT by proportion defined as compliant
[1] 133.3333
> # wow
> caceIVT = ivreg(Y ~ compT|G, data = efdat)
> summary(caceIVT)
Call:
ivreg(formula = Y ~ compT | G, data = efdat)
Residuals:
Min 1Q Median 3Q Max
-872.8 131.8 143.8 155.9 185.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -129.7 306.5 -0.423 0.673
compTTRUE 995.3 2039.0 0.488 0.626 # good bit diff than ITT of 20, CACE above is 133
Residual standard error: 353.2 on 298 degrees of freedom
Multiple R-Squared: -534.8, Adjusted R-squared: -536.6
Wald test: 0.2383 on 1 and 298 DF, p-value: 0.6258
> confint(caceIVT)
2.5 % 97.5 %
(Intercept) -730.4593 471.1126
compTTRUE -3001.0986 4991.6386 # here shows CACE can be anywhere for IV version
# but as in above the right IV analysis uses dose (which equals compT in treat, 0 in control)
> efdat$compT = efdat$comp > .8 # comp indicator
> attach(efdat)
> caceIV2 = ivreg(Y ~ I(compT*G)|G) # pretending dose is (0,1)
> summary(caceIV2) # gives about usual case, wayoff from E-F effect for perfect compliance
Call:
ivreg(formula = Y ~ I(compT * G) | G)
Residuals:
Min 1Q Median 3Q Max
-114.173 -4.502 5.436 16.561 46.436
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.664 2.447 3.950 9.78e-05 ***
I(compT * G) 124.409 21.628 5.752 2.18e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 29.97 on 298 degrees of freedom
Multiple R-Squared: -2.858, Adjusted R-squared: -2.871
Wald test: 33.09 on 1 and 298 DF, p-value: 2.184e-08
> confint(caceIV2)
2.5 % 97.5 %
(Intercept) 4.868521 14.46039
I(compT * G) 82.018297 166.79920
>
> cor(efdat$G, efdat$compT)
[1] 0.0280056
# G a pretty weak instrument here
> cor.test(efdat$G, as.numeric(efdat$compT))
Pearson's product-moment correlation
data: efdat$G and as.numeric(efdat$compT)
t = 0.4836, df = 298, p-value = 0.629
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.08550641 0.14079991
sample estimates:
cor
0.0280056
> cor.test(efdat$G, efdat$comp)
Pearson's product-moment correlation
data: efdat$G and efdat$comp
t = 0.5565, df = 298, p-value = 0.5783
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.08131855 0.14493103
sample estimates:
cor
0.03221898