Stat209/Ed260 D Rogosa   2/24/18

Solutions Assignment 7.
Compliance and experimental protocols; intent to treat

Problem 1
Non-compliance.

a. ITT estimate is 639 - 380 = 259 excess deaths per 100000 from no Vitamin A
(or reverse the sign: 259 fewer deaths per 100000 with Vitamin A).
b. Proportion complying in the treatment group is .8. Control group compliance
was perfect (as stated).
c. IV estimate is ITT/.8 = 259/.8, which has the interpretation of
CACE, the complier average causal effect: about 324 deaths per 100000.
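
The arithmetic for Problem 1 can be checked directly in R. This is a minimal sketch using only the figures quoted in the problem; the variable names are illustrative and not part of the assignment.

```r
# Problem 1 arithmetic; variable names are illustrative
deaths_control <- 639   # deaths per 100000, no Vitamin A
deaths_treat   <- 380   # deaths per 100000, assigned to Vitamin A
p_comply       <- 0.8   # proportion complying in the treatment group

itt  <- deaths_control - deaths_treat   # intent-to-treat difference
cace <- itt / p_comply                  # IV (Wald) estimate = ITT / compliance
print(c(ITT = itt, CACE = cace))
```

The ratio is 323.75, which rounds to the 324 per 100000 reported for the CACE.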


---------------------------


Problem 2

> # problem 2
> # mean compliance in JHU PIRC study is reported as .479
> # ITT estimate is difference in group means in change in anti-social behavior
> # ITT =.364
> # CACE estimate is ITT/.479 = .76 ; usual compliance adjustment to ITT
See the slides for the Booil Jo examples in the class handout, also linked on the class page at
http://www-stat.stanford.edu/~rag/stat209/jorogosa06.pdf
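
The same compliance adjustment for the JHU PIRC numbers, as a short check (a sketch using only the two values quoted in Problem 2; variable names are illustrative):

```r
# Problem 2 arithmetic; variable names are illustrative
itt_pirc  <- 0.364   # ITT: difference in group means, change in anti-social behavior
p_comply  <- 0.479   # reported mean compliance
cace_pirc <- itt_pirc / p_comply   # usual compliance adjustment to ITT
print(round(cace_pirc, 2))
```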

part 2
Shown in
Don Rubin, "For Objective Causal Inference, Design Trumps Analysis", posted at
http://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/trumps.pdf


====================

Problem 3
Artificial data in the image of Efron-Feldman
# cholesterol reduction outcome measure

> library(AER)
> efdat = read.table(file = "http://web.stanford.edu/~rag/stat209/hw7efdata", header = T)
> attach(efdat)
> head(efdat)
   comp G     Y
1 0.528 0 -9.57
2 0.862 1 58.50
3 0.980 0 10.10
4 0.673 1 33.90
5 0.660 0  3.60
6 0.551 1 29.10
> table(G)
G
  0   1 
150 150 
###### got data

> t.test(Y ~ G)  # ITT estimate about 20 (not using compliance info),
# approximating the E-F paper analysis

        Welch Two Sample t-test

data:  Y by G
t = -14.9477, df = 275.264, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -22.52955 -17.28584
sample estimates:
mean in group 1 mean in group 2 
       9.663234       29.570932 

# try an IV with compliance as a measured variable 
# and random assignment as instrument (analog to CACE)
# Doesn't seem to work well, but see below

> caceIV = ivreg(Y ~ comp|G)
> confint(caceIV)
                2.5 %   97.5 %
(Intercept) -4262.740 2386.813
comp        -3991.215 7220.521  # pretty wide
> summary(caceIV)

Call:
ivreg(formula = Y ~ comp | G)

Residuals:
    Min      1Q  Median      3Q     Max 
-634.86 -202.04   22.26  196.10  840.55 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     -938       1696  -0.553    0.581
comp            1615       2860   0.565    0.573

Residual standard error: 305.4 on 298 degrees of freedom
Multiple R-Squared: -399.6,     Adjusted R-squared: -400.9 
Wald test: 0.3187 on 1 and 298 DF,  p-value: 0.5728 

# But, as you probably presumed, that's not really the right analysis.
# The research question is: what is the effect of the drug?
# The correct IV picture is as displayed in the Greenland handout:
# in the treatment group, dose of the drug is confounded with compliance;
# in the control group, dose of the drug is 0 (regardless of compliance level).
# So the correct analog to AIR (Angrist-Imbens-Rubin) is
> caceIV2 = ivreg(Y ~ I(comp*G)|G)
> summary(caceIV2)

Call:
ivreg(formula = Y ~ I(comp * G) | G)

Residuals:
     Min       1Q   Median       3Q      Max 
-28.5645  -6.1078  -0.2812   6.5362  29.6355 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6645     0.8059   11.99   <2e-16 ***
I(comp * G)  33.2189     1.9021   17.46   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.871 on 298 degrees of freedom
Multiple R-Squared: 0.5814,     Adjusted R-squared:  0.58 
Wald test:   305 on 1 and 298 DF,  p-value: < 2.2e-16 

> # not far from the E-F paper estimate of 34.5 for the ("CACE") effect (cholesterol reduction) for perfect compliers
> # (these data are an imitation, not an exact recreation, of E-F)
> confint(caceIV2)
                2.5 %   97.5 %
(Intercept)  8.084849 11.24406
I(comp * G) 29.490842 36.94686
> # nice, tight interval
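
With a binary instrument, the IV slope is the Wald ratio: the ITT difference in the outcome divided by the ITT difference in dose. The sketch below illustrates that identity in base R on simulated data. It does not use the course data file; the generating values (baseline 10, effect 33, uniform compliance, noise sd 10) and all variable names are assumptions chosen only to resemble the output above.

```r
# Simulated stand-in for efdat (assumed values, not the course data)
set.seed(209)
n    <- 300
G    <- rep(c(0, 1), each = n / 2)              # random assignment (instrument)
comp <- runif(n, 0.2, 1)                        # compliance, measured in both arms
dose <- comp * G                                # drug actually taken; 0 in control
Y    <- 10 + 33 * dose + rnorm(n, sd = 10)      # outcome depends on dose

# Wald ratio: ITT effect on Y divided by ITT effect on dose
itt_Y    <- mean(Y[G == 1]) - mean(Y[G == 0])
itt_dose <- mean(dose[G == 1]) - mean(dose[G == 0])
wald     <- itt_Y / itt_dose

# Two-stage least squares by hand
stage1 <- lm(dose ~ G)                          # first stage: dose on instrument
stage2 <- lm(Y ~ fitted(stage1))                # second stage: Y on predicted dose
tsls   <- unname(coef(stage2)[2])

print(c(ITT = itt_Y, Wald = wald, TSLS = tsls))
```

The Wald and hand-computed two-stage slopes agree. The standard errors printed by the second-stage lm() are not the correct IV standard errors; ivreg() computes those properly.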

################################################
# Now go back to conventional Rubin approach with compliance as 0,1

> efdat$compT = efdat$comp > .8  # comp indicator
# calculate the simpler form of CACE with the traditional .8 cutpoint;
# this yields low compliance, 15%, so maybe not the best choice of cutpoint
> mean(efdat$compT) 
[1] 0.15

> 20/.15  # divide ITT by proportion defined as compliant
[1] 133.3333    
> # wow

> caceIVT = ivreg(Y ~ compT|G, data = efdat)
> summary(caceIVT)
Call:
ivreg(formula = Y ~ compT | G, data = efdat)

Residuals:
   Min     1Q Median     3Q    Max 
-872.8  131.8  143.8  155.9  185.8 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -129.7      306.5  -0.423    0.673
compTTRUE      995.3     2039.0   0.488    0.626  # a good bit different from ITT of 20; CACE above is 133

Residual standard error: 353.2 on 298 degrees of freedom
Multiple R-Squared: -534.8,     Adjusted R-squared: -536.6 
Wald test: 0.2383 on 1 and 298 DF,  p-value: 0.6258 

 
> confint(caceIVT)
                 2.5 %    97.5 %
(Intercept)  -730.4593  471.1126
compTTRUE   -3001.0986 4991.6386       # shows the IV version of CACE can be almost anywhere
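
One way to see why the two IV fits that use compliance itself are so unstable: with a binary instrument, the IV slope is the ITT difference in Y divided by the between-arm difference in the regressor. The sketch below back-calculates those denominators from the printed estimates. The input figures are copied from the transcript; the implied differences are derived here and are not values reported in the output.

```r
# ITT from the group means in the t.test output
itt <- 29.570932 - 9.663234

# Slopes printed by summary() for the IV fits using compliance as regressor
slope_comp  <- 1615     # ivreg(Y ~ comp  | G)
slope_compT <- 995.3    # ivreg(Y ~ compT | G)

# Implied between-arm difference in the regressor = ITT / slope
implied_diff <- c(comp = itt / slope_comp, compT = itt / slope_compT)
print(round(implied_diff, 4))
```

The implied differences are roughly 0.012 for comp and 0.020 for compT. Because compliance is measured in both arms and barely differs between them, the denominator is close to zero, so the ratio and its interval blow up.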

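# but, as above, the right IV analysis uses dose (which equals compT in the treatment group, 0 in control)
> efdat$compT = efdat$comp > .8  # comp indicator (same definition as above)
> attach(efdat)  # re-attach so the new compT column is on the search path
> caceIV2 = ivreg(Y ~ I(compT*G)|G) # pretending dose is (0,1); overwrites the earlier caceIV2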

> summary(caceIV2)  # gives about the usual CACE, way off from the E-F effect for perfect compliance

Call:
ivreg(formula = Y ~ I(compT * G) | G)

Residuals:
     Min       1Q   Median       3Q      Max 
-114.173   -4.502    5.436   16.561   46.436 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.664      2.447   3.950 9.78e-05 ***
I(compT * G)  124.409     21.628   5.752 2.18e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 29.97 on 298 degrees of freedom
Multiple R-Squared: -2.858,     Adjusted R-squared: -2.871 
Wald test: 33.09 on 1 and 298 DF,  p-value: 2.184e-08 

> confint(caceIV2)
                 2.5 %    97.5 %
(Intercept)   4.868521  14.46039
I(compT * G) 82.018297 166.79920
> 

> cor(efdat$G, efdat$compT)
[1] 0.0280056

# G is a pretty weak instrument here
> cor.test(efdat$G, as.numeric(efdat$compT))

        Pearson's product-moment correlation

data:  efdat$G and as.numeric(efdat$compT)
t = 0.4836, df = 298, p-value = 0.629
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.08550641  0.14079991
sample estimates:
      cor 
0.0280056 

> cor.test(efdat$G, efdat$comp)

        Pearson's product-moment correlation

data:  efdat$G and efdat$comp
t = 0.5565, df = 298, p-value = 0.5783
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.08131855  0.14493103
sample estimates:
       cor 
0.03221898 
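
The near-zero correlations above are what random assignment would be expected to produce when compliance is measured in both arms, whereas dose (comp*G) is strongly tied to G. A small simulated illustration follows. The uniform compliance distribution and the seed are assumptions, and this is not the course data.

```r
# Simulated illustration (assumed compliance distribution, not the course data)
set.seed(7)
n    <- 300
G    <- rep(c(0, 1), each = n / 2)   # random assignment
comp <- runif(n, 0.2, 1)             # compliance, same distribution in both arms

r_comp <- cor(G, comp)               # instrument vs. compliance: near 0
r_dose <- cor(G, comp * G)           # instrument vs. dose: strong
print(round(c(comp = r_comp, dose = r_dose), 3))
```

In this setup G is a weak instrument for compliance itself but a strong instrument for dose, which is consistent with the wide intervals for the compliance fits and the tight interval for the dose fit.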




----------------------------------
end solutions HW7 2018