###### Week 7, RQ1 Compliance example

> library(AER) # current popular ivreg function
> install.packages("ivmodel") # new package from H Kang, CC_7
> library(ivmodel)


>  efdat = read.table(file = "http://web.stanford.edu/~rag/stat209/hw7efdata", header = T)
>  attach(efdat)
>  head(efdat)
   comp G     Y
1 0.528 0 -9.57
2 0.862 1 58.50
3 0.980 0 10.10
4 0.673 1 33.90
5 0.660 0  3.60
6 0.551 1 29.10
> ## compliance data (measured comp) outcome Y, group G
> # see also week 6 RQ3 description

> ## start by using ivreg in AER (also week6RQ3)
> # note Comp*G is the actual dose of cholestyramine received
> caceIV2 = ivreg(Y ~ I(comp*G)|G) # outcome Y, dose comp*G, group G

>  summary(caceIV2)
Call:
ivreg(formula = Y ~ I(comp * G) | G)

Residuals:
     Min       1Q   Median       3Q      Max 
-28.5645  -6.1078  -0.2812   6.5362  29.6355 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6645     0.8059   11.99   <2e-16 ***
I(comp * G)  33.2189     1.9021   17.46   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.871 on 298 degrees of freedom
Multiple R-Squared: 0.5814,     Adjusted R-squared:  0.58 
Wald test:   305 on 1 and 298 DF,  p-value: < 2.2e-16 

>  confint(caceIV2)
                2.5 %   97.5 %
(Intercept)  8.084849 11.24406
I(comp * G) 29.490842 36.94686
> ## data were built so that treatment effect at perfect compliance about 35, so not bad

> # from ivreg you can get a little more
>  summary(caceIV2, diagnostics = TRUE)

Call:
ivreg(formula = Y ~ I(comp * G) | G)

Residuals:
     Min       1Q   Median       3Q      Max 
-28.5645  -6.1078  -0.2812   6.5362  29.6355 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6645     0.8059   11.99   <2e-16 ***
I(comp * G)  33.2189     1.9021   17.46   <2e-16 ***

Diagnostic tests:
                 df1 df2 statistic p-value    
Weak instruments   1 298  1450.892 < 2e-16 ***
Wu-Hausman         1 297     7.281 0.00737 ** 
Sargan             0  NA        NA      NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.871 on 298 degrees of freedom
Multiple R-Squared: 0.5814,     Adjusted R-squared:  0.58 
Wald test:   305 on 1 and 298 DF,  p-value: < 2.2e-16 

> # Hausman is this very strange notion that you should worry if IV estimate is far from OLS, even though you believe OLS is bad


> ### Now try out ivmodel, presented in Comp Co #7
> ?ivmodel
starting httpd help server ... done
> cace_IVmodel = ivmodel(Y , I(comp*G), G)
Error in ivmodel(Y, I(comp * G), G) : D is not a numeric vector.
  ## ivmodel doesn't like the simple function (that ivreg takes fine)
> str(efdat)
'data.frame':   300 obs. of  3 variables:
 $ comp: num  0.528 0.862 0.98 0.673 0.66 0.551 0.444 0.654 0.21 0.682 ...
 $ G   : int  0 1 0 1 0 1 0 1 0 1 ...
 $ Y   : num  -9.57 58.5 10.1 33.9 3.6 29.1 -0.374 35.2 26.8 28.6 ...
> D = efdat$comp*efdat$G # ok, I give
> D
  [1] 0.000 0.862 0.000 0.673 0.000 0.551 0.000 0.654 0.000 0.682 0.000 0.416 0.000 0.999 0.000 0.352 0.000
 [18] 0.927 0.000 0.528 0.000 0.886 0.000 0.804 0.000 0.779 0.000 0.308 0.000 0.465 0.000 0.575 0.000 0.427
 [35] 0.000 0.831 0.000 0.852 0.000 0.462 0.000 0.527 0.000 0.703 0.000 0.799 0.000 0.882 0.000 0.487 0.000
 [52] 0.557 0.000 0.322 0.000 0.592 0.000 0.330 0.000 0.543 0.000 0.692 0.000 0.664 0.000 0.657 0.000 0.548
 [69] 0.000 0.468 0.000 0.746 0.000 0.610 0.000 0.492 0.000 0.259 0.000 0.231 0.000 0.418 0.000 0.728 0.000
 [86] 0.631 0.000 0.653 0.000 0.801 0.000 0.790 0.000 0.787 0.000 0.230 0.000 0.615 0.000 0.652 0.000 0.565
[103] 0.000 0.311 0.000 0.771 0.000 0.838 0.000 0.546 0.000 0.448 0.000 0.683 0.000 0.776 0.000 0.887 0.000
[120] 0.453 0.000 0.510 0.000 0.813 0.000 0.510 0.000 0.711 0.000 0.166 0.000 0.844 0.000 0.708 0.000 0.854
[137] 0.000 0.362 0.000 0.722 0.000 0.675 0.000 0.640 0.000 0.737 0.000 0.440 0.000 0.547 0.000 0.365 0.000
[154] 0.542 0.000 0.618 0.000 0.711 0.000 0.483 0.000 0.963 0.000 0.921 0.000 0.440 0.000 0.771 0.000 0.471
[171] 0.000 0.416 0.000 0.569 0.000 0.551 0.000 0.704 0.000 0.605 0.000 0.887 0.000 0.547 0.000 0.715 0.000
[188] 0.423 0.000 0.655 0.000 0.544 0.000 0.757 0.000 0.711 0.000 0.600 0.000 0.723 0.000 0.885 0.000 0.799
[205] 0.000 0.530 0.000 0.465 0.000 0.782 0.000 0.988 0.000 0.458 0.000 0.551 0.000 0.546 0.000 0.683 0.000
[222] 0.573 0.000 0.799 0.000 0.497 0.000 0.602 0.000 0.300 0.000 0.544 0.000 0.551 0.000 0.281 0.000 0.490
[239] 0.000 0.115 0.000 0.771 0.000 0.756 0.000 0.461 0.000 0.988 0.000 0.249 0.000 0.319 0.000 0.861 0.000
[256] 0.578 0.000 0.453 0.000 0.810 0.000 0.479 0.000 0.714 0.000 0.496 0.000 0.437 0.000 0.273 0.000 0.678
[273] 0.000 0.755 0.000 0.414 0.000 0.541 0.000 0.548 0.000 0.361 0.000 0.251 0.000 0.421 0.000 0.519 0.000
[290] 0.580 0.000 0.926 0.000 0.500 0.000 0.225 0.000 0.907 0.000 0.517

> cace_IVmodel = ivmodel( Y = efdat$Y , D = D, Z =efdat$G)
> summary(cace_IVmodel)

Call:
ivmodel(Y = efdat$Y, D = D, Z = efdat$G)
sample size: 300
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

First Stage Regression Result:

F=1450.892, df1=1, df2=298, p-value is < 2.22e-16
R-squared=0.8296064,   Adjusted R-squared=0.8290346
Residual standard error: 0.1362384 on 299 degrees of freedom
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

Coefficients of k-Class Estimators:

             k Estimate Std. Error t value Pr(>|t|)    
OLS     0.0000  35.3104     1.7282   20.43   <2e-16 ***
Fuller  0.9966  33.2273     1.9014   17.48   <2e-16 ***
TSLS    1.0000  33.2189     1.9021   17.46   <2e-16 ***
LIML    1.0000  33.2189     1.9021   17.46   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

Alternative tests for the treatment effect under H_0: beta=0.

Anderson-Rubin test:
F=223.3544, df1=1, df2=298, p-value=< 2.22e-16
95 percent confidence interval:
 [ 29.4376376391079 , 36.9343596107468 ]

Conditional Likelihood Ratio test:
Test Stat=223.3544, p-value=< 2.22e-16
95 percent confidence interval:
 [29.4376406035085, 36.934356748474]

> confint(cace_IVmodel)
           2.5%    97.5%
OLS    31.90932 38.71146
Fuller 29.48546 36.96915
TSLS   29.47564 36.96206
LIML   29.47564 36.96206
AR     29.43764 36.93436
CLR    29.43764 36.93436


# for these small and simple data, pretty much the same
#####################################
# afterthoughts below
> # so you might ask, since dose-response works, why do you need any of this IV stuff? a thought
> ols = lm(Y ~ G) # intent to treat
> summary(ols)
Call:
lm(formula = Y ~ G)

Residuals:
     Min       1Q   Median       3Q      Max 
-31.1399  -7.2358  -0.3272   6.9551  29.6355 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6645     0.9418   10.26   <2e-16 ***
G            19.9054     1.3319   14.95   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.53 on 298 degrees of freedom
Multiple R-squared:  0.4284,    Adjusted R-squared:  0.4265 
F-statistic: 223.4 on 1 and 298 DF,  p-value: < 2.2e-16

> ols_2 = lm(Y ~ I(G*comp))  # simple dose response, no IV; matched ivmodel result for OLS
> summary(ols_2)

Call:
lm(formula = Y ~ I(G * comp))

Residuals:
     Min       1Q   Median       3Q      Max 
-27.9378  -6.0493   0.0842   6.5473  30.2622 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.038      0.769   11.75   <2e-16 ***
I(G * comp)   35.310      1.728   20.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.846 on 298 degrees of freedom
Multiple R-squared:  0.5835,    Adjusted R-squared:  0.5821 
F-statistic: 417.5 on 1 and 298 DF,  p-value: < 2.2e-16