###### Week 7, RQ1 Compliance example > library(AER) # current popular ivreg function > install.packages("ivmodel") # new package from H Kang, CC_7 > library(ivmodel) > efdat = read.table(file = "http://web.stanford.edu/~rag/stat209/hw7efdata", header = T) > attach(efdat) > head(efdat) comp G Y 1 0.528 0 -9.57 2 0.862 1 58.50 3 0.980 0 10.10 4 0.673 1 33.90 5 0.660 0 3.60 6 0.551 1 29.10 > ## compliance data (measured comp) outcome Y, group G > # see also week 6 RQ3 description > ## start by using ivreg in AER (also week6RQ3) > # note Comp*G is the actual dose of cholestyramine received > caceIV2 = ivreg(Y ~ I(comp*G)|G) # outcome Y, dose comp*G, group G > summary(caceIV2) Call: ivreg(formula = Y ~ I(comp * G) | G) Residuals: Min 1Q Median 3Q Max -28.5645 -6.1078 -0.2812 6.5362 29.6355 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.6645 0.8059 11.99 <2e-16 *** I(comp * G) 33.2189 1.9021 17.46 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 9.871 on 298 degrees of freedom Multiple R-Squared: 0.5814, Adjusted R-squared: 0.58 Wald test: 305 on 1 and 298 DF, p-value: < 2.2e-16 > confint(caceIV2) 2.5 % 97.5 % (Intercept) 8.084849 11.24406 I(comp * G) 29.490842 36.94686 > ## data were built so that treatment effect at perfect compliance about 35, so not bad > # from ivreg you can get a little more > summary(caceIV2, diagnostics = TRUE) Call: ivreg(formula = Y ~ I(comp * G) | G) Residuals: Min 1Q Median 3Q Max -28.5645 -6.1078 -0.2812 6.5362 29.6355 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.6645 0.8059 11.99 <2e-16 *** I(comp * G) 33.2189 1.9021 17.46 <2e-16 *** Diagnostic tests: df1 df2 statistic p-value Weak instruments 1 298 1450.892 < 2e-16 *** Wu-Hausman 1 297 7.281 0.00737 ** Sargan 0 NA NA NA --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 9.871 on 298 degrees of freedom Multiple R-Squared: 0.5814, Adjusted R-squared: 0.58 Wald test: 305 on 1 and 298 DF, p-value: < 2.2e-16 > # Hausman is this very strange notion that you should worry if IV estimate is far from OLS, even though you believe OLS is bad > ### Now try out ivmodel, presented in Comp Co #7 > ?ivmodel starting httpd help server ... done > cace_IVmodel = ivmodel(Y , I(comp*G), G) Error in ivmodel(Y, I(comp * G), G) : D is not a numeric vector. ## ivmodel doesn't like the simple function (that ivreg takes fine) > str(efdat) 'data.frame': 300 obs. of 3 variables: $ comp: num 0.528 0.862 0.98 0.673 0.66 0.551 0.444 0.654 0.21 0.682 ... $ G : int 0 1 0 1 0 1 0 1 0 1 ... $ Y : num -9.57 58.5 10.1 33.9 3.6 29.1 -0.374 35.2 26.8 28.6 ... > D = efdat$comp*efdat$G # ok, I give > D [1] 0.000 0.862 0.000 0.673 0.000 0.551 0.000 0.654 0.000 0.682 0.000 0.416 0.000 0.999 0.000 0.352 0.000 [18] 0.927 0.000 0.528 0.000 0.886 0.000 0.804 0.000 0.779 0.000 0.308 0.000 0.465 0.000 0.575 0.000 0.427 [35] 0.000 0.831 0.000 0.852 0.000 0.462 0.000 0.527 0.000 0.703 0.000 0.799 0.000 0.882 0.000 0.487 0.000 [52] 0.557 0.000 0.322 0.000 0.592 0.000 0.330 0.000 0.543 0.000 0.692 0.000 0.664 0.000 0.657 0.000 0.548 [69] 0.000 0.468 0.000 0.746 0.000 0.610 0.000 0.492 0.000 0.259 0.000 0.231 0.000 0.418 0.000 0.728 0.000 [86] 0.631 0.000 0.653 0.000 0.801 0.000 0.790 0.000 0.787 0.000 0.230 0.000 0.615 0.000 0.652 0.000 0.565 [103] 0.000 0.311 0.000 0.771 0.000 0.838 0.000 0.546 0.000 0.448 0.000 0.683 0.000 0.776 0.000 0.887 0.000 [120] 0.453 0.000 0.510 0.000 0.813 0.000 0.510 0.000 0.711 0.000 0.166 0.000 0.844 0.000 0.708 0.000 0.854 [137] 0.000 0.362 0.000 0.722 0.000 0.675 0.000 0.640 0.000 0.737 0.000 0.440 0.000 0.547 0.000 0.365 0.000 [154] 0.542 0.000 0.618 0.000 0.711 0.000 0.483 0.000 0.963 0.000 0.921 0.000 0.440 0.000 0.771 0.000 0.471 [171] 0.000 0.416 0.000 0.569 0.000 0.551 0.000 0.704 0.000 0.605 0.000 0.887 0.000 0.547 0.000 0.715 0.000 [188] 0.423 0.000 0.655 0.000 0.544 0.000 0.757 0.000 0.711 0.000 0.600 0.000 0.723 0.000 0.885 0.000 0.799 [205] 0.000 0.530 0.000 0.465 0.000 0.782 0.000 0.988 0.000 0.458 0.000 0.551 0.000 0.546 0.000 0.683 0.000 [222] 0.573 0.000 0.799 0.000 0.497 0.000 0.602 0.000 0.300 0.000 0.544 0.000 0.551 0.000 0.281 0.000 0.490 [239] 0.000 0.115 0.000 0.771 0.000 0.756 0.000 0.461 0.000 0.988 0.000 0.249 0.000 0.319 0.000 0.861 0.000 [256] 0.578 0.000 0.453 0.000 0.810 0.000 0.479 0.000 0.714 0.000 0.496 0.000 0.437 0.000 0.273 0.000 0.678 [273] 0.000 0.755 0.000 0.414 0.000 0.541 0.000 0.548 0.000 0.361 0.000 0.251 0.000 0.421 0.000 0.519 0.000 [290] 0.580 0.000 0.926 0.000 0.500 0.000 0.225 0.000 0.907 0.000 0.517 > cace_IVmodel = ivmodel( Y = efdat$Y , D = D, Z =efdat$G) > summary(cace_IVmodel) Call: ivmodel(Y = efdat$Y, D = D, Z = efdat$G) sample size: 300 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ First Stage Regression Result: F=1450.892, df1=1, df2=298, p-value is < 2.22e-16 R-squared=0.8296064, Adjusted R-squared=0.8290346 Residual standard error: 0.1362384 on 299 degrees of freedom _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Coefficients of k-Class Estimators: k Estimate Std. Error t value Pr(>|t|) OLS 0.0000 35.3104 1.7282 20.43 <2e-16 *** Fuller 0.9966 33.2273 1.9014 17.48 <2e-16 *** TSLS 1.0000 33.2189 1.9021 17.46 <2e-16 *** LIML 1.0000 33.2189 1.9021 17.46 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Alternative tests for the treatment effect under H_0: beta=0. Anderson-Rubin test: F=223.3544, df1=1, df2=298, p-value=< 2.22e-16 95 percent confidence interval: [ 29.4376376391079 , 36.9343596107468 ] Conditional Likelihood Ratio test: Test Stat=223.3544, p-value=< 2.22e-16 95 percent confidence interval: [29.4376406035085, 36.934356748474] > confint(cace_IVmodel) 2.5% 97.5% OLS 31.90932 38.71146 Fuller 29.48546 36.96915 TSLS 29.47564 36.96206 LIML 29.47564 36.96206 AR 29.43764 36.93436 CLR 29.43764 36.93436 # for these small and simple data, pretty much the same ##################################### # afterthoughts below > # so you might ask, since dose-response works, why do you need any of this IV stuff? a thought > ols = lm(Y ~ G) # intent to treat > summary(ols) Call: lm(formula = Y ~ G) Residuals: Min 1Q Median 3Q Max -31.1399 -7.2358 -0.3272 6.9551 29.6355 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.6645 0.9418 10.26 <2e-16 *** G 19.9054 1.3319 14.95 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 11.53 on 298 degrees of freedom Multiple R-squared: 0.4284, Adjusted R-squared: 0.4265 F-statistic: 223.4 on 1 and 298 DF, p-value: < 2.2e-16 > ols_2 = lm(Y ~ I(G*comp)) # simple dose response, no IV; matched ivmodel result for OLS > summary(ols_2) Call: lm(formula = Y ~ I(G * comp)) Residuals: Min 1Q Median 3Q Max -27.9378 -6.0493 0.0842 6.5473 30.2622 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.038 0.769 11.75 <2e-16 *** I(G * comp) 35.310 1.728 20.43 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 9.846 on 298 degrees of freedom Multiple R-squared: 0.5835, Adjusted R-squared: 0.5821 F-statistic: 417.5 on 1 and 298 DF, p-value: < 2.2e-16