RQ3 Solution Sesame Street

#for IV estimation
> install.packages("AER")
> library(AER) #ivreg() lives in AER; installing alone does not load it
#for reading binary stata file
> library(foreign)
> ses = read.dta("http://www.stat.columbia.edu/~gelman/arm/examples/sesame/sesame.dta") #get Gelman data
> names(ses)
 [1] "rownames"  "id"        "site"      "sex"       "age"       "viewcat"   "setting"   "viewenc"   "prebody"   "prelet"    "preform"   "prenumb"   "prerelat"
[14] "preclasf"  "postbody"  "postlet"   "postform"  "postnumb"  "postrelat" "postclasf" "peabody"   "agecat"    "encour"    "_Isite_2"  "_Isite_3"  "_Isite_4"
[27] "_Isite_5"  "regular"
> attach(ses)
> #outcome is postnumb
> #IV (Wald) estimate for encouragement design
## assumes no direct effect of encouragement to watch Sesame Street on outcome (only through viewing)
## e.g. no parental extra effort or resources triggered/motivated by the outside encouragement to foster the child's cognitive development
# A tough sell?
> ivnum = ivreg(postnumb ~ regular | encour)
> summary(ivnum)

Call:
ivreg(formula = postnumb ~ regular | encour)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2611 -10.3866  -0.8866  10.6134  22.6134

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   25.261      3.578   7.061  1.8e-11 ***
regular        6.125      4.503   1.360    0.175
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.18 on 238 degrees of freedom
Multiple R-Squared: 0.1008,     Adjusted R-squared: 0.09701
Wald test:  1.85 on 1 and 238 DF,  p-value: 0.175

> confint(ivnum)
               2.5 %   97.5 %
(Intercept) 18.24931 32.27297
regular     -2.70075 14.95158

#check Wald estimate
> tapply(postnumb, encour, mean)
       0        1
28.60227 30.82237
> tapply(regular, encour, mean)
        0         1
0.5454545 0.9078947
> (30.82 - 28.60)/(.908 - .545) #means rounded by hand, so the ratio matches the ivreg estimate only approximately
[1] 6.115702
> # remember class example used improvement in letters, rather than just postlet
> #added precision of using the individual baseline may account for significant result in class example (Unit 4 of this course)

> ##path analysis approach-- assumes Holland's delta = 0.
## Try to tell a story to support that (see Lecture 1 and Holland sec4.3).
> #effect of viewing is then estimated by coef of regular in multiple regression
> pathnum = lm(postnumb ~ regular + encour)
> summary(pathnum)

Call:
lm(formula = postnumb ~ regular + encour)

Residuals:
     Min       1Q   Median       3Q      Max
-24.9549  -9.9070  -0.5191  10.0930  24.8209

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.179      1.699  13.056  < 2e-16 ***
regular       11.776      2.045   5.757 2.63e-08 ***
encour        -2.048      1.772  -1.155    0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.02 on 237 degrees of freedom
Multiple R-squared:  0.1288,    Adjusted R-squared:  0.1215
F-statistic: 17.53 on 2 and 237 DF,  p-value: 7.975e-08

> # now we get a 'causal' effect twice as large with one-half the standard error
> # no wonder path analysis lives on with such fervor
> confint(pathnum)
                2.5 %    97.5 %
(Intercept) 18.832358 25.525831
regular      7.746515 15.805138
encour      -5.539509  1.443635
> # note also the neg coef for encour; Holland's results (from lecture) indicate that when delta is large that coef will be negative
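
Returning to the Wald check: the hand calculation rounds the group means before dividing, which is why it lands near 6.12 rather than exactly on the ivreg coefficient of 6.125. The sketch below redoes the ratio with the unrounded means that tapply() printed in the session. The names y_mean, d_mean, wald, stage1, stage2 and regular_hat are illustrative and not part of the original session. The second half runs only if the ses data frame is already loaded. It shows the same point estimate obtained by two least-squares stages, under the same exclusion assumption as the ivreg fit.

```r
# Group means copied from the tapply() output in the transcript
y_mean <- c(enc0 = 28.60227,  enc1 = 30.82237)   # mean postnumb by encour
d_mean <- c(enc0 = 0.5454545, enc1 = 0.9078947)  # proportion regular viewers by encour

# Wald estimate = (effect of encouragement on outcome) /
#                 (effect of encouragement on viewing)
wald <- unname((y_mean["enc1"] - y_mean["enc0"]) /
               (d_mean["enc1"] - d_mean["enc0"]))
print(round(wald, 3))

# Optional: same point estimate by two-stage least squares done by hand.
# Assumes the data frame `ses` from the session above is in the workspace.
if (exists("ses")) {
  stage1 <- lm(regular ~ encour, data = ses)          # first stage: uptake on instrument
  ses$regular_hat <- predict(stage1, newdata = ses)   # fitted viewing probability
  stage2 <- lm(postnumb ~ regular_hat, data = ses)    # second stage: outcome on fitted uptake
  print(coef(stage2)["regular_hat"])
  # Point estimate only: the standard errors reported by this second lm()
  # ignore the first-stage estimation and are not valid IV standard errors.
}
```

With a single binary instrument and no covariates, the ratio of differences in means and the two-stage coefficient are the same quantity, so both should agree with the regular coefficient from ivreg up to rounding. For inference, use the ivreg standard errors rather than those from the hand-rolled second stage.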