RQ3 Solution: Sesame Street


#for IV estimation (ivreg): install once, then load in each session
> install.packages("AER")
> library(AER)
#for reading a binary Stata file
> library(foreign)
> ses = read.dta("http://www.stat.columbia.edu/~gelman/arm/examples/sesame/sesame.dta") #get Gelman data
> names(ses)
 [1] "rownames"  "id"        "site"      "sex"       "age"       "viewcat"   "setting"   "viewenc"   "prebody"   "prelet"    "preform"   "prenumb"   "prerelat" 
[14] "preclasf"  "postbody"  "postlet"   "postform"  "postnumb"  "postrelat" "postclasf" "peabody"   "agecat"    "encour"    "_Isite_2"  "_Isite_3"  "_Isite_4" 
[27] "_Isite_5"  "regular"  
> attach(ses)
> #outcome is postnumb
> #IV (Wald) estimate for encouragement design
## assumes no direct effect of the encouragement to watch Sesame Street on the outcome (it acts only through viewing)
## e.g., no extra parental effort or resources, prompted by the outside encouragement, devoted to fostering the child's cognitive development
# A tough sell?
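# The Wald estimator that ivreg computes here (one binary instrument, no covariates) is a ratio of two
# differences in group means. The symbols below are notation introduced for this note:

```latex
\hat\beta_{\text{Wald}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}},
  \qquad Y = \texttt{postnumb},\; D = \texttt{regular},\; Z = \texttt{encour}
```

# Under the exclusion restriction above, together with monotonicity (encouragement never reduces a
# child's viewing), this ratio estimates the effect of regular viewing for compliers, the children
# whose viewing was changed by the encouragement.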
> ivnum = ivreg(postnumb ~ regular |encour)
> summary(ivnum)

Call:
ivreg(formula = postnumb ~ regular | encour)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2611 -10.3866  -0.8866  10.6134  22.6134 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   25.261      3.578   7.061  1.8e-11 ***
regular        6.125      4.503   1.360    0.175    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.18 on 238 degrees of freedom
Multiple R-Squared: 0.1008,     Adjusted R-squared: 0.09701 
Wald test:  1.85 on 1 and 238 DF,  p-value: 0.175 

> confint(ivnum)
               2.5 %   97.5 %
(Intercept) 18.24931 32.27297
regular     -2.70075 14.95158
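# A sketch using only the rounded numbers printed above: the confint() limits for regular match
# estimate +/- qnorm(0.975) x SE to within rounding, i.e. a normal-quantile interval. A t quantile
# on 238 df gives a slightly wider interval that matches less closely.

```r
# Rounded values copied from summary(ivnum) above
est_iv <- 6.125   # IV coefficient on regular
se_iv  <- 4.503   # its standard error

# Normal-quantile interval: estimate +/- qnorm(0.975) * SE
ci_norm <- est_iv + c(-1, 1) * qnorm(0.975) * se_iv
# t-quantile interval on the residual df (238), for comparison
ci_t    <- est_iv + c(-1, 1) * qt(0.975, df = 238) * se_iv

round(rbind(normal = ci_norm, t = ci_t), 3)
```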
#check the Wald estimate by hand: (difference in mean outcome) / (difference in viewing rate)
> tapply(postnumb, encour, mean)
       0        1 
28.60227 30.82237 
> tapply(regular, encour, mean)
        0         1 
0.5454545 0.9078947 
> (30.82 - 28.60)/(.908 - .545) #means rounded, so slightly off 6.125; the arithmetic matches
[1] 6.115702
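# Repeating the hand calculation with the group means exactly as printed by tapply() removes the
# rounding gap: the ratio agrees with the ivreg coefficient to three decimals.

```r
# Group means as printed by tapply() above (encour = 0 and encour = 1)
y0 <- 28.60227;  y1 <- 30.82237    # mean postnumb
d0 <- 0.5454545; d1 <- 0.9078947   # proportion viewing regularly

reduced_form <- y1 - y0   # effect of encouragement on postnumb
first_stage  <- d1 - d0   # effect of encouragement on regular viewing
wald <- reduced_form / first_stage
round(wald, 3)            # 6.125, the ivreg coefficient on regular
```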
> 
# remember that the class example used the improvement in letters (postlet - prelet), rather than just postlet
> #the added precision from using each child's baseline may account for the significant result in the class example (Unit 4 of this course)
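# A sketch, not run for this solution (so no output or conclusion is claimed), of two ways to bring
# the numbers pretest prenumb into the IV analysis of postnumb. Which of (a) or (b) mirrors the
# class example is an assumption; gainnumb is a hypothetical column name created here.

```r
library(AER)      # ivreg()
library(foreign)  # read.dta()
ses <- read.dta("http://www.stat.columbia.edu/~gelman/arm/examples/sesame/sesame.dta")

# (a) Gain score as the outcome, analogous to the class example's improvement in letters.
#     gainnumb is a hypothetical column name created here.
ses$gainnumb <- ses$postnumb - ses$prenumb
iv_gain <- ivreg(gainnumb ~ regular | encour, data = ses)

# (b) Baseline as a covariate: prenumb appears on both sides of "|", so it is
#     treated as exogenous and serves as its own instrument.
iv_cov <- ivreg(postnumb ~ regular + prenumb | encour + prenumb, data = ses)

summary(iv_gain)
summary(iv_cov)
```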


> ##path analysis approach: assumes Holland's delta = 0. Try to tell a story to support that (see Lecture 1 and Holland, sec. 4.3).
> #the effect of viewing is then estimated by the coefficient on regular in the multiple regression
> pathnum = lm(postnumb ~ regular + encour)
> summary(pathnum)

Call:
lm(formula = postnumb ~ regular + encour)

Residuals:
     Min       1Q   Median       3Q      Max 
-24.9549  -9.9070  -0.5191  10.0930  24.8209 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   22.179      1.699  13.056  < 2e-16 ***
regular       11.776      2.045   5.757 2.63e-08 ***
encour        -2.048      1.772  -1.155    0.249    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.02 on 237 degrees of freedom
Multiple R-squared:  0.1288,    Adjusted R-squared:  0.1215 
F-statistic: 17.53 on 2 and 237 DF,  p-value: 7.975e-08

> # now we get a 'causal' effect roughly twice as large with about half the standard error
> # no wonder path analysis lives on with such fervor
> confint(pathnum)
                2.5 %    97.5 %
(Intercept) 18.832358 25.525831
regular      7.746515 15.805138
encour      -5.539509  1.443635
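# A sketch quantifying the comparison in the comments above, using only printed numbers: the path
# estimate is about 1.9 times the IV estimate, and its interval is less than half as wide. It also
# checks that confint() on the lm fit uses t quantiles on the residual df.

```r
# Estimates and confint() limits for `regular`, copied from the output above
iv   <- c(est = 6.125,  lo = -2.70075, hi = 14.95158)   # ivnum (IV/Wald)
path <- c(est = 11.776, lo = 7.746515, hi = 15.805138)  # pathnum (lm)

est_ratio   <- path[["est"]] / iv[["est"]]                               # about 1.92
width_ratio <- (path[["hi"]] - path[["lo"]]) / (iv[["hi"]] - iv[["lo"]])  # about 0.46
c(est_ratio = est_ratio, width_ratio = width_ratio)

# confint() for an lm fit uses t quantiles on the residual df (237 here)
ci_path <- path[["est"]] + c(-1, 1) * qt(0.975, df = 237) * 2.045
round(ci_path, 3)
```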
> # note also the negative coefficient on encour; Holland's results (from lecture) indicate that when delta is large, this coefficient will be negative
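# One way to see the sign (an algebraic identity from least squares, not Holland's own derivation):
# with an intercept and a binary encour, the OLS residuals from pathnum average zero within each
# encour group, so the reduced-form difference in mean postnumb equals b_path x (first stage) +
# c_encour exactly. Hence c_encour = first stage x (b_IV - b_path), which is negative whenever the
# path estimate exceeds the IV estimate. The check below uses only numbers printed earlier.

```r
# First stage and reduced form from the tapply() output earlier
first_stage  <- 0.9078947 - 0.5454545   # change in P(regular) due to encour
reduced_form <- 30.82237 - 28.60227     # change in mean postnumb due to encour
b_iv   <- reduced_form / first_stage    # Wald/IV estimate (about 6.125)
b_path <- 11.776                        # coef on regular from pathnum

# reduced_form = b_path * first_stage + c_encour holds exactly for the OLS fit,
# so solving for c_encour gives:
c_encour <- first_stage * (b_iv - b_path)
round(c_encour, 3)  # about -2.048, the coef on encour in summary(pathnum)
```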