Week 9 RQ3 Sesame Street
The Sesame Street example is from Andrew Gelman at Columbia.
The data come from an experiment in which a randomly selected group of children
was encouraged to watch the television program Sesame Street and the randomly
selected control group was not. The goal of the experiment was to estimate the
effect on child cognitive development of watching more Sesame Street. In the
experiment, encouragement, but not actual watching, was randomized.
We might consider implementing a randomized experiment where the participants are
preschool children, the treatment of interest is watching Sesame Street, the control
condition is not watching, and the outcome is the score on a test of letter
recognition. It is not possible here for the experimenter to force children to watch
a TV show or to refrain from watching (the experiment took place while Sesame Street
was on the air). Thus watching cannot be randomized. Instead, when this study was
actually performed, what was randomized was encouragement to watch the show--this is
called a randomized encouragement design.
-------------------------
Code book with variable names for Sesame Street data
id : subject identification number
site : 1 = Three to five year old disadvantaged children from inner city areas in
           various parts of the country.
       2 = Four year old advantaged suburban children.
       3 = Advantaged rural children.
       4 = Disadvantaged rural children.
       5 = Disadvantaged Spanish speaking children.
sex : 1 = male, 2 = female
age : age in months
viewcat : frequency of viewing
       1 = rarely watched the show
       2 = once or twice a week
       3 = three to five times a week
       4 = watched the show on average more than 5 times a week
setting : setting in which Sesame Street was viewed, 1 = home, 2 = school
viewenc : treatment condition
       1 = child encouraged to watch, 2 = child not encouraged to watch
encour : treatment condition
       0 = child not encouraged to watch, 1 = child encouraged to watch
regular : frequency of viewing
       0 = rarely watched the show, 1 = watched once/week or greater
prebody : pretest on knowledge of body parts (scores range from 0-32)
prelet : pretest on letters (scores range from 0-58)
preform : pretest on forms (scores range from 0-20)
prenumb : pretest on numbers (scores range from 0-54)
prerelat : pretest on relational terms (scores range from 0-17)
preclasf : pretest on classification skills
postbody : posttest on knowledge of body parts (0-32)
postlet : posttest on letters (0-58)
postform : posttest on forms (0-20)
postnumb : posttest on numbers (0-54)
postrelat : posttest on relational terms (0-17)
postclasf : posttest on classification skills
peabody : mental age score obtained from administration of the Peabody Picture
       Vocabulary test as a pretest measure of vocabulary maturity
(note: measures used in analyses below are:
encour, regular, postlet, prelet, peabody, site)
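The codebook lists two pairs of closely related variables: viewenc/encour and viewcat/regular. Here is a minimal sketch of how the 0/1 versions could be derived from the original codes. It assumes, from the codebook descriptions only and not checked against the data file, that encour is 1 exactly when viewenc is 1 and that regular is 1 exactly when viewcat is 2 or higher. The small data frame "toy" is made up for illustration.

```r
# Hypothetical toy rows that follow the codebook's coding; not the real data
toy <- data.frame(
  viewenc = c(1, 2, 1, 2),  # 1 = encouraged, 2 = not encouraged
  viewcat = c(1, 2, 3, 4)   # 1 = rarely watched ... 4 = more than 5 times a week
)

# Assumed recodes, inferred from the codebook descriptions
toy$encour  <- as.numeric(toy$viewenc == 1)  # 1 = encouraged, 0 = not encouraged
toy$regular <- as.numeric(toy$viewcat > 1)   # 1 = once a week or more, 0 = rarely

toy
```

With the real data, table(ses$viewenc, ses$encour) and table(ses$viewcat, ses$regular) would confirm or refute these assumed recodes.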
I have a data frame called "ses" ('cuz I can't spell sesame consistently)
> library(foreign)
> ses = read.dta("http://www.stat.columbia.edu/~gelman/arm/examples/sesame/sesame.dta")
> names(ses) #approximately correspond to the codebook listing
 [1] "rownames" "id" "site" "sex" "age" "viewcat" "setting" "viewenc" "prebody" "prelet"
[11] "preform" "prenumb" "prerelat" "preclasf" "postbody" "postlet" "postform" "postnumb" "postrelat" "postclasf"
[21] "peabody" "agecat" "encour" "_Isite_2" "_Isite_3" "_Isite_4" "_Isite_5" "regular"
> dim(ses) #we have 240 children
[1] 240 28
The data frame is in wide form (each child is a row).
> ses$imp = ses$postlet - ses$prelet #improvement in letters: posttest minus pretest
> attach(ses)
Each of the 5 sites has encouraged and not encouraged (randomized) conditions
with different populations of children
> table(encour, site)
      site
encour  1  2  3  4  5
     0 28 19 14 23  4
     1 32 36 50 20 14
--------------------------------------------------------------------------------------------------
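Effect of viewing Sesame Street on improvement in 'letters'.
The t-test below compares improvement by encouragement group (the randomized factor), not by actual viewing.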
> t.test(imp ~ encour)
Welch Two Sample t-test
data: imp by encour
t = -3.102, df = 173.592, p-value = 0.002244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.567744 -1.682256
sample estimates:
mean in group 0 mean in group 1
          7.875          12.500
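Because the groups in this t-test are defined by encouragement, the difference in means is an intent-to-treat estimate: the effect of being encouraged, not of watching. A small sketch, using only the numbers printed above, that puts the estimate and interval on the "encouraged minus not encouraged" scale (t.test reports group 0 minus group 1, so the signs are flipped). The variable names are just for illustration.

```r
# Group means of imp copied from the t.test output above
mean_imp <- c(not_encouraged = 7.875, encouraged = 12.500)

# Intent-to-treat estimate: encouraged minus not encouraged
itt <- mean_imp[["encouraged"]] - mean_imp[["not_encouraged"]]

# The reported 95% interval is for (group 0 - group 1); flip it to match itt
ci_reported <- c(-7.567744, -1.682256)
ci_itt <- -rev(ci_reported)

itt      # 4.625
ci_itt   # 1.682256 7.567744
```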
We also have information on actual viewing of Sesame Street by the children; here we will
use the dichotomous measure "regular". As stated in the introduction: "The goal of the experiment was
to estimate the effect on child cognitive development of watching more Sesame Street."
# useful descriptives
> tapply(postlet, encour, mean)
0 1
24.92045 27.79605
> tapply(prelet, encour, mean)
0 1
17.04545 15.29605
> tapply(prelet, regular, mean)
0 1
14.42593 16.37634
> tapply(postlet, regular, mean)
0 1
16.90741 29.59677
> tapply(postlet, list(encour,regular), mean)
         0        1
0 16.85000 31.64583
1 17.07143 28.88406
> tapply(prelet, list(encour,regular), mean)
         0        1
0 14.07500 19.52083
1 15.42857 15.28261
> tapply(regular, encour, mean)
0 1
0.5454545 0.9078947
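These descriptives hang together, and it can help to check that before using them. The sketch below works only from the printed numbers (typed in by hand, so rounded). It recovers the number of regular viewers in each encouragement group, the mean improvement by group, and the marginal postlet means as weighted averages of the two-way cell means. The group sizes 88 and 152 are the row sums of table(encour, site).

```r
# Values copied from the output above
n_encour  <- c("0" = 88, "1" = 152)                # row sums of table(encour, site)
p_regular <- c("0" = 0.5454545, "1" = 0.9078947)   # tapply(regular, encour, mean)
post_mean <- c("0" = 24.92045, "1" = 27.79605)     # tapply(postlet, encour, mean)
pre_mean  <- c("0" = 17.04545, "1" = 15.29605)     # tapply(prelet, encour, mean)

# Number of regular viewers in each encouragement group
n_regular <- round(n_encour * p_regular)

# Mean improvement is the posttest mean minus the pretest mean
imp_mean <- post_mean - pre_mean

# Cell means of postlet: rows = encour (0, 1), columns = regular (0, 1)
cell_post <- matrix(c(16.85000, 17.07143, 31.64583, 28.88406), nrow = 2)

# Marginal postlet means as weighted averages over regular within each row
post_check <- rowSums(cell_post * cbind(1 - p_regular, p_regular))

n_regular
imp_mean
post_check
```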
------------------------------------
Using the encouragement design formulation, estimate the effect on child cognitive development
(measured by "imp" here) of watching more Sesame Street. What assumption is necessary for the
IV estimation in this design?
From the "useful descriptives" given, you can reproduce this instrumental variables
estimate (Wald estimator).
-----------------------------------------
> library(AER)
> ivimp = ivreg(imp ~ regular | encour)
> summary(ivimp)
Call:
ivreg(formula = imp ~ regular | encour)
Residuals:
    Min      1Q  Median      3Q     Max
-35.675  -7.675  -1.675   7.085  27.325
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9146     3.0185   0.303 0.762152
regular      12.7607     3.7995   3.359 0.000912 ***
---
Residual standard error: 10.28 on 238 degrees of freedom
Multiple R-Squared: 0.1562, Adjusted R-squared: 0.1526
Wald test: 11.28 on 1 and 238 DF, p-value: 0.0009123
> confint(ivimp)
                2.5 %    97.5 %
(Intercept) -5.001465  6.830673
regular      5.313858 20.207594
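With one binary instrument and one treatment variable, the IV point estimate can be written three equivalent ways: the reduced-form slope divided by the first-stage slope, a ratio of covariances, and a ratio of differences in group means. The sketch below illustrates this on simulated data (not the Sesame Street data), using base R only. The variable names and the data-generating numbers are made up for illustration.

```r
set.seed(1)                        # simulated data, for illustration only
n <- 500
z <- rbinom(n, 1, 0.6)             # instrument: randomized encouragement
u <- rnorm(n)                      # unobserved confounder
x <- as.numeric(0.5 * u + z + rnorm(n) > 0.8)  # take-up depends on z and u
y <- 5 + 10 * x + 3 * u + rnorm(n)             # outcome

# Reduced form (y on z) and first stage (x on z)
reduced_form <- coef(lm(y ~ z))[["z"]]
first_stage  <- coef(lm(x ~ z))[["z"]]

iv_ratio <- reduced_form / first_stage
iv_cov   <- cov(y, z) / cov(x, z)
wald     <- (mean(y[z == 1]) - mean(y[z == 0])) /
            (mean(x[z == 1]) - mean(x[z == 0]))

c(iv_ratio = iv_ratio, iv_cov = iv_cov, wald = wald)
```

The three numbers agree up to floating-point error. The standard errors are a separate matter and are not reproduced by this ratio.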
---------------------------------------
## Wald estimator
> tapply(imp, encour, mean)
0 1
7.875 12.500
> tapply(regular, encour, mean)
0 1
0.5454545 0.9078947
> (12.5 - 7.875)/(.9079 - .5455)
[1] 12.76214
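The hand calculation above uses proportions rounded to four decimals, which is why it differs from the ivreg coefficient in the third decimal place. A small sketch with the unrounded proportions from tapply(regular, encour, mean); the helper name wald_estimate is hypothetical, just a wrapper for the ratio.

```r
# Hypothetical helper: Wald (ratio) estimator from group means
# y1, y0: mean outcome in the encouraged / not encouraged group
# d1, d0: proportion of regular viewers in the encouraged / not encouraged group
wald_estimate <- function(y1, y0, d1, d0) {
  (y1 - y0) / (d1 - d0)
}

# Means by encouragement group, copied from the tapply() output
est <- wald_estimate(y1 = 12.500, y0 = 7.875, d1 = 0.9078947, d0 = 0.5454545)
round(est, 4)   # 12.7607, matching the ivreg coefficient for regular
```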