Lab 4 2/27/16 lalonde ancova
##### But wait: we must say "we are not done until the ancova is run"
# Refer back to week 1 (or week 5): the social science practice is to put
# in the treatment variable (week 1 examples: breastfeeding, sexy media) and
# a whole bunch of other variables to "control" for self-selection, nonequivalence, etc.
# In week 5 we saw that this is usually equivalent to analysis of covariance, by whatever name.
> data(lalonde)   # assumes the package supplying lalonde was loaded earlier in the lab (e.g. library(MatchIt))
> attach(lalonde)
> dim(lalonde)
[1] 614 10
> names(lalonde)
[1] "treat" "age" "educ" "black" "hispan" "married" "nodegree" "re74"
[9] "re75" "re78"
> ancova.lalonde = lm( re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75)
> summary(ancova.lalonde)
Call:
lm(formula = re78 ~ treat + age + educ + black + hispan + married +
    nodegree + re74 + re75)
Residuals:
   Min     1Q Median     3Q    Max
-13595  -4894  -1662   3929  54570
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.651e+01  2.437e+03   0.027   0.9782
treat        1.548e+03  7.813e+02   1.982   0.0480 *
age          1.298e+01  3.249e+01   0.399   0.6897
educ         4.039e+02  1.589e+02   2.542   0.0113 *
black       -1.241e+03  7.688e+02  -1.614   0.1071
hispan       4.989e+02  9.419e+02   0.530   0.5966
married      4.066e+02  6.955e+02   0.585   0.5590
nodegree     2.598e+02  8.474e+02   0.307   0.7593
re74         2.964e-01  5.827e-02   5.086 4.89e-07 ***
re75         2.315e-01  1.046e-01   2.213   0.0273 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6948 on 604 degrees of freedom
Multiple R-squared: 0.1478, Adjusted R-squared: 0.1351
F-statistic: 11.64 on 9 and 604 DF, p-value: < 2.2e-16
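As a check on the treat row of the summary above, the t value, p-value, and a 95% interval can be recomputed from the printed estimate and standard error alone. This is a minimal sketch using numbers copied from the table; the names est, se, and df are labels introduced here, not objects from the lab session.

```r
# Values copied from the treat row and the residual df of the summary output
est <- 1548      # Estimate (1.548e+03)
se  <- 781.3     # Std. Error (7.813e+02)
df  <- 604       # residual degrees of freedom

t_val <- est / se                              # close to the printed t value, 1.982
p_val <- 2 * pt(-abs(t_val), df)               # two-sided p-value
ci    <- est + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3)
```

With the fitted object still in the workspace, confint(ancova.lalonde)["treat", ] should give the same interval up to rounding. The interval only just excludes zero, which matches the p-value of 0.048.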
So it turns out we were wrong all along (along with the labor economists):
the ancova shows a significant effect of treat (job training), $1548 (p = 0.048). Well, I'll be.....
But if you use different subsets of these covariates you may/will (don't want to spoil the surprise)
get quite different results.
(Or, to increase the chaos, try some version of IV, i.e. instrumental variables.)
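One way to try the covariate-subset exercise is to refit the model for several covariate sets and stack the treat rows side by side. The sketch below is only an illustration: treat_by_subset is a hypothetical helper written for this note, and the four covariate sets are arbitrary choices, not ones prescribed by the lab.

```r
# Hypothetical helper: fit outcome ~ treatment + <covariates> for each
# covariate set and collect the treatment row of each coefficient table.
treat_by_subset <- function(data, subsets, outcome = "re78", treatment = "treat") {
  rows <- lapply(subsets, function(covs) {
    f <- reformulate(c(treatment, covs), response = outcome)
    summary(lm(f, data = data))$coefficients[treatment, ]
  })
  do.call(rbind, rows)   # one row per covariate set, named after the list entries
}

# Covariate sets chosen only for illustration
subsets <- list(
  none         = character(0),
  demographics = c("age", "educ", "black", "hispan", "married", "nodegree"),
  prior_pay    = c("re74", "re75"),
  all          = c("age", "educ", "black", "hispan", "married", "nodegree",
                   "re74", "re75")
)

# With the lab data loaded (and the column names shown by names(lalonde)):
# treat_by_subset(lalonde, subsets)
```

The "none" row should reproduce the plain regression of re78 on treat, and the "all" row should reproduce the treat line of the full ancova, so the two rows in between show how the estimate moves as covariates are added.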
---------------------------------
Just for reference, the stat60 t-test or regression version: no significant effect of treat, and the estimate is in the opposite direction (treated group mean is about $635 lower)
> t.test( re78 ~ treat)
        Welch Two Sample t-test
data:  re78 by treat
t = 0.9377, df = 326.412, p-value = 0.3491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -697.192 1967.244
sample estimates:
mean in group 0 mean in group 1
       6984.170        6349.144
> summary(lm(re78 ~ treat)) #regression version of t-test
Call:
lm(formula = re78 ~ treat)
Residuals:
   Min     1Q Median     3Q    Max
 -6984  -6349  -2048   4100  53959
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6984.2      360.7  19.362   <2e-16 ***
treat         -635.0      657.1  -0.966    0.334
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7471 on 612 degrees of freedom
Multiple R-squared: 0.001524, Adjusted R-squared: -0.0001079
F-statistic: 0.9338 on 1 and 612 DF, p-value: 0.3342
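The t-test and the regression above do not match exactly (t = 0.9377 vs -0.966) for two reasons. R's t.test defaults to the Welch (unequal-variance) version, while lm pools the variance. And t.test reports group 0 minus group 1, while the treat coefficient is group 1 minus group 0. The sketch below shows this on simulated data (made-up numbers, not the lalonde values): the equal-variance t-test reproduces the regression t up to sign.

```r
# Simulated two-group data (hypothetical; NOT the lalonde values)
set.seed(1)
g <- rep(c(0, 1), times = c(40, 20))
y <- rnorm(60, mean = 5 + 0.5 * g, sd = 1 + g)

pooled <- t.test(y ~ g, var.equal = TRUE)   # classic equal-variance t-test
welch  <- t.test(y ~ g)                     # R's default (Welch)
reg    <- summary(lm(y ~ g))$coefficients["g", ]

# Pooled t and regression t agree in magnitude; the signs differ because
# t.test uses group 0 minus group 1 and lm uses group 1 minus group 0.
c(pooled_t = unname(pooled$statistic), regression_t = unname(reg["t value"]))
c(pooled_p = pooled$p.value, regression_p = unname(reg["Pr(>|t|)"]),
  welch_p = welch$p.value)
```

On the lab data the corresponding call would be t.test(re78 ~ treat, var.equal = TRUE), which should match the regression line for treat apart from the sign.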