Lab 4 (2/27/16): LaLonde ANCOVA

##### But wait, we must say: "We are not done until the ANCOVA is run."

# Refer back to week 1 (or week 5): the social-science practice is to put in
# the treatment variable (week 1 examples: breastfeeding, sexy media) plus
# a whole bunch of other variables to "control" for self-selection, nonequivalence of groups, etc.
# In week 5 we saw that this is usually equivalent to analysis of covariance, whatever it is called.
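A minimal sketch of the week 5 point, on hypothetical simulated data (the names `x1`, `x2`, `treat`, `y` are made up for illustration): adding "control" variables to `lm()` and running an ANCOVA test for treatment are the same analysis. The F test comparing the covariates-only model with the model that adds `treat` is exactly the square of the `treat` t statistic from the regression.

```r
# Hypothetical simulated data; x1 and x2 play the role of the "controls"
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(x1))    # self-selection: P(treat) rises with x1
y  <- 2 + 1.5 * x1 - x2 + 0.5 * treat + rnorm(n)

fit_reduced <- lm(y ~ x1 + x2)          # covariates only
fit_full    <- lm(y ~ x1 + x2 + treat)  # regression "with controls" = ANCOVA model

t_treat <- summary(fit_full)$coefficients["treat", "t value"]
F_treat <- anova(fit_reduced, fit_full)$F[2]   # ANCOVA F test for treat (1 df)

print(c(t_squared = t_treat^2, F = F_treat))   # the two numbers agree
```

Because `treat` adds a single degree of freedom, the nested-model F and the squared t are the same test, so "regression with controls" and "ANCOVA" give one answer.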

> library(MatchIt)   # lalonde with these columns ships with (older versions of) MatchIt
> data(lalonde)
> attach(lalonde)
> dim(lalonde)
[1] 614  10
> names(lalonde)
 [1] "treat"    "age"      "educ"     "black"    "hispan"   "married"  "nodegree" "re74"    
 [9] "re75"     "re78"    
> ancova.lalonde = lm( re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75)
> summary(ancova.lalonde)

Call:
lm(formula = re78 ~ treat + age + educ + black + hispan + married + 
    nodegree + re74 + re75)

Residuals:
   Min     1Q Median     3Q    Max 
-13595  -4894  -1662   3929  54570 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.651e+01  2.437e+03   0.027   0.9782    
treat        1.548e+03  7.813e+02   1.982   0.0480 *  
age          1.298e+01  3.249e+01   0.399   0.6897    
educ         4.039e+02  1.589e+02   2.542   0.0113 *  
black       -1.241e+03  7.688e+02  -1.614   0.1071    
hispan       4.989e+02  9.419e+02   0.530   0.5966    
married      4.066e+02  6.955e+02   0.585   0.5590    
nodegree     2.598e+02  8.474e+02   0.307   0.7593    
re74         2.964e-01  5.827e-02   5.086 4.89e-07 ***
re75         2.315e-01  1.046e-01   2.213   0.0273 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6948 on 604 degrees of freedom
Multiple R-squared: 0.1478,     Adjusted R-squared: 0.1351 
F-statistic: 11.64 on 9 and 604 DF,  p-value: < 2.2e-16 


So it turns out we were wrong all along (along with the labor economists):
there is a significant effect of treat (job training), an estimated $1548 gain in 1978 earnings (p = 0.048). Well, I'll be...
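To see how close to the line this "significant" result sits, here is a quick check using only numbers printed in the summary above (estimate, standard error, residual df): it rebuilds the 95% confidence interval and the two-sided p-value by hand. A minimal sketch; the inputs are the rounded printed values, so the results match the output only approximately.

```r
# Numbers copied from the summary(ancova.lalonde) output (rounded as printed)
est <- 1548    # treat estimate
se  <- 781.3   # its standard error
df  <- 604     # residual degrees of freedom

ci <- est + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval
p  <- 2 * pt(-abs(est / se), df)            # two-sided p-value

print(round(ci))
print(round(p, 3))
```

The lower limit of the interval lands only slightly above zero, so the "effect" is consistent with anything from almost nothing to about $3,000.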

But if you use different subsets of these covariates, you may (will; I don't want to spoil the surprise)
get quite different results.
(Or, to increase the chaos, try some version of instrumental variables, IV.)
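One way to look for the surprise is to refit the model for every subset of the covariates and collect the treat coefficient each time. The function below, `treat_by_subset`, is a hypothetical helper written for this note (not from any package); it uses only base R (`combn`, `reformulate`, `lm`). The demo runs on simulated data so it is self-contained; the names `x1`, `x2`, `x3`, `y` are made up.

```r
# Hypothetical helper (not from any package): refit the outcome model on every
# subset of the candidate covariates and record the treatment coefficient.
treat_by_subset <- function(data, outcome, treatment, covariates) {
  # the empty subset (treatment only) plus all non-empty subsets
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(covariates),
                             function(k) combn(covariates, k, simplify = FALSE)),
                      recursive = FALSE))
  estimate <- vapply(subsets, function(vars) {
    f <- reformulate(c(treatment, vars), response = outcome)
    coef(lm(f, data = data))[[treatment]]
  }, numeric(1))
  data.frame(covariates = vapply(subsets, paste, character(1), collapse = " + "),
             estimate = estimate)
}

# Demo on simulated data with made-up names x1, x2, x3, y
set.seed(2)
n   <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$treat <- rbinom(n, 1, plogis(sim$x1 - sim$x2))
sim$y     <- 1 + 2 * sim$x1 + 2 * sim$x2 + 0.3 * sim$treat + rnorm(n)

res <- treat_by_subset(sim, "y", "treat", c("x1", "x2", "x3"))
print(res[order(res$estimate), ])
```

Assuming the `lalonde` data frame loaded above (with the `black`/`hispan` columns), `treat_by_subset(lalonde, "re78", "treat", c("age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75"))` would fit all 256 models; looking at `range()` of the `estimate` column is one quick way to see how much the answer moves.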

---------------------------------

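Just for reference, the stat60 t-test (or its regression version) shows no significant effect of treat, and the point estimate even goes in the opposite direction (about $635 lower for the treated group):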

> t.test( re78 ~ treat)
        Welch Two Sample t-test

data:  re78 by treat 
t = 0.9377, df = 326.412, p-value = 0.3491
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -697.192 1967.244 
sample estimates:
mean in group 0 mean in group 1 
       6984.170        6349.144 

> summary(lm(re78 ~ treat))  # regression version of the pooled-variance t-test
Call:
lm(formula = re78 ~ treat)

Residuals:
   Min     1Q Median     3Q    Max 
 -6984  -6349  -2048   4100  53959 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6984.2      360.7  19.362   <2e-16 ***
treat         -635.0      657.1  -0.966    0.334    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 7471 on 612 degrees of freedom
Multiple R-squared: 0.001524,   Adjusted R-squared: -0.0001079 
F-statistic: 0.9338 on 1 and 612 DF,  p-value: 0.3342
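Why the two "versions" above disagree slightly: `lm(re78 ~ treat)` reproduces the pooled-variance (Student) two-sample t-test, while `t.test()` defaults to Welch's unequal-variance test, hence t = 0.9377 rather than 0.966. The sign also flips because `t.test()` reports group 0 minus group 1, whereas the `treat` coefficient is group 1 minus group 0. A small check on hypothetical simulated data (`g` and `y` are made-up names):

```r
# Hypothetical simulated data: unequal group sizes and spreads
set.seed(3)
g <- rep(0:1, times = c(60, 40))   # group indicator, like treat
y <- rnorm(100, mean = 10 - 0.5 * g, sd = ifelse(g == 1, 3, 2))

fit    <- lm(y ~ g)
pooled <- t.test(y ~ g, var.equal = TRUE)   # Student (pooled-variance) t-test
welch  <- t.test(y ~ g)                     # R's default: Welch t-test

t_lm <- summary(fit)$coefficients["g", "t value"]
print(c(lm = t_lm,
        pooled = unname(pooled$statistic),   # same size, opposite sign
        welch  = unname(welch$statistic)))   # generally a different value
```

So `t.test(re78 ~ treat, var.equal = TRUE)` is the call that matches the regression output exactly, up to sign.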