Lab 4 Stat209 3/4/18

            Rogosa Lab 4 R-session (base: secs 1-3)

 Preliminaries, install MatchIt and get lab data
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# I Started from a relatively clean install, get MatchIt and optmatch
# some years order matters because of complication with license for optmatch algorithms
# this year appears smooth

> install.packages("optmatch")
> library(optmatch)
> install.packages("MatchIt")
> library(MatchIt)

> data(nuclearplants)  # we don't need this, just for illustration


Now to get going on our Lab 4 tasks


> data(lalonde) # get lalonde data from MatchIt, there are different versions in existence under this name (see M-B text)

# or simply
help(lalonde)  #produces
-----------------------------------
lalonde               package:MatchIt               R Documentation

Data from National Supported Work Demonstration and PSID, as analyzed by Dehejia and Wahba (1999).

Description:

     This is a subsample of the data from the treated group in the
     National Supported Work Demonstration (NSW) and the comparison
     sample from the Current Population Survey (CPS).   This data was
     previously analyzed extensively by Lalonde (1986) and Dehejia and
     Wahba (1999).   The full dataset is available at <URL:
     http://www.columbia.edu/~rd247/nswdata.html>. [note: broken link still in current documentation]

Usage:

     data(lalonde)

Format:

     A data frame with 313 [sic, 614] observations (185 treated, 429 control). 
     There are 10 variables measured for each individual.  "treat" is
     the treatment assignment (1=treated, 0=control).   "age" is age in
     years.  "educ" is education in number of years of schooling. 
     "black" is an indicator for African-American (1=African-American,
     0=not).  "hispan" is an indicator for being of Hispanic origin
     (1=Hispanic, 0=not).  "married" is an indicator for married
     (1=married, 0=not married).  "nodegree" is an indicator for
     whether the individual has a high school degree (1=no degree,
     0=degree). "re74" is income in 1974, in U.S. dollars.  "re75" is
     income in 1975, in U.S. dollars.  "re78" is income in 1978, in
     U.S. dollars.


References:

     Lalonde, R. (1986). Evaluating the econometric evaluations of
     training programs  with experimental data. American Economic
     Review 76: 604-620.

     Dehejia, R.H. and Wahba, S. (1999).  Causal Effects in
     Nonexperimental Studies:  Re-Evaluating the Evaluation  of
     Training Programs.  Journal of the American  Statistical
     Association 94: 1053-1062.
-----------------------------------------------
note: Week 8 primary readings includes Matching Methods for Causal Inference 
Elizabeth Stuart Donald Rubin [does lalonde lab4 example]
http://www.corwin.com/upm-data/18066_Chapter_11.pdf 

> dim(lalonde)  # we have 614 cases, 10 vars; see lab script ("exposition and commands") part 2
[1] 614  10   
> names(lalonde)
 [1] "treat"    "age"      "educ"     "black"    "hispan"   "married" 
 [7] "nodegree" "re74"     "re75"     "re78"    
> attach(lalonde)
> table(treat)  # so these summaries synch with data description
treat
  0   1 
429 185 
> lalonde[1:10,]  # here's the first 10 observations, all from the treatment
      treat age educ black hispan married nodegree re74 re75    re78
NSW1      1  37   11     1      0       1        1    0    0  9930.0
NSW2      1  22    9     0      1       0        1    0    0  3595.9
NSW3      1  30   12     1      0       0        0    0    0 24909.5
NSW4      1  27   11     1      0       0        1    0    0  7506.1
NSW5      1  33    8     1      0       0        1    0    0   289.8
NSW6      1  22    9     1      0       0        1    0    0  4056.5
NSW7      1  23   12     1      0       0        0    0    0     0.0
NSW8      1  32   11     1      0       0        1    0    0  8472.2
NSW9      1  22   16     1      0       0        0    0    0  2164.0
NSW10     1  33   12     0      0       1        0    0    0 12418.1


## before main events here's a warm-up
> # brute-force matching-- Lab4 script item 3.1

> numObs = dim(lalonde)[1]
> numObs
[1] 614
> # that's the 429 + 185, not going to distinguish between T/C here

   #Do it yourself matching function, Lab text part 3#######
> indiv1 = lalonde[1,]
> indiv1  # the target for this matching exercise
     treat age educ black hispan married nodegree re74 re75 re78
NSW1     1  37   11     1      0       1        1    0    0 9930
> # aside indiv1 is in the treatment group, the first 185 cases are all in treatment
> # for example, examine lalonde[185,] lalonde[186,]

> matches = matrix( nrow = numObs, ncol = 1, data = FALSE )
> # set up a dummy matrix and then loop through the data with specified calipers

> for ( i in 1:numObs) {
+ matches[i,1] <- all(
+ c(lalonde[i,4:7] == indiv1[4:7],
+ (lalonde[i,3] < indiv1[3] + 3) && (lalonde[i,3] > indiv1[3] - 3),
+ (lalonde[i,2] < indiv1[2] + 3) && (lalonde[i,2] > indiv1[2] - 3)
+ ) )
+ }
# Karen's function picks out exact matches on the four dichotomous black hisp married nodegree
  and caliper matching on age and educ
> sum(as.numeric(matches) )
[1] 4
# this picks up a match to itself (the target indiv1)
> # we only got 3 matches to others cases 183, 191 212, the latter 2 being control group members

> matches # you can print out all 614
        [,1]
  [1,]  TRUE
  [2,] FALSE
[182,] FALSE
[183,]  TRUE
[190,] FALSE
[191,]  TRUE
[211,] FALSE
[212,]  TRUE
[613,] FALSE
[614,] FALSE

> lalonde[191,]
      treat age educ black hispan married nodegree  re74  re75  re78
PSID6     0  37    9     1      0       1        1 13685 12756 17833
> lalonde[212,]
       treat age educ black hispan married nodegree  re74 re75 re78
PSID27     0  36    9     1      0       1        1 13256 8457    0
> lalonde[183,]
       treat age educ black hispan married nodegree  re74  re75  re78
NSW183     1  35    9     1      0       1        1 13602 13831 12804
> indiv1
     treat age educ black hispan married nodegree re74 re75 re78
NSW1     1  37   11     1      0       1        1    0    0 9930
> # if we took re74 re75 into account we wouldn't have any control matches for indiv1; matching not so easy as dimension increases

########### aside, opmatch demo

> # optmatch (Ben Hansen) could also find matches to person 1
> 
# I just used age and education for a demo; if I had further specified the dichotomous variables
   I would have an alternative to the results above
> m1 = matchit(treat ~ age + educ, data = lalonde, method = "optimal", ratio = 2)
Loading required package: optmatch
m1.data = match.data(m1)
# so I created 185 subclasses each with 3 members, (2 control, 1 treatment), fun
> names(m1.data)
 [1] "treat"    "age"      "educ"     "black"    "hispan"   "married"  "nodegree" "re74"     "re75"     "re78"    
[11] "distance" "weights"  "subclass"
# pick off subclass 1; two control members (not the same as above as i didn't use the other vars)
> s1 = subset(m1.data, subset = subclass < 2, select = c(age, educ, distance, subclass))
> dim(s1)
[1] 3 4
> s1
        age educ distance subclass
NSW1     37   11   0.2537        1
PSID9    38    9   0.2467        1
PSID400  37    8   0.2499        1
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
########## back to main event propensity scores and treatment effects using stratification
## for reference, the stat60 t-test or regression version shows no significant effect of treat 
## point estimate, negative treatment effect (in opposite direction)

> t.test( re78 ~ treat)
        Welch Two Sample t-test

data:  re78 by treat 
t = 0.9377, df = 326.412, p-value = 0.3491
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -697.192 1967.244 
sample estimates:
mean in group 0 mean in group 1 
       6984.170        6349.144 

> summary(lm(re78 ~ treat))  #regression version of t-test##equivalent to pooled t-test
Call:
lm(formula = re78 ~ treat)

Residuals:
   Min     1Q Median     3Q    Max 
 -6984  -6349  -2048   4100  53959 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6984.2      360.7  19.362   <2e-16 ***
treat         -635.0      657.1  -0.966    0.334    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 7471 on 612 degrees of freedom
Multiple R-squared: 0.001524,   Adjusted R-squared: -0.0001079 
F-statistic: 0.9338 on 1 and 612 DF,  p-value: 0.3342 

> # Lab script 3.2 
# now do the logistic regression that computes propensity scores (matching packages will do this for you with propen as distance measure)
> glm.lalonde = glm( treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, family = binomial)
> summary(glm.lalonde)

Call:
glm(formula = treat ~ age + educ + black + hispan + married + 
    nodegree + re74 + re75, family = binomial, data = lalonde)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.764  -0.474  -0.286   0.751   2.717  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.73e+00   1.02e+00   -4.65  3.3e-06 ***
age          1.58e-02   1.36e-02    1.16   0.2452    
educ         1.61e-01   6.51e-02    2.48   0.0133 *  
black        3.07e+00   2.87e-01   10.70  < 2e-16 ***
hispan       9.84e-01   4.26e-01    2.31   0.0208 *  
married     -8.32e-01   2.90e-01   -2.87   0.0042 ** 
nodegree     7.07e-01   3.38e-01    2.09   0.0362 *  
re74        -7.18e-05   2.87e-05   -2.50   0.0125 *  
re75         5.34e-05   4.63e-05    1.15   0.2488    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 751.49  on 613  degrees of freedom
Residual deviance: 487.84  on 605  degrees of freedom
AIC: 505.8

Number of Fisher Scoring iterations: 5

# best case is counter-intuitive; you really want not-wonderful predictions (analog to Rsq) so that
# you have both treat=1 and treat=0 cases at levels of propen--i.e. good overlap

> propen = fitted(glm.lalonde)  # now we have the propensity scores, Lab script calls these propScores

> quantile(propen) # overall distrib
     0%     25%     50%     75%    100% 
0.00908 0.04854 0.12068 0.63872 0.8531

> tapply(propen, treat, quantile) # look at overlap via 5-number summary (or side-by-side boxplots)
                                    not good overlap, as noted in class handout snippet
$`0`
     0%     25%     50%     75%    100% 
0.00908 0.03888 0.07585 0.19514 0.78917 

$`1`
     0%     25%     50%     75%    100% 
0.02495 0.52646 0.65368 0.72660 0.85315 

> # we are fitting prob(treat = 1) fits for those in treatment group
> # will be larger, we need good overlap for matching purposes

> boxplot(propen ~ treat) #gives side-by-side boxplots, you can add labels, not wonderful overlap compared to class exs
### could get fancier and recreate class displays of superimposed histograms (e.g. SAT, Ben Hansen)
### package such as 'HistogramTools' you can even do operations such as IntersectHistogram, but focus on basics here

###########
# superimposed histograms
# credit to Lucas Janson, creating the histogram code in real-time
# pick off propensity scores for control (p0) and treatment (p1) as separate vars, and check it

> p1 = propen[treat == 1]
> length(p1)
[1] 185
> p0 = propen[treat == 0]
> length(p0)
[1] 429
> fivenum(p1)
    NSW124     NSW156      NSW50     NSW119     NSW178 
0.02495179 0.52646352 0.65368426 0.72659995 0.85315284 
> fivenum(p0)
    PSID296     PSID347     PSID221     PSID334     PSID118 
0.009080193 0.038880745 0.075849106 0.195135746 0.789172834 
> # ok, this matches fivenum for full data set, I did subsetting right for once

> hist(p0,col=rgb(0,0,1,0.7),xlim=range(c(p0,p1)))
> hist(p1,col=rgb(1,0,0,0.7),add=T)
> # superimposed propensity histrograms, like Ben Hansen SAT, contol is blue, treatment is red, overlap close to perfect Stanford Cardinal red
# could supply more useful labels than my lazy implementation
# could also adjust the binning ("breaks") to make more bins, or make bins same as strata
##############

> tapply(propen, treat, stem) #less useful, stacked stem-and-leafs, something you can output in ascii

  The decimal point is 1 digit(s) to the left of the |   # treat = 0

  0 | 11111111111112222222222222222222222222222222222222222222233333333333+57
  0 | 55555555555555555555555555555556666666666666666666666666777777777777+39
  1 | 000000000000000011111111111111112222222222223334444
  1 | 56677888888889
  2 | 0000011122244
  2 | 55667788
  3 | 0
  3 | 6778899
  4 | 1233444
  4 | 568999
  5 | 1144
  5 | 5679
  6 | 11112234444
  6 | 555556667778888899999999
  7 | 111112222444
  7 | 55555555689


  The decimal point is 1 digit(s) to the left of the |  # treat = 1

  0 | 244
  0 | 6788899
  1 | 000122334444
  1 | 55
  2 | 012
  2 | 579
  3 | 
  3 | 8899
  4 | 0134
  4 | 5559
  5 | 1112334
  5 | 6666677778899999
  6 | 00111122222334444
  6 | 55555555555666666778888888899999
  7 | 000000001111222222222223333344444
  7 | 555555566666777777777778888888999
  8 | 1122
  8 | 5

$`0`
NULL

$`1`
NULL

5 

# or you can have MatchIt compute propensity scores for you and do a simple match (as in the Aspirin Study (Love))
# or more complex 'additional matching section'


######subclassification (by hand) #################

> # a common use of the propensity scores (backed by theory, class handout week8))
> # is to stratify by quintiles

> # the simple-minded way I do it is to use "cut", Lab script has fancier programming
> ?cut  # this is a simple function to create bins
> k = 1:4
> quantile(propen, k/5) # defines 5 bins (equal sample size) for the 614 propensity scores
    20%     40%     60%     80% 
0.04015 0.08721 0.26978 0.67085 
>  propbin = cut(propen, c(0, .04015,.08721,.26978,.67085,1))
> propen[1:10] # show propen values that get put into bins for first 10 cases
   NSW1    NSW2    NSW3    NSW4    NSW5    NSW6    NSW7    NSW8    NSW9   NSW10 
0.63877 0.22463 0.67824 0.77632 0.70164 0.69907 0.65368 0.78972 0.77984 0.04292
> propbin[1:10]  # here are the bins for those values
 [1] (0.27,0.671]    (0.0872,0.27]   (0.671,1]       (0.671,1]      
 [5] (0.671,1]       (0.671,1]       (0.27,0.671]    (0.671,1]      
 [9] (0.671,1]       (0.0401,0.0872]
Levels: (0,0.0401] (0.0401,0.0872] (0.0872,0.27] (0.27,0.671] (0.671,1]
> # levels are the 5 quintiles

> table(propbin, treat) # either way you display it, we do not have good overlap in the bottom
                          two quintiles, lower estimated probability for being in treatment
                          for treatment cases (see your side-by-side boxplot)
                 treat
propbin             0   1
  (0,0.0401]      122   1
  (0.0401,0.0872] 116   7
  (0.0872,0.27]   101  21
  (0.27,0.671]     53  71
  (0.671,1]        37  85
 
> tapply(propbin, treat,table) # show same thing sideways
$`0`

     (0,0.0401] (0.0401,0.0872]   (0.0872,0.27]    (0.27,0.671]       (0.671,1] 
            122             116             101              53              37 

$`1`

     (0,0.0401] (0.0401,0.0872]   (0.0872,0.27]    (0.27,0.671]       (0.671,1] 
              1               7              21              71              85 

> table(treat)
treat
  0   1 
429 185 
> #we have (real) overlap only in the upper 3 quintiles of the propensity score

##### looking at the code above, here's a one-line version using cut, cleaner, gives same binning
> pbin2 = cut(propen, quantile(propen, seq(0, 1, 1/5)), include.lowest = TRUE, labels = FALSE)
> table(pbin2 , lalonde$treat)
pbin2   0   1
    1 122   1
    2 116   7
    3 101  21
    4  53  71
    5  37  85


##############################
## an important diagnostic step which we focused on in class examples  is checking balance
## on each of the confounding vars (i.e. covariates) within each subclassification, c.f Rosebaum-Rubin ex
## in class we showed balance on covariates for Aspirin (nearest neighbor matching) and SAT (full matching)
# a quick look for these data is not overwhelmingly impressive, but some improvement



# for the two groups, before stratification matching
> tapply(propen, treat, mean) # of course these should differ
        0         1 
0.1822248 0.5774355 

> tapply(age, treat, mean)  # see HW8, prob 4 we looked at age 
       0        1 
28.03030 25.81622 
> tapply(educ, treat, mean) 
       0        1 
10.23543 10.34595 
> tapply(married, treat, mean) # big diff here
        0         1 
0.5128205 0.1891892 
> tapply(nodegree, treat, mean) 
        0         1 
0.5967366 0.7081081 
> tapply(re74, treat, mean)  # big diff
       0        1 
5619.237 2095.574 
> tapply(re75, treat, mean) 
       0        1 
2466.484 1532.055 

# after this initial quintile subclassification on propensity score, look at balance within quintile
# compare to overall above
# keep in mind the top 3 quintiles is where we have data for treat-control comparison

### there's only so much of the tools I want to introduce but I should mention here the package
### PSAgraphics which displays balance or lack thereof in propensity score analyses using stratification (subclasses)
### if you like pretty try the following (using the pbin2 version of binning above)
---------------------
> install.packages("PSAgraphics")
> library(PSAgraphics)
> box.psa(age, treat, pbin2)
## also version for categorical predictors; vignette at https://www.jstatsoft.org/article/view/v029i06/v29i06.pdf
---------------------------------


# first look at propen score balance
> tapply(propen, list(propbin, treat), mean) 
                         0          1
(0,0.0401]      0.02603935 0.02495179
(0.0401,0.0872] 0.06106601 0.06589413
(0.0872,0.27]   0.13761477 0.14830101
(0.27,0.671]    0.51792739 0.57075500
(0.671,1]       0.71796548 0.73766399
> tapply(propen, list(propbin, treat), median) 
                         0          1
(0,0.0401]      0.02592168 0.02495179
(0.0401,0.0872] 0.05951047 0.07382733
(0.0872,0.27]   0.11515593 0.13557377
(0.27,0.671]    0.53948683 0.59404299
(0.671,1]       0.71467967 0.73802599
> names(lalonde)
 [1] "treat"    "age"      "educ"     "black"    "hispan"   "married"  "nodegree" "re74"     "re75"    
[10] "re78"    
> tapply(age, list(propbin, treat), median) 
                 0  1
(0,0.0401]      29 27
(0.0401,0.0872] 26 23
(0.0872,0.27]   20 23
(0.27,0.671]    24 25
(0.671,1]       19 25

> tapply(age, list(propbin, treat), mean) # not better than overall
                       0        1
(0,0.0401]      32.09016 27.00000
(0.0401,0.0872] 28.93103 24.00000
(0.0872,0.27]   23.92079 24.95238
(0.27,0.671]    28.52830 26.18310
(0.671,1]       22.32432 25.85882

> tapply(educ, list(propbin, treat), mean) # not bad, but not better than overall
                        0        1
(0,0.0401]       9.877049 13.00000
(0.0401,0.0872] 10.086207 10.71429
(0.0872,0.27]   10.910891 10.52381
(0.27,0.671]     9.886792 10.05634
(0.671,1]       10.540541 10.48235

> tapply(married, list(propbin, treat), mean) # alot better than overall esp in top 3 where we have data
                        0          1
(0,0.0401]      0.9754098 1.00000000
(0.0401,0.0872] 0.5603448 0.28571429
(0.0872,0.27]   0.1188119 0.19047619
(0.27,0.671]    0.4528302 0.38028169
(0.671,1]       0.0000000 0.01176471
> tapply(nodegree, list(propbin, treat), mean) #  better than overall esp in top 3
                        0         1
(0,0.0401]      0.5245902 0.0000000
(0.0401,0.0872] 0.5775862 0.2857143
(0.0872,0.27]   0.6534653 0.7142857
(0.27,0.671]    0.5660377 0.5915493
(0.671,1]       0.7837838 0.8470588
> tapply(re74, list(propbin, treat), mean) #  better than overall 
                         0         1
(0,0.0401]      11879.0344 9381.5660
(0.0401,0.0872]  4716.3099 1490.1237
(0.0872,0.27]    1550.6141 3177.4622
(0.27,0.671]     4319.4811 3540.6004
(0.671,1]         777.6701  585.4043
> tapply(re75, list(propbin, treat), mean) #  better than overall 
                        0         1
(0,0.0401]      3762.2162  853.7225
(0.0401,0.0872] 2617.8838 1500.3862
(0.0872,0.27]   1254.2185 2009.7170
(0.27,0.671]    2504.0532 1858.2195
(0.671,1]        974.7579 1152.1901

##############################

> # lab text does a quintile function (see below), like my pbin2 command above
> # IMPLEMENT LAB TEXT 3.3
> quintilePropScores = quantile( propen, probs = seq( from = 0, to = 1, by = .2) )
> quintilePropScores
     0%     20%     40%     60%     80%    100% 
0.00908 0.04015 0.08721 0.26978 0.67085 0.85315 
# has bin breaks to greater signif digits, that makes a small difference it turns out

> lalonde$PropScores = propen  # make var names consistent
#################################

######### now start looking at outcome
> tapply(re78, list(propbin, treat),mean)  # here are the mean diffs in re78 the outcome
                                             stratified by propensity quintile
# direction of mean diffs favors treatment, job training
                    0    1
(0,0.0401]      10467    0
(0.0401,0.0872]  5797 7919
(0.0872,0.27]    6043 9211
(0.27,0.671]     4977 5819
(0.671,1]        4666 6030

# contrast that with the comparison ignoring any concerns about self-selection (selection bias), effect in the other direction, but not significant
> tapply(re78, treat, mean)
       0        1 
6984.170 6349.144 

> t.test(re78 ~ treat)
        Welch Two Sample t-test
data:  re78 by treat
t = 0.9377, df = 326.412, p-value = 0.3491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -697.192 1967.244
sample estimates:
mean in group 0 mean in group 1 
       6984.170        6349.144 

> lm(re78 ~ treat, data = lalonde) # another way to look
Call:
lm(formula = re78 ~ treat, data = lalonde)
Coefficients:
(Intercept)        treat  
       6984         -635  


##### do t-tests by subclassification (strata)
> tapply(re78, list(propbin, treat),t.test) # can't do a full set of t-tests
Error in t.test.default(X[[6L]], ...) : not enough 'x' observations

> # so for the 3 upper quintiles is the mean difference significant?

attributes for case 1, bin and propensity score
> propbin[1]
[1] (0.27,0.671]
Levels: (0,0.0401] (0.0401,0.0872] (0.0872,0.27] (0.27,0.671] (0.671,1]
> propen[1]
  NSW1 
0.6388 

> bins = c("(0,0.0401]", "(0.0401,0.0872]",   "(0.0872,0.27]",    "(0.27,0.671]",       "(0.671,1]")
> # i.e. make a string out of the bin designations, then pull-off entries
#####actually you can just do  t.test(re78[propbin == propbin[5]] ~ treat[propbin == propbin[5]]) # t-test for quintile 5


> t.test(re78[propbin == bins[5]] ~ treat[propbin == bins[5]]) # t-test for quintile 5

        Welch Two Sample t-test

data:  re78[propbin == bins[5]] by treat[propbin == bins[5]] 
t = -0.9844, df = 100.7, p-value = 0.3273
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -4113  1385 
sample estimates:
mean in group 0 mean in group 1 
           4666            6030 
# since we are doing 3 of these best practice would be to Bonferroni 
  and do each at Type 1 .017 instead of .05. Also Karen did these one-sided which you might justify,
  won't make a difference here; see Lab text version below

> t.test(re78[propbin == bins[4]] ~ treat[propbin == bins[4]])

        Welch Two Sample t-test

data:  re78[propbin == bins[4]] by treat[propbin == bins[4]] 
t = -0.7516, df = 108.7, p-value = 0.4539
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -3061  1378 
sample estimates:
mean in group 0 mean in group 1 
           4977            5819 

> t.test(re78[propbin == bins[3]] ~ treat[propbin == bins[3]])

        Welch Two Sample t-test

data:  re78[propbin == bins[3]] by treat[propbin == bins[3]] 
t = -1.536, df = 23.57, p-value = 0.1379
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -7428  1093 
sample estimates:
mean in group 0 mean in group 1 
           6043            9211 

> # even one-sided versions of these tests are not going to be signif; should also Bonferroni the Type I error

> t.test(re78[propbin == bins[2]] ~ treat[propbin == bins[2]]) # can also try bin 2 but really don't have enough data

        Welch Two Sample t-test

data:  re78[propbin == bins[2]] by treat[propbin == bins[2]] 
t = -1.083, df = 7.38, p-value = 0.3128
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -6708  2462 
sample estimates:
mean in group 0 mean in group 1 
           5797            7919 

## so in summary we can't find any evidence for the effectiveness of job training looking at each of the subclasses

##### lmer, a better way to do the t-tests  ########################
# Refer to week 4, lab2, TH1 for our background on lmer
# What we are going to do here is carry out the t-test within each of the 5 subclassifications (level-1 model)
# and then allow lmer to 'average' (weighting by the precision basically) those 5 comparisons. There are no
# conditioning variables in the level 2 model so the fixed effects are just the 'averages' of the level 1 effects

# being careful I added propen and the bin classification to the data set
> detach(lalonde)
> lalonde$propen = propen
> lalonde$bins = propbin
> lalonde[1:10,]
      treat age educ black hispan married nodegree re74 re75       re78     propen            bins
NSW1      1  37   11     1      0       1        1    0    0  9930.0460 0.63876993    (0.27,0.671]
NSW2      1  22    9     0      1       0        1    0    0  3595.8940 0.22463424   (0.0872,0.27]
NSW3      1  30   12     1      0       0        0    0    0 24909.4500 0.67824388       (0.671,1]
NSW4      1  27   11     1      0       0        1    0    0  7506.1460 0.77632408       (0.671,1]
NSW5      1  33    8     1      0       0        1    0    0   289.7899 0.70163874       (0.671,1]
NSW6      1  22    9     1      0       0        1    0    0  4056.4940 0.69906990       (0.671,1]
NSW7      1  23   12     1      0       0        0    0    0     0.0000 0.65368426    (0.27,0.671]
NSW8      1  32   11     1      0       0        1    0    0  8472.1580 0.78972311       (0.671,1]
NSW9      1  22   16     1      0       0        0    0    0  2164.0220 0.77983825       (0.671,1]
NSW10     1  33   12     0      0       1        0    0    0 12418.0700 0.04292461 (0.0401,0.0872]

> str(lalonde)
'data.frame':   614 obs. of  12 variables:
 $ treat   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ age     : int  37 22 30 27 33 22 23 32 22 33 ...
 $ educ    : int  11 9 12 11 8 9 12 11 16 12 ...
 $ black   : int  1 0 1 1 1 1 1 1 1 0 ...
 $ hispan  : int  0 1 0 0 0 0 0 0 0 0 ...
 $ married : int  1 0 0 0 0 0 0 0 0 1 ...
 $ nodegree: int  1 1 0 1 1 1 0 1 0 0 ...
 $ re74    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ re75    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ re78    : num  9930 3596 24909 7506 290 ...
 $ propen  : num  0.639 0.225 0.678 0.776 0.702 ...
 $ bins    : Factor w/ 5 levels "(0,0.0401]","(0.0401,0.0872]",..: 4 3 5 5 5 5 4 5 5 2 ...


> propen.lmer = lmer(re78 ~ treat + (1 + treat|bins), data = lalonde)
> summary(propen.lmer)
Linear mixed model fit by REML 
Formula: re78 ~ treat + (1 + treat | bins) 
   Data: lalonde 
   AIC   BIC logLik deviance REMLdev
 12649 12676  -6319    12668   12637
Random effects:
 Groups   Name        Variance Std.Dev. Corr   
 bins     (Intercept)  5208931 2282.3          
          treat        2069968 1438.7   -1.000 
 Residual             52597983 7252.4          
Number of obs: 614, groups: bins, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6434.3     1090.2   5.902
treat          383.9      951.5   0.403

Correlation of Fixed Effects:
      (Intr)
treat -0.796

# so here we have an overall estimate of the effect of the treat on re78 of positive $384, but
# far from significant. Much smaller point estimate than in some of the individual strata
# of course could/should do confint(propen.lmer), gonna be wide here
################################

##Ancova with propensity score alternative ##############################
#One not terrible, but not great, use of propensity scores which you likely will encounter 
in the literature is to use # the propensity score as a covariate.
# Not great, but not the worst analysis possible. An analogy could be made to covariance vs blocking
# where here we use regression smoothing rather than blocks (strat). Or to nonrandom assignment
# (week 5) where at each level of propensity, use a (biased) coin flip to assign to treat or control.
# One reason the ancova is not great is that the OLS regression will give most influence to extremes
# of propen, where we have no overlap.

> propen_ancova = lm(re78 ~ treat + propen)
> summary(propen_ancova)
Call:
lm(formula = re78 ~ treat + propen)
Residuals:
   Min     1Q Median     3Q    Max 
 -8925  -5632  -2148   3993  54953 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7833.8      430.7  18.188  < 2e-16 ***
treat         1207.6      834.1   1.448 0.148200    
propen       -4662.4     1319.4  -3.534 0.000441 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7402 on 611 degrees of freedom
Multiple R-squared:  0.02152,   Adjusted R-squared:  0.01832 
F-statistic:  6.72 on 2 and 611 DF,  p-value: 0.001298

# so bigger point estimate (in the same direction) than lmer on strata, but still not significant.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
### Karen's original functions (for analysis the lmer full is better; Karen followed the t-test in Gary King's tutorial)
> # lab text does a more sophisticated quintile function
> # IMPLEMENT LAB TEXT 3.3
> quintilePropScores = quantile( propen, probs = seq( from = 0, to = 1, by = .2) )
> quintilePropScores[1:10]
     0%     20%     40%     60%     80%    100%    <NA>    <NA>    <NA>    <NA> 
0.00908 0.04015 0.08721 0.26978 0.67085 0.85315      NA      NA      NA      NA 
> quintilePropScores
     0%     20%     40%     60%     80%    100% 
0.00908 0.04015 0.08721 0.26978 0.67085 0.85315 
> lalonde$PropScores = propen
> lalonde$PropScores[1:10]
 [1] 0.63877 0.22463 0.67824 0.77632 0.70164 0.69907 0.65368 0.78972 0.77984 0.04292

> # here's a pretty fancy do-loop version from Lab script
> treated = list(5)
> controls = list(5)

> for ( i in 1:5 ) {
+ treated[[i]] = subset( x = lalonde,
+ subset = PropScores >= quintilePropScores[i] &
+ PropScores < quintilePropScores[i+1] &
+ treat == 1)
+ controls[[i]] = subset( x = lalonde,
+ subset = PropScores >= quintilePropScores[i] &
+ PropScores < quintilePropScores[i+1] &
+ treat == 0)
+ }
> for ( i in 1:5 ) {
+ t.test( treated[[i]]$re78,
+ controls[[i]]$re78,
+ alternative = "greater" )
+ }
Error in t.test.default(treated[[i]]$re78, controls[[i]]$re78, alternative = "greater") : 
        not enough 'x' observations

# so I change the loop in Lab text to exclude testing bin 1
> for ( i in 2:5 ) {
+ t.test( treated[[i]]$re78,
+ controls[[i]]$re78,
+ alternative = "greater" )
+ }
> ptests = for ( i in 2:5 ) {
+ t.test( treated[[i]]$re78,
+ controls[[i]]$re78,
+ alternative = "greater" )
+ }
> ptests

        Welch Two Sample t-test

data:  treated[[i]]$re78 and controls[[i]]$re78 
t = 1.120, df = 108.3, p-value = 0.1326
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -735.8    Inf 
sample estimates:
mean of x mean of y 
     6027      4499 

> t.test( treated[[5]]$re78,
+ controls[[5]]$re78,
+ alternative = "greater" )

        Welch Two Sample t-test

data:  treated[[5]]$re78 and controls[[5]]$re78 
t = 1.120, df = 108.3, p-value = 0.1326
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -735.8    Inf 
sample estimates:
mean of x mean of y 
     6027      4499 

> t.test( treated[[4]]$re78,
+ controls[[4]]$re78,
+ alternative = "greater" )

        Welch Two Sample t-test

data:  treated[[4]]$re78 and controls[[4]]$re78 
t = 0.6149, df = 103.4, p-value = 0.27
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -1193   Inf 
sample estimates:
mean of x mean of y 
     5819      5117 

> t.test( treated[[3]]$re78,
+ controls[[3]]$re78,
+ alternative = "greater" )

        Welch Two Sample t-test

data:  treated[[3]]$re78 and controls[[3]]$re78 
t = 1.536, df = 23.57, p-value = 0.06895
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -363.7    Inf 
sample estimates:
mean of x mean of y 
     9211      6043 

> length(treated[[3]]$re78)
[1] 21
> mean(treated[[3]]$re78)
[1] 9211
> length(controls[[3]]$re78)
[1] 101
> mean(controls[[3]]$re78)
[1] 6043
> # those match my bins
> length(treated[[4]]$re78)
[1] 71
> mean(treated[[4]]$re78)
[1] 5819
> length(controls[[4]]$re78)
[1] 51
> mean(controls[[4]]$re78)
[1] 5117
> # our different bins procedure are a little discrepant and that matters, 2 controls
##########################################################

####Doing this by hand vs letting MatchIt do all
##
##if you type at the R-prompt 'demo(subclass)' you will be taken
##on a tour of MatchIt output. Last time I tried it This demo uses a subset of our matching
##variables, hardwired to 6 subclasses, but these fail to be of equal
##size (even close) like they are supposed to be. Shows you the balance statistics
##and some charming plots, (e.g. absolute standardized diffs)if you let it run. 
##For pedagogy I chose to do all this step by step.
# The matchit JSS paper, linked in week 8, provides better explanation of using the subclass option
# than the main matchit docs
#####################*****end Lab script part 3 ***********###################################