WEEK1  Intro Matching Methods                                    CHPR290/Stat266
R version 3.3.3 (2017-03-06) -- "Another Canoe"
#### Week 1 session. Lalonde data
# If you start from a relatively clean install, I start this from scratch, first get MatchIt and optmatch
# some years order matters because of complication with license for optmatch algorithms this year appears smooth

> # install matching packages
> install.packages("optmatch")   # I include all the install chatter
--- Please select a CRAN mirror for use in this session ---
also installing the dependencies ‘SparseM’, ‘abind’, ‘xtable’, ‘svd’, ‘Rcpp’, ‘RItools’, ‘digest’

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/SparseM_1.76.zip'
Content type 'application/zip' length 934958 bytes (913 KB)
downloaded 913 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/abind_1.4-5.zip'
Content type 'application/zip' length 40157 bytes (39 KB)
downloaded 39 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/xtable_1.8-2.zip'
Content type 'application/zip' length 710121 bytes (693 KB)
downloaded 693 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/svd_0.4.zip'
Content type 'application/zip' length 494588 bytes (482 KB)
downloaded 482 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/Rcpp_0.12.10.zip'
Content type 'application/zip' length 3316792 bytes (3.2 MB)
downloaded 3.2 MB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/RItools_0.1-15.zip'
Content type 'application/zip' length 194909 bytes (190 KB)
downloaded 190 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/digest_0.6.12.zip'
Content type 'application/zip' length 172751 bytes (168 KB)
downloaded 168 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/optmatch_0.9-7.zip'
Content type 'application/zip' length 1563360 bytes (1.5 MB)
downloaded 1.5 MB

package ‘SparseM’ successfully unpacked and MD5 sums checked
package ‘abind’ successfully unpacked and MD5 sums checked
package ‘xtable’ successfully unpacked and MD5 sums checked
package ‘svd’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘RItools’ successfully unpacked and MD5 sums checked
package ‘digest’ successfully unpacked and MD5 sums checked
package ‘optmatch’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages
> library(optmatch)
Loading required package: survival
The optmatch package has an academic license. Enter relaxinfo() for more information.
> install.packages("MatchIt")
Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/MatchIt_2.4-22.zip'
Content type 'application/zip' length 360769 bytes (352 KB)
downloaded 352 KB

package ‘MatchIt’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages
> library(MatchIt)
Loading required package: MASS

> vignette(package = "MatchIt")
Vignettes in package ‘MatchIt’:

matchit                                MatchIt: Nonparametric Preprocessing for Parametric Causal Inference (source,
                                       pdf)


> ## install other packages we use in CC1
> install.packages("lme4")
Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
also installing the dependencies ‘minqa’, ‘nloptr’, ‘RcppEigen’

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/minqa_1.2.4.zip'
Content type 'application/zip' length 624508 bytes (609 KB)
downloaded 609 KB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/nloptr_1.0.4.zip'
Content type 'application/zip' length 1172933 bytes (1.1 MB)
downloaded 1.1 MB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/RcppEigen_0.3.2.9.1.zip'
Content type 'application/zip' length 2117664 bytes (2.0 MB)
downloaded 2.0 MB

trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/lme4_1.1-12.zip'
Content type 'application/zip' length 4768645 bytes (4.5 MB)
downloaded 4.5 MB

package ‘minqa’ successfully unpacked and MD5 sums checked
package ‘nloptr’ successfully unpacked and MD5 sums checked
package ‘RcppEigen’ successfully unpacked and MD5 sums checked
package ‘lme4’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages
> library(lme4)
Loading required package: Matrix
> install.packages("PSAgraphics")
Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/PSAgraphics_2.1.1.zip'
Content type 'application/zip' length 89957 bytes (87 KB)
downloaded 87 KB

package ‘PSAgraphics’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages
>   library(PSAgraphics)
Loading required package: rpart
> data(lalonde)

> help(lalonde)
starting httpd help server ... done
 #produces
-----------------------------------
lalonde               package:MatchIt               R Documentation

Data from National Supported Work Demonstration and PSID, as analyzed by Dehejia and Wahba (1999).

Description:

     This is a subsample of the data from the treated group in the
     National Supported Work Demonstration (NSW) and the comparison
     sample from the Current Population Survey (CPS).   This data was
     previously analyzed extensively by Lalonde (1986) and Dehejia and
     Wahba (1999).   The full dataset is available at <URL:
     http://www.columbia.edu/~rd247/nswdata.html>. [note: broken link still in current documentation]

Usage:

     data(lalonde)

Format:

     A data frame with 313 [sic, 614] observations (185 treated, 429 control). 
     There are 10 variables measured for each individual.  "treat" is
     the treatment assignment (1=treated, 0=control).   "age" is age in
     years.  "educ" is education in number of years of schooling. 
     "black" is an indicator for African-American (1=African-American,
     0=not).  "hispan" is an indicator for being of Hispanic origin
     (1=Hispanic, 0=not).  "married" is an indicator for married
     (1=married, 0=not married).  "nodegree" is an indicator for
     whether the individual has a high school degree (1=no degree,
     0=degree). "re74" is income in 1974, in U.S. dollars.  "re75" is
     income in 1975, in U.S. dollars.  "re78" is income in 1978, in
     U.S. dollars.


References:

     Lalonde, R. (1986). Evaluating the econometric evaluations of
     training programs  with experimental data. American Economic
     Review 76: 604-620.

     Dehejia, R.H. and Wahba, S. (1999).  Causal Effects in
     Nonexperimental Studies:  Re-Evaluating the Evaluation  of
     Training Programs.  Journal of the American  Statistical
     Association 94: 1053-1062.
-----------------------------------------------


> dim(lalonde)
[1] 614  10
> attach(lalonde)
> table(treat) # so these summaries synch with data description
treat
  0   1 
429 185 
> head(lalonde)
     treat age educ black hispan married nodegree re74 re75       re78
NSW1     1  37   11     1      0       1        1    0    0  9930.0460
NSW2     1  22    9     0      1       0        1    0    0  3595.8940
NSW3     1  30   12     1      0       0        0    0    0 24909.4500
NSW4     1  27   11     1      0       0        1    0    0  7506.1460
NSW5     1  33    8     1      0       0        1    0    0   289.7899
NSW6     1  22    9     1      0       0        1    0    0  4056.4940

################## prelim compare groups on outcome measure
> tapply(re78, treat, median)
       0        1 
4975.505 4232.309 
> tapply(re78, treat, fivenum)
$`0`
[1]     0.0000   220.1813  4975.5050 11688.8200 25564.6700

$`1`
[1]     0.0000   485.2298  4232.3090  9642.9990 60307.9300

> t.test(re78 ~ treat)

        Welch Two Sample t-test

data:  re78 by treat
t = 0.93773, df = 326.41, p-value = 0.3491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -697.192 1967.244
sample estimates:
mean in group 0 mean in group 1 
       6984.170        6349.144 

> wilcox.test(re78 ~ treat. conf.int = TRUE)
Error: unexpected symbol in "wilcox.test(re78 ~ treat. conf.int"
> wilcox.test(re78 ~ treat, conf.int = TRUE)

        Wilcoxon rank sum test with continuity correction

data:  re78 by treat
W = 41840, p-value = 0.2818
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -4.664401e-05  1.053159e+03
sample estimates:
difference in location 
          5.053114e-05 

> #####But wait, some say "we are never done until the ancova is run" see Fish
> # as we see the social science, life science practice is to put into prediction of outcome the treatment variable  and
> # a whole bunch of other variables to "control" for self-selection, nonequivalence etc. Coeff of treatment is the causal effect.
> #  equivalent to analysis of covariance by whatever name
> ancova.lalonde = lm( re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75)
>   summary(ancova.lalonde)

Call:
lm(formula = re78 ~ treat + age + educ + black + hispan + married + 
    nodegree + re74 + re75)

Residuals:
   Min     1Q Median     3Q    Max 
-13595  -4894  -1662   3929  54570 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.651e+01  2.437e+03   0.027   0.9782    
treat        1.548e+03  7.813e+02   1.982   0.0480 *  # so treatment is significantly helpful ??
age          1.298e+01  3.249e+01   0.399   0.6897    
educ         4.039e+02  1.589e+02   2.542   0.0113 *  
black       -1.241e+03  7.688e+02  -1.614   0.1071    
hispan       4.989e+02  9.419e+02   0.530   0.5966    
married      4.066e+02  6.955e+02   0.585   0.5590    
nodegree     2.598e+02  8.474e+02   0.307   0.7593    
re74         2.964e-01  5.827e-02   5.086 4.89e-07 ***
re75         2.315e-01  1.046e-01   2.213   0.0273 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6948 on 604 degrees of freedom
Multiple R-squared:  0.1478,    Adjusted R-squared:  0.1351 
F-statistic: 11.64 on 9 and 604 DF,  p-value: < 2.2e-16

> 
> ########### Begin matching analysis; Quintile Subclassification with Propensity Scores
## original Rosenbaum-Rubin, cardiac; Rubin breast cancer

> # now do the logistic regression that computes propensity scores (matching packages will do this for you with propen as distance measure)
> 
> glm.p = glm( treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, family = binomial)
>   summary(glm.p)

Call:
glm(formula = treat ~ age + educ + black + hispan + married + 
    nodegree + re74 + re75, family = binomial, data = lalonde)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7645  -0.4736  -0.2862   0.7508   2.7169  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.729e+00  1.017e+00  -4.649 3.33e-06 ***
age          1.578e-02  1.358e-02   1.162  0.24521    
educ         1.613e-01  6.513e-02   2.477  0.01325 *  
black        3.065e+00  2.865e-01  10.699  < 2e-16 ***
hispan       9.836e-01  4.257e-01   2.311  0.02084 *  
married     -8.321e-01  2.903e-01  -2.866  0.00415 ** 
nodegree     7.073e-01  3.377e-01   2.095  0.03620 *  
re74        -7.178e-05  2.875e-05  -2.497  0.01253 *  
re75         5.345e-05  4.635e-05   1.153  0.24884    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 751.49  on 613  degrees of freedom
Residual deviance: 487.84  on 605  degrees of freedom
AIC: 505.84

Number of Fisher Scoring iterations: 5

> propen = fitted(glm.p)  # now we have the propensity scores
>  quantile(propen) # overall distrib
         0%         25%         50%         75%        100% 
0.009080193 0.048536484 0.120676493 0.638715991 0.853152844 
> tapply(propen, treat, quantile) # look at overlap via 5-number summary (or side-by-side boxplots) not good overlap, 
$`0`
         0%         25%         50%         75%        100% 
0.009080193 0.038880745 0.075849106 0.195135746 0.789172834 

$`1`
        0%        25%        50%        75%       100% 
0.02495179 0.52646352 0.65368426 0.72659995 0.85315284 

> # as we are fitting prob(treat = 1) fits for those in treatment group will be larger, we need good overlap for matching purposes
> boxplot(propen ~ treat) #gives side-by-side boxplots, you can add labels, not wonderful overlap
>  detach(lalonde)
>  lalonde$propen = propen
>  attach(lalonde)
The following object is masked _by_ .GlobalEnv:

    propen

> head(lalonde)
     treat age educ black hispan married nodegree re74 re75       re78    propen
NSW1     1  37   11     1      0       1        1    0    0  9930.0460 0.6387699
NSW2     1  22    9     0      1       0        1    0    0  3595.8940 0.2246342
NSW3     1  30   12     1      0       0        0    0    0 24909.4500 0.6782439
NSW4     1  27   11     1      0       0        1    0    0  7506.1460 0.7763241
NSW5     1  33    8     1      0       0        1    0    0   289.7899 0.7016387
NSW6     1  22    9     1      0       0        1    0    0  4056.4940 0.6990699

#### looking at overlap, histograms
>  p1 = propen[treat == 1]
>   length(p1)
[1] 185
> 
>   p0 = propen[treat == 0]
>   length(p0)
[1] 429
> 
>   fivenum(p1)
    NSW124     NSW156      NSW50     NSW119     NSW178 
0.02495179 0.52646352 0.65368426 0.72659995 0.85315284 
>  
>   fivenum(p0)
    PSID296     PSID347     PSID221     PSID334     PSID118 
0.009080193 0.038880745 0.075849106 0.195135746 0.789172834 
> 
>   hist(p0,col=rgb(0,0,1,0.7),xlim=range(c(p0,p1)))
>   hist(p1,col=rgb(1,0,0,0.7),add=T)
>   # superimposed propensity histograms, like Ben Hansen SAT, contol is blue, treatment is red, overlap close to perfect Stanford Cardinal red
>  hist(p0, breaks = "FD", col=rgb(0,0,1,0.7),xlim=range(c(p0,p1)))
>  hist(p1, breaks = "FD", col=rgb(1,0,0,0.7),add=T)

############################## start subclassification
> ### make quintiles of propensity distribution
>   pbin = cut(propen, quantile(propen, seq(0, 1, 1/5)), include.lowest = TRUE, labels = FALSE)
>  detach(lalonde)
>  lalonde$bins = pbin
>  attach(lalonde)
The following object is masked _by_ .GlobalEnv:

    propen

>   table(pbin, treat)
    treat
pbin   0   1
   1 122   1
   2 116   7
   3 101  21
   4  53  71
   5  37  85

> ##### examples of checking balance (more to come)
>  tapply(age, list(bins, treat), median)
   0  1
1 29 27
2 26 23
3 20 23
4 24 25
5 19 25
> ### we already did upfront install.packages("PSAgraphics")
>   # and library(PSAgraphics)
>   box.psa(age, treat, bins) # a graphical display of balance or not
Warning messages:
1: In xy.coords(x, y) : NAs introduced by coercion
2: In xy.coords(x, y) : NAs introduced by coercion

> #################################### examine outcome by strata
>  tapply(re78, list(bins, treat),mean)  # here are the mean diffs in re78 (the outcome) stratified by propensity quintile
          0        1
1 10467.064    0.000
2  5796.548 7919.316
3  6043.316 9210.726
4  4977.401 5819.143
5  4666.221 6030.258
> # direction of mean diffs favors treatment, job training
> 
>  # contrast that with the comparison ignoring any concerns about self-selection (selection bias), effect in the other direction, but not significant
>   tapply(re78, treat, mean)
       0        1 
6984.170 6349.144 
> ##### can do t-tests by subclassification (strata)
>  # e.g. for the 3 upper quintiles is the mean difference significant? since we are doing 3 of these best practice would be to Bonferroni 
>  ##  we won't find any evidence for the effectiveness of job training looking at each of the subclasses
>  
>  ##### lmer, a better way to do the t-tests  ########################
>  # we already did library(lme4)
> 
>  propen.lmer = lmer(re78 ~ treat + (1 + treat|bins), data = lalonde)
>  summary(propen.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: re78 ~ treat + (1 + treat | bins)
   Data: lalonde

REML criterion at convergence: 12637.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3976 -0.7541 -0.2878  0.5408  7.4535 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 bins     (Intercept)  5208943 2282          
          treat        2069963 1439     -1.00
 Residual             52597981 7252          
Number of obs: 614, groups:  bins, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6434.2     1090.2   5.902
treat          385.7      950.8   0.406

Correlation of Fixed Effects:
      (Intr)
treat -0.795

> # so here we have an overall estimate of the effect of the treat on re78 of positive $386, but
> # far from significant. Much smaller point estimate than in some of the individual strata
> 
> ## confint(propen.lmer) # bombs
>  confint(propen.lmer, method = "boot", nsim = 1000, boot.type = "perc")
Computing bootstrap confidence intervals ...
                  2.5 %   97.5 %
.sig01        427.46839 4085.240
.sig02         -1.00000    1.000
.sig03         55.88642 3646.069
.sigma       6846.48238 7654.688
(Intercept)  4432.19374 8696.164
treat       -1683.72564 2566.001    ### CI for effect of treatment
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
7: In bootMer(object, FUN = FUN, nsim = nsim, ...) :
  some bootstrap runs failed (9/1000)



> ########################### Full Matching (Hansen, via Rosenbaum, using MatchIt)
> 
>   m2full.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "full")
Warning message:
In optmatch::fullmatch(d, ...) :
  Without 'data' argument the order of the match is not guaranteed
    to be the same as your original data.
> summary(m2full.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age + married + nodegree, data = lalonde, method = "full")

Summary of balance for all data:
         Means Treated Means Control  Mean Diff   eQQ Med  eQQ Mean   eQQ Max
distance        0.5774        0.1822     0.3952    0.5176    0.3955    0.5966
re74         2095.5737     5619.2365 -3523.6628 2425.5720 3620.9240 9216.5000
re75         1532.0553     2466.4844  -934.4291  981.0968 1060.6582 6795.0100
educ           10.3459       10.2354     0.1105    1.0000    0.7027    4.0000
black           0.8432        0.2028     0.6404    1.0000    0.6432    1.0000
hispan          0.0595        0.1422    -0.0827    0.0000    0.0811    1.0000
age            25.8162       28.0303    -2.2141    1.0000    3.2649   10.0000
married         0.1892        0.5128    -0.3236    0.0000    0.3243    1.0000
nodegree        0.7081        0.5967     0.1114    0.0000    0.1135    1.0000


Summary of balance for matched data:
         Means Treated Means Control Mean Diff  eQQ Med eQQ Mean   eQQ Max
distance        0.5774        0.5761    0.0013   0.0026   0.0066     0.096
re74         2095.5737     2199.7126 -104.1390  72.6510 512.7210 13121.750
re75         1532.0553     1524.8362    7.2191 209.6655 460.5643 12746.050
educ           10.3459       10.3227    0.0233   0.0000   0.4596     4.000
black           0.8432        0.8347    0.0086   0.0000   0.0020     1.000
hispan          0.0595        0.0583    0.0012   0.0000   0.0012     1.000
age            25.8162       24.6928    1.1235   3.0000   3.3100     9.000
married         0.1892        0.1285    0.0607   0.0000   0.0544     1.000
nodegree        0.7081        0.7040    0.0041   0.0000   0.0028     1.000

Percent Balance Improvement:
         Mean Diff.   eQQ Med eQQ Mean  eQQ Max
distance    99.6662   99.5001  98.3388  83.9052
re74        97.0446   97.0048  85.8401 -42.3724
re75        99.2274   78.6295  56.5775 -87.5796
educ        78.9494  100.0000  34.5954   0.0000
black       98.6582  100.0000  99.6891   0.0000
hispan      98.5858    0.0000  98.5200   0.0000
age         49.2583 -200.0000  -1.3825  10.0000
married     81.2495    0.0000  83.2267   0.0000
nodegree    96.3435    0.0000  97.5333   0.0000

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

> summary(m2full.out, standardize = T)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age + married + nodegree, data = lalonde, method = "full")

Summary of balance for all data:
         Means Treated Means Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.5774        0.1822          1.7941   0.3964    0.3774   0.6444
re74         2095.5737     5619.2365         -0.7211   0.2335    0.2248   0.4470
re75         1532.0553     2466.4844         -0.2903   0.1355    0.1342   0.2876
educ           10.3459       10.2354          0.0550   0.0228    0.0347   0.1114
black           0.8432        0.2028          1.7568   0.3202    0.3202   0.6404
hispan          0.0595        0.1422         -0.3489   0.0414    0.0414   0.0827
age            25.8162       28.0303         -0.3094   0.0827    0.0813   0.1577
married         0.1892        0.5128         -0.8241   0.1618    0.1618   0.3236
nodegree        0.7081        0.5967          0.2443   0.0557    0.0557   0.1114


Summary of balance for matched data:
         Means Treated Means Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.5774        0.5761          0.0060   0.0060    0.0085   0.0596
re74         2095.5737     2199.7126         -0.0213   0.0160    0.0476   0.2268
re75         1532.0553     1524.8362          0.0022   0.0348    0.0693   0.2324
educ           10.3459       10.3227          0.0116   0.0286    0.0275   0.0568
black           0.8432        0.8347          0.0236   0.0104    0.0104   0.0208
hispan          0.0595        0.0583          0.0049   0.0036    0.0036   0.0072
age            25.8162       24.6928          0.1570   0.0416    0.0857   0.3436
married         0.1892        0.1285          0.1545   0.0366    0.0366   0.0732
nodegree        0.7081        0.7040          0.0089   0.0008    0.0008   0.0016

Percent Balance Improvement:
         Std. Mean Diff. eCDF Med eCDF Mean  eCDF Max
distance         99.6662  98.4863   97.7452   90.7506
re74             97.0446  93.1488   78.8321   49.2658
re75             99.2274  74.3198   48.3597   19.2062
educ             78.9494 -25.6137   20.8722   48.9995
black            98.6582  96.7523   96.7523   96.7523
hispan           98.5858  91.2972   91.2972   91.2972
age              49.2583  49.7246   -5.3122 -117.8448
married          81.2495  77.3817   77.3817   77.3817
nodegree         96.3435  98.5634   98.5634   98.5634

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

> plot(summary(m2full.out, standardize = T))
[1] "To identify the variables, use first mouse button; to stop, use second."
integer(0)

> # click on points to get labels, right-click to exit
>   setwd("D:\\drr17\\chpr290\\week1\\")
>   plot(m2full.out) # gives you QQ plots for each var
Waiting to confirm page change...
Waiting to confirm page change...
>  detach(lalonde)

>   m2full.dat = match.data(m2full.out)  # obtain results from the full matching
>   head(m2full.dat)
     treat age educ black hispan married nodegree re74 re75       re78    propen bins  distance weights subclass
NSW1     1  37   11     1      0       1        1    0    0  9930.0460 0.6387699    4 0.6387699       1        1
NSW2     1  22    9     0      1       0        1    0    0  3595.8940 0.2246342    3 0.2246342       1      103
NSW3     1  30   12     1      0       0        0    0    0 24909.4500 0.6782439    5 0.6782439       1        8
NSW4     1  27   11     1      0       0        1    0    0  7506.1460 0.7763241    5 0.7763241       1       15
NSW5     1  33    8     1      0       0        1    0    0   289.7899 0.7016387    5 0.7016387       1       24
NSW6     1  22    9     1      0       0        1    0    0  4056.4940 0.6990699    5 0.6990699       1       32
>  attach(m2full.dat)
The following object is masked _by_ .GlobalEnv:

    propen

> 
>   # so you can see match.data appends 3 colums "distance" "weights"  "subclass" to the original data set
>   table(m2full.dat$subclass) #the 104 subclasses have various sizes

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33 
  2  13   2   7   3   5   3   2   4   2   8   3   2   2   9   4   2   9   6  14   3   2   2   6   3   4   2   4   3   3   2   2   2 
 34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66 
  3   3  14   5   3   3   2   6   2   5   3   2  10   2   4   8   3   2  14   7   2  14   2   2   4  40   2   2   3   2  91   5   3 
 67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 
  2  13   3   3   2   3  70   2   5   6   2   2  13   2   2   2   2   2   7   3   2   2   3   2   2   2   2   3   6   4   3   4   2 
100 101 102 103 104 
  5   2   2   4   5 

> ###### outcome comparison over the (matched) subclasses
>  mfull.lmer = lmer(re78 ~ treat + (1 + treat|subclass), data = m2full.dat) # like for the quintiles in base section, t-test in each subclass
>  summary(mfull.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: re78 ~ treat + (1 + treat | subclass)
   Data: m2full.dat

REML criterion at convergence: 12633.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5267 -0.7497 -0.2851  0.5165  7.4616 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 subclass (Intercept)  4027897 2007          
          treat        3810081 1952     -0.89
 Residual             51103906 7149          
Number of obs: 614, groups:  subclass, 104

Fixed effects:
            Estimate Std. Error t value
(Intercept)   5862.9      507.8  11.546
treat          504.5      736.2   0.685  ## about the same as seen in base section 384 (951)

Correlation of Fixed Effects:
      (Intr)
treat -0.679

> confint(mfull.lmer)
Computing profile confidence intervals ...
                2.5 %   97.5 %
.sig01      1216.8601 3011.801
.sig02        -1.0000    1.000
.sig03         0.0000      Inf
.sigma      6740.8621 7581.414
(Intercept) 4807.1941 6873.722
treat       -985.7685 1977.973   ### CI for effect of treatment
There were 50 or more warnings (use warnings() to see the first 50)
>