WEEK1 Intro Matching Methods CHPR290/Stat266 R version 3.3.3 (2017-03-06) -- "Another Canoe" #### Week 1 session. Lalonde data # If you start from a relatively clean install, I start this from scratch, first get MatchIt and optmatch # some years order matters because of complication with license for optmatch algorithms this year appears smooth > # install matching packages > install.packages("optmatch") # I include all the install chatter --- Please select a CRAN mirror for use in this session --- also installing the dependencies ‘SparseM’, ‘abind’, ‘xtable’, ‘svd’, ‘Rcpp’, ‘RItools’, ‘digest’ trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/SparseM_1.76.zip' Content type 'application/zip' length 934958 bytes (913 KB) downloaded 913 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/abind_1.4-5.zip' Content type 'application/zip' length 40157 bytes (39 KB) downloaded 39 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/xtable_1.8-2.zip' Content type 'application/zip' length 710121 bytes (693 KB) downloaded 693 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/svd_0.4.zip' Content type 'application/zip' length 494588 bytes (482 KB) downloaded 482 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/Rcpp_0.12.10.zip' Content type 'application/zip' length 3316792 bytes (3.2 MB) downloaded 3.2 MB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/RItools_0.1-15.zip' Content type 'application/zip' length 194909 bytes (190 KB) downloaded 190 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/digest_0.6.12.zip' Content type 'application/zip' length 172751 bytes (168 KB) downloaded 168 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/optmatch_0.9-7.zip' Content type 'application/zip' length 1563360 bytes (1.5 MB) downloaded 1.5 MB package ‘SparseM’ successfully unpacked and MD5 sums checked package ‘abind’ successfully unpacked and MD5 sums checked package ‘xtable’ successfully unpacked and MD5 sums checked package ‘svd’ successfully unpacked and MD5 sums checked package ‘Rcpp’ successfully unpacked and MD5 sums checked package ‘RItools’ successfully unpacked and MD5 sums checked package ‘digest’ successfully unpacked and MD5 sums checked package ‘optmatch’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages > library(optmatch) Loading required package: survival The optmatch package has an academic license. Enter relaxinfo() for more information. > install.packages("MatchIt") Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’ (as ‘lib’ is unspecified) trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/MatchIt_2.4-22.zip' Content type 'application/zip' length 360769 bytes (352 KB) downloaded 352 KB package ‘MatchIt’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages > library(MatchIt) Loading required package: MASS > vignette(package = "MatchIt") Vignettes in package ‘MatchIt’: matchit MatchIt: Nonparametric Preprocessing for Parametric Causal Inference (source, pdf) > ## install other packages we use in CC1 > install.packages("lme4") Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’ (as ‘lib’ is unspecified) also installing the dependencies ‘minqa’, ‘nloptr’, ‘RcppEigen’ trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/minqa_1.2.4.zip' Content type 'application/zip' length 624508 bytes (609 KB) downloaded 609 KB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/nloptr_1.0.4.zip' Content type 'application/zip' length 1172933 bytes (1.1 MB) downloaded 1.1 MB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/RcppEigen_0.3.2.9.1.zip' Content type 'application/zip' length 2117664 bytes (2.0 MB) downloaded 2.0 MB trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/lme4_1.1-12.zip' Content type 'application/zip' length 4768645 bytes (4.5 MB) downloaded 4.5 MB package ‘minqa’ successfully unpacked and MD5 sums checked package ‘nloptr’ successfully unpacked and MD5 sums checked package ‘RcppEigen’ successfully unpacked and MD5 sums checked package ‘lme4’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages > library(lme4) Loading required package: Matrix > install.packages("PSAgraphics") Installing package into ‘C:/Users/rag/Documents/R/win-library/3.3’ (as ‘lib’ is unspecified) trying URL 'http://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/PSAgraphics_2.1.1.zip' Content type 'application/zip' length 89957 bytes (87 KB) downloaded 87 KB package ‘PSAgraphics’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\rag\AppData\Local\Temp\RtmpOIz4Kb\downloaded_packages > library(PSAgraphics) Loading required package: rpart > data(lalonde) > help(lalonde) starting httpd help server ... done #produces ----------------------------------- lalonde package:MatchIt R Documentation Data from National Supported Work Demonstration and PSID, as analyzed by Dehejia and Wahba (1999). Description: This is a subsample of the data from the treated group in the National Supported Work Demonstration (NSW) and the comparison sample from the Current Population Survey (CPS). This data was previously analyzed extensively by Lalonde (1986) and Dehejia and Wahba (1999). The full dataset is available at . [note: broken link still in current documentation] Usage: data(lalonde) Format: A data frame with 313 [sic, 614] observations (185 treated, 429 control). There are 10 variables measured for each individual. "treat" is the treatment assignment (1=treated, 0=control). "age" is age in years. "educ" is education in number of years of schooling. "black" is an indicator for African-American (1=African-American, 0=not). "hispan" is an indicator for being of Hispanic origin (1=Hispanic, 0=not). "married" is an indicator for married (1=married, 0=not married). "nodegree" is an indicator for whether the individual has a high school degree (1=no degree, 0=degree). "re74" is income in 1974, in U.S. dollars. "re75" is income in 1975, in U.S. dollars. "re78" is income in 1978, in U.S. dollars. References: Lalonde, R. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76: 604-620. Dehejia, R.H. and Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94: 1053-1062. ----------------------------------------------- > dim(lalonde) [1] 614 10 > attach(lalonde) > table(treat) # so these summaries synch with data description treat 0 1 429 185 > head(lalonde) treat age educ black hispan married nodegree re74 re75 re78 NSW1 1 37 11 1 0 1 1 0 0 9930.0460 NSW2 1 22 9 0 1 0 1 0 0 3595.8940 NSW3 1 30 12 1 0 0 0 0 0 24909.4500 NSW4 1 27 11 1 0 0 1 0 0 7506.1460 NSW5 1 33 8 1 0 0 1 0 0 289.7899 NSW6 1 22 9 1 0 0 1 0 0 4056.4940 ################## prelim compare groups on outcome measure > tapply(re78, treat, median) 0 1 4975.505 4232.309 > tapply(re78, treat, fivenum) $`0` [1] 0.0000 220.1813 4975.5050 11688.8200 25564.6700 $`1` [1] 0.0000 485.2298 4232.3090 9642.9990 60307.9300 > t.test(re78 ~ treat) Welch Two Sample t-test data: re78 by treat t = 0.93773, df = 326.41, p-value = 0.3491 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -697.192 1967.244 sample estimates: mean in group 0 mean in group 1 6984.170 6349.144 > wilcox.test(re78 ~ treat. conf.int = TRUE) Error: unexpected symbol in "wilcox.test(re78 ~ treat. conf.int" > wilcox.test(re78 ~ treat, conf.int = TRUE) Wilcoxon rank sum test with continuity correction data: re78 by treat W = 41840, p-value = 0.2818 alternative hypothesis: true location shift is not equal to 0 95 percent confidence interval: -4.664401e-05 1.053159e+03 sample estimates: difference in location 5.053114e-05 > #####But wait, some say "we are never done until the ancova is run" see Fish > # as we see the social science, life science practice is to put into prediction of outcome the treatment variable and > # a whole bunch of other variables to "control" for self-selection, nonequivalence etc. Coeff of treatment is the causal effect. > # equivalent to analysis of covariance by whatever name > ancova.lalonde = lm( re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75) > summary(ancova.lalonde) Call: lm(formula = re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75) Residuals: Min 1Q Median 3Q Max -13595 -4894 -1662 3929 54570 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.651e+01 2.437e+03 0.027 0.9782 treat 1.548e+03 7.813e+02 1.982 0.0480 * # so treatment is significantly helpful ?? age 1.298e+01 3.249e+01 0.399 0.6897 educ 4.039e+02 1.589e+02 2.542 0.0113 * black -1.241e+03 7.688e+02 -1.614 0.1071 hispan 4.989e+02 9.419e+02 0.530 0.5966 married 4.066e+02 6.955e+02 0.585 0.5590 nodegree 2.598e+02 8.474e+02 0.307 0.7593 re74 2.964e-01 5.827e-02 5.086 4.89e-07 *** re75 2.315e-01 1.046e-01 2.213 0.0273 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 6948 on 604 degrees of freedom Multiple R-squared: 0.1478, Adjusted R-squared: 0.1351 F-statistic: 11.64 on 9 and 604 DF, p-value: < 2.2e-16 > > ########### Begin matching analysis; Quintile Subclassification with Propensity Scores ## original Rosenbaum-Rubin, cardiac; Rubin breast cancer > # now do the logistic regression that computes propensity scores (matching packages will do this for you with propen as distance measure) > > glm.p = glm( treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, family = binomial) > summary(glm.p) Call: glm(formula = treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, family = binomial, data = lalonde) Deviance Residuals: Min 1Q Median 3Q Max -1.7645 -0.4736 -0.2862 0.7508 2.7169 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.729e+00 1.017e+00 -4.649 3.33e-06 *** age 1.578e-02 1.358e-02 1.162 0.24521 educ 1.613e-01 6.513e-02 2.477 0.01325 * black 3.065e+00 2.865e-01 10.699 < 2e-16 *** hispan 9.836e-01 4.257e-01 2.311 0.02084 * married -8.321e-01 2.903e-01 -2.866 0.00415 ** nodegree 7.073e-01 3.377e-01 2.095 0.03620 * re74 -7.178e-05 2.875e-05 -2.497 0.01253 * re75 5.345e-05 4.635e-05 1.153 0.24884 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 751.49 on 613 degrees of freedom Residual deviance: 487.84 on 605 degrees of freedom AIC: 505.84 Number of Fisher Scoring iterations: 5 > propen = fitted(glm.p) # now we have the propensity scores > quantile(propen) # overall distrib 0% 25% 50% 75% 100% 0.009080193 0.048536484 0.120676493 0.638715991 0.853152844 > tapply(propen, treat, quantile) # look at overlap via 5-number summary (or side-by-side boxplots) not good overlap, $`0` 0% 25% 50% 75% 100% 0.009080193 0.038880745 0.075849106 0.195135746 0.789172834 $`1` 0% 25% 50% 75% 100% 0.02495179 0.52646352 0.65368426 0.72659995 0.85315284 > # as we are fitting prob(treat = 1) fits for those in treatment group will be larger, we need good overlap for matching purposes > boxplot(propen ~ treat) #gives side-by-side boxplots, you can add labels, not wonderful overlap > detach(lalonde) > lalonde$propen = propen > attach(lalonde) The following object is masked _by_ .GlobalEnv: propen > head(lalonde) treat age educ black hispan married nodegree re74 re75 re78 propen NSW1 1 37 11 1 0 1 1 0 0 9930.0460 0.6387699 NSW2 1 22 9 0 1 0 1 0 0 3595.8940 0.2246342 NSW3 1 30 12 1 0 0 0 0 0 24909.4500 0.6782439 NSW4 1 27 11 1 0 0 1 0 0 7506.1460 0.7763241 NSW5 1 33 8 1 0 0 1 0 0 289.7899 0.7016387 NSW6 1 22 9 1 0 0 1 0 0 4056.4940 0.6990699 #### looking at overlap, histograms > p1 = propen[treat == 1] > length(p1) [1] 185 > > p0 = propen[treat == 0] > length(p0) [1] 429 > > fivenum(p1) NSW124 NSW156 NSW50 NSW119 NSW178 0.02495179 0.52646352 0.65368426 0.72659995 0.85315284 > > fivenum(p0) PSID296 PSID347 PSID221 PSID334 PSID118 0.009080193 0.038880745 0.075849106 0.195135746 0.789172834 > > hist(p0,col=rgb(0,0,1,0.7),xlim=range(c(p0,p1))) > hist(p1,col=rgb(1,0,0,0.7),add=T) > # superimposed propensity histograms, like Ben Hansen SAT, contol is blue, treatment is red, overlap close to perfect Stanford Cardinal red > hist(p0, breaks = "FD", col=rgb(0,0,1,0.7),xlim=range(c(p0,p1))) > hist(p1, breaks = "FD", col=rgb(1,0,0,0.7),add=T) ############################## start subclassification > ### make quintiles of propensity distribution > pbin = cut(propen, quantile(propen, seq(0, 1, 1/5)), include.lowest = TRUE, labels = FALSE) > detach(lalonde) > lalonde$bins = pbin > attach(lalonde) The following object is masked _by_ .GlobalEnv: propen > table(pbin, treat) treat pbin 0 1 1 122 1 2 116 7 3 101 21 4 53 71 5 37 85 > ##### examples of checking balance (more to come) > tapply(age, list(bins, treat), median) 0 1 1 29 27 2 26 23 3 20 23 4 24 25 5 19 25 > ### we already did upfront install.packages("PSAgraphics") > # and library(PSAgraphics) > box.psa(age, treat, bins) # a graphical display of balance or not Warning messages: 1: In xy.coords(x, y) : NAs introduced by coercion 2: In xy.coords(x, y) : NAs introduced by coercion > #################################### examine outcome by strata > tapply(re78, list(bins, treat),mean) # here are the mean diffs in re78 (the outcome) stratified by propensity quintile 0 1 1 10467.064 0.000 2 5796.548 7919.316 3 6043.316 9210.726 4 4977.401 5819.143 5 4666.221 6030.258 > # direction of mean diffs favors treatment, job training > > # contrast that with the comparison ignoring any concerns about self-selection (selection bias), effect in the other direction, but not significant > tapply(re78, treat, mean) 0 1 6984.170 6349.144 > ##### can do t-tests by subclassification (strata) > # e.g. for the 3 upper quintiles is the mean difference significant? since we are doing 3 of these best practice would be to Bonferroni > ## we won't find any evidence for the effectiveness of job training looking at each of the subclasses > > ##### lmer, a better way to do the t-tests ######################## > # we already did library(lme4) > > propen.lmer = lmer(re78 ~ treat + (1 + treat|bins), data = lalonde) > summary(propen.lmer) Linear mixed model fit by REML ['lmerMod'] Formula: re78 ~ treat + (1 + treat | bins) Data: lalonde REML criterion at convergence: 12637.1 Scaled residuals: Min 1Q Median 3Q Max -1.3976 -0.7541 -0.2878 0.5408 7.4535 Random effects: Groups Name Variance Std.Dev. Corr bins (Intercept) 5208943 2282 treat 2069963 1439 -1.00 Residual 52597981 7252 Number of obs: 614, groups: bins, 5 Fixed effects: Estimate Std. Error t value (Intercept) 6434.2 1090.2 5.902 treat 385.7 950.8 0.406 Correlation of Fixed Effects: (Intr) treat -0.795 > # so here we have an overall estimate of the effect of the treat on re78 of positive $386, but > # far from significant. Much smaller point estimate than in some of the individual strata > > ## confint(propen.lmer) # bombs > confint(propen.lmer, method = "boot", nsim = 1000, boot.type = "perc") Computing bootstrap confidence intervals ... 2.5 % 97.5 % .sig01 427.46839 4085.240 .sig02 -1.00000 1.000 .sig03 55.88642 3646.069 .sigma 6846.48238 7654.688 (Intercept) 4432.19374 8696.164 treat -1683.72564 2566.001 ### CI for effect of treatment Warning messages: 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate Hessian with 1 negative eigenvalues 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate Hessian with 1 negative eigenvalues 5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient 6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate Hessian with 1 negative eigenvalues 7: In bootMer(object, FUN = FUN, nsim = nsim, ...) : some bootstrap runs failed (9/1000) > ########################### Full Matching (Hansen, via Rosenbaum, using MatchIt) > > m2full.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "full") Warning message: In optmatch::fullmatch(d, ...) : Without 'data' argument the order of the match is not guaranteed to be the same as your original data. > summary(m2full.out) Call: matchit(formula = treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "full") Summary of balance for all data: Means Treated Means Control Mean Diff eQQ Med eQQ Mean eQQ Max distance 0.5774 0.1822 0.3952 0.5176 0.3955 0.5966 re74 2095.5737 5619.2365 -3523.6628 2425.5720 3620.9240 9216.5000 re75 1532.0553 2466.4844 -934.4291 981.0968 1060.6582 6795.0100 educ 10.3459 10.2354 0.1105 1.0000 0.7027 4.0000 black 0.8432 0.2028 0.6404 1.0000 0.6432 1.0000 hispan 0.0595 0.1422 -0.0827 0.0000 0.0811 1.0000 age 25.8162 28.0303 -2.2141 1.0000 3.2649 10.0000 married 0.1892 0.5128 -0.3236 0.0000 0.3243 1.0000 nodegree 0.7081 0.5967 0.1114 0.0000 0.1135 1.0000 Summary of balance for matched data: Means Treated Means Control Mean Diff eQQ Med eQQ Mean eQQ Max distance 0.5774 0.5761 0.0013 0.0026 0.0066 0.096 re74 2095.5737 2199.7126 -104.1390 72.6510 512.7210 13121.750 re75 1532.0553 1524.8362 7.2191 209.6655 460.5643 12746.050 educ 10.3459 10.3227 0.0233 0.0000 0.4596 4.000 black 0.8432 0.8347 0.0086 0.0000 0.0020 1.000 hispan 0.0595 0.0583 0.0012 0.0000 0.0012 1.000 age 25.8162 24.6928 1.1235 3.0000 3.3100 9.000 married 0.1892 0.1285 0.0607 0.0000 0.0544 1.000 nodegree 0.7081 0.7040 0.0041 0.0000 0.0028 1.000 Percent Balance Improvement: Mean Diff. eQQ Med eQQ Mean eQQ Max distance 99.6662 99.5001 98.3388 83.9052 re74 97.0446 97.0048 85.8401 -42.3724 re75 99.2274 78.6295 56.5775 -87.5796 educ 78.9494 100.0000 34.5954 0.0000 black 98.6582 100.0000 99.6891 0.0000 hispan 98.5858 0.0000 98.5200 0.0000 age 49.2583 -200.0000 -1.3825 10.0000 married 81.2495 0.0000 83.2267 0.0000 nodegree 96.3435 0.0000 97.5333 0.0000 Sample sizes: Control Treated All 429 185 Matched 429 185 Unmatched 0 0 Discarded 0 0 > summary(m2full.out, standardize = T) Call: matchit(formula = treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "full") Summary of balance for all data: Means Treated Means Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max distance 0.5774 0.1822 1.7941 0.3964 0.3774 0.6444 re74 2095.5737 5619.2365 -0.7211 0.2335 0.2248 0.4470 re75 1532.0553 2466.4844 -0.2903 0.1355 0.1342 0.2876 educ 10.3459 10.2354 0.0550 0.0228 0.0347 0.1114 black 0.8432 0.2028 1.7568 0.3202 0.3202 0.6404 hispan 0.0595 0.1422 -0.3489 0.0414 0.0414 0.0827 age 25.8162 28.0303 -0.3094 0.0827 0.0813 0.1577 married 0.1892 0.5128 -0.8241 0.1618 0.1618 0.3236 nodegree 0.7081 0.5967 0.2443 0.0557 0.0557 0.1114 Summary of balance for matched data: Means Treated Means Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max distance 0.5774 0.5761 0.0060 0.0060 0.0085 0.0596 re74 2095.5737 2199.7126 -0.0213 0.0160 0.0476 0.2268 re75 1532.0553 1524.8362 0.0022 0.0348 0.0693 0.2324 educ 10.3459 10.3227 0.0116 0.0286 0.0275 0.0568 black 0.8432 0.8347 0.0236 0.0104 0.0104 0.0208 hispan 0.0595 0.0583 0.0049 0.0036 0.0036 0.0072 age 25.8162 24.6928 0.1570 0.0416 0.0857 0.3436 married 0.1892 0.1285 0.1545 0.0366 0.0366 0.0732 nodegree 0.7081 0.7040 0.0089 0.0008 0.0008 0.0016 Percent Balance Improvement: Std. Mean Diff. eCDF Med eCDF Mean eCDF Max distance 99.6662 98.4863 97.7452 90.7506 re74 97.0446 93.1488 78.8321 49.2658 re75 99.2274 74.3198 48.3597 19.2062 educ 78.9494 -25.6137 20.8722 48.9995 black 98.6582 96.7523 96.7523 96.7523 hispan 98.5858 91.2972 91.2972 91.2972 age 49.2583 49.7246 -5.3122 -117.8448 married 81.2495 77.3817 77.3817 77.3817 nodegree 96.3435 98.5634 98.5634 98.5634 Sample sizes: Control Treated All 429 185 Matched 429 185 Unmatched 0 0 Discarded 0 0 > plot(summary(m2full.out, standardize = T)) [1] "To identify the variables, use first mouse button; to stop, use second." integer(0) > # click on points to get labels, right-click to exit > setwd("D:\\drr17\\chpr290\\week1\\") > plot(m2full.out) # gives you QQ plots for each var Waiting to confirm page change... Waiting to confirm page change... > detach(lalonde) > m2full.dat = match.data(m2full.out) # obtain results from the full matching > head(m2full.dat) treat age educ black hispan married nodegree re74 re75 re78 propen bins distance weights subclass NSW1 1 37 11 1 0 1 1 0 0 9930.0460 0.6387699 4 0.6387699 1 1 NSW2 1 22 9 0 1 0 1 0 0 3595.8940 0.2246342 3 0.2246342 1 103 NSW3 1 30 12 1 0 0 0 0 0 24909.4500 0.6782439 5 0.6782439 1 8 NSW4 1 27 11 1 0 0 1 0 0 7506.1460 0.7763241 5 0.7763241 1 15 NSW5 1 33 8 1 0 0 1 0 0 289.7899 0.7016387 5 0.7016387 1 24 NSW6 1 22 9 1 0 0 1 0 0 4056.4940 0.6990699 5 0.6990699 1 32 > attach(m2full.dat) The following object is masked _by_ .GlobalEnv: propen > > # so you can see match.data appends 3 colums "distance" "weights" "subclass" to the original data set > table(m2full.dat$subclass) #the 104 subclasses have various sizes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 2 13 2 7 3 5 3 2 4 2 8 3 2 2 9 4 2 9 6 14 3 2 2 6 3 4 2 4 3 3 2 2 2 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 3 3 14 5 3 3 2 6 2 5 3 2 10 2 4 8 3 2 14 7 2 14 2 2 4 40 2 2 3 2 91 5 3 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 2 13 3 3 2 3 70 2 5 6 2 2 13 2 2 2 2 2 7 3 2 2 3 2 2 2 2 3 6 4 3 4 2 100 101 102 103 104 5 2 2 4 5 > ###### outcome comparison over the (matched) subclasses > mfull.lmer = lmer(re78 ~ treat + (1 + treat|subclass), data = m2full.dat) # like for the quintiles in base section, t-test in each subclass > summary(mfull.lmer) Linear mixed model fit by REML ['lmerMod'] Formula: re78 ~ treat + (1 + treat | subclass) Data: m2full.dat REML criterion at convergence: 12633.5 Scaled residuals: Min 1Q Median 3Q Max -1.5267 -0.7497 -0.2851 0.5165 7.4616 Random effects: Groups Name Variance Std.Dev. Corr subclass (Intercept) 4027897 2007 treat 3810081 1952 -0.89 Residual 51103906 7149 Number of obs: 614, groups: subclass, 104 Fixed effects: Estimate Std. Error t value (Intercept) 5862.9 507.8 11.546 treat 504.5 736.2 0.685 ## about the same as seen in base section 384 (951) Correlation of Fixed Effects: (Intr) treat -0.679 > confint(mfull.lmer) Computing profile confidence intervals ... 2.5 % 97.5 % .sig01 1216.8601 3011.801 .sig02 -1.0000 1.000 .sig03 0.0000 Inf .sigma 6740.8621 7581.414 (Intercept) 4807.1941 6873.722 treat -985.7685 1977.973 ### CI for effect of treatment There were 50 or more warnings (use warnings() to see the first 50) >