################ Week 1, Review Question 2

# Nearest neighbor (1:1) matching using MatchIt package

> mn.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "nearest")
> summary(mn.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
    age + married + nodegree, data = lalonde, method = "nearest")

Summary of balance for all data:
          Means Treated Means Control SD Control  Mean Diff   eQQ Med  eQQ Mean    eQQ Max
distance         0.5774        0.1822     0.2295     0.3952    0.5176    0.3955     0.5966
re74          2095.5737     5619.2365  6788.7508 -3523.6628 2425.5720 3620.9240  9216.5000
re75          1532.0553     2466.4844  3291.9962  -934.4291  981.0968 1060.6582  6795.0100
educ            10.3459       10.2354     2.8552     0.1105    1.0000    0.7027     4.0000
black            0.8432        0.2028     0.4026     0.6404    1.0000    0.6432     1.0000
hispan           0.0595        0.1422     0.3497    -0.0827    0.0000    0.0811     1.0000
age             25.8162       28.0303    10.7867    -2.2141    1.0000    3.2649    10.0000
married          0.1892        0.5128     0.5004    -0.3236    0.0000    0.3243     1.0000
nodegree         0.7081        0.5967     0.4911     0.1114    0.0000    0.1135     1.0000

Summary of balance for matched data:
          Means Treated Means Control SD Control  Mean Diff   eQQ Med  eQQ Mean    eQQ Max
distance         0.5774        0.3629     0.2533     0.2145    0.1646    0.2146     0.4492
re74          2095.5737     2342.1076  4238.9757  -246.5339  131.2709  545.1182 13121.7500
re75          1532.0553     1614.7451  2632.3533   -82.6898  152.1774  349.5371 11365.7100
educ            10.3459       10.6054     2.6582    -0.2595    0.0000    0.4541     3.0000
black            0.8432        0.4703     0.5005     0.3730    0.0000    0.3730     1.0000
hispan           0.0595        0.2162     0.4128    -0.1568    0.0000    0.1568     1.0000
age             25.8162       25.3027    10.5864     0.5135    3.0000    3.3892     9.0000
married          0.1892        0.2108     0.4090    -0.0216    0.0000    0.0216     1.0000
nodegree         0.7081        0.6378     0.4819     0.0703    0.0000    0.0703     1.0000

Percent Balance Improvement:
          Mean Diff.   eQQ Med  eQQ Mean  eQQ Max
distance     45.7140   68.1921   45.7536  24.7011
re74         93.0035   94.5880   84.9453 -42.3724
re75         91.1508   84.4891   67.0453 -67.2655
educ       -134.7737  100.0000   35.3846  25.0000
black        41.7636  100.0000   42.0168   0.0000
hispan      -89.4761    0.0000  -93.3333   0.0000
age          76.8070 -200.0000   -3.8079  10.0000
married      93.3191    0.0000   93.3333   0.0000
nodegree     36.9046    0.0000   38.0952   0.0000

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

> plot(summary(mn.out, standardize = T))

# compare this plot with the plot for full matching shown in class in
# Computing Corner; full matching does a much better job, though far from
# wonderful, in improving balance.
# pdf of plot is in class directory, file week1RQ2.pdf

# also standardized version of summary goes along with the plot
# goal is for standardized mean diff < .1

> summary(mn.out, standardize = T)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
    age + married + nodegree, data = lalonde, method = "nearest")

Summary of balance for all data:
          Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance         0.5774        0.1822     0.2295          1.7941   0.3964    0.3774   0.6444
re74          2095.5737     5619.2365  6788.7508         -0.7211   0.2335    0.2248   0.4470
re75          1532.0553     2466.4844  3291.9962         -0.2903   0.1355    0.1342   0.2876
educ            10.3459       10.2354     2.8552          0.0550   0.0228    0.0347   0.1114
black            0.8432        0.2028     0.4026          1.7568   0.3202    0.3202   0.6404
hispan           0.0595        0.1422     0.3497         -0.3489   0.0414    0.0414   0.0827
age             25.8162       28.0303    10.7867         -0.3094   0.0827    0.0813   0.1577
married          0.1892        0.5128     0.5004         -0.8241   0.1618    0.1618   0.3236
nodegree         0.7081        0.5967     0.4911          0.2443   0.0557    0.0557   0.1114

Summary of balance for matched data:
          Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance         0.5774        0.3629     0.2533          0.9739   0.2405    0.2266   0.4216
re74          2095.5737     2342.1076  4238.9757         -0.0505   0.0324    0.0673   0.2757
re75          1532.0553     1614.7451  2632.3533         -0.0257   0.0270    0.0516   0.2054
educ            10.3459       10.6054     2.6582         -0.1290   0.0108    0.0261   0.0757
black            0.8432        0.4703     0.5005          1.0231   0.1865    0.1865   0.3730
hispan           0.0595        0.2162     0.4128         -0.6611   0.0784    0.0784   0.1568
age             25.8162       25.3027    10.5864          0.0718   0.0703    0.0855   0.2541
married          0.1892        0.2108     0.4090         -0.0551   0.0108    0.0108   0.0216
nodegree         0.7081        0.6378     0.4819          0.1541   0.0351    0.0351   0.0703

Percent Balance Improvement:
          Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance          45.7140  39.3143   39.9682  34.5679
re74              93.0035  86.1124   70.0659  38.3325
re75              91.1508  80.0558   61.5500  28.5908
educ            -134.7737  52.5180   24.9096  32.0511
black             41.7636  41.7636   41.7636  41.7636
hispan           -89.4761 -89.4761  -89.4761 -89.4761
age               76.8070  15.0754   -5.1356 -61.0721
married           93.3191  93.3191   93.3191  93.3191
nodegree          36.9046  36.9046   36.9046  36.9046

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

### just as an aside
# 2:1 matching (1 treat, 2 controls in each subclass) also doesn't do as well as full matching

> m12.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "optimal", ratio = 2)
Warning message:
In fullmatch(d, min.controls = ratio, max.controls = ratio, omit.fraction = (n0 -  :
  Without 'data' argument the order of the match is not guaranteed
    to be the same as your original data.
> summary(m12.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
    age + married + nodegree, data = lalonde, method = "optimal",
    ratio = 2)

Summary of balance for all data:
          Means Treated Means Control SD Control  Mean Diff   eQQ Med  eQQ Mean    eQQ Max
distance         0.5774        0.1822     0.2295     0.3952    0.5176    0.3955     0.5966
re74          2095.5737     5619.2365  6788.7508 -3523.6628 2425.5720 3620.9240  9216.5000
re75          1532.0553     2466.4844  3291.9962  -934.4291  981.0968 1060.6582  6795.0100
educ            10.3459       10.2354     2.8552     0.1105    1.0000    0.7027     4.0000
black            0.8432        0.2028     0.4026     0.6404    1.0000    0.6432     1.0000
hispan           0.0595        0.1422     0.3497    -0.0827    0.0000    0.0811     1.0000
age             25.8162       28.0303    10.7867    -2.2141    1.0000    3.2649    10.0000
married          0.1892        0.5128     0.5004    -0.3236    0.0000    0.3243     1.0000
nodegree         0.7081        0.5967     0.4911     0.1114    0.0000    0.1135     1.0000

Summary of balance for matched data:
          Means Treated Means Control SD Control  Mean Diff   eQQ Med  eQQ Mean    eQQ Max
distance         0.5774        0.2084     0.2369     0.3690    0.4740    0.3701     0.5827
re74          2095.5737     3989.7874  5309.1812 -1894.2137 1469.4500 2013.9652  9177.7500
re75          1532.0553     2288.7920  3210.6422  -756.7366  805.6452  880.8906  6795.0100
educ            10.3459       10.2027     2.8139     0.1432    1.0000    0.6486     4.0000
black            0.8432        0.2351     0.4247     0.6081    1.0000    0.6108     1.0000
hispan           0.0595        0.1649     0.3716    -0.1054    0.0000    0.1027     1.0000
age             25.8162       26.9892    10.4749    -1.1730    2.0000    2.9189     9.0000
married          0.1892        0.4351     0.4964    -0.2459    0.0000    0.2432     1.0000
nodegree         0.7081        0.6351     0.4820     0.0730    0.0000    0.0757     1.0000

Percent Balance Improvement:
          Mean Diff.   eQQ Med  eQQ Mean  eQQ Max
distance      6.6205    8.4277    6.4367   2.3290
re74         46.2430   39.4184   44.3798   0.4204
re75         19.0162   17.8832   16.9487   0.0000
educ        -29.6146    0.0000    7.6923   0.0000
black         5.0493    0.0000    5.0420   0.0000
hispan      -27.4063    0.0000  -26.6667   0.0000
age          47.0223 -100.0000   10.5960  10.0000
married      24.0043    0.0000   25.0000   0.0000
nodegree     34.4779    0.0000   33.3333   0.0000

Sample sizes:
          Control Treated
All           429     185
Matched       370     185
Unmatched      59       0
Discarded       0       0

> plot(summary(m12.out, standardize = T))
# still not as good as full match results
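
# a minimal sketch (not part of the original session) for reading the tables above:
# it checks the 1:1 nearest neighbor results against the "standardized mean diff < .1"
# goal, and recomputes one Percent Balance Improvement entry by hand.
# The values are typed in from the output above; the names smd_matched, missed,
# pct_improve and re74_improve are made up for illustration, and the improvement
# formula is an assumption that appears to reproduce the reported numbers up to rounding.

```r
# Std. Mean Diff. column, matched data, from summary(mn.out, standardize = T)
smd_matched <- c(re74 = -0.0505, re75 = -0.0257, educ = -0.1290,
                 black = 1.0231, hispan = -0.6611, age = 0.0718,
                 married = -0.0551, nodegree = 0.1541)

# covariates that miss the |std. mean diff| < .1 goal after matching
missed <- names(smd_matched)[abs(smd_matched) >= 0.1]
print(missed)

# assumed formula: 100 * (|diff before| - |diff after|) / |diff before|
pct_improve <- function(before, after) {
  100 * (abs(before) - abs(after)) / abs(before)
}

# re74 mean diff: all data vs matched data (compare with 93.0035 in the table)
re74_improve <- pct_improve(-3523.6628, -246.5339)
print(round(re74_improve, 4))
```

# a negative improvement (e.g. educ, hispan) means the mean difference got larger
# in absolute value after matching, so balance on that covariate got worse.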