## First, thanks to the students who showed me the legacy code I provided did not produce the HSB side-by-side boxplots shown in class
## Issue is lmList subset command gives a warning (for the first time this century) and clearly does not appear to do subsetting correctly
## Workaround is to form seperate Public and Catholic datasets outside the lmList command (we did that for the various scatterplots anyway)
## Session below figures that out and shows you the workaround
#########################################################################################################################
R version 3.3.3 (2017-03-06) -- "Another Canoe"
> # address HSB, R-studio results, where did I go wrong?
> setwd("D:\\drr17\\stat196\\week2\\")
> #read in prepared Bryk data
> library(lme4)
Loading required package: Matrix
> brykD = read.table("http://statweb.stanford.edu/~rag/stat209/brykdataPrepared", header = T)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
InternetOpenUrl failed: 'A connection with the server could not be established'
#### first problem, cannot from off campus access these data in R (can from a browser), a central computing alias issue
### my solution is that I moved a copy over to GoDaddy
> brykD = read.table("http://rogosateaching.com/stat196/brykdataPrepared", header = T)
> dim(brykD)
[1] 7185 6
> # loading these data over to GoDaddy solves that Statistics Dept is unreachable
> # do formal lmer model
> # order the sector factor so that Public is the reference group, to match pubs etc,
> # relevel in Bates mlmrev rev is shorter way
> brykD$sector = factor(brykD$sector, levels = c('Public', 'Catholic'))
## the mixed model analysis
> bryklmer = lmer(mathach ~ meanses*cses + sector*cses + (1 + cses|school), data = brykD)
> summary(bryklmer)
Linear mixed model fit by REML ['lmerMod']
Formula: mathach ~ meanses * cses + sector * cses + (1 + cses | school)
Data: brykD
REML criterion at convergence: 46503.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.15926 -0.72319 0.01704 0.75445 2.95822
Random effects:
Groups Name Variance Std.Dev. Corr
school (Intercept) 2.3796 1.5426
cses 0.1012 0.3181 0.39
Residual 36.7212 6.0598
Number of obs: 7185, groups: school, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 12.1279 0.1993 60.86
meanses 5.3329 0.3692 14.45
cses 2.9450 0.1556 18.93
sectorCatholic 1.2266 0.3063 4.00
meanses:cses 1.0392 0.2989 3.48
cses:sectorCatholic -1.6427 0.2398 -6.85
Correlation of Fixed Effects:
(Intr) meanss cses sctrCt mnss:c
meanses 0.256
cses 0.075 0.019
sectorCthlc -0.699 -0.356 -0.053
meanses:css 0.019 0.074 0.293 -0.026
css:sctrCth -0.052 -0.027 -0.696 0.077 -0.351
> # this result matches every of the many many analysis runs of these data, so the data are fine
>
> ## proceed to the prelim lmLists plots, the issue raised
> # make side-by-side boxplots of level 1 gradients and level (Y_bar)
> cathRegC = lmList(mathach ~ cses| school, subset = sector == "Catholic", data = brykD)
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
> pubRegC = lmList(mathach ~ cses| school, subset = sector == "Public", data = brykD)
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
## ok we are warned, what do we have?
> length(cathRegC); length(pubRegC)
[1] 70
[1] 90
# we have the right number of schools at least
> # so for first time this century lmList is complaining about subsetting; does the warning matter? Turns out YES
> pubcoef= coef(pubRegC)
> cathcoef= coef(cathRegC)
> fivenum(cathcoef[,2])
[1] -1.217531 1.478560 2.233799 2.830862 4.039119
> fivenum(cathcoef[,1])
[1] 9.100918 11.748964 12.802266 13.750344 16.687763
> # so subset in lmList is not working as I would want it
> # These numbers are consistent with the plots students showed me
## solution is to subset first, then use lmList
> brykD_C = brykD[brykD$sector == "Catholic",] # data frame of the 70 catholic schools
> cathRegC_2 = lmList(mathach ~ cses| school, data = brykD_C)
> cathcoef_2= coef(cathRegC_2)
> fivenum(cathcoef_2[,2]) #check what we have, different from above and matches legacy plot for slope and intercept
[1] -2.0150264 0.5604999 1.5233059 2.4632036 5.2575334
> fivenum(cathcoef_2[,1])
[1] 7.335937 13.177688 14.466754 15.983170 19.719143
# also subset the Public (which I had done for the plots)
> brykD_P = brykD[brykD$sector == "Public",] # data frame of the 90 public schools
> pubRegP_2 = lmList(mathach ~ cses| school, data = brykD_P)
> pubcoef_2= coef(pubRegP_2)
## let's see if this works
> par( mfrow = c(1,2)) # opens a graphics window, creates the figure shown in lecture
> boxplot(cathcoef_2[,1], pubcoef_2[,1], main = 'Intercepts', names = c('Catholic', 'Public'))
> boxplot(cathcoef_2[,2], pubcoef_2[,2], main = 'Slopes', names = c('Catholic', 'Public'))
> #produces the boxplot we showed in class and corresponds to the lmer analysis
>
## Thanks again to students who ran the code on this years lmList