
## First, thanks to the students who showed me the legacy code I provided did not produce the HSB side-by-side boxplots shown in class

## Issue is lmList subset command gives a warning (for the first time this century) and clearly does not appear to do subsetting correctly

## Workaround is to form seperate Public and Catholic datasets outside the lmList command (we did that for the various scatterplots anyway)

## Session below figures that out and shows you the workaround
#########################################################################################################################
R version 3.3.3 (2017-03-06) -- "Another Canoe"


> # address HSB, R-studio results, where did I go wrong?
>  setwd("D:\\drr17\\stat196\\week2\\")
>  #read in prepared Bryk data
>  library(lme4)
Loading required package: Matrix

>   brykD = read.table("http://statweb.stanford.edu/~rag/stat209/brykdataPrepared", header = T)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  InternetOpenUrl failed: 'A connection with the server could not be established'
#### first problem, cannot from off campus access these data in R (can from a browser), a central computing alias issue
### my solution is that I moved a copy over to GoDaddy

>   brykD = read.table("http://rogosateaching.com/stat196/brykdataPrepared", header = T)
> dim(brykD)
[1] 7185    6
> # loading these data over to GoDaddy solves that Statistics Dept is unreachable

> #  do formal lmer model
>  # order the sector factor so that Public is the reference group, to match pubs etc, 
>  # relevel in Bates mlmrev rev is shorter way
>   brykD$sector = factor(brykD$sector, levels = c('Public', 'Catholic'))
## the mixed model analysis
>  bryklmer = lmer(mathach ~ meanses*cses + sector*cses + (1 + cses|school), data = brykD)
>  summary(bryklmer)
Linear mixed model fit by REML ['lmerMod']
Formula: mathach ~ meanses * cses + sector * cses + (1 + cses | school)
   Data: brykD

REML criterion at convergence: 46503.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.15926 -0.72319  0.01704  0.75445  2.95822 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept)  2.3796  1.5426       
          cses         0.1012  0.3181   0.39
 Residual             36.7212  6.0598       
Number of obs: 7185, groups:  school, 160

Fixed effects:
                    Estimate Std. Error t value
(Intercept)          12.1279     0.1993   60.86
meanses               5.3329     0.3692   14.45
cses                  2.9450     0.1556   18.93
sectorCatholic        1.2266     0.3063    4.00
meanses:cses          1.0392     0.2989    3.48
cses:sectorCatholic  -1.6427     0.2398   -6.85

Correlation of Fixed Effects:
            (Intr) meanss cses   sctrCt mnss:c
meanses      0.256                            
cses         0.075  0.019                     
sectorCthlc -0.699 -0.356 -0.053              
meanses:css  0.019  0.074  0.293 -0.026       
css:sctrCth -0.052 -0.027 -0.696  0.077 -0.351
> # this result matches every of the many many analysis runs of these data, so the data are fine
> 
> ## proceed to the prelim lmLists plots, the issue raised 
>     # make side-by-side boxplots of level 1 gradients and level (Y_bar)

>  cathRegC = lmList(mathach ~ cses| school, subset = sector == "Catholic", data = brykD)
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
  data length is not a multiple of split variable

>   pubRegC = lmList(mathach ~ cses| school, subset = sector == "Public", data = brykD)
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
  data length is not a multiple of split variable
## ok we are warned, what do we have?
>   length(cathRegC); length(pubRegC)
[1] 70
[1] 90
# we have the right number of schools at least
>  # so for first time this century lmList is complaining about subsetting; does the warning matter? Turns out YES

>   pubcoef= coef(pubRegC)
>   cathcoef= coef(cathRegC)
> fivenum(cathcoef[,2])
[1] -1.217531  1.478560  2.233799  2.830862  4.039119
> fivenum(cathcoef[,1])
[1]  9.100918 11.748964 12.802266 13.750344 16.687763
> # so subset in lmList is not working as I would want it
> # These numbers are consistent with the plots students showed me 


## solution is to subset first, then use lmList
> brykD_C = brykD[brykD$sector == "Catholic",] # data frame of the 70 catholic schools
>  cathRegC_2 = lmList(mathach ~ cses| school,  data = brykD_C)
>   cathcoef_2= coef(cathRegC_2)
> fivenum(cathcoef_2[,2]) #check what we have, different from above and matches legacy plot for slope and intercept
[1] -2.0150264  0.5604999  1.5233059  2.4632036  5.2575334
> fivenum(cathcoef_2[,1])
[1]  7.335937 13.177688 14.466754 15.983170 19.719143
# also subset the Public (which I had done for the plots)
> brykD_P = brykD[brykD$sector == "Public",] # data frame of the 90 public schools
>  pubRegP_2 = lmList(mathach ~ cses| school,  data = brykD_P)
>   pubcoef_2= coef(pubRegP_2)

## let's see if this works
> par( mfrow = c(1,2)) # opens a graphics window, creates the figure shown in lecture
>   boxplot(cathcoef_2[,1], pubcoef_2[,1], main = 'Intercepts', names = c('Catholic', 'Public'))
>   boxplot(cathcoef_2[,2], pubcoef_2[,2], main = 'Slopes', names = c('Catholic', 'Public'))
> #produces the boxplot we showed in class and corresponds to the lmer analysis
> 
## Thanks again to students who ran the code on this years lmList
