11 Multi-level models
Sources:
- qstep, Gabriel Katz
- Statistical rethinking (McElreath2015?)
McElreath:
Multilevel models… remember features of each cluster in the data as they learn about all of the clusters.
When it comes to regression, multilevel regression deserves to be the default approach.
DR: When I first heard about this MLM stuff, coming from an econometric background, it seemed like trying to ‘have your cake and eat it too.’ If you ‘control’ at the level of clusters, how can you also make inferences about things that vary within clusters. I think I was wrong about this.
See also: gelmanChapterMRPNoncensus
11.1 Introduction (Qstep)
(Gabriel or MLM?) came from test scores modeling, education literature. Determinants at multiple levels … e.g,. students’ own characteristics and school characteristics
Standard response: clustered standard errors; this is essentially the same as the ‘random intercept model.’
NoteThere is not necessarily a hierarchy; school versus neighborhood… may overlap \(\rightarrow\) ‘cross-nested’/‘cross-classified’; can have a hierarchy within some levels, not others
Panel/longitudinal (time/individual)
Accounts for heterogeneity and dependence
- correlated behaviors within the same group (??)
11.2 Some basic theory
Example: student i, score \(y_i\), i=1,…l
Characteristics of i \(X_i\) and of school j
11.2.1 Level 1 model
\(y_i = \alpha_j + \beta x_i + \epsilon_i\)
Intercept varies across schools
New approach allows you to make the random effect term as much of a function of ?? to give it more flexibility
11.2.2 Level 2
\(\alpha_j=\lambda + \delta z_j + v_j\)
Together with the level-1 term, a mlm
‘Bias-adjusted FE’ is that other fancier approach that Sebastian does; requires stronger assumptions
The \(v_j\) term allows an explicit heterogeneity of the effect
Total variation in y_i unexplained by observed factors is \(\sigma^2_\epsilon + \sigma^2_v\)
Proportion of variation in outcome accounted for by ‘unobserved contextual factors’ (the level-2 stuff): ‘Intra-class correlation coefficient’
Any mlm can be written as a sungle level model with a bunch of random effects … it’s a ‘variance component model’
Estimating a MLE with only an intercept is like a standard model with clustered se
… the basic mlm allows correlation **** between students in the same classroom
Even without a ‘measured’ second level we still have an mlm
Suppose schools only have an impact on the average student score… Could also have a ‘random slope’ model; the model with heterogeneity
Could affect the impact of student effort; could affect students’ average marks, or both
Intra-cluster correlations
From David McKenzie’s WorldBank blog. Tools of the Trade: Intra-cluster Correlations
You can calculate the ICC using ANOVA estimation.
icc_anova <- aov(outcome ~ group,data = df) %>% summary
icc <- icc_anova[[1]][1,2]/sum(icc_anova[[1]][,2])
11.2.3 Alternative/Naive approaches
- Could also do ‘one regression per school’
- some will have very few obs
- In contrast, schools with few obs will have an estimated RE very close to the mean; those further will have a larger RE estimated
- School-level predictors in the single-level regression
… this assumes the schools influence is merfectly measured by z_j terms
- Add both? no, these are perfectly collinear (although there are some attempts to relax this)
11.2.4 ‘old way’: two-stage regression
- school dummies in individual regression
- regress dummies on school covariates
Problems:
- few students/school, imprecise estimates;
- ignoring error terms of dummy estimates \(\rightarrow\) spurious significance
11.2.5 How many higher-level units do you need?
- Debate in the literature … conventional answers ask for 25; this only pertains to frequentist work, not Bayesian … because it is not based on asymptotics
?can’t you also use simulation in frequentist models (bootstrap the MSE)…?
(juddTreatingStimuliRandom2012?) ### Other names: Random effects hierarchical, etc.
11.3 Fitting mlm in practice
R and Stata good for simple standard models
Winbugs enables complicated models (there is a package that derives the posteriors for you)
Stata: ‘xtreg, re’ is random intercept … there is also ‘mreg’
11.4 “Stimuli” (treatments) as a random factor
“Treating Stimuli as a Random Factor in Social Psychology: A New and Comprehensive Solution to a Pervasive but Largely Ignored Problem” - Judd et al, 2012
In this article, we present a comprehensive solution using mixed models for the analysis of data with crossed random factors (e.g., participants and stimuli).
(John List talks a lot about ‘randomizing across context’ maybe there’s an experimental econ literature on this too?)