11 Multi-level models

Sources:

qstep, Gabriel Katz
Statistical rethinking (McElreath2015?)

McElreath:

Multilevel models… remember features of each cluster in the data as they learn about all of the clusters.

When it comes to regression, multilevel regression deserves to be the default approach.

DR: When I first heard about this MLM stuff, coming from an econometric background, it seemed like trying to ‘have your cake and eat it too.’ If you ‘control’ at the level of clusters, how can you also make inferences about things that vary within clusters? I think I was wrong about this.

11.1 Introduction (Qstep)

(Gabriel or MLM?) came from test-score modeling in the education literature. Determinants at multiple levels … e.g., students’ own characteristics and school characteristics

Standard response: clustered standard errors; this is essentially the same as the ‘random intercept model.’

Note: There is not necessarily a hierarchy; e.g., school versus neighborhood may overlap \(\rightarrow\) ‘cross-nested’/‘cross-classified’; can have a hierarchy within some levels, not others

Panel/longitudinal (time/individual)

Accounts for heterogeneity and dependence

correlated behaviors within the same group (??)

11.2 Some basic theory

Example: student \(i\), score \(y_i\), \(i = 1, \dots, l\)

Characteristics of student \(i\), \(x_i\), and of school \(j\), \(z_j\)

11.2.1 Level 1 model

\(y_i = \alpha_j + \beta x_i + \epsilon_i\)

Intercept varies across schools

New approach allows you to make the random effect term as much of a function of ?? to give it more flexibility

11.2.2 Level 2

\(\alpha_j=\lambda + \delta z_j + v_j\)

Together with the level-1 equation, this forms an MLM
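Substituting the level-2 equation into the level-1 equation gives a combined form. This is only a sketch in the notation above; the subscript \(j[i]\), meaning 'the school of student \(i\)', is added here for clarity and is not in the original notes:

\[
y_i = \lambda + \delta z_{j[i]} + \beta x_i + v_{j[i]} + \epsilon_i
\]

The composite error \(v_{j[i]} + \epsilon_i\) has a component that is shared by all students in the same school.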

‘Bias-adjusted FE’ is that other fancier approach that Sebastian does; requires stronger assumptions

The \(v_j\) term allows an explicit heterogeneity of the effect

Total variation in \(y_i\) unexplained by observed factors is \(\sigma^2_\epsilon + \sigma^2_v\)

Proportion of variation in outcome accounted for by ‘unobserved contextual factors’ (the level-2 stuff): ‘Intra-class correlation coefficient’
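In the notation above, and assuming \(v_j\) and \(\epsilon_i\) are uncorrelated with each other and across students, this coefficient can be written as:

\[
\rho = \frac{\sigma^2_v}{\sigma^2_v + \sigma^2_\epsilon}
\]

Under the same assumptions, \(\rho\) is also the correlation between the unexplained parts of the outcomes of two different students \(i \neq i'\) in the same school, since \(\mathrm{Cov}(v_j + \epsilon_i,\, v_j + \epsilon_{i'}) = \sigma^2_v\).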

Any mlm can be written as a single-level model with a bunch of random effects … it’s a ‘variance component model’

Estimating an MLM with only a random intercept is like a standard model with clustered se

… the basic mlm allows correlation between students in the same classroom

Even without a ‘measured’ second level we still have an mlm

Suppose schools only have an impact on the average student score… Could also have a ‘random slope’ model; the model with heterogeneity

Could affect the impact of student effort; could affect students’ average marks, or both
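A random-slope version could be sketched as follows, taking \(x_i\) to stand for student effort. The slope deviation \(u_j\) and the subscript \(j[i]\) ('the school of student \(i\)') are notation introduced here for illustration, not taken from the notes:

\[
\begin{aligned}
y_i &= \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i \\
\alpha_j &= \lambda + \delta z_j + v_j \\
\beta_j &= \beta + u_j
\end{aligned}
\]

Setting \(u_j = 0\) for all schools recovers the random-intercept model, where schools only shift average scores; setting \(v_j = 0\) instead lets schools affect only the impact of effort.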

Intra-cluster correlations

From David McKenzie’s World Bank blog: ‘Tools of the Trade: Intra-cluster Correlations’

You can calculate the ICC using ANOVA estimation.

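# Between-group share of the total sum of squares (column 2 is "Sum Sq"); a rough proxy for the ICC
icc_anova <- summary(aov(outcome ~ factor(group), data = df))
icc <- icc_anova[[1]][1, 2] / sum(icc_anova[[1]][, 2])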
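The ratio of sums of squares above is the between-group share of variation. The ANOVA estimator of the ICC is more commonly written in terms of mean squares. Below is a minimal sketch on simulated data, assuming balanced groups; the data, group sizes and variable names are made up for illustration.

```r
set.seed(1)
n_groups <- 50   # number of clusters (hypothetical)
n_per    <- 20   # observations per cluster (balanced design assumed)

group   <- factor(rep(seq_len(n_groups), each = n_per))
v       <- rnorm(n_groups, sd = 1)            # cluster effects, variance 1
eps     <- rnorm(n_groups * n_per, sd = 2)    # individual noise, variance 4
outcome <- v[as.integer(group)] + eps         # true ICC = 1 / (1 + 4) = 0.2
df      <- data.frame(outcome, group)

tab <- summary(aov(outcome ~ group, data = df))[[1]]
msb <- tab[1, "Mean Sq"]   # between-group mean square
msw <- tab[2, "Mean Sq"]   # within-group mean square

# ANOVA estimator of the ICC for balanced groups of size n_per
icc_ms <- (msb - msw) / (msb + (n_per - 1) * msw)

# Between-group share of the sum of squares, for comparison
ss_share <- tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"])

c(icc_ms = icc_ms, ss_share = ss_share)
```

With a finite number of observations per group, the sum-of-squares share is larger than the mean-square estimator, because part of the between-group variation is just sampling noise in the group means.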

11.2.3 Alternative/Naive approaches

Could also do ‘one regression per school’

some schools will have very few obs, so their separate estimates are noisy
In contrast, in the MLM, schools with few obs will have an estimated RE very close to the overall mean; schools with more obs can have an estimated RE further from it
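The pull towards the mean can be illustrated with a small simulation. This is a sketch only: it treats the variances \(\sigma^2_v\) and \(\sigma^2_\epsilon\) as known and plugs in the grand mean for the overall intercept, whereas a fitted MLM estimates all of these jointly. School sizes and parameter values are invented.

```r
set.seed(2)
sigma_v   <- 1                   # SD of school effects (assumed known here)
sigma_eps <- 3                   # SD of student-level noise (assumed known here)
n_j <- c(2, 5, 20, 200)          # hypothetical school sizes
J   <- length(n_j)

school <- rep(seq_len(J), times = n_j)
alpha  <- rnorm(J, sd = sigma_v)                          # true school effects
y      <- alpha[school] + rnorm(length(school), sd = sigma_eps)

grand_mean  <- mean(y)
school_mean <- tapply(y, school, mean)                    # 'one estimate per school'

# Shrinkage weight: close to 0 for small schools, close to 1 for large ones
w <- n_j / (n_j + sigma_eps^2 / sigma_v^2)

# Partially pooled estimate: weighted average of school mean and grand mean
partial_pool <- grand_mean + w * (school_mean - grand_mean)

round(cbind(n_j, w, school_mean, partial_pool), 2)
```

The school with 2 students gets a weight of about 0.18, so its estimate sits close to the grand mean, while the school with 200 students keeps almost all of its own mean.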

School-level predictors in the single-level regression

… this assumes the school’s influence is perfectly measured by the \(z_j\) terms

Add both school dummies and school-level predictors? No, these are perfectly collinear (although there are some attempts to relax this)

11.2.4 ‘old way’: two-stage regression

1. school dummies in individual regression
2. regress dummies on school covariates

Problems:

few students/school, imprecise estimates;
ignoring error terms of dummy estimates \(\rightarrow\) spurious significance
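The two stages can be sketched in base R as follows. The data are simulated and all names and parameter values are hypothetical; note that stage 2 treats the estimated school intercepts as if they were observed without error, which is the second problem listed above.

```r
set.seed(3)
J <- 30   # schools (hypothetical)
n <- 10   # students per school (hypothetical)

school <- factor(rep(seq_len(J), each = n))
z <- rnorm(J)              # school-level covariate
v <- rnorm(J, sd = 0.5)    # unobserved school effect
x <- rnorm(J * n)          # student-level covariate
y <- 1 + 0.8 * z[as.integer(school)] + v[as.integer(school)] +
  0.5 * x + rnorm(J * n)

# Stage 1: one dummy per school (no common intercept) plus the student covariate
stage1    <- lm(y ~ 0 + school + x)
alpha_hat <- coef(stage1)[paste0("school", levels(school))]

# Stage 2: regress the estimated school intercepts on the school covariate,
# ignoring the estimation error in alpha_hat
stage2 <- lm(alpha_hat ~ z)
coef(stage2)
```

With only 10 students per school each `alpha_hat` is imprecise, and the stage-2 standard errors do not account for that.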

11.2.5 How many higher-level units do you need?

Debate in the literature … conventional answers ask for 25 higher-level units; this only pertains to frequentist work, not Bayesian … because Bayesian inference is not based on asymptotics

?can’t you also use simulation in frequentist models (bootstrap the MSE)…?

(juddTreatingStimuliRandom2012?)

Other names: random effects models, hierarchical models, etc.

11.3 Fitting mlm in practice

R and Stata good for simple standard models

WinBUGS enables complicated models (there is a package that derives the posteriors for you)

Stata: ‘xtreg, re’ fits a random-intercept model … there is also ‘mreg’ (possibly ‘mixed’, formerly ‘xtmixed’, is meant)
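In R, a random-intercept model could be fit along the following lines. This is a sketch, not a recommendation from the sources: it uses `lme()` from the `nlme` package, which ships with standard R installations (`lme4::lmer` is a common alternative), and the data are simulated with invented parameter values.

```r
library(nlme)

set.seed(4)
J <- 40   # schools (hypothetical)
n <- 15   # students per school (hypothetical)

school <- factor(rep(seq_len(J), each = n))
z_j <- rnorm(J)            # school-level covariate
v_j <- rnorm(J, sd = 1)    # school random intercept, variance 1

d <- data.frame(
  school = school,
  x = rnorm(J * n),               # student-level covariate
  z = z_j[as.integer(school)]     # school covariate expanded to student level
)
d$y <- 2 + 0.7 * d$z + 0.5 * d$x + v_j[as.integer(school)] +
  rnorm(J * n, sd = 2)            # residual variance 4, so true ICC = 0.2

# Level 1: y on x; level 2: school intercept depends on z plus a random term
fit <- lme(y ~ x + z, random = ~ 1 | school, data = d)

fixef(fit)   # estimates of lambda, beta and delta

# Variance components and the implied intra-class correlation
sd_v    <- as.numeric(VarCorr(fit)["(Intercept)", "StdDev"])
sd_eps  <- fit$sigma
icc_hat <- sd_v^2 / (sd_v^2 + sd_eps^2)
icc_hat
```

The fixed effects correspond to \(\lambda\), \(\beta\) and \(\delta\) in the two-level notation, and the two standard deviations to \(\sigma_v\) and \(\sigma_\epsilon\).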

11.4 “Stimuli” (treatments) as a random factor

“Treating Stimuli as a Random Factor in Social Psychology: A New and Comprehensive Solution to a Pervasive but Largely Ignored Problem” - Judd et al, 2012

In this article, we present a comprehensive solution using mixed models for the analysis of data with crossed random factors (e.g., participants and stimuli).
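As a rough sketch of what 'crossed random factors' means here (the notation is mine and is not taken from the article): with participant \(i\) responding to stimulus \(k\) under condition \(c_{ik}\), a model with crossed random intercepts could be written as

\[
y_{ik} = \beta_0 + \beta_1 c_{ik} + u_i + w_k + \epsilon_{ik}
\]

where \(u_i\) is a participant effect and \(w_k\) is a stimulus effect. Neither factor is nested in the other, since every participant may see many stimuli and every stimulus is seen by many participants; richer specifications with random slopes are also possible.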

(John List talks a lot about ‘randomizing across context’; maybe there’s an experimental econ literature on this too?)