17 Boiling down: Construct validation/reliability, dimension reduction, factor analysis, and Psychometrics

17.1 Constructs and construct validation and reliability

Reliability vs validity:

Reliability [wiki]: “similar[ity] of results under similar conditions”

Validity of a tool [wiki]: “the degree to which the tool measures what it claims to measure”

17.1.1 Validity: general discussion

In psychometrics, validity has a particular application known as test validity: “the degree to which evidence and theory support the interpretations of test scores” (“as entailed by proposed uses of tests”).[3]

Construct validity refers to the extent to which operationalizations of a construct (e.g., practical tests developed from a theory) measure a construct as defined by a theory. It subsumes all other types of validity.

Many other types and measures of validity have been defined; some seem to rely on subjective judgement, while others can be formalized.

17.1.2 Reliability: general discussion

Reliability of measurements and multicomponent measuring instruments:

  • Inter-rater reliability
  • Test-retest reliability
  • Inter-method reliability
  • Internal consistency reliability (“across items within a test”[6])

“Classical test theory”

measurement errors are essentially random. … they are not correlated with true scores or with errors on other tests.

variance of obtained scores is simply the sum of the variance of true scores plus the variance of errors of measurement.[7]


The reliability coefficient is defined as the ratio of true score variance to the total variance of test scores.
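A minimal simulation sketch of the classical test theory decomposition, assuming normally distributed true scores and errors drawn independently of each other. The variances (4 and 1) and all variable names are hypothetical, chosen only for illustration. Under these assumptions the reliability coefficient should also be close to the squared correlation between observed and true scores, which is one way to read why a larger value is desirable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical variances chosen for illustration only.
var_true, var_error = 4.0, 1.0

true_score = rng.normal(0.0, np.sqrt(var_true), n)   # T
error = rng.normal(0.0, np.sqrt(var_error), n)       # E, drawn independently of T
observed = true_score + error                        # X = T + E

# Variance decomposition: Var(X) is approximately Var(T) + Var(E).
var_observed = observed.var()
var_sum = true_score.var() + error.var()

# Reliability coefficient: Var(T) / Var(X); the population value here is 4 / 5 = 0.8.
reliability = true_score.var() / var_observed

# With uncorrelated errors, this is close to the squared correlation of X with T.
squared_corr = np.corrcoef(observed, true_score)[0, 1] ** 2

print(round(var_observed, 2), round(var_sum, 2))
print(round(reliability, 2), round(squared_corr, 2))
```

In the sample, the two variance figures differ only by twice the sample covariance of T and E, which shrinks as `n` grows.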

Why is a larger reliability coefficient a good thing? I guess because it indicates that the measurement error is as small as possible relative to the true score variance. But wouldn’t it be better to choose a measure with a lower ‘true score variance’?

17.1.3 (raykovMetaanalysisScaleReliability2013?)


The basic problem (?)


frequently … in empirical social and behavioral research, we consider a measuring instrument consisting of a prespecified set of \(p\) congeneric [related?] components denoted \(X_1, ..., X_p\). …

Let \(T_1, ..., T_p\) and \(E_1, ..., E_p\) be their corresponding true scores and error scores, respectively, with the latter assumed uncorrelated [to each other]

… Like ‘measurement error’ in Econometrics

The scale components thus measure the same underlying latent dimension, designated \(\xi\), with possibly different units and origins of measurement as well as error variances; that is,

\[X_i= T_i + E_i = \alpha_i + \beta_i \xi + E_i\]

Often the quantity of interest is the “composite (scale) score” \(Z = X_1 + X_2 + ... + X_p\) (e.g., a Likert scale index).
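A rough sketch of the congeneric model and the composite score, assuming normal draws and errors that are uncorrelated with each other and with \(\xi\). The parameter values for the four components are hypothetical. If the true score of the composite is taken to be the sum of the component true scores, the model implies a composite reliability of \((\sum_i \beta_i)^2 \mathrm{Var}(\xi) / [(\sum_i \beta_i)^2 \mathrm{Var}(\xi) + \sum_i \mathrm{Var}(E_i)]\); the simulation compares this with the empirical ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parameters for p = 4 congeneric components (illustration only).
alpha = np.array([0.0, 1.0, -0.5, 2.0])      # origins (intercepts)
beta = np.array([1.0, 0.8, 1.2, 0.6])        # units (loadings)
error_var = np.array([1.0, 0.5, 1.5, 0.8])   # error variances
var_xi = 1.0                                 # variance of the latent dimension

xi = rng.normal(0.0, np.sqrt(var_xi), n)
# Errors are drawn independently of each other and of xi.
errors = rng.normal(0.0, np.sqrt(error_var), size=(n, len(beta)))

# X_i = alpha_i + beta_i * xi + E_i, one column per component.
X = alpha + np.outer(xi, beta) + errors
Z = X.sum(axis=1)                            # composite (scale) score

# True-score part of Z: sum of T_i = sum(alpha) + sum(beta) * xi.
T_Z = alpha.sum() + beta.sum() * xi

# Reliability of Z implied by the model versus the simulated ratio Var(T_Z) / Var(Z).
implied = beta.sum() ** 2 * var_xi / (beta.sum() ** 2 * var_xi + error_var.sum())
empirical = T_Z.var() / Z.var()

print(round(implied, 3), round(empirical, 3))
```

The origins \(\alpha_i\) shift the composite but drop out of every variance, so they do not affect the reliability.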

  • Convergent and discriminant validity: measures meant to capture similar things should correlate; measures meant to capture unrelated things should not

  • “analysis of composite reliability”
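The convergent and discriminant validity idea in the list above can be illustrated with a toy simulation. This is only a sketch: the two constructs, the three measures, and the noise level are all hypothetical, and real validation work would use actual instruments rather than simulated ones.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Two hypothetical, unrelated constructs (illustration only).
construct_a = rng.normal(size=n)
construct_b = rng.normal(size=n)

# Two noisy measures of construct A and one noisy measure of construct B.
measure_a1 = construct_a + rng.normal(scale=0.5, size=n)
measure_a2 = construct_a + rng.normal(scale=0.5, size=n)
measure_b = construct_b + rng.normal(scale=0.5, size=n)

# Rows are variables, so corr is a 3 x 3 correlation matrix.
corr = np.corrcoef([measure_a1, measure_a2, measure_b])
convergent = corr[0, 1]     # same construct: population value 1 / 1.25 = 0.8
discriminant = corr[0, 2]   # unrelated constructs: population value 0

print(round(convergent, 2), round(discriminant, 2))
```

A high `convergent` value together with a near-zero `discriminant` value is the pattern the bullet describes.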

Latent factors/latent variables

17.2 Factor analysis and principal-component analysis

17.3 Other

“Common methods bias”