5 Robustness and diagnostics, with integrity; Open Science resources

5.1 (How) can diagnostic tests make sense? Where is the burden of proof?

Where a particular assumption is critical to identification and inference … failing to reject the null hypothesis that an assumption holds is not sufficient to give us confidence that it is in fact satisfied and that the results are credible. Authors frequently cite statistically insignificant test results as evidence in support of a substantive model, or as evidence that they need not worry about certain confounds. Although the problem of induction is difficult, I find this approach inadequate. Where a negative finding is presented as an important result, the authors should also show that their parameter estimate is tightly bounded around zero. Where it is cited as evidence that they can ignore a confound, they should show that the effect can be statistically bounded to be small enough that it should not reasonably cause a problem (e.g., using Lee or McNemar bounds for selective attrition/hurdles).
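
To make "tightly bounded around zero" concrete, one option is an equivalence-style (TOST) check: rather than pointing to a non-significant t-test, verify that the confidence interval for the estimate lies entirely inside a pre-specified region of practical irrelevance. The sketch below is illustrative only; the estimate, standard error, and equivalence margin are hypothetical placeholders, not values from any particular study.

```python
from scipy import stats

def bounded_around_zero(estimate, std_err, margin, alpha=0.05):
    """TOST-style check: is the effect bounded inside (-margin, +margin)?

    Equivalent to asking whether the (1 - 2*alpha) confidence interval
    for the estimate lies entirely within the equivalence region.
    """
    # One-sided tests against each edge of the equivalence region
    z_lower = (estimate + margin) / std_err   # H0: effect <= -margin
    z_upper = (estimate - margin) / std_err   # H0: effect >= +margin
    p_lower = 1 - stats.norm.cdf(z_lower)
    p_upper = stats.norm.cdf(z_upper)
    p_tost = max(p_lower, p_upper)

    z_crit = stats.norm.ppf(1 - alpha)
    ci = (estimate - z_crit * std_err, estimate + z_crit * std_err)
    return {"p_tost": p_tost,
            "ci_90pct": ci,
            "bounded": ci[0] > -margin and ci[1] < margin}

# Hypothetical numbers: a 'null result' with estimate 0.01, SE 0.04,
# and a margin of 0.10 below which the effect is deemed negligible.
print(bounded_around_zero(0.01, 0.04, margin=0.10))
```

The margin must be justified on substantive grounds (what effect size would actually matter?) and ideally pre-registered, since choosing it after seeing the estimate defeats the purpose.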

I am concerned with the interpretation of diagnostic testing, both in model selection and in the defense of exclusion restrictions or identification assumptions. When the basic consistency of the estimator (or a main finding of the paper) depends critically on such a test failing to reject its null hypothesis, it is problematic to merely state that "the test failed to reject, therefore we maintain the null hypothesis."


  • How powerful are these tests?

  • I.e., what is the probability of a false negative (Type II error)? (A simulation sketch follows this list.)

  • How large a bias would still be compatible with the confidence intervals these tests produce?
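
One way to answer the first question is by simulation: generate data under a specific, substantively meaningful violation of the assumption and record how often the diagnostic test rejects at the chosen significance level. That rejection rate is the test's power, and one minus it is the Type II error rate. The sketch below is a minimal illustration using differential attrition as the violation and a two-sample t-test on the attrition indicator as the diagnostic; the sample sizes, baseline rate, and violation size are hypothetical, and the same logic applies to other diagnostics (overidentification tests, balance tests, placebo checks).

```python
import numpy as np
from scipy import stats

def diagnostic_test_power(n_per_arm, base_rate, violation,
                          alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power of a simple diagnostic test.

    Illustration: differential attrition of size `violation`
    (difference in attrition probabilities between arms), tested
    with a two-sample t-test on the attrition indicator.
    Power = share of simulations in which the test rejects at `alpha`.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.binomial(1, base_rate, n_per_arm)
        treated = rng.binomial(1, base_rate + violation, n_per_arm)
        _, p_value = stats.ttest_ind(treated, control)
        rejections += p_value < alpha
    return rejections / n_sims

# Hypothetical numbers: 500 per arm, 20% baseline attrition, and a
# 5-percentage-point differential we would want the test to detect.
power = diagnostic_test_power(500, 0.20, 0.05)
print(f"power = {power:.2f}, Type II error rate = {1 - power:.2f}")
```

If the computed power against violations large enough to matter is low, then "the test failed to reject" carries little evidential weight, and the burden of proof has not been met.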

5.2 Estimating standard errors

5.3 Sensitivity analysis: Interactive presentation

5.4 Supplement: open science resources, tools and considerations


5.5 Diagnosing p-hacking and publication bias (see also meta-analysis)

5.5.1 Publication bias – see also the discussion of publication bias in meta-analysis