1
Basic options used across files and shortcut functions, e.g., ‘pp()’ for print
2
functions grabbed from web and created by us for analysis/output
3
Introduction
3.1
Conceptual: approaches to statistics/inference and causality
Bayesian vs. frequentist approaches
3.1.1
Causal vs. descriptive; ‘treatment effects’ and the potential outcomes causal model
3.1.2
Theory, restrictions, and ‘structural vs reduced form’
3.2
Getting, cleaning and using data; project management and coding
3.2.1
Data: What/why/where/how
3.2.2
Organizing a project
3.2.3
Dynamic documents (esp Rmd/bookdown)
3.2.4
Good coding practices
3.2.5
Data sharing and integrity
3.3
Basic regression and statistical inference: Common mistakes and issues
3.3.1
“Bad control” (“colliders”)
3.3.2
Choices of lhs and rhs variables
3.3.3
Functional form
3.3.4
OLS and heterogeneity
3.3.5
“Null effects”
3.3.6
Multiple hypothesis testing (MHT)
3.3.7
Interaction terms and pitfalls
3.3.8
Choice of test statistics (including nonparametric)
3.3.9
How to display and write about regression results and tests
3.3.10
Bayesian interpretations of results
3.4
LDV and discrete choice modeling
3.5
Robustness and diagnostics, with integrity
3.5.1
(How) can diagnostic tests make sense? Where is the burden of proof?
3.5.2
Estimating standard errors
3.5.3
Sensitivity analysis: Interactive presentation
3.6
Control strategies and prediction; Machine Learning approaches
3.6.1
Machine Learning (statistical learning): Lasso, Ridge, and more
3.6.2
Limitations to inference from learning approaches
3.7
IV and its many issues
3.7.1
Instrument validity
3.7.2
Heterogeneity and LATE
3.7.3
Weak instruments, other issues
3.7.4
Reference to the use of IV in experiments/mediation
3.8
Other paths to observational identification
3.8.1
Fixed effects and differencing
3.8.2
DiD
3.8.3
RD
3.8.4
Time-series-ish panel approaches to micro
3.9
Causal pathways: Mediation modeling and its massive limitations
3.10
Causal pathways: selection, corners, hurdles, and ‘conditional on’ estimates
3.10.1
‘Corner solution’ or hurdle variables and ‘Conditional on Positive’
3.11
(Experimental) Study design: Identifying meaningful and useful (causal) relationships and parameters
3.11.1
Why run an experiment or study?
3.11.2
Causal channels and identification
3.11.3
Types of experiments, ‘demand effects’ and more artifacts of artificial setups
3.11.4
Generalizability (and heterogeneity)
3.12
(Experimental) Study design: Background and quantitative issues
3.12.1
Pre-registration and Pre-analysis plans
3.12.2
Sequential and adaptive designs
3.12.3
Efficient assignment of treatments
3.13
(Experimental) Study design: (Ex-ante) Power calculations
3.13.1
What sort of ‘power calculations’ make sense, and what is the point?
3.13.2
Power calculations without real data
3.13.3
Power calculations using prior data
3.14
‘Experimetrics’ and measurement of treatment effects from RCTs
3.14.1
Which error structure? Random effects?
3.14.2
Randomization inference?
3.14.3
Parametric and nonparametric tests of simple hypotheses
3.14.4
Adjustments for exogenous (but non-random) treatment assignment
3.14.5
IV in an experimental context to get at ‘mediators’?
3.14.6
Heterogeneity in an experimental context
3.15
Making inferences from previous work; Meta-analysis, combining studies
3.15.1
Publication bias
3.15.2
Combining a few (your own) studies/estimates
3.15.3
Full meta-analyses
3.16
The Bayesian approach
3.17
Some key resources and references
3.17.1
Consider:
4
Conceptual: approaches to statistics/inference and causality
4.1
Bayesian vs. frequentist approaches
4.1.1
Interpretation of CIs (aside)
4.2
Causal vs. descriptive; ‘treatment effects’ and the potential outcomes causal model
4.2.1
DAGs and Potential outcomes
4.3
Theory, restrictions, and ‘structural vs reduced form’
5
Getting, cleaning and using data
5.1
Data: What/why/where/how
5.2
Organizing a project
5.3
Dynamic documents (esp Rmd/bookdown)
5.3.1
Managing references/citations
5.3.2
An example of dynamic code
5.4
Project management tools, esp. Git/Github
5.5
Good coding practices
5.5.1
New tools and approaches to data (esp ‘tidyverse’)
5.5.2
Style and consistency
5.5.3
Using functions, variable lists, etc., for clean, concise, readable code
5.5.4
Mapping over lists to produce results
5.5.5
Building results based on ‘lists of filters’ of the data set
5.5.6
Coding style and indenting in Stata (one approach)
5.6
Additional tips (integrate)
6
Basic statistical inference and regressions: Common mistakes and issues
6.1
Basic regression and statistical inference: Common mistakes and issues briefly listed
6.1.1
Bad control
6.1.2
“Bad control” (“colliders”)
6.1.3
Choices of lhs and rhs variables
6.1.4
Functional form
6.1.5
OLS and heterogeneity
6.1.6
“Null effects”
6.1.7
Multiple hypothesis testing (MHT)
6.1.8
Interaction terms and pitfalls
6.1.9
Choice of test statistics (including nonparametric)
6.1.10
How to display and write about regression results and tests
6.1.11
Bayesian interpretations of results
7
Robustness and diagnostics, with integrity; Open Science resources
7.1
(How) can diagnostic tests make sense? Where is the burden of proof?
7.1.1
Further discussion: the DiD approach and ‘parallel trends’
7.2
Estimating standard errors
7.3
Sensitivity analysis: Interactive presentation
7.4
Supplement: open science resources, tools and considerations
7.5
Diagnosing p-hacking and publication bias (see also meta-analysis)
7.5.1
Publication bias – see also considering publication bias in meta-analysis
7.6
Multiple hypothesis testing - see above
8
Control strategies and prediction, Machine Learning (Statistical Learning) approaches
8.1
See also “notes on Data Science for Business”
8.2
Machine Learning (statistical learning): Lasso, Ridge, and more
8.2.1
Limitations to inference from learning approaches
8.3
Notes on Hastie et al.: Statistical Learning with Sparsity
8.3.1
Introduction
8.3.2
Chapter 2: Lasso for linear models
8.3.3
Chapter 3: Generalized linear models
8.3.4
Chapter 4: Generalizations of the Lasso penalty
8.4
Notes: Mullainathan
9
IV and its many issues
Some casual discussion
9.1
Instrument validity
9.2
Heterogeneity and LATE
9.3
Weak instruments, other issues
9.4
Instrumenting Interactions
9.5
Reference to the use of IV in experiments/mediation
10
Other paths to observational identification
10.1
Fixed effects and differencing
10.2
DiD
10.3
RD
10.4
Time-series-ish panel approaches to micro
10.4.1
Lagged dependent variable and fixed effects → ‘Nickell bias’
11
Causal pathways - mediators
11.1
Mediators (and selection and Roy models): a review, considering two research applications
11.2
DR initial thoughts (for NL education paper)
11.3
Econometric Mediation Analyses (Heckman and Pinto)
Relevance to Parey et al
11.3.1
Summary and key modeling
11.3.2
Common assumptions and their implications
11.4
Pinto (2015), Selection Bias in a Controlled Experiment: The Case of Moving to Opportunity
Summary
Relevance to Parey et al
Introduction
Identification strategy brief
Results in brief
Framework: first for binary/binary (simplification)
Framework for MTO multiple treatment groups, multiple choices
11.5
Antonakis approaches
12
Causal pathways: selection, corners, hurdles, and ‘conditional on’ estimates
12.1
‘Corner solution’ or hurdle variables and ‘Conditional on Positive’
12.2
Bounding approaches (Lee, Manski, etc)
12.2.1
Notes: Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects, David Lee, 2009, RESTUD
13
(Experimental) Study design: Identifying meaningful and useful (causal) relationships and parameters
13.1
Why run an experiment or study?
13.1.1
Sitzia and Sugden on what theoretically driven experiments can and should do
13.2
Causal channels and identification
13.3
Types of experiments, ‘demand effects’ and more artifacts of artificial setups
13.4
Generalizability (and heterogeneity)
14
(Experimental) Study design: Background and quantitative issues
14.1
Pre-registration and Pre-analysis plans
14.1.1
The benefits and costs of pre-registration: a typical discussion
14.1.2
The hazards of specification-searching
14.2
Sequential and adaptive designs
14.3
Efficient assignment of treatments
14.3.1
See also multiple hypothesis testing
14.3.2
How many treatment arms can you ‘afford’?
14.3.3
Other notes and resources
15
(Experimental) Study design: (Ex-ante) Power calculations
15.1
What sort of ‘power calculations’ make sense, and what is the point?
15.1.1
Why do a power analysis? What are the practical benefits of doing a power analysis?
15.2
Key ingredients necessary for doing a power analysis (and designing a study in light of this)
17.2
Excerpts and notes from ‘Doing Meta-Analysis in R: A Hands-on Guide’ (Harrer et al)
17.2.1
Pooling effect sizes
17.2.2
Bayesian Meta-analysis
17.2.3
Forest plots
17.3
Dealing with publication bias
17.3.1
Diagnosis and responses: P-curves, funnel plots, adjustments
17.4
Other notes, links, and commentary
17.5
Other resources and tools
17.6
Example: discussion of meta-analyses of the Paleolithic diet
18
Bayesian approaches
18.1
My (David Reinstein’s) uses for Bayesian approaches (brainstorm)
18.1.1
Meta-analysis of previous evidence
18.1.2
Inference, particularly about ‘null effects’
18.1.3
‘Policy’ and business implications and recommendations
18.1.4
Theory-driven inference about optimizing agents, esp. in strategic settings
18.1.5
Experimental design
18.2
‘Statistical Rethinking’ (McElreath) and A. Solomon Kurz’s ‘recoded’ (bookdown): highlights and notes
18.2.1
The Golem of Prague (Chapter 1)
18.2.2
Small Worlds and Large Worlds (Ch 2)
18.2.3
Using prior information
18.2.4
From counts to probability
18.3
Third video/chapter
18.3.1
Normal distributions
18.4
Title: “Introduction to Bayesian analysis in R and Stata - Katz, Qstep”
18.4.1
Why and when use Bayesian (MCMC) methods?
18.4.2
Theory
18.4.3
Comparing models … Equivalent of ‘likelihood’
18.4.4
On choosing priors
18.4.5
Implementation
18.4.6
Generate predictions from a WinBUGS model
18.4.7
Missing data case
18.4.8
Stata
18.4.9
R mcmc package
18.5
Other resources and notes to integrate
19
Notes on Data Science for Business by Foster Provost and Tom Fawcett (2013)
19.1
Evaluation of this resource
Ch 1 Introduction: Data-Analytic Thinking
Example: During Hurricane Frances… predicting demand to gear inventory and avoid shortages … led to huge profits for Wal-Mart
Example: Predicting Customer Churn
19.1.1
Data Science, Engineering, and Data-Driven Decision Making
19.1.2
Data Processing and “Big Data”
19.1.3
Data and Data Science Capability as a Strategic Asset
19.1.4
Data-Analytic Thinking
19.1.5
Data Mining and Data Science, Revisited
19.2
Ch 2 Business Problems and Data Science Solutions
19.2.1
Types of problems and approaches
19.2.2
The Data Mining Process
19.3
Ch 3: Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
19.3.1
Models, Induction, and Prediction
19.3.2
Supervised Segmentation
19.3.3
Summary
19.3.4
NOTE – check if there is a gap here
19.4
Ch. 4: Fitting a Model to Data
19.4.1
Classification via Mathematical Functions
19.4.2
Regression via Mathematical Functions
19.4.3
Class Probability Estimation and Logistic Regression
19.4.4
Logistic Regression: Some Technical Details
19.4.5
Example: Logistic Regression versus Tree Induction
19.4.6
Nonlinear Functions, Support Vector Machines, and Neural Networks
19.5
Ch 5: Overfitting and its avoidance
19.5.1
Generalization
19.5.2
Holdout Data and Fitting Graphs
19.5.3
Example: Overfitting Linear Functions
19.5.4
Example: Why Is Overfitting Bad?
19.5.5
From Holdout Evaluation to Cross-Validation
19.5.6
Learning Curves
19.5.7
Avoiding Overfitting with Tree Induction
19.5.8
A General Method for Avoiding Overfitting
19.5.10
Avoiding Overfitting for Parameter Optimization
19.6
Ch 6: Similarity, Neighbors, and Clusters
19.6.1
Similarity and Distance
19.6.3
Example: Whiskey Analytics
19.6.4
Nearest Neighbors for Predictive Modeling
19.6.5
How Many Neighbors and How Much Influence?
19.6.6
Geometric Interpretation, Overfitting, and Complexity Control
19.6.7
Issues with Nearest-Neighbor Methods
19.6.8
Other Distance Functions
19.6.9
Stepping Back: Solving a Business Problem Versus Data Exploration
19.6.10
Summary
19.7
Ch. 7. Decision Analytic Thinking I: What Is a Good Model?
19.7.1
Evaluating Classifiers
19.7.2
The Confusion Matrix
19.7.3
Problems with Unbalanced Classes
19.7.4
Generalizing Beyond Classification
19.7.5
A Key Analytical Framework: Expected Value
19.7.6
Using Expected Value to Frame Classifier Use
19.7.7
Using Expected Value to Frame Classifier Evaluation
19.7.8
Evaluation, Baseline Performance, and Implications for Investments in Data
19.7.9
Summary
19.7.10
Ranking Instead of Classifying
19.7.11
Profit Curves
19.8
Contents and consideration
20
Meta-analysis arbitrary example: the ‘Paleo diet’
20.1
Conceptual: Thoughts on nutritional studies and meta-analysis issues
20.2
Manheimer et al
20.2.1
External critiques and evaluations of Manheimer et al, (esp Fenton) authors’ response
20.3
Other meta-analyses and consideration of the Paleo diet
20.4
Focus: Boers et al
20.5
Overall analysis
21
List of references
Econometrics, statistics, and data science: Reinstein notes with a Micro, Behavioral, and Experimental focus
Dr. David Reinstein, 2020-12-22
Abstract
This ‘book’ organizes my notes and helps others understand and learn from them