Chapter 6 Economic theory, modelling, and empirical work
6.1 (From theory to) empirical work
6.2 Doing economic modelling and theory
Doing economic theory
Building an economic model
…
Posing your hypothesis as an empirical test
Writing an empirical/econometric model
6.3 Economic theory and empirical research: writing about your work
Explain the limitations of your analysis to the reader, and what the next step would be. Perhaps you are aware there is an advanced estimation technique, or a larger data set, that could better answer your thesis question. However, this might be “too difficult” considering your abilities and resources. If you can explain this, do so.
If you’re writing a theory paper (and this is also useful in an empirical paper), try to state a formal economic model explicitly, using mathematical notation, and explain it clearly.
If you’re writing an empirical paper, clearly explain and describe your data, techniques, and results. Explain the econometrics behind your techniques as clearly as you can.
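As a hypothetical illustration (the model is not taken from these notes), an explicitly stated empirical model might look like the Mincer-style earnings equation below, where w_i is person i’s hourly wage, S_i their years of schooling, X_i their years of work experience, and ε_i an unobserved error term:

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \varepsilon_i,
\qquad i = 1, \dots, n,
\qquad \mathbb{E}[\varepsilon_i \mid S_i, X_i] = 0 .
```

The conditional-mean assumption on the error is what allows β1 to be read as the expected change in log wage from one more year of schooling, holding experience fixed (roughly 100·β1 percent). Writing it down tells the reader exactly what your estimates rely on.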
6.4 Empirical work: techniques and econometrics
Techniques. Understand what techniques others have used to answer your question, what technique you are using and why, and the arguments for each technique. Understand the limitations of each technique, previous papers, and of your own work.
Show that you understand economic theory and how it connects to econometrics and empirical work. Understand the difference between these, and what each can do.
Use of techniques: use the tools you can handle, understand, and explain. Try to use the right techniques, but also try to limit yourself to techniques you can explain, at least in general terms.
Justify the techniques you use; don’t merely hide behind the rationalisation that “other authors did it”. If other authors jumped off the Brooklyn bridge, would you jump?
Know your limits. Set reasonable goals for your dissertation, and do not claim to have achieved more than you have done.
6.5 Why do we use data?
6.5.1 What data do you need to answer your question?
6.5.2 Descriptive
To measure and understand our object of study: the economy (including individuals, households, firms, and governments)
Levels of variables and patterns (e.g., life-cycle consumption)
observed relationships between variables (differences by group, correlations, linear relationships, etc.)
For its own sake
To use in executing policy (e.g., the Consumer Price Index)
To generate hypotheses and “calibrate” our models
We can also run statistical tests of “descriptive” hypotheses.
E.g., testing
H0: Incomes of men and women are the same ceteris paribus
vs.
HA: Women with the same characteristics as men earn less on average.
Note: this is not testing a causal relationship;
a difference doesn’t necessarily imply a particular explanation (e.g., sex discrimination).
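A minimal sketch of such a test in Python, assuming NumPy is available and using simulated data (all numbers are made up, with a gap of 0.10 log points built in). The regression “controls for” only the characteristics included in X (here, years of schooling), so “ceteris paribus” means “holding schooling fixed”, and rejecting H0 still says nothing about why the gap exists.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

# Simulated data for 2,000 hypothetical workers; a 0.10 log-point gap is built in
rng = np.random.default_rng(0)
n = 2000
female = rng.integers(0, 2, size=n)      # 1 = woman, 0 = man
educ = rng.normal(13, 2, size=n)         # years of schooling (the control)
log_wage = 1.0 + 0.08 * educ - 0.10 * female + rng.normal(0, 0.3, size=n)

# Regress log wage on a constant, the female dummy, and the control
X = np.column_stack([np.ones(n), female, educ])
beta, se = ols(log_wage, X)
t_female = beta[1] / se[1]
print(f"gap = {beta[1]:.3f}, s.e. = {se[1]:.3f}, t = {t_female:.2f}")

# One-sided test of H0 (gap = 0) against HA (gap < 0) at the 5% level,
# using the large-sample normal critical value
print("reject H0" if t_female < -1.645 else "fail to reject H0")
```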
6.5.3 Causal
To make statistical inferences (and statistical predictions) about effects (sometimes called “causal effects”, but I find that redundant).
To measure and test hypotheses about the causal relationship between important factors and outcomes.
What data do you need to answer your question?
Relevant to your topic
the relevant population, years, and fields;
relevant outcome variable(s), “independent variable(s)” of interest, and control variables
Useful for answering your question
e.g., contains a useful “instrumental variable”, a long enough time series, or repeated observations on individuals to allow ‘fixed effects’ controls
Reliable, accessible, understandable
What data have previous authors used to answer this or related questions?
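To see why repeated observations on individuals help, here is a sketch with a simulated panel (NumPy assumed; the variable names and numbers are hypothetical). An unobserved, time-invariant trait a raises both x and y, so a pooled regression of y on x is biased; subtracting each individual’s own mean (the “within” or fixed-effects transformation) removes a and recovers the true effect of 2.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_years = 500, 5
person = np.repeat(np.arange(n_people), n_years)   # individual id for each row

# Unobserved individual trait (e.g., "ability") that also raises x
a = rng.normal(0, 1, size=n_people)[person]
x = 0.8 * a + rng.normal(0, 1, size=n_people * n_years)
y = 2.0 * x + 3.0 * a + rng.normal(0, 1, size=n_people * n_years)  # true effect of x: 2.0

def slope(y, x):
    """Slope from a simple regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def demean_by_group(v, group):
    """Subtract each group's mean from its observations (the within transformation)."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

pooled = slope(y, x)   # ignores a, so it is biased upwards here
within = slope(demean_by_group(y, person), demean_by_group(x, person))
print(f"pooled OLS: {pooled:.2f}, fixed effects: {within:.2f}")
```

Note that this only removes factors that are constant over time for each individual; time-varying omitted variables would still bias the fixed-effects estimate.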
6.6 Some types of data
Survey and collected data: self-reports, interviewer reports, physical measurements, and visual checks
Administrative data (e.g., tax records)
- Transactions/interactions
- Scanner data
- Web data (e.g., eBay, Amazon)
- Price data
Public financial data and company reports
Official government data (public releases and announcements, e.g., budget data)
Data from lab experiments
Data from field experiments
Consider the differences between:
Micro-data (individual/transaction level) vs. macro-data (aggregated to firm, region, country-year level, etc.)
Panel vs cross-section vs time-series data
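The distinctions above can be sketched with a few hypothetical person-year records in plain Python: aggregating micro-data to region-year means produces macro-data arranged as a small panel, and fixing either the year or the region turns that panel into a cross-section or a time series.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro-data: one row per person-year
micro = [
    {"person": 1, "region": "North", "year": 2020, "income": 21000},
    {"person": 2, "region": "North", "year": 2020, "income": 25000},
    {"person": 3, "region": "South", "year": 2020, "income": 30000},
    {"person": 1, "region": "North", "year": 2021, "income": 22000},
    {"person": 2, "region": "North", "year": 2021, "income": 27000},
    {"person": 3, "region": "South", "year": 2021, "income": 31000},
]

# Aggregate to macro-data: mean income per region-year (a small region panel)
cells = defaultdict(list)
for row in micro:
    cells[(row["region"], row["year"])].append(row["income"])
region_panel = {key: mean(vals) for key, vals in cells.items()}

# A cross-section fixes the year; a time series fixes the unit
cross_section_2020 = {r: v for (r, yr), v in region_panel.items() if yr == 2020}
time_series_north = {yr: v for (r, yr), v in region_panel.items() if r == "North"}

print(cross_section_2020)   # {'North': 23000, 'South': 30000}
print(time_series_north)    # {2020: 23000, 2021: 24500}
```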
6.7 Getting and using data
6.7.1 Finding data
In searching for data, note that the American Economic Association has a very comprehensive list of links: http://www.aeaweb.org/RFE/toc.php?show=complete
For the UK specifically, see http://www.statistics.gov.uk/default.asp
For macro and micro data, see http://www.esds.ac.uk/
For large scale data, see also the UK Data Service database.
Some other sources of data, and links to aggregations of data sources, are on my webpage here.
Also note that data from published papers are typically expected to be made publicly accessible (for replication and checking purposes). If you cannot find the data on the journal’s or the author’s website, you can email the corresponding author to ask for it.
Don’t wait too long to begin collecting your data and producing simple graphs and summary statistics, to get a sense of your data.
Empirical work is difficult and you may not be able to get the “best” data. This is OK. Remember, at the undergraduate/MSc level, we generally want you to show your competencies in these assignments; we expect the analysis will have limitations.
6.7.2 Some examples of datasets used by undergraduate students
Workplace Employee Relations Survey: Private Sector Panel, 1998-2004 data, from the UK Data Archive.
Data on cigarette consumption from the US Centers for Disease Control and Prevention (CDC), 1986 to 2011, for 50 states: 50 states × 26 years = 1,300 observations.
The 1958 National Child Development Study, a longitudinal study tracking a group of individuals born in a single week in 1958.
Data on UK cities’ population, employment, and geography, extracted from various ONS tables.
The ICCSR UK Environmental & Financial Dataset: a large panel data set on a sample of firms giving ratings on “community and environmental responsibility”, merged with financial variables on these firms collected from Datastream.
Exchange rates between the US dollar, the British pound, Australian dollar, Canadian dollar and Swiss franc, for the period 1975-2010, from the OECD Main Economic Indicators database.
The World Bank World Development Indicators database (2013): 210 countries over the 20-year period 1991-2010.
65 banks over 8 years from BankScope (profitability measures, etc.)
6.8 Understanding your data
Present simple statistics and graphics on your data before doing more involved analyses.
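A minimal sketch of a summary-statistics line for one variable, in plain Python with hypothetical data (the helper name `summarize` is illustrative; in practice your statistical package will have its own summary command). Reporting the number of missing values alongside N makes gaps in the data visible early.

```python
import statistics

def summarize(name, values):
    """Summary statistics for one variable, skipping missing values (None)."""
    obs = [v for v in values if v is not None]
    row = {
        "variable": name,
        "N": len(obs),
        "missing": len(values) - len(obs),
        "mean": statistics.mean(obs),
        "sd": statistics.stdev(obs),       # sample standard deviation
        "min": min(obs),
        "median": statistics.median(obs),
        "max": max(obs),
    }
    print(
        "{variable:<8} N={N} missing={missing} mean={mean:.2f} sd={sd:.2f} "
        "min={min} median={median} max={max}".format(**row)
    )
    return row

# Hypothetical weekly hours for seven survey respondents (one did not answer)
hours = [40, 38, None, 45, 50, 20, 40]
summary = summarize("hours", hours)
```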
6.9 What does data look like (brief)
6.9.1 Observations, variables
Each “unit” is an observation; think of these as the rows of a spreadsheet. Every unit has a value for each of the “variables”. You may create new variables from transformations and combinations of existing variables. You may restrict your analysis to a subset of the observations, for justifiable reasons. Your analysis may also need to drop some observations, e.g., those with missing values (but be careful).
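These operations (creating variables, restricting the sample, dropping observations with missing values) can be sketched in plain Python, treating each dictionary as one row; the data and the age-16 cut-off are hypothetical.

```python
import math

# Hypothetical data set: each dict is one observation (a spreadsheet row)
rows = [
    {"id": 1, "wage": 12.5, "hours": 40, "age": 34},
    {"id": 2, "wage": 9.0, "hours": None, "age": 19},
    {"id": 3, "wage": 20.0, "hours": 35, "age": 52},
    {"id": 4, "wage": 15.0, "hours": 20, "age": 15},
]

# 1. Create a new variable from an existing one (a transformation)
for r in rows:
    r["log_wage"] = math.log(r["wage"])

# 2. Restrict to a justified subset, e.g., working-age adults (16+)
adults = [r for r in rows if r["age"] >= 16]

# 3. Drop observations missing a variable the analysis needs,
#    and record how many were dropped so you can report it
complete = [r for r in adults if r["hours"] is not None]
n_dropped = len(adults) - len(complete)

print(len(rows), len(adults), len(complete), n_dropped)   # 4 3 2 1
```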
6.9.2 Cross-sectional, time-series, and panel data
6.9.3 String and numeric variables
String variables are text. In their raw form, they usually have quotes around them (e.g., “john smith”).
Numeric variables are numbers: integers, “floats” (floating-point numbers), etc., stored in various forms.
Most statistical packages and programming languages treat these two types of variables differently, with a different “syntax” and different commands for each. Be careful.
There are many other data types, and how these are categorised and stored varies somewhat between languages. For example:
‘Factor’ variables (categorical, ordinal)
Logical (true/false)
Date and time variables
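A short Python illustration of why these types must be handled differently. The values are hypothetical, and Python has no built-in “factor” type, so an `Enum` stands in here for a categorical variable with a fixed set of levels (pandas, for instance, offers a separate “category” type).

```python
from datetime import date
from enum import Enum

# Strings vs numbers: the same characters behave differently
print("42" + "1")       # string concatenation: prints 421
print(42 + 1)           # numeric addition: prints 43
print(int("42") + 1)    # convert the string to a number first: prints 43

# Numbers imported as text (e.g., with thousands separators) need cleaning
raw_income = "25,000"
income = float(raw_income.replace(",", ""))

# Logical (true/false) variable
employed = income > 0

# Date variables: subtracting two dates gives a duration
birth = date(1958, 3, 3)
interview = date(2021, 3, 15)
days_old = (interview - birth).days

# Categorical ("factor") variable: only the listed levels are allowed
class Region(Enum):
    NORTH = "North"
    SOUTH = "South"

region = Region("North")   # Region("Nowhere") would raise ValueError
```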
6.10 Doing an econometric analysis
(see previous notes on using data)
Which techniques
You may not be able to use the “ideal” estimation technique; it may be too advanced. But try to be aware of (and able to explain) the strengths and weaknesses of your econometric approach.
Time series, cross section, or panel data?
“A major problem is always understanding the difference between a panel and a time series. My students always want to just do a time series regression, and don’t understand why the cross-section dimension is important.” –University of Essex lecturer
Common difficulties
Diagnostic tests, etc.
Interpreting your results
“The second most frequent issue is that they think they are supposed to get a ‘right’ answer. They stress out when the regression doesn’t come out ‘right’.” – University of Essex lecturer
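As one example of a diagnostic test, here is a sketch of the Breusch-Pagan test for heteroskedasticity (Koenker’s studentized version), computed by hand with NumPy on simulated data. The statistic is n times the R² from regressing the squared OLS residuals on the regressors; under homoskedasticity it is approximately chi-squared with as many degrees of freedom as non-constant regressors (here 1, with a 5% critical value of 3.841). Statistical packages provide this test directly; this version only shows what it computes, and the function names are hypothetical.

```python
import numpy as np

def ols_fit(y, X):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def r_squared(y, X):
    """R-squared from an OLS regression of y on X (X includes a constant)."""
    _, resid = ols_fit(y, X)
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def breusch_pagan_lm(y, X):
    """n * R^2 from regressing squared OLS residuals on the regressors."""
    _, resid = ols_fit(y, X)
    return len(y) * r_squared(resid ** 2, X)

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y_homo = 1 + 0.5 * x + rng.normal(0, 1, size=n)         # constant error variance
y_hetero = 1 + 0.5 * x + rng.normal(0, 1, size=n) * x   # error s.d. grows with x

CRIT_5PCT_DF1 = 3.841   # 5% critical value of chi-squared with 1 d.f.
for label, y in [("homoskedastic", y_homo), ("heteroskedastic", y_hetero)]:
    lm = breusch_pagan_lm(y, X)
    print(f"{label}: LM = {lm:.1f}, reject H0 at 5%: {lm > CRIT_5PCT_DF1}")
```

Rejecting homoskedasticity does not mean the regression is useless; it usually means you should report heteroskedasticity-robust standard errors.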
6.11 Presenting your results
…
Considering alternative hypotheses and “robustness checks”