12 Survey design and implementation; analysis of survey data
The Wikipedia entry on ‘survey sampling’ provides a good overview.*
* Also see: Carl-Erik Särndal, Bengt Swensson, and Jan Wretman (2003). Model Assisted Survey Sampling. Springer, pp. 9–12. ISBN 978-0-387-40620-6.
12.1 Survey sampling/intake
Probability sampling
In the fold below, I offer a rough characterization of the rationale for probability sampling, in my own words, based on my reading of the aforementioned Wikipedia entry:
‘Probability sampling’ has been the standard approach in survey sampling at least since random-digit dialing made it feasible at scale.
The basic idea, if I understand correctly, is to define a population of interest and a ‘sample frame’ (the place we are actually drawing from; perhaps the empirical analogue of the population of interest). Rather than advertising to everyone, trying to recruit everyone, or letting people enter by self-selection, probability sampling selects from the sample frame with a particular probability, and then actively tries to get the selected individuals to complete the survey. Because only a smaller number of people are selected to be interviewed or to fill out the survey, you can spend more time and money (e.g., on incentives) trying to make sure they respond.
Probability sampling also allows ‘stratification,’ and oversampling of harder-to-reach groups. We potentially divide up (‘stratify’) the frame by observable groups, and randomly draw (sample) within each stratum with a certain probability.
If we have an informative estimate of the true shares in each stratum, we can sample/re-weight so that the heterogeneous parameter of interest can be said to represent the average value for the true population of interest.
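As a rough illustration of this re-weighting logic, here is a minimal Python sketch with made-up strata (‘urban’/‘rural’), outcomes, and ‘true’ population shares (none of these numbers refer to any real survey). Each respondent is weighted by (population share of their stratum) / (sample share of their stratum), and the raw and weighted means are compared.

```python
import numpy as np
import pandas as pd

# Hypothetical respondents: each has a stratum and an outcome of interest.
df = pd.DataFrame({
    "stratum": ["urban", "urban", "rural", "rural", "rural"],
    "outcome": [1.0, 0.0, 1.0, 1.0, 0.0],
})

# Assumed 'true' population shares per stratum (e.g., from a census or prior knowledge).
pop_shares = {"urban": 0.7, "rural": 0.3}

# Observed shares of each stratum among respondents.
sample_shares = df["stratum"].value_counts(normalize=True)

# Post-stratification weight: population share divided by sample share.
df["weight"] = df["stratum"].map(pop_shares) / df["stratum"].map(sample_shares)

print("raw mean:     ", df["outcome"].mean())
print("weighted mean:", np.average(df["outcome"], weights=df["weight"]))
```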
A probability-based survey sample is created by constructing a list of the target population, called the sampling frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units to enable them to complete the survey, called a data collection method or mode (from the Wikipedia entry).
Note that the ‘sampling frame’ (‘the source material or device from which a sample is drawn’), e.g., a telephone directory, may not exactly contain all elements of the ‘population of interest’ (e.g., the population with and without listed numbers).
On the other hand, this terminology doesn’t seem to be used consistently everywhere; e.g., Salganik and H. refer to it as ‘a list of all of the members in the population.’
See ‘Missing elements,’ ‘Foreign elements,’ ‘Duplicate entries,’ and ‘Groups or clusters’ (problems with sampling frames discussed in the Wikipedia entry).
Also note that:
Not all frames explicitly list population elements; some list only ‘clusters.’ For example, a street map can be used as a frame for a door-to-door survey.
- Simple random versus stratified and cluster sampling
Issues**
** Many of these also seem relevant for our ‘social movement’ case of interest below.
- Non-response bias (biggest issue?)
Non-probability sampling
There is a wide range of non-probability designs that include case-control studies, clinical trials, evaluation research designs, intercept surveys, and opt-in panels, to name a few.
(Baker et al., 2013, ‘Summary Report of the AAPOR Task Force on Non-Probability Sampling’)
‘River sampling’ is closest to the case we are dealing with below.
12.2 Case: Surveying an unmeasured and rare population surrounding a ‘social movement’
Background and setup
Consider a case where:
- We have a population of interest based on an affiliation, certain actions, or a set of ideas. E.g., vegetarians; ‘Tea Party conservatives’ in the US; Jews, both religious and ‘culturally Jewish’; jazz musicians; ‘Goths’ (cf. Paul Hodkinson’s ethnography)
For this writeup, we will call the targeted group ‘the Jazz Movement’ or ‘the Jazz population.’ Individuals will either be ‘Jazzy’ (J) or ‘non-Jazzy’ (NJ).
There are some disagreements about how to define this group.
- We have no ‘gold standard’ to benchmark against.
- There is no ‘actual targeted and measured outcome’ such as voting in an election.
- There are no other surveys or enumerations (e.g., censuses) to inform our results.
- We have collected survey responses from self-selected ‘convenience’ samples (‘internet surveys’) across several years; this most resembles ‘river sampling’
… based on advertising and word-of-mouth in a variety of outlets (‘referrers’) associated with the ‘movement.’*
*Particularly:
- A discussion forum
- A newsletter
- A popular website and hub for the movement
- We can identify which ‘referrer’ led someone to our survey.
- All participants are given a similar ‘donation’ incentive, an incentive that might tend to particularly attract members of the Movement.**
** Given the context, we might reasonably expect that willingness to complete the survey might be associated with depth of support for the movement.
- We can link some individuals across years.
- Some questions repeat across years.
- We have (self-reported) measures of
- Demographics (age, gender, etc.),
- Attitudes and beliefs (e.g., support for the death penalty),
- Retrospectives (esp. ‘year you became Jazzy’), and
- Behaviors (e.g., charitable donations).
- Our research goals include measuring:
… The size of the movement (challenging),
… The demographics (and economic status, psychographics, etc.) of the movement,
… The attitudes, beliefs, and behaviors of people in the movement,
… The (causal) drivers of joining the movement and actively participating in the movement (or leaving the movement),
… and the trends/changes in all of the above.
We are particularly interested in the most avid and engaged Jazzers, and in knowing about self-reported challenges to participation.
Why do we care? We want to know these things for several reasons, including…
Our ‘theory of change.’
To find ways to build membership (perhaps ‘expand and diversify’), and increase participation in the movement (including specific behaviors like donating), especially through…
- Funding causal drivers (policies) that ‘work,’ and
- Profiling and targeting ‘likely Jazzers’ from outside the movement
To understand and better represent the attitudes of the movement’s members in our movement-wide activities.
‘For general understanding of the movement and its members,’ to inform a wide range of decisions across the movement, and further research into the movement.
Our ‘convenience’ method; issues, alternatives
Our current approach may be described as a combination of ‘convenience sampling’ (‘river sampling’ and ‘opt-in’) and ‘snowball sampling.’ The major distinction from probability sampling (as I see it) is…
Probability sampling identifies a population of interest and a sample frame meant to capture this population. Rather than appealing to this entire population/frame, probability sampling randomly samples a ‘probability share’ (e.g., 1/1000) from this frame, possibly using stratification or clustering. Selected participants are (hopefully) given strong incentives to complete the survey. One can carefully analyze, and perhaps adjust for, the rate of non-response.
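To make this design-weight logic concrete, here is a minimal sketch with hypothetical strata, selection probabilities, and response rates (all numbers are invented for illustration). Each selected unit gets a base weight of one over its selection probability, which is then inflated by the inverse of the observed response rate within its stratum; this simple non-response adjustment assumes respondents and non-respondents are similar within a stratum.

```python
import pandas as pd

# Hypothetical strata, with stratum B oversampled because it is harder to reach.
strata = pd.DataFrame({
    "stratum": ["A", "B"],
    "selection_prob": [1 / 1000, 1 / 200],
    "response_rate": [0.6, 0.3],
})

# Base design weight: inverse of the selection probability.
strata["design_weight"] = 1 / strata["selection_prob"]

# Simple non-response adjustment: inflate by the inverse of the response rate.
strata["adjusted_weight"] = strata["design_weight"] / strata["response_rate"]

print(strata)
```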
In contrast, our approach appeals to whoever encounters the survey through the referrers, with self-selection determining who responds and no well-defined probability of selection.
I have heard that ‘internet surveys,’ if done right, with proper adjustments, are seen as increasingly reliable, especially in the context of electoral polling. Is our approach similar enough to these to be able to adopt their adjustment methods?
Wikipedia entry on ‘convenience sampling’
Another example would be a gaming company that wants to know how one of their games is doing in the market one day after its release. Its analyst may choose to create an online survey on Facebook to rate that game.
Bias: The results of the convenience sampling cannot be generalized to the target population because of the potential bias of the sampling technique due to under-representation of subgroups in the sample in comparison to the population of interest. The bias of the sample cannot be measured. Therefore, inferences based on the convenience sampling should be made only about the sample itself. (Wikipedia, on ‘Convenience sampling,’ citing Borenstein et al., 2017)
This statement is deeply pessimistic… ‘the bias cannot be measured.’ We might dig more deeply to see if there are potential approaches to dealing with this.
Our methodological questions
Analysis: Are there approaches (e.g., weighting, cross-validation) that would do better than ‘reporting the unweighted raw results’ when using this ‘convenience/river’ sample, to either:
Get results (either levels or changes) that are likely to be more ‘representative of the movement as a whole’ than our unweighted raw measures of the responses in each year, or
Get measures of the extent to which our reports are likely to be biased due to undercoverage, ‘overcoverage,’ and differential participation rates (perhaps bounds on this bias)?
Survey design: In designing future years’ surveys, is there a better approach?
Probability sampling of a ‘large group’ or within each outlet (with larger incentives/inducements), or even a nationally representative sample
Respondent-driven sampling (with network-analysis based adjustments)
Comparisons to other data points from other surveys and measures, comparisons to other groups and within groups
Sensitivity testing ideas
As we can separately measure demographics (as well as stated beliefs/attitudes) for respondents from each referrer, we could consider testing the sensitivity of the results to how we weight responses from each referrer (see the sketch at the end of this section).
How much would results vary if we compare the widest group to the most motivated “first responders” to our regular posted survey?
Use demographics derived from some weighted estimate of surveys in all previous years to re-weight the survey data in the present year to be ‘more representative’? (But how would we weight and judge previous surveys and referrers?)
A Bayesian meta-analytic approach, where the ‘true population parameters’ are unknown and each survey provides an imperfect window onto them.
- As noted, we subjectively think that some referrers are more representative than others, so perhaps we can incorporate this using Bayesian tools. We may also have some measures of the demographics of participants for some of the referrers, which might be used to weight for differential non-response.
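One very simple version of this idea, offered only as a sketch (the estimates, standard errors, and ‘trust’ discounts below are made up): treat each survey’s or referrer’s estimate as a noisy measurement of a single unknown ‘true’ value, inflate the uncertainty of the sources we subjectively trust less, and combine them with a precision-weighted average (the posterior mean under a normal model with a flat prior). A real implementation would presumably use a richer hierarchical model.

```python
import numpy as np

# Hypothetical per-survey (or per-referrer) estimates of some proportion of interest.
estimates = np.array([0.42, 0.55, 0.47])
std_errors = np.array([0.03, 0.06, 0.04])

# Subjective 'representativeness' discount: 1.0 = fully trusted; larger values
# inflate the uncertainty attached to sources we trust less.
trust_discount = np.array([1.0, 2.0, 1.5])

# Effective variance after applying the discount.
var = (std_errors * trust_discount) ** 2

# With a flat prior and normal likelihoods, the posterior mean is the
# precision-weighted average of the estimates (a very simple meta-analysis).
precision = 1 / var
posterior_mean = np.sum(precision * estimates) / np.sum(precision)
posterior_sd = np.sqrt(1 / np.sum(precision))

print(posterior_mean, posterior_sd)
```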
Weighting may not be appropriate as a means of gaining ‘complete representativeness,’ as we have no gold standard. Even if we had a nationally representative sample/survey, it could yield differential response rates among different types of Jazzers. An approach to ‘convergent validation’ may be possible in the future, but it will take time to learn/develop a methodology.
However, we can measure and test the sensitivity of our results to variation along several ‘dimensions’:
Ease of response/dedication (e.g., compare earlier to later responders, and responders after reminders; in the future, compare those given additional incentives and pressure)
Referrers with different characteristics (‘large pool’ vs ‘small pool’ or something)
Demographics and ‘clusters/vectors of demographics’
(Possibly) re-weighting to match the demographics of some known group
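The referrer-weighting sensitivity check mentioned above might look something like the following minimal sketch (simulated respondents, hypothetical referrer names, and arbitrary weighting schemes): compute the same estimate under several alternative ways of weighting responses from each referrer, and treat the spread of the resulting estimates as a rough, informal measure of sensitivity.

```python
import numpy as np
import pandas as pd

# Simulated respondent-level data: which referrer brought them in, plus an outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "referrer": rng.choice(["forum", "newsletter", "hub_site"], size=300),
    "outcome": rng.integers(0, 2, size=300),
})

# Alternative (subjective) schemes for how much total weight each referrer's responses get.
schemes = {
    "raw (proportional to responses)": None,
    "equal weight per referrer": {"forum": 1 / 3, "newsletter": 1 / 3, "hub_site": 1 / 3},
    "hub-heavy": {"forum": 0.2, "newsletter": 0.2, "hub_site": 0.6},
}

for name, shares in schemes.items():
    if shares is None:
        est = df["outcome"].mean()
    else:
        sample_shares = df["referrer"].value_counts(normalize=True)
        weights = df["referrer"].map(shares) / df["referrer"].map(sample_shares)
        est = np.average(df["outcome"], weights=weights)
    print(f"{name}: {est:.3f}")

# The spread of these estimates is one rough indication of how sensitive our
# results are to assumptions about referrer representativeness.
```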