12 Survey design and implementation; analysis of survey data

The Wikipedia entry on ‘survey sampling’ provides a good overview.*

* Also see: Carl-Erik Särndal; Bengt Swensson; Jan Wretman (2003). Model assisted survey sampling. Springer. pp. 9–12. ISBN 978-0-387-40620-6. Retrieved 2 January 2011.

12.1 Survey sampling/intake

Probability sampling

In the fold below, I offer a rough characterization of the rationale for probabiliity sampling, in my own words, based on my reading of the aforementioned Wikipedia entry:

‘Probability sampling’ has been the standard approach in survey sampling since random-digit dialing was possible.

The basic idea, if I understand correctly, is to define a population of interest and a ‘sample frame’ (the place we are actually drawing from, the empirical analogue of the population of interest perhaps). Essentially, rather than advertising or trying to recruit everyone, or have people enter by self selection, probability sampling select from the sample frame with a particular probability, and then actively tries to get the selected individuals to complete the survey. As only a smaller number of people are selected to be interviewed/fill out the survey, you can spend more time and money/incentives, trying to make sure they respond.

Probability sampling also allows ‘stratification,’ and oversampling of harder to reach groups. We potentially divide up (‘stratify’) that frame by observable groups. We randomly draw (sample) within each strata with a certain probability.

If we have an informative estimate of the TRUE shares in each strata we can sample/re-weight so that the heterogeneous parameter of interest can be said to represent the average value for the true population of interest.

A probability-based survey sample is created by constructing a list of the target population, called the sampling frame, a randomized process for selecting units from the sample frame, called a selection procedure, and a method of contacting selected units to enable them to complete the survey, called a data collection method or mode (lookup in Wiki)

Note that the ‘sampling frame,’ ‘the source material or device from which a sample is drawn,’ e.g, a telephone directory, may not exactly contain all elements of the ‘population of interest’ (e.g., the population with and without listed numbers).

On the other hand this terminology doesn’t seem to be everywhere consistent, e.g., Salganik and H refer to it as ‘a list of all of the members in the population.’

See ‘Missing elements,’ ‘Foreign elements,’ ‘Duplicate entries’ and ‘Groups or clusters’

Also note that

Not all frames explicitly list population elements; some list only ‘clusters.’ For example, a street map can be used as a frame for a door-to-door survey.

  • Simple random versus stratified and cluster sampling


** Many of these also seem relevant for our ‘social movement’ case of interest below.

  • Non-response bias (biggest issue?)

Non-probability sampling

There is a wide range of non-probability designs that include case-control studies, clinical trials, evaluation research designs, intercept surveys, and opt-in panels, to name a few.


Non-Probability Sampling - report of the aapor task force on non-probability sampling

‘River sampling’ is closest to the case we are dealing with below.

Respondent-driven sampling (from rare or ‘hidden’ populations)

sampling and estimation in hidden populations using respondent-driven sampling (salganikSamplingEstimationHidden2004?)

They suggest a way to use ‘RDS’ and then recover representativeness, under certain assumptions.

Standard sampling and estimation techniques require the researcher to select sample members with a known probability of selection. In most cases this requirement means that researchers must have a sampling frame, a list of all members in the population. However, for many populations of interest such a list does not exist.

  • ‘Standard’ approach … Construct a sample frame of ‘all the injection drug users in a large city’; ‘probably impossible’

  • Or ‘reach a large number of people via random digit dialing and then screen them for membership in the hidden population. Again, this approach is extremely costly and potentially very inaccurate.’

  • Or “take a sample of target population members in an institutional setting—for example, injection drug users in a drug rehabilitation program.” But “a nonrandom sample from the hidden population, … impossible to use to … make accurate estimates about the entire hidden population”

Targeted sampling (‘street outreach’) is cost-effective but it is very hard to know what you are getting. Time-space sampling can deal with certain limitations to this, but it’s far from a complete remedy.

The authors propose:

respondent-driven sampling, is a variation of chain-referral sampling methods that were first introduced by Coleman (1958) under the name snowball sampling.

select a small number of seeds who are the first people to participate in the study. These seeds then recruit others to participate in the study. This process of existing sample members recruiting future sample members continues until the desired sample size is reached


chain-referral methods produce samples that are not even close to simple random samples

researchers believed that any small bias in selecting the seeds would be compounded in unknown ways as the sampling process continued.

and also …

This research is fairly well summarized by Berg(1988) when he writes, “as a rule, a snowball sample will be strongly biased toward inclusion of those who have many interrelationships with or are coupled to, a large number of individuals.” In the absence of knowledge of individual inclusion probabilities in different waves of the snowball sample, unbiased estimation is not possible.

“Biased toward inclusion of those who have many interrelationships…”

This would seem to suggest a solution, perhaps driving the approach the authors use. If we can measure the likelihood that an individual is reached as a function of their network, we may be able to downweight those individuals that are more likely to be sampled.

But the authors believe they have a solution

We believe that previous researchers have been overly pessimistic about chain-referral samples. In this paper we will show that it is possible to make unbiased estimates about hidden populations from these types of samples. Further, we will show that these estimates are asymptotically unbiased no matter how the seeds are select

How does it work?:

First, the sample is used to make estimates about the social network connectingthe population. Then, this information about the social network is used to derive the proportion of the population in different groups (forexample, HIV- or HIV+)

There are some intricate and interesting calculations involving network analysis .

They show the unbiasedness of this approach under specified conditions, and its robustness to the seed choice. They also give a monte-carlo simulation illustrating the same.

Finally, they do some sort of comparison of their approach and previous approaches in real settings.**

** DR: I didn’t read the latter carefully. Worth following up, perhaps.

12.2 Case: Surveying an unmeasured and rare population surrounding a ‘social movement’

Background and setup

Consider a case where:

  1. We have a population-of-interest based on an affiliation, certain actions, or a set of ideas. E.g., Vegetarians; ‘Tea party conservatives’ in the US; Jews, both religious and ‘culturally Jewish,’ Jazz musicians, “Goths” (‘ethnography’; Paul Hodkinson)

For this writeup, we will call the targeted group ‘the Jazz Movement’ or ‘the Jazz population.’ Individuals will either be ‘Jazzy’ (J) or ‘non-Jazzy’ (NJ).

There are some disagreements about how to define this group.

  1. We have no ‘gold standard’ to benchmark against.
  • There is no ‘actual targeted and measured outcome’ such as voting in an election.
  • There are no other surveys or enumerations (e.g., censuses) to inform our results.

  1. We have collected survey responses from self-selected ‘convenience’ samples (‘internet surveys’) across several years; this most resembles ‘river sampling’

… based on advertising and word-of-mouth in a variety of outlets (‘referrers’) associated with the ‘movement.’*


  • A discussion forum
  • A newsletter
  • A popular website and hub for the movement
  • We can identify which ‘referrer’ lead someone to our survey.
  • All participants are given a similar ‘donation’ incentive, an incentive that might tend to particularly attract members of the Movement.**

** Given the context, we might reasonably expect that willingness to complete the survey might be associated with depth of support for the movement.

  • We can link some individuals across years.
  • Some questions repeat across years.

  1. We have (self-reported) measures of
  • Demographics (age, gender, etc),
  • Attitudes and beliefs (e.g., support for the death penalty),
  • Retrospectives (esp. ‘year you became Jazzy’), and
  • Behaviors (e.g., charitable donations).

  1. Our research goals include measuring:

The size of the movement (challenging),

… The demographics (and economic status, psychographics, etc) of the movement,

… The attitudes, beliefs and behaviors of people in the movement,

… The (causal) drivers of joining the movement and actively participating in the movement (or leaving the movement),

… and the trends/changes in all of the above.

We are particularly interested in the most avid and engaged Jazzers, and in knowing about self-reported challenges to participation.

Why do we care? We want to know these things for several reasons, including…

Our ‘theory of change.’

To find ways to build membership (perhaps ‘expand and diversify’), and increase participation in the movement (including specific behaviors like donating), especially through…

  • Funding causal drivers (policies) that ‘work,’ and
  • Profiling and targeting ‘likely Jazzers’ from outside the movement

To understand and better represent the attitudes of the movement’s members in our movement-wide activities.

‘For general understanding of the movement and its members,’ to inform a wide range of decisions across the movement, and further research into the movement.

Our ‘convenience’ method; issues, alternatives

Our current approach may be described as a combination of ‘convenience sampling’ (‘river sampling’ and ‘opt-in’) and ‘snowball sampling.’ The major distinction from probability sampling (as I see it) is…

Probability sampling identifies a population of interest and a sample frame meant to capture this population. Rather than appealing to this entire population/frame, probability sampling randomly (or using stratification/clustering) samples a ‘probability share’ (e.g., 1/1000) from this frame. Selected participants are (hopefully) given strong incentives to complete the survey. One can carefully analyze —and perhaps adjust for—the rate of non-response.

In contrast,

I have heard that ‘internet surveys,’ if done right, with proper adjustments, are seen as increasingly reliable, especially in the context of electoral polling. Is our approach similar enough to this to be able to adopt these approaches?

Wikipedia entry on ‘convenience sampling’

Another example would be a gaming company that wants to know how one of their games is doing in the market one day after its release. Its analyst may choose to create an online survey on Facebook to rate that game.

Bias The results of the convenience sampling cannot be generalized to the target population because of the potential bias of the sampling technique due to under-representation of subgroups in the sample in comparison to the population of interest. The bias of the sample cannot be measured. Therefore, inferences based on the convenience sampling should be made only about the sample itself.[9] (Wikipedia, on ‘Convenience sampling,’ cites Borenstein et al, 2017)

This statement is deeply pessimistic… ‘the bias cannot be measured.’ We might dig more deeply to see if there are potential approaches to dealing with this

Our methodological questions

Analysis: Are there any approaches that would be better than ‘reporting the unweighted raw results’ (e.g., weighting, cross-validating something or other) to using this “convenience/river” sample to either:

  1. Getting results (either levels or changes) likely ‘to be more ’representative of the movement as a whole’ than our unweighted raw measures of the responses in each year?

  2. Getting measures of the extent to which our reports are likely to be biased due to undercoverage, ‘overcoverage,’ and differential participation rates, … perhaps bounds on this bias.

Survey design: In designing future years’ surveys, is there a better approach?

  • Probability sampling of a ‘large group’ or within each outlet (with larger incentives/inducements), or even a nationally representative sample

  • Respondent-driven sampling (with network-analysis based adjustments)

  • Comparisons to other data points from other surveys and measures, comparisons to other groups and within groups

Sensitivity testing ideas

As we can separately measure demographics (as well as stated beliefs/attitudes) for respondents from each referrer, we could consider testing the sensitivity of the results to “how we weight responses from each referrer.”

  • How much would results vary if we compare the widest group to the most motivated “first responders” to our regular posted survey?

  • Use demographics derived from some weighted estimate of surveys in all previous years to re-weight the survey data in the present year to be “more representative?” (But how to weight and judge previous surveys and referrers?)

  • A Bayesian meta-analytic approach where the ‘true population parameters’ are unknown and each survey provides an imperfect window.

    • As noted we subjectively think that some referrers are more representative than others, so maybe we can do something with this using Bayesian tools.) We may have some measures of the demographics of participants on some of the referrers, which might be used to consider weighting to deal with differential non-response

Weighting may not be appropriate as a means of gaining ‘complete representativeness,’ as we have no gold standard. Even if we had a nationally-representative sample/survey this could yield differential response rates among different types of EA-ers. An approach to ‘convergent validation’ may be possible in the future, but it will take time to learn/develop a methodology.

However, we can measure and test the sensitivity of our results to variation along several ‘dimensions’

  • Ease of response/dedication (maybe compare first to later responders, responders after reminders, future – compare those with additional incentives and pressure)

  • Referrers with different characteristics (‘large pool’ vs ‘small pool’ or something)

  • Demographics and ‘clusters/vectors of demographics’

  • (Possibly) re-weighting to match the demographics of some known group

We may do this in an a-theoretical ad-hoc way for now, unless we can identify and use a systematic framework for this.

12.2.1 Sketched model and approach: Bayesian inference/updating for estimating demographics and attitudes of an rare/hidden population

I suspect there must be substantial work on this, but I’m just going to flesh out how I think it would probably go:


We consider both latent/unobservable ‘true’ population parameters and our observed values.

Movement population and composition (latent): The movement has a population (henceforth ‘our population’) of unknown size \(\Pi\) and composition \(\delta\).

‘Demographics’ \(\delta\) can either represent a single proportion, e.g., the female share of our population, or a vector of these proportions for various demographics.* For now, for simplicity, we will focus on a single demographic. Let \(\delta_0\) be the male share and \(\delta_1 = 1- \delta_0\) the non-male share. For indexing this simple case, we refer to demographics \(\delta_k\) for \(k=0,1\).

* …or even a matrix or array describing each pairing of demographics.

E.g., if the male share is 70%, we have be \(\Pi_0= \delta_0 \Pi = 0.7\times 100 = 70\) males in our population.**

Naturally, \(\Pi_1 = \delta_1 \Pi = (1-\delta_0)\Pi = \Pi - \Pi_0 = 100-30 = 30\) non-males in our population.

As noted below, we will be targeting the population via several “referrers” \(r=0,1,...,R\). For now, we will assume each member of the population also only “belongs” to a single referrer. This is also described by latent parameters: we do not know what share of the population belongs to each referrer, nor how this breaks down by demographics.

For simplicity, consider only two referrers (\(r=0,1\)), the ‘wide and narrow referrers,’ respectively, and let \(\rho_0\) denote the share of our population that belongs to the wider referrer (and \(\rho_1 = 1-\rho_0\) the share belonging to the narrower referrer).***

*** For the case in mind when I built this example, this embodies in a place of assumption that is too strong. In fact each of our members may belong to one or more referrers, or in fact to none at all.

Thus, supposing 80% of our population belongs to the wider referrer, we have \(R_0= \rho_0 \Pi = 0.8\times 100 = 80\) members of our population belonging to the wider referrer. (Note that we let \(R_0\) denote the ‘number of members of our population belonging to the wider referrer.’)

Again, we do not know how this breaks down by demographics, and it may not be even. Thus, we need to define the shares of each demographic belonging to each referrer:

\(\rho_{r_k}=pr(k=k|r=r)\): The share of the ‘population of referrer r’ that also both belongs to demographic \(k\).

(Note – I’m not sure if I should use this ‘cross proportion’ or instead use two separate proportions, i.e., conditional probabilities.)

In general (and defining the corresponding ‘number of observations’ notation):


Thus, e.g., given the shares mentioned above, and assuming that 70% of the narrower referrer’s population is male (\(\rho_{1_0}=pr(k=0|r=1)) = 0.7\), our true population of ‘males in the narrower referrer’ is

\(\Pi_{10}=\Pi\times\rho_1\times\rho_{1_0}=100\times0.2\times0.7 = 14\)

Surveys/referrers and response/participation rates (observed)

We have an opt-in web survey that has been administered yearly in years \(t=0,...,T\).

In each year, the survey has been advertised in one of the two referrers noted above.*.

More generally, in \(R\) referrers … \(r=0,1,...,R\).

We observe the number of participants coming from each referrer \(r\) with each demographic \(k\) in each year \(t\)

\(n_{r,k,t} \forall \: r,k,t\)

Suppose that the survey has only been run for 2 years.

Thus, e.g., we may observe \(n_{1,0,1}=20\) from the narrower referrer, who are male, in the first year (year ‘0’).

Participation ‘model’:

I suggested above, we assume that each member of our population only observes (or only “meaningfully observes”) the survey from at most one referrer, the one they ‘belong to.’

If an individual indeed observes the survey, they then decide whether to participate.*

* In future, we might model these participation decisions as the outcome of optimization problems, perhaps giving us insight into the direction of the bias (e.g., according to opportunity cost and concern for the outcome of the survey).

This is perhaps a “double hurdle”: the individual may or may not observe the survey, and then may or may not decide whether it is worth participating. Separating these two processes would seem to separate the “coverage” and “differential response/participation” issues.

As a result, we may have an unobserved “propensity to participate” for each participant, in each year, which may vary by referrer and by demographic.

For now, we will simplify things; We will not model the two hurdles separately, but simply refer to a “latent propensity to (observe and) participate” for each subgroup:

\(\alpha_{r,k,t}\): The true probability that an individual belonging to referrer \(r\), who is in demographic \(k\) (in year \(t\)) participates (returns the survey) in year \(t\).

If we had an (unbiased?) estimate of \(\alpha_{r,k,t} \forall r,k,t\) (and if we observe responses from each combination of these, and if the some share of each group belongs to each referrer) we should be able to use inverse probability weighting to gain an unbiased estimate of the true demographic population shares as well as the overall true population size (?).

The parameters and the priors:

Specify some informative but still somewhat diffuse prior distribution over these parameters; Including true population shares and probabilities of response within each group.

Update and iterate towards ‘our best posteriors?’