7 (Preregistration for the first run, with brief note on extension to the second run)

Registered at OSF here using ‘AsPredicted’ template. Below: relevant excerpts.

Main question

What’s the main question being asked or hypothesis being tested in this study? (optional)

What is the impact of including ‘information about the per-dollar impact’ of a charity (in terms of services provided) on the average donation (equivalently, total amount raised) and the donation incidence rate? (Our field experiment is implemented in a particular context and we recognize that heterogeneity is possible; nonetheless, we see this as a substantial piece of evidence, reasonably generalizable to related relevant fundraisers.)

Key dependent variables

Describe the key dependent variable(s) specifying how they will be measured.

Whether they made a gift, i.e., a donation to the charity (yes/no)
Gift amount
Email open rates (although these shouldn’t vary by treatment as we didn’t change the subject line)
Click through (whether they clicked on a link: yes/no)
Number of click throughs (how many links they clicked on)

The charity fundraising partner will provide us with this information and may provide additional outcome measures (to be determined). We may be able to look at both the immediate response and the long-term response of those receiving the treatment and control emails, perhaps including unsubscription rates.

Conditions/treatments

How many and which conditions will participants be assigned to? (optional)

Two: a single treatment and a single control.

We are running this subject to the final say of the charity. We have proposed that the treatment emails (but not the control emails) will include a sentence or fragment such as the following, both in a captioned photo in the email and in the email text:

Last year, we were able to provide [general provision of an outcome here relevant to the charity] to a [recipient unit] with just $[small amount of money].

Key analyses

Specify exactly which analyses you will conduct to examine the main question/hypothesis. (optional)

We plan to perform standard nonparametric statistical tests of the effect of this treatment on:

Average gift amount (including zeroes)
Incidence (the share and number of people making a gift), compared between control and treatment.

In particular, we will focus on Fisher’s exact test (for incidence) and the standard rank sum and t-tests for the donation amounts. If the aforementioned results are not statistically significant at the p=0.05 level or better, we do not plan to include statistical controls nor to do any interactions/differentiation of our results. We will report confidence intervals on our estimates, and make inferences on reasonable bounds on our effect, even if it is a ‘null effect’.
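As a rough illustration of how the tests named above could be run, the following sketch applies them to simulated data. The arm size, incidence rates, gift distribution, and all variable names (`d`, `treat`, `gave`, `amount`) are assumptions made for the example; they are not the study's data or the registered analysis code.

```r
# Illustrative sketch on simulated data; none of these numbers come from the study.
set.seed(1)
n_per_arm <- 5000  # hypothetical arm size, kept small so the example runs quickly
d <- data.frame(treat = rep(c(0L, 1L), each = n_per_arm))
# Hypothetical incidence rates, inflated so the toy sample contains enough donors
d$gave <- rbinom(nrow(d), 1, ifelse(d$treat == 1L, 0.03, 0.02))
# Gift amount: zero for non-donors, log-normal for donors (an assumed shape)
d$amount <- d$gave * round(rlnorm(nrow(d), meanlog = 3, sdlog = 1), 2)

# Incidence: Fisher's exact test on the 2x2 table of arm by gift
inc_test <- fisher.test(table(d$treat, d$gave))

# Gift amount including zeroes: rank sum test and (Welch) t-test
rank_test <- wilcox.test(amount ~ treat, data = d)
mean_test <- t.test(amount ~ treat, data = d)

# 95% confidence interval for the difference in incidence, treatment minus control
x <- c(sum(d$gave[d$treat == 1L]), sum(d$gave[d$treat == 0L]))
inc_ci <- prop.test(x = x, n = c(n_per_arm, n_per_arm))$conf.int

print(c(fisher_p = inc_test$p.value,
        ranksum_p = rank_test$p.value,
        t_p = mean_test$p.value))
print(inc_ci)              # difference in incidence
print(mean_test$conf.int)  # mean amount, control minus treatment (factor level order)
```

Reporting the two confidence intervals alongside the p-values is what allows bounds to be stated even when the result is a ‘null effect’.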

Secondary analyses

Any secondary analyses? (optional)

As a secondary concern, the ‘clickthrough’ rates
Rates of unsubscription from the mailing list (if available)
Rates of sign up for long-term regular donations (if available)

If the data is available, we will also aim to measure the impact on the long-run participation and donations of these email recipients over the course of subsequent promotions.

If the main analysis finds a statistically significant result $(p \lt 0.05)$, we will differentiate this result by likely measures of heterogeneity of the effect of impact information, as cited in the literature. In particular, we will differentiate this by ‘large previous donors’ versus ‘small previous donors’ versus ‘previous non-donors’.

Sample size

How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. (optional)

The charity fundraising partner has informed us that they expect to send out roughly 330,000 emails. We’ve asked them to divide these evenly between treatment and control. If we do not obtain ‘tight bounds’ on the estimated effect, we will ask them to run a second trial of comparable size. (See ‘Stopping rule’ below.)

Power calculations

Note: AsPredicted does not ask for this.

Response rates in previous such emails were extremely low: approximately 1 per 3,000 emails. Our power calculations suggest that we have 0.29 power to detect a 50% effect, and 0.90 power to detect an approximately 100% effect (a doubling) on incidence:

statmod::power.fisher.test(0.0003,0.00045,150000,150000,alpha=0.01,nsim=10000)

$\rightarrow$ .2896 power

This would probably be considered an ‘underpowered’ test. We need roughly a 100% effect (a doubling of the donation incidence) to have 80% power here.

statmod::power.fisher.test(0.0003,0.0006,150000,150000,alpha=0.01,nsim=10000)

$\rightarrow$ 0.8963

Because of this limited power, we will ask the charity to run this trial a second time with an equivalent-sized sample. With a doubling of our sample we have roughly 0.8 power to detect a 50% impact on incidence:

stats::power.prop.test(p1 = 0.000327, p2 = 0.000327 * 1.5, sig.level = 0.01, power = 0.8)

$\rightarrow$ n = 357007 (this refers to the number per treatment)
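The per-arm sample size above can be cross-checked against the closed-form normal approximation that `stats::power.prop.test` solves numerically (in its default, non-strict, two-sided form). The sketch below is only a check under that approximation. The variable names are illustrative, and the 330,000 per arm figure is an assumption taken from doubling the roughly 330,000 emails and splitting them evenly.

```r
# Inputs taken from the power calculation in the text
p1    <- 0.000327        # control incidence
p2    <- p1 * 1.5        # incidence under a 50% effect
alpha <- 0.01
power <- 0.8

# Closed-form per-arm n under the two-sided normal approximation
z_alpha <- qnorm(1 - alpha / 2)
z_power <- qnorm(power)
n_closed <- ((z_alpha * sqrt((p1 + p2) * (1 - (p1 + p2) / 2)) +
              z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1))^2

# The same quantity from the numerical solver
n_solver <- stats::power.prop.test(p1 = p1, p2 = p2,
                                   sig.level = alpha, power = power)$n

# Power at roughly 330,000 per arm (assumed: two trials of about 330,000
# emails each, split evenly between treatment and control)
power_doubled <- stats::power.prop.test(n = 330000, p1 = p1, p2 = p2,
                                        sig.level = alpha)$power

print(c(n_closed = n_closed, n_solver = n_solver, power_doubled = power_doubled))
```

Note that the required 357,007 per arm is somewhat larger than the roughly 330,000 per arm that a doubled sample would provide, which is why the power at the doubled sample is described only as roughly 0.8.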

Other

Anything else you would like to pre-register? (e.g., data exclusions, variables collected for exploratory purposes, unusual analyses planned?) (optional)

Exclusions: we will exclude any emails that bounced or were not opened from our analysis. The latter is under the assumption that nothing about the email was visibly different before opening it between control and treatment, thus no differential selection. If we see this does not hold after implementation, we will not make this exclusion. If a group of respondents had somehow found out about our study (which is unlikely), and we learn this, we will drop them from our sample.

In response to unanticipated deviations from our plan or ‘surprises’, we will follow the Columbia Green Lab standard operating protocol as far as is possible and reasonable.

Stopping rule

We will not ask the charity to stop this treatment in the middle of this campaign. We aim to continue this treatment in future charity appeals until we can statistically bound (with 95% confidence) the impact of the treatment on both incidence and average donation within a margin of 1/3 of the incidence and average donation in the control condition. If we have not obtained this after one trial, we will ask the charity to repeat the trial in its next comparable campaign, or series of campaigns, up to a doubling of the expected size of this initial trial (i.e., up to roughly 660,000 observations). After this we are very likely to stop this particular trial even if we have not attained this tight bound.
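One way to make the ‘tight bound’ criterion operational for incidence is sketched below. It assumes a particular reading of the rule: the bound is tight when the half-width of the 95% Wald confidence interval for the treatment-minus-control difference in incidence is at most 1/3 of the control incidence. That reading, the helper name `bound_is_tight`, and the donor counts are all assumptions made for illustration, not part of the registration.

```r
# Assumed reading of the stopping rule (not the registered definition):
# the bound is 'tight' when the half-width of the 95% CI for the difference
# in incidence is no larger than margin * (control incidence).
bound_is_tight <- function(x_treat, n_treat, x_ctrl, n_ctrl,
                           margin = 1 / 3, conf = 0.95) {
  p_t <- x_treat / n_treat
  p_c <- x_ctrl / n_ctrl
  # Wald standard error of the difference in proportions
  se <- sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_ctrl)
  half_width <- qnorm(1 - (1 - conf) / 2) * se
  list(diff = p_t - p_c,
       half_width = half_width,
       target = margin * p_c,
       tight = half_width <= margin * p_c)
}

# Hypothetical counts: one trial of about 330,000 emails split evenly,
# then two such trials pooled
one_trial  <- bound_is_tight(60, 165000, 55, 165000)
two_trials <- bound_is_tight(120, 330000, 110, 330000)

print(c(one_trial = one_trial$tight, two_trials = two_trials$tight))
```

The same check could be applied to average donation by replacing the proportions with arm means and their standard errors.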

27 Nov 2018 update on plans for second trial (see above)

Randomisation diagnostics

If our charity partner gives us pre-determined (pre-treatment) variables, we will perform standard ‘randomisation check’ tests that these variables are balanced across treatment and control. If these are not balanced to a significant extent (an extent not likely to be due to chance; our diagnostic tests here should be powerful given our large sample size), we will inquire after the precise reason for the imbalance. We will only use an adjustment for this imbalance if we are given a meaningful explanation for it; see the discussion and references in the Green Lab SOP.
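A balance check of this kind might look like the following sketch, run here on simulated data. The covariates `prev_gift` and `donor_type` are hypothetical stand-ins for whatever pre-treatment variables the charity provides, and the choice of tests (per-covariate tests plus a joint likelihood-ratio test) is an assumption rather than the registered procedure.

```r
# Illustrative sketch on simulated data with hypothetical pre-treatment variables
set.seed(2)
n <- 20000
d <- data.frame(
  treat      = rbinom(n, 1, 0.5),       # random assignment
  prev_gift  = rexp(n, rate = 1 / 20),  # hypothetical previous gift amount
  donor_type = sample(c("non-donor", "small", "large"), n,
                      replace = TRUE, prob = c(0.70, 0.25, 0.05))
)

# Covariate-by-covariate checks
p_gift <- t.test(prev_gift ~ treat, data = d)$p.value
p_type <- chisq.test(table(d$donor_type, d$treat))$p.value

# Joint check: do the covariates together predict assignment?
null_fit <- glm(treat ~ 1, family = binomial, data = d)
full_fit <- glm(treat ~ prev_gift + donor_type, family = binomial, data = d)
joint    <- anova(null_fit, full_fit, test = "Chisq")
p_joint  <- joint[2, "Pr(>Chi)"]

print(c(prev_gift = p_gift, donor_type = p_type, joint = p_joint))
```

The joint test avoids the multiple-comparison problem of reading several per-covariate p-values separately.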

Note on 27 Nov 2018: while it seems that the charity has sent out these emails, they are still waiting for the final outcome data and are putting the data together. They have not shared any outcome data with Donor’s Voice, nor with us as researchers, at the time that we are preregistering this. (They have indicated that they will share the data with Donor’s Voice in the first week of December 2018.)