- We discuss the value and feasibility of “prespecification” – the publication of a study’s analysis plan prior to estimating program impacts – and provide an illustration using a recently published study.
- In part, prespecification is intended to protect against the problem of reporting false positive findings, but critics of prespecification argue that it is too complicated and limiting.
- However, we suggest that limiting full prespecification to a few key research questions, while not restricting reporting to only those questions, addresses both concerns and provides a level of transparency that cannot be achieved otherwise.
- This is illustrated with a recent publication examining the long-term effects of the Nurse Family Partnership on mortality.
- The study authors were able to transparently report non-significant findings for the prespecified analyses while also reporting a meaningful and statistically significant finding on an outcome defined posthoc, because the prespecified analysis plan provided the scaffolding on which the findings could be placed.
“Prespecification” or “preregistration” – the publication of a study’s analysis plan prior to estimating program impacts – is an important practice which has long been followed in medical research and is happily becoming more common in randomized controlled trials (RCTs) in social policy.
A critically important purpose of prespecification is to reduce the likelihood of “data mining,” whereby researchers test many research questions but report only the statistically significant results, without disclosing how many tests were actually conducted. Data mining is a concern because if, for example, a researcher conducts 20 statistical tests of a program that produces no true effects, more than half the time at least one false positive will be identified at a conventional level of statistical significance (p<0.05). If the researcher reports such a chance finding as a true effect, then in the absence of prespecification the reader has no way of knowing how many other research questions were examined but not reported and, thus, how credible the finding is. This lack of transparency undermines the credibility of published findings but can be avoided by publishing a prespecified analysis plan before estimating program impacts.
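The arithmetic behind the “20 tests” example is easy to verify. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false positive at level α is 1 − (1 − α)^k, which for 20 tests at α = 0.05 is about 64 percent (the function name below is illustrative, not from the study):

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across k independent
    tests at significance level alpha, when all nulls are true."""
    return 1 - (1 - alpha) ** k

# For 20 tests at the conventional p<0.05 threshold:
print(round(familywise_error_rate(20), 3))  # -> 0.642, i.e. "more than half the time"
```

This is why a single significant result pulled from many unreported tests carries so little evidential weight on its own.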
Prespecification, however, has its critics. In this post we describe the main criticisms, how they can be addressed, and provide an illustration of how a prespecified plan provides transparency for a reader to assess the strength of the evidence for a study’s findings and so enhances their credibility.
The main criticisms of prespecification are:
1) It limits reporting to what’s in the prespecified analysis plan and so restricts researchers’ ability to discover new, unexpected findings which weren’t anticipated by the plan; and
2) It’s too complicated to prespecify all analyses in advance.
These are legitimate concerns, but we believe they can be addressed, primarily by limiting full prespecification to only one or two key research questions, also known as the “confirmatory” hypotheses. By “full prespecification,” we mean that all analysis parameters are detailed sufficiently to leave no room for researcher post-hoc discretion. Typically, it’s relatively straightforward to use the program’s theory of change to select one or two main questions geared towards determining whether the program is achieving its primary objectives. In addition to these confirmatory hypotheses, researchers can conduct and report on additional analyses they believe will be informative (whether prespecified or not) and simply label them as “secondary” or “exploratory” while noting the results should be viewed as suggestive but not determinative of the program’s effectiveness. We believe this approach addresses the main criticisms of prespecification in that it isn’t overly complicated, and it doesn’t limit what can be learned from the study to the confirmatory hypotheses.
To illustrate that the above approach makes prespecification feasible and not overly limiting, we describe a recently published study, which we funded, that looked at the effects of the Nurse Family Partnership (NFP) program on mortality.
NFP is a home visiting program for pregnant women and new mothers that has previously shown positive effects on child and maternal outcomes in three RCTs (in Elmira, NY; Memphis, TN; and Denver, CO) and has been widely implemented in the U.S. and other countries. The researchers hypothesized that NFP may have positively affected maternal and child mortality over the long term (20+ years) and were awarded a grant by the Laura and John Arnold Foundation to address this question.
The analysis plan indicated that the study would estimate effects on all-cause mortality for mothers and preventable-cause mortality for children. The researchers hypothesized that, because of the more at-risk population in Memphis, who had experienced high death rates in an earlier follow-up, they would be more likely to find effects on mortality in Memphis than in the two other sites. This led them to prespecify two confirmatory analyses – one analysis with the Memphis sample and one analysis with the pooled Elmira and Denver samples.
The results of the study were recently published. With respect to mothers, the abstract describes the headline result as “…no significant nurse home visiting-control difference in maternal mortality in Memphis or Elmira and Denver,” emphasizing the null result for the two prespecified confirmatory analyses. Unexpectedly, the study found a significant impact on external-cause mortality for the pooled sample across all three sites. This finding is summarized as, “Posthoc analysis, combining all 3 trials, suggested a reduction in external-cause mortality…” That is, the description makes clear that the three-site pooled result, with a narrower mortality outcome (external-cause versus all-cause), was not prespecified. In addition, it does not claim an effect was found, only that one was “suggested.” This careful language carries through the full text as well, and the discussion section concludes primarily with suggestions for further research, specifically that future studies will need larger samples to investigate possible impacts of NFP on mortality.
When developing an analysis plan, the researchers considered two choices for confirmatory hypotheses: to analyze the Memphis site separately or to analyze all three sites together. They chose the former and their reporting reflects that choice. But they were not precluded from estimating effects for the sample pooled across all three sites and reporting those results. And they did so in a way that was appropriately qualified and transparent.
The authors of the NFP study are to be applauded for their honest and transparent reporting, as not all researchers report consistently with a prespecified plan, particularly when their primary analyses don’t produce statistically significant results. Notably, the authors were able to report transparently because the prespecified analysis plan provided the scaffolding upon which the findings could be placed. Without the plan, the researchers themselves wouldn’t have known which findings to emphasize, and readers would not have known whether the reported findings had been selected from among many others examined. The researchers weren’t precluded from reporting the posthoc analysis, and the value of the study is greater for their having done so. In the end, the researchers were free to explore important questions and analyses that were not prespecified while still transparently reporting the results of their prespecified confirmatory analyses. A win-win.
Response provided by the lead study author
We invited the lead study author, David Olds, to provide written comments on our report. He appreciated the opportunity to respond and did not have comments to add.
Notes

- This is the primary reason why the Arnold Ventures Evidence-Based Policy team requires prespecified analysis plans for all funded RCTs.
- For example, for an RCT of a tutoring program for struggling readers, it would make sense to focus on the program’s impact on an objective measure of reading comprehension; in the case of a workforce program, it would be logical to zero in on the program’s impact on annual earnings.
- Importantly, sometimes assumptions underlying an analysis plan turn out to be faulty and a change in approach may be required. In such a situation, researchers can address these problems with revisions or amendments to the existing pre-analysis plan. However, they should do so in a way that keeps them “blind” to the effects of any changes on the study’s ultimate findings.
- The Laura and John Arnold Foundation is now administered by Arnold Ventures, the staff of which produce Straight Talk on Evidence.
- For simplicity we don’t address the child mortality findings, but they were equally well reported.
- This effect was significant at the 0.054 level and was clearly reported as such.