- We review a randomized controlled trial (RCT) of a program designed to prevent unintended pregnancies among young women.
- The study was published in a leading, peer-reviewed scientific journal—The Lancet—and generated significant attention, including national press coverage, based on its reported positive findings.
- Our concern: The RCT actually found that the program had no significant effect on pregnancy rates for the full study sample, yet did not report that key finding in the study abstract. Following an all-too-common pattern in the research literature, the study instead portrayed its pregnancy results as positive based on a subgroup finding that is not reliable under accepted scientific standards.
- The study team’s response to our concern, and our rejoinder, follow the main report.
The Lancet, a leading scientific journal, recently published a randomized controlled trial (RCT) evaluating a program that trained reproductive health clinics to counsel young women about long-acting reversible contraceptives (LARCs) and to provide access to LARC devices, namely intrauterine devices (IUDs) and progestin implants. The study claimed to have found positive effects in preventing unintended pregnancies, which generated significant attention in the policy community as well as national press coverage in media outlets such as CNN and Reuters. The study abstract reports the main findings as follows (we’ve highlighted the key text in bold):
Findings: Of 1500 women enrolled, more at intervention than control sites reported receiving counselling on IUDs or implants (565 [71%] of 797 vs. 271 [39%] of 693, odds ratio 3.8, 95% CI 2.8-5.2) and more selected LARCs during the clinic visit (224 [28%] vs. 117 [17%], 1.9, 1.3-2.8). The pregnancy rate was lower in intervention group than in control group after family planning visits (7.9 vs. 15.4 per 100 person-years), but not after abortion visits (26.5 vs. 22.3 per 100 person-years). We found a significant intervention effect on pregnancy rates in women attending family planning visits (hazard ratio 0.54, 95% CI 0.34-0.85).
Interpretation: The pregnancy rate can be reduced by provision of counselling on long-term reversible contraception and access to devices during family planning counselling visits.
Source: Cynthia C. Harper, Corinne H. Rocca, Kirsten M. Thompson, Johanna Morfesis, Suzan Goodman, Philip D. Darney, Carolyn L. Westhoff, and J. Joseph Speidel, “Reductions in Pregnancy Rates in the USA with Long-Acting Reversible Contraception: A Cluster Randomized Trial,” The Lancet, vol. 386, issue 9993, 2015, pp. 562-568 (linked here).
What’s missing here?
The abstract does not mention that the effect on pregnancy rates for the full sample of 1500 women was almost exactly zero and was not statistically significant.[i] Instead, the abstract highlights findings for specific subgroups: women attending family planning visits and women attending post-abortion visits. Given that these were just two of many subgroup effects that the researchers set out to examine in their pre-registered study plan (e.g., subgroups defined by age, race/ethnicity, education, pregnancy intentions, mental health, and domestic violence history),[ii] there is a high risk that the positive effect found for women attending family planning visits is the result of chance, rather than a true finding of effectiveness for that subgroup. The reason is straightforward: For each subgroup examined, the test for statistical significance (at the conventional 5 percent level) has roughly a one in 20 chance of producing a false-positive result when the program’s true effect is zero. Since the study examined numerous subgroups, the chance that at least one of them produces such a false positive is high.
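To make the arithmetic concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of subgroups tested. It assumes each test uses a 5 percent significance level and that the tests are independent of one another, a simplification that will not hold exactly for overlapping subgroups. The subgroup counts shown are purely illustrative and are not taken from the LARC study, and the function name is our own.

```python
def familywise_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes the true effect is zero in every subgroup and that each test
    has the same false-positive probability `alpha`.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # P(no false positives) = (1 - alpha) ** num_tests, so take the complement.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Illustrative subgroup counts only (not the number used in the study).
    for k in (1, 2, 5, 10, 20):
        rate = familywise_false_positive_rate(k)
        print(f"{k:>2} subgroup tests -> {rate:.0%} chance of at least one false positive")
```

Under these assumptions, a single test carries the familiar 5 percent risk, while ten tests carry a risk of roughly 40 percent.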
Several years ago, another Lancet paper provided an excellent illustration of how subgroup effects such as those in the LARC study can often appear by chance. The paper used as an example a large, well-conducted RCT of aspirin for the emergency treatment of heart attacks. The study found that aspirin was highly effective, resulting in a statistically significant 23 percent reduction in vascular deaths over a one-month period. To illustrate the unreliability of subgroup analyses, researchers subdivided the full sample of 17,000 patients into 12 subgroups according to the patients’ astrological birth sign. For two of the subgroups, Libra and Gemini, aspirin appeared to have no effect in reducing mortality: For these subgroups, the death rate in the aspirin group was slightly higher than for the control group. This was just a spurious result, of course, showing that if you look at enough subgroups, you are bound to find some illusory subgroup effects.[iii]
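The same phenomenon can be reproduced with a small simulation. The sketch below is not a reanalysis of the aspirin trial. It generates hypothetical patients, assigns each at random to one of 12 arbitrary subgroups and to treatment or control, and applies the same 23 percent relative reduction in deaths to every treated patient. The 12 percent control death rate, the function name, and the random seed are assumptions chosen for illustration.

```python
import random


def simulate_subgroup_risk_ratios(n_patients=17_000, n_subgroups=12,
                                  control_death_rate=0.12,
                                  relative_reduction=0.23, seed=0):
    """Return one treatment/control risk ratio per arbitrary subgroup.

    The true effect is identical in every subgroup, so any variation in the
    returned ratios is due to chance alone.
    """
    rng = random.Random(seed)
    treated_death_rate = control_death_rate * (1.0 - relative_reduction)
    # counts[subgroup][arm] = [deaths, patients]; arm 0 = control, 1 = treated
    counts = [[[0, 0], [0, 0]] for _ in range(n_subgroups)]
    for _ in range(n_patients):
        subgroup = rng.randrange(n_subgroups)  # e.g. an arbitrary "birth sign"
        arm = rng.randrange(2)                 # simple randomization
        rate = treated_death_rate if arm == 1 else control_death_rate
        died = rng.random() < rate
        counts[subgroup][arm][0] += int(died)
        counts[subgroup][arm][1] += 1
    ratios = []
    for (c_deaths, c_n), (t_deaths, t_n) in counts:
        ratios.append((t_deaths / t_n) / (c_deaths / c_n))
    return ratios


if __name__ == "__main__":
    for i, ratio in enumerate(simulate_subgroup_risk_ratios(), start=1):
        flag = "  <- no apparent benefit" if ratio >= 1.0 else ""
        print(f"subgroup {i:>2}: risk ratio {ratio:.2f}{flag}")
```

Rerunning with different seeds shows the subgroup ratios scattering around the true value of 0.77. In some runs a subgroup lands at or above 1.0 even though the simulated treatment works equally well for everyone.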
For this reason, the Food and Drug Administration (as one example) generally does not accept subgroup findings as valid evidence of effectiveness in deciding whether to approve a new pharmaceutical drug or medical device for licensing.[iv]
Post-hoc subgroup effects such as those found in the LARC study can often be valuable as a source of hypotheses to test in future research, and it is therefore frequently useful for studies to conduct and report them with appropriate caveats as to their suggestive nature. However, a study abstract, which is intended to be a balanced summary of the study’s methods and key findings, should not focus on subgroup findings that are only suggestive in nature while omitting any mention of the most credible result—in this case, no significant impact on pregnancy for the full sample.[v]
Response provided by the study team at University of California, San Francisco (UCSF)
Experts on evidence agree: Our study meets science’s highest standards
The critique of our cluster randomized trial of a LARC intervention in this “Straight Talk” column is detailed and eye-catching, but misplaced. The leading scientific group that evaluates evidence in health care, the Cochrane Database of Systematic Reviews, selected our study for a comprehensive and thorough expert assessment. Their meticulous review of the quality and strength of the evidence on a range of important criteria gave our study the highest marks. The standards they use are exacting and difficult to meet in all categories. In fact, our trial was the only one to receive an overall assessment of high quality.
“Straight Talk” asserts we conducted a post-hoc subgroup analysis, and criticizes it as such, but this is not accurate. Our analysis was specified a priori and was conducted with a test of interaction for facility type, which is a far more stringent approach than stratified subgroup analysis to assess potential heterogeneity (as explained in a more recent Lancet article than the one they cite). This analysis was required to determine how the effect of LARC training differs in family planning clinics, where contraception is often covered by funding programs, such as Medicaid, compared to post-abortion contraceptive care, where there are prohibitive cost and practical barriers to adopting a contraceptive method. Evidence from our research demonstrates the effect of our training intervention, and also points to the importance of the removal of cost barriers for the adoption of highly effective contraceptive methods.
“Straight Talk” provocatively asks, “What’s missing here?” What is missing is a careful review of our article to understand the strengths and limitations. Our study was the first to show that a training for providers can increase women’s access to highly effective contraceptives and support women’s autonomy in decision-making. Our training reduced unintended pregnancy almost in half among women in family planning clinics, a finding upheld by several rigorous scientific reviews of international experts, including at the Lancet. Provider training is a scalable intervention to reach millions of women in need of contraception. In the current U.S. policy context of threatened funding cuts to contraception and family planning clinics in particular, high-quality evidence to improve women’s health outcomes should not be disregarded.
Rejoinder by LJAF Evidence-Based Policy team
We appreciate the UCSF team’s reply and agree with them—and with the Cochrane review they cite—that this was a high-quality RCT. Our concern is not about the quality of the study but about the overstatement of its findings. First, the study abstract fails to mention the central study finding that the Cochrane review makes clear in its summary of the study’s results—namely “the study groups [treatment and control] did not differ for pregnancy at one year based on unadjusted analysis as well as covariate adjusted analysis” (Cochrane review, page 15).[vi] In other words: There was no significant impact on pregnancy for the full sample.
Second, the study abstract instead highlights a specific subgroup finding that is not reliable under accepted standards because it was one of many subgroup effects that the study examined, and therefore could well have appeared by chance. The UCSF team correctly notes in their response that they specified their analysis in advance of the study and that the subgroup effect was statistically significant in a test of “interaction.” However, the 2005 Lancet article they cite in their response makes clear that for a subgroup analysis to be considered reliable, the analysis should not only be specified in advance, but it should also pre-specify (i) “a small number of potentially-important subgroups” and (ii) “the direction and magnitude of anticipated subgroup effects.”[vii] The goal is to prevent researchers from later sifting among many subgroup findings and consciously or unconsciously highlighting those that support their preferred position as true effects when they could easily be spurious.
The UCSF study meets neither of these two conditions. First, its pre-specified study plan (shown here) listed a whole host of subgroups to be examined—namely subgroups defined by “clinic visit type (post-abortion or family planning), pregnancy intentions, mental health and domestic violence, provider-patient interaction, male partner, sociodemographic (age, race/ethnicity, education) and policy variables (Medicaid expansion waiver states, mandates for contraceptive coverage for private insurance).” Second, the study plan did not state the anticipated direction or magnitude of the various subgroup effects. The 2005 Lancet article makes clear that a study failing to meet these conditions is at high risk of producing false subgroup findings, even if it uses interaction tests.
Our bottom line: We commend the study team for conducting a high-quality RCT addressing an important policy question. The RCT found no significant effect on pregnancy rates for the full sample, yet—following an all-too-common pattern in the research literature—portrays its pregnancy results as positive based on a subgroup finding that is not reliable under accepted scientific standards.
[i] The hazard ratio for the covariate-adjusted effect on pregnancy outcomes for the full sample was 0.99, which is extremely close to 1.00 (i.e., zero effect) – see table 4, row 1, column 2.
[ii] The study’s pre-registered analysis plan is posted here on ClinicalTrials.gov. Notably, the study did not sort clinics by type (i.e., family planning versus post-abortion) and then conduct random assignment within each clinic type. Such “stratification” of clinics might have signaled the researchers’ prior intent to examine effects for the two types of clinics separately as part of their main analysis.
[iii] Rory Collins and Stephen MacMahon, “Reliable Assessment of the Effects of Treatment on Mortality and Major Morbidity, I: Clinical Trials,” The Lancet, vol. 357, February 3, 2001, p. 375, linked here.
[iv] U.S. Department of Health and Human Services, Food and Drug Administration, Guidance for Industry: E9 Statistical Principles for Clinical Trials, September 1998, pp. 33-34, linked here.
[v] We recognize that, under certain specific conditions, subgroup analyses may have high credibility – namely, if the study clearly pre-specified a small number of subgroup effects as primary hypotheses of the study, and if the difference between the program’s effect on the subgroup and the rest of the sample is statistically significant (see Howard S. Bloom and Charles Michalopoulos, “When Is the Story in the Subgroups? Strategies for Interpreting and Reporting Intervention Effects on Subgroups,” MDRC Working Papers on Research Methodology, November 2010, linked here). Such conditions do not apply in the LARC RCT and, in any case, a balanced study abstract would report such subgroup findings along with the findings for the full sample (rather than omitting the latter).
[vi] LM Lopez, TW Grey, EE Tolley, and M Chen, “Brief educational strategies for improving contraception use in young people,” Cochrane Database of Systematic Reviews, issue 3, art. no.: CD012025. DOI: 10.1002/14651858.CD012025.pub2, 2016, linked here.
[vii] Peter M. Rothwell, “Subgroup analysis in randomised controlled trials: importance, indications, and interpretation,” The Lancet, vol. 365, 2005, pp. 176-86, linked here.