- We review an article published in this month’s issue of Pediatrics, a leading scientific journal, that reported findings from a randomized controlled trial of an early childhood home visiting program called Minding the Baby.
- The key reported finding is that the program significantly lowered the rate of obesity among the young children in participating families.
- Our first concern: Obesity was not one of the key outcomes that the study planned to examine, according to the study protocol; the obesity finding is instead the result of a post-hoc analysis. Under accepted scientific standards, the finding should therefore be considered preliminary and unreliable since it could be a “false-positive,” produced by chance as a result of the study’s measurement of numerous outcomes.
- Our second concern: There was a flaw in study implementation in at least one of the two study sites—namely, mothers were asked to join the study after learning which group they had been randomly assigned to, and a much higher percentage of the treatment group than of the control group agreed to join. This process could easily undermine the equivalence of the two groups, and it creates an “unacceptable threat of bias [i.e., inaccurate results]” under standards of the Institute of Education Sciences.
- The Pediatrics paper does not discuss either problem, and a reader who has not seen the earlier study reports and protocol would have no way to detect these key weaknesses.
- The study team’s response to our concerns, and our rejoinder, follow the main report.
Pediatrics—a leading scientific journal—recently published findings from a randomized controlled trial (RCT) of Minding the Baby, an early childhood home visiting program for low-income families (linked here, February 2018 issue). The study claims that the program produced a significant reduction in child obesity that has important policy implications. The paper’s abstract summarizes the study hypotheses, methods, and findings as follows (we display the key text in bold):
Background: Young children living in historically marginalized families are at risk for becoming adolescents with obesity and subsequently adults with increased obesity-related morbidities. These risks are particularly acute for Hispanic children. We hypothesized that the prevention-focused, socioecological approach of the “Minding the Baby” (MTB) home visiting program might decrease the rate of childhood overweight and obesity early in life.
Methods: This study is a prospective longitudinal cohort study in which we include data collected during 2 phases of the MTB randomized controlled trial. First-time, young mothers who lived in medically underserved communities were invited to participate in the MTB program. Data were collected on demographics, maternal mental health, and anthropometrics of 158 children from birth to 2 years.
Results: More children in the intervention group had a healthy BMI at 2 years. The rate of obesity was significantly higher (P < .01) in the control group (19.7%) compared with the intervention group (3.3%) at this age. Among Hispanic families, children in the MTB intervention were less likely to have overweight or obesity (odds ratio = 0.32; 95% confidence interval: 0.13–0.78).
Conclusions: Using the MTB program, we significantly lowered the rate of obesity among 2-year-old children living in low-socioeconomic-status communities. In addition, children of Hispanic mothers were less likely to have overweight or obesity at 2 years. Given the high and disproportionate national prevalence of Hispanic young children with overweight and obesity and the increased costs of obesity-related morbidities, these findings have important clinical, research, and policy implications.
There are two main reasons we believe these findings are unreliable.
First, the idea that the program would affect child obesity, particularly for the subgroup of Hispanic children, was not an original hypothesis of the study—despite the suggestion in the abstract that it was. The study’s original, pre-specified hypotheses, posted here on clinicaltrials.gov, were that the program would increase mothers’ reflective capacity (e.g., a mother’s ability to envision her baby’s emotions, thoughts, and intentions), increase infant attachment, reduce rapid repeat childbearing, and reduce child abuse or neglect. Nothing is hypothesized about children’s weight.
In other words, the study’s analyses of obesity, overweight, and body mass index (BMI) outcomes, both for the full sample and for the subgroup of Hispanic children, are post-hoc analyses that were not set out in the original study protocol. This is problematic because, for each outcome that a study examines, there is roughly a one in 20 chance that the test for statistical significance will produce a false-positive result when the program’s true effect is zero. So if a study examines numerous outcomes (and so far this study has examined at least 30, including subgroup effects, based on the current paper and prior reports), it becomes a near certainty that the study will produce some false-positive findings. To minimize the chances of this, respected scientific authorities such as the Food and Drug Administration (FDA) and Institute of Education Sciences (IES) recommend that studies pre-specify a relatively small number of targeted outcomes against which the program’s effectiveness will be judged. If the study then goes on to measure additional outcomes in post-hoc analyses, as this study did, the resulting findings are considered only exploratory because of the high chance that they are erroneous. As the IES advises:
“Results from post-hoc analyses are not automatically invalid, but, irrespective of plausibility or statistical significance, they should be regarded as preliminary and unreliable unless they can be rigorously tested and replicated in future studies.” (Link, page 4).
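The false-positive arithmetic above is easy to make concrete. The following is a minimal sketch, under the simplifying assumption that the tests are independent and each is run at the conventional 0.05 significance level (in reality the study’s 30-plus outcomes are surely correlated, but the qualitative point stands):

```python
# Probability of at least one false positive across m independent
# significance tests, each run at level alpha, when the program's
# true effect on every outcome is zero.
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** m

for m in (1, 5, 30):
    print(f"{m:2d} tests -> P(at least one false positive) = "
          f"{familywise_error_rate(m):.1%}")
```

With 30 tests, the chance of at least one spurious “significant” finding is roughly 79 percent under these assumptions, which is why pre-specification of a small set of targeted outcomes matters.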
Similarly, the latest FDA draft guidance points out that someone reading a study that reports findings from post-hoc analyses has no way to know how many different analyses were performed but not reported. This is a key reason the guidance recommends that:
“Presenting p-values [i.e., tests of statistical significance] … from analyses that were not pre-specified … is inappropriate because doing so would imply a statistically rigorous conclusion and convey a level of certainty about the effects that is not supported by that trial.” (Link, pages 8-9).
The second reason we believe the Minding the Baby RCT findings reported in Pediatrics are unreliable is that there was a flaw in the study’s implementation that partly undermined its randomized design: The researchers allowed women to self-select into the study sample after random assignment. As described in prior reports on this study (linked here and here), in at least one of the two study sites, eligible women were randomly assigned to either a treatment group that was offered participation in the program or a control group that was not, and then asked for their consent to participate in the study. Knowing their randomly assigned condition, 88 percent of women in the treatment group at this site consented to participate versus just 69 percent of women in the control group. In other words, women self-selected into the sample based in part on knowledge of whether they were in the treatment versus control group, and they did so at very different rates between the two groups and so presumably for different reasons.
Such self-selection could easily undermine the equivalence of the two groups in measured characteristics (e.g., demographics, where there were in fact sizable group differences[i]) and unmeasured characteristics (e.g., motivation), so that the treatment versus control comparison is no longer “apples-to-apples” as would be hoped for in an RCT. As a result, the study cannot rule out the possibility that differences between the two groups in key characteristics—rather than the program—caused their different outcomes. (For this reason, it is generally preferable for studies to obtain participants’ consent prior to random assignment whenever possible.)
According to IES’s What Works Clearinghouse, which—based on simulation studies in education RCTs—has published standards for acceptable sample loss, the above rates of loss due to non-consent from the treatment versus control group place this study clearly in the zone of “unacceptable threat of bias [i.e., inaccurate results] under both optimistic and cautious assumptions.” (Link, see page 11)
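To illustrate how these consent rates map onto the What Works Clearinghouse standards, the sketch below computes the two quantities the standards are framed in terms of, overall and differential attrition, assuming equal-sized arms (the actual acceptable/unacceptable boundaries come from the WWC figure cited above):

```python
# Consent rates at the affected site, from the study's prior reports.
consent_treatment = 0.88
consent_control = 0.69

# Attrition (sample loss) in each arm is one minus the consent rate.
attrition_t = 1 - consent_treatment   # 12%
attrition_c = 1 - consent_control     # 31%

# The What Works Clearinghouse judges bias risk using two quantities:
# overall attrition in the pooled sample and the treatment-control
# differential. With equal-sized arms (an assumption here), overall
# attrition is the simple average of the two arms' rates.
overall = (attrition_t + attrition_c) / 2
differential = abs(attrition_t - attrition_c)

print(f"Overall attrition:      {overall:.1%}")       # 21.5%
print(f"Differential attrition: {differential:.1%}")  # 19.0%
```

A differential of 19 percentage points at this overall level falls far outside the acceptable region of the WWC attrition figure, under both its optimistic and cautious assumptions.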
Unfortunately, the Pediatrics paper—in contrast to earlier reports on this study—does not discuss the consenting process or mention that treatment and control group mothers consented at different rates.[ii] Thus, someone reading the Pediatrics paper who has not also read the prior study reports would not be able to detect this flaw in the study.
In conclusion, we believe this study’s main contribution is to identify a program—Minding the Baby—that has preliminary evidence that merits further investigation in more rigorous studies. But we do not believe the findings are yet sufficiently reliable to justify the strong evidence claims in the Pediatrics article nor to provide confidence that the program would produce positive effects on child obesity or other outcomes if implemented elsewhere.[iii]
Response provided by Monica Ordway, lead author of the Pediatrics paper on the Minding the Baby RCT
We appreciate the opportunity to further discuss our findings and address the review by “Straight Talk on Evidence” that questions the reliability of our findings due to Type I error and our randomization strategy. Regarding Type I error (false positives), we direct interested readers to Aim 1b in our summary statement and the “condition and disease” section on ClinicalTrials.gov, where infant outcome variables include early attachment, infant health, and developmental outcomes, which supports our hypothesis and justifies that the analysis was not post-hoc. The ClinicalTrials.gov system limits the number of primary and secondary measures that can be listed, and infant health was not included there. We anticipated that the infants would be generally healthy over the course of the 2-year program and therefore our infant health data collection was limited to growth and development, rates of hospitalizations, ER visits, and surgeries, as well as child maltreatment and compliance with well-child visits and the immunization schedule. Our findings support the growing literature reporting positive effects of home visiting on obesity,[iv],[v],[vi] and had a large effect size.
Self-selection resulting from cluster randomization was the second concern. We used a community-engaged approach in our research design and methods. This approach required buy-in from our community partners and one of the conditions for recruiting pregnant women at the 2 local community health centers was substituting subject randomization with cluster randomization and allowing the women to know their group assignment at the time of consent. We acknowledge that cluster randomization has some limitations. However, the importance of engaging front line community clinicians and clinic leaders as partners was vital to being able to conduct this trial. The difference in rates of retention between treatment arms at the consent stage was not statistically significant (p<.12). Additionally, there were no differences in our analysis sample between the intervention group and control group other than ethnicity and birthweight (which were included in analyses as covariates). As shown in our supplemental table 4, there were no differences in the number of families in the intervention and control groups among those excluded or in any demographic variables between included and excluded families. We were aware of the need to consider clustering in our analysis and, as indicated in our paper, our calculated ICC was low and group sizes were small.
It is often difficult to summarize the details and complexities of study design, methods, results, and discussion in the limited word count allowed by many journals and we hope we addressed some of the reviewer’s concerns in this brief response. As noted in our discussion, our findings raise a critical question: “What elements of the MTB intervention might have contributed to the greater likelihood of normal weight in our intervention group and the diminished likelihood of obesity in our intervention group?”[vii] Like the exemplary model of the Nurse Family Partnership, we believe that more work is needed to replicate effects and to identify the mechanisms that may directly affect childhood obesity.[viii] Providing support and education to pregnant and new mothers through home visitation has been cited as a promising approach to potentially preventing young childhood obesity before it starts.[ix]
Rejoinder by the LJAF Evidence-Based Policy team
We agree with the lead author’s comment that the aims of the study, as described on clinicaltrials.gov, included determining the program’s effect on infant health and development. However, these broad aims could encompass literally dozens of potential outcomes, such as physical growth, malnutrition, overweight, obesity, vitamin deficiency, injuries, ingestions, illnesses, immunizations, IQ, receptive vocabulary, expressive vocabulary, anxiety, mood regulation, rule breaking, aggression, impulse control, sociability, attention, fine motor skills, gross motor skills, pre-literacy skills, and many more. Add to these the study’s aim to also measure maternal life course and there is potential for a sizable number of false-positive findings, as we describe in our report.
That is why accepted scientific practice, as reflected in the FDA and IES guidance cited in our report, is to pre-specify a limited number of targeted outcomes according to which the program’s effectiveness will be judged. The study team did so, specifying four primary and three secondary outcome measures in their study registration on clinicaltrials.gov. None of these included child obesity for either the full sample or the Hispanic subgroup. The obesity findings, in other words, are the result of post-hoc analyses. Per the IES guidance, such analyses “are not automatically invalid, but . . . should be regarded as preliminary and unreliable unless they can be rigorously tested and replicated in future studies.”
Regarding self-selection of women into the study sample based on their knowledge of whether they were in the treatment or control group—we appreciate the lead author’s acknowledgement that such self-selection occurred but believe that she understates the seriousness of the problem. According to the study team’s 2013 report on the consenting process at one of the two study sites, 88 percent of women in the treatment group consented to study participation versus 69 percent of women in the control group—a difference that is statistically significant at the 0.01 level[x] and therefore unlikely to be due to chance. Such differential self-selection could easily create differences in observable characteristics between the two groups, and may help explain why the treatment group was 18 percentage points more likely to be Hispanic and 19 percentage points less likely to be African American than the control group. More importantly, it could also create sizable differences in unobservable characteristics (e.g., motivation, resilience, family support) which, unlike the observable differences, cannot be controlled for in the study analysis and which therefore—per the IES standards cited in our report— place this study clearly in the zone of “unacceptable threat of bias.”
Finally, we restate our concern that the Pediatrics paper discussed neither the consenting/self-selection process nor the fact that obesity was not one of the study’s primary or secondary outcomes. Thus, a reader of the paper who has not seen the earlier study reports nor read the study registration would have no way to detect the key study weaknesses described above.
We thank the study author for her engagement in this colloquy and hope it is helpful to readers.
[i] There were sizable differences between the two groups in ethnicity (the treatment group was 18 percentage points more likely to be Hispanic and 19 percentage points less likely to be African American than the control group) and birthweight (treatment group infants weighed 7 percent less than control group infants, equating to an effect size 0.39 standard deviations).
[ii] The Pediatrics paper focuses on the consented sample of 237 children that has already been compromised by the differential consent rates, and goes on to analyze the 158 of these whose height and weight the researchers measured at the one- and two-year follow-ups.
[iii] The study has one other key limitation that we note here for completeness but do not describe in detail given space constraints. As discussed in this and earlier reports on the study, in at least one of the two study sites, the researchers did not randomly assign individual women to the treatment versus control conditions; instead they randomly assigned prenatal care groups, each containing 15 to 25 women, to the two conditions. This is a valid procedure (called “cluster” random assignment) but, if it is used, the study’s analysis must account for such clustering or it will likely overstate the statistical confidence of the findings. Unfortunately, the study’s analysis of obesity outcomes did not do this.
[iv] Heckman, J.J., Holland, M.L., Makino, K.K., Pinto, R., Rosales-Rueda, M. (July 2017) An Analysis of the Memphis Nurse-Family Partnership Program. National Bureau of Economics Research. NBER Working Paper No. 23610.
[v] Thorland, W., Currie, D & Colangelo, C. (2017). Status of high body weight among Nurse-Family Partnership children. Maternal Child Nursing, 42, 352-357.
[vi] Wen, L.M., Baur, L. A., Simpson, J. M, Rissel, C., Wardle, K & Flood, V. M. (2012). Effectiveness of home based early intervention on children’s BMI at age 2: Randomised controlled trial. BMJ, 344; e3732.
[vii] Ordway, M. R., Sadler, L. S., Holland, M. L., Slade, A., Close, N., & Mayes, L. C. (2018). A Home Visiting Parenting Program and Child Obesity: A Randomized Trial. Pediatrics. doi: 10.1542/peds.2017-1076.
[x] Specifically, 87.5 percent of the 72 women in the treatment group consented to participate in the study versus 68.7 percent of the 72 women in the control group. Plugging these numbers into an online calculator shows that this difference is highly statistically significant (p=0.007).
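The footnote’s calculation can be checked with a standard two-proportion z-test. This is a sketch using pooled variance; the online calculator the footnote mentions may implement a slightly different variant (e.g., with a continuity correction), so the exact p-value can differ at the third decimal, but it remains well below 0.01 either way:

```python
from math import sqrt
from statistics import NormalDist

# Consent rates and group sizes from the footnote (72 women per arm).
p_t, p_c = 0.875, 0.687
n_t = n_c = 72

# Two-proportion z-test with pooled variance.
p_pool = (p_t * n_t + p_c * n_c) / (n_t + n_c)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")  # p well below 0.01
```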