- The U.S. Department of Health and Human Services (HHS) recently reported early findings from a large randomized controlled trial (RCT) of home visiting services for at-risk families with young children. The services included four program types, or “models,” each with distinct features and protocols.
- The study findings, pooled across the four models, show weak or no effects on key child and parent outcomes when the children were 15 months of age.
- These early findings are disappointing, but not a surprise. In prior rigorous studies, only one of the four models (Nurse-Family Partnership) was found to produce important positive effects; the other three were not. For that reason, we and others had predicted at the study’s inception that the pooled (i.e., average) effects across the four models would be small or none.
- The study, unfortunately, does not report outcomes separately for each of the four models. What little model-specific information is reported suggests that on certain key outcomes (e.g., child emergency department visits), some models may indeed produce sizable positive effects.
- We urge HHS to fully report model-specific findings in this and future study follow-ups and, consistent with the HHS program’s design, use the results to reallocate funds toward the more effective models.
- The study team’s comment on our report, and our rejoinder, follow the main text.
In 2010, Congress established the federal Maternal, Infant, and Early Childhood Home Visiting (MIECHV) program, which funded a major expansion of home visiting program services for at-risk families who are expecting or have recently given birth to a child. The U.S. Department of Health and Human Services (HHS) subsequently commissioned a large randomized controlled trial (RCT) to evaluate the effectiveness of MIECHV-funded home visiting services as delivered by the four most widely-implemented program approaches, or “models.” The four models are: (i) Early Head Start’s Home-Based Option, (ii) Healthy Families America, (iii) Nurse-Family Partnership, and (iv) Parents as Teachers. These models differ in some key features such as type of home visitor (e.g., nurse versus paraprofessional); timing, duration, and content of the home visits; and level of risk of the enrolled families.
Last month (January 2019), HHS released early findings from the evaluation—an RCT conducted by MDRC with a sample of 4,229 families across 12 states. For each of the four models, eligible families were randomly assigned to either a treatment group that was offered the model’s home visiting services, or a control group that was not (but could access usual community services). Based on our careful review, we believe this is a very well-conducted study.[i]
The study has now reported initial results, pooled across the four models, when the children reached 15 months of age. While the findings are still early and could very well change in future follow-ups (e.g., since many families were still enrolled in the program and receiving home visits at 15 months), the program’s initial effects look weak. The following table shows the effects on the study’s pre-specified primary outcomes at the 15-month follow-up:
| Primary Outcomes | Treatment Group | Control Group |
|---|---|---|
| New pregnancy after study entry (%) | 18.2% | 17.6% |
| Receiving education or training (%) | 23.3% | 22.9% |
| Quality of the home environment (scale of 0-45) | 38.8 | 38.5** |
| Parental supportiveness (scale of 1-7) | 4.0 | 3.9 |
| Frequency of minor physical assault during the past year (incidents/person) | 2.1 | 2.2 |
| Frequency of psychological aggression during the past year (incidents/person) | 3.1 | 3.3+ |
| Health insurance coverage for the child (%) | 94.8% | 95.3% |
| Number of Medicaid-paid well-child visits | 5.0 | 5.1 |
| Number of Medicaid-paid child emergency department visits | 2.1 | 2.2* |
| Any Medicaid-paid health care encounter for injuries or ingestion (%) | 25.7% | 26.8% |
| Behavioral problems (scale of 31-93) | 44.5 | 44.9+ |
| Receptive language skills (IQ-type scale with a mean of 100, standard deviation of 15) | 95.6 | 95.3 |
Difference between the treatment and control group is statistically significant at the 0.10 level (+), 0.05 level (*), 0.01 level (**).
As you can see, there is little difference in outcomes between the treatment and control groups.[ii] These findings are disappointing, but they should not be a surprise. As we have noted previously (e.g., 2011 and 2017), prior rigorous studies have found that three of the four models in the HHS RCT—Early Head Start’s Home-Based Option, Healthy Families America, and Parents as Teachers—produced weak or no positive effects on child and family outcomes. The only one of the four models with credible prior evidence of important effects is the Nurse-Family Partnership (NFP). Thus, an evaluation such as the HHS RCT that measures home visiting’s pooled (i.e., average) effect across the various models is likely to produce disappointing results, since the effects of the one potentially effective model (NFP) will be diluted by the others that previous RCTs have found ineffective.
Indeed, in our 2011 comments to HHS on the study’s evaluation plan, we wrote:[iii]
“This [evaluation] plan’s approach is not to evaluate the impact of specific models, but rather to estimate the impact of the HHS Home Visiting Program as a whole through a large randomized controlled trial. Like the 10 other randomized ‘whole-program’ evaluations that the federal government has funded since 1990 (Head Start, Upward Bound, Even Start, Job Corps, etc.), we believe this approach will (i) likely show that the Program’s overall impact on key outcomes is small or none; and (ii) miss the opportunity to identify the few models within the Program that produce sizable impacts ….”
A group of six major philanthropic foundations made a similar prediction in their comments to HHS in 2011 and, like us, urged a revision in the evaluation design to measure model-specific effects.
HHS heeded the input and revised its evaluation plan to measure both the pooled effects across the various models and the effects of each model (see final plan, April 2013). In essence, the overall RCT as ultimately fielded comprises four sub-RCTs, one for each model, with the sample size of each sub-RCT (571 to 1,454 families) large enough to assess the model-specific effects.
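The dilution concern raised earlier can be written as a simplified worked equation. This is only a sketch: it assumes the pooled estimate behaves roughly like a sample-size-weighted average of the four model-specific effects, which may not match the study's actual estimator. The symbols are introduced here for illustration and do not come from the HHS report.

```latex
% \hat{\Delta}_m : estimated treatment-control difference for model m (illustrative symbol)
% n_m            : number of families in the sub-RCT for model m
% N              : total sample across the four sub-RCTs
\hat{\Delta}_{\text{pooled}} \;\approx\; \sum_{m=1}^{4} \frac{n_m}{N}\,\hat{\Delta}_m ,
\qquad N = \sum_{m=1}^{4} n_m
```

Under this approximation, if one model had an effect of size \hat{\Delta} and the other three had none, the pooled effect would be about (n_m / N) times \hat{\Delta}. For a hypothetical model enrolling one quarter of the sample, a sizable model-specific effect would appear in the pooled results at about one quarter of its size.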
We were therefore dismayed that the recent HHS report shows the treatment and control group outcomes for the pooled sample (summarized in the above table), but fails to report these outcomes for each of the four models. The HHS report does include a table[iv] showing the treatment-control differences for each model (i.e., treatment group outcome minus control group outcome). However, it is not possible for the reader to gauge whether these differences are of practical importance or statistically reliable without seeing the actual outcomes for each model’s treatment and control group—similar to what is shown in the above table for the pooled sample.
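A minimal sketch of why the group outcomes matter: the same treatment-control difference can be trivial or substantial depending on the control group's level. The code converts differences into percentage changes relative to the control group, using three pooled rows from the table above. The function name `relative_change` and the data layout are illustrative choices, not anything taken from the HHS report.

```python
def relative_change(treatment: float, control: float) -> float:
    """Percent change in the treatment group relative to the control group."""
    if control == 0:
        raise ValueError("control group outcome must be nonzero")
    return 100.0 * (treatment - control) / control


# Pooled 15-month outcomes from the table above: (treatment, control).
pooled = {
    "Medicaid-paid child ED visits": (2.1, 2.2),
    "Injury/ingestion encounter (%)": (25.7, 26.8),
    "New pregnancy after study entry (%)": (18.2, 17.6),
}

for outcome, (t, c) in pooled.items():
    # A difference alone (t - c) says little; dividing by the control
    # outcome shows how large the change is in relative terms.
    print(f"{outcome}: difference {t - c:+.1f}, "
          f"relative change {relative_change(t, c):+.1f}%")
```

For the pooled sample, a difference of 0.1 ED visits on a control average of 2.2 is a reduction of roughly 4.5 percent. The same 0.1 difference against a hypothetical control average of 0.5 would be a 20 percent reduction. That is why the model-level differences in Table 5.1 are hard to interpret without the corresponding group outcomes.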
What little model-specific information is shown in the report suggests that, for two important effects found in prior evaluations of NFP—a reduction in child health/safety emergencies and mothers’ rapid repeat pregnancies—one may successfully replicate in the HHS study and the other may not. The potential positive news—i.e., silver lining in the HHS findings—is that, based on rough calculations, we believe NFP may have produced a sizable (20 to 25 percent) reduction in the number of Medicaid-paid child emergency department (ED) visits, and a modest (10 to 15 percent) reduction in likelihood of a Medicaid-paid health care encounter for injury or ingestion. We believe the effect on ED visits is likely to be highly statistically significant (p<0.01) and the effect on injury/ingestion is likely not statistically significant but still perhaps encouraging at this early follow-up, when the children have only recently entered the period of greatest risk of injury/ingestion. By contrast, NFP does not yet appear to be having an effect on the rate of repeat pregnancy.
But these are only guesstimates because the HHS report, frustratingly, does not provide the full model-specific information needed to know the true answers. HHS has commissioned an excellent study, but by not reporting the model-specific outcomes we believe it could damage the MIECHV program in two important ways. First, as the group of six philanthropic foundations noted in their comments to HHS in 2011, a “finding no or little impact when assessing the MIECHV [program] as a whole … will undermine the credibility of MIECHV and jeopardize its future sustainability, simply as a result of the limitations of the evaluation.” Second, it violates the “tiered evidence” design of MIECHV, articulated in 2009 by then-OMB director Peter Orszag, under which (i) funded models—including both established and new/emerging ones—are rigorously evaluated; and (ii) the evaluation findings are used to reallocate program funds from the less effective toward the more effective models, thereby evolving the overall MIECHV program toward greater effectiveness over time. How can this mechanism work when no one knows the model-specific outcomes?
We encourage HHS to take two fairly straightforward steps to address this issue. First, as part of the current 15-month follow-up, we urge HHS to publish a table of treatment and control group outcomes for each of the four models, analogous to that shown above for the pooled sample.[v] Second, in future study follow-ups, we urge HHS to measure and report (among other things) the model-specific outcomes for which prior evidence suggests important effects may be found. In the case of NFP, the prior evidence suggests effects are most likely to be found on measures of child health and safety (including abuse and neglect); rapid repeat births; and cognitive/educational outcomes for the subgroup of children born to mothers with low psychological resources.[vi] Although the prior evidence base for the other three models is weaker, similar model-specific hypotheses should be developed for them too, and corresponding findings included in future HHS study reports.
As the HHS study goes forward, reporting findings through kindergarten and potentially beyond, it may be that the hoped-for effects of NFP will not materialize. Or perhaps we will learn that one of the other models produces important positive effects. Whatever the findings, we need to know them in order to focus public funds on services that will truly improve the lives of vulnerable children and parents.
We end on a familiar theme. We know from the history of rigorous evaluations that surprisingly few social program models and strategies produce the hoped-for effects, and prior studies suggest that this same pattern applies in the field of home visiting. But this history also shows that exceptional models/strategies that produce important improvements in people’s lives do exist. MIECHV’s tiered-evidence structure enables it to both identify the exceptional models and focus program funds on them. We urge HHS to publish and act on the key evidence findings to make this process work.
Response provided by the MDRC study team in consultation with colleagues in the federal government:
We thank Straight Talk for highlighting recent findings from the Mother and Infant Home Visiting Evaluation (MIHOPE), and for the compliments on how the study was conducted and the information it provides to the field. However, the study team, in consultation with our colleagues in the federal government, would like to clarify and respond to a few points made in the commentary.
First, the Straight Talk commentary claims that the goal of MIHOPE was “to evaluate the effectiveness of MIECHV-funded home visiting services as delivered by the four most widely-implemented program approaches, or ‘models.’” This description mischaracterizes the goals and focus of the study. The legislation that authorized the Maternal, Infant, and Early Childhood Home Visiting (MIECHV) program called for an evaluation of “the effect of early childhood home visitation programs on child and parent outcomes” overall and for key subgroups of families.[vii] To address this legislative requirement, MIHOPE was designed to focus on the effects of home visiting for the full sample, with secondary explorations of how the impacts varied across subgroups of families, features of how local home visiting programs were implemented, and national models. An advisory committee to the Secretary of Health and Human Services (HHS) endorsed this approach as being of most use to the program and the home visiting field, urging the research team to avoid a horse race among models, and the evaluation design reflects this guidance.[viii] MIHOPE was not designed to be a study of these four evidence-based models. Rather, it included these four models because, at the time the study started, these were the models that states planned for wide use with MIECHV funding.
Second, the commentary argues that the report “violates the ‘tiered evidence’ design of MIECHV.” However, MIHOPE was designed to meet the legislative requirement for a national evaluation, which is distinct from the legislative requirements concerning how MIECHV funds could be used for evidence-based models and promising approach models (a.k.a., the tiered evidence design of MIECHV).[ix] While MIHOPE’s findings can feed into the ongoing HHS evidence review[x] that determines which models are considered evidence-based for purposes of MIECHV funding, that is not the primary purpose of this study.
Third, the commentary “guesstimates” at the effects of the Nurse-Family Partnership on several outcomes, arguing that the report “obscures a potentially important silver lining” by not presenting findings for each of the four individual models. However, this is unnecessary since Table 5.1 of the report shows impacts for the four national models and examines differences across the models to draw the same basic conclusions as the Straight Talk commentary. In particular, the estimated effects on emergency department visits for children differ significantly across the models, with the largest effect from the Nurse-Family Partnership, and the estimated effects on new pregnancies and health care for injuries and ingestions are small and do not differ significantly across the four models.
Finally, we appreciate Straight Talk’s expressed interest in the data collected by MIHOPE and recommendations about additional analyses and information that would be of interest to the field. As readers will likely appreciate, it is impossible for any study to answer all questions of interest initially. We are hearing from a range of stakeholders as we prioritize our ongoing work with these data. In addition, we are making the MIHOPE data available to researchers through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. We hope this will encourage other researchers to investigate research questions beyond the scope of the original study.
Rejoinder by Arnold Ventures’ Evidence-Based Policy team:
We thank the study team for their response and understand that, in their design and reporting of the study, they were following guidance from the HHS advisory committee. However, our key concern remains: The study’s focus on the pooled (i.e., average) effects across the four models is a recipe for disappointing findings, because prior RCTs have found no meaningful effects on child or family outcomes for three of the four models. For the same reason (i.e., most social program models do not produce the hoped-for effects while a few succeed), almost every large RCT in the history of social policy that has measured the average effect of heterogeneous models/strategies has produced discouraging findings, as we have discussed in previous Straight Talk reports.
As noted above, HHS partly heeded the input from us and others on the home visiting study’s design in 2011, revising its evaluation plan to be able to measure both the pooled effects across the various models and the effects of each model.[xi] To realize the full potential of this RCT to inform program and policy decisions, HHS should report the treatment and control group outcomes for each of the four models, as well as the statistical significance of each model’s effects, in this and future study follow-ups. This key information is not currently provided in Table 5.1 or elsewhere in the HHS report.[xii]
Finally, we do not agree with the HHS advisory committee’s desire “to avoid a horse race among [the] models.” A horse race is at the very heart of the evidence-based policy concept: Use rigorous evaluations to find out what works (and what does not work) to improve people’s lives, and then use this evidence to reallocate funds toward the effective approaches. That is the vision set out in then-OMB Director Orszag’s discussion of the HHS home visiting program’s tiered-evidence design; it is reflected in Congress’ requirement that the “majority of grant funds [be] used for evidence-based models”;[xiii] and it is manifest in HHS’s stated intent “to give significant weight to the strength of the available evidence of the model or models” in selecting home visiting grant awardees.
The HHS RCT is generating critical evidence about the model-specific effects—evidence that already hints at some potentially encouraging (albeit still early) results for some models. It would be a disservice to the vulnerable families that participate in the HHS program, and to the taxpayer, not to fully report these results and use them to focus funding on services that truly improve people’s lives.
[i] For example, the study had successful random assignment (as evidenced by highly similar treatment and control groups), low-to-moderate sample attrition rates, and valid analyses.
[ii] Although the differences in outcomes between the treatment and control groups are very small, a few are statistically significant, as shown in the table. This is because the study had a very large sample—over 4,000 families—and studies of this size can often detect effects that may be statistically significant but are so small as to be of little practical importance.
[iii] In 2011, we were a nonprofit organization—the Coalition for Evidence-Based Policy. We joined the Laura and John Arnold Foundation (now Arnold Ventures) as its Evidence-Based Policy team in 2015.
[iv] Table 5.1 on page 102.
[v] Ideally, HHS would also publish a table of baseline characteristics of treatment and control group members for each of the four models, to show whether random assignment was successful in creating two equivalent groups for each model, and to enable readers to see the demographic characteristics and risk levels of families served by each model.
[vi] Mothers with “low psychological resources” are those who scored low at program entry on measures of mental health, self-confidence, and intelligence.
[vii] SEC. 511 [42 U.S.C. 711] (g)(2)(B).
[viii] Charles Michalopoulos, Anne Duggan, Virginia Knox, Jill H. Filene, Helen Lee, Emily K. Snell, Sarah Crowne, Erika Lundquist, Phaedra S. Corso, Justin B. Ingels (2013). Revised Design for the Mother and Infant Home Visiting Program Evaluation. OPRE Report 2013-18. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.
[ix] SEC. 511 [42 U.S.C. 711] (g) describes the requirements for the evaluation. SEC. 511 [42 U.S.C. 711] (d)(3)(A) describes the requirements “for an early childhood home visitation program conducted with a grant made under this section.”
[x] See the Home Visiting Evidence of Effectiveness (HomVEE) review’s webpage for more information: homvee.acf.hhs.gov
[xi] Congress gave HHS great flexibility in how to design the evaluation of MIECHV, charging the agency to simply conduct “an assessment of the effect of early childhood home visitation programs on child and parent outcomes” that would examine (among other things) “the extent to which the ability of programs to improve participant outcomes varies across programs and populations.” The decision on whether and how to report pooled versus model-specific outcomes is clearly within HHS’s discretion.
[xii] As we noted in our main Straight Talk discussion, Table 5.1 of the HHS report shows the treatment-control differences for each model (i.e., treatment group outcome minus control group outcome). However, it is not possible for the reader to gauge whether these differences are of practical importance or statistically reliable without seeing the actual outcomes for each model’s treatment and control group, similar to what is shown in the table we provide above for the pooled sample.
[xiii] 42 U.S.C. 711(d)(3)(A)(ii)