May 1, 2024
Quarterly Opinion
Elizabeth A. Stuart
Joshua M. Sharfstein
There is growing recognition that the caustic scientific debates about the impact of masks, social distancing, and remote schooling during the COVID-19 pandemic are less about the specific findings of individual studies and more about which kinds of studies and arguments should take precedence over others. The fundamental question is causality, the determination that an intervention leads to changes in an outcome. There are many types of evidence, from logic and biological data to observational studies and experimental trials. A better understanding of the strengths and weaknesses of different forms of evidence can reframe contentious discussions and ultimately lead to better decisions.
Today, the public debate over evidence can seem formulaic. In one corner is the archetypal true believer in the randomized controlled trial, who trumpets the fact that randomization eliminates many forms of bias and yet who does not seem to appreciate that, for many questions of policy, randomization is not desirable, ethical, or possible. In the other corner is the archetypal data scavenger, who works to integrate logic, biology, and observational data into a more comprehensive picture, and who seemingly has never met an uninformative study.
Both the true believer and the data scavenger, however, often struggle to provide reliable answers to critical, urgent policy questions. Neither fully appreciates the importance of well-designed observational studies that can be rapidly deployed in the real world. Such studies are essential to developing meaningful answers to questions such as whether masks, restaurant closures, and remote schooling “worked.”
The widespread use of randomized controlled trials in medicine gives such studies a head start on primacy in policy debates. But the further a question moves from a clinical intervention, the less effective this design becomes. That’s because, unlike a medication, which is often expected to have the same biological effect across people and settings, a policy can be implemented and received differently in different environments. The same mask-wearing program can be received well in one society and lead to civil unrest in another. Similarly, a recommendation to stay at home is more feasible, and thus more easily implemented, for some populations than for others.
Outside the pandemic context, effective programs such as high-impact tutoring for students can have variable effects depending on implementation, tutoring quality, and fit with the rest of a student’s activities. And when the effects of an intervention do vary, even well-conducted randomized trials may not be very informative about the effects in individual locations.
Recognizing the limitations of such “gold standard” studies does not mean that anything goes. One weakness of the “all of the above” approach to evidence is that there are often conflicting signals in the broad pool of data, leaving plenty of room for picking and choosing. It is not uncommon for people to sample the wide world of evidence and reach very different conclusions, an outcome that becomes increasingly likely as science becomes more politicized. Constructing arguments with poor-quality evidence is one reason that hydroxychloroquine became popular in the early days of the pandemic, even leading the Food and Drug Administration to provide a (later revoked) emergency use authorization for the medication.
Another danger in data scavenging is that some widely used and published study designs are quite weak. Pre-post studies, in particular, may be the easiest to conduct, but they can also mislead, because many contemporaneous factors can influence the results. Similarly, simply comparing outcomes in states with and without a particular policy of interest (e.g., stay-at-home orders, vaccine mandates, or remote schooling) accounts neither for the many other ways in which those states likely differ nor for the different ways those policies may have been implemented.
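To make the concern concrete, here is a rough sketch, with notation that is ours to illustrate rather than drawn from any particular study: under simple additive assumptions, a single state’s pre-post change bundles the policy’s effect together with everything else that shifted over the same period.

```latex
% Illustrative decomposition (hypothetical notation): \bar{Y} is the average
% outcome in one state; \tau is the policy effect; \gamma collects all
% contemporaneous changes (new variants, seasonality, behavior shifts).
\[
\bar{Y}_{\text{post}} - \bar{Y}_{\text{pre}}
  \;=\; \underbrace{\tau}_{\text{policy effect}}
  \;+\; \underbrace{\gamma}_{\text{everything else that changed}}
\]
```

Without a comparison group, there is no way to separate the policy effect from the contemporaneous changes, and the analogous confounding afflicts naive comparisons across states.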
Effective causal inference requires more than what the archetypal true believer and the data scavenger have to offer. There are non-randomized, controlled study designs that should be considered of higher quality than many other observational studies. In fact, for pressing questions of policy, these designs may even be preferable to randomized controlled studies.
One valuable approach is to compare changes over time, before and after a policy went into effect, in places with and without the policy. These are known as “difference-in-differences” designs. Similarly, to understand whether interventions that seem promising in randomized controlled trials will remain effective in real-world use, it is helpful to take advantage of naturally occurring variation in implementation. Thoughtful and thorough comparison of test scores in places with different levels of in-person schooling during the pandemic, combined with large-scale data and advanced statistical methods to adjust for a large set of factors, helps to disentangle the relationship between in-person schooling and learning loss, and how it varies across places and groups.
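As a rough illustration, using hypothetical notation rather than anything from a specific study, the difference-in-differences logic can be written as a simple contrast of average outcomes, where T indexes places that adopted the policy and C indexes comparison places without it:

```latex
% Hedged sketch of the basic two-group, two-period estimator.
% \bar{Y} denotes an average outcome; "pre" and "post" index the periods
% before and after the policy took effect.
\[
\widehat{\mathrm{DiD}}
  \;=\; \bigl(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\bigr)
  \;-\; \bigl(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\bigr)
\]
```

The second parenthetical subtracts out the change the treated places would plausibly have experienced anyway, under the assumption that, absent the policy, both groups would have followed parallel trends.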
A key aspect of this type of causality assessment is in-depth knowledge of what is happening “on the ground.” During the pandemic, county or school district policies around in-person schooling did not always correspond to an individual family’s experiences. In state policy evaluations, simply using whether a policy is “on the books” does not account for whether it is being implemented. It is crucial that researchers not analyze data blindly, without knowledge of where the data come from, what they mean, and how accurate they are for the questions at hand.
Finding the sweet spot between the true believer and the data scavenger is crucial for evaluating causality in complex real-world situations. Unfortunately, doing so is especially difficult in the heat of the moment when agendas, arguments, and egos collide. Three steps can help to lower the temperature while also raising the level of understanding.
First, journalists, policymakers, and the general public can receive additional training on what evidence to trust. Schools of public health can create opportunities for free courses and lectures that explain the basics of causality assessment as well as common pitfalls.
Second, trustworthy data intermediaries can summarize studies and their contributions to causality assessment. One example is the Novel Coronavirus Research Compendium, which provided timely reviews of emerging evidence during the pandemic from a multidisciplinary set of experts, including physicians, epidemiologists, and statisticians.
Third, the fields of epidemiology and biostatistics can pay more attention to urgent questions in causality assessment. Evidence specialists can master how to design non-randomized studies that address possible biases—and thereby generate confidence in the results.
There will rarely be one study or study type that provides a simple answer regarding causality; instead, policymakers will need to make use of a causal crossword created by multiple sources of evidence. Put another way, it may not be possible to resolve all debates over evidence, but with greater understanding of the strengths and weaknesses of different types of studies, better debates can lead to better decisions.
Joshua M. Sharfstein is associate dean for public health practice and training at the Johns Hopkins Bloomberg School of Public Health. He served as secretary of the Maryland Department of Health and Mental Hygiene from 2011 to 2014, as principal deputy commissioner of the US Food and Drug Administration from 2009 to 2011, and as the commissioner of health in Baltimore, Maryland, from December 2005 to March 2009. From July 2001 to December 2005, Sharfstein served on the minority staff of the Committee on Government Reform of the US House of Representatives, working for Congressman Henry A. Waxman. He serves on the Board on Population Health and Public Health Practice of the Institute of Medicine and the editorial board of JAMA. He is a 1991 graduate of Harvard College, a 1996 graduate of Harvard Medical School, a 1999 graduate of the combined residency program in pediatrics at Boston Medical Center and Boston Children’s Hospital, and a 2001 graduate of the fellowship program in general pediatrics at the Boston University School of Medicine.