The Case for Optimism

Marc A. Levin

March 27, 2024

Randomized controlled trials have made a big difference.

We abandon them at our peril.

In a provocative recent article, Professor Megan Stevenson argues that causal academic studies, and especially randomized controlled trials (RCTs), have largely proved fruitless in identifying interventions that can be widely replicated to reduce crime. Is Stevenson a canary in the coal mine, warning us of a real and present danger? Or has she missed the forest for the trees? The answer is a bit of both, though more of the latter.

The criminal justice policy field would be shortchanged if researchers, advocates and funders abandoned causal research. Nonetheless, the reaction to her piece could helpfully boost the use of implementation science in attempts to replicate statistically significant initial findings, and could fuel greater collaboration between researchers and practitioners in developing the most promising causal research designs and settings. Such collaboration can “sharpen the questions we ask and the hypotheses we test and better [enable] us to interpret the results.”

Causal research on “what works” has contributed to the dramatic though uneven arc of progress in criminal justice policy and practice over the last half century. Such research has also helped us identify interventions that don’t work or, worse still, are counterproductive.

As Stevenson notes, the modern growth of causal research in criminal justice has in some ways been a response to a 1974 article by the sociologist Robert Martinson. That piece represented a course correction to the permissiveness that dominated the 1960s, replacing it with skepticism that any correctional intervention, other than prison, could be effective.

Over the next half century, however, this “nothing works” lamentation was nudged aside by dramatic advances in research and practice. Self-esteem programs that told offenders they should feel good about themselves regardless of their actions were replaced with approaches backed by causal research. These approaches, such as cognitive behavioral therapy and motivational interviewing, provide tools that allow people to hold themselves accountable by reckoning with their past errors and meeting benchmarks on the path to self-improvement.

Likewise, community supervision gradually evolved from what was largely a cookie-cutter enterprise to one that uses graduated sanctions and incentives and is tailored to the risks and needs of the individual. Today, many probation and parole agencies embrace the concept that supervision officers should be coaches helping those on their caseload succeed, rather than umpires waiting for them to fail.

To be sure, the road connecting causal research and policy change is long and uneven. But it does exist. Indeed, a 2020 paper notes that plans from the roughly 36 states that have enacted justice reinvestment have drawn upon the body of empirical research on “what works.”

Stevenson’s essay relies almost entirely on a 2006 literature review of 122 studies to make the case that precious few RCTs result in statistically significant positive findings. But the Coalition for Evidence-Based Policy (CEBP) has in its newsletters from September 2022 to December 2023 cataloged dozens of much more recent RCTs showing positive results.

These include many interventions that were developed or first studied after 2006, thereby excluding them from Stevenson’s consideration. Also, unlike the 2006 survey, the more recent studies inventoried by the CEBP cover a range of life outcomes, with some, like evaluations of the Denver Supportive Housing Program, showing a statistically significant decline in arrests among participants.

Stevenson argues that the complexity of human nature means we should have modest expectations for programs that require major life changes by participants. While this wisdom makes sense in lots of situations, it does not apply to more straightforward interventions and statutory changes, such as those affecting when someone in prison is released.

For instance, during my tour earlier this year of a women’s prison in Brazil as part of an ARrow Center for Justice delegation, we met new mothers who are allowed to keep their babies with them in prison for the first six months after birth. Research suggests this bonding yields long-term developmental benefits for babies, but a study might also examine whether such new mothers, who likely feel a sense of mission and perhaps receive newfound respect, are less likely to recidivate than otherwise similar women. This could reassure policymakers that prioritizing new mothers for early release would not imperil public safety.

Not only does Stevenson ignore the role that causal research could play in persuading policymakers to modify sentencing and release policies in contexts like this one, but she also overlooks the value of such research in validating methodologies for more effectively carrying out the longstanding functions of the criminal justice system, from probation to policing.

For example, research has found that evidence-based probation practices, which tailor the level of supervision to a person’s risk, needs and responsivity and employ graduated responses, result in lower rates of revocation to incarceration and higher rates of completion. Similarly, in policing, considerable evidence indicates that de-escalation training reduces excessive-force incidents. Though Stevenson claims simple interventions that produce outsized benefits are exceedingly rare, text reminders about court dates, especially those that contain practical information on getting to the courthouse, are a recent example of just that.

While Stevenson acknowledges that replication failures are often due to a lack of fidelity to the original model, she does not call for paying more attention to implementation science to improve how programs are executed. In recent years, researchers have made progress in helping us understand how and why certain programs succeed. For example, Stevenson points to an RCT that led to the creation of former President Obama’s “My Brother’s Keeper” initiative. This 2017 study found that a counseling program that guided youths to “think slow” rather than act impulsively led to lower rates of delinquency. A follow-up study found the effects were less pronounced among subsequent cohorts, which the authors attributed to diminished counselor quality as the program scaled up.

Consider that research has established teacher quality as the most important in-school factor in student outcomes, yet a Google search of “probation officer quality” produces virtually no mentions of the phrase. This suggests the problem is not that RCTs cannot identify interventions that work and could be successfully replicated, but rather that many systems lack the staff and conditions needed to implement model programs successfully. The shortfall often takes the form of insufficient staff training, excessive caseloads and high turnover rates that disrupt implementation continuity.

Indeed, some interventions may be replicable only if the underlying setting meets a certain baseline. Thus, an in-prison cognitive behavioral therapy program shown to reduce recidivism in one facility may not yield the same result in a facility where physical conditions are much worse, where lockdowns interfere with programming and where staff have poor morale and poor relationships with those who are incarcerated.

As such, some interventions, particularly in prisons but perhaps also in community supervision, may produce statistically significant results only when the rest of the participant’s experience meets that baseline. This suggests pursuing causal research alongside efforts to make corrections environments more conducive to program delivery.

Additionally, there could be considerable value in more RCTs that compare more than two conditions. For example, one group of incarcerated people might receive both a cognitive behavioral therapy program and a physical wellness program, while other groups receive one program or neither.

Likewise, a place-based policing program that includes both stepped-up police patrols and environmental interventions like street lighting could be compared with three similar micro-neighborhoods that receive one, the other or neither. Stevenson, by contrast, focuses only on causal research that tests a single intervention against no intervention. This overlooks the possibility that individuals at a certain baseline of mental and physical health may be better prepared to change their thinking patterns, and that multiple factors can combine to push a dangerous neighborhood past a “tipping point” into becoming a much safer one.
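
To make the multi-arm idea concrete, here is a minimal sketch in Python of the 2x2 factorial design described above. The sample size, assignment scheme and effect sizes are all hypothetical, invented purely to illustrate how such a design can estimate each program’s effect and whether the combination outperforms either program alone.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000  # hypothetical number of participants

# Randomly assign each participant to one arm of a 2x2 factorial design:
# a cognitive behavioral therapy (CBT) program and a physical wellness program.
cbt = rng.integers(0, 2, n)        # 1 = receives CBT, 0 = does not
wellness = rng.integers(0, 2, n)   # 1 = receives wellness program, 0 = does not

# Simulate a binary recidivism outcome. These effect sizes are invented
# solely to illustrate a synergy between the two programs.
p = 0.40 - 0.05 * cbt - 0.02 * wellness - 0.04 * (cbt * wellness)
recidivated = rng.random(n) < p

# Compare all four arms, not just "treatment vs. nothing."
for c in (0, 1):
    for w in (0, 1):
        arm = (cbt == c) & (wellness == w)
        print(f"CBT={c}, wellness={w}: recidivism = {recidivated[arm].mean():.3f} (n={arm.sum()})")

# The interaction contrast: does CBT help more when paired with the
# wellness program than it does on its own?
m = {(c, w): recidivated[(cbt == c) & (wellness == w)].mean()
     for c in (0, 1) for w in (0, 1)}
interaction = (m[1, 1] - m[0, 1]) - (m[1, 0] - m[0, 0])
print(f"Interaction (extra CBT effect alongside wellness): {interaction:+.3f}")
```

A design like this can detect whether two programs are complements, something a pair of single-treatment trials run in isolation cannot reveal.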

The alternatives Stevenson proposes to RCTs and the evidence-driven reform movement are lacking. She suggests reversing the Biblical parable of teaching a man to fish by instead focusing on direct assistance. While we don’t need a study to prove that we accomplished something by feeding someone who is hungry, recidivism reduction cannot be eaten from a plate.

Another option she presents is shifting the focus to radical change, referencing prison abolition as an example of a change so sweeping that an RCT could not measure its impact due to the myriad intervening factors. Tilting at such windmills represents a distraction from realistic, targeted policies such as “second look” measures that can be supported by causal research. Though not an RCT, since people in prison cannot be randomly assigned to longer or shorter sentences, an analysis of Illinois data from the Council on Criminal Justice revealed the diminishing public safety dividends of incapacitation at the tail end of long sentences.

But perhaps Stevenson’s most important omission is her failure to recognize that causal research finding null or even negative effects of certain programs has in fact catalyzed change. Programs phased out in no small part because overwhelmingly discouraging causal research seeped into the public consciousness include D.A.R.E. and Scared Straight. In the case of Scared Straight, a meta-analysis found the program actually increased delinquency. Simply leveraging causal research to put that program out of its misery improved outcomes.

Another example of a counterproductive intervention identified through causal research is over-supervising low-risk individuals on community supervision. Seminal research by the late Ed Latessa and Christopher Lowenkamp, replicated in 2022, demonstrates that low-risk individuals sent to residential programs are more likely to re-offend than they would have been on basic community supervision.

Yet another example is research demonstrating that probation and parole terms of a decade or more do not reduce re-offending, which has helped persuade states like Minnesota and Texas to cap probation terms that previously could run for decades or, in Minnesota, even a lifetime.

When we consider the sharp reduction in crime since the 1990s, coupled with the precipitous decline in incarceration rates since 2007, this is no time to abandon the data-driven research that helped usher in those advances. Causal research must exist alongside threshold moral commitments, such as the inherent value of human dignity and the imperative of avoiding prison conditions that inflict physical pain. Nonetheless, the reality of limited resources requires choosing among many potential interventions, and, in contexts where it is possible and ethical to conduct them, RCTs are arguably the single most accurate way to ascertain how those choices will affect key metrics.

As someone who works to translate causal research into policy, I know datasets don’t improve lives if they just sit on the shelf. But my fellow Texans might say that those making recommendations without RCTs and other causal research to back them up are “all hat and no cattle.” Or, as Michael Bloomberg famously said, “In God we trust. Everyone else, bring data.”