
Police departments should harness the tech to improve accountability and training.

When a federal court found in 2013 that the NYPD had engaged in unconstitutional stop-and-frisk practices, it appointed a monitor to oversee the reforms the ruling set in motion and to ensure the department’s compliance with them. These reforms include, for instance, ensuring that officers conduct stops only when they have articulable reasonable suspicion, that race is not improperly used as a justification or a basis for disparate treatment and that such interactions are properly documented. In the 13 years since, the department has made progress, but fully achieving these goals has remained elusive, and the monitor’s repeated findings have underscored the department’s failure to reach full compliance on these critical issues. This slow and uneven progress is similar to that of other large departments under monitorships requiring complex sets of reforms.

It’s no surprise. Large departments are being asked to demonstrate — and those responsible for oversight are being asked to verify — substantive progress in the behavior of thousands of officers across millions of interactions with the public. The tools at their disposal include interviews, ride-alongs, documentation reviews and statistical analyses of administrative records like stop reports, all of which provide a critical window into the problem. In the modern era where virtually all officers wear body-worn cameras (BWCs), random audits of BWC footage have now become a vital tool that offers the closest look at the reality on the ground — an actual glimpse into face-to-face interactions between officers and the public. Yet in the largest department in the country, where even a monitor making a herculean effort can manually review only a couple thousand videos out of the nearly five million recorded each year, a glimpse is all that anyone can hope for.

We don’t need to accept this as the best we can do. With the advent of new technologies, body-worn camera footage can be put to much better use for departments and the people they serve. We are computational social scientists who developed AI-based tools capable of reviewing body-worn camera footage with a high level of accuracy at an unprecedented scale. Since 2021, we have worked with NYPD’s independent monitor, now Mylan Denerstein, to apply these techniques to footage in order to evaluate street stops. Our report, recently submitted to the court, details the ongoing challenges the department faces when it comes to constitutional compliance. But more importantly for the future of the NYPD and policing in America writ large, it also demonstrates a promising path forward for using AI-powered tools to improve accountability, community trust and the effectiveness of reforms in police departments across the country.

Unlike many AI tools that have become controversial — from face recognition technology to drones to risk assessment tools — police body-worn cameras enjoy overwhelming public support. In fact, the cameras have a nearly 90% approval rating. Across the political spectrum, people see these cameras as essential tools for accountability and transparency. Here, we showcase how the footage from these cameras could be used to help address two particularly thorny problems in New York City: underdocumentation and the legality of consent searches.

Underdocumentation and constitutionality

When NYPD officers detain someone in an investigative encounter, that interaction graduates from an encounter to a “stop” where the person is not free to leave. This is an important legal distinction when it comes to the detainee’s rights, and one that the officer is required to document. Officers can only detain someone when they have individualized reasonable suspicion that that person has committed or will commit a crime. Because the context of a police-civilian engagement is so important, underreporting and mislabeling of stops is one of the key areas where constitutional compliance — and community trust — can erode. Incidents that feel like stops in every practical sense to the civilian, but that are never documented as such, are a blind spot for the department and the monitor alike; constitutional rights may be violated, and in an endless sea of everyday encounters, no one would know where to check.

Audits by the NYPD’s Monitor Team have shed light on the potential scale of the problem, with manual reviews finding underreporting rates of 27% to 41% in recent years. Among specialized units that engage in proactive stops, frisks, and searches, such as the NYPD’s Neighborhood Safety Teams, underreporting rates are above 50%. But as with all manual reviews of body camera footage, these audits are necessarily random and sporadic. The monitor has no way of knowing which encounters to look at to find those most likely to have been mislabeled.

To address this issue, we analyzed a set of 2,858 encounters that the Monitor Team, along with retired court judges, had already audited. We used their expert judgments to train machine learning models to predict, for a given piece of footage, whether it is likely to be a stop in which someone is detained or merely an encounter in which they are legally free to leave. By parsing the language of officers and civilians in each video, these models were able to accurately distinguish between stops and mere encounters about 80% of the time. We then repeated this process using judges’ analyses of whether officers’ behavior during stops was constitutional (for instance, whether stops were conducted with reasonable suspicion, whether frisks and searches were conducted appropriately and so on). We trained our models, set them to work on a sample of known stops and came out with an accuracy rate of over 70%.
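For readers who want a concrete picture of what "training a model on transcript language" can mean, here is a minimal sketch. It is not the method used in the report, whose models are not described here. It swaps in a small naive Bayes word-count classifier, and the transcripts, labels and function names are all invented for illustration.

```python
import math
from collections import Counter

LABELS = ("stop", "encounter")

def tokenize(text):
    # Simplistic assumption: lowercase and split on whitespace.
    return text.lower().split()

def train(examples):
    """Count words per label from (transcript, label) pairs.

    Assumes both labels appear at least once in the training data.
    """
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["stop"]) | set(word_counts["encounter"])
    return word_counts, label_counts, vocab

def stop_probability(model, text):
    """Estimated probability that a transcript reflects a stop."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in LABELS:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing keeps unseen word/label pairs from zeroing out.
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Convert log scores to a probability, subtracting the max for stability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["stop"] / sum(weights.values())

# Invented toy transcripts; real training data would be the audited encounters.
toy_examples = [
    ("stop right there don't reach", "stop"),
    ("why you running what you got", "stop"),
    ("good morning have a nice day", "encounter"),
    ("can you tell me what you saw", "encounter"),
]
toy_model = train(toy_examples)
print(stop_probability(toy_model, "don't reach"))
print(stop_probability(toy_model, "have a nice day"))
```

The point of the sketch is only that expert labels plus transcript text are enough to produce a probability for each new video. A production system would need far more data, careful validation and a stronger model.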

Of course, accuracy rates in the 70s and 80s will still produce plenty of false positives and negatives. But the purpose and potential of these models is not to replace human judgment on matters of constitutionality. What accuracy rates this high can do is dramatically narrow the pool of footage the department and monitor need to comb through by hand in order to find, assess and learn from likely episodes of underdocumentation.

That could be enormously helpful in two ways. First, it opens the potential for targeted audits. Rather than randomly grabbing a handful from the haystack and hoping to find a few needles, our models could quickly analyze millions of videos tagged as non-stop investigative encounters and point auditors directly to the specific encounters most likely to have been mislabeled. Our evidence suggests that review of even the top 1% of videos by model probability would likely uncover thousands of undocumented stops per month with very high auditing hit rates — a shortcut that could save departments and monitors significant time and resources.
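The targeted-audit idea can be sketched in a few lines: rank encounters by the model's estimated probability and hand auditors only the top slice. The encounter IDs, scores and function name below are hypothetical, and the sketch assumes a model has already assigned each encounter a probability of being an undocumented stop.

```python
import math

def select_for_audit(scores, fraction=0.01):
    """Return the encounter IDs with the highest model probability.

    scores: dict mapping a (hypothetical) encounter ID to the model's
            estimated probability that the encounter was really a stop.
    fraction: share of encounters to send to human auditors.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round up and keep at least one, so a small pool still yields a review list.
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Made-up scores for illustration only.
toy_scores = {"enc-001": 0.12, "enc-002": 0.91, "enc-003": 0.47, "enc-004": 0.78}
audit_list = select_for_audit(toy_scores, fraction=0.5)
print(audit_list)
```

Human reviewers still make the final call on every flagged video. The ranking only decides where they look first.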

Second, these models can provide a quantitative representation of encounters that has never existed before — in essence, a way of scoring language to measure elements like how “stop-like” an interaction seems and how constitutional an officer’s word choice feels. We found in the data that linguistic indicators for non-constitutionality include pointed questions like “why you running,” “what you got,” and “[got nothing/anything] on you,” as well as commands such as “stop” and “don’t reach.” These words themselves don’t constitute non-constitutional behavior, but measured in aggregate are predictive of non-constitutional behavior like stops conducted without reasonable suspicion. 

Metrics for scoring language in this way are enormously valuable for uncovering language patterns that departments can learn from and use as the basis for officer training. For example, our model quantified significant racial disparities bearing on Fourteenth Amendment concerns: The language officers use in non-stop interactions with Black and Hispanic civilians is about 5 to 11% closer to resembling language used in stops. Specifically, given only the language appearing in the footage, our trained models’ estimates of the probability that a given low-level interaction was a stop were on average 5-11% higher when the civilians in the interaction were Black or Hispanic. This suggests that, even in low-level police encounters, NYPD interactions with Black and Hispanic civilians take on a more “stop-like” tenor, raising questions about whether a reasonable person in such encounters would understand they were free to leave.

Our findings for the constitutionality of officers’ behavior during stops were perhaps more concerning. When we examined correctly documented stops (those most likely to be reviewed because they’ve been accurately labeled as more significant), we found little to no evidence of racial disparity in officers’ constitutional behavior. It was only when we examined stops that had been mislabeled as more casual encounters, the kind of videos that typically fall through the cracks of review, that we found large racial disparities: The model’s estimated probability of non-compliance was 16% to 26% higher for stops of Black and Hispanic civilians. This finding further emphasizes the urgent need for departments seeking to root out racial disparities and Fourteenth Amendment violations to improve documentation compliance.

The second major area of constitutional compliance where our models could help involves civilians’ consent to being searched. The Fourth Amendment guarantees freedom from unreasonable searches and seizures. Given this, where officers rely on consent to justify a search, that consent must be voluntary. Whether a person’s consent to a search is voluntary under the Fourth Amendment can depend in significant part on what officers say, including whether they clearly communicate the nature of the request and whether their words indicate that the civilian has a meaningful opportunity to refuse. We used computational tools to analyze the language NYPD officers used to obtain consent for searches in 2023, drawing on 1,770 encounters and 3,695 associated videos.

Our goal was to measure the prevalence of explicit consent language and the clarity of officers’ requests. We began by asking a simple question: When an officer documents a request for consent to search, do the words “consent” or “search” appear in the transcript at any point for any video associated with the encounter? We were surprised to find just how rare these critical words were: “Search” was mentioned in only 46% of these encounters, and “consent” in only 13%. When we manually reviewed a sample of these instances, we found that less than half represented genuine requests to search — in many cases, the words “search” and “consent” were being used to reference past searches or proactive refusals of consent by civilians rather than as a legitimate means of initiating a search.
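The keyword check described above is simple enough to sketch. The version below assumes each encounter maps to a list of transcript strings, one per associated video, and counts an encounter as a hit if the word appears in any of them. The data layout, the whole-word matching rule and the sample transcripts are assumptions made for illustration, not details taken from the report.

```python
import re

def keyword_rates(encounters, keywords):
    """Share of encounters in which each keyword appears in any associated video.

    encounters: dict mapping an encounter ID to a list of transcript strings.
    keywords: words to look for, matched case-insensitively as whole words.
    """
    rates = {}
    for word in keywords:
        # Whole-word match: "searched" would not count as "search" here.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = sum(
            1
            for transcripts in encounters.values()
            if any(pattern.search(text) for text in transcripts)
        )
        rates[word] = hits / len(encounters) if encounters else 0.0
    return rates

# Invented transcripts for illustration only.
toy_encounters = {
    "enc-001": ["Can I check your bag?", "Okay, go ahead."],
    "enc-002": ["I can only search you if you consent. May I search you?"],
    "enc-003": ["Do you mind if I take a look?"],
    "enc-004": ["You can't search me.", "I do not consent."],
}
toy_rates = keyword_rates(toy_encounters, ["search", "consent"])
print(toy_rates)
```

Note that the last toy encounter counts as a hit for both words even though it is a civilian refusing, not an officer asking. That is why a keyword count is only an upper bound on genuine requests and why manual review of the matches matters.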

The NYPD Patrol Guide offers suggested phrasing for officers to use when seeking a consent search: “I can only search you if you consent. Do you understand? May I search you?” But our analysis found that documented consent search requests contained the phrases “consent,” “search,” and “you understand” in only 3% of interactions — and the exact phrasing suggested by the Patrol Guide never occurs. Taken together, our fundamental finding is that consent searches in which officers clearly and explicitly request consent are exceedingly rare.

If officers are not using explicit terms to request consent, what do they say? We identified two prominent alternatives: requests using forms of “can I check” appear in 37% of consent search interactions, and those using forms of “do you mind” in 17%. While “search” carries specific legal and constitutional meanings, it’s also a word that civilians are likely to ascribe some weight to — a softer word like “check” is more nebulous, and could very well minimize and obscure the nature of what the officer is seeking to do. 

By the same token, “do you mind” questions create ambiguity almost by definition: To “mind” something is to object to it, so a civilian would, strictly speaking, need to say “yes” to decline. Yet we find that in real interactions, people often reply to such questions with “no” to signal that they want to reject the underlying request to search. This ambiguity could be entirely avoided with greater clarity in the phrasing of requests.

Just as we found in cases of mislabeling stops, we discovered measurable racial differences in how officers seek consent for searches. “Do you mind”-style questions appear in about 20% of interactions with Black civilians as compared with only 15% of interactions with civilians of other races. Moreover, we found that Black and Hispanic civilians heard 20% more commands in general throughout these consent search interactions, which may contribute to an atmosphere in which a civilian feels they have less agency and are therefore less free to decline a search.

These estimates are all likely conservative due to the potential for underreporting of consent search interactions, another issue that could be addressed by using predictive models to trawl the data at scale and unearth the likeliest mislabeled consent searches. Overall, we believe that our findings could be instrumental in helping the NYPD to pinpoint opportunities for reform, pursue those reforms more effectively and expend far fewer resources while doing so.

Merging technology and policy change

Predictive modeling of underdocumentation and compliance could have compounding benefits — a ripple effect for departments, monitors and the communities they serve. Targeted sampling could save valuable time for auditors (even as random audits remain a useful tool to establish baseline rates). Freeing up humans to evaluate the likeliest problem areas would allow them to learn more, faster. Language patterns revealed by the models could inform more targeted and effective officer training that is backed by real data. Those trainings could in turn be evaluated for their effectiveness by measuring whether troubling language patterns change. Even officers’ awareness that their mislabeling of stops or lax language initiating consent searches is more likely to be flagged for review could create a self-correcting effect that changes behaviors and contributes to better outcomes.

More broadly applied, AI-based tools offer the first opportunity to truly measure — quantifiably and at scale — which policy changes are actually being implemented on the ground, and ultimately which are most effective at improving constitutional policing, public trust and officer safety. The fact that these insights can be generated from already-existing body camera footage at a relatively low cost makes them a potentially game-changing asset for police departments — an unprecedented in-house capacity to track and tackle issues, measure geographic hotspots and trends over time and shed light on any key metric of interest.

The infrastructure and technology to support these kinds of analyses are already in place; the only barrier that remains is the will of local leaders. New York is in many ways uniquely positioned to lead the way for the nation on this front: Due to the How Many Stops Act, implemented starting in July 2024, the NYPD already collects more detailed information about encounters with the public than perhaps the majority of departments nationwide. As the largest department in the country, New York has a golden opportunity to set the standard for the use of AI for accountability and transparency.

At a time when police retirements are at record highs and recruitment is at record lows — when both officer wellness and public trust in law enforcement are issues of urgent concern — this new technology holds the potential to shift the dynamic. A clear, comprehensive, data-driven view into the realities of policing can be a launching pad for changes that make the profession safer, more responsive and more constitutionally sound, instilling greater trust in policing among the community in turn. 

