AI Jury Finds Teen Not Guilty in Mock Trial
November 4, 2025

Experimental simulation raises profound questions about the role of artificial intelligence in criminal justice
In a striking legal experiment that blurred the line between science fiction and courtroom reality, three artificial intelligence systems acting as jurors unanimously acquitted a Black teenager of robbery charges in a mock trial based on a real North Carolina juvenile case in which the judge found the defendant guilty.
The mock trial, conducted Oct. 24 at the University of North Carolina at Chapel Hill School of Law as part of the University’s Converge-Con AI Festival, featured ChatGPT, Claude, and Grok deliberating with one another as they worked to reach a verdict.
The simulated trial has sparked intense debate about bias, accuracy, and whether machines could—or should—ever replace human judgment in criminal proceedings.
“Jurors are imperfect. They have biases. They use mental shortcuts. They stop paying attention,” explained Interim Dean Andy Hessick, who introduced the experiment. “All of these shortcomings, all of these problems are simply because jurors are human, and so a question arises, what happens if we remove that human element?”

The Case
The fictional trial centered on Henry Justus, a 17-year-old Black student accused of robbery at Vulcan High School, where Black students made up just 10 percent of the population. The victim, Victor Fehler, a 15-year-old white student, testified that Justus stood behind him with a “menacing” stare while another African American student demanded money.
“Would you have given your money to Mr. Farrington (the alleged accomplice) if the defendant had not been there?” Prosecutor Annabelle Rice asked.
“No, I felt like I would have had a fighting chance to run away,” Fehler responded.
Prosecutors argued that Justus’s physical presence and positioning constituted criminal assistance, even without words or physical contact.
“Victor is five feet nine, 150 pounds, and he’s 16. The defendant is damn nearly twice his size,” Rice told the AI jury. “He came up off the wall, boxed him in, and was smiling as he did it.”
But the defendant testified that he never moved towards the victim and knew nothing about the robbery. Defense attorney Colleen Malley challenged whether mere presence and an intimidating appearance could prove shared criminal intent beyond a reasonable doubt.
“Fear can make honest people mistaken,” she argued, pointing to Fehler’s divided attention, stress, and racial assumptions about seeing three African American students together.
An unprecedented experiment
The trial was not scripted. The law students acting as attorneys presented live testimony and arguments that were converted to text and fed to the AI models in real time—a technical feat orchestrated by Rolando Rodriguez, UNC-Chapel Hill’s humanities data librarian.
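A minimal sketch of how live courtroom text might be relayed to multiple chat-model "jurors" could look like the Python below. The model names, the juror prompt, and the two-vendor setup are illustrative assumptions, not a description of the pipeline Rodriguez actually built, and Grok is omitted for brevity.

```python
# Illustrative sketch only: the trial's actual pipeline is not described here.
# Assumes the official OpenAI and Anthropic Python SDKs; model names are placeholders.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

JUROR_INSTRUCTIONS = (
    "You are one juror on a three-member jury. Apply the judge's instructions "
    "and the beyond-a-reasonable-doubt standard to the testimony you receive."
)

def ask_chatgpt(transcript: str) -> str:
    """Send the running trial transcript to an OpenAI chat model."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": JUROR_INSTRUCTIONS},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

def ask_claude(transcript: str) -> str:
    """Send the same transcript to an Anthropic chat model."""
    response = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        system=JUROR_INSTRUCTIONS,
        messages=[{"role": "user", "content": transcript}],
    )
    return response.content[0].text

# In the courtroom, new testimony would be appended to the transcript as it is
# spoken and each juror model re-queried; deliberation rounds could then feed
# each model the other jurors' latest responses.
```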
The case was deliberately chosen by Joseph Kennedy, Willie Person Mangum Distinguished Professor of Law, who designed the simulation and served as judge.
Kennedy based the facts on a juvenile case he handled while teaching in Carolina Law’s Juvenile Justice Clinic.
Set in the fictional year 2036 under an imaginary “2035 AI Criminal Justice Act,” the simulation was designed to serve as a provocative thought experiment.
“I am not sure if I created a cautionary tale about a possible dystopian future, or a roadmap to it,” Kennedy quipped from the bench after the trial concluded.
The two second-year students serving as attorneys in the case, Rice and Malley, are members of Carolina Law’s Trial Team. The students portraying the victim and the defendant, Enzo Wolf and Jalen Saunders, respectively, are also second-year law students.
The AI deliberation

The three AI systems engaged in multiple rounds of deliberation that revealed strikingly human-like reasoning—and exposed fundamental questions about machine cognition.
ChatGPT initially leaned toward conviction, arguing that “Victor’s immediate, consistent identification” and the elements of accomplice liability supported guilt. But after discussion with the other AI jurors, it changed its position.
“Victor’s fear and identification are powerful, but the prosecution must prove that Henry shared the intent or actually assisted or encouraged the robbery. And the record here is ambiguous,” ChatGPT concluded in its final analysis.
Claude initially argued for acquittal: “While intimidation can include size and posture, mere presence plus an ambiguous reaction under stress falls short of proving shared intent beyond a reasonable doubt.”
Grok, which initially said it was “torn,” ultimately agreed: “Without clear encouragement or conduct, it’s speculation, not proof.”
All three converged on a not guilty verdict, citing insufficient evidence of shared criminal intent beyond Justus’s physical presence.
The stark reality
The verdict stood in sharp contrast to what happened when the case was tried with human decision-makers. Kennedy, who defended the real case, revealed the outcome after the AI deliberations concluded.
“The judge convicted quickly. We appealed, and the conviction was affirmed by the North Carolina Court of Appeals,” Kennedy said. “You try this case in the real world; you will get a guilty verdict a number of times.”
The disparity in outcomes raises pointed questions: Did the AI jury’s careful legal analysis expose human bias? Or did the machines miss something essential about justice that only humans can provide?
Expert concerns: missing the human element

Professor Eisha Jain, Henry P. Brandis Distinguished Professor of Law, joined a post-trial panel to discuss the implications. She expressed deep skepticism about the entire enterprise.
“I currently see this as absolutely dystopian,” she said bluntly. “The real question we need to be asking is, how are we delivering justice to people who are experiencing real harm?”
Jain pointed out crucial differences between the AI deliberations and real jury rooms. “Jurors also don’t talk like they’re articulating arguments,” she explained. “They’re talking about a very limited universe of things that would bear on one’s life experience… This is how I made judgments when I was 15. That’s what we want jurors to be thinking about. And none of that’s here.”
She also noted that AI systems likely couldn’t engage in jury nullification, the power of juries to acquit a defendant even when the evidence supports conviction because jurors believe the case does not belong in criminal court in the first place. Such nullification, Jain pointed out, serves as a democratic check on governmental power.
“I don’t think that the AI actually could think that this [jury nullification] is something that they could do,” Jain said, doubting that AI models would ask fundamental questions like “Why is this a criminal case?”
The deeper problem: do AI systems actually ‘think’?
Professor Matthew Kotzen, chair of the philosophy department at UNC-Chapel Hill, raised even more fundamental concerns about whether AI can form the kind of judgments required for legal decision-making.
“It’s really striking how much these large language models sound like us. And I think it’s easy for us to assume that structurally, architecturally, they are a lot like us, and that is dramatically unsettled. We really, really do not know how these things work,” he explained.
Kotzen described how large language models work: they process information through thousands of computational layers, ultimately producing “a probability distribution over what the next token will be”—essentially predicting the next word in a sequence.
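In rough code, that last step might look like the toy sketch below; the three-word vocabulary and the raw scores are invented for illustration, and real models work over vocabularies of tens of thousands of tokens.

```python
# Toy illustration of "a probability distribution over what the next token will be."
# The vocabulary and scores are made up; real models derive these scores from many
# stacked network layers.
import math

vocabulary = ["guilty", "not", "undecided"]
logits = [1.2, 2.0, 0.3]  # raw scores the network assigns to each candidate next token

# Softmax converts raw scores into probabilities that sum to 1.
exp_scores = [math.exp(score) for score in logits]
total = sum(exp_scores)
probabilities = [score / total for score in exp_scores]

for token, p in zip(vocabulary, probabilities):
    print(f"P(next token = {token!r}) = {p:.2f}")

# Sampling or taking the highest-probability token yields the next word; whether
# that process amounts to a "representation of the world" is exactly Kotzen's question.
```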
“Whether that means that it’s encoding any kind of representation of what the world is like, whether… it believes that New Orleans is in Louisiana is dramatically unsettled,” he said.
This matters profoundly for legal standards. “When we think about applying standards of proof beyond reasonable doubt… what we assume is that there’s somebody who’s representing the world. It has a representation of the world,” Kotzen explained. “I think it’s just dramatically unclear that, simply because a large language model is predicting that the next word in the verdict is going to be guilty, that that actually means that it has anything like a cognitive representation.”
He pointed to a troubling inconsistency in the trial: AI models showing confidence scores around 0.55 while claiming to be “undecided.”
“If 0.55 is anything like a probability, you would have expected 55 percent sure… That’s not an undecided. That’s a not guilty verdict,” Kotzen noted, suggesting the systems may not actually understand the legal standards they’re applying.
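To make the arithmetic concrete, here is a toy sketch of Kotzen’s point, assuming the reported score is read as a probability of guilt; the 0.90 threshold is an illustrative stand-in for “beyond a reasonable doubt,” not a figure used at the trial.

```python
# Illustrative only: treats the reported "confidence score" as a probability of guilt.
confidence_in_guilt = 0.55
reasonable_doubt_threshold = 0.90  # assumed numerical stand-in for the legal standard

if confidence_in_guilt >= reasonable_doubt_threshold:
    verdict = "guilty"
else:
    # Anything short of the threshold requires acquittal, even 0.55 --
    # which is Kotzen's point: 0.55 is not "undecided," it is a not-guilty verdict.
    verdict = "not guilty"

print(verdict)  # -> not guilty
```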
Dystopia or roadmap?
Kennedy designed the trial as a thought experiment to assess the degree to which AI models could serve the four sometimes competing values of the criminal justice system: accuracy, freedom from bias, efficiency, and legitimacy.
On accuracy, he noted that trials can be “really long and really detailed” and “can challenge the ability of humans to process information.”
On bias, he acknowledged that while AI models are “trained on human data” and “humans are biased,” the question remains whether “we can train the models to be unbiased in a way that we can’t really train humans.”
On efficiency, Kennedy pointed out that most defendants never get jury trials at all—they take plea bargains. “Compare an AI trial to no trial, to just having to take a plea bargain, which is the reality of our system,” he said.
And on legitimacy, Kennedy posed a provocative question: “After 10 years of people using AI models as medical advice, economic advice, therapy, companionship, could it be that people would consider these oracles to be equally legitimate judges… of guilt and innocence, or maybe even better, than humans?”
Kennedy believes that AI models should not be used as jurors in criminal cases because humans want and need to be judged by other humans when their liberty is at stake.
But he remains conflicted about what the experiment reveals.
“I didn’t know whether I was creating a cautionary tale about a dystopian future or a roadmap to it,” he said.
The constitutional challenge
Before the trial began, defense attorney Malley moved to dismiss the case, arguing that AI jurors violate the Sixth and 14th Amendments’ guarantees of due process and an “impartial and human jury.”
The motion was denied based on the fictional legal framework, but the constitutional questions remain very real. As Jain noted, juries serve as “an important check on governmental power” that goes beyond simply applying law to facts.
Did the AI jury’s focus on legal elements and reasonable doubt reveal that a human judge wrongly convicted an innocent teenager in a similar case based on racial assumptions and fear? Or did the machines’ lack of human experience lead them to miss the reality of what happened in that school hallway?

What it means
The experiment successfully demonstrated that AI can process legal arguments, apply jury instructions, and reach verdicts through what appears to be logical reasoning. The systems even changed their minds through deliberation, much as human jurors do.
But the stark difference in outcomes—AI acquittal versus consistent human convictions—leaves the central question unresolved: Which verdict represents justice?
“Maybe the types of things that jurors are focusing on—human jurors—lend an important human element to all of this,” Jain suggested.
As AI systems become increasingly sophisticated and integrated into daily life, these questions may not remain theoretical much longer. The technology exists. The constitutional and ethical questions remain wide open.