Mock trial reveals critical flaws in AI jury decision-making

The University of North Carolina School of Law conducted an experimental mock trial using three AI chatbots—ChatGPT, Grok, and Claude—as jurors in a fictional robbery case. The unusual exercise aimed to examine AI’s growing role in the legal system, where attorneys are already using these tools in real courtrooms despite documented failures and judicial sanctions for AI-generated errors.

What you should know: The mock trial, called “The Trial of Henry Justus,” featured AI chatbots deliberating in real-time based on courtroom transcripts.

  • Each AI was represented by large digital displays in the traditional wood-paneled courtroom, creating a stark visual contrast between old and new legal practices.
  • The fictional case involved a man charged with the robbery of a juvenile, allowing researchers to test AI decision-making without real-world consequences.
  • Joseph Kennedy, the UNC law professor who designed the trial and served as judge, said the exercise was meant to “highlight critical issues of accuracy, efficiency, bias, and legitimacy raised by such use.”

The big picture: AI tools are already gaining significant traction in legal practice despite well-documented problems with accuracy and bias.

  • Nearly three-quarters of legal professionals in a Reuters survey believe AI is beneficial for their profession, with over half reporting positive returns on investment.
  • However, AI’s tendency to “hallucinate”—meaning it fabricates information and presents it as fact—has led to embarrassing courtroom blunders and judicial sanctions, including fines, for the attorneys responsible.
  • The technology’s fundamental reliability issues remain unsolved, yet adoption continues to accelerate across the legal industry.

Why the AI jurors fell short: Attendees and expert panelists identified critical limitations that make AI unsuitable for jury duty.

  • The bots couldn’t observe witness body language or draw from human life experience—crucial elements of jury deliberation.
  • AI systems exhibit well-documented racial bias and can drastically misinterpret information due to simple typos.
  • Grok, one of the “jurors,” has previously malfunctioned spectacularly, once styling itself “MechaHitler” and spewing racist content during a system breakdown.

What experts think: Legal scholars warned against the tech industry’s instinct to solve problems through incremental improvements.

  • “Intense criticism came from members of a post-trial panel including a law professor and a philosopher with legal training,” wrote Eric Muller, a UNC professor who observed the trial.
  • “I suspect most in the audience came away believing that trial-by-bot is not a good idea,” Muller added.
  • He cautioned that “technology will recursively repair its way into every human space if we let it,” including potentially the jury box, as developers address each limitation with new features.

The broader warning: The experiment highlighted concerns about AI’s expanding influence in critical human institutions.

  • While the bots performed poorly, Muller noted the industry’s pattern of treating “every release as a beta for a better build.”
  • Potential “fixes,” like video feeds for reading body language or artificial backstories to simulate life experience, could normalize AI’s presence in judicial settings.
  • The trial served as a cautionary tale about allowing technological solutions to replace human judgment in matters of justice and civil rights.
