University of Colorado Boulder researchers tested five AI models on 2,300 simple Sudoku puzzles and found significant gaps in both problem-solving ability and trustworthiness. Even advanced models like OpenAI’s o1 solved only 65% of six-by-six puzzles correctly, and their explanations frequently contained fabricated facts or bizarre responses, including one AI that provided an unprompted weather forecast when asked about Sudoku.

What you should know: The research focused less on puzzle-solving ability and more on understanding how AI systems think and explain their reasoning.

  • OpenAI’s o1 model performed best at solving puzzles but was particularly poor at explaining its methodology, using wrong terminology and failing to justify its moves.
  • Other AI models were deemed “not currently capable” of solving even simplified six-by-six Sudoku puzzles.
  • When asked to explain their reasoning, AI models frequently hallucinated facts, claiming constraints that didn’t actually exist in the puzzles.

Why this matters: The findings highlight critical trust issues that must be resolved before AI can become a reliable partner in human decision-making processes.

  • Only 41% of people currently trust AI technology, according to KPMG, a global consulting firm, despite 78% of organizations using AI in at least one business function.
  • The World Economic Forum identifies trust as a key factor that will shape outcomes in the AI-powered economy.

What they’re saying: Researchers emphasized the broader implications of AI’s reasoning failures.

  • “Sometimes, the AI explanations made up facts,” said Ashutosh Trivedi, study co-author and associate professor of computer science at CU Boulder. “So it might say, ‘There cannot be a two here because there’s already a two in the same row,’ but that wasn’t the case.”
  • “At that point, the AI had gone berserk and was completely confused,” explained study co-author Fabio Somenzi when describing the weather forecast incident.
  • “If you have AI prepare your taxes, you want to be able to explain to the IRS why the AI wrote what it wrote,” Somenzi added.

The big picture: The study underscores that while AI can perform complex tasks like coding websites and summarizing meetings, its reasoning processes remain opaque and unreliable.

  • The hallucinations and glitches “underscore significant challenges that must be addressed before LLMs can become effective partners in human-AI collaborative decision-making,” according to the researchers.
  • Understanding how AI systems think could ultimately improve public trust and ensure more reliable outputs across applications from computer code to financial services.
