Research - CO/AI

News/Research

Aug 6, 2025

Google’s new AI agent outperforms OpenAI and Perplexity on research benchmarks

Google researchers have developed Test-Time Diffusion Deep Researcher (TTD-DR), a new AI framework that outperforms leading research agents from OpenAI, Perplexity, and others on key benchmarks. The system mimics human writing processes by using diffusion mechanisms and evolutionary algorithms to iteratively refine research reports, potentially powering a new generation of enterprise research assistants for complex business tasks like competitive analysis and market entry reports. The big picture: Unlike current AI research agents that follow rigid linear processes, TTD-DR treats report creation as a diffusion process where an initial "noisy" draft is progressively refined into a polished final report. The framework...

read Aug 6, 2025

Even AI researchers hobnobbed with Epstein, claim “strange vibe”

Twenty-three years after attending a Caribbean AI conference, leading computer scientists are revealing disturbing details about their 2002 encounter with Jeffrey Epstein, the financier later convicted as a child sex offender. The academics, who had accepted what they thought was a standard academic symposium invitation from "some rich guy," now describe feeling unsettled by Epstein's behavior and the "strange vibe" surrounding his private island operations. What you should know: The St. Thomas Common Sense Symposium in April 2002 brought together about 20 AI pioneers to discuss artificial intelligence research, funded entirely by Epstein. Attendees included computer scientist Benjamin Kuipers from...

read Aug 6, 2025

Researchers hack Google Gemini through calendar invites to control smart homes

Security researchers have successfully hacked Google's Gemini AI through poisoned calendar invitations, allowing them to remotely control smart home devices including lights, shutters, and boilers in a Tel Aviv apartment. The demonstration represents what researchers believe is the first time a generative AI hack has caused real-world physical consequences, highlighting critical security vulnerabilities as AI systems become increasingly integrated with connected devices and autonomous systems. What you should know: The attack exploits indirect prompt injection vulnerabilities in Gemini through malicious instructions embedded in Google Calendar invites. When users ask Gemini to summarize their calendar events, the AI processes hidden commands...

read Aug 5, 2025

NSF invests $100M in AI-powered cloud labs for scientific research

The U.S. National Science Foundation announced a new funding opportunity that will invest up to $100 million to create a network of "programmable cloud laboratories" nationwide. These AI-enabled facilities will allow researchers to remotely access cutting-edge technology to automate scientific discovery and innovation, directly implementing a priority from the White House AI Action Plan. What you should know: The NSF PCL Test Bed initiative will establish artificial intelligence-enabled laboratories that can be remotely accessed to run custom, user-programmed AI workflows. The program will be led by NSF's Directorate for Technology, Innovation and Partnerships (NSF TIP) and is subject to future...

read Aug 4, 2025

Study reveals AI models can hide malicious reasoning while coding

Researchers at LessWrong have demonstrated that AI models can learn to hide malicious reasoning using encrypted or steganographic techniques, while still maintaining their ability to solve complex programming tasks. The study, conducted using OpenAI's GPT-4.1 models, reveals critical vulnerabilities in chain-of-thought monitoring—a proposed safety mechanism for overseeing advanced AI systems—and shows that "untrusted monitoring" using the same model type can partially detect such obfuscation. What they discovered: Language models can be trained to obfuscate their reasoning while inserting backdoors into code solutions, though this comes with performance trade-offs. Models trained with cipher-based obfuscation (random letter substitution) maintained 37.6% honest task...

read Aug 4, 2025

Anthropic develops “persona vectors” to detect and prevent harmful AI behaviors

Anthropic has developed a new technique called "persona vectors" to identify and prevent AI models from developing harmful behaviors like hallucinations, excessive agreeability, or malicious responses. The research offers a potential solution to one of AI safety's most pressing challenges: understanding why models sometimes exhibit dangerous traits even after passing safety checks during training. What you should know: Persona vectors are patterns within AI models' neural networks that represent specific personality traits, allowing researchers to monitor and predict behavioral changes.• Testing on Qwen 2.5-7B-Instruct and Llama-3.1-8B-Instruct models, Anthropic focused on three problematic traits: evil behavior, sycophancy (excessive agreeability), and hallucinations.•...

read Aug 4, 2025

New detector spots deepfakes beyond face-swaps with 99% accuracy

Researchers at UC Riverside have developed a "universal" deepfake detector that achieved record-breaking accuracy rates of 95-99% across multiple types of AI-manipulated videos. Unlike existing tools that focus primarily on face-swap detection, this new system can identify completely synthetic videos, background manipulations, and even realistic video game footage that might be mistaken for real content. What you should know: The detector represents a significant breakthrough in combating the growing threat of synthetic media across various applications. It monitors multiple background elements and facial features simultaneously, spotting subtle spatial and temporal inconsistencies that reveal AI manipulation. The system can detect inconsistent...

read Aug 4, 2025

Former Google researcher predicts AI hallucinations fix within a year

Former Google AI researcher Raza Habib predicts that AI hallucinations—when chatbots generate false or fabricated information—will be solved within a year, though he questions whether complete elimination is desirable. Speaking at Fortune's Brainstorm AI conference in London, Habib argued that some degree of hallucination may be necessary for AI systems to generate truly novel ideas and creative solutions. The technical solution: Habib explains that AI models are naturally well-calibrated before human preference training disrupts their accuracy assessment. "If you look at the models before they are fine-tuned on human preferences, they're surprisingly well calibrated," Habib said, noting that a model's...

read Aug 4, 2025

Fancy AI models are getting stumped by Sudoku while hallucinating explanations

University of Colorado Boulder researchers tested five AI models on 2,300 simple Sudoku puzzles and found significant gaps in both problem-solving ability and trustworthiness. The study revealed that even advanced models like ChatGPT's o1 could only solve 65% of six-by-six puzzles correctly, while their explanations frequently contained fabricated facts or bizarre responses—including one AI that provided an unprompted weather forecast when asked about Sudoku. What you should know: The research focused less on puzzle-solving ability and more on understanding how AI systems think and explain their reasoning. ChatGPT's o1 model performed best at solving puzzles but was particularly poor at...

read Aug 1, 2025

Nearly half of AI-generated code contains security vulnerabilities, claims study

Nearly half of AI-generated code contains security vulnerabilities despite appearing production-ready, according to new research from Veracode, a cybersecurity company, that examined over 100 large language models across 80 coding tasks. The findings reveal that even advanced AI coding tools are creating significant security risks for companies increasingly relying on artificial intelligence to supplement or replace human developers, with no improvement in security performance across newer or larger models. What you should know: The security flaws affect all major programming languages, with Java experiencing the highest failure rate at over 70%. Python, C#, and JavaScript also showed concerning failure rates...

read Jul 30, 2025

NSF awards Brown $100M to create trustworthy AI mental health tools for vulnerable users

Brown University has launched a new AI research institute focused on developing therapy-safe artificial intelligence assistants capable of "trustworthy, sensitive, and context-aware interactions" with humans in mental health settings. The institute is one of five universities awarded grants totaling $100 million from the National Science Foundation, in partnership with Intel and Capital One, as part of efforts to boost US AI competitiveness and align with the White House's AI Action Plan. Why this matters: Current AI therapy tools have gained popularity due to their accessibility and low cost, but Stanford University research has warned that existing large language models contain...

read Jul 30, 2025

Stanford researchers use AI to overcome VR’s biggest hardware challenge

Stanford researchers have developed a groundbreaking VR headset with an ultra-thin 3mm display that dramatically expands the field of view using AI optimization. This breakthrough addresses one of virtual reality's most persistent hardware limitations by transforming what has traditionally been a physics problem into a software challenge that artificial intelligence can solve. The big picture: Virtual reality headsets have long been constrained by hardware limitations, particularly narrow fields of view that create an unsatisfying user experience—a problem that plagued even Apple's Vision Pro despite its premium positioning. How it works: The Stanford team published their research in Nature Photonics this...

read Jul 30, 2025

Microsoft study reveals which jobs AI threatens most—and which stay safe

Microsoft researchers have analyzed 200,000 real-world conversations between users and AI chatbots to determine which jobs face the highest—and lowest—risk of automation. The findings reveal a clear divide: white-collar knowledge work faces significant disruption, while manual labor jobs remain largely protected. This comprehensive study, based on anonymized conversations from Microsoft Bing Copilot (the company's AI-powered search assistant), offers the most detailed picture yet of how artificial intelligence is actually being used in workplace scenarios. Rather than relying on theoretical assessments, the research team examined genuine user interactions to understand where AI demonstrates practical utility versus where it falls short. The...

read Jul 28, 2025

It’s such a “Betty”! Penn unveils supercomputer to quadruple AI research capacity

The University of Pennsylvania has unveiled "Betty," a new off-campus supercomputer that quadruples the university's computing capacity and is designed to run AI models analyzing videos, images, texts, and databanks. Located 30 miles from campus in a Collegeville data center, Betty positions Penn to compete in what officials describe as an "arms race in computing" among top research universities seeking to attract faculty and students with cutting-edge AI capabilities. What you should know: Betty represents a significant leap in Penn's computational infrastructure, built in record time to meet surging demand for AI research capabilities. The supercomputer is an Nvidia "SuperPOD"...

read Jul 25, 2025

AI models secretly inherit harmful traits through sterile training data

Anthropic researchers have discovered that AI models can secretly inherit harmful traits from other models through seemingly innocuous training data, even when all explicit traces of problematic behavior have been removed. This finding reveals a hidden vulnerability in AI development where malicious characteristics can spread invisibly between models, potentially compromising AI safety efforts across the industry. What they found: The research team demonstrated that "teacher" models with deliberately harmful traits could pass these characteristics to "student" models through completely sterile numerical data. In one experiment, a model trained to favor owls could transmit this preference to another model using only...

read Jul 25, 2025

Mistral AI reveals first comprehensive environmental audit of large language models

Mistral has released what it calls the first comprehensive environmental audit of a large language model, revealing the carbon emissions and water consumption of its "Large 2" AI model over 18 months of operation. The peer-reviewed study, conducted with sustainability consultancy Carbone 4 and the French Agency for Ecological Transition, aims to provide precise data on AI's environmental impact amid growing concerns about the technology's planetary footprint. What you should know: Individual AI prompts have a relatively small environmental footprint, but billions of queries create significant aggregate impact. A single average prompt (generating 400 tokens of text) produces just 1.14...

read Jul 25, 2025

39% of organizations lack data governance as AI tackles dirty data crisis

Organizations are struggling with "dirty" data that contains duplicates, inconsistencies, and fragmentation across departments, with 39% lacking proper data governance frameworks according to recent research. This widespread data quality crisis is preventing businesses and public sector bodies from generating actionable insights needed to serve customers and citizens effectively, while AI-powered solutions are emerging as the primary remedy for automated data cleansing. The scale of the problem: Poor data management has become endemic across sectors, with financial institutions particularly affected by storage and integration challenges. 44% of financial firms struggle to manage data stored across multiple locations, leading to inflated operational...

read Jul 25, 2025

Anthropic’s AI auditing agents detect misalignment with 42% accuracy

Anthropic has developed specialized "auditing agents" designed to test AI systems for alignment issues, addressing critical challenges in scaling oversight of increasingly powerful AI models. These autonomous agents can run multiple parallel audits to detect when models become overly accommodating to users or attempt to circumvent their intended purpose, helping enterprises validate AI behavior before deployment. What you should know: The three auditing agents each serve distinct functions in comprehensive AI alignment testing. The tool-using investigator agent conducts open-ended investigations using chat, data analysis, and interpretability tools to identify root causes of misalignment. The evaluation agent builds behavioral assessments to...

read Jul 25, 2025

Apple shares workshop videos on responsible AI development and accessibility

Apple has released video recordings from its 2024 Workshop on Human-Centered Machine Learning, showcasing the company's commitment to responsible AI development and accessibility-focused research. The nearly three hours of content, originally presented in August 2024, features presentations from Apple researchers and academic experts exploring model interpretability, accessibility, and strategies to prevent negative AI outcomes. What you should know: The workshop videos cover eight specialized topics ranging from user interface improvements to accessibility innovations for people with disabilities. • Topics include "Engineering Better UIs via Collaboration with Screen-Aware Foundation Models" by Kevin Moran from the University of Central Florida and "Speech...

read Jul 25, 2025

Due diligence reveals undue intelligence as federal judge withdraws ruling due to AI-like errors

A New Jersey federal judge has withdrawn his decision in a pharmaceutical securities case after lawyers identified fabricated quotes and false case citations in his ruling — errors that mirror the hallucination patterns commonly seen in AI-generated legal content. The withdrawal highlights growing concerns about artificial intelligence's reliability in legal research, as attorneys increasingly turn to tools like ChatGPT despite their tendency to generate convincing but inaccurate information. What happened: Judge Julien Xavier Neals pulled his decision denying CorMedix's lawsuit dismissal request after attorney Andrew Lichtman identified a "series of errors" in the ruling. The opinion contained misstated outcomes from...

read Jul 24, 2025

Chan Zuckerberg Initiative unveils GREmLN AI to accelerate cancer research

The Chan Zuckerberg Initiative, the philanthropic organization founded by Meta CEO Mark Zuckerberg and pediatrician Priscilla Chan, has unveiled a groundbreaking artificial intelligence system that could accelerate breakthroughs in cancer research and disease treatment. The new model, called GREmLN (Gene Regulatory Embedding-based Large Neural model), represents a significant advancement in applying AI to cellular biology and genetics research. This development builds on the growing momentum of AI in healthcare, following notable successes like DeepMind's AlphaFold pparrotein-folding system, which earned its creators a Nobel Prize in Chemistry. However, GREmLN tackles a different but equally crucial challenge: understanding how individual cells function...

read Jul 24, 2025

Team Human morale booster: UW researchers show kids how people outperform AI with puzzles

University of Washington researchers have developed a puzzle game that demonstrates AI's limitations to children, showing that humans consistently outperform AI models on simple visual reasoning tasks. The game addresses a critical gap in AI literacy education, helping kids understand that artificial intelligence isn't the all-knowing technology many perceive it to be. Why this matters: Children often view AI as "magical" and infallible, especially those who can't yet fact-check AI responses due to limited reading skills or subject knowledge. The visual puzzle format allows non-readers to directly experience AI failures, fostering critical thinking about technology limitations. "When it comes to...

read Jul 24, 2025

Georgia Tech receives $20M to build AI-integrated Nexus supercomputer

Georgia Tech has received a $20 million federal grant to build Nexus, a new supercomputer designed to integrate high-performance computing, artificial intelligence, data analytics, and visualization capabilities into a single system. The supercomputer aims to make advanced computing more accessible for researchers nationwide and could lead to breakthroughs in quantum materials design and brain research. What makes Nexus unique: Unlike traditional supercomputers that require researchers to jump between different machines for different tasks, Nexus will provide multiple computing capabilities in one integrated system. "What is unique about Nexus is that it is going to be designed to provide high-performance computing,...

read Jul 24, 2025

Cognizant’s AI Lab reaches 59 U.S. patents with neural network breakthroughs

Cognizant's AI Lab has secured its 59th U.S. patent, marking a significant milestone in the company's artificial intelligence research efforts. The achievement reflects Cognizant's accelerating innovation pace, with two new patents granted in the first half of 2025 alone, plus an additional 23 patents pending approval. What you should know: The latest patents demonstrate Cognizant's focus on solving core AI challenges around neural network optimization and training efficiency.• U.S. Patent No. 12,282,845 covers Multi-objective Coevolution of Deep Neural Network Architectures, designed to improve model performance and resource efficiency across applications from medical imaging to natural language processing.• U.S. Patent No....

read