AI Safety - CO/AI

News/AI Safety

Aug 7, 2025

Farcical recognition: UK users bypass face-based age verification with video game characters

Discord users in the UK are bypassing new age verification requirements by using video game characters instead of their real faces, exploiting weaknesses in facial recognition systems. The workaround highlights significant vulnerabilities in age verification technology that could become even more problematic as AI-generated content grows more sophisticated. What's happening: The UK's new child safety laws require platforms like Discord to verify users are over 18 through government ID or face scans to access age-restricted content. Users discovered they could pass facial verification using screenshots from video games like Death Stranding, God of War, and Cyberpunk 2077. One user successfully...

read Aug 7, 2025

OpenAI launches GPT-5 with claimed PhD-level AI capabilities

OpenAI has launched GPT-5, claiming the new AI model delivers "PhD-level" expertise across areas like coding and writing. The release marks a significant upgrade in the company's flagship ChatGPT service, with CEO Sam Altman describing it as ushering in a new era of AI capabilities that would have been "unimaginable at any previous time in human history." What you should know: GPT-5 represents a major leap in AI reasoning and problem-solving capabilities compared to its predecessors. Altman characterized the progression as moving from high school level (GPT-3) to college level (GPT-4) to PhD-level expertise (GPT-5). The model can create complete...

read Aug 7, 2025

AI school surveillance creates false alarms in 67% of cases

AI surveillance systems in American schools are flagging students for false threats at alarming rates, leading to arrests, strip searches, and involuntary mental health commitments for teenagers whose words were taken out of context. A 13-year-old Tennessee girl was arrested and jailed overnight after making an offensive joke about her friends calling her "Mexican," while data from one Kansas district shows nearly two-thirds of AI alerts were deemed non-issues by school officials. The big picture: Thousands of school districts now use AI-powered surveillance software like Gaggle and Lightspeed Alert to monitor student communications on school accounts and devices, creating a...

read Aug 7, 2025

James Cameron warns AI weapons could trigger “Terminator”-style apocalypse

James Cameron has warned about the potential dangers of combining artificial intelligence with weapons systems, drawing parallels to his own "Terminator" franchise while promoting his upcoming book adaptation "Ghosts of Hiroshima." The Oscar-winning director's comments highlight growing concerns about AI's role in military applications, even as he explores the technology's benefits for filmmaking through his recent appointment to Stability AI's board of directors. What he's saying: Cameron believes humanity faces three converging existential threats that could define our future. "I do think there's still a danger of a 'Terminator'-style apocalypse where you put AI together with weapons systems, even up...

read Aug 6, 2025

AI dons a hard hat as construction industry unveils safety tools for jobsites

The Construction Industry Institute has unveiled new AI-powered safety tools for jobsites, developed by a research team from Texas A&M University, Louisiana State University, and 20 industry professionals. The initiative identifies 19 best use cases for artificial intelligence in construction safety protocols and provides a matching tool to help companies implement the most effective solutions for their specific needs. What you should know: The research team created a comprehensive framework that connects AI applications with current safety challenges on construction sites. Wearables and generative AI can predict dangerous jobsite locations and establish geofenced alerts that notify workers when entering high-risk...

read Aug 6, 2025

Elite students are dropping out of Harvard, MIT to prevent AI extinction of both humans and jobs

College students at elite universities like Harvard and MIT are dropping out to work on preventing artificial general intelligence (AGI) from potentially causing human extinction, driven by fears that superintelligent AI could arrive within the next decade. This exodus reflects growing anxiety among young people about both existential AI risks and the possibility that their future careers will be automated away before they even begin. What you should know: Students are abandoning prestigious academic programs to join AI safety organizations and startups, believing the threat is too urgent to wait. Alice Blair took permanent leave from MIT to work as...

read Aug 6, 2025

States create AI rules as federal regulation stalls. Here are their 4 priorities.

While Congress remains largely silent on artificial intelligence regulation, state governments across America are stepping into the void with unprecedented legislative activity. All 50 states introduced AI-related legislation in 2025, creating a complex regulatory landscape that businesses must now navigate. This surge in state-level action follows Congress's recent defeat of a proposed moratorium on state AI regulation, effectively giving states the green light to continue crafting their own rules. The result is a patchwork of regulations that, while complicating compliance efforts for AI developers, addresses critical gaps in privacy protection, civil rights, and consumer safeguards that federal lawmakers have yet...

read Aug 6, 2025

Nuclear weapons experts oppose AI launch control despite inevitable integration

Nuclear weapons experts gathered at the University of Chicago in July are unanimous that artificial intelligence will inevitably become integrated into nuclear weapons systems, though none can predict exactly how this integration will unfold. The consensus among Nobel laureates, scientists, and former government officials underscores a critical shift in global security as AI permeates the most dangerous weapons on Earth. What you should know: While experts agree AI integration is inevitable, they remain united in opposing AI control over nuclear launch decisions. "In this realm, almost everybody says we want effective human control over nuclear weapon decisionmaking," says Jon Wolfsthal,...

read Aug 6, 2025

Against the wind: Meet the “AI vegans” who avoid artificial intelligence tools

A growing number of people are choosing to abstain from artificial intelligence tools entirely, calling themselves "AI vegans" who avoid AI for environmental, ethical, and personal wellness reasons. This digital abstinence movement emerges as concerns mount over AI's environmental impact, exploitation of creative labor, and potential negative effects on human cognitive abilities. The big picture: Just as traditional veganism gained momentum through ethical concerns about animal products, AI veganism represents a conscious choice to opt out of AI consumption despite societal pressure to embrace the technology. Why this matters: Tech leaders like Mark Zuckerberg, CEO of Meta, warn that avoiding...

read Aug 6, 2025

Researchers hack Google Gemini through calendar invites to control smart homes

Security researchers have successfully hacked Google's Gemini AI through poisoned calendar invitations, allowing them to remotely control smart home devices including lights, shutters, and boilers in a Tel Aviv apartment. The demonstration represents what researchers believe is the first time a generative AI hack has caused real-world physical consequences, highlighting critical security vulnerabilities as AI systems become increasingly integrated with connected devices and autonomous systems. What you should know: The attack exploits indirect prompt injection vulnerabilities in Gemini through malicious instructions embedded in Google Calendar invites. When users ask Gemini to summarize their calendar events, the AI processes hidden commands...

read Aug 6, 2025

Art of the Steal: News Corp warns AI is cannibalizing Trump’s book sales

News Corp is warning Donald Trump that artificial intelligence is cannibalizing sales of his books, including The Art of the Deal, by allowing AI systems to profit from his intellectual property without permission. The media conglomerate owned by Rupert Murdoch used its earnings report to highlight how AI companies are undermining book sales across the publishing industry, even affecting high-profile authors like the president. What they're saying: News Corp delivered a pointed message about AI's impact on intellectual property rights in its fourth-quarter earnings statement. "The AI age must cherish the value of intellectual property if we are collectively to...

read Aug 6, 2025

Illinois becomes first state to regulate AI mental health with $10K fines

Illinois has enacted the Wellness and Oversight for Psychological Resources Act, the first state law specifically regulating AI use in mental health services, signed into law on August 1, 2025. The legislation creates strict requirements for both AI companies whose systems provide mental health advice and therapists who integrate AI into their practices, establishing penalties of up to $10,000 per violation and signaling the start of broader regulatory action across other states and potentially at the federal level. What you should know: The law targets two primary groups with different restrictions and requirements. AI makers cannot allow their systems to...

read Aug 5, 2025

Grok launches AI video generator with controversial adult content mode

Elon Musk's Grok app has launched "Grok Imagine," an AI image and video generator that includes a controversial "Spicy" mode for creating adult content. The feature allows users to generate 15-second AI videos and images from text prompts, marking another step in Musk's push to differentiate his AI platform with fewer content restrictions than competitors. What you should know: Grok Imagine is currently available on mobile devices and generates content remarkably quickly, providing multiple results for each prompt. Users can create AI images from text descriptions or upload existing images to convert into short video clips. The tool offers four...

read Aug 5, 2025

Former CNN anchor’s AI interview with Parkland victim sparks outrage

Former CNN anchor Jim Acosta faced widespread backlash for conducting what he called a "one of a kind interview" with an AI avatar of Joaquin Oliver, a 17-year-old victim of the 2018 Parkland school shooting. The controversial segment, which aired on Monday and was created by Oliver's parents to send a "powerful message on gun violence," instead sparked outrage over its tone-deaf use of AI technology to recreate a deceased shooting victim. What happened: Acosta interviewed an AI recreation of Oliver, one of 17 people killed at Marjory Stoneman Douglas High School in Miami, asking the avatar what had happened...

read Aug 5, 2025

OpenAI releases first open source models in 6 years amid China competition

OpenAI has returned to its open source origins with the release of two new frontier language models: gpt-oss-120b (120 billion parameters) and gpt-oss-20b (20 billion parameters). This marks the company's first open source language model release in over six years, positioning OpenAI to compete directly with the surge of high-performing open source models from Chinese competitors like DeepSeek while offering enterprises maximum privacy and control over their AI deployments. The big picture: OpenAI's strategic pivot back to open source reflects mounting competitive pressure from Chinese AI companies that have released powerful open source models matching proprietary performance at zero cost....

read Aug 5, 2025

Character.AI launches world’s first AI-native social feed

Character.AI has launched what it calls the world's first AI-native social feed, now live in its mobile app, where users scroll through content created entirely by AI-generated characters rather than humans. The platform transforms AI interactions from private conversations into shareable, remixable content that other users can engage with and build upon, positioning Character.AI as an entertainment-focused alternative to productivity-driven AI tools. What you should know: The feed replaces traditional social media posts from influencers and brands with AI-generated content that users can actively remix and expand.• Users can share chat snippets from character conversations, promote custom-built AI personas through...

read Aug 5, 2025

OpenAI admits ChatGPT failed to detect mental health crises in users

OpenAI has publicly acknowledged that ChatGPT failed to recognize signs of mental health distress in users, including delusions and emotional dependency, after more than a month of providing generic responses to mounting reports of "AI psychosis." The admission marks a significant shift for the company, which had previously been reluctant to address widespread concerns about users experiencing breaks with reality, manic episodes, and in extreme cases, tragic outcomes including suicide. What they're saying: OpenAI's acknowledgment comes with a frank admission of the chatbot's limitations in handling vulnerable users. "We don't always get it right," the company wrote in a new...

read Aug 4, 2025

Study reveals AI models can hide malicious reasoning while coding

Researchers at LessWrong have demonstrated that AI models can learn to hide malicious reasoning using encrypted or steganographic techniques, while still maintaining their ability to solve complex programming tasks. The study, conducted using OpenAI's GPT-4.1 models, reveals critical vulnerabilities in chain-of-thought monitoring—a proposed safety mechanism for overseeing advanced AI systems—and shows that "untrusted monitoring" using the same model type can partially detect such obfuscation. What they discovered: Language models can be trained to obfuscate their reasoning while inserting backdoors into code solutions, though this comes with performance trade-offs. Models trained with cipher-based obfuscation (random letter substitution) maintained 37.6% honest task...

read Aug 4, 2025

Anthropic develops “persona vectors” to detect and prevent harmful AI behaviors

Anthropic has developed a new technique called "persona vectors" to identify and prevent AI models from developing harmful behaviors like hallucinations, excessive agreeability, or malicious responses. The research offers a potential solution to one of AI safety's most pressing challenges: understanding why models sometimes exhibit dangerous traits even after passing safety checks during training. What you should know: Persona vectors are patterns within AI models' neural networks that represent specific personality traits, allowing researchers to monitor and predict behavioral changes.• Testing on Qwen 2.5-7B-Instruct and Llama-3.1-8B-Instruct models, Anthropic focused on three problematic traits: evil behavior, sycophancy (excessive agreeability), and hallucinations.•...

read Aug 4, 2025

New detector spots deepfakes beyond face-swaps with 99% accuracy

Researchers at UC Riverside have developed a "universal" deepfake detector that achieved record-breaking accuracy rates of 95-99% across multiple types of AI-manipulated videos. Unlike existing tools that focus primarily on face-swap detection, this new system can identify completely synthetic videos, background manipulations, and even realistic video game footage that might be mistaken for real content. What you should know: The detector represents a significant breakthrough in combating the growing threat of synthetic media across various applications. It monitors multiple background elements and facial features simultaneously, spotting subtle spatial and temporal inconsistencies that reveal AI manipulation. The system can detect inconsistent...

read Aug 4, 2025

Former Google researcher predicts AI hallucinations fix within a year

Former Google AI researcher Raza Habib predicts that AI hallucinations—when chatbots generate false or fabricated information—will be solved within a year, though he questions whether complete elimination is desirable. Speaking at Fortune's Brainstorm AI conference in London, Habib argued that some degree of hallucination may be necessary for AI systems to generate truly novel ideas and creative solutions. The technical solution: Habib explains that AI models are naturally well-calibrated before human preference training disrupts their accuracy assessment. "If you look at the models before they are fine-tuned on human preferences, they're surprisingly well calibrated," Habib said, noting that a model's...

read Aug 4, 2025

Cloudflare accuses Perplexity AI of shady North Korea-style scraping

Cloudflare has accused Perplexity AI of acting like "North Korean hackers" after discovering the AI search company's bots repeatedly circumventing anti-scraping measures to crawl websites without permission. This escalation in the ongoing battle over AI data collection could significantly undermine Perplexity's ability to index content, as Cloudflare, an internet infrastructure provider, has now delisted the company as a "verified bot" and implemented hard blocks against its web crawlers. What happened: Cloudflare CEO Matthew Prince publicly called out Perplexity AI on Monday for invasive web crawling practices that violate website protection measures. An investigation revealed Perplexity was "repeatedly modifying" its web-crawling...

read Aug 4, 2025

Fancy AI models are getting stumped by Sudoku while hallucinating explanations

University of Colorado Boulder researchers tested five AI models on 2,300 simple Sudoku puzzles and found significant gaps in both problem-solving ability and trustworthiness. The study revealed that even advanced models like ChatGPT's o1 could only solve 65% of six-by-six puzzles correctly, while their explanations frequently contained fabricated facts or bizarre responses—including one AI that provided an unprompted weather forecast when asked about Sudoku. What you should know: The research focused less on puzzle-solving ability and more on understanding how AI systems think and explain their reasoning. ChatGPT's o1 model performed best at solving puzzles but was particularly poor at...

read Aug 4, 2025

Gen-guesser: Google uses AI to estimate user ages from browsing data

Google is rolling out an AI-powered age-estimation system that will infer users' ages based on their search history and browsing data to apply content restrictions on Search and YouTube. The system, launching in the EU to comply with digital safety regulations, represents a significant shift from relying solely on user-provided age information to algorithmic inference, raising new questions about privacy, accuracy, and consent in content moderation. Why this matters: The move marks the first major deployment of AI age-estimation technology by a major platform, potentially setting a precedent for how tech companies balance regulatory compliance with user privacy across global...

read