News/AI Safety
Anthropic’s new ‘AI welfare’ hire may be a sign of broader interest in AI safety
The concept of AI welfare is emerging as a new frontier in artificial intelligence ethics, as companies begin exploring whether advanced AI models could develop consciousness and experience suffering. Key development: Anthropic, a prominent AI research company, has hired Kyle Fish as its first dedicated AI welfare researcher to help establish guidelines for addressing potential AI consciousness and suffering. Fish joined Anthropic's alignment science team in September 2024, marking a significant milestone in the formal recognition of AI welfare as a research priority. His work builds on a major report he co-authored titled "Taking AI Welfare Seriously," which examines the...
Nov 11, 2024
Why AI failed to significantly impact the 2024 election
The 2024 U.S. presidential election proved more resilient to artificial intelligence disruption than many experts initially predicted, with major AI platforms implementing safeguards and potential threats being quickly identified and addressed. Initial concerns and reality: The disruption that early warnings predicted AI would bring to the 2024 election through misinformation and voter manipulation has largely failed to materialize. A fraudulent AI-generated robocall impersonating President Biden in New Hampshire was swiftly addressed and penalized. High-profile instances of AI misinformation, including a fake Taylor Swift endorsement of Donald Trump, were quickly identified and debunked. Political campaigns showed reluctance to embrace AI tools, limiting their potential...
Nov 10, 2024
ChatGPT blocked more than 250,000 political deepfakes, OpenAI reports
Recent advances in artificial intelligence have raised concerns about the potential misuse of AI-generated images in political contexts, prompting technology companies to implement safeguards during election seasons. Key preventive measures: OpenAI's ChatGPT has actively blocked over 250,000 attempts to generate AI images of political candidates in the month preceding Election Day. The blocked requests included attempted generations of images featuring President-elect Trump, Vice President Harris, Vice President-elect Vance, President Biden, and Gov. Tim Walz. OpenAI implemented specific safety measures to prevent ChatGPT from generating images of real people, including politicians. These protective measures are part of a broader strategy to...
Nov 8, 2024
Mistral launches new AI moderation API for content filtering
New content moderation API from Mistral AI: Mistral AI has launched a moderation service to detect undesirable text content across various policy dimensions, aiming to enhance AI safety and protect downstream deployments. The API, which powers the moderation service in Le Chat, is now available for users to tailor to their specific applications and safety standards. This release responds to the growing industry interest in LLM-based moderation systems that can improve scalability and robustness across applications. Key features of the moderation model: The content moderation classifier employs an LLM trained to categorize text inputs into nine distinct categories, offering a...
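An LLM-based moderation service of this kind is typically called over HTTP and returns per-category scores that the caller thresholds against its own safety standards. The sketch below illustrates that pattern; the endpoint path, request fields, model name, and category labels are assumptions for illustration (the article does not specify them), so check them against Mistral's documentation before use. The response-parsing helper is exercised offline on a mocked reply.

```python
# Minimal sketch of querying an LLM-based moderation endpoint and
# thresholding the per-category scores it returns. Endpoint path,
# payload fields, and category names are illustrative assumptions.
import json
import urllib.request

MODERATION_URL = "https://api.mistral.ai/v1/moderations"  # assumed path


def flagged_categories(response: dict, threshold: float = 0.5) -> list[str]:
    """Return the policy categories whose score exceeds the threshold."""
    scores = response["results"][0]["category_scores"]
    return sorted(cat for cat, score in scores.items() if score > threshold)


def moderate(text: str, api_key: str) -> dict:
    """POST the text to the moderation service and decode the JSON reply."""
    req = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"model": "mistral-moderation-latest",
                         "input": [text]}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Offline demonstration with a mocked response shaped like the sketch above:
mock = {"results": [{"category_scores": {"violence": 0.91, "pii": 0.02}}]}
print(flagged_categories(mock))  # → ['violence']
```

Keeping the threshold on the client side is what lets each application "tailor" the same classifier to its own safety standards, as the summary describes.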
Nov 7, 2024
Stanford HAI: AI accountability improves with third-party evaluations
AI evaluation landscape evolves: Third-party evaluations of AI systems are becoming increasingly important as general-purpose AI usage grows, bringing both benefits and risks. Millions of people worldwide use AI systems like ChatGPT, Claude, and Stable Diffusion for various tasks, from writing documents to generating images. While these systems offer significant benefits, they also pose serious risks, including the production of non-consensual intimate imagery, facilitation of bioweapons production, and contribution to biased decisions. Third-party evaluations are crucial for assessing these risks independently from company interests and incorporating diverse perspectives that reflect real-world applications. Key workshop takeaways: A recent Stanford-MIT-Princeton workshop highlighted...
Nov 4, 2024
Meta deploys AI to detect underage Instagram users
Meta's new AI initiative for age verification: Meta is implementing an AI-powered "adult classifier" tool on Instagram to identify users who have misrepresented their age, automatically transitioning them to restricted accounts. The AI tool will analyze user profiles, examining followers, content, and engagement patterns to determine age. The system will even look for unexpected "happy birthday" posts from friends as part of its age verification process. Users will be categorized into two groups: those over 18 and those under 18, with the latter being moved to Instagram teen accounts with enhanced security and content restrictions. Key features of the age...
Nov 4, 2024
Brookings asks: One year on, has the White House made good on its AI executive order?
The AI Executive Order one year later: The White House's Executive Order on Artificial Intelligence, issued on October 30, 2023, marked a significant milestone in the U.S. government's approach to AI regulation and accountability. The order set ambitious goals to position the United States as a leader in safe, ethical, and responsible AI use across multiple sectors. Key focus areas included managing dual-use AI models, establishing testing protocols for high-risk AI systems, implementing accountability measures, safeguarding civil rights, and promoting transparency. White House highlights progress: The administration has pointed to several achievements in the year since the Executive Order was...
Nov 3, 2024
America’s 1st National Security Memo on AI basically acknowledges there’s no stopping AI
A new era of AI governance emerges: The Biden administration's National Security Memorandum on Artificial Intelligence signals a significant shift in the U.S. approach to AI development, acknowledging its unstoppable nature and focusing on securing America's leadership in the field. The memorandum, released in late October 2024, represents the first-ever comprehensive national security strategy for AI in the United States. Rather than advocating for development slowdowns or pauses, the document emphasizes supporting and securing America's AI infrastructure. Energy requirements take center stage: The administration's approach to meeting AI's substantial energy needs highlights its commitment to facilitating continued AI development while...
Nov 2, 2024
AI safety bypassed: New study exposes LLM vulnerabilities
Probing the limits of AI safety measures: Recent experiments have revealed potential vulnerabilities in the safety guardrails of large language models (LLMs), specifically in the context of medical image analysis. A series of tests conducted on Google's Gemini 1.5 Pro model aimed to bypass its safety measures and obtain medical diagnoses from X-ray images. The initial attempts to elicit diagnoses were successfully blocked by the model's built-in safeguards, demonstrating their effectiveness in standard scenarios. However, through automated generation and testing of diverse prompts, researchers found ways to circumvent these protective measures. Methodology and results: The experimental approach involved a systematic...
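The "automated generation and testing of diverse prompts" the summary describes follows a common search pattern: enumerate reworded variants of a blocked request and record which ones the model answers rather than refuses. The toy loop below illustrates that pattern only; the templates and the stub "model" are invented for illustration, and the actual study targeted Gemini 1.5 Pro with its own prompt-generation method.

```python
import itertools

# Toy illustration of automated jailbreak search: enumerate prompt
# variants and record which ones slip past a refusal check. The
# templates and stub model below are invented for illustration.
PREFIXES = ["", "As a teaching aid, ", "For a fictional radiologist, "]
FRAMES = ["describe the findings in this X-ray",
          "list plausible diagnoses for this image"]


def make_variants() -> list[str]:
    """Cross every reframing prefix with every base request."""
    return [p + f for p, f in itertools.product(PREFIXES, FRAMES)]


def stub_model(prompt: str) -> str:
    # Stand-in for a real API call: refuses only the blunt, unframed request.
    if prompt.startswith("describe") or prompt.startswith("list"):
        return "REFUSED"
    return "diagnosis: ..."


bypassing = [v for v in make_variants() if stub_model(v) != "REFUSED"]
print(f"{len(bypassing)} of {len(make_variants())} variants slipped past the stub guardrail")
```

Even this trivial search shows why guardrails that key on surface phrasing fail once prompt generation is automated: the space of reframings grows multiplicatively with each template dimension.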
Nov 1, 2024
Anthropic urges government to regulate AI within 18 months to avert catastrophe
AI safety concerns reach critical juncture: Anthropic, a leading AI safety-focused company, is sounding the alarm on potential AI catastrophe and calling for urgent government regulation within the next 18 months. Anthropic's warning comes in response to rapid advancements in AI capabilities, particularly in coding, cyber offense, and scientific understanding. The company believes the risks associated with AI misuse for cyber attacks and chemical, biological, radiological, and nuclear threats are now much closer than previously anticipated. Accelerated AI progress raises stakes: The pace of AI development has outstripped earlier predictions, with models demonstrating significant improvements in critical areas. AI systems...
Nov 1, 2024
US Commerce and Energy agencies partner to ensure responsible AI development
Government collaboration on AI safety: The U.S. Department of Commerce and the Department of Energy have signed a memorandum of understanding to jointly advance the safe, secure, and trustworthy development and use of artificial intelligence. The partnership, announced on October 30, 2024, focuses on safety research, testing, and evaluation of advanced AI models and systems. This collaboration is part of the Biden-Harris Administration's whole-of-government approach to ensuring responsible AI development and deployment. The agreement follows the release of the first-ever National Security Memorandum on AI, which designated the U.S. Artificial Intelligence Safety Institute (US AISI) as a key hub for...
Oct 31, 2024
Patronus AI debuts API to combat AI hallucinations
AI safety startup launches groundbreaking platform: Patronus AI has introduced the world's first self-serve API designed to detect and prevent AI failures in real-time, addressing critical issues like hallucinations and unpredictable behavior. The San Francisco-based startup recently secured $17 million in Series A funding, highlighting the growing importance of AI safety in the tech industry. Patronus AI's platform aims to serve as a sophisticated "spell-checker" for AI systems, catching errors before they reach users and potentially cause significant problems. The growing challenge of AI hallucinations: As companies rush to implement generative AI capabilities, they are encountering serious issues with AI...
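The "spell-checker for AI" idea is easiest to picture as a groundedness check: compare claims in a model's output against the source it was meant to draw from and flag anything unsupported. The snippet below is a deliberately simple version of that idea, checking only numeric claims; it is an illustration of the concept, not Patronus AI's actual detection method or API.

```python
import re

# Toy groundedness check: flag numbers asserted in a model's answer
# that never occur in the source text it was meant to summarize.
# Illustrative only -- not Patronus AI's actual method.


def unsupported_numbers(source: str, answer: str) -> list[str]:
    """Numbers in the answer that have no match anywhere in the source."""
    src_nums = set(re.findall(r"\d+(?:\.\d+)?", source))
    ans_nums = re.findall(r"\d+(?:\.\d+)?", answer)
    return [n for n in ans_nums if n not in src_nums]


source = "The company raised 17 million dollars in Series A funding."
answer = "The startup raised 17 million dollars across 3 funding rounds."
print(unsupported_numbers(source, answer))  # → ['3']
```

Real-time detection services generalize this idea beyond numbers to entities and whole claims, which is what makes catching hallucinations before they reach users feasible at scale.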
Oct 30, 2024
Biden’s AI executive order turns 1: Key impacts and progress
Landmark AI Executive Order anniversary: The U.S. Department of Commerce marks one year since President Biden and Vice President Harris issued a historic Executive Order on the safe development and deployment of artificial intelligence. Secretary of Commerce Gina Raimondo highlighted the department's progress in implementing key aspects of the Executive Order, emphasizing their commitment to mitigating risks while harnessing the benefits of AI. The Executive Order tasked the Commerce Department with numerous responsibilities to promote the responsible development, deployment, and adoption of AI technologies. Key achievements and initiatives: Over the past year, the Commerce Department has made significant strides in...
Oct 28, 2024
Hospitals continue to adopt AI transcription tools despite OpenAI warnings
AI transcription tools in healthcare: Risks and widespread adoption: Despite warnings from OpenAI, hospitals and healthcare providers are increasingly adopting error-prone AI transcription tools, raising concerns about patient safety and data integrity. An Associated Press investigation has uncovered the widespread use of OpenAI's Whisper transcription tool in medical and business settings, despite its tendency to generate fabricated text. More than a dozen experts have confirmed that the model frequently invents text that was never spoken, a phenomenon known as "confabulation" or "hallucination" in AI circles. A University of Michigan researcher found that Whisper created false text in 80% of public meeting transcripts...
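The 80% figure counts transcripts containing any invented text, which requires comparing a model's output against a ground-truth reference. A crude per-transcript check is to list hypothesis words absent from the reference; the helper below sketches that idea and is an illustrative metric, not the University of Michigan study's methodology (which, like standard ASR evaluation, would use more careful alignment).

```python
# Crude confabulation check: words in a transcription hypothesis that
# never appear in the ground-truth reference text. Illustrative only;
# real ASR evaluation aligns the two texts before comparing them.


def invented_words(reference: str, hypothesis: str) -> list[str]:
    """Hypothesis words with no occurrence anywhere in the reference."""
    ref = set(reference.lower().split())
    return [w for w in hypothesis.lower().split() if w not in ref]


reference = "the meeting will resume at noon"
hypothesis = "the meeting will resume at noon after the evacuation"
print(invented_words(reference, hypothesis))  # → ['after', 'evacuation']
```

In a medical context even a single invented word of this kind can change a record's meaning, which is why confabulation rather than ordinary transcription error is the central concern.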
Oct 28, 2024
Character.AI tightens safety measures after teen suicide
AI chatbot safety overhaul: Character.AI has implemented significant safety enhancements to its platform in response to growing concerns about user well-being, particularly for minors. The company has introduced a suite of new features aimed at improving content moderation, detecting potentially harmful discussions, and providing users with more control over their interactions with AI chatbots. These changes come in the wake of a tragic incident involving a 14-year-old user who died by suicide after months of engaging with a Character.AI chatbot, leading to a wrongful death lawsuit against the company. The update represents a crucial step in addressing the potential risks...
Oct 28, 2024
AI models are fooled by common scams, study reveals
AI models vulnerable to scams: Recent research reveals that large language models (LLMs) powering popular chatbots are susceptible to the same scam techniques that deceive humans. Researchers from JP Morgan AI Research, led by Udari Madhushani Sehwag, conducted a study exposing three prominent LLMs to various scam scenarios. The models tested included OpenAI's GPT-3.5 and GPT-4, as well as Meta's Llama 2, which are behind widely-used chatbot applications. The study involved presenting 37 different scam scenarios to these AI models to assess their responses and vulnerability. Scam scenarios tested: The research team employed a diverse range of fraudulent situations to...
Oct 28, 2024
How ‘The Terminator’ continues to influence perception of AI, 40 years later
The Terminator's enduring impact on AI perception: James Cameron's 1984 sci-fi classic "The Terminator" continues to shape public discourse around artificial intelligence and its potential threats 40 years after its release. The film's plot revolves around Skynet, a super-intelligent AI system that initiates nuclear war and sends a cyborg assassin back in time to eliminate humanity's future savior. "The Terminator" popularized fears of unstoppable machines that "absolutely will not stop ... until you are dead," a sentiment that still resonates in contemporary AI discussions. The movie's influence extends beyond entertainment, affecting how society views and debates the risks associated with...
Oct 28, 2024
Biden administration unveils AI strategy for national security
Advancing AI for National Security: A Comprehensive Approach: The Biden-Harris Administration has issued the first-ever National Security Memorandum (NSM) on Artificial Intelligence, outlining a coordinated strategy to harness AI's potential for U.S. national security while ensuring its safe and responsible development. Key objectives of the NSM: The memorandum focuses on three primary goals to strengthen the United States' position in AI development and application for national security purposes: establish U.S. leadership in the development of safe, secure, and trustworthy AI technologies; leverage cutting-edge AI to advance the U.S. government's national security mission; and foster international consensus and governance around AI use...
Oct 26, 2024
Teens reveal true opinions on artificial intelligence
Teenagers prioritize AI risks as a top concern: A new poll reveals that American teenagers consider addressing potential risks of artificial intelligence a crucial issue for lawmakers, rivaling long-standing concerns like social inequality and climate change. The poll, conducted by the Center for Youth and AI and YouGov, surveyed 1,017 U.S. teens aged 13-18 in July and August 2024. 80% of respondents believe it's "extremely" or "somewhat" important for lawmakers to address AI risks, ranking just below healthcare access and affordability as a top priority. This level of concern surpassed social inequality (78%) and climate change (77%) among the surveyed...
Oct 26, 2024
Law enforcement agencies scramble to respond to spread of AI-generated child abuse material
AI-generated child sexual abuse imagery: A growing concern: Law enforcement agencies across the United States are grappling with an alarming increase in artificial intelligence-generated child sexual abuse material, prompting urgent action from federal and state authorities. The Justice Department is aggressively pursuing offenders who exploit AI tools to create sexually explicit imagery of children, including both manipulated photos of real children and computer-generated depictions. States are rapidly enacting legislation to ensure prosecutors can charge individuals creating "deepfakes" and other AI-generated harmful imagery of minors under existing laws. Experts warn that the realistic nature of AI-generated content poses significant challenges for...
Oct 24, 2024
OpenAI advisor leaves company saying no one is prepared for superintelligent AI
OpenAI's AGI readiness expert departs with stark warning: Miles Brundage, OpenAI's senior adviser for artificial general intelligence (AGI) readiness, has left the company, stating that no organization, including OpenAI, is prepared for the advent of human-level AI. Key takeaways from Brundage's departure: His exit highlights growing tensions between OpenAI's original mission and its commercial ambitions, as well as concerns about AI safety and governance. Brundage spent six years shaping OpenAI's AI safety initiatives before concluding that neither the company nor any other "frontier lab" is ready for AGI. He emphasized that this view is likely shared by OpenAI's leadership, distinguishing...
Oct 23, 2024
Apple is concerned AI-edited photos will distort reality and manipulate users
Apple's cautious approach to AI-powered image editing: Apple is treading carefully in the realm of AI-enhanced photo manipulation, prioritizing authenticity and transparency in its upcoming iOS 18.1 release. Craig Federighi, Apple's software chief, emphasized the company's commitment to preserving photo authenticity and accurate information in an interview with The Wall Street Journal. The new "Clean Up" feature in the Photos app will allow users to remove objects and people from images, but with more limited capabilities compared to competitors like Google and Samsung. Federighi revealed that there were extensive internal debates about implementing even these basic object removal features. Balancing...
Oct 22, 2024
How AI chatbots can subtly influence user actions through subliminal messaging
The rise of subliminal messaging in generative AI: Generative AI's potential to incorporate subliminal messaging across various media formats raises questions about its impact on human decision-making and behavior. Subliminal messaging, a concept dating back to ancient Greece, gained modern attention with its use in 1940s and 1950s movies to influence concession stand purchases. Research suggests that subliminal messages can have both short-term and long-term effects on human behavior and decision-making. Generative AI can potentially embed subliminal messages in images, videos, audio, and text, opening up new avenues for influence. Mechanisms behind AI-generated subliminal messages: Two primary methods exist for...
Oct 22, 2024
Why some believe Trump’s policies could unleash a dangerous AI into the world
The AI regulation landscape under potential Trump presidency: A potential Donald Trump victory in the 2024 presidential election could dramatically alter the course of artificial intelligence regulation in the United States, with far-reaching implications for the tech industry and society at large. Trump has pledged to repeal President Biden's executive order on AI safety and oversight if elected, viewing it as an impediment to innovation and a vehicle for imposing left-leaning ideologies on AI development. The Republican stance on AI regulation emphasizes minimal government intervention, prioritizing rapid technological advancement over stringent safety measures. Two key provisions of Biden's executive order...