×
Written by
Published on
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

The large language models (LLMs) powering chatbots like ChatGPT are capable of impressive feats, but still routinely produce errors and can be made to behave in undesirable or harmful ways. Making these AI systems safe and robust to misuse is a critical challenge as they become more widely deployed.

Vulnerabilities rooted in fundamental properties of LLMs: Many of the problems with LLMs stem from how they work – by predicting the most likely next word based on statistical patterns in their training data:

  • Performance can fluctuate wildly depending on how common the output, task or input text is on the internet used for training, leading to puzzling failures on things humans find simple.
  • Determined users can exploit these properties to create “adversarial examples” – prompts carefully crafted to make models ignore their safety training and produce harmful outputs.

Alignment techniques aim to reduce undesirable behaviors: LLM developers use various approaches to bring the models’ behaviors more in line with human values:

  • Reinforcement learning from human feedback (RLHF) fine-tunes models by encouraging good responses and penalizing bad ones according to human preferences.
  • Additional “guardrail” systems can be added to block harmful outputs that bypass initial safety training.
  • However, making models both safe and useful is challenging, and safety measures are still routinely bypassed by determined attackers in a cat-and-mouse game.

Debate over scaling as a solution: There is disagreement about whether simply making LLMs larger and training them on more data will solve these issues:

  • Some argue that with enough scaling, failures will become so infrequent as to be a non-issue.
  • Others believe vulnerabilities are inherent to how LLMs work, and scaling could make models better at bypassing the very safety measures meant to constrain them.

Chaperone models may be needed for real-world use: Given that misuse seems impossible to entirely prevent, many believe LLMs should not be deployed without additional protective systems:

  • Combinations of rule-based algorithms and task-specific machine learning models could form a “sieve” to catch different failure modes.
  • While still not foolproof, such systems are harder to deceive than any single model alone.
  • An external system of verification and validation could be required to demonstrate LLMs’ safety before real-world use.

Broader context and implications: As impressive as recent advances in AI have been, models like ChatGPT still have major shortcomings when it comes to reliability and robustness. While techniques like RLHF show promise in aligning AI systems with human values, vulnerabilities rooted in the fundamental nature of language models mean that misuse remains an ever-present danger.

As these models become more ubiquitous, it’s critical that the public not be endangered by AI whose safety has been taken for granted. Just as drugs must demonstrate their safety and efficacy through clinical trials before deployment, companies developing and deploying AI systems may need to be held to similar standards through a rigorous system of external validation. While building perfectly safe AI may never be possible, the technologies exist to substantially reduce risks – but only if developers are required to use them. With the right combination of technical innovation and sensible governance, the worst outcomes are not inevitable.

AI is vulnerable to attack. Can it ever be used safely?

Recent News

How to Use Pixel Studio to Generate AI Images on the Google Pixel 9

Google's Pixel 9 introduces AI-powered image creation through the Pixel Studio app, enabling users to generate custom visuals from text prompts and edit existing photos.

AI’s Insatiable Need for Energy is Presenting Big Investment Opportunities

The rapid expansion of AI-driven data centers is straining US power infrastructure, requiring over $500 billion in investments and potentially consuming 12% of national electricity by 2030.

AI Tutors Double Student Learning in Harvard Study

Students using an AI tutor demonstrated twice the learning gains in half the time compared to traditional lectures, suggesting potential for more efficient and personalized education.