
AI experts launch unprecedented challenge for advanced artificial intelligence: Scientists are developing “Humanity’s Last Exam,” a comprehensive test designed to evaluate the capabilities of cutting-edge AI systems and those yet to come.

The initiative’s scope and purpose: The Center for AI Safety (CAIS) and Scale AI are collaborating to create the “hardest and broadest set of questions ever” to assess AI capabilities across various domains.

  • The test aims to push the boundaries of AI evaluation, going beyond traditional benchmarks that recent models have easily surpassed.
  • This project comes in response to rapid advancements in AI, such as OpenAI’s new o1 model, which has “destroyed the most popular reasoning benchmarks,” according to CAIS executive director Dan Hendrycks.

Evolution of AI testing: The current initiative builds upon previous efforts to assess AI capabilities, incorporating more complex and abstract reasoning tasks.

  • In 2021, Hendrycks co-authored papers proposing AI tests that compared model performance with that of undergraduate students.
  • While AI systems initially struggled with these tests, today’s models have “crushed” the 2021 benchmarks, necessitating more challenging evaluation methods.

Crowdsourcing expertise: The organizers are calling for submissions from experts across diverse fields to create a truly comprehensive and challenging exam.

  • Specialists in areas ranging from rocketry to philosophy are encouraged to contribute questions that would be difficult for non-experts to answer.
  • The submission deadline is set for November 1, with contributors eligible for co-authorship on a related paper and prizes of up to $5,000 sponsored by Scale AI.

Maintaining test integrity: To ensure the exam’s effectiveness, organizers are implementing measures to protect the test content and prevent AI systems from being unfairly advantaged.

  • The test content will be kept confidential rather than released to the public.
  • This approach aims to prevent the answers from being incorporated into future AI training data, maintaining the test’s ability to challenge new systems.

Ethical considerations: While the test aims to be comprehensive, the organizers have set clear boundaries on the types of questions that will be included.

  • Notably, questions related to weapons have been explicitly excluded from the exam due to safety concerns about AI potentially acquiring such knowledge.

Implications for AI development and assessment: The creation of “Humanity’s Last Exam” reflects the rapid pace of AI advancement and the need for more sophisticated evaluation methods.

  • This initiative could provide valuable insights into the current capabilities and limitations of AI systems across various domains of human knowledge.
  • The results may inform future AI development strategies and help identify areas where human expertise still surpasses machine intelligence.

Looking ahead: As AI continues to evolve, the challenge of creating meaningful tests becomes increasingly complex.

  • The success of this exam could set a new standard for AI evaluation, potentially influencing how we measure and understand artificial intelligence capabilities in the future.
  • It also raises questions about the long-term implications of AI potentially surpassing human-level performance across a wide range of cognitive tasks.
