When will AI be able to help solve its own alignment problems?

AI alignment? That’s a you problem, artificial intelligence.

Artificial intelligence’s growing capabilities raise a profound question: when might AI systems assist with, or even automate, parts of AI alignment research itself? While current frontier AI models demonstrate remarkable knowledge and outperform human experts on standardized exams, they still struggle with sustained, complex projects that require deep conceptual understanding. This paradox creates an opportunity to apply Metr’s law (the observation that AI systems can progressively automate tasks that take humans a given amount of time t, with t growing over time) to predict when AI might meaningfully contribute to solving the alignment problem.

The capabilities gap: Current frontier AI systems demonstrate impressive knowledge and text prediction abilities while falling short of autonomous project execution.

  • Despite outperforming human experts on exams and knowledge-based tasks at a fraction of the cost, today’s most advanced AI agents cannot reliably handle even relatively basic computer-based work like remote executive assistance.
  • The most sophisticated AI systems possess considerable “expertise” but lack the capacity to independently conduct good research, which requires significant time investment even for purely theoretical work.

The alignment opportunity: Metr’s law provides a potential framework for predicting when AI could meaningfully contribute to alignment research.

  • The central question becomes: at what point will AI systems be able to “automatically do tasks that humans can do in time t” with sufficient capability to advance alignment research?
  • This framing helps distinguish between AI’s impressive pattern-matching abilities and the more complex requirements of conducting original research to solve alignment challenges.
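The "time t" framing lends itself to a simple extrapolation: if the human-time length of tasks AI can complete autonomously doubles on a roughly fixed schedule, one can project when that horizon reaches research-scale projects. The sketch below is purely illustrative; the reference date, the one-hour base horizon, and the seven-month doubling period are placeholder assumptions, not measured values.

```python
import math
from datetime import date, timedelta

DAYS_PER_MONTH = 30.44  # average Gregorian month length

def task_horizon(on_date,
                 base_date=date(2025, 3, 1),  # assumed reference date
                 base_horizon_hours=1.0,      # assumed horizon at base_date
                 doubling_months=7.0):        # assumed doubling period
    """Extrapolated hours of human task time AI can automate on `on_date`,
    assuming the horizon doubles every `doubling_months` months."""
    months_elapsed = (on_date - base_date).days / DAYS_PER_MONTH
    return base_horizon_hours * 2 ** (months_elapsed / doubling_months)

def date_horizon_reached(target_hours,
                         base_date=date(2025, 3, 1),
                         base_horizon_hours=1.0,
                         doubling_months=7.0):
    """Date at which the extrapolated horizon reaches `target_hours`."""
    doublings = math.log2(target_hours / base_horizon_hours)
    days = round(doublings * doubling_months * DAYS_PER_MONTH)
    return base_date + timedelta(days=days)

# Under these placeholder parameters, when does the horizon reach a
# one-month research project (~160 working hours)?
print(date_horizon_reached(160))
```

The point of the sketch is the shape of the question rather than any specific date: the projected year shifts substantially if the assumed doubling period, base horizon, or target task size changes.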

Why this matters: The timeline for AI assistance in alignment research has significant implications for AI safety.

  • If alignment research remains exclusively human-driven for too long while capabilities rapidly advance, we may face scenarios where powerful systems emerge before adequate safety measures.
  • Conversely, if AI can meaningfully assist with alignment research relatively soon, it could help accelerate safety work to keep pace with capability development.

The critical question: The article frames a key consideration for the field through Metr’s law.

  • The central inquiry becomes determining the threshold time t at which AI can perform tasks that humans can complete in time t, where those tasks constitute meaningful alignment research.
  • This frames the debate around when AI might cross from being merely knowledgeable about alignment to being practically helpful in solving it.

How far along Metr's law does AI need to be before it can start automating, or helping with, alignment research?
