Turing-Test-Passing AI implies Aligned AI

A mathematical argument suggests that human-equivalent AI systems, when properly organized, could yield aligned superintelligent systems that preserve human values and governance structures.
Core premise and foundation: The argument builds on a strengthened version of the Turing Test, which posits that for any human, there exists an AI that cannot be distinguished from that human by any combination of machines and humans, even with significant computing power.
- The “Strong Form” Turing Test requires that AI behavior be statistically indistinguishable from human behavior across various mental and physical states (a minimal formalization follows this list)
- Current language models have already demonstrated significant capabilities in human-like interaction, though not yet at this theoretical level
- The argument relies on computationalism – the view that the brain fundamentally processes information in ways that can be replicated
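One way to make this assumption precise (the notation below is ours, not the paper's): for every human there is an AI that no bounded distinguisher can separate from that human beyond an arbitrarily small margin.

```latex
\forall h \in \mathcal{H},\ \forall \varepsilon > 0,\ \exists\, a_h :\quad
\sup_{D \in \mathcal{D}} \bigl|\, \Pr[D(a_h) = 1] - \Pr[D(h) = 1] \,\bigr| < \varepsilon
```

Here $\mathcal{H}$ is the set of humans and $\mathcal{D}$ is the class of computationally bounded distinguishers assembled from any combination of machines and humans.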
Key definitions of friendly AI: The paper presents two distinct definitions for aligned artificial intelligence systems.
- Definition i: An AI is considered friendly if it produces outcomes identical to those of the current human governance system
- Definition ii: An AI is friendly relative to a specific utility function if it achieves the same results as the best possible human government operating within realistic constraints (both definitions are sketched formally below)
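Stated compactly, again in our own notation (with outcomes abstracted into a comparable quantity $\mathrm{Out}(\cdot)$ and $U$ a utility function):

```latex
\begin{aligned}
\text{(i)}\;\;  & F \text{ is friendly}                 &&\iff \mathrm{Out}(F) = \mathrm{Out}(G_{\text{human}}) \\
\text{(ii)}\;\; & F \text{ is friendly w.r.t. } U       &&\iff \mathbb{E}[U(F)] \ \ge\ \max_{T \in \mathcal{T}} \mathbb{E}[U(T)]
\end{aligned}
```

where $G_{\text{human}}$ is the current human governance system and $\mathcal{T}$ ranges over human teams feasible under realistic constraints; the inequality reads "the same results" charitably as "at least as good under $U$".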
The alignment proof: The mathematical argument demonstrates how to construct friendly AI systems through systematic replacement of human decision-makers.
- The proof proceeds by replacing humans one at a time with AI copies, starting from top leadership positions (see the code sketch after this list)
- If any single replacement produced detectably different outcomes, the detector would itself distinguish the AI copy from its human original, contradicting the Strong Turing Test assumption
- The process continues until all relevant human positions are filled with AI equivalents
- The same logic applies to creating optimal teams for specific utility functions
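A minimal sketch of this replacement argument as a procedure; `clone_of`, `outcomes_differ`, and the list-of-positions representation are illustrative assumptions, not the paper's construction:

```python
def align_government(government, clone_of, outcomes_differ):
    """Replace each human decision-maker, top leadership first, with an
    AI functional clone, checking outcomes after every single swap."""
    hybrid = list(government)  # positions ordered with top leadership first
    for i, human in enumerate(government):
        candidate = list(hybrid)
        candidate[i] = clone_of(human)  # an AI indistinguishable from this human
        if outcomes_differ(hybrid, candidate):
            # Any detectable difference would itself distinguish the clone
            # from the human, contradicting the Strong Turing Test premise.
            raise AssertionError("Strong Turing Test assumption violated")
        hybrid = candidate
    return hybrid  # every position is now held by an AI equivalent
```

The same loop, run over candidate teams rather than the current office holders, would correspond to assembling the optimal team for a given utility function, per the last bullet above.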
Practical implications: The approach separates technical challenges from ethical and political considerations.
- Technical focus shifts to creating accurate functional clones of humans rather than black-box superintelligence
- Political and ethical questions become centered on organizing these human-equivalent AIs effectively
- Existing knowledge about human organizational systems becomes directly applicable
- Implementation wouldn’t require actual human replacement, only AI systems consuming the same inputs and producing the same outputs (see the interface sketch below)
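A hypothetical interface making that point concrete (the names DecisionMaker, Office, and decide are ours, purely for illustration): an office holder is modeled as a mapping from briefings to decisions, so an AI functional clone can occupy the role without anything physical changing.

```python
from typing import Protocol


class DecisionMaker(Protocol):
    """Any office holder, human or AI, viewed purely as inputs -> outputs."""

    def decide(self, briefing: str) -> str: ...


class Office:
    """A governance role that sees only decisions, never who produced them."""

    def __init__(self, holder: DecisionMaker) -> None:
        self.holder = holder

    def process(self, briefing: str) -> str:
        return self.holder.decide(briefing)
```

Because `Office` depends only on the `DecisionMaker` protocol, swapping a human holder for an AI clone is invisible to the rest of the system, which is exactly the property the proof exploits.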
Critical considerations: Several potential limitations and counterarguments warrant attention.
- The “best possible human team” might still be suboptimal for complex challenges
- The detection mechanism in the proof might be unnecessarily complex
- Current efforts to pause AI development could limit humanity to purely human capabilities
Future directions and risk mitigation: The proof suggests that moving away from black-box AI systems could enhance safety without sacrificing capability.
- Focus should shift toward developing interpretable, human-like AI systems
- Avoiding “illegible” superintelligence may be crucial for maintaining control
- Enforcement mechanisms for safe AI development remain an open challenge
Looking ahead: While this theoretical framework offers a path toward safe AI development, significant work remains to translate these mathematical concepts into practical implementation strategies and into governance structures that can enforce adherence to these principles.