back

OpenAI prompts AI models to ‘confess’ when they cheat

The idea is to make LLMs turn themselves in when they don’t follow instructions, potentially reducing errors in enterprise deployments.