We’ve developed a new way to train small AI models whose internal mechanisms are easier for humans to examine and understand.

Language models like the ones behind ChatGPT have complex, sometimes surprising internal structures, and we don’t yet fully understand how they work.

This approach is an early step toward closing that gap, and part of a broader effort across OpenAI to make our systems more interpretable: developing methods that help us understand why a model produced a given output. In some cases that means examining the model’s step-by-step reasoning; in others, it means trying to reverse-engineer the small circuits inside the network.

There’s still a long path to fully understanding the complex behaviors of our most capable models.

https://lnkd.in/gqWJyw_b