Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves “sampling”, a process that converts the language model’s output into a probability distribution and probabilistically selects a token.
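The sampling step described above can be sketched in a few lines. This is a minimal illustration, not any particular inference engine's implementation: the logits are hypothetical values for a toy three-token vocabulary.

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Convert raw logits into a probability distribution (softmax with
    temperature) and probabilistically select a token index from it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # random.choices draws one index weighted by the probabilities
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for a 3-token vocabulary: repeated calls can
# return different tokens, which is why identical prompts can yield
# different completions.
logits = [2.0, 1.0, 0.5]
print(sample_token(logits))
```

Lowering the temperature sharpens the distribution toward the highest-logit token; raising it flattens the distribution toward uniform.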
What might be more surprising is that even when we adjust the temperature down to 0 (thus making the sampling theoretically deterministic, since the LLM then always chooses the highest-probability token, a strategy known as greedy sampling), LLM APIs are still not deterministic in practice (see past discussions here, here, or here). Even when running inference on your own hardware with an OSS inference library like vLLM or SGLang, sampling still isn’t deterministic (see here or here).
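Greedy sampling itself is trivially deterministic: it reduces to an argmax over the logits. A minimal sketch, again using hypothetical logits, shows where the determinism lives and where it can break:

```python
def greedy_sample(logits):
    """Temperature-0 ("greedy") sampling: return the index of the
    highest logit. Given identical logits, the result is always the
    same token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits: two calls on the same inputs agree.
logits = [2.0, 1.0, 0.5]
assert greedy_sample(logits) == greedy_sample(logits)
print(greedy_sample(logits))
```

The sampling step is therefore not where the nondeterminism enters: if the logits produced by the forward pass differ from run to run, even an exact argmax over them yields different tokens.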