LLMs don’t give the same answer to the same input. Until now.

Like a calculator that sometimes says 2+2 is five, depending on when you hit the keys, LLMs have long behaved in ways we came to accept as unpredictable. Now, Mira Murati’s team at Thinking Machines Lab has fixed it.

The issue was not “randomness” or “creativity,” but noise in the infrastructure. Tiny nondeterministic effects inside GPU kernels ripple forward and change completions even when the input is the same. Their fix, naturally explained in jargon, was to design batch-invariant kernels for matmul, attention, and RMSNorm. The bottom line is clear: same input equals same output, every time.

Why it matters:
- Reliability in high-stakes fields: health and finance cannot accept answers that drift with server load.
- Operational savings: deterministic outputs mean caching works, cutting GPU burn.
- Open-source transparency: the fix is published for anyone to use and build on.

This does not solve the correctness of answers, since wrong outputs can still be wrong consistently. But it clears away one of the biggest barriers to making LLMs reliable at scale. Just days ago this was another open mystery in how LLMs work. Now it is progress, another win for science and engineering.

Read more here: https://lnkd.in/d7iNCjzj
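For anyone who wants to see the mechanism rather than take it on faith: the core issue is that floating-point addition is not associative, so the order in which a kernel reduces values (which can shift with how many requests are batched together) nudges the result. Here is a toy NumPy sketch of that effect, my own illustration rather than the team's actual kernel code:

```python
# Toy illustration (not the Thinking Machines kernels): float32 addition is not
# associative, so reducing the same numbers in a different grouping -- the kind
# of change that happens when batch size varies -- gives a slightly different sum.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# One long serial reduction over the whole vector.
sum_whole = np.float32(0)
for v in x:
    sum_whole += v

# Same data, but reduced chunk by chunk, as if split across a different batch layout.
sum_chunked = np.float32(0)
for chunk in x.reshape(8, 512):
    partial = np.float32(0)
    for v in chunk:
        partial += v
    sum_chunked += partial

print(sum_whole, sum_chunked, sum_whole == sum_chunked)
# Typically the two sums differ in the last bits. Propagate such tiny drifts
# through many layers and a sampled token can flip -- which is why making the
# reduction order batch-invariant makes the output repeatable.
```

The fix is the opposite discipline: keep the reduction order fixed regardless of batch composition, so identical inputs take an identical numerical path.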
Reminds me of the old computer adage "garbage in, garbage out". A little worrying if AI needs to be treated with kid gloves; I mean, it is aimed at the masses.
Determinism would make testing easier
So now LLMs can be reliably wrong... actually, that is no surprise - so can people; and we believe that "uncertainty is built into the fabric of the universe."
Determinism across LLM versions?
Very interesting. The divergence between the creative and the mathematical models has meant they haven't been at the forefront of this wave. They seemed more like creative models that speak and understand maths. This could be a massive improvement.
Super interesting and completely logical!
Interesting 🤔
I’m curious whether we will see this implemented at scale, or whether frontier model providers will hold back, concerned about impacting the creativity or randomness that some even call a feature, not a bug, of LLMs.
My biggest issue with this article is the gap between the stated goal of determinism and what the paper is actually describing. To understand it, you need to get your math hat on (and maybe a few beers) and dig into it. What the paper describes is how to minimize the variance of mathematical calculations by controlling input batches. This isn't making the models deterministic; it is eliminating variability through input processing control, which takes out some of the mathematical processing variance.

First, I don't want my GenAI engine to be deterministic. That is what supercomputers do today: straight-line and deterministic. In some cases, I want the outlier probability to show me things I never considered.

Second, the AI engines are acting just like people: if you ask one question in a focused discussion, you get a focused answer. If you ask the same question mixed with 15 other questions in a noisy bar, you get a less focused answer.

Also, if there is a goal of determinism in LLMs, the answer is eventually going to be restricting responses to only the highest-probability response (or something limited by the developers), which is the scariest outcome.
This sounds very interesting! Also, from my experience in tech, a model that is ‘reliably wrong’ is much better to work with than one that is intermittently wrong!