Andre Leibovici’s Post

Chief AI Officer @ ASI Solutions | Driving Secure AI & Sovereign Platforms for Enterprise & Government | Co-founder: CloudVector, Inzpec, Devonport Ventures| ex-Nutanix, VMware, Citrix

🚨 Why Mira Murati’s Breakthrough Matters, and What It Reveals About LLMs

Today’s LLMs aren’t just sometimes wrong; they’re also not reproducible. Ask the same question 5 times and you can get 5 different answers, even with “deterministic” settings.

Murati’s new lab, Thinking Machines, just showed why: the GPU kernels used in inference aren’t batch-invariant.

🧠 In plain terms: when multiple users send prompts at once, the server batches them for efficiency. Tiny differences in operation order cause floating-point shifts, which cascade into different neuron activations and different answers. If your request runs alone, you might avoid this. In production, it almost never does.

Their fix? New batch-invariant kernels (matmul, attention, RMSNorm) that guarantee: same input → same output, every time.

Why this matters:
• Compliance: reproducibility is mandatory in regulated sectors.
• Cost: stable outputs enable caching, cutting GPU burn.
• Productization: expect enterprise “deterministic modes.”

⚠️ But: determinism ≠ correctness. You may now get the same wrong answer every time. Reliability needs both consistency and correctness.

https://lnkd.in/gYPaE2S6
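The “tiny differences in operation order” above come down to a basic property of floating-point math: addition is not associative. A minimal sketch in plain Python (not the actual GPU kernels) shows how grouping the same numbers differently changes the result:

```python
# Floating-point addition is not associative: the same three numbers,
# summed in two different orders, give two different results.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one evaluation order
right = a + (b + c)  # another evaluation order

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False

# In an inference server, the batch size decides how partial sums are
# grouped inside matmul and attention reductions, so the same prompt can
# take a different "order" on every run. Shifts this small cascade
# through thousands of layers into visibly different tokens.
```

This is why batch-invariant kernels matter: they pin the reduction order regardless of how many requests share the batch, so the grouping (and therefore the output) never changes.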
