When finetuning LLMs on 100k+-token examples in private cloud environments becomes a commodity, that is when the real benefits of large language models for business will start to appear. Putting instructions into a 100k+-token prompt and hoping for the best isn't a viable business strategy. Finetuning on such long prompts (especially when they include the hidden reasoning segment) is currently available only to the LLM builders themselves, because the talent capable of setting up such expensive distributed computing systems is scarce. If you are waiting for what's next in AI: the democratization of long-context finetuning is what's next.
Do you really need distributed compute for only 100k tokens? It seems to me you would be better off paying for a reasonably large server and running locally.
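A back-of-envelope sketch of where that intuition comes from: the Python below assumes a 7B-parameter model trained in bf16 with Adam and gradient checkpointing, on a single 100k-token sequence. Every number is an illustrative assumption, not a measurement.

```python
# Rough memory estimate for full finetuning with 100k-token sequences.
# All defaults are illustrative assumptions (7B model, bf16, Adam,
# gradient checkpointing), not measurements of any specific model.

def finetune_memory_gb(
    n_params_b: float = 7.0,   # model size in billions of parameters (assumption)
    seq_len: int = 100_000,    # tokens in one training example
    n_layers: int = 32,        # transformer depth, typical for a 7B model
    hidden: int = 4096,        # hidden size, typical for a 7B model
    bytes_per_val: int = 2,    # bf16
) -> dict:
    """Very rough per-sequence GPU memory breakdown in GB."""
    gb = 1024 ** 3
    params = n_params_b * 1e9
    weights = params * bytes_per_val / gb
    # Adam keeps fp32 master weights plus two moment tensors: ~12 extra bytes/param.
    optimizer = params * 12 / gb
    gradients = params * bytes_per_val / gb
    # With gradient checkpointing, roughly a few (seq_len, hidden) tensors
    # are kept per layer; this is a crude lower bound, linear in seq_len.
    activations = n_layers * 4 * seq_len * hidden * bytes_per_val / gb
    total = weights + optimizer + gradients + activations
    return {"weights": weights, "optimizer": optimizer,
            "gradients": gradients, "activations": activations, "total": total}

if __name__ == "__main__":
    for name, size in finetune_memory_gb().items():
        print(f"{name:>11}: {size:7.1f} GB")
```

With these defaults it prints roughly 13 GB of weights, ~78 GB of optimizer state, ~13 GB of gradients, and ~100 GB of activations: more than a single 80 GB A100, but on the scale of one multi-GPU server rather than hundreds of cards. A larger base model or parameter-efficient tuning shifts the numbers considerably.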
Democratizing LLMs transforms the landscape for businesses. Access to tools fosters innovation and creativity. 🌍 #AIRevolution
Interesting! Do you mean something like synthetic data generation for building HRM applications, or something different? Andriy Burkov, I would love to hear if there is a specific field or application we could take as an example or analogy.
Making long-context finetuning accessible will be a game-changer for companies looking to leverage LLMs effectively.
What's next in AI needs to be a new architecture. There is only so much shit you can pile on top of shit before it collapses.
Private environments? What does that mean, a local DC? How many NVIDIA A100s would it take to handle 100k tokens? 200 of them?
That's also a great argument for getting smaller models with extended context windows to work for you on premises. For business purposes, no one actually needs a general-purpose model when they could tune a smaller one to make their work easier.
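For the on-premises angle, here is a minimal sketch of what tuning a smaller model can look like, assuming the Hugging Face transformers and peft libraries; the checkpoint name and LoRA hyperparameters are placeholders, not recommendations.

```python
# Minimal sketch of tuning a smaller open model on premises with LoRA adapters.
# The checkpoint name, rank, and target modules below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "some-org/small-long-context-model"  # placeholder, not a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Train low-rank adapters instead of the full weight matrices, so the
# optimizer state stays small even when training sequences are very long.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Because only the low-rank adapters are trained, the optimizer state stays tiny, which is part of what makes long-context tuning plausible on a single on-prem server.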
Isn't it better and cheaper to fund R&D toward neuromorphic and cognition-based computing instead of suggesting the impossible?
What infrastructure challenges block wider access to 100k-token tuning?