Trends in AI Task Completion


  • Sharada Yeluri

    Engineering Leader


A lot has changed since my #LLM inference article last January; it's hard to believe a year has passed! The AI industry has pivoted from focusing solely on scaling model sizes to enhancing reasoning abilities during inference. This shift is driven by the recognition that simply increasing model parameters yields diminishing returns, and that improving inference capabilities can lead to more efficient and intelligent AI systems.

    OpenAI's o1 and Google's Gemini 2.0 are examples of models that employ #InferenceTimeCompute. Some techniques include best-of-N sampling, which generates multiple outputs and selects the best one; iterative refinement, which lets the model improve its initial answers; and speculative decoding. Self-verification lets the model check its own output, while adaptive inference-time computation dynamically allocates extra #GPU resources to challenging prompts. These methods represent a significant step toward more reasoning-driven inference (a minimal best-of-N sketch follows this post).

    Another exciting trend is #AgenticWorkflows, where an AI agent, a software program running on an inference server, breaks the queried task into multiple small tasks without requiring complex user prompts (prompt engineering may reach end of life this year!). It then autonomously plans, executes, and monitors these tasks. In the process, it may run inference on the model multiple times while maintaining context across the runs (see the agent-loop skeleton below). #TestTimeTraining takes things further by adapting models on the fly: it fine-tunes the model for new inputs, enhancing its performance (a rough test-time-training sketch also follows).

    These advancements can complement each other. For example, an AI system may use an agentic workflow to break down a task, apply inference-time compute to generate high-quality outputs at each step, and employ test-time training to learn from unexpected challenges. The result? Systems that are faster, smarter, and more adaptable.

    What does this mean for inference hardware and networking gear? Previously, most open-source models barely needed one GPU server, and inference was often done on front-end networks or by reusing the training networks. However, as the computational complexity of inference increases, more focus will shift to building scale-up systems with hundreds of tightly interconnected GPUs or accelerators for inference flows. While Nvidia GPUs continue to dominate, other accelerators, especially from hyperscalers, will likely gain traction. Networking remains a critical piece of the puzzle. Can #Ethernet, with enhancements like compressed headers, link retries, and reduced latencies, rise to meet the demands of these scale-up systems? Or will we see a fragmented ecosystem of switches for non-Nvidia scale-up systems? My bet is on Ethernet; its ubiquity makes it a strong contender for the job.

    Reflecting on the past year, it's clear that AI progress isn't just about making things bigger, but smarter. The future looks more exciting as we rethink models, hardware, and networking. Here's to what 2025 will bring!
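    A minimal sketch of best-of-N sampling, one of the inference-time-compute techniques mentioned above. The `generate` and `score` functions are hypothetical stand-ins for a model call and a reward/verifier model, not any particular API:

    ```python
    # Best-of-N sampling sketch: draw N candidates, keep the best-scoring one.
    # `generate` and `score` are hypothetical stand-ins for a model call and a
    # reward/verifier model; a real system would batch these calls.
    import random
    from typing import Callable

    def best_of_n(prompt: str,
                  generate: Callable[[str], str],
                  score: Callable[[str, str], float],
                  n: int = 8) -> str:
        """Draw n candidate completions and return the highest-scoring one."""
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    # Toy usage with stubbed model calls:
    if __name__ == "__main__":
        gen = lambda p: p + " answer-" + str(random.randint(0, 99))
        scr = lambda p, c: float(c.rsplit("-", 1)[-1])  # pretend reward model
        print(best_of_n("2+2=", gen, scr, n=4))
    ```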
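    And a skeleton of the agentic plan/execute/monitor loop described above, carrying context across inference runs. The `llm` function is an assumed placeholder for a single inference call, not a real API:

    ```python
    # Agentic workflow skeleton: decompose the task, then plan/execute/monitor
    # each subtask while accumulating context across inference calls.
    def llm(prompt: str) -> str:
        raise NotImplementedError("swap in a real inference call")

    def run_agent(task: str, max_steps: int = 10) -> str:
        context: list[str] = [f"Task: {task}"]
        # 1. Decompose the task into subtasks (one inference call).
        plan = llm("Break this task into small steps:\n" + task)
        subtasks = [s for s in plan.splitlines() if s.strip()]
        for step, sub in enumerate(subtasks[:max_steps]):
            # 2. Execute the subtask with the accumulated context.
            result = llm("\n".join(context) + f"\nDo step {step + 1}: {sub}")
            context.append(f"Step {step + 1} result: {result}")
            # 3. Monitor: self-verification pass over the model's own output.
            verdict = llm(f"Does this result complete the step? {result}")
            if verdict.strip().lower().startswith("no"):
                context.append(f"Step {step + 1} flagged for retry: {verdict}")
        # 4. Synthesize a final answer from the full context.
        return llm("\n".join(context) + "\nCombine the results into a final answer.")
    ```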
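    Finally, a rough sketch of test-time training: take a few gradient steps on an auxiliary loss for the current input before predicting. The model, auxiliary loss, and hyperparameters here are illustrative assumptions, not the recipe any specific system uses:

    ```python
    # Test-time training sketch: briefly adapt a copy of the model to the
    # current input via an auxiliary self-supervised loss, then predict.
    import copy
    import torch

    def predict_with_ttt(model: torch.nn.Module,
                         x: torch.Tensor,
                         aux_loss_fn,              # e.g. masked reconstruction
                         steps: int = 3,
                         lr: float = 1e-4) -> torch.Tensor:
        adapted = copy.deepcopy(model)          # never mutate the served weights
        opt = torch.optim.SGD(adapted.parameters(), lr=lr)
        adapted.train()
        for _ in range(steps):                  # a few steps on this input only
            opt.zero_grad()
            aux_loss_fn(adapted, x).backward()
            opt.step()
        adapted.eval()
        with torch.no_grad():
            return adapted(x)
    ```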

  • Mrukant Popat

    💥 Igniting Innovation in Engineering | CTO | AI / ML / Computer Vision, OS - operating system, Platform firmware | 100M+ devices running my firmware


AI's Ability to Complete Long Tasks Is Doubling Every 7 Months

    A groundbreaking study reveals that AI models are rapidly improving in their ability to autonomously complete long, complex tasks. The length of tasks (measured by how long they take human experts) that generalist AI agents can complete with 50% reliability has been doubling every ~7 months for the last 6 years. If this exponential trend continues, AI could independently complete week-long tasks within 2-4 years, a shift that could redefine automation, productivity, and workforce dynamics. (A back-of-the-envelope extrapolation follows this post.)

    🔹 Key insights:
    ✅ Current AI models can solve short, structured tasks but struggle with substantive projects.
    ✅ Performance has skyrocketed on benchmarks, yet real-world utility remains limited.
    ✅ Measuring AI by the length of tasks it can complete provides a better forecast of its true capabilities.
    ✅ If trends hold, AI capable of autonomously handling month-long projects could emerge before 2030.

    The implications are massive, from software development to executive assistance to scientific research. But with great power comes great responsibility. How do we prepare for an AI-driven future where machines can handle complex, multi-week tasks?

    📄 Full paper & GitHub repo in comments.
    💡 Are we ready for AI that works on projects lasting weeks or even months?

    #AI #Automation #ArtificialIntelligence #FutureOfWork #MachineLearning #TechTrends
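    For intuition on the doubling math, a small Python extrapolation. The ~1-hour starting horizon is an assumed anchor for illustration; the study reports the trend, not this exact figure:

    ```python
    # Extrapolate a ~7-month doubling trend in task-length horizon.
    # current_hours is an assumption for illustration only.
    def horizon_hours(months_from_now: float,
                      current_hours: float = 1.0,
                      doubling_months: float = 7.0) -> float:
        return current_hours * 2 ** (months_from_now / doubling_months)

    for months in (0, 12, 24, 36, 48):
        h = horizon_hours(months)
        print(f"+{months:2d} months: ~{h:6.1f} hours (~{h / 40:.1f} work weeks)")
    ```

    Under that assumed anchor, a 40-hour (week-long) horizon arrives at roughly 7 * log2(40) ≈ 37 months, consistent with the 2-4 year claim above.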

  • John Bailey

    Strategic Advisor | Investor | Board Member


Researchers at METR just published a new paper showing that the length of tasks AI agents can complete autonomously has been doubling every 7 months since 2019, essentially revealing a "Moore's Law" of sorts that can help us better understand the trajectory of AI capabilities.

    Key Takeaways:
    - To measure AI progress in a way that compares to humans, the study introduces a new metric: the 50%-task-completion time horizon. This represents the longest task an AI agent can complete correctly half the time, based on how long it usually takes a human expert to finish the same task. (A toy sketch of how such a horizon can be estimated follows this post.)
    - AI's ability to complete long, complex tasks has been doubling every 7 months since 2019.
    - If this trend continues, AI agents could independently handle tasks that take humans a month by 2028-2031.
    - The biggest drivers of improvement: better reasoning, tool use, and adaptability, not just bigger models. It will be interesting to see how approaches like OpenAI's and Google's Deep Research impact this.
    - AI still struggles with messy, real-world tasks that require intuition, judgment, and seeking out missing information.

    Paper: https://lnkd.in/d7bW6RTV
    METR thread: https://lnkd.in/dp4-Y64v
    Great thread on the background of the paper from Elizabeth (Beth) Barnes: https://lnkd.in/dfuvi6ZY
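    To make the metric concrete, a toy Python sketch in the spirit of the paper's approach: fit success probability against the log of human task duration, then solve for the duration at which the agent succeeds half the time. The data points are made up for illustration:

    ```python
    # Estimate a 50%-task-completion time horizon from (duration, success) runs.
    # The run data below is fabricated purely to demonstrate the computation.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # (human minutes to complete the task, did the agent succeed?)
    runs = [(2, 1), (5, 1), (15, 1), (30, 1), (60, 0), (120, 1),
            (240, 0), (480, 0), (960, 0)]
    X = np.log2([[m] for m, _ in runs])        # log duration as the feature
    y = np.array([s for _, s in runs])

    clf = LogisticRegression().fit(X, y)
    # p(success) = 0.5 where the logit is zero: w * log2(t) + b = 0
    w, b = clf.coef_[0][0], clf.intercept_[0]
    horizon_minutes = 2 ** (-b / w)
    print(f"50% time horizon ≈ {horizon_minutes:.0f} human-minutes")
    ```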
