Engineering a performant AI system is all about tradeoffs. As one example, when creating a vector store over which to perform retrieval-augmented generation, what size of embedding should you choose? Researchers at DeepMind sought to characterize the limitations of embedding-based retrieval systems as a function of embedding size, providing both theoretical analyses and a new benchmark called LIMIT. In this talk we'll discuss the paper, the broader context, and applications.

On the Theoretical Limitations of Embedding-Based Retrieval
Paper: https://lnkd.in/g7ur79Ze
Repo: https://lnkd.in/gkQdkjen
Factual Knowledge Acquisition in Pretraining: https://lnkd.in/gvanC-HE
Hybrid Search: https://lnkd.in/gQPsZSCq
Matryoshka Models: https://lnkd.in/gP-ZduUi
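One practical way to explore that size question: models trained with a Matryoshka objective (see the Matryoshka Models link above) pack the most information into the leading dimensions, so you can truncate stored embeddings and measure how retrieval quality degrades. A minimal sketch, assuming a sentence-transformers model trained with a Matryoshka objective (the model name below is a hypothetical placeholder):

```python
# Sketch: measure hit@1 as embeddings are truncated to smaller dimensions.
# MODEL_NAME is a placeholder; substitute any Matryoshka-trained embedding model.
import numpy as np
from sentence_transformers import SentenceTransformer

MODEL_NAME = "your-matryoshka-embedding-model"  # hypothetical placeholder
model = SentenceTransformer(MODEL_NAME)

docs = ["the cat sat on the mat", "global stock markets fell today",
        "how to bake sourdough bread", "an introduction to graph algorithms"]
queries = ["feline resting on a rug", "market downturn news",
           "bread baking recipe", "shortest path basics"]
relevant = [0, 1, 2, 3]  # index of the correct doc for each query

doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(queries, normalize_embeddings=True)

for d in (32, 64, 128, 256, doc_emb.shape[1]):
    # Matryoshka training makes the leading dims the most informative,
    # so truncation plus re-normalization yields a valid smaller embedding.
    D = doc_emb[:, :d] / np.linalg.norm(doc_emb[:, :d], axis=1, keepdims=True)
    Q = q_emb[:, :d] / np.linalg.norm(q_emb[:, :d], axis=1, keepdims=True)
    hits = ((Q @ D.T).argmax(axis=1) == np.array(relevant)).mean()
    print(f"dim={d:4d}  hit@1={hits:.2f}")
```

Run over your own corpus, the resulting curve shows the smallest dimension that still meets your quality bar, which is exactly the tradeoff the talk is about.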
-
🔥 Dijkstra Defeated? A New Shortest Path Algorithm Takes the Lead! Hey everyone! Just came across an exciting breakthrough in graph algorithms that I had to share. For decades, Dijkstra’s algorithm was the gold standard for finding shortest paths in graphs, but now researchers have developed a faster, deterministic algorithm that beats it—at least for sparse graphs! Instead of sorting everything like Dijkstra does, this new method cleverly breaks the graph down into smaller parts and uses smart pivot selection to skip a lot of unnecessary work. Imagine slicing through complexity with divide-and-conquer magic, combining the best of Dijkstra and Bellman-Ford. The result? Way faster shortest path calculations with a cool new time complexity of O(m log^(2/3) n)! This isn’t just theory—this could supercharge AI, network routing, and real-time systems where every millisecond counts. As a student fascinated by algorithms, I’m pumped to see how this changes the game! Here is a great interactive demo for shortest path algorithms, including Dijkstra’s algorithm: https://lnkd.in/d-BVFwGB #Algorithms #GraphTheory #DijkstraDefeated #AI #MachineLearning #TechInnovation
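For context, here's the classical baseline the new result is measured against. Binary-heap Dijkstra runs in O((m + n) log n) because every settled vertex effectively passes through a priority queue; the new algorithm's whole point is sidestepping that sorting bottleneck. A minimal sketch of the baseline in Python (the new algorithm itself is far more involved):

```python
import heapq

def dijkstra(graph, source):
    """Classical Dijkstra with a binary heap: O((m + n) log n).

    graph: {u: [(v, weight), ...]} adjacency list with non-negative weights.
    Returns a dict of shortest distances from source.
    """
    dist = {source: 0}
    heap = [(0, source)]                       # (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):      # stale entry, skip
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                   # relax edge u -> v
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 2, 'c': 3}
```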
-
It’s tempting to throw AI at every problem, but sometimes the old-school way is still the best. You have to analyze the input space: AI makes sense when the problem is truly nondeterministic or the space is too large to handle exhaustively, so a probabilistic approach adds value. But if the problem is deterministic, the smartest move is to pick the optimal data structure and algorithm. It’ll save you both money and latency.
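A concrete version of that decision rule, as a rough sketch: serve exact, known inputs from a plain dictionary in O(1), and only pay for a model call when the input falls outside the deterministic space (llm_answer below is a hypothetical stand-in for a real model client):

```python
# Sketch: deterministic fast path first, probabilistic fallback second.
KNOWN_ANSWERS = {
    "store hours": "Open 9am-6pm, Monday through Saturday.",
    "return policy": "Returns accepted within 30 days with receipt.",
}

def llm_answer(question: str) -> str:
    # Hypothetical stand-in for a real model call (slow, costly, probabilistic).
    return f"[LLM response for: {question}]"

def answer(question: str) -> str:
    key = question.strip().lower()
    if key in KNOWN_ANSWERS:           # deterministic: O(1), free, exact
        return KNOWN_ANSWERS[key]
    return llm_answer(question)        # nondeterministic input: pay for the model

print(answer("Store hours"))  # dict hit, no model call
print(answer("Can I return a gift without a receipt?"))  # falls through to the LLM
```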
-
A New Breakthrough in AI Compression: Chain-of-Thought-Based Pruning

I spent this weekend going through one of the most fascinating research papers I've read in a while, and trust me, this one is worth your time.

Paper: Reasoning Models Can Be Accurately Pruned via Chain-of-Thought Reconstruction
By: MIT
Full paper (10 pages): https://lnkd.in/dMXv7PiG

Here's the quick story. Most large reasoning models (like DeepSeek-R1) are powerful but painfully expensive to run. The more they think, the longer their chain-of-thought grows, and the more compute they need. But here's the twist: when you prune (compress) these models using traditional methods, they not only become less accurate, they often get slower too, because they start generating longer, messier reasoning steps.

This new research changes the game. It introduces RAC, Reasoning-Aware Compression. Instead of pruning based only on input data, RAC uses the model's own reasoning traces during calibration. Basically, it "learns how it thinks" and keeps the parts that actually matter for reasoning.

The result?
- Up to 50% smaller models without losing accuracy
- Faster inference (sometimes 4x faster!)
- Reasoning quality that's almost identical to the original

Why this matters: this is not just a research trick, it's a deployment-level breakthrough. It means we can run powerful reasoning models on smaller hardware, at lower cost, and at real-world scale. And it's another reminder of something I deeply believe: the future of AI isn't just about making models smarter. It's also about making them efficient, accessible, and deployable without losing what makes them powerful.

My takeaway: if you're working on reasoning agents, AI infrastructure, or cost optimization, this paper is a must-read. It's 10 pages, but it might just reshape how you think about model compression.

Comment your thoughts! Repost to help your network :)

#AIResearch #ReasoningModels #ChainOfThought #ModelCompression
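The calibration idea at the heart of the paper is easy to sketch even without the full RAC pipeline: instead of computing pruning statistics on generic text, first let the model generate its own chain-of-thought traces, then collect activation statistics while replaying those traces. A rough sketch with Hugging Face transformers; the pruning score here is a simple weight-times-activation heuristic (Wanda-style), not the paper's actual method, and the model name is just a small stand-in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Step 1: calibration data = the model's OWN reasoning traces, not generic text.
prompts = ["Q: A train travels 60 km in 45 minutes. What is its speed in km/h? "
           "Think step by step.\nA:"]
traces = []
for p in prompts:
    out = model.generate(**tok(p, return_tensors="pt"), max_new_tokens=128)
    traces.append(tok.decode(out[0], skip_special_tokens=True))

# Step 2: record input-activation magnitudes on a target layer while replaying.
layer = model.model.layers[0].mlp.down_proj   # one linear layer, for illustration
act_norm = torch.zeros(layer.in_features)
def hook(mod, inp, outp):
    global act_norm
    act_norm += inp[0].detach().float().abs().mean(dim=(0, 1))
h = layer.register_forward_hook(hook)
with torch.no_grad():
    for t in traces:
        model(**tok(t, return_tensors="pt"))
h.remove()

# Step 3: zero the weights with the lowest |W| * activation score.
score = layer.weight.detach().abs() * act_norm.unsqueeze(0)
threshold = score.flatten().kthvalue(int(0.5 * score.numel())).values
layer.weight.data[score <= threshold] = 0.0   # 50% sparsity on this layer
```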
-
A recent paper by Weller, Boratko, Naim, and Lee (2025), On the Theoretical Limitations of Embedding-Based Retrieval, formalizes something many of us in IR and RAG have suspected: embedding-based retrieval has inherent ceilings that cannot be overcome simply by scaling model size or training data.

🔑 Key Contributions:
- Derives theoretical bounds showing that the number of distinct top-k retrieval outcomes is limited by the embedding dimension, placing a hard cap on expressiveness.
- Uses a “free embedding optimization” setup to demonstrate that even with unconstrained embeddings optimized directly, these bounds manifest in practice.
- Introduces LIMIT, a benchmark designed to expose these constraints, on which even state-of-the-art embedding models fail despite the dataset’s simplicity.

⚙️ Implications:
- Single-vector embeddings cannot, in principle, represent all retrieval functions we may desire.
- Improvements in embedding models (scale, pretraining, fine-tuning) will eventually plateau due to these theoretical expressiveness limits.
- Future retrieval architectures will likely require hybrid approaches: multi-vector embeddings, symbolic reasoning, structured indexing, or query-dependent representations.

This work is a reminder that some bottlenecks in AI systems are not just engineering problems but theoretical ones, and overcoming them will require rethinking retrieval beyond the current embedding paradigm.

#InformationRetrieval #Embeddings #RAG #AIResearch #MachineLearning #AI
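The free-embedding setup is simple enough to replicate as a toy: skip the language model entirely, treat query and document vectors as free parameters, and optimize them directly to realize every possible top-k answer set. If even this best case fails at dimension d, no d-dimensional encoder can succeed. A minimal PyTorch sketch under those assumptions (sizes and hyperparameters are illustrative, not the paper's):

```python
import itertools, torch

n_docs, k, d = 12, 2, 4          # try to realize ALL top-2 subsets of 12 docs in 4 dims
targets = list(itertools.combinations(range(n_docs), k))  # one "query" per subset

Q = torch.randn(len(targets), d, requires_grad=True)   # free query embeddings
D = torch.randn(n_docs, d, requires_grad=True)         # free doc embeddings
opt = torch.optim.Adam([Q, D], lr=0.05)

for step in range(2000):
    scores = Q @ D.T                                   # (num_queries, n_docs)
    loss = 0.0
    for i, rel in enumerate(targets):
        rel = list(rel)
        irrel = [j for j in range(n_docs) if j not in rel]
        # Margin loss: every relevant doc should outscore every irrelevant one.
        margin = scores[i, irrel].max() - scores[i, rel].min() + 1.0
        loss = loss + torch.relu(margin)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    topk = (Q @ D.T).topk(k, dim=1).indices
    solved = sum(set(row.tolist()) == set(t) for row, t in zip(topk, targets))
print(f"realized {solved}/{len(targets)} top-{k} sets at d={d}")
# Increasing d makes this solvable; below a critical d it is provably impossible.
```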
-
Why We Can’t Rely on Single-Vector Embeddings for Complex Retrieval

A breakthrough paper from DeepMind sheds light on a fundamental problem with embedding-based retrieval systems: the single-vector paradigm has hard mathematical limits.

Key Insights:
- Retrieval tasks often demand combining documents in various nuanced ways, but a single fixed-dimensional embedding can’t represent all possible top-k combinations, regardless of how much data or training you throw at it.
- The authors introduce the LIMIT dataset, crafted to stress-test these limits. And guess what? Even state-of-the-art models fall short, often failing to retrieve simple, obvious matches.
- Workaround comparison: traditional sparse retrievers like BM25 and multi-vector architectures yield substantially better performance, highlighting a need to rethink our reliance on single-vector approaches.

Why This Matters:
This isn’t just academic: it impacts real-world systems that rely on embeddings for search, recommendation, and knowledge retrieval. As we ramp up toward more complex, reasoning-based AI systems, these limitations become bottlenecks we can’t ignore.

#AI #MachineLearning #InformationRetrieval #Embeddings #DeepLearning #ResearchLimitations
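The BM25 comparison is easy to try at home: sparse lexical scoring has no fixed embedding dimension to saturate, which is part of why it holds up on LIMIT-style tasks. A small sketch of a hybrid scorer, assuming the rank_bm25 package; dense_encode is a hypothetical stand-in for a real embedding model:

```python
import numpy as np
from rank_bm25 import BM25Okapi   # pip install rank-bm25

corpus = [
    "quincy likes apples and long walks",
    "martha enjoys tennis and apples",
    "the limit benchmark stresses embedding models",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def dense_encode(texts):
    # Hypothetical stand-in for a real embedding model; returns unit vectors.
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 64))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

doc_vecs = dense_encode(corpus)

def hybrid_scores(query, alpha=0.5):
    sparse = np.array(bm25.get_scores(query.split()))
    dense = doc_vecs @ dense_encode([query])[0]
    # Normalize each signal to [0, 1] before mixing so neither dominates.
    sparse = sparse / (sparse.max() + 1e-9)
    dense = (dense - dense.min()) / (dense.max() - dense.min() + 1e-9)
    return alpha * sparse + (1 - alpha) * dense

query = "who likes apples"
print(sorted(zip(hybrid_scores(query), corpus), reverse=True)[0][1])
```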
-
From Pattern Matching to Reasoning Search.

LightOn's late-interaction open-source stack is moving semantic search beyond theory, turning cutting-edge research into real-world AI retrieval systems.
🔹 ModernBERT: re-imagining the encoder
🔹 PyLate: effortless training of multi-vector models in hours, not weeks
🔹 FastPlaid: scaling performance for enterprise
🔹 PyLate-rs: bringing SOTA retrieval to the browser
👉🏻 Explore the journey: https://lnkd.in/eQKi7enR
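For anyone new to late interaction: instead of collapsing a document into one vector, ColBERT-style models (the kind PyLate trains) keep one vector per token and score with MaxSim, where each query token takes its best match among the document's token vectors. A minimal sketch of the scoring itself, with random unit vectors standing in for real token embeddings:

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction (MaxSim) relevance score.

    query_vecs: (num_query_tokens, dim), doc_vecs: (num_doc_tokens, dim),
    both L2-normalized per token.
    """
    sim = query_vecs @ doc_vecs.T    # token-to-token cosine similarities
    return sim.max(axis=1).sum()     # best doc token per query token, summed

rng = np.random.default_rng(0)
def toy_tokens(n, dim=128):          # stand-in for a real token encoder
    v = rng.normal(size=(n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

query = toy_tokens(6)
doc_a, doc_b = toy_tokens(40), toy_tokens(40)
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because each query token can match a different part of the document, multi-vector scoring sidesteps some of the single-vector expressiveness limits discussed in the posts above.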
-
At LightOn we are literally building reasoning at Enterprise scale. Information retrieval techniques have long powered numerous search engines. We are now building the stack that connects Enterprise-scale document collections to Generative AI. We're beyond traditional RAG.
-
Memory Management in AI Agents & RAG

One of the biggest challenges in building intelligent AI systems is memory. An AI agent doesn’t just need to process the current prompt: it needs to remember past conversations, store knowledge, and retrieve the right information at the right time.

Here’s how memory works in an AI agent:
🔹 Short-Term Memory → keeps the recent conversation or context (like the LLM’s token window).
🔹 Long-Term Memory → stores knowledge, user preferences, and historical data in vector databases, knowledge graphs, or summaries.
🔹 Episodic Memory → captures events & experiences to personalize interactions.
🔹 Procedural Memory → remembers workflows, how-tos, and processes.

Now combine this with Retrieval-Augmented Generation (RAG), and the agent gets access to external knowledge bases. Instead of memorizing everything, it:
1. Encodes queries into embeddings
2. Retrieves the most relevant documents from a vector DB
3. Feeds them back into the LLM for a smarter response

With proper memory management, AI agents become:
✅ More scalable
✅ Context-aware
✅ Personal and consistent
✅ Capable of handling dynamic knowledge

I created a visual architecture diagram to show how short-term, long-term, and RAG-based memory work together to power intelligent agents.

What do you think is the most important type of memory for future AI agents: episodic personalization or knowledge retrieval?

#AI #ArtificialIntelligence #GenerativeAI #RAG #VectorDatabases #MachineLearning #AITools
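A toy sketch of how those pieces fit together: a short-term buffer for recent turns, a long-term vector store for retrieval, and a prompt assembled from both. The embed function is a deterministic toy stand-in; a real system would use an embedding model and a vector database:

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
_cache = {}
def embed(text):
    # Hypothetical stand-in for a real embedding model (deterministic toy).
    if text not in _cache:
        _cache[text] = rng.normal(size=64)
    v = _cache[text]
    return v / np.linalg.norm(v)

class AgentMemory:
    def __init__(self, short_term_turns=5):
        self.short_term = deque(maxlen=short_term_turns)  # recent conversation
        self.long_term = []                               # (embedding, text) pairs

    def remember(self, text):
        self.long_term.append((embed(text), text))

    def retrieve(self, query, k=2):
        # RAG step: encode the query, rank stored knowledge by cosine similarity.
        q = embed(query)
        scored = sorted(self.long_term, key=lambda p: -float(p[0] @ q))
        return [text for _, text in scored[:k]]

    def build_prompt(self, user_msg):
        context = "\n".join(self.retrieve(user_msg))
        history = "\n".join(self.short_term)
        self.short_term.append(f"user: {user_msg}")
        return f"Context:\n{context}\n\nHistory:\n{history}\n\nUser: {user_msg}"

mem = AgentMemory()
mem.remember("The user's favorite database is Postgres.")
mem.remember("The user is preparing a talk on RAG.")
print(mem.build_prompt("What should I include in my talk?"))
# A real agent would send this prompt to an LLM and append its reply to short_term.
```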
-
Prompt optimization is becoming a powerful technique for improving AI that can even beat SFT! Here are some of our research results with GEPA at Databricks, in complex Agent Bricks info-extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%. Details in our research blog: https://lnkd.in/gQvNJptH

Most interestingly perhaps, reflective prompt optimization can beat SFT on the same data, or can stack with it as observed in Better Together (https://arxiv.org/abs/2407.10930). In practice it also requires fewer labels and can take in richer user feedback (ALHF: https://lnkd.in/g5yJUkKA)
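The shape of reflective prompt optimization is worth seeing in miniature: score the current prompt on labeled examples, have an LLM reflect on the failures in natural language, and let it propose a revised prompt. The sketch below is schematic, not Databricks' GEPA implementation, and llm is a hypothetical stand-in you'd wire to a real model client:

```python
def llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-model call.
    raise NotImplementedError("wire up your model client here")

def score(prompt, examples):
    """Fraction of examples where the model's output matches the label."""
    hits = 0
    for inp, label in examples:
        if label.lower() in llm(f"{prompt}\n\nInput: {inp}").lower():
            hits += 1
    return hits / len(examples)

def reflect_and_revise(prompt, examples, rounds=5):
    best_prompt, best = prompt, score(prompt, examples)
    for _ in range(rounds):
        failures = [(i, l) for i, l in examples
                    if l.lower() not in llm(f"{best_prompt}\n\nInput: {i}").lower()]
        if not failures:
            break
        # Reflection step: the model critiques its own failures in natural
        # language and proposes an improved instruction -- no gradient updates.
        revised = llm(
            "You are improving a task prompt. Current prompt:\n"
            f"{best_prompt}\n\nIt failed on these (input, expected) pairs:\n"
            f"{failures}\n\nWrite an improved prompt."
        )
        s = score(revised, examples)
        if s > best:
            best_prompt, best = revised, s
    return best_prompt
```

Unlike SFT, the "update" is a better instruction rather than new weights, which is why this style of optimization needs fewer labels and can absorb free-form feedback.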
-
Databricks makes it easy to tune your Gen AI for cost or quality in your specific domain. Check out the research from Matei and team to see how!