Computer Science > Computation and Language

arXiv:2509.17396v3 (cs)

[Submitted on 22 Sep 2025 (v1), revised 11 Oct 2025 (this version, v3), latest version 19 May 2026 (v4)]

Title: EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

Authors: Minsoo Kim, Arnav Kundu, Han-Byul Kim, Richa Dixit, Minsik Cho

Abstract: Modern large language models (LLMs) extend context lengths to millions of tokens, enabling coherent, personalized responses grounded in long conversational histories. This ability, however, hinges on Key-Value (KV) caching, whose memory grows linearly with dialogue length and quickly becomes the bottleneck in resource-constrained environments. An active line of research for reducing this memory bottleneck is KV cache compression, which seeks to limit cache size while preserving accuracy. Yet existing methods face two major limitations: (i) evicting the KV cache only after full-context prefill leaves peak memory unbounded, and (ii) query-dependent eviction narrows the cache to a single query, leading to failure cases in multi-turn conversations. We introduce EpiCache, a training-free KV cache management framework for long conversational question answering (LongConvQA) under fixed memory budgets. EpiCache bounds cache growth through block-wise prefill and preserves topic-relevant context via episodic KV compression, which clusters conversation history into coherent episodes and applies episode-specific KV cache eviction. We further design an adaptive layer-wise budget allocation strategy that measures each layer's sensitivity to eviction and distributes the memory budget across layers accordingly. Across three LongConvQA benchmarks, EpiCache improves accuracy by up to 40%, maintains near-full KV accuracy under 4-6x compression, and reduces latency/memory by up to 2.4x/3.5x, enabling efficient multi-turn interaction under strict resource limits. Our code is available at this https URL.
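
As an illustration only, the two budget mechanisms named in the abstract (block-wise prefill with eviction, and sensitivity-based layer-wise budget allocation) can be sketched in a few lines of Python. This is not the authors' implementation: the function names `blockwise_prefill` and `allocate_layer_budgets`, the per-token `scores`, and the per-layer `sensitivity` values are all hypothetical stand-ins, since the abstract does not specify how EpiCache scores tokens, forms episodes, or measures layer sensitivity. The sketch assumes a fixed importance score per token and a proportional split across layers.

```python
from typing import List, Sequence, Tuple


def blockwise_prefill(
    scores: Sequence[float], block_size: int, budget: int
) -> Tuple[List[int], int]:
    """Toy block-wise prefill: append one block of token positions at a time,
    then evict down to `budget` entries by score (hypothetical scoring).

    Returns the kept token positions and the peak cache size observed.
    """
    cache: List[int] = []
    peak = 0
    for start in range(0, len(scores), block_size):
        end = min(start + block_size, len(scores))
        cache.extend(range(start, end))
        peak = max(peak, len(cache))
        if len(cache) > budget:
            # Keep the highest-scoring positions, restored to positional order.
            top = sorted(cache, key=lambda i: scores[i], reverse=True)[:budget]
            cache = sorted(top)
    return cache, peak


def allocate_layer_budgets(
    sensitivity: Sequence[float], total_budget: int
) -> List[int]:
    """Split `total_budget` cache slots across layers in proportion to an
    assumed per-layer sensitivity score (largest-remainder rounding)."""
    total = sum(sensitivity)
    raw = [total_budget * s / total for s in sensitivity]
    budgets = [int(r) for r in raw]
    leftover = total_budget - sum(budgets)
    # Hand remaining slots to the layers with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - budgets[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


if __name__ == "__main__":
    kept, peak = blockwise_prefill(
        [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6], block_size=2, budget=4
    )
    print(kept, peak)
    print(allocate_layer_budgets([1.0, 2.0, 1.0], total_budget=8))
```

In this toy setting the cache never holds more than `budget + block_size` entries, which is the sense in which evicting after each block, rather than after full-context prefill, bounds peak memory.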

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2509.17396 [cs.CL]
	(or arXiv:2509.17396v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.17396

Submission history

From: Minsoo Kim
[v1] Mon, 22 Sep 2025 06:56:35 UTC (5,997 KB)
[v2] Thu, 25 Sep 2025 10:24:14 UTC (5,999 KB)
[v3] Sat, 11 Oct 2025 09:04:23 UTC (5,999 KB)
[v4] Tue, 19 May 2026 20:55:44 UTC (2,181 KB)
