LAVa

LAVa is a KV cache compression method that aims to improve KV cache eviction performance, grounded in theoretical analysis. It supports both dynamic head budget allocation and dynamic layer budget allocation.
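To give a feel for what "dynamic head budget allocation" means in general, here is a minimal, generic sketch. It is not LAVa's scoring function or its actual implementation: the function name `evict_with_dynamic_head_budgets`, the toy scores, and the global top-k selection rule are all assumptions made for illustration. The idea shown is that a fixed total budget is shared across heads, so heads with more high-scoring entries keep more of their cache.

```python
from typing import Dict, List


def evict_with_dynamic_head_budgets(
    scores: List[List[float]], total_budget: int
) -> Dict[int, List[int]]:
    """Keep the `total_budget` highest-scoring cache entries across all heads.

    `scores[h][t]` is a hypothetical importance score for the cache entry at
    position `t` in head `h`. Returns, per head, the sorted positions to keep.
    """
    # Flatten into (score, head, position) triples so heads compete for budget.
    entries = [
        (score, head, pos)
        for head, head_scores in enumerate(scores)
        for pos, score in enumerate(head_scores)
    ]
    # Highest score first; ties are broken by head, then position.
    entries.sort(key=lambda e: (-e[0], e[1], e[2]))

    kept: Dict[int, List[int]] = {head: [] for head in range(len(scores))}
    for _, head, pos in entries[:total_budget]:
        kept[head].append(pos)
    for head in kept:
        kept[head].sort()
    return kept


# Toy example: two heads, four cached positions each, total budget of four.
toy_scores = [
    [0.9, 0.1, 0.8, 0.7],  # head 0 has several important entries
    [0.2, 0.3, 0.1, 0.6],  # head 1 has only one
]
kept = evict_with_dynamic_head_budgets(toy_scores, total_budget=4)
print(kept)  # {0: [0, 2, 3], 1: [3]}
```

With a uniform split each head would keep two entries; here head 0 keeps three and head 1 keeps one, because the budget follows the scores.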

Usage

Requirements

transformers==4.41.1
flash-attn==2.4.0

datasets
tiktoken
jieba
rouge_score
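
Since two of the requirements are pinned to exact versions, a quick check of the active environment can save debugging time. The snippet below is a hedged sketch, not part of this repository: the helper name `check_pins` and the `PINNED` dictionary are assumptions, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
PINNED = {"transformers": "4.41.1", "flash-attn": "2.4.0"}


def check_pins(pinned, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all pins match."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed, want {wanted}")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_pins(PINNED):
        print(problem)
```

The `get_version` parameter exists only so the check can be exercised without the packages installed.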

Installation

git clone https://github.com/MGDDestiny/Lava/
cd Lava
make i

Quick Start

python inference.py -m /path/of/mistral_or_qwen/model

Evaluations

LongBench

cd ./experiments/LongBench
bash eval_longbench_lava.sh

Needle_In_A_Haystack

cd ./experiments/needle_in_haystack
bash eval_needle_lava.sh

Ruler

cd ./experiments/ruler
bash eval_ruler_lava.sh

InfiniteBench

cd ./experiments/InfiniteBench
bash src/eval_infinite_lava.sh

Citation

If you find our work valuable, please cite:

@misc{shen2025lavalayerwisekvcache,
      title={LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation}, 
      author={Yiqun Shen and Song Yuan and Zhengze Zhang and Xiaoliang Wang and Daxin Jiang and Nguyen Cam-Tu},
      year={2025},
      eprint={2509.09754},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.09754}, 
}

Acknowledgement

We extend our gratitude to AdaKV, SnapKV and PyramidKV for open-sourcing their code, which significantly facilitated this project.
