
Feature/prompt compression #345

Merged
Ingvarstep merged 4 commits into main from feature/prompt_compression
Apr 18, 2026
@Ingvarstep

Collaborator

Summary

Introduces prompt compression for uni-encoder GLiNER models (span, token, and relation-extraction variants). For a fixed label set, prompt embeddings are computed once and reused at inference, so the encoder no longer re-processes the <<ENT>>label1<<ENT>>...<<SEP>> prefix on every call. This shortens the input sequence and cuts attention cost; the speedup is largest for short inputs, where the label prefix accounts for a large share of the sequence.
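
To make the "especially for short sequences" point concrete, here is a rough back-of-the-envelope sketch. It assumes self-attention cost scales with the square of the sequence length, ignores every other part of the encoder, and assumes the label prefix is removed entirely in compressed mode. The function name and the token counts are hypothetical and not part of this PR.

```python
def attention_cost_ratio(text_tokens: int, prompt_tokens: int) -> float:
    """Ratio of self-attention cost with the label prefix to the cost without it.

    Assumption: cost is proportional to (sequence length) ** 2.
    """
    if text_tokens <= 0 or prompt_tokens < 0:
        raise ValueError("text_tokens must be positive and prompt_tokens non-negative")
    with_prefix = (text_tokens + prompt_tokens) ** 2
    without_prefix = text_tokens ** 2
    return with_prefix / without_prefix


# Hypothetical counts: the same 20-token label prefix in front of a short and a long input.
short_ratio = attention_cost_ratio(text_tokens=20, prompt_tokens=20)
long_ratio = attention_cost_ratio(text_tokens=380, prompt_tokens=20)
print(f"short input: {short_ratio:.2f}x, long input: {long_ratio:.2f}x")
# prints "short input: 4.00x, long input: 1.11x"
```

Under these assumptions the prefix quadruples attention cost for the short input but adds only about 11% for the long one, which is why dropping it helps short inputs most. Measured latency will differ, since attention is only one part of the forward pass.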

What's included

  • BaseGLiNER.compress_prompt_embeddings(texts, labels, rel_labels=None, batch_size=8, ...): runs a forward pass over a calibration corpus, extracts the pre-projection <<ENT>> (and optionally <<REL>>) token representations, averages them per label, and stores the resulting (L, D) matrix (one row per label) as a non-trainable parameter on the underlying model.
  • Precomputed inference path: config.precomputed_prompts_mode=True switches forward / predict_entities to look up the stored embeddings instead of prepending label tokens. State travels through state_dict, so save_pretrained / from_pretrained round-trip the compressed model automatically.
  • Relation-extraction support: pass rel_labels=... to compress <<REL>> prompts for relex models alongside entities.
  • End-to-end distillation (opt-in): distill=True makes compress_prompt_embeddings a one-call pipeline. The pre-compression model first generates pseudo-labels over texts, prompts are compressed, and the compressed model is then fine-tuned on those pseudo-labels to recover the small accuracy drop from averaging. Exposes distill_threshold, distill_epochs, distill_lr, distill_batch_size, distill_output_dir, and distill_train_kwargs for control.
  • Benchmark script: benchmarks/eval_compressed_biomed.py compares raw vs. compressed (optionally distilled) GLiNER on knowledgator/biomed_NER, reporting F1, latency, and speedup.
  • Docs: docs/usage.md gains a "⚡ Prompt Compression" section covering the basic flow, relation extraction, and end-to-end distillation.
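
The per-label averaging described in the first bullet can be illustrated with a minimal, library-free sketch. This is not the PR's implementation: the function name, the (label, vector) input format, and the toy 2-dimensional vectors are assumptions for illustration only, and the real method works on encoder outputs rather than Python lists.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def average_prompt_embeddings(
    observations: Sequence[Tuple[str, Sequence[float]]],
    labels: Sequence[str],
) -> List[List[float]]:
    """Average per-label vectors into an (L, D) matrix, one row per label in `labels` order."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for label, vector in observations:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        if len(vector) != len(sums[label]):
            raise ValueError("all vectors for a label must share the same dimension")
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    missing = [label for label in labels if counts[label] == 0]
    if missing:
        raise ValueError(f"no observations for labels: {missing}")
    return [[s / counts[label] for s in sums[label]] for label in labels]


# Toy stand-ins for prompt-token representations collected over calibration batches.
observations = [
    ("person", [1.0, 2.0]),
    ("location", [0.0, 4.0]),
    ("person", [3.0, 6.0]),
]
matrix = average_prompt_embeddings(observations, labels=["person", "location"])
print(matrix)  # [[2.0, 4.0], [0.0, 4.0]]
```

Each label's row is the mean of every representation seen for that label. That is also where the context sensitivity is lost, because a single averaged vector replaces the per-input representations.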

Usage

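 from gliner import GLiNER

 model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

 # calibration_texts: a list of representative in-domain strings, defined elsewhere.

 # One-call compression + distillation
 model.compress_prompt_embeddings(
     texts=calibration_texts,
     labels=["person", "organization", "location", "date"],
     batch_size=16,
     distill=True,
     distill_epochs=3,
 )

 # The stored prompt embeddings are saved with the rest of the model state.
 model.save_pretrained("./gliner-compressed")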

Trade-offs

  • Label set becomes fixed per compressed model — adding labels requires re-running compression.
  • Prompt averaging loses some context sensitivity; enabling distill=True typically recovers it.
  • Applies to uni-encoder variants only (bi-encoder already uses a separate label encoder).
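
The distillation mentioned above trains the compressed model on pseudo-labels produced by the uncompressed one. One plausible reading of distill_threshold is a confidence cutoff on those pseudo-labels; the sketch below illustrates that reading only. The helper name and the sample predictions are hypothetical, and the dictionary keys follow the entity format GLiNER's predict_entities returns (text, label, score, start, end).

```python
from typing import Dict, List, Union

Entity = Dict[str, Union[str, int, float]]


def filter_pseudo_labels(predictions: List[Entity], threshold: float) -> List[Entity]:
    """Keep only teacher predictions whose confidence score is at least `threshold`."""
    return [p for p in predictions if float(p["score"]) >= threshold]


# Made-up teacher predictions for the sentence "Ada Lovelace lived in London."
teacher_predictions: List[Entity] = [
    {"text": "Ada Lovelace", "label": "person", "score": 0.97, "start": 0, "end": 12},
    {"text": "London", "label": "location", "score": 0.88, "start": 22, "end": 28},
    {"text": "lived", "label": "date", "score": 0.21, "start": 13, "end": 18},
]
pseudo_labels = filter_pseudo_labels(teacher_predictions, threshold=0.5)
print([p["text"] for p in pseudo_labels])  # ['Ada Lovelace', 'London']
```

A higher cutoff gives cleaner but sparser training targets, while a lower one keeps more entities at the cost of more teacher noise.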

Ingvarstep merged commit 7d87fd4 into main on Apr 18, 2026
Ingvarstep deleted the feature/prompt_compression branch on April 18, 2026, 09:12