💧 Liquid LFM2.5: How To Run & Fine-tune

Run and fine-tune LFM2.5 Instruct and Vision locally on your device!

Liquid AI has released LFM2.5, including instruct and vision models. LFM2.5-1.2B-Instruct is a 1.17B-parameter hybrid reasoning model trained on 28T tokens and refined with RL, delivering best-in-class performance at the 1B scale for instruction following, tool use, and agentic tasks. See Hugging Face Jobs for using Codex to train LFM!

LFM2.5 runs in under 1 GB of RAM and achieves 239 tok/s decode on an AMD CPU. You can also fine-tune it locally with Unsloth.


Model Specifications:

  • Parameters: 1.17B

  • Architecture: 16 layers (10 double-gated LIV convolution blocks + 6 GQA blocks)

  • Training Budget: 28T tokens

  • Context Length: 32,768 tokens

  • Vocabulary Size: 65,536

  • Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish

⚙️ Usage Guide

Liquid AI recommends these settings for inference:

  • temperature = 0.1

  • top_k = 50

  • top_p = 0.1

  • repetition_penalty = 1.05

  • Maximum context length: 32,768
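A minimal sketch of applying these settings with Hugging Face transformers (the repo ID and prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize RAG in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Liquid AI's recommended sampling settings
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```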

Chat Template Format

LFM2.5 uses a ChatML-like format.

LFM2.5 chat template:
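A hedged sketch of the rendered format, using the ChatML-style tokens from the LFM2 family (the exact system text is illustrative):

```
<|startoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
Paris is the capital of France.<|im_end|>
```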

Tool Use

LFM2.5 supports function calling with special tokens <|tool_call_start|> and <|tool_call_end|>. Provide tools as a JSON object in the system prompt:
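A hedged sketch of the flow, following the LFM2 family's documented convention (the <|tool_list_start|>/<|tool_list_end|> wrappers and the Pythonic call syntax are assumed to carry over; the weather tool is hypothetical):

```
<|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_weather",
"description": "Get the current weather for a city",
"parameters": {"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What's the weather in Paris?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_weather(city="Paris")]<|tool_call_end|><|im_end|>
```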

🖥️ Run LFM2.5-1.2B-Instruct

📖 llama.cpp Tutorial (GGUF)

1. Build llama.cpp

Obtain the latest llama.cpp from GitHub. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU.
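A sketch of the usual build recipe (targets and flags follow the pattern used across these guides):

```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first \
    --target llama-cli llama-server llama-mtmd-cli
cp llama.cpp/build/bin/llama-* llama.cpp
```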

2. Run directly from Hugging Face
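llama-cli can pull a GGUF straight from the Hub. The repo name and quant tag below are assumptions; substitute the upload you want:

```bash
./llama.cpp/llama-cli \
    -hf unsloth/LFM2.5-1.2B-Instruct-GGUF:Q4_K_XL \
    --jinja -ngl 99 \
    --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05
```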

3. Or download the model first
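A sketch using huggingface_hub (repo name assumed, as above):

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/LFM2.5-1.2B-Instruct-GGUF",  # assumed repo name
    local_dir="LFM2.5-1.2B-Instruct-GGUF",
    allow_patterns=["*Q4_K_XL*"],  # pick the quant you want
)
```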

4. Run in conversation mode
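Point llama-cli at the downloaded file and enable conversation mode, carrying over the recommended sampling settings (the filename is an assumption; match the quant you downloaded):

```bash
./llama.cpp/llama-cli \
    --model LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_XL.gguf \
    --ctx-size 32768 \
    --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05 \
    -cnv
```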

🦥 Fine-tuning LFM2.5 with Unsloth

Unsloth supports fine-tuning LFM2.5 models. The 1.2B model fits comfortably on a free Colab T4 GPU. Training is 2x faster with 50% less VRAM.

Free Colab Notebook:

LFM2.5 is recommended for agentic tasks, data extraction, RAG, and tool use. It is not recommended for knowledge-intensive tasks or programming.

Unsloth Config for LFM2.5
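A minimal sketch, assuming an Unsloth upload named unsloth/LFM2.5-1.2B-Instruct and standard LoRA targets:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/LFM2.5-1.2B-Instruct",  # assumed repo name
    max_seq_length=32768,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    # Usual attention/MLP projections; LFM2.5's convolution blocks may
    # use different module names, so adjust if Unsloth reports a mismatch.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```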

Training Setup
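A short SFT run with TRL, as a sketch (the dataset is illustrative and assumed to be pre-formatted into a "text" column with the chat template):

```python
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

dataset = load_dataset("mlabonne/FineTome-100k", split="train")  # example dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,          # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```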

Save and Export
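Typical Unsloth export paths, sketched (output directory names are arbitrary):

```python
# Save just the LoRA adapters
model.save_pretrained("lfm2.5-lora")
tokenizer.save_pretrained("lfm2.5-lora")

# Merge to 16-bit safetensors for deployment
model.save_pretrained_merged("lfm2.5-merged", tokenizer, save_method="merged_16bit")

# Export to GGUF for llama.cpp
model.save_pretrained_gguf("lfm2.5-gguf", tokenizer, quantization_method="q4_k_m")
```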

🎉 llama-server Serving & Deployment

To deploy LFM2.5 for production with an OpenAI-compatible API:
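A sketch of serving the GGUF with llama-server (filename and port are assumptions):

```bash
./llama.cpp/llama-server \
    --model LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_XL.gguf \
    --ctx-size 32768 \
    --jinja \
    --port 8001
```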

Test with OpenAI client:
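A minimal check against the server above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="lfm2.5-1.2b-instruct",  # llama-server accepts any model name
    messages=[{"role": "user", "content": "Extract the city from: 'Ship to Berlin.'"}],
    temperature=0.1,
    top_p=0.1,
)
print(response.choices[0].message.content)
```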

📊 Benchmarks

LFM2.5-1.2B-Instruct delivers best-in-class performance at the 1B scale and offers fast CPU inference with low memory usage.

💧 Liquid LFM2.5-1.2B-VL Guide

LFM2.5-VL-1.6B is a vision LLM built on top of LFM2.5-1.2B-Base and tuned for stronger real-world performance. You can now fine-tune it locally with Unsloth.


Dynamic GGUF and 16-bit instruct uploads are available on Hugging Face.

Model Specifications:

  • LM Backbone: LFM2.5-1.2B-Base

  • Vision encoder: SigLIP2 NaFlex shape-optimized 400M

  • Context length: 32,768 tokens

  • Vocabulary size: 65,536

  • Languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish

  • Native resolution processing: Handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion

  • Tiling strategy: Splits large images into non-overlapping 512×512 patches and includes thumbnail encoding for global context

  • Inference-time flexibility: User-tunable maximum image tokens and tile count for speed/quality tradeoff without retraining

⚙️ Usage Guide

Liquid AI recommends these settings for inference:

  • Text: temperature=0.1, min_p=0.15, repetition_penalty=1.05

  • Vision: min_image_tokens=64, max_image_tokens=256, do_image_splitting=True
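A minimal sketch of wiring these settings through the Hugging Face processor (the repo ID is assumed; min_image_tokens, max_image_tokens, and do_image_splitting are processor arguments on the LFM2-VL family, which this assumes carries over):

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "LiquidAI/LFM2.5-VL-1.6B"  # assumed repo ID
processor = AutoProcessor.from_pretrained(
    model_id,
    min_image_tokens=64,
    max_image_tokens=256,
    do_image_splitting=True,
)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.png"},
        {"type": "text", "text": "What is the total on this receipt?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# Liquid AI's recommended text settings
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True,
                         temperature=0.1, min_p=0.15, repetition_penalty=1.05)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```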

Chat Template Format

LFM2.5-VL uses a ChatML-like format.

LFM2.5-VL chat template:
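Rather than hardcoding the special tokens, you can print the rendered template from the processor itself (reusing the processor from the sketch above):

```python
# Inspect the rendered chat template without tokenizing
demo = [{
    "role": "user",
    "content": [{"type": "image"},
                {"type": "text", "text": "Describe this image."}],
}]
print(processor.apply_chat_template(demo, add_generation_prompt=True, tokenize=False))
```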

🖥️ Run LFM2.5-VL-1.6B

📖 llama.cpp Tutorial (GGUF)

1. Build llama.cpp

Obtain the latest llama.cpp from GitHub. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU.
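The same recipe as the instruct build applies; make sure llama-mtmd-cli is among the targets, since vision GGUFs run through the multimodal CLI:

```bash
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j \
    --target llama-mtmd-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp
```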

2. Run directly from Hugging Face
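A sketch with llama-mtmd-cli (repo name and quant tag are assumptions; the mmproj vision projector is fetched alongside the GGUF):

```bash
./llama.cpp/llama-mtmd-cli \
    -hf unsloth/LFM2.5-VL-1.6B-GGUF:Q4_K_XL \
    --image your_image.png \
    -p "Describe this image." \
    --temp 0.1 --min-p 0.15 --repeat-penalty 1.05
```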

🦥 Fine-tuning LFM2.5-VL with Unsloth

Unsloth supports fine-tuning LFM2.5 models. The 1.6B model fits comfortably on a free Colab T4 GPU. Training is 2x faster with 50% less VRAM.

Free Colab Notebook:

Unsloth Config for LFM2.5
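A minimal sketch with FastVisionModel (repo name assumed):

```python
from unsloth import FastVisionModel

model, processor = FastVisionModel.from_pretrained(
    "unsloth/LFM2.5-VL-1.6B",  # assumed repo name
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,     # LoRA on the vision encoder too
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
)
```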

Training Setup
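A vision SFT sketch with TRL's trainer and Unsloth's vision collator (the dataset is assumed to hold chat-style "messages" with embedded images):

```python
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTConfig, SFTTrainer

FastVisionModel.for_training(model)  # switch to training mode

trainer = SFTTrainer(
    model=model,
    tokenizer=processor,
    data_collator=UnslothVisionDataCollator(model, processor),
    train_dataset=dataset,  # your image+text chat dataset
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,                  # short demo run
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",
        remove_unused_columns=False,   # keep image columns for the collator
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```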

Save and Export
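Export follows the text-model pattern (directory names are arbitrary):

```python
# Save the LoRA adapters
model.save_pretrained("lfm2.5-vl-lora")
processor.save_pretrained("lfm2.5-vl-lora")

# Merge to 16-bit safetensors for deployment
model.save_pretrained_merged("lfm2.5-vl-merged", processor, save_method="merged_16bit")
```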

📊 Benchmarks

LFM2.5-VL-1.6B delivers best-in-class performance:

| Model | MMStar | MM-IFEval | BLINK | InfoVQA (Val) | OCRBench (v2) | RealWorldQA | MMMU (Val) | MMMB (avg) | Multilingual MMBench (avg) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LFM2.5-VL-1.6B | 50.67 | 52.29 | 48.82 | 62.71 | 41.44 | 64.84 | 40.56 | 76.96 | 65.90 |
| LFM2-VL-1.6B | 49.87 | 46.35 | 44.50 | 58.35 | 35.11 | 65.75 | 39.67 | 72.13 | 60.57 |
| InternVL3.5-1B | 50.27 | 36.17 | 44.19 | 60.99 | 33.53 | 57.12 | 41.89 | 68.93 | 58.32 |
| FastVLM-1.5B | 53.13 | 24.99 | 43.29 | 23.92 | 26.61 | 61.56 | 38.78 | 64.84 | 50.89 |

📚 Resources
