
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models


πŸ“š Introduction

We present the InternSVG family, an integrated data–benchmark–model suite.

  • 🧩 SAgoge Dataset β€” The largest and most comprehensive multimodal dataset for SVG tasks, spanning icons, long-sequence illustrations, scientific diagrams, and dynamic animations. It provides rich hierarchical structures and diverse attributes, supporting tasks of varied difficulty levels.
  • πŸ“Š SArena Benchmark β€” A companion benchmark offering unified task definitions and standardized evaluation protocols, aligned with SAgoge’s domains and difficulty spectrum. It enables consistent comparison across SVG understanding, editing, and generation tasks.
  • πŸ€– InternSVG Model β€” A unified multimodal large language model (MLLM) for SVG understanding, editing, and generation.

πŸ”₯ News

  • [2025-10-13] πŸŽ‰ We release the SArena benchmark. πŸ€—Benchmark
  • [2025-10-13] πŸ‘‹ Uploaded the paper and initialized the project. Read

πŸ“ Open-Source Plan

  • Evaluation code
  • SArena benchmark
  • SAgoge dataset
  • Fine-tuning scripts
  • Model weights
  • Paper

πŸ“Œ Quick Start

βš™οΈ Installation

git clone https://github.com/hmwang2002/InternSVG.git
cd InternSVG

conda create -n internsvg python=3.9 -y
conda activate internsvg
pip install -r requirements.txt

# install clip
pip install git+https://github.com/openai/CLIP.git

Download ViCLIP.

mkdir sarena_ckpt
cd sarena_ckpt
# You need to log in first and have access to the repo https://huggingface.co/OpenGVLab/ViCLIP. Run "huggingface-cli login" to log in.
huggingface-cli download --resume-download OpenGVLab/ViCLIP ViClip-InternVid-10M-FLT.pth --local-dir .
cd ..

(Optional) If you need to simplify your own SVG code, install svgo.

conda install nodejs
npm install -g svgo

πŸ“Š SArena Benchmark

Download

The SArena benchmark is available here. You can download it directly with the hf command-line tool from huggingface_hub:

hf download InternSVG/SArena SArena.zip --repo-type dataset --resume-download --local-dir PATH_TO_YOUR_DIR
unzip SArena.zip

After extraction, you will get:

SArena/
β”œβ”€β”€ animation/
β”‚   β”œβ”€β”€ overall/
β”‚   β”œβ”€β”€ svg/
β”‚   β”œβ”€β”€ video/
β”‚   β”œβ”€β”€ text2sani.jsonl
β”‚   └── video2sani.jsonl
β”‚
β”œβ”€β”€ chemistry/
β”‚   β”œβ”€β”€ images/
β”‚   β”œβ”€β”€ svg/
β”‚   β”œβ”€β”€ img2svg.jsonl
β”‚   └── text2svg.jsonl
β”‚
β”œβ”€β”€ illustration/
β”‚   β”œβ”€β”€ images/
β”‚   β”œβ”€β”€ svg/
β”‚   β”œβ”€β”€ caption.jsonl
β”‚   β”œβ”€β”€ img2svg.jsonl
β”‚   └── text2svg.jsonl
β”‚
β”œβ”€β”€ Icon/
β”‚   β”œβ”€β”€ edit/
β”‚   β”‚   └── data/
β”‚   β”‚       β”œβ”€β”€ color_complex.jsonl
β”‚   β”‚       β”œβ”€β”€ color_simple.jsonl
β”‚   β”‚       β”œβ”€β”€ crop.jsonl
β”‚   β”‚       β”œβ”€β”€ flip.jsonl
β”‚   β”‚       β”œβ”€β”€ opacity.jsonl
β”‚   β”‚       β”œβ”€β”€ outline.jsonl
β”‚   β”‚       β”œβ”€β”€ rotate.jsonl
β”‚   β”‚       β”œβ”€β”€ scale.jsonl
β”‚   β”‚       β”œβ”€β”€ styletransform_openmoji.jsonl
β”‚   β”‚       └── translate.jsonl
β”‚   β”‚
β”‚   β”œβ”€β”€ generation/
β”‚   β”‚   β”œβ”€β”€ images/
β”‚   β”‚   β”œβ”€β”€ svg/
β”‚   β”‚   β”œβ”€β”€ caption.jsonl
β”‚   β”‚   β”œβ”€β”€ img2svg.jsonl
β”‚   β”‚   └── text2svg.jsonl
β”‚   β”‚
β”‚   └── understanding/
β”‚       └── sarena_un.jsonl
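
Each .jsonl file holds one JSON object per line. A minimal loader sketch (the field names in the example data, such as "id" and "svg", are illustrative assumptions, not the benchmark's documented schema):

```python
import json


def load_jsonl(path):
    """Read a JSON-Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing or embedded blank lines
                records.append(json.loads(line))
    return records
```

This keeps the whole file in memory, which is fine for benchmark-sized annotation files; for very large files, yield records lazily instead.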

Inference

Template scripts for inference can be found in the scripts/inference/ folder.

For example, for the icon/illustration/chemistry generation tasks, you can adapt the script below by specifying your own paths and API configuration.

#!/bin/bash
export PYTHONPATH=$(pwd):$PYTHONPATH

BASE_URL="BASE_URL"
API_KEY="API_KEY"
MODEL_NAME="MODEL_NAME"
TEXT2SVG_TEST_PATH="PATH_TO_TEXT2SVG_TEST_PATH"
IMG2SVG_TEST_PATH="PATH_TO_IMG2SVG_TEST_PATH"
OUTPUT_DIR="PATH_TO_OUTPUT_DIR"
RETRY=1
TEMPERATURE=0.0
MAX_TOKENS=4000
MAX_WORKERS=32

python metrics/inference/inference.py \
--base_url ${BASE_URL} \
--api_key ${API_KEY} \
--model_name ${MODEL_NAME} \
--text2svg_test_path ${TEXT2SVG_TEST_PATH} \
--img2svg_test_path ${IMG2SVG_TEST_PATH} \
--output_dir ${OUTPUT_DIR} \
--temperature ${TEMPERATURE} \
--max_tokens ${MAX_TOKENS} \
--max_workers ${MAX_WORKERS}

Then run:

bash scripts/inference/gen/demo.sh

For the SVG animation generation task, a template inference script is provided at scripts/inference/animation/demo.sh.

When all test samples have been processed, each SVG animation must be converted into an MP4 video for metric evaluation. Use the script utils/svg_animate.py to generate the MP4 files. Note that two resolutions are required: 448Γ—448 and 128Γ—128. Before running, modify the OUTPUT_DIRS and FILE_DIRS variables in the run_all_mp() function. (If an output path contains '_128', the script automatically renders at 128Γ—128.)
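
The path-based resolution convention can be captured in a small helper (a sketch of the idea, not the actual utils/svg_animate.py code):

```python
def resolution_for(output_path: str) -> tuple:
    """Pick the render resolution from the output path: paths containing
    '_128' render at 128x128, everything else at the default 448x448."""
    return (128, 128) if "_128" in output_path else (448, 448)
```

For example, an output directory named video_128/ would be rendered at 128Γ—128, while video/ uses 448Γ—448.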

The directory structure of the test files is as follows:

evaluate
β”œβ”€β”€ .vscode
β”œβ”€β”€ animation/gpt4o
β”‚   β”œβ”€β”€ text2sani
β”‚   β”‚   β”œβ”€β”€ svg/
β”‚   β”‚   β”œβ”€β”€ video/
β”‚   β”‚   β”œβ”€β”€ video_128/
β”‚   β”‚   └── output.jsonl
β”‚   └── video2sani
β”‚       β”œβ”€β”€ svg/
β”‚       β”œβ”€β”€ video/
β”‚       β”œβ”€β”€ video_128/
β”‚       └── output.jsonl

Evaluate

The scripts/evaluate/ directory contains template scripts for running evaluation across different domains (e.g., icon, illustration, chemistry, and animation).

Each subfolder corresponds to a specific domain:

scripts/evaluate/
β”œβ”€β”€ icon/
β”‚   β”œβ”€β”€ edit/
β”‚   β”œβ”€β”€ gen/
β”‚   └── un/
β”œβ”€β”€ illustration/
β”œβ”€β”€ chem/
└── animation/

Below is a demo for evaluating generation tasks (Text-to-SVG and Image-to-SVG):

#!/bin/bash
export PYTHONPATH=$(pwd):$PYTHONPATH

python evaluate_gen.py \
    --model_name "GPT-4o" \
    --text2svg_test_dir "PATH_TO_TEXT2SVG_RESULTS" \
    --img2svg_test_dir "PATH_TO_IMG2SVG_RESULTS" \
    --tokenizer_path "PATH_TO_TOKENIZER" \
    --test_file_path "PATH_TO_TEST_JSONL" \
    --gt_img_dir "PATH_TO_GT_IMAGES" \
    --gt_svg_dir "PATH_TO_GT_SVGS" \
    --caption_path "PATH_TO_CAPTIONS" \
    --bench_name "Icon"

If your model does not support either the Text-to-SVG or Image-to-SVG task, simply set the corresponding test directory argument (--text2svg_test_dir or --img2svg_test_dir) to an empty string.
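This skip behavior can be sketched as follows (a hypothetical helper illustrating the convention, not the repository's actual evaluate_gen.py logic):

```python
def evaluate_if_configured(task_name, test_dir, evaluate_fn):
    """Run a task's evaluation only when a test directory is given;
    an empty string signals that the model does not support the task."""
    if not test_dir:
        print(f"Skipping {task_name}: no test directory configured.")
        return None
    return evaluate_fn(test_dir)
```

With this convention, passing --text2svg_test_dir "" simply yields no Text-to-SVG scores rather than an error.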

πŸ“œ Acknowledgements

We would like to thank Kiyotaka, yinlikestudy, and quentin-77 for their valuable contributions to this project.

The InternSVG model is developed based on InternVL and further fine-tuned with LLaMA-Factory for SVG understanding, editing, and generation tasks.

We also acknowledge the open-source efforts that have contributed to advancing SVG understanding and generation.

License

InternSVG is licensed under the Apache License 2.0.

πŸ“– Citation

@article{wang2025internsvg,
  title={InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models},
  author={Wang, Haomin and Yin, Jinhui and Wei, Qi and Zeng, Wenguang and Gu, Lixin and Ye, Shenglong and Gao, Zhangwei and Wang, Yaohui and Zhang, Yanting and Li, Yuanqi and others},
  journal={arXiv preprint arXiv:2510.11341},
  year={2025}
}
