octu0/veif

veif

veif is a pragmatic, high-performance image format designed for efficient multi-resolution delivery.

The core philosophy of veif is speed and efficiency—not just in compression, but in distribution.

Motivation

In modern web services and applications, handling user-uploaded images typically requires generating multiple static files for different display sizes (e.g., original.jpg, large.jpg, medium.jpg, thumbnail.jpg).

This traditional approach has significant drawbacks:

  1. Storage Redundancy: Storing multiple versions of the same image wastes disk space.
  2. Computational Cost: The server must decode, resize, and re-encode the source image multiple times to generate these variants.
  3. Management Complexity: Managing multiple file artifacts for a single logical image increases system complexity.

The Solution: One Master File, Multiple Resolutions

veif solves this by adopting a multi-resolution architecture, optimized for high-speed processing.

With veif, you generate a single master file, and the server stores only that file. When a client needs a specific resolution, the server (or the application logic) extracts the necessary data layers from the master file.

  • Need a Thumbnail? -> Extract Layer 0 only.
  • Need a Preview? -> Extract Layer 0 + Layer 1.
  • Need Full Detail? -> Extract All Layers.
  • Need Speed? -> Use One mode (Single layer, no progressive structure).

This approach eliminates the need for server-side resizing or re-compression. The "transcoding" process is replaced by efficient binary slicing (demuxing), drastically reducing server CPU load and storage requirements.
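The slicing step can be sketched as a walk over the size-prefixed layers described under DATA LAYOUT. This is a minimal illustration, not part of the veif API: `slicePrefix(of:layerCount:)` is a hypothetical helper, and it assumes the 4-byte size field counts only the body bytes that follow it. Whether the veif decoders accept such a truncated container directly is not stated here.

```swift
// Hypothetical helper, not part of the veif API.
// Assumption: the container is a plain concatenation of layers, each stored as
// a 4-byte big-endian size followed by that many body bytes.
func slicePrefix(of master: [UInt8], layerCount: Int) -> [UInt8]? {
    guard layerCount >= 0 else { return nil }
    var offset = 0
    for _ in 0..<layerCount {
        // A layer starts with a 4-byte size field.
        guard master.count - offset >= 4 else { return nil }
        var size: UInt32 = 0
        for i in 0..<4 {
            size = size << 8 | UInt32(master[offset + i])
        }
        offset += 4
        // Reject truncated input instead of reading past the end.
        guard let bodyLength = Int(exactly: size),
              bodyLength <= master.count - offset else { return nil }
        offset += bodyLength
    }
    // No decoding or re-encoding: the result is a byte-for-byte prefix.
    return Array(master[0..<offset])
}

// Toy container with three layers whose bodies are [1, 2], [3], and [4, 5, 6].
let master: [UInt8] = [0, 0, 0, 2, 1, 2, 0, 0, 0, 1, 3, 0, 0, 0, 3, 4, 5, 6]
let thumbnail = slicePrefix(of: master, layerCount: 1) // Layer 0 only
let preview = slicePrefix(of: master, layerCount: 2)   // Layer 0 + Layer 1
```

Because the function only reads size fields and copies bytes, its cost is proportional to the output length rather than to the pixel count.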

One Mode

The One mode is designed for scenarios where speed is the top priority. Unlike the default multi-resolution format, it stores the image as a single data layer without a progressive structure. This reduces processing overhead for both encoding and decoding while maintaining the same image quality.

DATA LAYOUT

                                     VEIF File Structure (Default)
+--------------------------------------------------------------------------------+
|                                         Container                              |
+--------------------------+--------------------------+--------------------------+
|          Layer 0         |          Layer 1         |          Layer 2         |
+-------------+------------+-------------+------------+-------------+------------+
| Size (4B)   | Body (N)   | Size (4B)   | Body (N)   | Size (4B)   | Body (N)   |
| UInt32 BE   |            | UInt32 BE   |            | UInt32 BE   |            |
+-------------+------------+-------------+------------+-------------+------------+

                                     VEIF File Structure (One Mode)
+--------------------------------------------------------------------------------+
|                                         Container                              |
+--------------------------+-----------------------------------------------------+
|          Layer 0         |
+-------------+------------+
| Size (4B)   | Body (N)   |
| UInt32 BE   |            |
+-------------+------------+

                                     Layer Body Structure
+--------------------------+-------------------------------------+---------------------------------+
|          Header          |              Metadata               |             Payload             |
+--------------+-----------+------------+------------+-----------+---------------------------------+
| Magic (4B)   | Layer(1B) | Width (2B) | Height(2B) | QStep(1B) |           See Below             |
| 'V''E''I''F' | 0 - 2     | UInt16 BE  | UInt16 BE  | UInt8     |                                 |
+--------------+-----------+------------+------------+-----------+---------------------------------+

                                     Payload Structure
+------------------------------------+------------------------------------+------------------------------------+
|              Y Plane               |              Cb Plane              |              Cr Plane              |
+------------+-----------------------+------------+-----------------------+------------+-----------------------+
| Count (2B) |       Blocks...       | Count (2B) |       Blocks...       | Count (2B) |       Blocks...       |
| UInt16 BE  | [Len(2B)+Data] * Count| UInt16 BE  | [Len(2B)+Data] * Count| UInt16 BE  | [Len(2B)+Data] * Count|
+------------+-----------------------+------------+-----------------------+------------+-----------------------+
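As a reading aid for the diagrams above, here is a minimal sketch of parsing one layer body. The type `LayerHeader` and the functions `parseLayerHeader`, `readUInt16BE`, and `readPlaneBlocks` are hypothetical names, not veif API. The sketch assumes the header fields are packed with no padding (10 bytes in total) and that each block's `Len` field is a big-endian UInt16 like the other 2-byte fields; the diagram does not state the byte order of `Len`.

```swift
// Hypothetical types and helpers for illustration; not part of the veif API.
struct LayerHeader: Equatable {
    var layer: UInt8   // 0 - 2
    var width: UInt16
    var height: UInt16
    var qStep: UInt8
}

// Header + Metadata: Magic (4B), Layer (1B), Width (2B), Height (2B), QStep (1B).
func parseLayerHeader(_ body: [UInt8]) -> LayerHeader? {
    guard body.count >= 10 else { return nil }
    // Magic must spell "VEIF" in ASCII.
    guard Array(body[0..<4]) == [0x56, 0x45, 0x49, 0x46] else { return nil }
    let width = UInt16(body[5]) << 8 | UInt16(body[6])
    let height = UInt16(body[7]) << 8 | UInt16(body[8])
    return LayerHeader(layer: body[4], width: width, height: height, qStep: body[9])
}

// Reads a big-endian UInt16 and advances the offset past it.
func readUInt16BE(_ bytes: [UInt8], at offset: inout Int) -> UInt16? {
    guard offset >= 0, bytes.count - offset >= 2 else { return nil }
    let value = UInt16(bytes[offset]) << 8 | UInt16(bytes[offset + 1])
    offset += 2
    return value
}

// One plane: Count (2B), then Count blocks of [Len (2B) + Data].
// Assumption: Len is big-endian and counts only the Data bytes.
func readPlaneBlocks(_ bytes: [UInt8], at offset: inout Int) -> [[UInt8]]? {
    guard let count = readUInt16BE(bytes, at: &offset) else { return nil }
    var blocks: [[UInt8]] = []
    for _ in 0..<Int(count) {
        guard let length = readUInt16BE(bytes, at: &offset),
              bytes.count - offset >= Int(length) else { return nil }
        blocks.append(Array(bytes[offset..<offset + Int(length)]))
        offset += Int(length)
    }
    return blocks
}

// Toy layer body: a 320x240 Layer 1 header followed by one plane with two blocks.
let sampleBody: [UInt8] = [0x56, 0x45, 0x49, 0x46, 1, 0x01, 0x40, 0x00, 0xF0, 8,
                           0, 2, 0, 1, 0xAA, 0, 2, 0xBB, 0xCC]
let sampleHeader = parseLayerHeader(sampleBody)
var cursor = 10 // the payload starts right after the 10-byte header
let samplePlane = readPlaneBlocks(sampleBody, at: &cursor)
```

Calling `readPlaneBlocks` three times in a row with the same cursor would walk the Y, Cb, and Cr planes in the order shown in the payload diagram.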

Performance

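+----------------+------------+----------+
| Layer          | Resolution | Size     |
+----------------+------------+----------+
| Layer0         | 1/4        | 4.74KB   |
| Layer1         | 1/2        | 11.81KB  |
| Layer2         | 1          | 28.34KB  |
| One (no layer) | 1          | 40.44KB  |
| original       | 1          | 213.68KB |
+----------------+------------+----------+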

Quality (Size vs MS-SSIM)

[Figure: file size vs. MS-SSIM comparison]

Speed (Time vs MS-SSIM)

[Figure: speed vs. MS-SSIM comparison]

Thumbnail Speed (Total Time)

veif (DecOnly) vs JPEG (FullDec+Resize+Enc+Dec)

[Figure: thumbnail total time comparison]

Resolution Quality (160p - 1280p)

JPEG (Red) vs veif (Blue)

[Figure: quality by resolution, JPEG vs veif]

When to use veif? (vs. AVIF / JPEG XL)

While next-generation formats like AVIF or JPEG XL focus on achieving the absolute highest compression ratios, they often come with significant computational complexity.
veif takes a more pragmatic approach, optimizing for delivery speed and infrastructural efficiency over extreme compression.

  • Use AVIF or JPEG XL when:

    • You need maximum compression for archival storage.
    • Saving every single byte of bandwidth is your absolute top priority, and you can afford the high CPU cost (and latency) for encoding and decoding.
  • Use veif when:

    • You need to serve multiple resolutions (thumbnail, preview, full-size) dynamically and instantly from a single master file.
    • You want to completely eliminate server-side resizing and re-encoding costs (e.g., slicing the binary directly at the CDN edge).
    • You are processing user-uploaded images in real-time and require lightning-fast, highly stable encode/decode speeds comparable to heavily optimized JPEG implementations.

Usage

The core API is Foundation-free and uses [UInt8] on all platforms (including WASM). On macOS, convenience wrappers that accept and return Data are also available.

Core API ([UInt8], all platforms)

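import veif

// `ycbcr` is the source image, already converted to YCbCr by the caller.
// The calls below must run in an async, throwing context.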

// --- Default (Progressive, 3-layers) ---

let encoded: [UInt8] = try await encode(img: ycbcr, maxbitrate: 200 * 1000)
let (layer0, layer1, layer2) = try await decode(r: encoded)

// --- Speed Mode (Single layer) ---

let encodedOne: [UInt8] = try await encodeOne(img: ycbcr, maxbitrate: 200 * 1000)
let decodedOne = try await decodeOne(r: encodedOne)

// --- Individual Layer Handling ---

let (encodedLayer0, encodedLayer1, encodedLayer2) = try await encodeLayers(img: ycbcr, maxbitrate: 200 * 1000)
let decodedLayers = try await decodeLayers(layers: encodedLayer0, encodedLayer1, encodedLayer2)

macOS Convenience API (Data)

import Foundation // Data and URL
import veif

let data = try Data(contentsOf: URL(fileURLWithPath: "src.png"))
let ycbcr = try pngToYCbCr(data: data)

// Encode → Data
let encoded: Data = try await encodeImage(img: ycbcr, maxbitrate: 200 * 1000)

// Decode → YCbCrImage
let (layer0, layer1, layer2) = try await decodeImage(r: encoded)

// Speed Mode
let encodedOne: Data = try await encodeImageOne(img: ycbcr, maxbitrate: 200 * 1000)
let decodedOne = try await decodeImageOne(r: encodedOne)

// Individual Layers
let (encodedLayer0, encodedLayer1, encodedLayer2) = try await encodeImageLayers(img: ycbcr, maxbitrate: 200 * 1000)
let decodedLayers = try await decodeImageLayers(data: encodedLayer0, encodedLayer1, encodedLayer2)

Online DEMO

veif wasm demo

Internals

  • Color Space: YCbCr 4:2:0
  • Transform: Multi-resolution Discrete Wavelet Transform (LeGall 5/3), applied as a 2-level 2D block transform
    • Macroblock DWT (no block artifacts)
    • 3-Layer Progressive Encoding
      • Layer 0: Thumbnail (Base LL band)
      • Layer 1: Medium Quality (Adds HL, LH, HH of level 1)
      • Layer 2: High Quality (Adds HL, LH, HH of level 0)
  • Quantization: Sampling-based Rate Control
    • Predicts optimal step size by probing 8 key regions (corners, center, edges) to meet target bitrate
    • Frequency Weighting: Applies different quantization steps based on frequency bands (Low: 1x, Mid: 2x, High: 4x) to preserve visual quality
    • Signed Mapping: Interleaves positive and negative values into unsigned integers (0, -1, 1, -2, 2...) for efficient variable-length coding
  • Entropy Coding: Zero-run Rice coding
    • RLE zero-run cap (maxVal=64) for stability
  • Multi-Resolution: 3-layer structure — Layer0 (1/4) → Layer1 (1/2) → Layer2 (1/1)
  • SIMD Pipeline:
    • DWT: Vertical and horizontal lifting steps of the LeGall 5/3 transform are fully vectorized with SIMD (SIMD4/8/16).
    • Quantization: Block-based quantization and dequantization use AVX/NEON for parallel processing.
  • Memory Management:
    • Unsafe buffer pointers are used throughout the pipeline (ImageReader, DWT, Quantization) to minimize ARC overhead and bounds checking.
    • Object Pooling & Instance Reuse.
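
Two of the building blocks listed above can be sketched in a few lines of scalar Swift. The code below is the textbook reversible (integer) LeGall 5/3 lifting scheme on a 1-D signal of even length, plus a signed mapping that follows the order given above (0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...). All function names are hypothetical, and veif's actual rounding, boundary handling, 2-D block layout, and SIMD code may differ.

```swift
// Hypothetical sketch; not taken from the veif source.
// Forward LeGall 5/3 lifting: splits a signal into low-pass and high-pass halves.
func forward53(_ x: [Int]) -> (low: [Int], high: [Int]) {
    precondition(x.count >= 2 && x.count % 2 == 0, "even length expected")
    let half = x.count / 2
    var low = [Int](repeating: 0, count: half)
    var high = [Int](repeating: 0, count: half)
    // Predict: each odd sample minus the floor-average of its even neighbors.
    for n in 0..<half {
        let right = n + 1 < half ? x[2 * n + 2] : x[2 * n] // mirror at the right edge
        high[n] = x[2 * n + 1] - ((x[2 * n] + right) >> 1)
    }
    // Update: each even sample plus a rounded quarter of the neighboring details.
    for n in 0..<half {
        let previous = n > 0 ? high[n - 1] : high[0] // mirror at the left edge
        low[n] = x[2 * n] + ((previous + high[n] + 2) >> 2)
    }
    return (low, high)
}

// Inverse lifting: runs the same steps backwards with opposite signs.
func inverse53(low: [Int], high: [Int]) -> [Int] {
    precondition(low.count == high.count && !low.isEmpty)
    let half = low.count
    var x = [Int](repeating: 0, count: 2 * half)
    // Undo the update step to recover the even samples.
    for n in 0..<half {
        let previous = n > 0 ? high[n - 1] : high[0]
        x[2 * n] = low[n] - ((previous + high[n] + 2) >> 2)
    }
    // Undo the predict step to recover the odd samples.
    for n in 0..<half {
        let right = n + 1 < half ? x[2 * n + 2] : x[2 * n]
        x[2 * n + 1] = high[n] + ((x[2 * n] + right) >> 1)
    }
    return x
}

// Signed mapping: non-negative values go to even codes, negative values to odd codes.
func mapSigned(_ v: Int16) -> UInt16 {
    let wide = Int(v)
    return UInt16(wide >= 0 ? 2 * wide : -2 * wide - 1)
}

func unmapSigned(_ u: UInt16) -> Int16 {
    let wide = Int(u)
    return Int16(wide % 2 == 0 ? wide / 2 : -(wide + 1) / 2)
}

let signal = [10, 12, 14, 13, 9, 7, 200, 0]
let bands = forward53(signal)
let restored = inverse53(low: bands.low, high: bands.high) // equals `signal`
```

The integer lifting form is exactly invertible because the inverse recomputes the same rounded terms and subtracts them, so any loss in a codec built on it comes from quantization rather than from the transform.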

Why Swift?

I strongly prefer explicit processing over implicit "magic." I want to see exactly what the code is doing under the hood.

I have built many products in Go. Go is a fantastic language that lets you focus on algorithms, and it is incredibly fast in most scenarios. However, when it comes to writing SIMD instructions, Go essentially relies on C wrappers (cgo) or raw assembly, which feels no different than just writing C. I felt the same way about Rust; maintaining the structural integrity of an algorithm while writing SIMD instructions is incredibly important to me.

Algorithms and data structures are inextricably linked. Because of this, I initially reached for Halide, a tool I have chosen for many image and audio processing tasks in the past. While Halide's performance is excellent, its drawback for me was that the connection between the algorithm and the data structure was not explicit enough.

Next, I tried Zig for its high expressiveness. It was great at first, but I encountered situations where my code would no longer build after a version update. Language stability is critical. Code that breaks simply because the compiler version goes up severely lacks maintainability.

In contrast, Swift allows you to write SIMD code that stays completely true to your data structures. In fact, I originally wrote reference implementations for this project in Go and Halide, but rewriting it in Swift allowed me to establish the best of both worlds: algorithmic explicitness and top-tier performance.

While Swift is a highly modern language, it is entirely possible to write explicit, transparent code as long as you consciously avoid relying on its implicit behaviors. Furthermore, with rigorous tuning, I was able to achieve the same level of execution speed that I previously got from Halide.

Therefore, I chose Swift for this project because it currently offers the highest maintainability while realistically delivering the maximum possible performance.

CLI Usage

encode (save 3 layers)

$ swift run -c release veif-enc -bitrate 100 src.png /output/dir

decode (output 3 layers)

$ swift run -c release veif-dec src.veif /output/dir

other tools

benchmark

$ swift run -c release example -benchmark ./docs/src.png

compare

$ swift run -c release example -compare ./docs/color.png /path/to/output/dir

BUILD & test

build:

$ make build

test:

$ swift test --filter veifTests

for wasm

build:

$ make wasm

test:

$ WASM_BUILD=1 swift test --filter wasmTests

License

MIT
