Pie is a high-performance, programmable LLM serving system that empowers you to design and deploy custom inference logic and optimization strategies.
> 🧪 **Note:** This software is in a pre-release stage and under active development. It is recommended for testing and research purposes only.
Run the Pie server in Docker and connect to it using the `pie-cli` client.
Prerequisites:

- NVIDIA GPU with Docker and the NVIDIA Container Toolkit
- SSH key pair (generate one with `ssh-keygen -t ed25519` if needed)
- `pie-cli` binary (download from GitHub releases)
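Before starting, you can confirm that Docker can see the GPU with a quick sanity check (the CUDA image tag below is just an example; any CUDA base image works):

```bash
# Should print your GPU table via nvidia-smi from inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```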
### Step 1: Start Pie Server
```bash
docker run --rm --gpus all -p 8080:8080 \
  --name pie-server \
  -e PIE_AUTH_USER="$(whoami)" \
  -e PIE_AUTH_KEY="$(cat ~/.ssh/id_ed25519.pub)" \
  -v ~/.cache:/root/.cache \
  pieproject/pie:latest
```

The server will start with the name `pie-server` and authenticate using your SSH public key. Models are cached in `~/.cache/` for persistence.
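Before connecting, you can confirm the server is ready by tailing the container logs (standard Docker, nothing Pie-specific):

```bash
# Follow the server's startup output; Ctrl-C stops following, not the server
docker logs -f pie-server
```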
### Step 2: Configure and Test Connection
In a new terminal:
```bash
# Configure pie-cli (uses localhost:8080 by default)
pie-cli config init --enable-auth true

# Test the connection
pie-cli ping
```

### Step 3: Run Text Completion
Copy the example inferlet and run it:
```bash
# Copy the inferlet from the container (one-time)
docker cp pie-server:/workspace/example-apps/text_completion.wasm ./

# Submit it for execution
pie-cli submit ./text_completion.wasm -- --prompt "What is the capital of France?"
```

Note: The first run may take a few minutes for model download and kernel compilation. Subsequent runs are much faster thanks to caching.
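When you are done, stop the server; because the container was started with `--rm`, Docker removes it automatically:

```bash
docker stop pie-server
```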
To build and run Pie from source instead, first complete the following setup:

- Configure a Backend: Navigate to a backend directory and follow its `README.md` for setup.

- Add Wasm Target: Install the WebAssembly target for Rust:

  ```bash
  rustup target add wasm32-wasip2
  ```

  This is required to compile the Rust-based inferlets in the `example-apps` directory.
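You can confirm the target is installed with a quick check (standard `rustup`, nothing Pie-specific):

```bash
# Should print wasm32-wasip2 among the installed targets
rustup target list --installed | grep wasm32-wasip2
```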
Next, build the CLIs and the example inferlets:
- Build the engine `pie` and the client CLI `pie-cli`: From the repository root, run:

  ```bash
  cd pie && cargo install --path .
  ```

  Also, from the repository root, run:

  ```bash
  cd client/cli && cargo install --path .
  ```
- Build the Examples: From the repository root, run:

  ```bash
  cd example-apps && cargo build --target wasm32-wasip2 --release
  ```
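  To sanity-check the build, list the compiled Wasm artifacts from the repository root (this path assumes Cargo's default target layout):

  ```bash
  # Each example inferlet is emitted as a standalone .wasm module
  ls example-apps/target/wasm32-wasip2/release/*.wasm
  ```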
- Create default configuration file: Substitute `$REPO` with the actual repository root and run:

  ```bash
  pie config init python $REPO/backend/backend-python/server.py
  ```
- Download the model: The default config file specifies the expected model. Run the following command to download it:

  ```bash
  pie model add qwen-3-0.6b
  ```
- Test the engine: Run an inferlet directly with the engine. Due to JIT compilation of the FlashInfer kernels, the first run will have very long latency.

  ```bash
  pie run \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "Where is the capital of France?"
  ```
- Create User Public Key: If you don't already have a key pair in `~/.ssh`, generate one with the following command. By default, the private key will be generated in `~/.ssh/id_ed25519` and the public key in `~/.ssh/id_ed25519.pub`. Please make sure the passphrase is empty.

  ```bash
  ssh-keygen
  ```

  In addition to ED25519, you can also use RSA or ECDSA keys.
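  You can verify which key will be used by printing its fingerprint (standard OpenSSH, nothing Pie-specific):

  ```bash
  # Prints the key size, fingerprint, comment, and key type
  ssh-keygen -lf ~/.ssh/id_ed25519.pub
  ```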
- Create default user client configuration file: The following command creates a default user client configuration file using the current Unix username and the private key in `~/.ssh`:

  ```bash
  pie-cli config init
  ```
- Register the user on the engine: Run the following command to register the current user on the engine. `my-first-key` is the name of the key and can be any string; `cat` reads the public key from `~/.ssh/id_ed25519.pub` and pipes it to `pie auth add`.

  ```bash
  cat ~/.ssh/id_ed25519.pub | pie auth add $(whoami) my-first-key
  ```
- Start the Engine: Launch the Pie engine with the default configuration.

  ```bash
  pie serve
  ```
- Run an Inferlet: From another terminal window, run:

  ```bash
  pie-cli submit \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "Where is the capital of France?"
  ```
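Each submission launches its own run of the inferlet, so you can fan out several requests from the shell. A minimal sketch, assuming `pie-cli submit` blocks until its inferlet finishes (only the flags shown above are used):

```bash
# Submit three prompts concurrently; `wait` blocks until all complete
for country in France Japan Brazil; do
  pie-cli submit \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "What is the capital of $country?" &
done
wait
```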