Pie is a high-performance, programmable LLM serving system that empowers you to design and deploy custom inference logic and optimization strategies.
> 🧪 **Note:** This software is in a pre-release stage and under active development. It is recommended for testing and research purposes only.
Run the Pie server in Docker and connect to it using the `pie-cli` client.
Prerequisites:

- NVIDIA GPU with Docker and the NVIDIA Container Toolkit
- SSH key pair (generate one with `ssh-keygen -t ed25519` if needed)
- `pie-cli` binary (download from GitHub releases)
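Before starting, you can confirm that Docker can see the GPU with a quick sanity check (the CUDA image tag below is just an example; any CUDA base image works):

```bash
# Should print your GPU table via nvidia-smi from inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```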
### Step 1: Start Pie Server
```bash
docker run --rm --gpus all -p 8080:8080 \
  --name pie-server \
  -e PIE_AUTH_USER="$(whoami)" \
  -e PIE_AUTH_KEY="$(cat ~/.ssh/id_ed25519.pub)" \
  -v ~/.cache:/root/.cache \
  pieproject/pie:latest
```

The server will start with the name `pie-server` and authenticate using your SSH public key. Models are cached in `~/.cache/` for persistence.
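Before connecting, you can confirm the server is ready by tailing the container logs (standard Docker, nothing Pie-specific):

```bash
# Follow the server's startup output; Ctrl-C stops following, not the server
docker logs -f pie-server
```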
### Step 2: Configure and Test Connection
In a new terminal:
```bash
# Configure pie-cli (uses localhost:8080 by default)
pie-cli config init --enable-auth true

# Test the connection
pie-cli ping
```

### Step 3: Run Text Completion
Copy the example inferlet and run it:
```bash
# Copy the inferlet from the container (one-time)
docker cp pie-server:/workspace/example-apps/text_completion.wasm ./

# Submit it for execution
pie-cli submit ./text_completion.wasm -- --prompt "What is the capital of France?"
```

Note: The first run may take a few minutes for model download and kernel compilation. Subsequent runs are much faster thanks to caching.
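When you are done, stop the server; because the container was started with `--rm`, Docker removes it automatically:

```bash
docker stop pie-server
```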
To build and run Pie from source instead, first complete the following setup:

- Configure a Backend: Navigate to a backend directory and follow its `README.md` for setup.

- Add Wasm Target: Install the WebAssembly target for Rust:

  ```bash
  rustup target add wasm32-wasip2
  ```

  This is required to compile the Rust-based inferlets in the `example-apps` directory.
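You can confirm the target is installed with a quick check (standard `rustup`, nothing Pie-specific):

```bash
# Should print wasm32-wasip2 among the installed targets
rustup target list --installed | grep wasm32-wasip2
```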
Next, build the CLIs and the example inferlets:
- Build the engine `pie` and the client CLI `pie-cli`: From the repository root, run:

  ```bash
  cd pie && cargo install --path .
  ```

  Also, from the repository root, run:

  ```bash
  cd client/cli && cargo install --path .
  ```
- Build the Examples: From the repository root, run:

  ```bash
  cd example-apps && cargo build --target wasm32-wasip2 --release
  ```
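  To sanity-check the build, list the compiled Wasm artifacts from the repository root (this path assumes Cargo's default target layout):

  ```bash
  # Each example inferlet is emitted as a standalone .wasm module
  ls example-apps/target/wasm32-wasip2/release/*.wasm
  ```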
- Create default configuration file: Substitute `$REPO` with the actual repository root and run:

  ```bash
  pie config init python $REPO/backend/backend-python/server.py
  ```
- Download the model: The default config file specifies the expected model. Run the following command to download it:

  ```bash
  pie model add qwen-3-0.6b
  ```
- Test the engine: Run an inferlet directly with the engine. Due to JIT compilation of the FlashInfer kernels, the first run will have very long latency.

  ```bash
  pie run \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "Where is the capital of France?"
  ```
- Create User Public Key: If you don't already have a key pair in `~/.ssh`, generate one with the following command. By default, the private key will be generated in `~/.ssh/id_ed25519` and the public key in `~/.ssh/id_ed25519.pub`. Please make sure the passphrase is empty.

  ```bash
  ssh-keygen
  ```

  In addition to ED25519, you can also use RSA or ECDSA keys.
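  You can verify which key will be used by printing its fingerprint (standard OpenSSH, nothing Pie-specific):

  ```bash
  # Prints the key size, fingerprint, comment, and key type
  ssh-keygen -lf ~/.ssh/id_ed25519.pub
  ```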
- Create default user client configuration file: The following command creates a default user client configuration file using the current Unix username and the private key in `~/.ssh`:

  ```bash
  pie-cli config init
  ```
- Register the user on the engine: Run the following command to register the current user on the engine. `my-first-key` is the name of the key and can be any string; `cat` reads the public key from `~/.ssh/id_ed25519.pub` and pipes it to `pie auth add`.

  ```bash
  cat ~/.ssh/id_ed25519.pub | pie auth add $(whoami) my-first-key
  ```
- Start the Engine: Launch the Pie engine with the default configuration.

  ```bash
  pie serve
  ```
- Run an Inferlet: From another terminal window, run:

  ```bash
  pie-cli submit \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "Where is the capital of France?"
  ```
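Each submission launches its own run of the inferlet, so you can fan out several requests from the shell. A minimal sketch, assuming `pie-cli submit` blocks until its inferlet finishes (only the flags shown above are used):

```bash
# Submit three prompts concurrently; `wait` blocks until all complete
for country in France Japan Brazil; do
  pie-cli submit \
    $REPO/example-apps/target/wasm32-wasip2/release/text_completion.wasm \
    -- \
    --prompt "What is the capital of $country?" &
done
wait
```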