A SwiftUI application that creates an OpenAI-compatible API server using Apple's on-device Foundation Models. This allows you to use Apple Intelligence models locally through familiar OpenAI API endpoints.
- OpenAI Compatible API: Drop-in replacement for OpenAI API with chat completions endpoint
- Streaming Support: Real-time streaming responses compatible with OpenAI's streaming format
- On-Device Processing: Uses Apple's Foundation Models for completely local AI processing
- Model Availability Check: Automatically checks Apple Intelligence availability on startup
- Tool Use (WIP): Function calling capabilities for extended AI functionality
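Streaming responses follow OpenAI's server-sent-events format, where each event line carries a JSON chunk with a `delta`. The sketch below parses one such line; the sample payload is illustrative (assumed to match the OpenAI chunk shape), not captured from this server:

```python
import json

# One illustrative SSE line in OpenAI's streaming format (assumed shape).
line = 'data: {"choices": [{"delta": {"content": "Hello"}, "index": 0}]}'

delta = {}
# The stream terminates with a literal "data: [DONE]" sentinel.
if line.startswith("data: ") and line != "data: [DONE]":
    chunk = json.loads(line[len("data: "):])
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="")
```

Clients that already consume OpenAI streams (including the official SDKs) handle this parsing for you; the sketch just shows what travels over the wire.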
- macOS: 26 beta 2 or later
- Apple Intelligence: Must be enabled in Settings > Apple Intelligence & Siri
- Xcode: 26 beta 2 or later (must match the macOS version for building)
This project is implemented as a GUI application rather than a command-line tool due to Apple's rate limiting policies for Foundation Models:
"An app that has UI and runs in the foreground doesn't have a rate limit when using the models; a macOS command line tool, which doesn't have UI, does."
– Apple DTS Engineer (Source)
- Launch the app
- Configure server settings (default: 127.0.0.1:11535)
- Click "Start Server"
- The server will be available at the configured address
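Before pointing clients at the server, you can sanity-check that something is listening on the configured address. A minimal sketch (the defaults assume the app's documented 127.0.0.1:11535):

```python
import socket

def server_reachable(host="127.0.0.1", port=11535, timeout=1.0):
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False

print("server up" if server_reachable() else "server not reachable")
```

This only confirms a TCP listener; hitting `GET /health` (below) additionally confirms the API layer is responding.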
Once the server is running, you can access these OpenAI-compatible endpoints:
- `GET /health` - Health check
- `GET /status` - Model availability and status
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions (streaming and non-streaming)
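The model list endpoint returns the standard OpenAI list envelope. The response body below is an assumed example of that shape (the actual model `id` the server reports should be `apple-on-device`, per the examples in this README), parsed here without a live server:

```python
import json

# Assumed OpenAI-compatible /v1/models response body (illustrative example,
# not captured from a running server).
models_json = '{"object": "list", "data": [{"id": "apple-on-device", "object": "model"}]}'

models = json.loads(models_json)
ids = [m["id"] for m in models["data"]]
print(ids)
```

Any OpenAI SDK's `client.models.list()` call consumes this same envelope.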
```bash
curl -X POST http://127.0.0.1:11535/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-on-device",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "stream": false
  }'
```

```python
from openai import OpenAI

# Point to your local server
client = OpenAI(
    base_url="http://127.0.0.1:11535/v1",
    api_key="not-needed"  # API key not required for the local server
)

response = client.chat.completions.create(
    model="apple-on-device",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    temperature=0.7,
    stream=True  # Enable streaming
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

This project is licensed under the MIT License - see the LICENSE file for details.