🎯 Run PaddleOCR-VL on CPU with OpenAI-compatible API
Break the GPU barrier! This project lets you run PaddleOCR-VL (PaddlePaddle's excellent OCR model) on CPU, exposed through an OpenAI-compatible API.
| Feature | Official PaddleOCR | This Project |
|---|---|---|
| CPU Support | ❌ GPU Required | ✅ Full CPU Support |
| API Format | Custom | OpenAI-compatible |
| Framework | PaddlePaddle | Pure PyTorch |
| Streaming | ❌ | ✅ Real-time |
| Deployment | Manual setup | One-line script |
| Service Management | Manual | tmux automation |
- 🖥️ CPU-Friendly - Optimized for CPU inference with automatic memory management
- 🔌 OpenAI-Compatible - Drop-in replacement for OpenAI's vision API
- 🚀 Easy Deployment - One command to install and start
- ⚡ Streaming Support - Real-time response streaming
- 💾 Auto Memory Cleanup - Prevents memory leaks for long-running services
- 🖼️ Multiple Formats - Supports JPEG, PNG, BMP, Base64, and URLs
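Since the server accepts JPEG, PNG, BMP, raw Base64, and URLs, a small helper can turn any local image file into the data-URL form used by the request examples below (a sketch; the helper name is ours, not part of the project):

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for the image_url field.

    Guesses the MIME type from the file extension and falls back to
    image/jpeg when the extension is unrecognized.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "image/jpeg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{b64}"
```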
- Python 3.12.3
- 16GB+ RAM (recommended for CPU inference)
- Ubuntu/Linux (Windows users please use WSL)
# Clone repository
git clone https://github.com/Think-Core/paddleocr-vl-cpu.git
cd paddleocr-vl-cpu
# One-click setup (downloads model automatically ~2GB)
chmod +x quick_init_setup.sh
./quick_init_setup.sh

The script will:
- ✅ Create Python virtual environment
- ✅ Install all dependencies
- ✅ Download PaddleOCR-VL-0.9B model (~2GB)
- ✅ Verify installation
# Start server
./quick_start.sh start
# Check status
./quick_start.sh status
# View logs
./quick_start.sh attach # Press Ctrl+B then D to detach
# Stop server
./quick_start.sh stop

The server runs at http://localhost:7777
import requests
import base64

# Read and encode image
with open("document.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# OCR request
response = requests.post(
    "http://localhost:7777/v1/chat/completions",
    json={
        "model": "paddleocr-vl",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this image"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_b64}"
                    }
                }
            ]
        }],
        "max_tokens": 2048
    }
)

print(response.json()["choices"][0]["message"]["content"])

import json
response = requests.post(
    "http://localhost:7777/v1/chat/completions",
    json={
        "model": "paddleocr-vl",
        "messages": [...],  # same as above
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = line[6:]
            if data != '[DONE]':
                chunk = json.loads(data)
                content = chunk["choices"][0]["delta"].get("content", "")
                print(content, end='', flush=True)

curl -X POST http://localhost:7777/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "paddleocr-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Read this document"},
        {
          "type": "image_url",
          "image_url": {"url": "data:image/jpeg;base64,{image_b64}"}
        }
      ]
    }]
  }'

- 📄 Document Digitization - Convert scanned documents to text
- 🧾 Invoice Processing - Extract invoice information automatically
- 📋 Form Recognition - Parse structured forms and tables
- 🔍 ID Card Reading - Extract text from ID cards and certificates
- 📚 Book Scanning - Digitize printed books and articles
- 📝 Handwriting OCR - Recognize handwritten text (limited accuracy)
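The streaming loop shown earlier can be factored into a small reusable parser. This sketch operates on already-decoded SSE text lines (the function name is ours, not part of the project):

```python
import json

def parse_sse_chunks(lines):
    """Collect delta content from OpenAI-style SSE lines into one string.

    Skips non-data lines (e.g. keep-alive blanks) and stops at the
    '[DONE]' sentinel, mirroring the streaming loop above.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)
```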
🔴 This server has NO authentication by default!
Do NOT expose directly to the internet!
For production deployment:
- ✅ Use reverse proxy (Nginx/Caddy) with HTTPS
- ✅ Add API key authentication
- ✅ Configure firewall rules
- ✅ Enable rate limiting
- ✅ Set up monitoring
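As a starting point, the reverse-proxy, HTTPS, and rate-limiting items above might look like this in Nginx (a hedged sketch: the server name, certificate paths, and rate values are placeholders, not project defaults):

```nginx
# Allow 5 requests per minute per client IP
limit_req_zone $binary_remote_addr zone=ocr:10m rate=5r/m;

server {
    listen 443 ssl;
    server_name ocr.example.com;               # placeholder
    ssl_certificate     /etc/ssl/ocr.pem;      # placeholder paths
    ssl_certificate_key /etc/ssl/ocr.key;

    location /v1/ {
        limit_req zone=ocr burst=2;
        proxy_pass http://127.0.0.1:7777;
        proxy_read_timeout 300s;   # CPU inference can be slow
        proxy_buffering off;       # keep streaming responses live
    }
}
```

API-key checks would still need to be enforced either in the proxy or in the application itself.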
- ❌ CPU-only (no GPU acceleration in this version)
- ❌ Single request processing (no parallel requests)
- ❌ No built-in authentication
- ❌ Model files (~2GB) not included in repository
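Because the server processes one request at a time, client code that issues OCR calls from multiple threads may want to serialize them rather than pile up timeouts; a minimal sketch (the wrapper is ours, not part of the project):

```python
import threading

_ocr_lock = threading.Lock()

def ocr_call(send, payload):
    """Run one OCR request at a time across threads.

    `send` is any callable that posts `payload` to the server and
    returns the parsed response; the lock keeps concurrent callers
    from overwhelming the single-request backend.
    """
    with _ocr_lock:
        return send(payload)
```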
# Manual download
mkdir -p models && cd models
git lfs install
git clone https://www.modelscope.cn/PaddlePaddle/PaddleOCR-VL.git PaddleOCR-VL-0.9B

- Reduce image size
- Lower max_tokens
- Restart the service: ./quick_start.sh restart
# Check port usage
lsof -i :7777
# View logs
./quick_start.sh attach

paddleocr-vl-cpu/
├── paddleocr_api.py # Main API server
├── requirements_frozen.txt # Locked dependencies
├── quick_init_setup.sh # Setup script
├── quick_start.sh # Service management
├── python_version.txt # Python version lock
└── models/ # Model files (gitignored)
- PaddleOCR Team - For the excellent PaddleOCR-VL model
- @hjdhnx - For inspiration from this discussion
- FastAPI - Modern web framework
- Transformers - Hugging Face's ML library
MIT License - see LICENSE file for details.
Note: PaddleOCR-VL model has its own license. Check the model repository for details.
- 🐛 Bug Reports: GitHub Issues
- 💬 Discussions: GitHub Discussions
- ⭐ Star this project if you find it helpful!