Chunkr is a self-hostable API for converting pdf, pptx, docx, and excel files into RAG/LLM ready data
11 semantic tags for layout analysis | OCR + Bounding Boxes | Structured HTML and markdown
Try it out!
·
Report Bug
·
Contact
- Go to chunkr.ai
- Make an account and copy your API key
- Create a task:
curl -X POST https://api.chunkr.ai/api/v1/task \ -H "Content-Type: multipart/form-data" \ -H "Authorization: ${YOUR_API_KEY}" \ -F "file=@/path/to/your/file" \ -F "model=HighQuality" \ -F "target_chunk_length=512" \ -F "ocr_strategy=Auto"
- Poll your created task:
curl -X GET https://api.chunkr.ai/api/v1/task/${TASK_ID} \ -H "Authorization: ${YOUR_API_KEY}"
-
Prerequisites:
- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support, optional)
-
Clone the repo:
git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr- Set up environment variables:
# Copy the example environment file
cp .env.example .env
# Configure your environment variables
# Required: LLM_KEY as your OpenAI API key- Start the services:
With GPU (recommended):
docker compose up -dOnly CPU:
docker compose -f compose-cpu.yaml up -d- Access the services:
- Web UI:
http://localhost:5173 - API:
http://localhost:8000
- Web UI:
Note: The default configuration (
docker compose up -d) requires an NVIDIA CUDA GPU. For systems without a GPU, use the CPU deployment option.
- Stop the services when done:
docker compose downFor production environments, we provide a Helm chart and detailed deployment instructions:
- See our detailed guide at
kube/README.md - Includes configurations for high availability and scaling
For enterprise support and deployment assistance, contact us.
This project is dual-licensed:
- GNU Affero General Public License v3.0 (AGPL-3.0)
- Commercial License
To use Chunkr without complying with the AGPL-3.0 license terms you can contact us or visit our website.
- 📧 Email: mehul@lumina.sh
- 📅 Schedule a call: Book a 30-minute meeting
- 🌐 Visit our website: chunkr.ai