
docbt

Documentation Build Tool


Generate YAML documentation for dbt models with optional AI assistance. Built with Streamlit for an intuitive and familiar web interface.

πŸ“– Why docbt

docbt (Doc Build Tool) is a utility designed to streamline dbt (Data Build Tool) documentation workflows. Connect your data and generate professional YAML documentation ready for your dbt projects. Do this with the assistance provided by the UI, and even chat with AI models to supercharge your productivity!

πŸ‘” Target Audience

  • Analytics Engineers: streamline your dbt workflow and maintain consistent data modelling.
  • Data Engineers: ensure data quality across your infrastructure through thorough testing.
  • Data Managers: automate tedious tasks and help your team focus on delivering value.
  • AI Enthusiasts: Experiment with local LLMs or cloud providers for automation tasks.

✨ Key Features

  • πŸ› οΈ Non-AI Support: Generate documentation without requiring AI models.
  • πŸ€– Multiple LLM Providers: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
  • πŸ’¬ Interactive Chat: Ask questions about your data and get specific recommendations.
  • πŸ”§ Developer Mode: Token metrics, response times, parameters, prompts and debugging information.
  • βš™οΈ Advanced Configuration: Fine-tune generation parameters.
  • 🧠 Chain of Thought: View AI reasoning process (when available).
  • πŸ“ˆ Real-time Metrics: Monitor API usage, token consumption, and performance.
  • πŸ”Œ Multiple Data Sources: Connect to Snowflake, BigQuery, and more for seamless data integration.

⏳ More to come

Contents

πŸš€ Quick Start

Prerequisites

πŸ“¦ Installation

We recommend always isolating your code within a virtual environment and installing the package in it to avoid dependency issues.

Using uv

# Create a virtual environment
uv venv

# Activate your virtual environment
source .venv/bin/activate

# Install package version of your choice
uv add docbt                    # For base package with no data platform
uv add "docbt[snowflake]"       # For adding Snowflake provider
uv add "docbt[bigquery]"        # For adding BigQuery provider
uv add "docbt[all-providers]"   # For adding all available data providers
uv add "docbt[dev]"             # For development

# (alternatively) use uv pip
uv pip install docbt

# Verify installation
docbt --version

# Run the application
docbt run
Using Poetry
# Initialize or navigate to your project
# If you don't have a pyproject.toml yet
poetry init

# Add docbt to your project
poetry add docbt                    # For base package with no data platform
poetry add "docbt[snowflake]"       # For adding Snowflake provider
poetry add "docbt[bigquery]"        # For adding BigQuery provider
poetry add "docbt[all-providers]"   # For adding all available data providers

# Development dependencies (optional)
poetry add --group dev "docbt[dev]"

# Activate the Poetry shell
poetry shell

# Verify installation
docbt --version

# Run the application
docbt run
Using pip
# Create virtual environments
python -m venv env

# Activate it
source env/bin/activate

# Install package version of your choice
pip install docbt                    # For base package with no data platform
pip install "docbt[snowflake]"       # For adding Snowflake provider
pip install "docbt[bigquery]"        # For adding BigQuery provider
pip install "docbt[all-providers]"   # For adding all available data providers
pip install "docbt[dev]"             # For development

# Verify installation
docbt --version

# Run the application
docbt run

πŸ”§ Building from Source

Building from source gives you access to the latest development features and allows you to contribute to the project. We recommend using uv for faster dependency resolution and installation. This is also what we, the developers, use.

Using uv (Recommended)
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create and activate a virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in editable mode with all dependencies
uv pip install -e .                    # Base installation
uv pip install -e ".[snowflake]"       # With Snowflake support
uv pip install -e ".[bigquery]"        # With BigQuery support
uv pip install -e ".[all-providers]"   # With all data providers
uv pip install -e ".[dev]"             # With development tools

# Verify installation
docbt --version

# Run the application
docbt run
Using pip
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Upgrade pip
pip install --upgrade pip

# Install in editable mode
pip install -e .                    # Base installation
pip install -e ".[snowflake]"       # With Snowflake support
pip install -e ".[bigquery]"        # With BigQuery support
pip install -e ".[all-providers]"   # With all data providers
pip install -e ".[dev]"             # With development tools

# Verify installation
docbt --version

# Run the application
docbt run
Using Poetry
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install dependencies
poetry install

# Install with extras
poetry install --extras "snowflake bigquery"

# Activate the virtual environment
poetry shell

# Run the application
docbt run
Using Pipenv
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install dependencies
pipenv install --dev

# Activate the virtual environment
pipenv shell

# Install in editable mode
pip install -e .

# Run the application
docbt run
Development Setup

For contributors and developers:

# Clone and navigate to the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install with development dependencies (using uv)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Install pre-commit hooks (optional but recommended)
pre-commit install

# Run tests
make test

# Run linting and formatting
make lint
make format

# Check code quality
ruff check .
ruff format .

# Run specific test files
pytest tests/server/test_server.py -v
Verifying Your Installation

After building from source, verify everything works:

# Check version
docbt --version

# View help
docbt help

# Run the server
docbt run

# Run with custom settings
docbt run --port 8080 --log-level DEBUG
Using Make (Recommended for Contributors)

If you're contributing to the project, using Make provides the easiest setup experience with automated tasks.

Prerequisites:

  • Make (usually pre-installed on Linux/macOS)
  • Git
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create virtual environment (Make will use uv automatically)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install all dependencies with one command
make install

# Create .env file from template (keeps section headers, removes comments)
make env

# Edit .env with your credentials
nano .env  # or your preferred editor

# Install pre-commit hooks (optional but recommended)
make pre-commit

# Verify installation by running tests
make test

# Run the application
docbt run

Common Make commands for development:

make help          # Show all available commands
make install       # Install dependencies
make env           # Create .env from .env.example
make test          # Run tests
make test-cov      # Run tests with coverage report
make lint          # Check code quality
make format        # Auto-format code
make check         # Run format check + lint
make ci            # Run all CI checks (format, lint, test)
make pre-commit    # Install pre-commit hooks

For detailed information on all Make commands, see Make Commands Guide.

Troubleshooting Build Issues

Missing Build Tools:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install python3-dev build-essential

# macOS (requires Homebrew)
brew install python@3.10

# Windows (requires Visual Studio Build Tools)
# Download from: https://visualstudio.microsoft.com/downloads/

Dependency Conflicts:

# Clear pip cache
pip cache purge

# Or with uv
uv cache clean

# Reinstall from scratch
rm -rf .venv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Permission Issues:

# Don't use sudo with pip/uv in virtual environments
# If you get permission errors, ensure you're in an activated venv
source .venv/bin/activate

🎯 Usage

View live demo app

GIF Demo

docbt comes equipped with a command-line tool that supports the following commands (examples below):

  • --version: prints the version of the package.
  • help: prints detailed information about the commands and options you can use to run the app.
  • run: runs the Streamlit app, with the option to specify host, port, and log level.
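
Typical invocations look like this (all commands and flags shown here also appear elsewhere in this README):

# Print the installed version
docbt --version

# Show detailed help for all commands and options
docbt help

# Start the Streamlit app on a custom port with verbose logging
docbt run --port 8080 --log-level DEBUG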

Data Tab

Provide the app with data to start working with it

  • Upload: CSV, JSON from your local storage
  • Data Warehouse: connect to your data platform like Snowflake or BigQuery
  • Context Integration: Data automatically included in AI conversations
  • Statistics and EDA: (coming soon)

Data Tab

Node Tab

Here you can set up the configuration for your node

  • Provide specific config: customize your config with platform-specific properties
  • Configure node properties: from materialization to meta-tags
  • Apply node-level data tests: (coming soon)

Node Tab

Columns Tab

Here you can set up the configuration, documentation and tests for your columns

Columns Tab

Sidebar and Config Tab

See the end result of your work in real time

  • Preview Configuration: Interactive visual representation of generated YAML
  • Real-time Updates: see changes live as you configure your documentation using the UI
  • AI Suggestions: use LLMs to generate node- and column-level descriptions, and to suggest constraints and data tests

Sidebar AI Suggestion

AI Tab

Configure your AI provider and settings

  • Choose Provider: OpenAI, Ollama, or LM Studio
  • Developer Mode: Enable advanced settings and metrics
  • System Prompt: Customize AI context and behavior (developer mode)
  • Generation Parameters: Control temperature, max tokens, top-p, stop sequences, etc.

AI Tab

Chat Tab

Interact with your AI assistant, with a sample of your data included in context

  • Ask questions about dbt best practices or your data in general
  • Get recommendations for data modeling and data use cases
  • Just have whatever type of conversation you want with your model
  • Enable "Chain of Thought" to see AI reasoning

Chat Tab

πŸ”§ Configuration Overview

The behavior of the app can be configured through environment variables. You can find an example environment file in the repo. Running make env (for developers) will also create your own .env file to work with. Alternatively, copy the contents of .env.example into .env to make use of docbt's python-dotenv support, or simply export the environment variables / inject them into your environment of choice.
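
A minimal .env might look like the sketch below; every variable name appears in the sections that follow, and the values are only examples:

# Logging
DOCBT_LOG_LEVEL=INFO

# AI settings
DOCBT_USE_AI_DEFAULT=false
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_LLM_PROVIDER_DEFAULT=openai
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true
DOCBT_OPENAI_API_KEY=sk-...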

Logging Configuration

Control the verbosity of docbt's logging output to help with debugging or reduce noise in production.

Setting Log Level:

You can configure the logging level in two ways:

  1. CLI Flag (highest priority):
docbt run --log-level DEBUG
  2. Environment Variable (used if no CLI flag is provided):
# In .env file
DOCBT_LOG_LEVEL=DEBUG

# Or export directly
export DOCBT_LOG_LEVEL=DEBUG

Available Log Levels:

  • TRACE - Most verbose, includes all internal details
  • DEBUG - Detailed debugging information (useful for troubleshooting)
  • INFO - General informational messages (default)
  • SUCCESS - Success messages only
  • WARNING - Warning messages and above
  • ERROR - Error messages and above
  • CRITICAL - Only critical errors

Examples:

# Use DEBUG level for troubleshooting
docbt run --log-level DEBUG

# Use environment variable for persistent configuration
echo "DOCBT_LOG_LEVEL=DEBUG" >> .env
docbt run

# Reduce logging noise in production
docbt run --log-level WARNING

Note: The CLI flag always takes precedence over the environment variable. If neither is specified, the default level is INFO.

LLM Providers

# Enable/disable AI usage
DOCBT_USE_AI_DEFAULT=false

# Enable/disable developer mode for advanced features
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_SHOW_CHAIN_OF_THOUGHT=true

# You can choose which provider will appear as your default
DOCBT_LLM_PROVIDER_DEFAULT=openai/ollama/lmstudio

OpenAI

We recommend working with the gpt-5 series, but you can use the Fetch Models button to pick from whatever OpenAI has to offer.

  • gpt-5-nano: good for most tasks and very cheap, but it fails to produce valid structured output with a large sample size or too many columns.
  • gpt-5-mini: handles itself better than nano, worse at long context than gpt-5. A good middle ground.
  • gpt-5: the best of the gpt-5 series but the most expensive. Use sparingly.
# Set your API key
export DOCBT_OPENAI_API_KEY="sk-..."

# Or add to .env file
DOCBT_OPENAI_API_KEY=sk-...

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true

Ollama (OSS)

We recommend using models such as qwen3:4b (used in the example below):

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull qwen3:4b

# Start server (default: http://localhost:11434)
ollama serve

# Set host and port environment variables
DOCBT_OLLAMA_HOST=localhost
DOCBT_OLLAMA_PORT=11434

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OLLAMA=true

LM Studio (OSS)

Some models we would recommend are:

Note: some models are incapable of producing valid structured output. For example, oddly enough, gpt-oss cannot. Experiment and find out what works for your use case and hardware. Increasing the context window in LM Studio can help troubleshoot bugs, especially with data that has many columns.

  1. Download from lmstudio.ai
  2. Browse models and download the ones you want
  3. Enable "Local Server" (default: http://localhost:1234) from UI
# Set host and port environment variables
DOCBT_LMSTUDIO_HOST=localhost
DOCBT_LMSTUDIO_PORT=1234

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_LMSTUDIO=true

Advanced Parameters

In Developer Mode, you can fine-tune AI generation with the following inference parameters:

  • API Timeout: number of seconds before an API call fails
  • Max Tokens: Maximum response length (100-4000)
  • Temperature: Creativity level (0.0-2.0)
    • 0.0: Deterministic, focused
    • 1.0: Balanced
    • 2.0: More creative, random
  • Top P: Nucleus sampling (0.0-1.0)
  • Stop Sequences: Custom stop words/phrases

Note: the gpt-5 series does not support temperature (it is always 1), top-p, or stop sequences.
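
For reference, these parameters correspond to the standard fields of an OpenAI-compatible chat completion request. A minimal sketch against a local LM Studio server (the endpoint uses the default port shown in the LM Studio section above; the model name and prompt are placeholders):

# Send a request with explicit generation parameters (illustrative values)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-local-model",
    "messages": [{"role": "user", "content": "Suggest a description for this column."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "top_p": 0.9,
    "stop": ["---"]
  }'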

πŸ—„οΈ Data Providers

You can use different connection methods to connect to the following data platforms:

Snowflake

Connect to Snowflake with a password, SSO, MFA, or an RSA key pair.

# Example: connect with your user and password
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_PASSWORD=your-password
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC
DOCBT_SNOWFLAKE_AUTHENTICATOR=snowflake
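
For browser-based SSO, Snowflake's standard externalbrowser authenticator can be used in place of the password. A sketch, assuming the DOCBT_SNOWFLAKE_AUTHENTICATOR variable shown above accepts standard Snowflake authenticator values:

# Example: connect with browser-based SSO (externalbrowser is Snowflake's standard SSO authenticator)
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_AUTHENTICATOR=externalbrowser
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC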

BigQuery

Currently, the BigQuery connection only works with the credentials JSON method:

# Point to your credentials JSON in the environment variables
DOCBT_GOOGLE_APPLICATION_CREDENTIALS=/home/<user>/.config/gcloud/application_default_credentials.json
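
If that file does not exist yet, it is typically generated with the Google Cloud CLI (assuming gcloud is installed and you are authenticated against the right project):

# Writes ~/.config/gcloud/application_default_credentials.json
gcloud auth application-default login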

πŸ› Troubleshooting

Common Issues

Streamlit App / General Issues: run docbt with the debug log level and inspect the logs. If you find any bugs while doing so, please report them. :)

docbt run --log-level debug

LLM Connection Errors

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Verify LM Studio server
curl http://localhost:1234/v1/models

# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models

Docker Issues

# View container logs
docker-compose logs docbt

# Check if container is running
docker ps

# Restart container
docker-compose restart docbt

See Docker Guide for more Docker-specific troubleshooting.

πŸ“ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“¬ Support

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run ruff format . and pytest
  5. Commit your changes (git commit -m 'feat: add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.

Development Tools: We use Make for automation. See Make Commands Guide for all available commands.

πŸ’° Sponsoring

If you like what I'm working on and decide to sponsor, you can do so via:


Happy documenting! πŸŽ‰ Generate better dbt documentation with AI assistance.
