
docbt

Documentation Build Tool


Generate YAML documentation for dbt models with optional AI assistance. Built with Streamlit for an intuitive and familiar web interface.

πŸ“– Why docbt

docbt (Doc Build Tool) is a utility designed to streamline dbt (Data Build Tool) documentation workflows. Connect your data and generate professional YAML documentation ready for your dbt projects. Do this with the assistance provided by the UI, and even chat with AI models to supercharge your productivity!

πŸ‘” Target Audience

  • Analytics Engineers: streamline your dbt workflow and maintain consistent data modelling.
  • Data Engineers: ensure data quality across your infrastructure through thorough testing.
  • Data Managers: automate tedious tasks and help your team focus on delivering value.
  • AI Enthusiasts: Experiment with local LLMs or cloud providers for automation tasks.

✨ Key Features

  • πŸ› οΈ Non-AI Support: Generate documentation without requiring AI models.
  • πŸ€– Multiple LLM Providers: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
  • πŸ’¬ Interactive Chat: Ask questions about your data and get specific recommendations.
  • πŸ”§ Developer Mode: Token metrics, response times, parameters, prompts and debugging information.
  • βš™οΈ Advanced Configuration: Fine-tune generation parameters.
  • 🧠 Chain of Thought: View AI reasoning process (when available).
  • πŸ“ˆ Real-time Metrics: Monitor API usage, token consumption, and performance.
  • πŸ”Œ Multiple Data Sources: Connect to Snowflake, BigQuery, and more for seamless data integration.

⏳ More to come

Contents

πŸš€ Quick Start

Prerequisites

πŸ“¦ Installation

We recommend always isolating your code within a virtual environment and installing the package in it to avoid dependency issues.

Using uv

# Create a virtual environment
uv venv

# Activate your virtual environment
source .venv/bin/activate

# Install package version of your choice
uv add docbt                    # For base package with no data platform
uv add "docbt[snowflake]"       # For adding Snowflake provider
uv add "docbt[bigquery]"        # For adding BigQuery provider
uv add "docbt[all-providers]"   # For adding all available data providers
uv add "docbt[dev]"             # For development

# (alternatively) use uv pip
uv pip install docbt

# Verify installation
docbt --version

# Run the application
docbt run
Using Poetry
# Initialize or navigate to your project
# If you don't have a pyproject.toml yet
poetry init

# Add docbt to your project
poetry add docbt                    # For base package with no data platform
poetry add "docbt[snowflake]"       # For adding Snowflake provider
poetry add "docbt[bigquery]"        # For adding BigQuery provider
poetry add "docbt[all-providers]"   # For adding all available data providers

# Development dependencies (optional)
poetry add --group dev "docbt[dev]"

# Activate the Poetry shell
poetry shell

# Verify installation
docbt --version

# Run the application
docbt run
Using pip
# Create virtual environments
python -m venv env

# Activate it
source env/bin/activate

# Install package version of your choice
pip install docbt                    # For base package with no data platform
pip install "docbt[snowflake]"       # For adding Snowflake provider
pip install "docbt[bigquery]"        # For adding BigQuery provider
pip install "docbt[all-providers]"   # For adding all available data providers
pip install "docbt[dev]"             # For development

# Verify installation
docbt --version

# Run the application
docbt run

πŸ”§ Building from Source

Building from source gives you access to the latest development features and allows you to contribute to the project. We recommend using uv for faster dependency resolution and installation. This is also what we, the developers, use.

Using uv (Recommended)
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create and activate a virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in editable mode with all dependencies
uv pip install -e .                    # Base installation
uv pip install -e ".[snowflake]"       # With Snowflake support
uv pip install -e ".[bigquery]"        # With BigQuery support
uv pip install -e ".[all-providers]"   # With all data providers
uv pip install -e ".[dev]"             # With development tools

# Verify installation
docbt --version

# Run the application
docbt run
Using pip
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Upgrade pip
pip install --upgrade pip

# Install in editable mode
pip install -e .                    # Base installation
pip install -e ".[snowflake]"       # With Snowflake support
pip install -e ".[bigquery]"        # With BigQuery support
pip install -e ".[all-providers]"   # With all data providers
pip install -e ".[dev]"             # With development tools

# Verify installation
docbt --version

# Run the application
docbt run
Using Poetry
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install dependencies
poetry install

# Install with extras
poetry install --extras "snowflake bigquery"

# Activate the virtual environment
poetry shell

# Run the application
docbt run
Using Pipenv
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install dependencies
pipenv install --dev

# Activate the virtual environment
pipenv shell

# Install in editable mode
pip install -e .

# Run the application
docbt run
Development Setup

For contributors and developers:

# Clone and navigate to the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Install with development dependencies (using uv)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Install pre-commit hooks (optional but recommended)
pre-commit install

# Run tests
make test

# Run linting and formatting
make lint
make format

# Check code quality
ruff check .
ruff format .

# Run specific test files
pytest tests/server/test_server.py -v
Verifying Your Installation

After building from source, verify everything works:

# Check version
docbt --version

# View help
docbt help

# Run the server
docbt run

# Run with custom settings
docbt run --port 8080 --log-level DEBUG
Using Make (Recommended for Contributors)

If you're contributing to the project, using Make provides the easiest setup experience with automated tasks.

Prerequisites:

  • Make (usually pre-installed on Linux/macOS)
  • Git
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt

# Create virtual environment (Make will use uv automatically)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install all dependencies with one command
make install

# Create .env file from template (keeps section headers, removes comments)
make env

# Edit .env with your credentials
nano .env  # or your preferred editor

# Install pre-commit hooks (optional but recommended)
make pre-commit

# Verify installation by running tests
make test

# Run the application
docbt run

Common Make commands for development:

make help          # Show all available commands
make install       # Install dependencies
make env           # Create .env from .env.example
make test          # Run tests
make test-cov      # Run tests with coverage report
make lint          # Check code quality
make format        # Auto-format code
make check         # Run format check + lint
make ci            # Run all CI checks (format, lint, test)
make pre-commit    # Install pre-commit hooks

For detailed information on all Make commands, see Make Commands Guide.

Troubleshooting Build Issues

Missing Build Tools:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install python3-dev build-essential

# macOS (requires Homebrew)
brew install python@3.10

# Windows (requires Visual Studio Build Tools)
# Download from: https://visualstudio.microsoft.com/downloads/

Dependency Conflicts:

# Clear pip cache
pip cache purge

# Or with uv
uv cache clean

# Reinstall from scratch
rm -rf .venv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Permission Issues:

# Don't use sudo with pip/uv in virtual environments
# If you get permission errors, ensure you're in an activated venv
source .venv/bin/activate

🎯 Usage

View live demo app

GIF Demo

docbt comes equipped with a command-line tool that supports the following commands (examples below):

  • --version: prints the version of the package.
  • help: prints detailed information about the commands and options you can use to run the app.
  • run: runs the Streamlit app, with the option to specify host, port, and log level.
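
Typical invocations look like this (all commands and flags shown here also appear elsewhere in this README):

# Print the installed version
docbt --version

# Show detailed help for all commands and options
docbt help

# Start the Streamlit app on a custom port with verbose logging
docbt run --port 8080 --log-level DEBUG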

Data Tab

Provide the app with data to start working with it

  • Upload: CSV, JSON from your local storage
  • Data Warehouse: connect to your data platform like Snowflake or BigQuery
  • Context Integration: Data automatically included in AI conversations
  • Statistics and EDA: (coming soon)

Data Tab

Node Tab

Here you can set up the configuration for your node

  • Provide specific config: customize your config with platform-specific properties
  • Configure node properties: from materialization to meta-tags
  • Apply node-level data tests: (coming soon)

Node Tab

Columns Tab

Here you can set up the configuration, documentation and tests for your columns

Columns Tab

Sidebar and Config Tab

See the end result of your work in real time

  • Preview Configuration: Interactive visual representation of generated YAML
  • Real-time Updates: see changes live as you configure your documentation using the UI
  • AI Suggestions: use LLMs to generate node- and column-level descriptions, and to suggest constraints and data tests

Sidebar AI Suggestion

AI Tab

Configure your AI provider and settings

  • Choose Provider: OpenAI, Ollama, or LM Studio
  • Developer Mode: Enable advanced settings and metrics
  • System Prompt: Customize AI context and behavior (developer mode)
  • Generation Parameters: Control temperature, max tokens, top-p, stop sequences, etc.

AI Tab

Chat Tab

Interact with your AI assistant, with a sample of your data included in context

  • Ask questions about dbt best practices or your data in general
  • Get recommendations for data modeling and data use cases
  • Just have whatever type of conversation you want with your model
  • Enable "Chain of Thought" to see AI reasoning

Chat Tab

πŸ”§ Configuration Overview

The behavior of the app can be configured through environment variables. You can find an example environment file in the repo. Running make env (for developers) will also create your own .env file to work with. Alternatively, copy the contents of .env.example into .env to make use of docbt's python-dotenv support, or simply export the environment variables / inject them into your environment of choice.
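
A minimal .env might look like the sketch below; every variable name appears in the sections that follow, and the values are only examples:

# Logging
DOCBT_LOG_LEVEL=INFO

# AI settings
DOCBT_USE_AI_DEFAULT=false
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_LLM_PROVIDER_DEFAULT=openai
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true
DOCBT_OPENAI_API_KEY=sk-...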

Logging Configuration

Control the verbosity of docbt's logging output to help with debugging or reduce noise in production.

Setting Log Level:

You can configure the logging level in two ways:

  1. CLI Flag (highest priority):
docbt run --log-level DEBUG
  2. Environment Variable (used if no CLI flag is provided):
# In .env file
DOCBT_LOG_LEVEL=DEBUG

# Or export directly
export DOCBT_LOG_LEVEL=DEBUG

Available Log Levels:

  • TRACE - Most verbose, includes all internal details
  • DEBUG - Detailed debugging information (useful for troubleshooting)
  • INFO - General informational messages (default)
  • SUCCESS - Success messages only
  • WARNING - Warning messages and above
  • ERROR - Error messages and above
  • CRITICAL - Only critical errors

Examples:

# Use DEBUG level for troubleshooting
docbt run --log-level DEBUG

# Use environment variable for persistent configuration
echo "DOCBT_LOG_LEVEL=DEBUG" >> .env
docbt run

# Reduce logging noise in production
docbt run --log-level WARNING

Note: The CLI flag always takes precedence over the environment variable. If neither is specified, the default level is INFO.

LLM Providers

# Enable/disable AI usage
DOCBT_USE_AI_DEFAULT=false

# Enable/disable developer mode for advanced features
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_SHOW_CHAIN_OF_THOUGHT=true

# You can choose which provider will appear as your default
DOCBT_LLM_PROVIDER_DEFAULT=openai/ollama/lmstudio

OpenAI

We recommend working with the gpt-5 series, but you can use the Fetch Models button to pick from whatever OpenAI has to offer.

  • gpt-5-nano: good for most tasks and very cheap, but it fails to produce valid structured output with a large sample size or too many columns.
  • gpt-5-mini: handles itself better than nano, worse at long context than gpt-5. A good middle ground.
  • gpt-5: the best of the gpt-5 series but the most expensive. Use sparingly.
# Set your API key
export DOCBT_OPENAI_API_KEY="sk-..."

# Or add to .env file
DOCBT_OPENAI_API_KEY=sk-...

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true

Ollama (OSS)

We recommend using models such as qwen3:4b (used in the example below):

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull qwen3:4b

# Start server (default: http://localhost:11434)
ollama serve

# Set host and port environment variables
DOCBT_OLLAMA_HOST=localhost
DOCBT_OLLAMA_PORT=11434

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OLLAMA=true

LM Studio (OSS)

Some models we would recommend are:

Note: some models are incapable of producing valid structured output. For example, oddly enough, gpt-oss cannot. Experiment and find out what works for your use case and hardware. Increasing the context window in LM Studio can help troubleshoot bugs, especially with data that has many columns.

  1. Download from lmstudio.ai
  2. Browse models and download the ones you want
  3. Enable "Local Server" (default: http://localhost:1234) from UI
# Set host and port environment variables
DOCBT_LMSTUDIO_HOST=localhost
DOCBT_LMSTUDIO_PORT=1234

# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_LMSTUDIO=true

Advanced Parameters

In Developer Mode, you can fine-tune AI generation with the following inference parameters:

  • API Timeout: number of seconds before an API call fails
  • Max Tokens: Maximum response length (100-4000)
  • Temperature: Creativity level (0.0-2.0)
    • 0.0: Deterministic, focused
    • 1.0: Balanced
    • 2.0: More creative, random
  • Top P: Nucleus sampling (0.0-1.0)
  • Stop Sequences: Custom stop words/phrases

Note: the gpt-5 series does not support temperature (it is always 1), top-p, or stop sequences.
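
For reference, these parameters correspond to the standard fields of an OpenAI-compatible chat completion request. A minimal sketch against a local LM Studio server (the endpoint uses the default port shown in the LM Studio section above; the model name and prompt are placeholders):

# Send a request with explicit generation parameters (illustrative values)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-local-model",
    "messages": [{"role": "user", "content": "Suggest a description for this column."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "top_p": 0.9,
    "stop": ["---"]
  }'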

πŸ—„οΈ Data Providers

You can use different connection methods to connect to the following data platforms:

Snowflake

Connect to Snowflake with a password, SSO, MFA, or an RSA key pair.

# Example: connect with your user and password
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_PASSWORD=your-password
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC
DOCBT_SNOWFLAKE_AUTHENTICATOR=snowflake
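
For browser-based SSO, Snowflake's standard externalbrowser authenticator can be used in place of the password. A sketch, assuming the DOCBT_SNOWFLAKE_AUTHENTICATOR variable shown above accepts standard Snowflake authenticator values:

# Example: connect with browser-based SSO (externalbrowser is Snowflake's standard SSO authenticator)
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_AUTHENTICATOR=externalbrowser
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC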

BigQuery

Currently, the BigQuery connection only works with the credentials JSON method:

# Point to your credentials JSON in the environment variables
DOCBT_GOOGLE_APPLICATION_CREDENTIALS=/home/<user>/.config/gcloud/application_default_credentials.json
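
If that file does not exist yet, it is typically generated with the Google Cloud CLI (assuming gcloud is installed and you are authenticated against the right project):

# Writes ~/.config/gcloud/application_default_credentials.json
gcloud auth application-default login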

πŸ› Troubleshooting

Common Issues

Streamlit App / General Issues: run docbt with the debug log level and inspect the logs. If you find any bugs while doing so, please report them. :)

docbt run --log-level debug

LLM Connection Errors

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Verify LM Studio server
curl http://localhost:1234/v1/models

# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models

Docker Issues

# View container logs
docker-compose logs docbt

# Check if container is running
docker ps

# Restart container
docker-compose restart docbt

See Docker Guide for more Docker-specific troubleshooting.

πŸ“ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“¬ Support

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run ruff format . and pytest
  5. Commit your changes (git commit -m 'feat: add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.

Development Tools: We use Make for automation. See Make Commands Guide for all available commands.

πŸ’° Sponsoring

If you like what I'm working on and decide to sponsor, you can do so via:


Happy documenting! πŸŽ‰ Generate better dbt documentation with AI assistance.
