Generate YAML documentation for dbt models with optional AI assistance. Built with Streamlit for an intuitive and familiar web interface.
docbt (Doc Build Tool) is a utility designed to streamline dbt (Data Build Tool) documentation workflows. Connect your data and generate professional YAML documentation ready for your dbt projects. Do this with the assistance provided by the UI, and even chat with AI models to 100x your productivity!
- Analytics Engineers: streamline your dbt workflow and maintain consistent data modelling.
- Data Engineers: ensure data quality across your infrastructure through thorough testing.
- Data Managers: automate tedious tasks and help your team focus on delivering value.
- AI Enthusiasts: Experiment with local LLMs or cloud providers for automation tasks.
- Non-AI Support: Generate documentation without requiring AI models.
- Multiple LLM Providers: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
- Interactive Chat: Ask questions about your data and get specific recommendations.
- Developer Mode: Token metrics, response times, parameters, prompts, and debugging information.
- Advanced Configuration: Fine-tune generation parameters.
- Chain of Thought: View the AI reasoning process (when available).
- Real-time Metrics: Monitor API usage, token consumption, and performance.
- Multiple Data Sources: Connect to Snowflake, BigQuery, and more for seamless data integration.
- More Test Coverage: automation of tests from the dbt-utils, dbt-expectations, and dbt-data-reliability packages.
- Sources: use docbt to automate source declaration and documentation.
- Extra LLM Providers: use Gemini, Grok, Claude, and others to streamline your work.
- Extra Data Sources: connect to Databricks, PostgreSQL, Redshift and others.
- One-click analytics: gain critical insights into your data to better assign tests.
- Why docbt
- Quick Start
- Usage
- Configuration Overview
- Troubleshooting
- License
- Acknowledgments
- Support
- Contributing
- Sponsoring
- Python 3.10 or higher
- uv (recommended), Poetry, or good old pip for package management
- Optional: Ollama, LM Studio, or OpenAI API key for AI assistance
- Optional: Docker, Docker Compose for containerized deployment
We recommend always isolating your code within a virtual environment and installing the package in it to avoid dependency issues.
# Create a virtual environment
uv venv
# Activate your virtual environment
source .venv/bin/activate
# Install package version of your choice
uv add docbt # For base package with no data platform
uv add "docbt[snowflake]" # For adding Snowflake provider
uv add "docbt[bigquery]" # For adding BigQuery provider
uv add "docbt[all-providers]" # For adding all available data providers
uv add "docbt[dev]" # For development
# (alternatively) use uv pip
uv pip install docbt
# Verify installation
docbt --version
# Run the application
docbt run
Using Poetry
# Initialize or navigate to your project
# If you don't have a pyproject.toml yet
poetry init
# Add docbt to your project
poetry add docbt # For base package with no data platform
poetry add "docbt[snowflake]" # For adding Snowflake provider
poetry add "docbt[bigquery]" # For adding BigQuery provider
poetry add "docbt[all-providers]" # For adding all available data providers
# Development dependencies (optional)
poetry add --group dev "docbt[dev]"
# Activate the Poetry shell
poetry shell
# Verify installation
docbt --version
# Run the application
docbt run
Using pip
# Create a virtual environment
python -m venv env
# Activate it
source env/bin/activate
# Install package version of your choice
pip install docbt # For base package with no data platform
pip install "docbt[snowflake]" # For adding Snowflake provider
pip install "docbt[bigquery]" # For adding BigQuery provider
pip install "docbt[all-providers]" # For adding all available data providers
pip install "docbt[dev]" # For development
# Verify installation
docbt --version
# Run the application
docbt run
Building from source gives you access to the latest development features and allows you to contribute to the project. We recommend using uv for faster dependency resolution and installation. This is also what we, the developers, use.
Using uv (Recommended)
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create and activate a virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in editable mode with all dependencies
uv pip install -e . # Base installation
uv pip install -e ".[snowflake]" # With Snowflake support
uv pip install -e ".[bigquery]" # With BigQuery support
uv pip install -e ".[all-providers]" # With all data providers
uv pip install -e ".[dev]" # With development tools
# Verify installation
docbt --version
# Run the application
docbt run
Using pip
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install in editable mode
pip install -e . # Base installation
pip install -e ".[snowflake]" # With Snowflake support
pip install -e ".[bigquery]" # With BigQuery support
pip install -e ".[all-providers]" # With all data providers
pip install -e ".[dev]" # With development tools
# Verify installation
docbt --version
# Run the application
docbt run
Using Poetry
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install dependencies
poetry install
# Install with extras
poetry install --extras "snowflake bigquery"
# Activate the virtual environment
poetry shell
# Run the application
docbt run
Using Pipenv
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install dependencies
pipenv install --dev
# Activate the virtual environment
pipenv shell
# Install in editable mode
pip install -e .
# Run the application
docbt run
Development Setup
For contributors and developers:
# Clone and navigate to the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install with development dependencies (using uv)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Install pre-commit hooks (optional but recommended)
pre-commit install
# Run tests
make test
# Run linting and formatting
make lint
make format
# Check code quality
ruff check .
ruff format .
# Run specific test files
pytest tests/server/test_server.py -v
Verifying Your Installation
After building from source, verify everything works:
# Check version
docbt --version
# View help
docbt help
# Run the server
docbt run
# Run with custom settings
docbt run --port 8080 --log-level DEBUG
Using Make (Recommended for Contributors)
If you're contributing to the project, using Make provides the easiest setup experience with automated tasks.
Prerequisites:
- Make (usually pre-installed on Linux/macOS)
- Git
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create virtual environment (Make will use uv automatically)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install all dependencies with one command
make install
# Create .env file from template (keeps section headers, removes comments)
make env
# Edit .env with your credentials
nano .env # or your preferred editor
# Install pre-commit hooks (optional but recommended)
make pre-commit
# Verify installation by running tests
make test
# Run the application
docbt run
Common Make commands for development:
make help # Show all available commands
make install # Install dependencies
make env # Create .env from .env.example
make test # Run tests
make test-cov # Run tests with coverage report
make lint # Check code quality
make format # Auto-format code
make check # Run format check + lint
make ci # Run all CI checks (format, lint, test)
make pre-commit # Install pre-commit hooks
For detailed information on all Make commands, see Make Commands Guide.
Troubleshooting Build Issues
Missing Build Tools:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install python3-dev build-essential
# macOS (requires Homebrew)
brew install python@3.10
# Windows (requires Visual Studio Build Tools)
# Download from: https://visualstudio.microsoft.com/downloads/
Dependency Conflicts:
# Clear pip cache
pip cache purge
# Or with uv
uv cache clean
# Reinstall from scratch
rm -rf .venv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
Permission Issues:
# Don't use sudo with pip/uv in virtual environments
# If you get permission errors, ensure you're in an activated venv
source .venv/bin/activate
docbt comes equipped with a command-line tool which supports the following commands:
- --version: prints the version of the package.
- help: prints detailed information about the commands and options you can use to run the app.
- run: runs the Streamlit app, with options to specify host, port, and log level (see the example below).
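For example, to serve the app on a specific host and port with verbose logging (a sketch: the --port and --log-level flags appear elsewhere in this README, while the --host flag name is an assumption based on the run options listed above):
# Start the Streamlit app on all interfaces, port 8080, with verbose logging
# (--host flag name assumed; run docbt help to confirm the exact option names)
docbt run --host 0.0.0.0 --port 8080 --log-level DEBUG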
Provide the app with data to start working with it
- Upload: CSV or JSON files from your local storage
- Data Warehouse: connect to your data platform, such as Snowflake or BigQuery
- Context Integration: Data automatically included in AI conversations
- Statistics and EDA: (coming soon)
Here you can set up the configuration for your node
- Provide specific config: customize your config with platform-specific properties
- Configure node properties: from materialization to meta-tags
- Apply node-level data tests: (coming soon)
Here you can set up the configuration, documentation and tests for your columns
See the end result of your work in real time
- Preview Configuration: Interactive visual representation of generated YAML
- Real-time Updates: see changes live as you configure your documentation using the UI
- AI Suggestions: use LLMs to generate node- and column-level descriptions and to suggest constraints and data tests (an example of the generated YAML is sketched below)
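For illustration, a generated preview might resemble the following standard dbt schema.yml snippet (a sketch with hypothetical model and column names; docbt's actual output depends on your data and configuration):
version: 2
models:
  - name: stg_orders                      # hypothetical model name
    description: "Staged order records, one row per order."
    config:
      materialized: view
      meta:
        owner: analytics
    columns:
      - name: order_id
        description: "Primary key of the order."
        data_tests:                       # 'tests:' on dbt versions before 1.8
          - unique
          - not_null
      - name: order_status
        description: "Current status of the order."
        data_tests:
          - accepted_values:
              values: ["placed", "shipped", "completed", "returned"]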
Configure your AI provider and settings
- Choose Provider: OpenAI, Ollama, or LM Studio
- Developer Mode: Enable advanced settings and metrics
- System Prompt: Customize AI context and behavior (developer mode)
- Generation Parameters: Control temperature, max tokens, top-p, stop sequences, etc.
Interact with your AI assistant with your data sample in context
- Ask questions about dbt best practices or your data in general
- Get recommendations for data modeling and data use cases
- Just have whatever type of conversation you want with your model
- Enable "Chain of Thought" to see AI reasoning
The behavior of the app can be configured through environment variables. You can find an example environment file in the repo. Running make env (for developers) will also create your own .env file to work with. Alternatively, copy the .env.example contents into .env to make use of docbt's python-dotenv support, or just export the environment variables / inject them into your environment of choice.
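For example, a minimal setup might look like this (a sketch; DOCBT_LOG_LEVEL is one of the variables documented below, and .env.example is assumed to sit at the project root):
# Copy the example environment file and edit it with your own values
cp .env.example .env
# Or export variables directly into your shell session
export DOCBT_LOG_LEVEL=INFO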
Control the verbosity of docbt's logging output to help with debugging or reduce noise in production.
Setting Log Level:
You can configure the logging level in two ways:
- CLI Flag (highest priority):
docbt run --log-level DEBUG
- Environment Variable (used if no CLI flag provided):
# In .env file
DOCBT_LOG_LEVEL=DEBUG
# Or export directly
export DOCBT_LOG_LEVEL=DEBUG
Available Log Levels:
- TRACE - Most verbose, includes all internal details
- DEBUG - Detailed debugging information (useful for troubleshooting)
- INFO - General informational messages (default)
- SUCCESS - Success messages only
- WARNING - Warning messages and above
- ERROR - Error messages and above
- CRITICAL - Only critical errors
Examples:
# Use DEBUG level for troubleshooting
docbt run --log-level DEBUG
# Use environment variable for persistent configuration
echo "DOCBT_LOG_LEVEL=DEBUG" >> .env
docbt run
# Reduce logging noise in production
docbt run --log-level WARNING
Note: The CLI flag always takes precedence over the environment variable. If neither is specified, the default level is INFO.
# Enable/disable AI usage
DOCBT_USE_AI_DEFAULT=false
# Enable/disable developer mode for advanced features
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_SHOW_CHAIN_OF_THOUGHT=true
# You can choose which provider will appear as your default
DOCBT_LLM_PROVIDER_DEFAULT=openai/ollama/lmstudio
We recommend working with the gpt-5 series, but you can use the Fetch Models button to use whatever OpenAI has to offer.
- gpt-5-nano: good for most tasks and very cheap, but it can fail to produce valid structured output with a large sample size or too many columns.
- gpt-5-mini: handles itself better than nano, but is worse at long context than gpt-5. A good middle ground.
- gpt-5: the best of the gpt-5 series but the most expensive. Use sparingly.
# Set your API key
export DOCBT_OPENAI_API_KEY="sk-..."
# Or add to .env file
DOCBT_OPENAI_API_KEY=sk-...
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true
We recommend using models such as:
- Qwen3 series, especially in the 4B to 14B range
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull qwen3:4b
# Start server (default: http://localhost:11434)
ollama serve
# Set host and port environment variables
DOCBT_OLLAMA_HOST=localhost
DOCBT_OLLAMA_PORT=11434
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OLLAMA=true
Some models we would recommend are:
- Qwen3-4b-instruct-2507 or the 8B/14B variant
- Qwen3-4b-thinking-2507 or the 8B/14B variant
- Qwen3-30B-A3B if your GPU permits
Note: some models are incapable of producing valid structured outputs. For example, oddly enough, gpt-oss cannot. Experiment and find out what works for your use case and hardware. Increasing the context window in LM Studio can resolve issues, especially with data that has lots of columns.
- Download from lmstudio.ai
- Browse models and download the ones you want
- Enable "Local Server" (default: http://localhost:1234) from the UI
# Set host and port environment variables
DOCBT_LMSTUDIO_HOST=localhost
DOCBT_LMSTUDIO_PORT=1234
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_LMSTUDIO=true
In Developer Mode, fine-tune AI generation with inference parameters:
- API Timeout: number of seconds until the API call fails
- Max Tokens: Maximum response length (100-4000)
- Temperature: Creativity level (0.0-2.0)
  - 0.0: Deterministic, focused
  - 1.0: Balanced
  - 2.0: More creative, random
- Top P: Nucleus sampling (0.0-1.0)
- Stop Sequences: Custom stop words/phrases
Note: the gpt-5 series does not support temperature (it is always 1), top-p, or stop sequences.
You can use different connection methods to connect to the following data sources.
Connect to Snowflake with a password, SSO, MFA, or an RSA key.
# Example: connect with your user and password
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_PASSWORD=your-password
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC
DOCBT_SNOWFLAKE_AUTHENTICATOR=snowflake
Currently, the BigQuery connection only works with the credentials JSON method:
- Install the Google Cloud SDK
- Authenticate with JSON credentials (see the sketch below)
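A common way to produce such a credentials file is with the Google Cloud SDK's application-default login (a sketch; a service-account JSON key referenced by the variable below works as well):
# Writes ~/.config/gcloud/application_default_credentials.json
gcloud auth application-default login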
# Point to your credentials JSON in the environment variables
DOCBT_GOOGLE_APPLICATION_CREDENTIALS=/home/<user>/.config/gcloud/application_default_credentials.json
Streamlit App/General Issues
Run docbt with debug log level and inspect the logs. If you find any bugs while doing so, please report them. :)
docbt run --log-level debug
LLM Connection Errors
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Verify LM Studio server
curl http://localhost:1234/v1/models
# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models
Docker Issues
# View container logs
docker-compose logs docbt
# Check if container is running
docker ps
# Restart container
docker-compose restart docbt
See Docker Guide for more Docker-specific troubleshooting.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Inspired by the dbt community
- Built with Streamlit
- AI via OpenAI, Ollama, and LM Studio
- Data via Snowflake, BigQuery
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: predaalin2694@gmail.com
We welcome contributions! Please see our Contributing Guide for details.
Quick Start:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Run ruff format . and pytest
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.
Development Tools: We use Make for automation. See Make Commands Guide for all available commands.
If you like what I'm working on and decide to sponsor, you can do so via:
Happy documenting! Generate better dbt documentation with AI assistance.






