This tool automatically converts batch of images containing structured data (tables, formulas, graphs, diagrams, flowcharts, etc.) into markdown format. Markdown files are suitable for RAG pipeline. Tool uses either top tier Anthropic's models or very cheap Mistral AI's vision Pixtral or Mistral Small models via API to analyze images and create detailed markdown descriptions based on included robust system prompt. Finaly I added script using Google Gemini models that are now superior.
Before you start, you need to have:
- Python installed on your computer (version 3.7 or higher)
- An Anthropic API key (get it from Anthropic's console)
- An Mistral API key (get it from Mistral's console)
- An Google API Key (get it from Google AI Studio)
If you don't have Python installed:
- Go to Python's official website
- Download the latest version for your operating system
- Run the installer
- On Windows: Make sure to check "Add Python to PATH" during installation
- On Mac: Follow the standard installation process
- Open Terminal (Mac) or Command Prompt (Windows)
- Navigate to where you want to save the project:
cd Documents- Clone the repository:
git clone https://github.com/PetrAPConsulting/image2md.git- It creates folder image2md with cloned files
- Click the green "Code" button on this page
- Click on the sheet "Local"
- Select "Download ZIP"
- Extract the ZIP file to your desired location
Open Terminal (Mac) or Command Prompt (Windows) in the project folder and run:
pip install anthropicIf that doesn't work, try:
pip3 install anthropicYou do not need to install anything for using Mistral AI models.
- Open the
images.pyfile in a text editor - Find this line:
self.client = anthropic.Anthropic(api_key="insert_api_key_here")- Replace
"insert_api_key_here"with your Anthropic API key - Follow development of Anthropic models and make adjustments in the script when new version is realised. Only models with vision capabilities are supported.
def __init__(self, model: str = "claude-3-7-sonnet-20250219")def main():
available_models = [
"claude-3-7-sonnet-20250219",
"claude-3-opus-20240229",
"claude-3-5-haiku-latest"
] - Open the
img2md_m.pyfile in a text editor - Find this line:
API_KEY = "API_key_here"- Replace
"API_key_here"with your Mistral API key - Follow development of Mistral AI models and make adjustments in the script when new version is realised. Only models with vision capabilities are supported and Pixtral and Mistral Small are much cheaper than Pixtral Large.
class MistralModel(str, Enum):
PIXTRAL = "pixtral-12b-2409"
PIXTRAL_LARGE = "pixtral-large-latest"
MISTRAL_SMALL = "mistral-small-latest"It's quite easy for Google API, because the most of the dependencies are included in Python installation. Install only
pip3 install google-genaiInput in script your Google API Key
API_KEY = "YOUR_API_KEY_HERE"and you can use script with choices of 3 current Gemini models. If endpoints change, change them in the script.
# Gemini Model Options
MODEL_OPTIONS: Dict[int, tuple] = {
1: ("flash", "gemini-3-flash-preview", "Gemini 3 Flash (Recommended)"),
2: ("pro", "gemini-3-pro-preview", "Gemini 3 Pro"),
}- Copy your images (.jpg, .jpeg, or .png) to the same folder as the script. Keep images around 1000 x 1000px for token consumption optimalization. You can download simple batch image downscaler for downscaling jpeg, jpg, png, webp files.
- Open Terminal (Mac) or Command Prompt (Windows)
- Navigate to the script's folder:
cd path/to/your/folder- Run the script:
python images.pyor
python img2md_m.py- Select a model when prompted (1-3)
- The script will create markdown (.md) files for each image in the same folder
.jpg.jpeg.png.gif.webp
- Automatically detects tables, formulas, graphs, flowcharts, etc.
- Creates markdown tables from image tables
- Converts mathematical formulas to LaTeX format
- Provides detailed analysis of graphs with key values
- Creates nice clear markdown mermaid from flowcharts and process diagrams
- Preserves anotations and tables with measurements
- Generates log files for troubleshooting
- IMPORTANT: If you need output in different language than ENG you need to include this information to system prompt in python script. Even though Anthropic and Mistral models are multilingual, keep system prompt itself always in English.
-
"No module named 'anthropic'"
- Run
pip install anthropicagain - Make sure you're using the correct Python version
- Run
-
"Invalid API key"
- Check if you've correctly inserted your API key in the script
- Verify your API key is active on Anthropic's or Mistral's website
-
"Python not found"
- Make sure Python is installed
- Try using
python3instead ofpython
First, try your good friend ChatGPT, Claude or Gemini. All three of them are able to help you if you give them occured errors. Or create an issue in this repository with:
- The error message you're seeing
- Your operating system
- Steps you've tried
This project is licensed under the MIT License - see the LICENSE file for details.
- Uses Anthropic's API or Mistral AI API or Google API for image analysis