image2md

Structured Images to Markdown Converter

This tool automatically converts batch of images containing structured data (tables, formulas, graphs, diagrams, flowcharts, etc.) into markdown format. Markdown files are suitable for RAG pipeline. Tool uses either top tier Anthropic's models or very cheap Mistral AI's vision Pixtral or Mistral Small models via API to analyze images and create detailed markdown descriptions based on included robust system prompt. Finaly I added script using Google Gemini models that are now superior.

Prerequisites

Before you start, you need to have:

Python installed on your computer (version 3.7 or higher)
An Anthropic API key (get it from Anthropic's console)
An Mistral API key (get it from Mistral's console)
An Google API Key (get it from Google AI Studio)

Installation Steps

1. Install Python

If you don't have Python installed:

Go to Python's official website
Download the latest version for your operating system
Run the installer
- On Windows: Make sure to check "Add Python to PATH" during installation
- On Mac: Follow the standard installation process

2. Get the Code

Using Git (Option 1):

Open Terminal (Mac) or Command Prompt (Windows)
Navigate to where you want to save the project:

cd Documents

Clone the repository:

git clone https://github.com/PetrAPConsulting/image2md.git

It creates folder image2md with cloned files

Manual Download (Option 2):

Click the green "Code" button on this page
Click on the sheet "Local"
Select "Download ZIP"
Extract the ZIP file to your desired location

3. Install Required Library

Open Terminal (Mac) or Command Prompt (Windows) in the project folder and run:

pip install anthropic

If that doesn't work, try:

pip3 install anthropic

You do not need to install anything for using Mistral AI models.

Script configuration for Anthropic

Open the images.py file in a text editor
Find this line:

self.client = anthropic.Anthropic(api_key="insert_api_key_here")

Replace "insert_api_key_here" with your Anthropic API key
Follow development of Anthropic models and make adjustments in the script when new version is realised. Only models with vision capabilities are supported.

def __init__(self, model: str = "claude-3-7-sonnet-20250219")

def main():
    available_models = [
        "claude-3-7-sonnet-20250219",
        "claude-3-opus-20240229",
        "claude-3-5-haiku-latest"
    ]

Script configuration for Mistral AI

Open the img2md_m.py file in a text editor
Find this line:

API_KEY = "API_key_here"

Replace "API_key_here" with your Mistral API key
Follow development of Mistral AI models and make adjustments in the script when new version is realised. Only models with vision capabilities are supported and Pixtral and Mistral Small are much cheaper than Pixtral Large.

class MistralModel(str, Enum):
    PIXTRAL = "pixtral-12b-2409"
    PIXTRAL_LARGE = "pixtral-large-latest"
    MISTRAL_SMALL = "mistral-small-latest"

Google Gemini - Install required dependencies

It's quite easy for Google API, because the most of the dependencies are included in Python installation. Install only

pip3 install google-genai

Input in script your Google API Key

API_KEY = "YOUR_API_KEY_HERE"

and you can use script with choices of 3 current Gemini models. If endpoints change, change them in the script.

# Gemini Model Options
MODEL_OPTIONS: Dict[int, tuple] = {
    1: ("flash", "gemini-3-flash-preview", "Gemini 3 Flash (Recommended)"),
    2: ("pro", "gemini-3-pro-preview", "Gemini 3 Pro"),
}

Usage

Copy your images (.jpg, .jpeg, or .png) to the same folder as the script. Keep images around 1000 x 1000px for token consumption optimalization. You can download simple batch image downscaler for downscaling jpeg, jpg, png, webp files.
Open Terminal (Mac) or Command Prompt (Windows)
Navigate to the script's folder:

cd path/to/your/folder

Run the script:

python images.py

or

python img2md_m.py

Select a model when prompted (1-3)
The script will create markdown (.md) files for each image in the same folder

Supported File Types

.jpg
.jpeg
.png
.gif
.webp

Features

Automatically detects tables, formulas, graphs, flowcharts, etc.
Creates markdown tables from image tables
Converts mathematical formulas to LaTeX format
Provides detailed analysis of graphs with key values
Creates nice clear markdown mermaid from flowcharts and process diagrams
Preserves anotations and tables with measurements
Generates log files for troubleshooting
IMPORTANT: If you need output in different language than ENG you need to include this information to system prompt in python script. Even though Anthropic and Mistral models are multilingual, keep system prompt itself always in English.

Troubleshooting

Common Issues:

"No module named 'anthropic'"
- Run pip install anthropic again
- Make sure you're using the correct Python version
"Invalid API key"
- Check if you've correctly inserted your API key in the script
- Verify your API key is active on Anthropic's or Mistral's website
"Python not found"
- Make sure Python is installed
- Try using python3 instead of python

Still Having Problems?

First, try your good friend ChatGPT, Claude or Gemini. All three of them are able to help you if you give them occured errors. Or create an issue in this repository with:

The error message you're seeing
Your operating system
Steps you've tried

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Uses Anthropic's API or Mistral AI API or Google API for image analysis

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
LICENSE		LICENSE
README.md		README.md
image-downscaler.html		image-downscaler.html
images.py		images.py
img2md_g.py		img2md_g.py
img2md_m.py		img2md_m.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

image2md

Structured Images to Markdown Converter

Prerequisites

Installation Steps

1. Install Python

2. Get the Code

Using Git (Option 1):

Manual Download (Option 2):

3. Install Required Library

Script configuration for Anthropic

Script configuration for Mistral AI

Google Gemini - Install required dependencies

Usage

Supported File Types

Features

Troubleshooting

Common Issues:

Still Having Problems?

License

Acknowledgments

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

License

PetrAPConsulting/image2md

Folders and files

Latest commit

History

Repository files navigation

image2md

Structured Images to Markdown Converter

Prerequisites

Installation Steps

1. Install Python

2. Get the Code

Using Git (Option 1):

Manual Download (Option 2):

3. Install Required Library

Script configuration for Anthropic

Script configuration for Mistral AI

Google Gemini - Install required dependencies

Usage

Supported File Types

Features

Troubleshooting

Common Issues:

Still Having Problems?

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages