AI-Powered system that converts ANY product catalog PDF into beautiful e-commerce shopping cards
- β Universal PDF Support - Works with ANY catalog type (Electronics, Fashion, Industrial, Medical, etc.)
- β AI-Powered Extraction - Uses GPT-4 Vision, Claude Vision, Gemini for 98%+ accuracy
- β Multi-Method Extraction - Text, tables, images, layout analysis
- β Beautiful HTML Output - Professional, responsive shopping card design
- β Structured JSON Export - Database-ready product data
- β Multi-Language Support - Works with any language (UTF-8)
pip install -r requirements.txtAdd your API key to config/.env.example and rename to .env:
# Choose ONE provider:
DEEPSEEK_API_KEY=your_key_here # Recommended - Fast & Affordable
# OR
ANTHROPIC_API_KEY=your_key_here # Best for technical docs
# OR
OPENAI_API_KEY=your_key_here # GPT-4 Vision
# OR
GOOGLE_API_KEY=your_key_here # Gemini
# OR
HUGGINGFACE_API_KEY=your_key_here # FREE optionGet API Keys:
- DeepSeek: https://platform.deepseek.com/ (β Fast, Affordable, 98%+ accuracy)
- Anthropic Claude: https://console.anthropic.com/ (Best for technical docs)
- OpenAI GPT-4: https://platform.openai.com/api-keys
- Google Gemini: https://makersuite.google.com/app/apikey
- Hugging Face: https://huggingface.co/settings/tokens (FREE)
# English (default)
python main.py your_catalog.pdf
# Persian (ΩΨ§Ψ±Ψ³Ϋ) - with RTL layout
python main.py your_catalog.pdf --lang persian
# Chinese (δΈζ)
python main.py your_catalog.pdf --lang chinesepdfextractstoshoppingcard/
βββ main.py # Entry point
βββ requirements.txt # Dependencies
βββ README.md # This file
β
βββ src/ # Core modules
β βββ __init__.py
β βββ universal_extractor.py # PDF extraction
β βββ ai_analyzer.py # AI-powered analysis
β βββ html_generator.py # HTML generation
β βββ converter.py # Main converter logic
β
βββ config/ # Configuration
β βββ .env.example # API key template
β
βββ output/ # Generated files
βββ *.html # Shopping cards
βββ *.json # Structured data
βββ images/ # Extracted images
python main.py catalog.pdfpython main.py catalog.pdf --output my_product.htmlimport sys
sys.path.insert(0, 'src')
from converter import UniversalPDFConverter
# English version (default)
converter = UniversalPDFConverter(use_ai=True, use_vision=True, language='english')
output = converter.convert('catalog.pdf')
# Persian version with RTL
converter_fa = UniversalPDFConverter(use_ai=True, use_vision=True, language='persian')
output_fa = converter_fa.convert('catalog.pdf', 'product_fa.html')
# Chinese version
converter_zh = UniversalPDFConverter(use_ai=True, use_vision=True, language='chinese')
output_zh = converter_zh.convert('catalog.pdf', 'product_zh.html')| Provider | Accuracy | Speed | Cost | Best For |
|---|---|---|---|---|
| DeepSeek β | 98%+ | Fast | $0.14/1M tokens | All catalogs |
| Anthropic Claude β | 98%+ | Fast | $0.10/catalog | Technical docs |
| OpenAI GPT-4 | 97%+ | Medium | $0.50/catalog | Complex layouts |
| Google Gemini | 95%+ | Fast | Free tier | General catalogs |
| Hugging Face | 90%+ | Medium | FREE | Budget option |
- Extract - Multi-method PDF extraction (text, tables, images, layout)
- Analyze - AI vision analyzes content and structure
- Structure - Intelligent data organization
- Generate - Beautiful HTML shopping card creation
For each PDF catalog:
- Modern, responsive design
- Product cards with specifications
- Interactive features
- Mobile-friendly
- Production-ready
{
"product_family": "Product Line",
"category": "Electronics",
"products": [
{
"name": "Product Name",
"model": "MODEL-123",
"specifications": {...},
"features": [...],
"pricing": {...}
}
]
}- High-resolution page screenshots
- Extracted product images
python main.py [PDF_FILE] [OPTIONS]
Options:
-o, --output PATH Output HTML file path
--lang LANGUAGE Output language: english, persian, chinese (default: english)
--no-ai Run without AI (basic extraction)
--no-vision Use text-only AI
--no-json Don't save JSON files
--demo Auto-find PDF in directory
-h, --help Show help- Python 3.8+
- AI API key (one of: Anthropic, OpenAI, Google, Hugging Face)
- Internet connection (for AI APIs)
- pdfplumber - PDF text extraction
- PyMuPDF - PDF rendering & images
- Pillow - Image processing
- Jinja2 - HTML templating
- anthropic/openai/google-generativeai - AI providers
- Processing Speed: ~0.5-1 sec/page
- Accuracy: 98%+ with AI (90%+ with Hugging Face FREE)
- Memory Usage: ~200-500 MB
- Supported PDF Size: Up to 100+ MB
- Page Limit: Unlimited
The AI extracts and generates content directly in your target language - no translation needed!
| Language | Command | Layout | Features |
|---|---|---|---|
| English | --lang english (default) |
LTR | Standard layout |
| Persian (ΩΨ§Ψ±Ψ³Ϋ) | --lang persian |
RTL | Right-to-left + Persian fonts |
| Chinese (δΈζ) | --lang chinese |
LTR | Chinese fonts optimized |
How it works:
- AI analyzes the PDF and understands the content
- Generates all product descriptions, features, and specs directly in your chosen language
- HTML UI labels (buttons, headings) automatically localized
- For Persian: Full RTL support with proper font rendering
- For Chinese: Optimized fonts and proper character display
Examples:
# Generate English version (default)
python main.py catalog.pdf
# Generate Persian version with RTL layout
python main.py catalog.pdf --lang persian -o product_fa.html
# Generate Chinese version
python main.py catalog.pdf --lang chinese -o product_zh.htmlPowered by DeepSeek AI - perfect for international e-commerce!
- E-Commerce: Convert supplier catalogs to product pages
- B2B Marketplaces: Digitize manufacturer catalogs
- Product Management: Catalog digitization and database population
- Sales & Marketing: Create product presentations and web content
- No AI configured: Add API key to
config/.env - API errors: Check your API key is valid
- PDF not found: Use absolute path or place in same folder
- Poor quality: Try different AI provider (Claude recommended)
MIT License - Free for any use
Alireza Saeedi
- π§ Email: alirezasaeediofficial@gmail.com
- π» GitHub: Your GitHub Profile
- Get API key from any provider
- Add to
config/.env - Run:
python main.py your_catalog.pdf - Open generated HTML file!
Made with β€οΈ for automating product catalog digitization
Universal β’ AI-Powered β’ Production-Ready β¨
Created by Alireza Saeedi β’ 2025