An OCR tool that extracts text from images, with image preprocessing and parallel batch processing.
- Supports multiple image formats (JPG, PNG, TIFF, BMP)
- Advanced image preprocessing for better OCR accuracy
- Parallel processing for fast batch operations
- Multi-language support (Chinese by default)
- Progress tracking and performance metrics
- Configurable preprocessing and OCR parameters
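The README does not say which preprocessing steps the tool applies. As one illustration of a common step, here is a minimal sketch of Otsu's global thresholding in pure Python. It works on a grayscale image given as rows of 0-255 integers and picks the threshold that maximizes the between-class variance of "ink" and "paper" pixels. The names `otsu_threshold` and `binarize` are hypothetical and are not taken from `program.py`.

```python
"""Otsu binarization: one common OCR preprocessing step (illustrative sketch)."""


def otsu_threshold(pixels: list[list[int]]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t (dark class)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t (light class)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:     # strict '>' keeps the lowest best t
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: list[list[int]]) -> list[list[int]]:
    """Map each pixel to 0 (ink) or 255 (paper) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [[255 if v > t else 0 for v in row] for row in pixels]


if __name__ == "__main__":
    image = [[10, 12, 200], [11, 198, 202]]
    print(otsu_threshold(image))  # 12
    print(binarize(image))        # [[0, 0, 255], [0, 255, 255]]
```

In a real pipeline a library would normally supply the pixels and do the thresholding; for example, OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag computes the same threshold.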
Install Tesseract OCR:
# On Ubuntu/Debian
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

# On macOS
brew install tesseract
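Because the tool defaults to Chinese, the matching Tesseract language data must be installed too. Assuming that default corresponds to Tesseract's Simplified Chinese `chi_sim` data (on Debian/Ubuntu this ships as the `tesseract-ocr-chi-sim` package), the following sketch checks what is available by calling `tesseract --list-langs`. The helper names are hypothetical and not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks and the header line ("List of available languages ...").
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> Optional[list[str]]:
    """Return installed Tesseract languages, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some older Tesseract releases wrote this list to stderr, so fall back to it.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    if langs is None:
        print("tesseract was not found on PATH")
    elif "chi_sim" not in langs:
        print("Simplified Chinese data (chi_sim) is not installed")
    else:
        print("OK:", ", ".join(langs))
```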
Python Dependencies:
pip install -r requirements.txt
Basic Command:
python program.py -i input_images -o output_texts
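The README does not document how `program.py` scans the input folder or names its output files. The following is a minimal sketch of the basic flow. It assumes one `<stem>.txt` file per image and a thread pool for the parallel step, and it prints a simple progress counter. `find_images`, `run_batch`, and the pluggable `ocr` callable are hypothetical names. With pytesseract, `ocr` might be `lambda p: pytesseract.image_to_string(Image.open(p), lang="chi_sim")`. Threads are a reasonable fit in that case because pytesseract runs the `tesseract` binary in a subprocess.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable

# Extensions from the feature list; matching is case-insensitive.
SUPPORTED = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def find_images(input_dir: Path) -> list[Path]:
    """Return supported image files directly inside input_dir, sorted by path."""
    return sorted(p for p in input_dir.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED)


def run_batch(input_dir: Path, output_dir: Path,
              ocr: Callable[[Path], str], workers: int = 4) -> dict[str, str]:
    """OCR every image concurrently and write <stem>.txt files to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    images = find_images(input_dir)
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(ocr, path): path for path in images}
        # as_completed yields futures as they finish, in no fixed order.
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            text = future.result()  # re-raises any exception from the worker
            (output_dir / f"{path.stem}.txt").write_text(text, encoding="utf-8")
            results[path.name] = text
            print(f"[{done}/{len(images)}] {path.name}", file=sys.stderr)
    return results
```

One limitation of naming outputs by stem is that `scan.jpg` and `scan.png` would both write `scan.txt`. Keeping the full file name (for example `scan.jpg.txt`) avoids that collision.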
Advanced Usage:
python program.py \
  -i ./photos \
  -o ./extracted_texts \
  --lang eng+chi_sim \
  --psm 11 \
  --workers 8
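The README shows these flags but not their defaults or valid ranges. The parser below is a hypothetical sketch. The long names `--input`/`--output`, the `chi_sim` default (inferred from "Chinese by default"), and the `os.cpu_count()` worker default are all assumptions. The PSM default of 3 is Tesseract's own default, not a documented value for this tool. `--psm 11` selects Tesseract's "sparse text" mode, which looks for as much text as possible in no particular order.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README."""
    parser = argparse.ArgumentParser(description="Batch OCR for image folders")
    parser.add_argument("-i", "--input", required=True,
                        help="directory containing input images")
    parser.add_argument("-o", "--output", required=True,
                        help="directory for extracted .txt files")
    parser.add_argument("--lang", default="chi_sim",
                        help="Tesseract language codes joined with '+', e.g. eng+chi_sim")
    parser.add_argument("--psm", type=int, default=3, choices=range(14),
                        help="Tesseract page segmentation mode (0-13)")
    parser.add_argument("--workers", type=int, default=os.cpu_count() or 1,
                        help="number of images processed in parallel")
    return parser


def tesseract_kwargs(args: argparse.Namespace) -> dict[str, str]:
    """Keyword arguments in the form pytesseract.image_to_string accepts."""
    return {"lang": args.lang, "config": f"--psm {args.psm}"}


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.workers, tesseract_kwargs(args))
```

pytesseract passes `--psm` through its `config` string, so the result could be unpacked as `pytesseract.image_to_string(img, **tesseract_kwargs(args))`.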
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature/your-feature)
- Open a Pull Request
Made with ❤️ and Python
OCR accuracy may vary depending on image quality and language complexity.