A free, open-source Android app that extracts text from images using OCR.
- works offline (without Internet access)
- app size 42 MB
- works on all Android versions from Android 6 up to Android 14 (the latest release at the time of writing)
- pick an image from the gallery or capture one with the camera to extract text from it
- crop image before OCR processing
- share an image to the IMG2TXT app to extract text from it
- app UI supports Arabic & English
- extracts Arabic and English text
- extracts mixed Arabic+English text from the same image
- uses three OCR engines (ML Kit -> Google Vision -> Tesseract OCR)
- color-coded confidence/accuracy of the extracted text:
  - above 90% -> white/black
  - 80-90% -> purple
  - 50-80% -> red
  - below 50% -> not shown (discarded)
- let the user choose the text language before processing
- show the newlines in the result text
- ability to edit extracted text
- ability to copy extracted text
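
The confidence color bands listed above can be illustrated with a small sketch. This is not the app's actual code: the class name `ConfidenceBands`, the `Band` enum, and both method names are hypothetical, and the handling of values that fall exactly on a boundary (90, 80, 50) is an assumption, since the list does not say which band they belong to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a per-word OCR confidence (0-100) to a display band.
final class ConfidenceBands {
    enum Band { DEFAULT, PURPLE, RED, DISCARDED }

    // Assumption: 90 and 80 belong to the higher band, and exactly 50 is discarded.
    static Band bandFor(float confidence) {
        if (confidence >= 90f) return Band.DEFAULT; // shown in white/black
        if (confidence >= 80f) return Band.PURPLE;
        if (confidence > 50f) return Band.RED;
        return Band.DISCARDED;                      // not shown
    }

    // Keeps only the words that would be displayed.
    // `words` and `confidences` are parallel lists of the same length.
    static List<String> visibleWords(List<String> words, List<Float> confidences) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (bandFor(confidences.get(i)) != Band.DISCARDED) {
                kept.add(words.get(i));
            }
        }
        return kept;
    }
}
```

In an Android view, each band would presumably be translated to a text color when the result is rendered; that step is left out here to keep the sketch free of Android dependencies.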
Install the img2txt app from Google Play:
https://play.google.com/store/apps/details?id=com.softwarepharaoh.img2txt
- use onActivityResult (modern code)
- use a single general-purpose openUrl() method
- better way of cropping images
- share image into IMG2TXT app to extract text from it
- translations (English & Arabic), default is English
- About Me
- rate us on Google Play (removed) - link to Google Play
- first open-source release: v2.5.1 on April 21st, 2024
- Android 6 (Marshmallow) (SDK 23)
- Android 7
- Android 8
- Android 9
- Android 10
- Android 11
- Android 12
- Android 13
- Android 14 (SDK 34) (v2.6.0)
- simplify app UI layout
- new simpler theme
- show alert/notice if the mean_confidence of result text is less than 60%
- show thresholded/cleaned image (created by Tesseract)
- show bounding rectangles/boxes around each recognized word
- on-device Tesseract OCR (English & Arabic models)
- on-device Google Vision API (Latin-script languages)
- on-device ML Kit (Latin-script languages)
- for Arabic or mixed Arabic+English text, Tesseract OCR is used
- for English text, the fallback order is ML Kit -> Google Vision -> Tesseract OCR
- color-coded confidence/accuracy of the result text from ML Kit & Tesseract OCR
- prompt the app user to choose the language of text on the image before processing it
- add line breaks (newline) in extracted text (ML Kit & Tesseract OCR)
- only show words with confidence > 50%
- make result/extracted text editable
- batch processing (in bulk)
- PDF -> Images.foreach(ocr)
- expose more functions to Java: cpp files in `tesseract4android/src/main/cpp/tesseract` and java files in `tesseract4android/src/main/java/com/googlecode/tesseract/android` must be modified/added.
- support Hindi (Indian language)
- support Farsi (Persian language)
- save OCR history (i.e., a detailed history of scanned images)
- save extracted text as PDF
- choose more than one image to OCR
- pre-process the image with thresholding for clearer input and more accurate extracted text
- ability to manually edit the image before/after running OCR on it
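
The engine selection described earlier (Tesseract OCR for Arabic or mixed Arabic+English text, and ML Kit -> Google Vision -> Tesseract OCR for English) could be sketched roughly as follows. All names here (`EngineFallback`, `Language`, `Engine`, `Recognizer`) are hypothetical and not taken from the app's source. The sketch also assumes that an engine "fails" when it returns `null` or blank text; the README does not state what actually triggers the fallback.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of language-based engine selection with fallback.
final class EngineFallback {
    enum Language { ENGLISH, ARABIC, ARABIC_AND_ENGLISH }
    enum Engine { ML_KIT, GOOGLE_VISION, TESSERACT }

    // Stand-in for the real OCR calls; returns null or blank text on failure (assumption).
    interface Recognizer {
        String recognize(Engine engine);
    }

    // English tries all three engines in order; Arabic and mixed text use Tesseract only.
    static List<Engine> enginesFor(Language language) {
        if (language == Language.ENGLISH) {
            return Arrays.asList(Engine.ML_KIT, Engine.GOOGLE_VISION, Engine.TESSERACT);
        }
        return Collections.singletonList(Engine.TESSERACT);
    }

    // Returns the first non-blank result, or an empty string if every engine fails.
    static String runWithFallback(Language language, Recognizer recognizer) {
        for (Engine engine : enginesFor(language)) {
            String text = recognizer.recognize(engine);
            if (text != null && !text.trim().isEmpty()) {
                return text;
            }
        }
        return "";
    }
}
```

Keeping the engine order in one method means a new language (for example the Hindi or Farsi support listed above) would only need a new branch in `enginesFor`.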