An OCR for Kannada.
AksharaJaana is a package that uses Tesseract OCR in the backend to convert read-only Kannada text into an editable format. A special feature is that it can separate the columns on a page, which makes the output easier to read and edit. Consider using this package if you need it, and feel free to mail me for any clarifications.
Happy coding and installing.
To see the Python package, visit https://pypi.org/project/AksharaJaana/
A Conda environment is preferred for smooth use.
Open a terminal and execute the commands below.
Install the requirements in your system
sudo apt-get update -y
sudo apt-get install -y poppler-utils python3 python3-venv tesseract-ocr tesseract-ocr-kan
Installing packages for AksharaJaana
pip install --upgrade AksharaJaana
Installing tesseract-ocr in the system (Windows)
Download and install tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit).
Check whether C:\Program Files\Tesseract-OCR is present. If yes, follow the procedure below.
Add C:\Program Files\Tesseract-OCR to your system PATH by doing the following:
Click the Windows Start button, search for "Edit the system environment variables", click Environment Variables, then click New.
Enter C:\Program Files\Tesseract-OCR and click OK.
Installing poppler in the system
Download poppler-0.54_x86.
Extract it to C:\Users\Program Files\poppler-0.68.0_x86.
Add C:\Program Files\poppler-0.68.0_x86\bin to your system PATH, using the same Environment Variables steps as for Tesseract-OCR.
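After editing PATH, it can help to confirm from a new terminal that the tools are actually visible. The following is a minimal sketch, not part of AksharaJaana: the function name find_missing_tools is hypothetical, and it assumes the relevant executables are tesseract (from Tesseract-OCR) and pdftoppm (one of the poppler command-line tools).

```python
import shutil

# Executables the install steps are expected to expose on PATH.
# "tesseract" ships with Tesseract-OCR; "pdftoppm" ships with poppler.
# Which tools AksharaJaana calls internally is an assumption here.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which also resolves ".exe" names on Windows via PATHEXT.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("tesseract and pdftoppm are on PATH.")
```

If a tool is reported as missing, recheck the folder added to PATH and open a fresh terminal, since already-open terminals keep the old PATH.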
Installing python and pip in the system (If pip is not installed)
Installing packages for AksharaJaana
Open Command Prompt.
pip install AksharaJaana
Reboot the system before starting to use AksharaJaana.
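Before the first run, it may be worth checking that Tesseract can see the Kannada language data (the tesseract-ocr-kan package in the Linux steps). The sketch below is an illustration only and is relevant mainly when the Tesseract model is selected. The helper names parse_langs and kannada_available are hypothetical; the only external call is the tesseract --list-langs command.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines()]
    # Drop blank lines and the "List of available languages ..." header.
    return [ln for ln in lines if ln and not ln.lower().startswith("list of")]


def kannada_available():
    """Return True if Tesseract reports the `kan` language pack."""
    try:
        result = subprocess.run(
            ["tesseract", "--list-langs"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return False  # tesseract itself is not on PATH
    # Some Tesseract versions print the list to stderr, so read both.
    return "kan" in parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print("Kannada data found:", kannada_available())
```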
from AksharaJaana.main import OCREngine
from AksharaJaana.utils import ModelTypes, FileOperationUtils
# ModelTypes choices are Paddleocr, Easyocr, Tesseract
ocr = OCREngine(modelType=ModelTypes.Easyocr)
text = ocr.get_text_from_file("Your file Path")
print(text)
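Since the goal is editable text, a natural next step is to write the result to a file. The helper below is a hedged sketch, not part of AksharaJaana: save_ocr_text is a hypothetical name, and it assumes get_text_from_file returns a Python string, as the print call above suggests.

```python
from pathlib import Path


def save_ocr_text(engine, input_path, output_path):
    """Run OCR on input_path and write the result as UTF-8 text.

    `engine` is any object with a get_text_from_file(path) method,
    such as the OCREngine instance created above.
    """
    source = Path(input_path)
    if not source.is_file():
        raise FileNotFoundError(f"No such input file: {source}")
    text = engine.get_text_from_file(str(source))
    # Kannada is outside ASCII, so set the encoding explicitly instead of
    # relying on the platform default, which may not be UTF-8 on Windows.
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

With the ocr object from the script above, a call would look like save_ocr_text(ocr, "Your file Path", "output.txt").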