Slovak language support in Open Source programs
Last modified: 20 March 2021
Requirements
Tesseract 4.1.1 installation on Windows (64-bit) from the command line
Initialisation of project structure
Destination for dependencies
mkdir F:\win64
set INSTALL_DIR=F:\win64
set PATH=%PATH%;%INSTALL_DIR%\bin
Build tree:
mkdir F:\Project && cd /d F:\Project
Initialize VS environment:
call "c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" x64
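Every later step assumes that the build tools are reachable from the same command prompt. The following is a minimal sketch of such a check, not part of the original procedure. The tool list was assembled by hand from the commands in this guide, and the function name is illustrative.

```python
import shutil

# Assumption: this list mirrors the commands used in this guide
# (cl is the MSVC compiler that vcvars64.bat puts on PATH).
REQUIRED_TOOLS = ["cl", "cmake", "git", "curl", "python"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    print("missing tools:", ", ".join(missing) if missing else "none")
```

Run it from the prompt in which vcvars64.bat was called. A tool reported as missing usually means that PATH was changed in a different console window.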
zlib build and installation
curl -O https://zlib.net/zlib1211.zip
"c:\Program Files\Git\usr\bin\unzip.exe" zlib1211.zip
cd zlib-1.2.11
mkdir build.msvs && cd build.msvs
cmake .. -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR%
cmake --build . --config Release --target install
cd ..\..
libpng build and installation
curl -L -O https://vorboss.dl.sourceforge.net/project/libpng/libpng16/1.6.37/lpng1637.zip
"c:\Program Files\Git\usr\bin\unzip.exe" lpng1637.zip
cd lpng1637
mkdir build.msvs && cd build.msvs
cmake .. -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR%
cmake --build . --config Release --target install
cd ..\..
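Both archives above are downloaded with curl and unpacked with the unzip.exe shipped with Git. If either tool is unavailable, the same two steps can be sketched with the Python standard library. This is an alternative under that assumption, not the method used in this guide, and the names fetch and extract are illustrative.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch(url, dest_dir="."):
    """Download `url` into `dest_dir` and return the local file path.

    The file name is taken from the last path component of the URL.
    """
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def extract(archive, dest_dir="."):
    """Unpack a zip archive and return its top-level entry names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


# Usage, with the zlib URL from this guide:
#   archive = fetch("https://zlib.net/zlib1211.zip")
#   print(extract(archive))
```

The value returned by extract shows which directory to cd into before running cmake.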
leptonica build and installation
Note: the tesseract 4.1.1 CMake build needs both the Release and the Debug build of leptonica.
git clone --depth 1 https://github.com/DanBloomberg/leptonica.git
cd leptonica
mkdir build.msvs && cd build.msvs
cmake .. -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% ^
-DCMAKE_PREFIX_PATH=%INSTALL_DIR% ^
-DBUILD_PROG=OFF -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON
cmake --build . --config Debug --target install
cmake --build . --config Release --target install
cd ..\..
tesseract build and installation
git clone -b 4.1.1 --depth 1 https://github.com/tesseract-ocr/tesseract.git
cd tesseract
mkdir build.msvs && cd build.msvs
cmake .. -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% ^
-DCMAKE_PREFIX_PATH=%INSTALL_DIR% ^
-DLeptonica_DIR=%INSTALL_DIR%\lib\cmake ^
-DBUILD_TRAINING_TOOLS=OFF -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release --target install
cd ..\..
Post installation
cd F:\Project
git clone --depth 1 https://github.com/tesseract-ocr/tessconfigs tessdata
curl -L https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata ^
--output F:\Project\tessdata\eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata ^
--output F:\Project\tessdata\osd.traineddata
SET TESSDATA_PREFIX=F:\Project\tessdata
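Before running tesseract, it may help to confirm that TESSDATA_PREFIX points at a directory that actually contains the downloaded models. A minimal sketch follows. The function name is illustrative, and it only looks for files ending in .traineddata.

```python
import os
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes of all *.traineddata files in a directory."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set")
    else:
        print(prefix, installed_languages(prefix))
```

With the two downloads above in place, the list should contain eng and osd.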
Check
%INSTALL_DIR%\bin\tesseract -v
tesseract 4.1.1
leptonica-1.81.0 (Mar 17 2021, 20:26:26) [MSC v.1928 LIB Release x64]
libpng 1.6.37 : zlib 1.2.11
Found AVX2
Found AVX
Found FMA
Found SSE
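The version report above can also be checked by a script, for example to confirm that tesseract was linked against the libraries built in the earlier steps. The parser below is a sketch written against the output format shown here. Other tesseract versions may format the report differently, and the function name is illustrative.

```python
import re

# Sample copied from the output shown in this guide.
SAMPLE = """tesseract 4.1.1
 leptonica-1.81.0 (Mar 17 2021, 20:26:26) [MSC v.1928 LIB Release x64]
  libpng 1.6.37 : zlib 1.2.11
 Found AVX2
"""

# One pattern per component; each captures the version string.
PATTERNS = {
    "tesseract": r"^tesseract\s+(\S+)",
    "leptonica": r"leptonica-(\d+(?:\.\d+)*)",
    "libpng": r"libpng\s+(\d+(?:\.\d+)*)",
    "zlib": r"zlib\s+(\d+(?:\.\d+)*)",
}


def parse_versions(text):
    """Extract component versions from the output of `tesseract -v`."""
    versions = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            versions[name] = match.group(1)
    return versions


if __name__ == "__main__":
    print(parse_versions(SAMPLE))
```

To parse live output, capture it with subprocess.run and pass both stdout and stderr to parse_versions, since the stream used for the report may differ between versions.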
tesserocr build
git clone https://github.com/zdenop/tesserocr.git
cd tesserocr
git checkout window_build
SET VS90COMNTOOLS=%VS140COMNTOOLS%
SET INCLUDE=%INCLUDE%;%INSTALL_DIR%\include
SET LIBPATH=%LIBPATH%;%INSTALL_DIR%\lib
python setup.py build
python setup.py bdist_wheel
pip uninstall tesserocr
pip install dist\tesserocr-2.5.2b0-cp38-cp38-win_amd64.whl
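The wheel file name above encodes the package version and the Python version (cp38), so it will differ on other setups. A small sketch for locating the wheel that was just built, instead of typing its name by hand, is given below. The function name is illustrative.

```python
from pathlib import Path


def newest_wheel(dist_dir="dist", project="tesserocr"):
    """Return the most recently modified wheel of `project` in `dist_dir`."""
    wheels = sorted(
        Path(dist_dir).glob(f"{project}-*.whl"),
        key=lambda p: p.stat().st_mtime,
    )
    if not wheels:
        raise FileNotFoundError(f"no {project} wheel found in {dist_dir}")
    return wheels[-1]


# Usage, from the tesserocr checkout after `python setup.py bdist_wheel`:
#   print(newest_wheel())
```

The printed path can then be passed to pip install.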
Post installation
Note: adjust the target path to your Python installation.
copy F:\win64\bin\*.dll "C:\Program Files\Python38\Lib\site-packages\"
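The site-packages path in the copy command is specific to one Python installation. The sketch below asks the running interpreter for its own site-packages directory and copies the DLLs there. It is an illustration of the same step, with a hypothetical function name, and it should be run with the same Python that tesserocr was installed into.

```python
import shutil
import sysconfig
from pathlib import Path


def copy_dlls(bin_dir, target_dir=None):
    """Copy every *.dll from `bin_dir` into `target_dir`.

    If `target_dir` is omitted, the site-packages directory of the
    running interpreter is used. Returns the copied file names.
    """
    target = Path(target_dir or sysconfig.get_paths()["purelib"])
    copied = []
    for dll in sorted(Path(bin_dir).glob("*.dll")):
        shutil.copy2(dll, target / dll.name)
        copied.append(dll.name)
    return copied


# Usage, with the install directory from this guide:
#   print(copy_dlls(r"F:\win64\bin"))
```

Writing into a system-wide Python under C:\Program Files may require an elevated prompt.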
Check
cd F:\Project\tesserocr
python
>>> import tesserocr
>>> tesserocr.PyTessBaseAPI.Version()
'4.1.1'
>>> tesserocr.get_languages()
('F:\\Project\\tessdata/', ['eng', 'osd'])
>>> from PIL import Image
>>> image = Image.open(r'F:\Project\tesserocr\tests\eurotext.png')
>>> with tesserocr.PyTessBaseAPI() as api:
... api.SetImage(image)
... print(api.GetUTF8Text())
...
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspammer@website.com is spam.
Der ,schnelle” braune Fuchs springt
iiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marron ripido salta sobre el perro
perezoso. A raposa marrom ripida
salta sobre o cdo preguigoso.
>>>
© sk-spell project