sk-spell

Slovak language support in Open Source programs

Building tesseract and leptonica with CMake and Clang on Windows   

last modified: 5 January 2020


Motivation

Disclaimer

Requirements

Setting up build environment

To build correctly, you need to set several environment variables (in the command line/terminal/console). I do not suggest setting them permanently/globally, especially if you also want to use Visual Studio with CMake. If you plan to build the libraries with CMake and Clang regularly, put the commands in a batch file. Here are the commands:

call "c:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

This initializes the Visual Studio 64-bit environment variables, so that the relevant parts of the build (e.g. header files) are found during compilation.
set CC=clang-cl
set CXX=clang-cl

These commands set up clang with VS compatibility (clang-cl) as the C and C++ compiler for CMake.
set CFLAGS=-m64 -fmsc-version=1915
set CXXFLAGS=-m64 -fmsc-version=1915

These commands instruct the C and C++ compiler (clang) to build for the 64-bit architecture and to produce output compatible with Visual Studio 2017 version 15.8 (_MSC_VER 1915).
set INSTALL_DIR="F:\win64_llvm"

This command tells CMake where to install the results.
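If you prefer to drive the builds from a script rather than a batch file, the same settings can be applied to a child process only, which keeps them out of the global environment as recommended above. This is a minimal sketch; the function name `clang_cl_env` is hypothetical, and it assumes vcvarsall.bat x64 has already been run in the parent console so the Visual Studio variables are inherited.

```python
import os


def clang_cl_env(install_dir, msc_version=1915, base=None):
    """Return a copy of the environment with the clang-cl settings from this page.

    Assumption: vcvarsall.bat x64 was already run in the parent console, so the
    Visual Studio variables (INCLUDE, LIB, PATH, ...) are inherited from `base`.
    """
    env = dict(os.environ if base is None else base)
    flags = f"-m64 -fmsc-version={msc_version}"  # 1915 = VS 2017 version 15.8
    env.update({
        "CC": "clang-cl",
        "CXX": "clang-cl",
        "CFLAGS": flags,
        "CXXFLAGS": flags,
        # No surrounding quotes: arguments passed as a list need no cmd quoting.
        "INSTALL_DIR": install_dir,
    })
    return env
```

Usage: `subprocess.run(["cmake", "--version"], env=clang_cl_env(r"F:\win64_llvm"), check=True)`. Unlike `set INSTALL_DIR="F:\win64_llvm"` in cmd, the value here carries no quote characters.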

Compilation process

Here is the order in which to build the libraries, with notes on how to build each of them.
Ninja is a single-configuration generator (it builds only one configuration, e.g. Release or Debug, without reconfiguring), so I focus on the Release configuration only.
I am also interested in the 64-bit architecture only, so if you need 32-bit, you have to adjust the variables mentioned above (Setting up build environment).
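The order below follows from which library looks for which. As a rough check, the dependencies can be written as a graph and sorted topologically. The map in this sketch is my own reconstruction from the notes on this page (e.g. tiff uses lzma and zstd, openjpeg uses LittleCMS, libarchive uses zlib, lzma and zstd), and the edges for webp and openjpeg are assumptions. Because webp and tiff look for each other, the second webp build is modelled as a separate node; a real cycle would make `TopologicalSorter` raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map reconstructed from this page: each library maps
# to the libraries that should already be installed before it is configured.
# webp <-> tiff is a cycle, so the second webp build is a separate node.
DEPS = {
    "zlib": set(),
    "png": {"zlib"},
    "gif": set(),
    "lcms2": set(),
    "jpeg": set(),
    "lzma": set(),
    "jbig": set(),
    "zstd": set(),
    "webp": {"gif", "jpeg", "png"},
    "tiff": {"zlib", "jpeg", "lzma", "jbig", "zstd", "webp"},
    "webp_rebuild": {"tiff"},
    "openjpeg": {"lcms2", "png", "tiff"},
    "leptonica": {"zlib", "png", "gif", "jpeg", "tiff", "webp_rebuild", "openjpeg"},
    "libarchive": {"zlib", "lzma", "zstd"},
    "tesseract": {"leptonica", "libarchive"},
}


def build_order(deps=DEPS):
    """Return one valid build order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order()))
```

Any order it prints is valid for this graph, but it may differ from the order used on this page (e.g. libarchive may come before leptonica).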

zlib

Download link: zlib.net or github.com/madler/zlib

Build commands:

cd zlib-1.2.11
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release --target install

The last command builds and installs the library.
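Every library below repeats the same three steps (create build.clang, configure with Ninja, build the install target), so they can be wrapped in a small driver. This is a minimal sketch with hypothetical helper names; it mirrors the commands above by running them from inside build.clang, and it expects the compiler variables from "Setting up build environment" to be set already (or passed via `env`).

```python
import os
import subprocess


def cmake_steps(install_dir, *defines):
    """Configure and build/install commands, run from inside build.clang."""
    configure = ["cmake", "..", "-G", "Ninja",
                 f"-DCMAKE_INSTALL_PREFIX={install_dir}",
                 "-DCMAKE_BUILD_TYPE=Release"]
    # Extra cache entries such as "PNG_TESTS=OFF" become -D options.
    configure += [f"-D{d}" for d in defines]
    build = ["cmake", "--build", ".", "--config", "Release", "--target", "install"]
    return [configure, build]


def build_library(source_dir, install_dir, *defines, env=None):
    """mkdir build.clang && cd build.clang, then run both steps; stop on error."""
    build_dir = os.path.join(source_dir, "build.clang")
    os.makedirs(build_dir, exist_ok=True)
    for cmd in cmake_steps(install_dir, *defines):
        subprocess.run(cmd, cwd=build_dir, env=env, check=True)
```

For example, the png step would become `build_library("lpng1636", r"F:\win64_llvm", "PNG_TESTS=OFF", "SKIP_INSTALL_PROGRAMS=ON")`. `check=True` stops at the first failing command, like running the commands by hand.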

png

Download link: sourceforge.net/projects/libpng

cd lpng1636
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release -DPNG_TESTS=OFF -DSKIP_INSTALL_PROGRAMS=ON
cmake --build . --config Release --target install

gif

There is no CMake support in the official release (sourceforge.net/projects/giflib), so I used the version from the xbmc project.

cd giflib-master
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release --target install

LittleCMS (optional)

Download link: www.littlecms.com
LittleCMS is a color management engine used by openjpeg. It is not clear whether it is needed. A thread-safe version, lcms2mt, is under development, but no public release is available yet.
Here are instructions on how to build it:

cd lcms2-2.9
curl https://raw.githubusercontent.com/mindw/little-CMS-cmake/xy/CMakeLists.txt -O
copy Projects\VC2017\lcms2.rc  src\lcms2.rc.in
mkdir build.clang && cd build.clang
copy ..\Projects\VC2017\resource.h  .
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTS=OFF
cmake --build . --config Release --target install
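`curl -O` (used here and in the libjpeg and xz sections) saves a file under its remote name in the current directory. If curl is not available, a standard-library equivalent is possible. This is a minimal sketch with a hypothetical `fetch` helper; it does no retries and no checksum verification.

```python
import os
import urllib.parse
import urllib.request


def fetch(url, dest_dir="."):
    """Download url into dest_dir, keeping the remote file name (like curl -O)."""
    name = os.path.basename(urllib.parse.urlparse(url).path)
    target = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        out.write(resp.read())
    return target
```

Run it from the source directory, e.g. inside lcms2-2.9: `fetch(url_of_CMakeLists)`, where `url_of_CMakeLists` is the raw URL from the curl command above.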

jpeg

There are two ways to get JPEG support (the library):

  1. libjpeg v9c from the Independent JPEG Group (https://www.ijg.org/), but it has no direct CMake support.
  2. libjpeg-turbo (often used on Linux): a JPEG image codec that uses SIMD instructions and claims to be up to 6x faster than libjpeg. It requires the NASM assembler to be installed.
    I suggest using libjpeg-turbo, but here are details on how to build both (install only one of them!).
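Since libjpeg-turbo needs NASM, it may be worth checking up front that it, and the other command-line tools this page calls, are on PATH. A minimal sketch; the tool list is my own reading of the commands on this page, and `which` is injectable only to make the function testable.

```python
import shutil

# Tools invoked on this page; nasm is needed only for libjpeg-turbo,
# msbuild only for the xz DLL build (it is on PATH after vcvarsall.bat).
TOOLS = ["cmake", "ninja", "clang-cl", "nasm", "curl", "git", "msbuild"]


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```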

libjpeg

Download link: www.ijg.org
There is no CMake build support, so I will use the CMake files from Steffen Ohrendorf's libjpeg-cmake project: https://github.com/stohrendorf/libjpeg-cmake

cd jpeg-9c
curl https://raw.githubusercontent.com/stohrendorf/libjpeg-cmake/master/CMakeLists.txt -O
curl https://raw.githubusercontent.com/stohrendorf/libjpeg-cmake/master/jconfig.h.in -O
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

jpeg-turbo

Download link: github.com/libjpeg-turbo
There are several options for building libjpeg-turbo; we will build it with libjpeg v6b API/ABI compatibility, which is the default (the WITH_JPEG7 and WITH_JPEG8 options are OFF unless you enable them):

cd libjpeg-turbo-2.0.2
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

lzma (xz)

Download link: tukaani.org/xz/
LZMA is a compression library quite often used in TIFF files, but you may not need to build it. The source code package is not available as a zip, so you may need a tool to unpack tar.gz.

Another option is to clone the xbmc version, but it does not have the latest version of xz. The original xz does not have CMake support, so we will use the XBMC/Team Kodi configuration either way:
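If you have no tool for .tar.gz archives, Python's standard `tarfile` module can unpack the xz source package. A minimal sketch with a hypothetical `unpack` helper; it uses the safer `filter="data"` extraction where the running Python supports it.

```python
import tarfile
from pathlib import Path


def unpack(archive, dest="."):
    """Unpack a .tar.gz/.tar.xz/.tar.bz2 archive into dest; return top-level names."""
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Reject absolute paths, ".." entries and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return sorted({Path(m.name).parts[0] for m in tar.getmembers()})
```

For example, `unpack("xz-5.2.4.tar.gz")` should create the xz-5.2.4 directory used below.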

cd xz-5.2.4
curl https://raw.githubusercontent.com/xbmc/xz/master/CMakeLists.txt -O
curl https://raw.githubusercontent.com/xbmc/xz/master/cmake/xz-config.cmake -O
mkdir cmake && move xz-config.cmake cmake
copy windows\vs2017\config.h windows\config.h
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR%  -DCMAKE_PREFIX_PATH=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release --target install
move %INSTALL_DIR%\lib\lzma.lib %INSTALL_DIR%\lib\lzma-static.lib

Unfortunately, this did not create a shared build (which we need because there is a problem with static linking in libtiff), so we will use the Visual Studio build to get it quickly:
cd ..\windows\vs2017
msbuild xz_win.sln /property:Configuration=ReleaseMT /property:Platform=x64
copy ReleaseMT\x64\liblzma_dll\liblzma.dll %INSTALL_DIR%\bin
copy ReleaseMT\x64\liblzma_dll\liblzma.lib %INSTALL_DIR%\lib\lzma.lib
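Because this step mixes a CMake install with manual move/copy commands, it is easy to end up with a missing file. A small check of the install tree can help; this is a minimal sketch, and the expected file names are simply the targets of the commands above.

```python
from pathlib import Path

# Files the xz commands above should leave in %INSTALL_DIR%.
EXPECTED_LZMA_FILES = ["bin/liblzma.dll", "lib/lzma.lib", "lib/lzma-static.lib"]


def missing_files(install_dir, expected=EXPECTED_LZMA_FILES):
    """Return the expected files (relative paths) that are not present."""
    root = Path(install_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

`missing_files(r"F:\win64_llvm")` should return an empty list once all xz steps have succeeded.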

JBIG-KIT

The official package (https://www.cl.cam.ac.uk/~mgk25/jbigkit/) does not support CMake, so you can use this git repository:

git clone https://github.com/zdenop/jbigkit
cd jbigkit
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

zstd

This is a new compression format supported in the latest version of libtiff.
Download link: github.com/facebook/zstd

cd zstd-1.3.8\build\cmake
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

webp

Download link: webmproject.org
webp looks for tiff, so we will rebuild it after we build tiff (which looks for webp ;-) ).

cd libwebp-1.0.2
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

tiff

Download link: download.osgeo.org/libtiff/

UPDATE: version 4.0.10 has a problem when building with CMake on Windows. It is already fixed in the master branch, so get the code from git:

git clone https://gitlab.com/libtiff/libtiff
cd libtiff
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

Now you can rebuild webp if you also want TIFF support there:
cd ..\..\libwebp-1.0.2\build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

openjpeg

Download link: github.com/uclouvain/openjpeg

cd openjpeg-2.3.0
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release 
cmake --build . --config Release --target install

leptonica

I use the GitHub code for leptonica because it does not depend on PkgConfig.

git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release -DCMAKE_MODULE_LINKER_FLAGS=-whole-archive -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release --target install

libarchive

Download link: www.libarchive.org
Tesseract has started to support compressed traineddata files via libarchive. I will use only the compression libraries already installed (zlib, lzma, zstd); if you need other compression formats, install them before configuring libarchive.

cd libarchive-3.3.3 
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release --target install

tesseract

We will use the latest code from GitHub. We will not build the training tools because they need other external dependencies.

git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
mkdir build.clang && cd build.clang
cmake .. -G Ninja -DBUILD_TRAINING_TOOLS=OFF -DCPPAN_BUILD=OFF -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_PREFIX_PATH=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release -DCMAKE_MODULE_LINKER_FLAGS=-whole-archive
cmake --build . --config Release --target install

And now we can check what our tesseract installation supports:

%INSTALL_DIR%\bin\tesseract -v 

tesseract 4.1.0-rc1-71-ge4bf
 leptonica-1.78.0 (Mar  8 2019, 18:10:35) [MSC v.1915 LIB Release x64]
  libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8
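If you rebuild often, comparing this output between builds by eye gets tedious. This is a minimal sketch of a parser (hypothetical `parse_tesseract_version`) that assumes exactly the line layout shown above; other tesseract versions may format the output differently.

```python
def parse_tesseract_version(text):
    """Parse `tesseract -v` output laid out as in the example above."""
    info = {"libraries": {}, "features": [], "archive_codecs": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("tesseract "):
            info["tesseract"] = line.split()[1]
        elif line.startswith("leptonica-"):
            info["libraries"]["leptonica"] = line.split()[0][len("leptonica-"):]
        elif line.startswith("Found libarchive"):
            parts = line.split()
            info["libraries"]["libarchive"] = parts[2]
            # Remaining entries look like "zlib/1.2.11".
            info["archive_codecs"] = dict(p.split("/", 1) for p in parts[3:])
        elif line.startswith("Found "):
            info["features"].append(line[len("Found "):])
        elif " : " in line:
            # "libjpeg 6b (libjpeg-turbo 2.0.2)" -> name "libjpeg", version "6b"
            for item in line.split(" : "):
                name, version = item.split()[:2]
                info["libraries"][name] = version
    return info
```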

Conclusion

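The process described above can easily be used to build tesseract with VS2017 (or VS2015) as well: just replace Ninja with "Visual Studio 15 2017 Win64" (or "Visual Studio 14 2015 Win64"). These are multi-configuration generators, so CMAKE_BUILD_TYPE is ignored there and the Release configuration is selected by the --config Release option of the build command.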
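The difference between the two kinds of generators can be captured in how the commands are assembled. This is a minimal sketch with hypothetical helper names; the list of multi-configuration generator prefixes is my assumption and covers only the common ones. Passing CMAKE_BUILD_TYPE to a Visual Studio generator does no harm (it is ignored); the sketch omits it only to make the distinction explicit.

```python
# Generator name prefixes treated as multi-configuration (assumption).
MULTI_CONFIG = ("Visual Studio", "Xcode", "Ninja Multi-Config")


def configure_args(generator, install_dir):
    """cmake configure command for the given generator, run from build.clang."""
    args = ["cmake", "..", "-G", generator, f"-DCMAKE_INSTALL_PREFIX={install_dir}"]
    if not generator.startswith(MULTI_CONFIG):
        # Single-configuration generators (e.g. Ninja) fix the build type here.
        args.append("-DCMAKE_BUILD_TYPE=Release")
    return args


def build_args():
    """Multi-configuration generators select Release here; Ninja ignores --config."""
    return ["cmake", "--build", ".", "--config", "Release", "--target", "install"]
```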

I expect the same process can be used for building with other compilers (e.g. Intel's).

Clang can also produce gcc-compatible output, but that requires installing the MinGW compiler instead of VS. See the details if you are interested.


© sk-spell project
