The Training Tesseract wiki page instructs you to run the mftraining and cntraining programs after creating the training files.
I ran mftraining as described on the wiki:
$ /usr/local/bin/mftraining *.tr
It produced the following message:
Failed to load unicharset from file unicharset
Building unicharset for mftraining from scratch…
Reading slk.arial.001.tr …
slk.arial.001 has no defined properties.
Reading slk.times.001.tr …
slk.times.001 has no defined properties.
strace revealed that mftraining tries to open the unicharset file before clustering the training files.
So unicharset_extractor has to be run first:
$ /usr/local/bin/unicharset_extractor *.box
The command printed a short message:
Extracting unicharset from slk.arial.001.box
Extracting unicharset from slk.times.001.box
Wrote unicharset file ./unicharset.
unicharset_extractor created one new file – unicharset.
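Because mftraining falls back to building a unicharset from scratch when the file is missing, a small guard can catch the wrong order before training starts. The following is a minimal sketch: require_unicharset is a hypothetical helper name, not part of Tesseract, and it only checks that a non-empty unicharset file exists in the current directory.

```shell
# Hypothetical helper (not part of Tesseract): succeed only if a non-empty
# unicharset file is present in the current directory.
require_unicharset() {
    if [ ! -s unicharset ]; then
        echo "unicharset is missing; run unicharset_extractor *.box first" >&2
        return 1
    fi
}
```

It could be used as `require_unicharset && /usr/local/bin/mftraining *.tr`, so that mftraining starts only when the file is already there.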
Then I ran mftraining again, with the following output:
Reading slk.arial.001.tr …
slk.arial.001 has no defined properties.
Reading slk.times.001.tr …
slk.times.001 has no defined properties.
Warning: no protos/configs for / in CreateIntTemplates()
Error: no configs for class / in mftraining
Writing Merged Microfeat …Done!
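The Warning and Error lines above are easy to miss, since the run continues and still reports "Done!". One way to surface them is to save the output to a file and filter it. This is only a sketch: training_problems is a hypothetical helper, and the log file name used in the note below is an assumption.

```shell
# Hypothetical helper: print only the lines of a saved log that start with
# "Warning:" or "Error:". Follows grep's exit status: 0 if any such line
# was found, 1 if none.
training_problems() {
    grep -E '^(Warning|Error):' "$1"
}
```

For example, `/usr/local/bin/mftraining *.tr 2>&1 | tee mftraining.log` followed by `training_problems mftraining.log` would list just the two problem lines from the output shown above.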
mftraining created these files: inttemp, pffmtable, Microfeat and mfunicharset.
Then I ran cntraining:
$ /usr/local/bin/cntraining *.tr
The command printed this message:
Reading slk.arial.001.tr …
Reading slk.times.001.tr …
Clustering …
Writing normproto …
cntraining created just one new file – normproto.
At the end of training, the output files from the previous steps need to be renamed with the language code as a prefix (slk is my language code):
$ mv unicharset slk.unicharset
$ mv inttemp slk.inttemp
$ mv pffmtable slk.pffmtable
$ mv normproto slk.normproto
Note that the mfunicharset and Microfeat files are not included in the final language file.
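The four renames can also be written as a loop, which makes it easier to reuse the step for another language code. This is a minimal sketch, assuming the four output files sit in the current directory; rename_outputs is a hypothetical helper name, not a Tesseract tool.

```shell
# Hypothetical helper: prefix the four training outputs with a language code.
rename_outputs() {
    lang=$1
    # Check everything first so a missing file does not leave a
    # half-renamed set behind.
    for f in unicharset inttemp pffmtable normproto; do
        if [ ! -f "$f" ]; then
            echo "missing training output: $f" >&2
            return 1
        fi
    done
    for f in unicharset inttemp pffmtable normproto; do
        mv "$f" "$lang.$f" || return 1
    done
}
```

Calling `rename_outputs slk` performs the same four mv commands as above, and leaves mfunicharset and Microfeat untouched.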