sk-spell

Slovak language support in Open Source programs

Getting component coordinates with python from tesseract C-API

last modified: 8 December 2013

The tesseract-ocr C-API makes it possible to use the tesseract API from Python via ctypes. A simple example is already included in the tesseract source (in the contrib directory).

With ctypes in Python you can run into several difficulties, e.g. when you need to pass a file object to a C library. For this issue there is an example on pastebin.com: tesseract-ocr C-API with file via ctypes in python.
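One possible way to hand an open Python file to a C function that expects a `FILE *` is to wrap the file's descriptor with the C library's `fdopen`. The sketch below assumes a POSIX system and is not taken from the pastebin example; the helper name `as_c_file` is hypothetical.

```python
import ctypes
import ctypes.util
import os

# Load the C runtime. On Linux a None result from find_library still works,
# because CDLL(None) exposes the symbols of the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.fdopen.argtypes = [ctypes.c_int, ctypes.c_char_p]
libc.fdopen.restype = ctypes.c_void_p      # FILE * kept as a full-width pointer
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fclose.restype = ctypes.c_int


def as_c_file(py_file, mode=b"rb"):
    """Return a C FILE * (as an integer address) for an open Python file.

    Hypothetical helper: the descriptor is duplicated so that fclose() on the
    C side does not close the descriptor owned by the Python file object.
    """
    fd = os.dup(py_file.fileno())
    c_file = libc.fdopen(fd, mode)
    if not c_file:
        os.close(fd)
        raise OSError("fdopen() failed")
    return c_file
```

The returned address should only be passed to C functions whose `argtypes` declare `ctypes.c_void_p` for the `FILE *` parameter, otherwise ctypes may truncate it to a C int. Call `libc.fclose()` on it when the C side is done.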

Another task can be handling result structures and defining C enums. Here is an example of how these can be solved. It is based on the C++ tesseract API example and shows how to get information (position, size) about an image component (in this case a line, but it can easily be modified for paragraphs, words or symbols by replacing 'RIL_TEXTLINE' with another page iterator level). The example also uses the leptonica library via ctypes (there is also a Python wrapper for leptonica called pylepthonica).
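As a warm-up that is independent of the full script, the two ideas (C enums as consecutive `c_int` values, and a C struct mirrored by a `ctypes.Structure`) can be tried without loading any library. This is only a sketch: the names `LEVEL_NAMES`, `LEVELS` and `describe` are hypothetical, and the enum order is assumed to match the one used in the script.

```python
import ctypes

# Enum members become consecutive c_int values, as in the script that follows.
# The tuple order is assumed to match tesseract's page iterator levels.
LEVEL_NAMES = ("RIL_BLOCK", "RIL_PARA", "RIL_TEXTLINE", "RIL_WORD", "RIL_SYMBOL")
LEVELS = dict((name, ctypes.c_int(value))
              for value, name in enumerate(LEVEL_NAMES))


class BOX(ctypes.Structure):
    """Mirror of the BOX layout used in the script that follows."""
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("w", ctypes.c_int32),
        ("h", ctypes.c_int32),
        ("refcount", ctypes.c_uint32),
    ]


def describe(box_ptr):
    """Format the fields behind a POINTER(BOX), the form a C call returns."""
    box = box_ptr.contents
    return "x=%d, y=%d, w=%d, h=%d" % (box.x, box.y, box.w, box.h)


# No library is needed for the demo: build a BOX in Python and point at it.
sample = BOX(36, 92, 582, 269, 1)
print(describe(ctypes.pointer(sample)))
```

Switching the component type then amounts to picking another entry, for example `LEVELS["RIL_WORD"]`, where the script passes `RIL_TEXTLINE`.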

  #!/usr/bin/python
  # -*- coding: utf-8 -*-

  # Copyright 2013 Zdenko Podobný
  # Author: Zdenko Podobný
  #
  # Licensed under the Apache License, Version 2.0 (the "License");
  # you may not use this file except in compliance with the License.
  # You may obtain a copy of the License at
  #
  #      http://www.apache.org/licenses/LICENSE-2.0
  #
  # Unless required by applicable law or agreed to in writing, software
  # distributed under the License is distributed on an "AS IS" BASIS,
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.

  """
  This is a Python version of the C++ example:
    https://code.google.com/p/tesseract-ocr/wiki/APIExample#example

  It demonstrates how to use the tesseract-ocr 3.02 C-API and leptonica to get
  info about image components.

  Tested on openSUSE 13.1 64bit with tesseract 3.03 (r918), leptonica 1.69
  """
  #pylint: disable-msg=C0103, R0903

  import os
  import ctypes

  # Demo variables
  lang = "eng"
  filename = "../phototest.tif"

  TESSDATA_PREFIX = os.environ.get('TESSDATA_PREFIX')
  if not TESSDATA_PREFIX:
      TESSDATA_PREFIX = "../"

  (L_INSERT, L_COPY, L_CLONE, L_COPY_CLONE) = map(ctypes.c_int, xrange(4))
  # Define Page Iterator Levels
  (RIL_BLOCK, RIL_PARA, RIL_TEXTLINE, RIL_WORD, RIL_SYMBOL) = \
      map(ctypes.c_int, xrange(5))
  # Define Page Segmentation Modes
  (PSM_OSD_ONLY, PSM_AUTO_OSD, PSM_AUTO_ONLY, PSM_AUTO, PSM_SINGLE_COLUMN,
  PSM_SINGLE_BLOCK_VERT_TEXT, PSM_SINGLE_BLOCK, PSM_SINGLE_LINE,
  PSM_SINGLE_WORD, PSM_CIRCLE_WORD, PSM_SINGLE_CHAR, PSM_SPARSE_TEXT,
  PSM_SPARSE_TEXT_OSD, PSM_COUNT) = map(ctypes.c_int, xrange(14))

  class BOX(ctypes.Structure):
      """Leptonica box structure
      """
      _fields_ = [
          ("x", ctypes.c_int32),
          ("y", ctypes.c_int32),
          ("w", ctypes.c_int32),
          ("h", ctypes.c_int32),
          ("refcount", ctypes.c_uint32)
      ]

  libname = "libtesseract.so.3"
  leptlib = "liblept.so"

  try:
      tesseract = ctypes.cdll.LoadLibrary(libname)
  except OSError as error:
      print "Loading of '%s' failed..." % libname
      print error
      exit(1)

  try:
      leptonica = ctypes.cdll.LoadLibrary(leptlib)
  except OSError as error:
      print "Loading of '%s' failed..." % leptlib
      print error
      exit(1)

  tesseract.TessVersion.restype = ctypes.c_char_p
  tesseract_version = tesseract.TessVersion()[:4]
  # We need to check library version because libtesseract.so.3 is symlink
  # and can point to other version than 3.02
  if float(tesseract_version) < 3.02:
      print "Found tesseract-ocr library version %s." % tesseract_version
      print "C-API is available only in version 3.02 or newer!"
      exit(2)

  # Read image with leptonica => create PIX structure and report image size info
  pix_image = leptonica.pixRead(filename)
  print "image width:", leptonica.pixGetWidth(pix_image)
  print "image height:", leptonica.pixGetHeight(pix_image)

  # Create tesseract api
  api = tesseract.TessBaseAPICreate()
  rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang)
  if rc:
      tesseract.TessBaseAPIDelete(api)
      print "Could not initialize tesseract.\n"
      exit(3)
  tesseract.TessBaseAPISetPageSegMode(api, PSM_AUTO_OSD)

  # Set PIX structure to tesseract api
  tesseract.TessBaseAPISetImage2(api, pix_image)
  # Get info(BOXA structure) about lines(RIL_TEXTLINE) from image in api
  boxa = tesseract.TessBaseAPIGetComponentImages(api, RIL_TEXTLINE, 1,
                                                 None, None)
  # Get info about number of items on image
  n_items = leptonica.boxaGetCount(boxa)
  print "Found %d textline image components." % n_items

  # Set up result type (BOX structure) for leptonica function boxaGetBox
  BOX_Ptr_t = ctypes.POINTER(BOX)
  leptonica.boxaGetBox.restype = BOX_Ptr_t

  # Silence tesseract: there are a lot of unwanted messages for RIL_TEXTLINE
  tesseract.TessBaseAPISetVariable(api, "debug_file", "/dev/null")

  # Print info about items
  for item in range(0, n_items):
      # Use a separate name so the BOX class is not shadowed by the result
      box_ptr = leptonica.boxaGetBox(boxa, item, L_CLONE)
      box = box_ptr.contents
      tesseract.TessBaseAPISetRectangle(api, box.x, box.y, box.w, box.h)
      ocr_result = tesseract.TessBaseAPIGetUTF8Text(api)
      result_text = ctypes.string_at(ocr_result)
      conf = tesseract.TessBaseAPIMeanTextConf(api)
      print "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s" % \
          (item, box.x, box.y, box.w, box.h, conf, result_text.strip())

The code is also available on pastebin.com.
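One caveat on 64-bit systems: unless told otherwise, ctypes treats the return value of a foreign function as a C int, so a returned pointer can lose its upper half. The script does not declare return types for calls such as `TessBaseAPICreate` or `pixRead`. A possible precaution is sketched below; the names `Handle` and `declare_pointer_results` are hypothetical, and the sketch only sets attributes on the library objects passed to it.

```python
import ctypes


class Handle(ctypes.c_void_p):
    """Opaque pointer (hypothetical name).

    ctypes does not convert subclasses of c_void_p to plain Python integers,
    so a Handle can be passed back to the library as a full-width pointer.
    """


def declare_pointer_results(tesseract, leptonica):
    """Declare return types for the pointer-returning calls of the script."""
    tesseract.TessBaseAPICreate.restype = Handle              # TessBaseAPI *
    tesseract.TessBaseAPIGetComponentImages.restype = Handle  # BOXA *
    # Kept as a plain address so that ctypes.string_at() can read the text.
    tesseract.TessBaseAPIGetUTF8Text.restype = ctypes.c_void_p
    leptonica.pixRead.restype = Handle                        # PIX *
```

It would be called once, right after both libraries are loaded and before the first API call: `declare_pointer_results(tesseract, leptonica)`.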

© sk-spell project