
TopOCR - Bringing Enhanced Tesseract OCR to Document Cameras

Tesseract LSTM OCR is a highly accurate multilingual OCR classifier that has been optimized for TopOCR, with greatly enhanced accuracy and speed compared to the standard release.

Tesseract LSTM OCR (LSTM Recurrent Neural Network + Static Classifier Architecture)


Tesseract LSTM OCR can read eleven languages (English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish). The primary character classifier in Tesseract OCR is an implementation of a Long Short-Term Memory (LSTM) neural network. LSTM networks outperform the alternative neural network architectures for this type of pattern recognition, and they also outperform the more "classical" character recognition algorithms used by the top-selling commercial OCR products. For example, an LSTM network achieved the best known results in unsegmented connected handwriting recognition and won the ICDAR handwriting competition in 2009. If Tesseract's LSTM recognizer fails on a particular character sequence, it can fall back to its generic static shape classifier to make the determination. In essence, Tesseract LSTM is two OCR classifiers in one.
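The two-stage arrangement described above can be pictured with a minimal Python sketch. Everything named here is hypothetical: the recognizer signature, the `(text, confidence)` return shape, and the 0.5 confidence threshold are illustrations only, not Tesseract's or TopOCR's actual API. How Tesseract decides that the LSTM recognizer has "failed" is not described in this text, so a simple confidence check is assumed.

```python
from typing import Callable, Tuple

# Hypothetical recognizer signature: image data in, (text, confidence in [0, 1]) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_with_fallback(
    segment: bytes,
    lstm: Recognizer,
    static: Recognizer,
    min_confidence: float = 0.5,  # assumed threshold, for illustration only
) -> Tuple[str, str]:
    """Return (text, classifier_used) for one character sequence."""
    text, confidence = lstm(segment)
    if text and confidence >= min_confidence:
        return text, "lstm"
    # The LSTM result was empty or low-confidence: use the static classifier.
    fallback_text, _ = static(segment)
    return fallback_text, "static"
```

The second element of the result records which classifier produced the text, which makes it easy to see how often the fallback path is taken.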

The amount of computation required for LSTM network character recognition is about 50 times greater than for character recognition performed with a static classifier. To help speed up the processing, we use either SSE4.2 or AVX instructions for the inner neural network calculations. We have also achieved a significant performance increase by making extensive use of multi-threading (running on multiple CPU cores) in the most CPU-intensive portions of the OCR and image processing functions. To optimize multi-threading, TopOCR automatically scales the number of threads it uses to the number of processors or "cores" on your PC. On a low-end desktop PC with a 4-core Intel 3.4GHz i7-6700 CPU, our implementation of Tesseract's LSTM neural network OCR engine takes about 4 seconds to read a 5.0 MP image, and TopOCR's image pre-processing (binarization and column straightening) adds about another second. Because of the enormous performance improvement achieved by using multiple cores, we recommend running TopOCR ONLY on a 4-core or better CPU! As 8-core, 16-core and larger CPUs become more mainstream, TopOCR will automatically scale its performance to these CPUs.
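As a rough illustration of scaling the worker count to the number of cores, the Python sketch below splits an image into horizontal bands and binarizes one band per worker. It is not TopOCR's implementation: the function names, the fixed threshold of 128, and the band-splitting strategy are all assumptions made for the example. Note also that in standard CPython, pure-Python loops like this one do not run truly in parallel because of the global interpreter lock, so the sketch shows the structure only; a native implementation would process the bands concurrently.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List

Row = List[int]  # one row of 8-bit grayscale pixel values


def worker_count() -> int:
    """One worker per logical processor reported by the OS (at least one)."""
    return os.cpu_count() or 1


def binarize_rows(rows: List[Row], threshold: int = 128) -> List[Row]:
    """Fixed-threshold binarization of a band of rows (stand-in for real work)."""
    return [[255 if px >= threshold else 0 for px in row] for row in rows]


def binarize_parallel(image: List[Row], workers: int = 0) -> List[Row]:
    """Split the image into horizontal bands and process one band per worker."""
    workers = workers or worker_count()
    band = max(1, -(-len(image) // workers))  # ceiling division
    bands = [image[i:i + band] for i in range(0, len(image), band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input bands.
        results = list(pool.map(binarize_rows, bands))
    return [row for part in results for row in part]
```

Calling `binarize_parallel(image)` with no `workers` argument uses one worker per logical processor, so the same code uses 4 workers on a 4-core machine and 16 on a 16-core machine without any configuration.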

Please note that in this release, Tesseract LSTM calculations automatically switch to the AVX or AVX2 instruction set if your CPU supports it; otherwise they fall back to the SSE4.2 or SSE instruction sets.
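The selection order described above can be sketched as a simple preference list. This is a hedged illustration, not TopOCR's code: the function name and the "scalar" label are hypothetical, ranking AVX2 ahead of AVX is an assumption (the text only says "AVX or AVX2"), and the set of supported features is passed in rather than detected, since real detection (for example via CPUID) is outside the scope of this sketch.

```python
from typing import Iterable

# Preference order based on the text: AVX2 or AVX first, then SSE4.2, then SSE.
# Placing AVX2 ahead of AVX is an assumption.
PREFERENCE = ("avx2", "avx", "sse4.2", "sse")


def pick_instruction_set(supported: Iterable[str]) -> str:
    """Return the most preferred instruction set found in `supported`.

    Falls back to "scalar" (an assumed label for plain, non-SIMD code)
    when none of the listed sets is available.
    """
    available = {name.lower() for name in supported}
    for name in PREFERENCE:
        if name in available:
            return name
    return "scalar"
```

For example, `pick_instruction_set({"sse", "sse4.2", "avx"})` selects `"avx"`, while a CPU reporting only `{"sse", "sse4.2"}` gets `"sse4.2"`.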