Page Distortion Correction with Neural Warp and Straighten Columns

When a book is scanned, the cylindrical curvature of the page can produce a 3D geometric distortion that curls the text lines. A second type of 3D geometric distortion, called "Perspective Distortion", occurs if the camera is not perpendicular to the surface of the document being scanned. An image can also have a simple 2D rotation instead of a complex 3D page distortion, and in the most complicated cases a 3D page distortion is combined with a 2D rotation!

TopOCR provides two separate functions, called "Neural Warp" and "Straighten Columns", to correct these commonly encountered types of image distortion.

Neural Warp is a neural network that takes any document camera text image and automatically corrects both 3D and 2D image distortion: perspective, page curl and rotation. It also corrects for lighting and background.

Straighten Columns is a 2D text line tracking function combined with a sophisticated curve fitting function to straighten lines of 2D text.

You can select either of these functions in the DocCam Image Preview Dialog so that it is applied automatically every time you scan an image with a document camera. These functions automatically create a "corrected", text-aligned output image ready for OCR, and can greatly improve OCR accuracy, in some cases by as much as 40%-50%!

[Images: Distorted Book Input Image | Neural Warp Output Image]


Neural Warp Reads Can Labels!


Here is an example of a can label that combines the 3D geometric distortion caused by the curvature of the can with a slight 2D rotation of the can. Neural Warp can correct this image, and the main label can then be read with 100% OCR accuracy!

[Images: Neural Warp Can Input Image | Neural Warp Can Output Image]


Perspective Distortion


If your camera is not perpendicular (at a 90 degree angle) to the document's surface, it will introduce "Perspective Distortion". Neural Warp can correct for this type of 3D geometric distortion as well!

[Images: Distorted Book Input Image | Neural Warp Output Image]
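For a flat page, perspective distortion can be modeled as a planar homography. The sketch below is a classical illustration of that idea only, not a description of how Neural Warp works internally. It assumes the four page corners have already been located in the camera image; the function names and the corner coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """Return the 8 homography coefficients (h33 fixed to 1) mapping src to dst.

    src, dst: four (x, y) corner points each, listed in the same order.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical corners of a page photographed at an angle (a skewed quadrilateral),
# mapped onto an upright 200 x 300 rectangle.
page_corners = [(10, 20), (200, 40), (190, 300), (30, 280)]
flat_corners = [(0, 0), (200, 0), (200, 300), (0, 300)]
h = homography_from_corners(page_corners, flat_corners)
```

A homography like this only handles a flat page seen at an angle. It cannot remove page curl, which is why a curved book page or a can label needs a more general correction.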


Column Straighten

Straighten Columns tries to rectify curled or rotated 2D text lines by the following process:

Perform a document layout analysis of the image and detect the location of columns of text, graphics and clipped columns of text near the edges of the document.

Determine if this particular image has just one single column of text or has multiple columns.

In the case of a single column, it will straighten the entire image. If the image has multiple columns, it will try to straighten each column separately, since each column can have different distortion characteristics. This is typical for an image of an open book with two pages, each of which can have its own unique distortion.
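The layout analysis used by Straighten Columns is not described in detail here. As a rough illustration of how text columns can be located in a binarized image, the sketch below uses a vertical projection profile, which is one common classical approach and not necessarily the one TopOCR uses. The function name, the `min_gap` parameter and the image representation (a list of rows with 1 for ink and 0 for background) are assumptions made for this example.

```python
def find_text_columns(binary, min_gap=3):
    """Return (start, end) x-ranges of text columns; end is exclusive.

    binary:  list of rows, each a list of 0/1 pixels (1 = ink).
    min_gap: number of consecutive empty pixel columns that separates
             two text columns.
    """
    width = len(binary[0])
    # Count ink pixels in every pixel column of the image.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    columns, start, gap = [], None, 0
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x          # a new text column begins here
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # the gutter is wide enough: close the column
                columns.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:          # column that runs to the right edge
        columns.append((start, width - gap))
    return columns


# Tiny example: two ink runs separated by a four-pixel gutter.
image = [[1, 1, 0, 0, 0, 0, 1, 1, 1, 0]] * 2
columns = find_text_columns(image, min_gap=3)
```

With one detected range the whole image would be treated as a single column; with several, each range would be straightened on its own. A strongly skewed or curled page blurs the gutters in the profile, so a simple method like this is only a starting point.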

This function will straighten columns of text and also remove graphics and clipped columns of text. The process is very fast, generally taking under 0.25 seconds once an image has been binarized.
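To make the curve fitting idea concrete, here is a minimal sketch that fits a quadratic to the tracked baseline of one curled text line and computes the vertical shift needed at each x position to flatten it. It assumes the line tracking step has already produced a list of (x, y) baseline points. The polynomial degree and the function names are assumptions for this example, not details of the actual Straighten Columns implementation.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_baseline(points, degree=2):
    """Least-squares fit y = c0 + c1*x + ... + cd*x^d via the normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x, _ in points) for j in range(n)]
         for i in range(n)]
    b = [sum(y * x ** i for x, y in points) for i in range(n)]
    return solve_linear(a, b)


def vertical_shifts(coeffs, xs, target_y):
    """Shift to apply at each x so the fitted curve lands on y = target_y."""
    return [target_y - sum(c * x ** k for k, c in enumerate(coeffs))
            for x in xs]


# Hypothetical tracked baseline of one curled line (y sags in the middle).
baseline = [(x, 0.002 * x * x - 0.2 * x + 50) for x in range(0, 101, 10)]
coeffs = fit_baseline(baseline)
shifts = vertical_shifts(coeffs, [x for x, _ in baseline], target_y=50.0)
```

Moving each pixel column of the line by its shift straightens that line. Repeating this for every tracked line in a column straightens the column.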

Note: Straighten Columns will not work very well on more extreme cases of cylindrical book curvature or on images that have more than 12 degrees of skew. For these types of image, it's recommended that you use Neural Warp instead.

The Straighten Columns function will not alter single lines of text or short columns of fewer than 4 lines of text. It will also automatically remove clipped columns of text, which are generally caused by the camera clipping the edge of one page.

[Images: Raw Input Image | Column Straighten Output Image]


Neural Warp Versus Column Straighten

Neural Warp can handle more complex 3D distortions, but at the expense of greater computation. Neural Warp generally takes more than 10 times longer to correct an image than Column Straighten. Neural Warp dewarps character by character, so it must generate the 2D coordinates of the cell for each character. All page transformations are generated from a transformational matrix of all character cells and lines, which is the input to a convolutional neural network. This requires a significant amount of calculation for a complex page!

Column Straighten instead uses a much simpler 2D "line tracking" and curve fitting function to straighten curved lines of text. Neural Warp requires about 4 seconds to process an average page, while Column Straighten requires only about 0.25 seconds.

The current version of Neural Warp only supports single-column layouts, whereas Column Straighten supports multi-column layouts.
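The trade-offs described on this page can be summarized as a simple decision rule. The sketch below is only an illustration of that guidance. The function, its parameters and the idea of measuring skew and column count beforehand are hypothetical and are not part of TopOCR.

```python
# Skew limit taken from the Straighten Columns note on this page.
MAX_SKEW_DEGREES = 12


def choose_correction(num_columns, skew_degrees, extreme_curvature=False):
    """Suggest which correction function fits an image (illustrative only)."""
    heavy_distortion = extreme_curvature or abs(skew_degrees) > MAX_SKEW_DEGREES
    if num_columns > 1:
        # The current Neural Warp supports single-column layouts only, so
        # Straighten Columns is the only option here. Results may be poor
        # if the distortion is heavy.
        return "Straighten Columns"
    if heavy_distortion:
        return "Neural Warp"
    # Mild distortion on a single column: prefer the much faster function.
    return "Straighten Columns"


suggestion = choose_correction(num_columns=1, skew_degrees=15)
```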

[Images: Neural Warp Character Cell Page Map | Column Straighten Line Tracking Input]


Neural Warp II Development

We are currently working on Neural Warp II! Neural Warp II achieves broader coverage and better overall results by working in conjunction with Tesseract's document layout analysis system. This works much better than full page image dewarping because each individual paragraph can have its own unique level of image distortion and can be corrected at a much finer level. In addition, this process is inherently parallel, and it can more than double the performance on a typical multi-column document image. It also allows us to exclude non-textual parts of the image and dewarp only the text regions, which saves even more time compared to full page dewarping.
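The per-region approach can be sketched as follows. This is a structural illustration only: the region dictionaries, the `kind` field and the `dewarp_region` placeholder are hypothetical, and no real Tesseract or Neural Warp II calls are shown.

```python
from concurrent.futures import ThreadPoolExecutor


def dewarp_region(region):
    """Placeholder for a per-region dewarp; it only marks the region as done."""
    return {**region, "dewarped": True}


def dewarp_page(regions, max_workers=4):
    """Dewarp only the text regions of a page, each one independently."""
    # Non-text regions (pictures, rules, and so on) are skipped entirely.
    text_regions = [r for r in regions if r["kind"] == "text"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the input regions.
        return list(pool.map(dewarp_region, text_regions))


# Hypothetical layout analysis output for one page.
page_regions = [
    {"id": 1, "kind": "text"},
    {"id": 2, "kind": "image"},
    {"id": 3, "kind": "text"},
]
results = dewarp_page(page_regions)
```

Because each region is corrected independently, the work splits naturally across workers. A thread pool is used here only to keep the example short; for CPU-bound work in standard CPython, a process pool or native code would usually be needed to get a real speedup.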