Troubleshooting Document Camera Image Capture

So you read the instructions and you're still having problems getting accurate OCR results from your document camera? Well, don't worry, TopOCR has tools to help you understand the problem.

Character Size

You can use your mouse to measure the height, in pixels, of the characters in the image. Place the mouse cursor at the bottom of a character and note the "y =" value on the status bar at the bottom of the Image Window. Then place the cursor at the top of the same character and note the "y =" value again. The difference between the two values is the height of the character in pixels. TopOCR does best when character heights are between 18 and 48 pixels. You may need to change the distance of the camera from the page to produce heights in this range. If character heights are consistently below 22 pixels, you may want to check whether the Small Print Mode feature in the Settings->OCR... dialog improves OCR accuracy.
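The arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of TopOCR: the function names and the wording of the suggestions are made up for this example, and the only values taken from the text are the 18 to 48 pixel range and the 22 pixel Small Print Mode guideline.

```python
def character_height(y_bottom, y_top):
    """Height in pixels from the two "y =" readings.

    The absolute difference is used so the result is positive
    whichever direction the y axis runs in the Image Window.
    """
    return abs(y_bottom - y_top)


def height_suggestions(height):
    """Return a list of suggestions for a measured character height.

    The thresholds (18, 48, 22) come from the text above; the
    suggestion strings are hypothetical wording for this sketch.
    """
    suggestions = []
    if height < 18:
        suggestions.append("move the camera closer to the page")
    elif height > 48:
        suggestions.append("move the camera farther from the page")
    if height < 22:
        suggestions.append("try Small Print Mode")
    return suggestions


# Example readings: bottom of a character at y = 412, top at y = 392
height = character_height(412, 392)
print(height, height_suggestions(height))
```

An empty list means the measured height already falls in the recommended range and above the Small Print Mode guideline.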

Brightness and Contrast

A simple way to test whether the image is too bright is to binarize the image using the Image->Binarize... function and look closely at the characters. If the characters are not crisp and sharp, but are instead thin and segmented, the page may be reflecting too much light. This can happen, for example, with a glossy magazine page. The issue may also occur if there is too much reflected glare on only a portion of the page. TopOCR's binarization function provides an option called "Equalize Contrast" that helps to eliminate this type of problem.

A similar issue may occur if the page is under-illuminated and the text doesn't stand out against the darker image background. TopOCR's binarization function provides another option called "Maximize Contrast" that helps to separate the text from a dark background. If these options do improve binarization, you may want to have the OCR engine apply them automatically by setting the "Equalize Contrast" or "Maximize Contrast" options in the Settings->OCR... dialog.

Low-quality images can sometimes be improved by binarizing and selecting an optimal background subtraction intensity with the slider. Moving the slider to the left decreases the intensity of the background subtraction, and moving it to the right increases it.
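To see why glare makes characters look thin after binarization, here is a minimal sketch using a plain fixed threshold. It is not TopOCR's binarization algorithm, which is not described here. The function names, the threshold of 128, and the pixel values are all assumptions chosen for illustration.

```python
def binarize(gray, threshold=128):
    """Mark pixels darker than the threshold as text (1), others as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def text_pixel_count(binary):
    """Count the pixels classified as text."""
    return sum(sum(row) for row in binary)


# Two rows through a vertical stroke three pixels wide, under normal light.
# Values are grayscale intensities: 0 is black, 255 is white.
normal = [
    [230, 60, 40, 60, 230],
    [230, 60, 40, 60, 230],
]

# The same stroke washed out by glare: the edge pixels are now lighter
# than the threshold, so only the center of the stroke survives.
glare = [
    [250, 170, 90, 170, 250],
    [250, 170, 90, 170, 250],
]

print(binarize(normal)[0], text_pixel_count(binarize(normal)))
print(binarize(glare)[0], text_pixel_count(binarize(glare)))
```

In this toy example the stroke shrinks from three pixels wide to one under glare, which is the kind of thin, segmented character described above.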

Column Straightening

If your document image has some "page curl", you can use the Settings->OCR->Straighten Columns function to reduce the curvature.