Removing Noise from Images
Digital cameras often pick up noise in the image, particularly at low light levels. TopOCR can remove most of this noise with an optional two-stage post-binarization noise reduction filter combined with column straightening, and it can even do this automatically when you scan an image. To apply the filter automatically whenever you scan an image, simply turn on the "Straighten Columns" feature in the Doccam Image Options Dialog. The filter combines background subtraction with a bilinear filter to remove "specks" before it straightens the columns, which can greatly increase OCR accuracy!
In addition to using it for real-time scanning with a document camera, there is also an Image->Noise Reduction function that will apply noise reduction with binarization and background removal to any color or grayscale image.
For example, the "Raw Input Image" below came from an $18, 2.0 MP, 1080p webcam used in a low-light environment that generated LOTS of image noise. However, this noisy 2.0 MP image of a typed, letter-sized document, after being processed as described below, can be read with 100% accuracy by TopOCR's Tesseract LSTM OCR engine!
|Step 1 - Raw Input Image - Low Light Causes Extreme Image Noise!||Step 2 - Binarization and Background Subtraction Leaves Noise|
|Step 3 - Image->Noise Reduction Removes Most Noise||Step 4 - Image->Straighten Text Removes Even More Noise|
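To make the idea of post-binarization "speck" removal concrete, here is a minimal Python sketch. It is not TopOCR's actual filter, whose internals are not documented here. It only illustrates one common approach: threshold the image, then erase connected groups of ink pixels that are too small to be part of a character. The function names `binarize` and `remove_specks`, the fixed threshold of 128, and the `min_size` parameter are all assumptions made for this illustration.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return a binary image: 1 = ink (dark pixel), 0 = background.

    A fixed global threshold is an assumption made for brevity.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_specks(binary, min_size=4):
    """Erase 8-connected ink components smaller than min_size pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in binary]

    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected ink component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size limit are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


# Tiny grayscale sample: a 2x3 block of "ink" plus one isolated dark pixel.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  10, 255, 255,  40],
    [255,  10,  10, 255, 255, 255],
    [255,  10,  10, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
cleaned = remove_specks(binarize(gray), min_size=4)
for row in cleaned:
    print("".join("#" if px else "." for px in row))
```

In this sample the six-pixel block survives and the lone dark pixel at the right edge is erased. A real filter would have to choose the size limit carefully so that small glyph parts such as periods and the dots on "i" are not removed along with the noise.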