SBW Magazine & History Project

This page records tips for improving the scanned files. It is probably not of interest otherwise.

Removing speckles from scans

After having manually cleaned up speckles from too many scans, I eventually found a useful technique that can automatically remove many speckles, without affecting the text too badly.

It works on the principle that the speckles are invariably smaller than the text. First, we reduce the size of both until the speckles disappear. Then the “seed” that's left of the text is grown back to its original shape.

It uses the image processing software ImageMagick. The instructions are based on the information in this thread.

convert "e:/source.tif" -write MPR:source -morphology close rectangle:3x4 -clip-mask MPR:source -morphology erode:40 square +clip-mask  "e:/output.png"

The key items to play with are the size of the initial rectangle (3×4) for the close operation, and the number of iterations of erode (40). The rectangle effectively determines the largest speckle that will be removed. However, making it much larger also tends to result in portions of text being removed. The erode iterations build the image back up. For letters with thin branches (such as e or g), smaller numbers seem to leave chunks missing from the branches.
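To compare settings side by side, a small shell sketch like the one below might help. It only prints one convert command per combination and does not run anything itself; pipe its output to sh to actually execute the commands. The function name despeckle_cmd, the output file naming and the particular values tried are placeholders for illustration, not recommended settings.

```shell
#!/bin/sh
# Sketch of a parameter sweep for the despeckle command above.
# Assumption: the source path and output naming are placeholders.
src="e:/source.tif"

# Print the convert command for one rectangle size and one erode count.
# $1 = close rectangle (e.g. 3x4), $2 = erode iterations (e.g. 40)
despeckle_cmd() {
    printf 'convert "%s" -write MPR:source -morphology close rectangle:%s -clip-mask MPR:source -morphology erode:%s square +clip-mask "e:/output_%s_%s.png"\n' \
        "$src" "$1" "$2" "$1" "$2"
}

# Try a few rectangle sizes and erode counts around the values used above.
for rect in 2x3 3x4 4x5; do
    for iters in 20 40 60; do
        despeckle_cmd "$rect" "$iters"
    done
done
```

Each output file name carries the settings that produced it, so the results can be opened next to each other to see where speckles survive and where text starts to go missing.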

This algorithm could possibly be improved by using multiple iterations of the initial close step to reduce the speckles. I haven't tested this.