Join Rick Borstein as he presents the advantages and disadvantages of ClearScan over Searchable Image OCR in Acrobat 9. Optical Character Recognition (OCR) converts scanned paper documents into searchable PDF documents. This technology has been available in Acrobat for about ten years. Although OCR accuracy and language support have improved over that time, the default OCR "flavor" (Searchable Image) was the only useful choice.
Searchable Image retains the underlying scanned image and adds an invisible layer of text on top of it. This text layer can be selected and searched.
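To make that structure concrete, here is a minimal Python sketch of a Searchable Image page. The class and field names are hypothetical, and real PDFs store this differently (the OCR text is commonly drawn with PDF's invisible text rendering mode, mode 3). The point is only that the original scan stays untouched while search runs against a separate text layer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Page-space rectangle (x0, y0, x1, y1); the units are an assumption for illustration.
BBox = Tuple[float, float, float, float]


@dataclass
class TextSpan:
    """One recognized word and where it sits on the scanned image."""
    text: str
    bbox: BBox


@dataclass
class SearchableImagePage:
    """Hypothetical model: the scan is kept as-is, and the OCR text
    rides along as an invisible layer positioned over it."""
    image: bytes  # original scan bytes, never modified by OCR
    spans: List[TextSpan] = field(default_factory=list)  # invisible text layer

    def find(self, query: str) -> List[BBox]:
        """Return the boxes to highlight for a case-insensitive search."""
        q = query.lower()
        return [s.bbox for s in self.spans if q in s.text.lower()]


page = SearchableImagePage(
    image=b"<scanned pixels>",
    spans=[
        TextSpan("ClearScan", (72, 700, 140, 712)),
        TextSpan("OCR", (145, 700, 170, 712)),
    ],
)
print(page.find("ocr"))  # [(145, 700, 170, 712)]
```

Because the image itself is never touched, any blur or jagged edges in the scan remain exactly as they were; only the hidden text makes the page searchable.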
Searchable Image OCR has some shortcomings:
In Acrobat 9, Adobe engineers added a new flavor of OCR called ClearScan. ClearScan offers improved text quality along with smaller file sizes:
I recently completed some benchmarking that shows dramatic file size decreases and quality gains. Read on to learn about the size comparisons, how to use ClearScan OCR, and a bit more about how it all works.
I created two test documents:
I ran OCR and compared file sizes on my ThinkPad W500, a current-model laptop with an Intel Core 2 Duo CPU running at 2.8 GHz, 4GB of RAM, and a standard IBM 320GB laptop hard drive spinning at 7200 rpm. The test machine ran Windows Vista Enterprise in 32-bit mode, and Excel was running alongside Acrobat during the tests.
Note: Numbers are rounded.
At 300 dpi, ClearScan produced better-looking text at about one-third the total file size. At 600 dpi, the ClearScan file was about one-seventh the size and also looked better.
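Those ratios translate into space savings with simple arithmetic: a file one-third the size saves about two-thirds of the space, and a file one-seventh the size saves about 86 percent. This short Python check uses only the rounded ratios stated in the paragraph above, not any actual file sizes.

```python
from fractions import Fraction


def percent_reduction(size_ratio: Fraction) -> float:
    """Convert new_size / old_size into the percentage of space saved."""
    return float((1 - size_ratio) * 100)


print(round(percent_reduction(Fraction(1, 3)), 1))  # 300 dpi: 66.7
print(round(percent_reduction(Fraction(1, 7)), 1))  # 600 dpi: 85.7
```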
ClearScan works by turning the images that represent text characters on the page into smoothed vector outlines. Each character on the page is compared with the others, and all matching characters are replaced with the same outline character:
Figure: 300 dpi scan, 800% view in Acrobat
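The matching step can be pictured as grouping near-identical character images and drawing every member of a group with one shared glyph. Here is a minimal Python sketch of that idea, assuming tiny text bitmaps and a simple pixel-difference threshold (`max_diff` is a hypothetical parameter). Adobe has not described ClearScan's actual matching method here, and this sketch skips the step that turns each prototype into a smoothed vector outline.

```python
from typing import List, Tuple

# A character image as rows of pixels: '#' = ink, '.' = paper.
Bitmap = Tuple[str, ...]


def same_size(a: Bitmap, b: Bitmap) -> bool:
    """True if both bitmaps have the same number of rows and row widths."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def pixel_diff(a: Bitmap, b: Bitmap) -> int:
    """Count pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def cluster_glyphs(chars: List[Bitmap], max_diff: int = 1) -> Tuple[List[int], List[Bitmap]]:
    """Map every character image on a page to a shared prototype glyph.

    Returns (glyph_ids, prototypes): chars[i] is drawn with prototypes[glyph_ids[i]].
    The first bitmap in each group becomes its prototype (a simplification).
    """
    prototypes: List[Bitmap] = []
    glyph_ids: List[int] = []
    for bmp in chars:
        for gid, proto in enumerate(prototypes):
            if same_size(proto, bmp) and pixel_diff(proto, bmp) <= max_diff:
                glyph_ids.append(gid)  # close enough: reuse the existing glyph
                break
        else:
            prototypes.append(bmp)  # no match: start a new glyph
            glyph_ids.append(len(prototypes) - 1)
    return glyph_ids, prototypes


I_clean = (".#.", ".#.", ".#.")
I_noisy = (".#.", ".#.", "##.")  # same letter with one stray pixel
L_shape = ("#..", "#..", "###")
ids, protos = cluster_glyphs([I_clean, I_noisy, L_shape])
print(ids, len(protos))  # [0, 0, 1] 2
```

In this sketch both "I" images share glyph 0, so the stray pixel in the noisy copy never reaches the output, and each distinct shape is stored only once no matter how often it appears.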
ClearScan does not substitute fonts installed on your system. Instead, it creates a custom font that matches the visual appearance of the scanned pixels. In fact, if you run ClearScan OCR, choose File > Document Properties, and click the Fonts tab, you'll see the custom fonts it created:
Besides improving visual appearance, ClearScan also reduces print time: rather than sending large images to the printer, Acrobat can send compact font information.
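To see why page images are heavy to send, consider the raw size of a scan. The sketch below assumes a US Letter page scanned in black and white (1 bit per pixel) with no compression. Real scans are compressed, so actual files are far smaller, but the scaling holds: doubling the resolution quadruples the pixel count. A font glyph, by contrast, is stored once and then referenced by a short character code each time it appears.

```python
import math


def raster_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 1) -> int:
    """Uncompressed size of a scanned page image.

    Each row is padded to a whole byte, as PDF image data is when
    bits_per_pixel is less than 8.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


# US Letter page, black-and-white, no compression.
for dpi in (300, 600):
    print(dpi, raster_bytes(8.5, 11, dpi))
# 300 1052700
# 600 4210800
```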
ClearScan OCR is not the default in Acrobat 9, so you'll need to change a setting to use it. Here's how.
Here are a few answers to the most common questions about ClearScan OCR.
Figure: 300 dpi input file and ClearScan result at 800% view
Figure: 600 dpi input file and ClearScan result at 800% view