This tutorial shows you how to work with the Scan and Optimize features in Acrobat 9.
Optical Character Recognition (OCR) converts scanned paper documents into searchable PDF documents. This technology has been available in Acrobat for about ten years. While OCR accuracy and language support have improved over the years, the default OCR "flavor," Searchable Image, was the only useful choice.
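Searchable Image retains the underlying scanned image and adds an invisible, selectable layer of text on top of it.
Searchable Image OCR has some shortcomings: because the visible page is still the scanned image, text quality is limited by the scan, and the image data keeps file sizes large and makes printing slow.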
In Acrobat 9, Adobe engineers added a new flavor of OCR called ClearScan. ClearScan offers improved text quality and smaller file sizes.
I've recently completed some benchmarking that shows dramatic file size decreases and quality gains. Read on to learn about size comparisons, how to use ClearScan OCR, and a bit more about how it all works.
I created two test documents, one scanned at 300 dpi and one at 600 dpi.
I ran OCR and compared file sizes on a ThinkPad W500 laptop with an Intel Core 2 Duo CPU at 2.8 GHz, 4 GB of RAM, and a standard IBM 320 GB laptop hard drive running at 7200 rpm. The machine ran Windows Vista Enterprise in 32-bit mode, and Excel was running alongside Acrobat during the tests.
Note: Numbers are rounded.
At 300 dpi, ClearScan offered improved visual quality at about one-third the total file size. At 600 dpi, the ClearScan file was about one-seventh the size of the Searchable Image file and looked better.
ClearScan works by turning the images that represent text characters on the page into smoothed vector outlines. Each character on the page is compared with the others, and all matching characters are replaced with the same outline character.
Figure: Original 300 dpi scan vs. ClearScan output, viewed at 800% in Acrobat.
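To make the matching step concrete, here is a minimal Python sketch, assuming each character has already been cut out of the scan as a small black-and-white bitmap. It is not Adobe's algorithm: it only illustrates why storing one shape per distinct character, plus a list of references to those shapes, takes less space than storing every character's pixels. The names `build_custom_font` and `storage_comparison` are hypothetical, matching here is exact rather than approximate, and the vector smoothing step is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a character image is a tuple of rows,
# each row a string of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


def build_custom_font(page: List[Bitmap]) -> Tuple[Dict[Bitmap, int], List[int]]:
    """Assign one glyph ID per distinct character bitmap.

    Returns the glyph table (bitmap -> glyph ID) and the page rewritten as a
    sequence of glyph IDs. Exact matching is a simplification; a real OCR
    engine would also merge near-identical shapes.
    """
    glyphs: Dict[Bitmap, int] = {}
    encoded: List[int] = []
    for bitmap in page:
        if bitmap not in glyphs:
            glyphs[bitmap] = len(glyphs)  # first occurrence defines the glyph
        encoded.append(glyphs[bitmap])
    return glyphs, encoded


def pixel_count(bitmap: Bitmap) -> int:
    """Number of pixels needed to store one bitmap."""
    return sum(len(row) for row in bitmap)


def storage_comparison(page: List[Bitmap]) -> Tuple[int, int]:
    """Pixels stored once per occurrence vs. once per unique glyph.

    The small cost of the glyph-ID sequence itself is not counted.
    """
    glyphs, _ = build_custom_font(page)
    per_occurrence = sum(pixel_count(b) for b in page)
    per_glyph = sum(pixel_count(b) for b in glyphs)
    return per_occurrence, per_glyph


if __name__ == "__main__":
    # Two toy 3x3 "characters" standing in for scanned letters.
    a = ("010", "111", "101")
    b = ("110", "111", "110")
    page = [a, b, a, a, b]
    _, encoded = build_custom_font(page)
    print(encoded)                    # [0, 1, 0, 0, 1]
    print(storage_comparison(page))   # (45, 18)
```

The more often a character repeats on a page, the bigger the saving, which is consistent with the larger reduction the benchmark showed for the higher-resolution scan, where each character image is bigger.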
ClearScan does not substitute your system fonts for the scanned text. Rather, a custom font is created to match the visual appearance of the pixels. In fact, if you run ClearScan OCR, choose File > Document Properties, and click the Fonts tab, you'll see the custom fonts that were created.
Besides looking better, ClearScan files print faster: instead of sending large images to the printer, Acrobat can send the compact font information.
ClearScan OCR is not the default in Acrobat 9, so you'll need to change a setting to use it. Here's how.
Here are a few answers to the most common questions about ClearScan OCR.
Figure: ClearScan output viewed at 800%, from a 300 dpi input file and from a 600 dpi input file.
Products covered: Acrobat 9
Related topics: Scan and Optimize
Comments for this tutorial are now closed.
Lori Kassuba
2015-06-24
Hi abhishek,
Both Acrobat Standard and Pro support OCR and export. You can even run OCR as part of exporting a file to Word.
You can find more details on Acrobat Standard and Pro here:
https://acrobat.adobe.com/us/en/how-to/ocr-software-convert-pdf-to-text.html
Thanks,
Lori
abhishek
2015-06-15
Hi,
I want to know about the OCR functionality in Adobe Acrobat Pro.
How can we extract the text from a PDF, and what will the output format be?
I also want to know about the license cost and which versions of Adobe Acrobat have this OCR feature.
Kindly reply.
Lori Kassuba
2015-05-12
Hi Daniel Ford,
Try using the Native Scanner interface instead of Acrobat. Sometimes scanners don’t report the correct page size. You can find this in the Custom Scan dialog under the Options button. Also, you can only rotate in 90-degree increments in Acrobat, but you can use the Deskew tool to straighten images.
Thanks,
Lori
Daniel Ford
2015-05-11
When I scan a photo on a Fujitsu Fi-6770 with Adobe Acrobat Pro in custom mode with a paper size such as 4" x 6", only half of the image is captured (the other half is cut off). I tried other paper sizes with the same result until I selected C5, which captured the whole image but added an unwanted white block below it. How can I produce whole 4" x 6" scans?
Another question: can I rotate images in smaller increments, such as 1, 3, 6, or 11 degrees, instead of 90, 180, or 270?
Patty Friesen
2015-04-05
Hi John,
Can you please post your question in the Acrobat forum so our experts can help you interactively:
https://answers.acrobatusers.com/AskQuestion.aspx
Thanks,
Patty
Lori Kassuba
2015-04-02
Hi Ben,
Has the PDF been secured? You’ll see the word "secured" in the title bar, or a lock icon will appear in the navigation pane on the left. If so, you’ll need to get the password from the author before you can edit the file.
Thanks,
Lori
John Wojewoda
2015-04-02
I want my PDF to have searchable text, and I have succeeded; however, once the text was recognized, the PDF went crooked. How do I get it straight?
Ben
2015-03-26
I have received an interactive Acrobat PDF; however, I am unable to edit the document or fill in the requested information. I am currently running Acrobat XI Standard. What do I have to do to edit the interactive Acrobat PDF?
Lori Kassuba
2015-03-26
Hi Inderjit Singh,
Please see this tutorial on How to audit and optimize a PDF file using Acrobat XI Pro for details on how to reduce the filesize:
https://acrobatusers.com/tutorials/how-to-audit-and-optimize-a-pdf-file
Thanks,
Lori
Inderjit Singh
2015-03-20
Sir,
I want to reduce the size of a 60-page scanned text PDF. Please advise how to bring the 25 MB file down to less than 3 MB.
Lori Kassuba
2014-12-02
Hi Raul,
Please see this tutorial on how to edit a scanned document:
https://acrobatusers.com/tutorials/how-to-edit-a-scanned-pdf-file
Thanks,
Lori
Raul
2014-11-27
I want to replace a name in a scanned document using Adobe Acrobat Pro. How should I do it?
Lori Kassuba
2013-03-11
Hi Heather,
You’ll need to review the OCR suspects to fix the misidentified characters. Here is a tutorial that explains how to do this:
https://acrobatusers.com/tutorials/how-to-find-and-correct-ocr-errors
Thanks,
Lori
Heather
2013-03-05
Interesting. I should have known this before! It still works with Acrobat XI. My question: what happens with misidentified characters? The resulting text is sometimes different. Will this wrong result be displayed instead? Thanks.
Lori Kassuba
2013-02-19
Hi Steven,
I would recommend using the Optimize Scanned PDF command to modify your scan settings. Here is a tutorial on the subject:
https://acrobatusers.com/tutorials/how-do-i-optimize-a-scanned-pdf-document
Thanks,
Lori
Steven Hubert
2013-02-15
It seems that the reason it could take more than an hour to print an OCR’d document is the initial scan setting. As I understand it, if the scan quality is not very good, OCRing it will produce a PDF that takes a long time to print. To avoid the issue in the future, I thought of changing my scan settings (on a Fujitsu ScanSnap fi-5110EOXM) to "Excellent," which means slower scanning, about 10 pages per minute. But what about converting all of those slow PDFs into faster-printing ClearScan documents? Can I re-OCR a batch of Searchable Image files to turn them into ClearScan files?
Patty Friesen
2012-11-13
Hi Jim,
If you’re interested in how to edit text in a scanned PDF, check out:
https://acrobatusers.com/tutorials/how-do-i-edit-text-in-a-scanned-pdf
Hope this helps,
Patty
Jim Dunlap
2012-11-13
The link I clicked was "How to edit text in a scanned PDF," but this tutorial is great for recognizing text, not editing it.
Michael Brennan
2012-06-26
I was successfully redacting a number of scanned PDF files, but a couple of pop-up boxes appeared. I checked the boxes so they would not be shown again. Now when I search the scanned documents for SSN patterns, Acrobat says it is not finding any documents to scan.
I replied to this feed because the last thing I saw before successfully executing my redaction was the same “Recognize text” pop-up you show above.
How do I “uncheck” the boxes I asked to never see again and/or redact these documents now that I’ve checked the boxes?
Lori Kassuba
Can you post your question on our User-to-User security forum so that we can help you interactively?
http://forums.adobe.com/community/acrobat/security_&_digital_signatures
Thanks,
Lori