This tutorial shows you how to work with the Scan and Optimize features in Acrobat X.
In this tutorial, learn how to use the Recognize Text panel in Acrobat X to run OCR on a PDF file, making scanned text searchable and fixing any recognition errors.
Ian Campbell October 12, 2010
I'm Ian Campbell, and in this video we show you how to use the new Recognize Text panel in Acrobat X to make scanned text searchable in your PDF file and fix up any recognition errors.
By the way, the technical term for recognizing typed text in images is OCR - that's 'Optical Character Recognition'.
Here's a document which has been converted from a collection of TIFF files to a PDF.
Watch our related video: 'Converting Scanned Documents into PDF Files' to see how to convert image files.
At the moment, this document is not searchable - it contains just images of the scanned pages.
If I try to find a word, Acrobat gives me an error message.
However, if I open up Acrobat X's new tool area, straight away I can see the Recognize Text function, and clicking it reveals the panel options.
By the way, if for some reason the Recognize Text panel has been turned off, you can click here to reveal this or any other panel.
I'm going to choose the 'Recognize Text In This File' option and choose to OCR just the current page for speed.
As this is the first time I have used the Recognize Text feature in Acrobat, I shall click Edit to change some settings.
The main language used in this document is American English, so I'll choose 'English (US)'.
For PDF Output Style I usually like to choose 'ClearScan', as that creates a very compact yet searchable PDF file.
Here, however, my original was a legal document, so I want to ensure that the searchable PDF contains the scanned image from the original.
Choosing 'Searchable Image' as the Output Style retains an image of the page but adds a layer of searchable text beneath it.
However, image content may be downsampled to keep the file size manageable.
If this document were mission-critical, I could use the 'Searchable Image (Exact)' option, in which case the image is kept exactly as it was scanned to preserve total authenticity.
I'll now OK that choice, and say OK to start the OCR process.
The Searchable Image option includes some image cleanup, such as deskewing or straightening up the page, as well as making the page searchable.
Let's try to find a word - 'party' - the document is definitely searchable now, yet we still have an image of the scanned page to refer to.
One added benefit of using the 'Searchable Image' OCR option is that we can ask Acrobat X to identify any word conversions it is unsure of, and allow us to manually correct them.
I could choose to show all the suspect words at once, but for this demo we'll choose Find First Suspect as that allows us to move through the suspects one by one.
So, the first suspect it finds is the word 'long'.
If we click in the word on the page, we can see that actually Acrobat recognized and spelled the word correctly - so we can choose Accept and Find to move to the next suspect.
The next word it is not entirely confident about is 'relationship' - but in fact, again, Acrobat has correctly converted the scanned word.
On the other hand, we now see where Acrobat has made a mistake - with the word that we can just about make out to be 'providing'.
This part of the page was obviously damaged prior to scanning, so it is hardly surprising that Acrobat had a problem identifying the word.
It's very easy to correct though - just click on the word on the page and delete and insert the correct characters.
Finally, choose Accept and Find to move on, and so forth.
Acrobat really does make it easy to create accurate, fully searchable PDFs from your paper originals.
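For readers who think in code, the suspect review walked through above can be modeled roughly as follows. This is only an illustrative sketch and not how Acrobat works internally: the `Word` class, the confidence scores, the 0.8 threshold, and the misread text 'prov1ding' are all made-up assumptions. The idea is simply that each recognized word carries a confidence score, low-confidence words are flagged as suspects, and each suspect is either accepted as-is or corrected by hand.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def find_suspects(words, threshold=0.8):
    """Return the positions of words whose confidence is below the threshold."""
    return [i for i, w in enumerate(words) if w.confidence < threshold]


def review(words, corrections, threshold=0.8):
    """Visit suspects in order; apply a manual correction if one is given,
    otherwise accept the recognized text unchanged."""
    result = [w.text for w in words]
    for i in find_suspects(words, threshold):
        if i in corrections:
            result[i] = corrections[i]
    return result


# Made-up sample data loosely mirroring the demo: two suspects that were
# recognized correctly, and one damaged word that needs a manual fix.
page = [
    Word("a", 0.99),
    Word("long", 0.70),
    Word("relationship", 0.65),
    Word("prov1ding", 0.30),
    Word("party", 0.95),
]

suspects = find_suspects(page)
fixed = review(page, {3: "providing"})
```

Here 'long' and 'relationship' are flagged but left unchanged, which corresponds to choosing Accept and Find, while the damaged word is replaced with the text typed in by hand.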