Recognizing Text in Scanned PDF Documents

By Ian Campbell – October 12, 2010

 



This video shows how to use the new Recognize Text panel in Acrobat X to make scanned text searchable in your PDF file and to fix any recognition errors.


Hi there.

I'm Ian Campbell, and in this video we show you how to use the new Recognize Text panel in Acrobat X to make scanned text searchable in your PDF file and to fix up any recognition errors.

By the way, the technical term for recognizing typed text in images is OCR - that's 'Optical Character Recognition'.

Here's a document which has been converted from a collection of TIFF files to a PDF.

Watch our related video, 'Converting Scanned Documents into PDF Files', to see how to convert image files.

At the moment, this document is not searchable - it contains just images of the scanned pages.

If I try to find a word, Acrobat will give us an error message.

However, if I open up Acrobat X's new tool area, straight away I can see the Recognize Text function, and clicking it reveals the panel options.

By the way, if for some reason the Recognize Text panel has been turned off, you can click here to reveal this or any other panel.

I'm going to choose the 'Recognize Text In This File' option and choose to OCR just the current page for speed.

As this is the first time I have used the Recognize Text feature in Acrobat, I shall click Edit to change some settings.

The main language used in this document is American English, so I'll choose 'English (US)'.

For PDF Output Style I usually like to choose 'ClearScan', as that creates a very compact yet searchable PDF file.

However, here my original was a legal document, so I want to ensure that the searchable PDF contains the scanned image from the original.

Choosing 'Searchable Image' as the Output Style retains an image of the page but adds a layer of searchable text beneath it.

However, image content may be downsampled to keep the file size manageable.

If this document were mission critical, I could use the 'Searchable Image (Exact)' option, in which case the image is preserved exactly as it was scanned, for total authenticity.

I'll now OK that choice, and say OK to start the OCR process.

The Searchable Image option includes some image cleanup, such as deskewing or straightening up the page, as well as making the page searchable.

Let's try to find a word - 'party' - the document is definitely searchable now, yet we still have an image of the scanned page to refer to.
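Editor's note: the idea of a searchable image, a page picture with recognized text placed invisibly beneath it, can be sketched in a few lines of Python. This is only a conceptual model, not how Acrobat stores or searches PDFs; the names ScannedPage, add_text_layer and find are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScannedPage:
    """Hypothetical model of a page: a scanned image plus an optional hidden text layer."""
    image: bytes                                          # the picture of the page
    text_layer: List[str] = field(default_factory=list)   # empty until OCR has run


def add_text_layer(page: ScannedPage, recognized_words: List[str]) -> ScannedPage:
    """Return a copy that keeps the original image and gains searchable words."""
    return ScannedPage(image=page.image, text_layer=list(recognized_words))


def find(page: ScannedPage, word: str) -> bool:
    """Search looks only at the text layer, never at the image pixels."""
    return word.lower() in (w.lower() for w in page.text_layer)


scan = ScannedPage(image=b"scanned pixels")
print(find(scan, "party"))            # False: image only, nothing to search

ocr_page = add_text_layer(scan, ["the", "party", "agrees"])
print(find(ocr_page, "party"))        # True: the text layer is searchable
print(ocr_page.image == scan.image)   # True: the scanned image is still there
```

The point of the sketch is that the image and the text are separate layers: search needs the text, while the reader still sees the scan.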

One added benefit of using the 'Searchable Image' OCR option is that we can ask Acrobat X to identify any word conversions it is unsure of, and allow us to manually correct them.

I could choose to show all the suspect words at once, but for this demo we'll choose Find First Suspect as that allows us to move through the suspects one by one.

So, the first suspect it finds is the word 'long'.

If we click in the word on the page, we can see that actually Acrobat recognized and spelled the word correctly - so we can choose Accept and Find to move to the next suspect.

The next word it is not entirely confident about is 'relationship' - but in fact, again, Acrobat has correctly converted the scanned word.

On the other hand, we now see where Acrobat has made a mistake - with the word that we can just about make out to be 'providing'.

This part of the page was obviously damaged prior to scanning, so it's hardly surprising that Acrobat had a problem identifying the word.

It's very easy to correct though - just click on the word on the page, delete the wrong characters and insert the correct ones.

Finally, choose Accept and Find to move on, and so forth.

Acrobat really does make it easy to create accurate, fully searchable PDFs from your paper originals.
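Editor's note: the suspect review walked through in the transcript (step to each word the OCR engine is unsure of, then accept it or correct it) can be sketched as follows. This is a minimal illustration under stated assumptions: the confidence scores, the 0.8 threshold and the misread 'prcvidmg' are invented for the example, and Acrobat does not expose values like these through the panel shown in the video.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RecognizedWord:
    text: str          # what the OCR engine produced
    confidence: float  # 0.0 to 1.0; a hypothetical score, not an Acrobat value


def find_suspects(words: List[RecognizedWord], threshold: float = 0.8) -> List[int]:
    """Return the positions of words scoring below the threshold, in page order."""
    return [i for i, w in enumerate(words) if w.confidence < threshold]


def review(words: List[RecognizedWord], corrections: Dict[int, str],
           threshold: float = 0.8) -> List[str]:
    """Visit each suspect in turn: apply a correction if one is given, else accept as is."""
    result = [w.text for w in words]
    for i in find_suspects(words, threshold):
        if i in corrections:
            result[i] = corrections[i]
    return result


page_words = [
    RecognizedWord("long", 0.70),          # suspect, but spelled correctly
    RecognizedWord("term", 0.95),          # confident, never shown to the reviewer
    RecognizedWord("relationship", 0.75),  # suspect, but spelled correctly
    RecognizedWord("prcvidmg", 0.30),      # suspect from a damaged area, needs fixing
]

print(find_suspects(page_words))                # [0, 2, 3]
print(review(page_words, {3: "providing"}))     # ['long', 'term', 'relationship', 'providing']
```

As in the demo, most suspects turn out to be correct and are simply accepted; only the word from the damaged part of the page needs a manual fix.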



Products covered:

Acrobat X Pro, Acrobat X Standard, Acrobat X Suite

Related topics:

Scanning & OCR


2 comments

Lori Kassuba

June 20, 2013

Hi Jeff,

Try opening the Recognize Text panel by using the menu command under View > Tools > Document Processing in Acrobat (not Reader) X.

Thanks,
Lori

Jeff

June 16, 2013

I have Acrobat X but it doesn’t work as shown in your video. There is no OCR in mine. There isn’t anything to “just click on” to open it up either.

What gives?????
