This tutorial shows you how to work with the Scan and Optimize features in Acrobat X.

Recognizing text in scanned PDF documents with Acrobat X

Learn how to OCR a PDF by using the Recognize Text panel in Acrobat X to correct text in your PDF file.

By Ian Campbell – October 12, 2010

 



In this tutorial, learn how to use the Recognize Text panel in Acrobat X to OCR a PDF: make scanned text searchable in your PDF file and fix any recognition errors.

Hi there.

I'm Ian Campbell, and in this video we show you how to use the new Recognize Text panel in Acrobat X to make scanned text searchable in your PDF file, and also fix any recognition errors.

By the way, the technical term for recognizing typed text in images is OCR - that's 'Optical Character Recognition'. Here's a document which has been converted from a collection of TIFF files to a PDF.

Watch our related video, 'Converting Scanned Documents into PDF Files', to see how to convert image files. At the moment, this document is not searchable - it contains just images of the scanned pages.

If I try to find a word, Acrobat will give us an error message. However, if I open up Acrobat X's new tool area, straight away I can see the Recognize Text function, and clicking it reveals the panel options.

By the way, if for some reason the Recognize Text panel has been turned off, you can click here to reveal this or any other panel. I'm going to choose the 'Recognize Text In This File' option and choose to OCR just the current page for speed.

As this is the first time I have used the Recognize Text feature in Acrobat, I shall click Edit to change some settings.

The main language used in this document is American English, so I'll choose 'English (US)'.

For PDF Output style I usually like to choose 'ClearScan', as that creates a very compact yet searchable PDF file. However, here my original was a legal document, so I want to ensure that the searchable PDF contains the scanned image from the original.

Choosing 'Searchable Image' as the Output Style retains an image of the page but adds a layer of searchable text beneath it.

However, image content may be downsampled to keep the file size manageable.

If this document were mission critical, I could use the 'Searchable Image (Exact)' option, in which case the image is kept exactly as it was scanned to preserve total authenticity. I'll now OK that choice, and say OK to start the OCR process.
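The choice between the three Output Styles described above can be summarized as a small decision rule. The following is only an illustrative sketch in Python, not an Acrobat API: the function name `choose_output_style` and its two parameters are hypothetical, and only the three style names come from the settings dialog shown in the video.

```python
from enum import Enum


class OutputStyle(Enum):
    """The three PDF Output Style names mentioned in the transcript."""
    CLEARSCAN = "ClearScan"
    SEARCHABLE_IMAGE = "Searchable Image"
    SEARCHABLE_IMAGE_EXACT = "Searchable Image (Exact)"


def choose_output_style(keep_scanned_image: bool,
                        preserve_exact_image: bool = False) -> OutputStyle:
    """Hypothetical helper: pick a style from two yes/no requirements.

    - preserve_exact_image: the scan must stay exactly as scanned.
    - keep_scanned_image: the page image must remain visible, but
      cleanup and downsampling are acceptable.
    - neither: prefer the most compact searchable file.
    """
    if preserve_exact_image:
        return OutputStyle.SEARCHABLE_IMAGE_EXACT
    if keep_scanned_image:
        return OutputStyle.SEARCHABLE_IMAGE
    return OutputStyle.CLEARSCAN


# The legal document in the video: keep the image, exactness not required.
print(choose_output_style(keep_scanned_image=True).value)
```

The rule checks the strictest requirement first, so a document that must be preserved exactly is never routed to a style that may downsample or clean up the image.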

The Searchable Image option includes some image cleanup, such as deskewing or straightening up the page, as well as making the page searchable. Let's try to find a word - 'party' - the document is definitely searchable now, yet we still have an image of the scanned page to refer to.

One added benefit of using the 'Searchable Image' OCR option is that we can ask Acrobat X to identify any word conversions it is unsure of, and allow us to manually correct them.

I could choose to show all the suspect words at once, but for this demo we'll choose Find First Suspect as that allows us to move through the suspects one by one.

So, the first suspect it finds is the word 'long'.

If we click in the word on the page, we can see that actually Acrobat recognized and spelled the word correctly - so we can choose Accept and Find to move to the next suspect. The next word it is not entirely confident about is 'relationship' - but in fact, again, Acrobat has correctly converted the scanned word.

On the other hand, we now see where Acrobat has made a mistake - with the word that we can just about make out to be 'providing'.

This part of the page was obviously damaged prior to scanning, so it's hardly surprising that Acrobat would have a problem identifying the word.

It's very easy to correct though - just click on the word on the page and delete and insert the correct characters.

Finally, choose Accept and Find to move on, and so forth. Acrobat really does make it easy to create accurate, fully searchable PDFs from your paper originals.
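The suspect-review workflow above (find a low-confidence word, then either accept it or correct it) can be modeled in a few lines of Python. This is a conceptual sketch only and says nothing about how Acrobat works internally: the `RecognizedWord` type, the 0.0 to 1.0 confidence scores, the 0.8 threshold, and the misread text 'prov1ding' are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str          # what the OCR engine produced
    confidence: float  # hypothetical score from 0.0 (unsure) to 1.0 (sure)


def find_suspects(words, threshold=0.8):
    """Return the indexes of words whose confidence is below the threshold."""
    return [i for i, w in enumerate(words) if w.confidence < threshold]


def review(words, corrections, threshold=0.8):
    """Walk the words in order, as 'Accept and Find' does.

    A suspect with an entry in `corrections` (index -> corrected text) is
    replaced; every other word, suspect or not, is accepted unchanged.
    """
    result = []
    for i, w in enumerate(words):
        if w.confidence < threshold and i in corrections:
            result.append(corrections[i])
        else:
            result.append(w.text)
    return result


# Invented data loosely mirroring the demo: two correct suspects, one error.
words = [
    RecognizedWord("long", 0.70),
    RecognizedWord("relationship", 0.75),
    RecognizedWord("prov1ding", 0.40),  # made-up misreading of 'providing'
    RecognizedWord("party", 0.99),
]

suspects = find_suspects(words)
fixed = review(words, {2: "providing"})
print(suspects)
print(fixed)
```

Only words below the threshold are ever offered for correction, which is why confidently recognized words such as 'party' pass through without review.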



Products covered:

Acrobat X

Related topics:

Scan and Optimize


Comments for this tutorial are now closed.

SONIA MORENO

2016-02-22

I HAVE AN EPAD INK AND ADOBE DOESN’T RECOGNIZE IT.

Lori Kassuba

2015-10-15

Hi pipelliott,

Have you run OCR on your scanned image in Acrobat? Did you use the Clearscan option?

Thanks,
Lori

pipelliott

2015-10-08

how to I change a document I have scanned on to my pc but when I open the document it is in a inedible language in word or pdf

Lori Kassuba

2013-06-20

Hi Jeff,

Try opening the Recognize Text panel by using the menu command under View > Tools > Document Processing in Acrobat (not Reader) X.

Thanks,
Lori

Jeff

2013-06-16

I have Acrobat X but it doesn’t work as shown in your video. There is no OCR in mine. There isn’t any thing to “just click on” either to open up.

What gives?????