Basic OCR Question

MoreToast
Registered: Nov 26 2008
Posts: 5
Answered

I'm probably doing something silly, but I've been fighting this for a while, so any help would be great.

I'm using Acrobat Pro 9.

I have a PDF and I'm trying to create a text document from it. I did this in the past with Acrobat 7 (using the export text option), but I can't figure it out in 9.

I run it through the OCR (tried all three options), but after it processes nothing seems different: I still can't select any text in the PDF. I tried to save it as a Word doc, but it saves as a series of images. I tried to export as text, but that creates a blank txt file.

Thanks!

My Product Information:
Acrobat Pro 9.0, Windows
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi MoreToast,
The OCR output is typically layered on top of the page image.
The OCR characters are written in text rendering mode 3 (invisible), so you would not expect to see anything different: you are still looking at the image.
After running OCR on the PDF, try the Examine Document feature to view the invisible text that should now be present.
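
For anyone curious what "mode 3 (invisible) text" looks like inside the file: in a PDF page content stream the text rendering mode is set with the `Tr` operator, and `3 Tr` means the glyphs are neither filled nor stroked. Below is a minimal Python sketch that looks for that operator. It assumes you already have the content stream as decoded text (real streams are usually compressed), and the `sample` stream and function names are made up for illustration. A plain regex can also be fooled by a literal string that happens to contain "3 Tr", so treat it as a rough check rather than a parser.

```python
import re

# Matches a text rendering mode operator such as "3 Tr" in a decoded content stream.
# Valid modes are 0 through 7; mode 3 is "neither fill nor stroke" (invisible).
_TR_PATTERN = re.compile(r"(?<![\w.])([0-7])\s+Tr(?![A-Za-z])")


def rendering_modes(content_stream: str) -> list[int]:
    """Return every text rendering mode set via the Tr operator, in order."""
    return [int(m.group(1)) for m in _TR_PATTERN.finditer(content_stream)]


def has_invisible_text(content_stream: str) -> bool:
    """True if the stream switches to rendering mode 3 anywhere."""
    return 3 in rendering_modes(content_stream)


# Hypothetical stream: a full-page scanned image, then an invisible OCR text run.
sample = "q 612 0 0 792 0 0 cm /Im0 Do Q BT /F1 12 Tf 3 Tr 72 700 Td (Hello) Tj ET"
print(has_invisible_text(sample))
```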

Quote:
If you want to examine every PDF for hidden content before you close it or send it in email, specify that option in the Documents preferences using the Preferences dialog box.
Choose Document > Examine Document.
Hidden Text: This item indicates text in the PDF that is either transparent, covered up by other content, or the same color as the background.
To view hidden text, click Preview. Click the double-arrow buttons to navigate pages that contain hidden text, and select options to show hidden text, visible text, or both.
More details are available via the online help system at
http://help.adobe.com/en_US/Acrobat/9.0/3D/WS7E9FA147-10E3-4391-9CB6-6E44FBDA8856.w.php

If, after performing OCR and saving the file, Examine Document > Hidden Text > Show hidden text reveals no characters, then something may be amiss with the install of Acrobat (?).

fwiw -
Some expected behaviors for various scenarios when exporting to MS Word or saving as a text file:
A PDF containing the scanned image of text - not OCR'd
Export to Word - the image is exported to Word.
Save As to a text file - the *.txt file(s) are empty. Plain text files are not able to "hold" the image.

A PDF containing the scanned image of text - OCR'd
Export to Word - the OCR'd lines of content are exported to Word.
Word wraps these in form fields (which support positioning of text to give some semblance of layout).
Save As to a text file - the OCR characters form the content of the *.txt file.
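
If you end up exporting a lot of files, the "empty *.txt" symptom above is easy to check for in bulk. Here is a rough Python sketch, assuming the exported file can be read as UTF-8 (I have not verified what encoding Acrobat 9 writes, so undecodable bytes are replaced rather than treated as errors). The function names are just for illustration.

```python
from pathlib import Path


def looks_empty(text: str) -> bool:
    """True if the exported text contains nothing but whitespace."""
    return text.strip() == ""


def export_looks_empty(txt_path: str) -> bool:
    """Read an exported .txt file and report whether it has no usable text.

    An empty export suggests the PDF still holds only the scanned image,
    i.e. OCR was not applied or its result was not saved.
    """
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    return looks_empty(text)
```

Calling `export_looks_empty("scan.txt")` on a hypothetical export from a non-OCR'd scan should return `True`; once the OCR text layer is in place, the same check should return `False`.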

Be well...

MoreToast
Registered: Nov 26 2008
Posts: 5
That worked for me. Thanks for the help!
rbogie
Registered: Apr 28 2008
Posts: 432
For results possibly superior to the 'export' method, experiment with the following:
Approach #1) On an OCR'd PDF page, do a 'select all' and copy, then paste into a blank Word doc. You may need to reset margins and/or font size. You'll probably get a result that requires less work to get the Word page looking right.
Approach #2) Export the PDF to TIFF (File > Export > Image > TIFF) and open the TIFF in MS Office Document Imaging (assuming you have MS Office 2003 and up), then do a 'send text to Word'.