Adobe Acrobat

2011-01-13 08:52:15

melthebikerchick

Registered: Jan 13 2011

Posts: 1

Wondering if there is an easy, efficient way to identify searchable PDFs as opposed to image-only PDFs. The only way I know to check for this is to open the PDF in Adobe and try a search. If it says 'not found' and I'm looking at the word in the document itself, I know it's just an image. Surely there is a better way to determine this?

I ask because we want to register/create searchable PDFs only in our EDRM/ECM system (HP TRIM). Right now our employees don't know the difference and it's a pain to replace PDF images created by other line-of-business systems/applications or scanners.

To make it even more convoluted, I use Acrobat Pro 9 but our employees use a mixture of Acrobat Pro versions as well as Reader 9.

Thanks in advance -- any and all suggestions will be greatly appreciated! Melinda

My Product Information:
Acrobat Pro 9.0, Windows

2011-01-13 09:12:07

gkaiseril

Registered: Feb 23 2006

Posts: 4308

You can open the PDF and sum the words on each page. If the result is greater than 0, then you have searchable text within the PDF. This technique can be added to Acrobat or Reader as a menu item or toolbar button.

The Acrobat JS API Reference has an example for counting the number of words in a document. This code will only work on version 5 or above.
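
As a rough illustration of the word-count approach, here is a minimal sketch built on the `numPages` property and the `getPageNumWords()` method of the Acrobat Doc object. The function names `countDocWords` and `isSearchable` are made up for this example and are not part of the Acrobat API.

```javascript
// Sum the words Acrobat can extract from every page of a document.
// `doc` is assumed to be an Acrobat Doc object exposing `numPages`
// and `getPageNumWords(pageIndex)`; pages are zero-based.
function countDocWords(doc) {
    var total = 0;
    for (var p = 0; p < doc.numPages; p++) {
        total += doc.getPageNumWords(p);
    }
    return total;
}

// Hypothetical helper: treat a document with at least one
// extractable word as searchable, otherwise as image-only.
function isSearchable(doc) {
    return countDocWords(doc) > 0;
}
```

In the Acrobat JavaScript console, `this` refers to the open document, so evaluating `isSearchable(this)` there should report whether any words were found. A scan with a single stray word would still count as searchable, so a higher threshold may suit some workflows.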

Will your system accept a page image with hidden text behind the page image?

George Kaiser

2011-01-13 13:04:35

daka630

Registered: Mar 1 2007

Posts: 1420

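Any user of Adobe Reader or Acrobat (any version) can perform a quick litmus test.
Just use "Select All" (Ctrl+A on Windows). If text on the page is highlighted, the PDF contains selectable, and so searchable, text; if nothing is highlighted, it is most likely image-only.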
See page 1 of this PDF, a check list:

Prior to parking PDFs, they can be run through an Acrobat 9 Pro Preflight that checks for searchable text and hidden (OCR output) text. If neither is present in a PDF, then you have a candidate for OCR.

Be well...


Inquiry: How can one (easily) tell whether a PDF is searchable or image only?