
PDFs that are searchable

mmckone
Registered: May 21 2007
Posts: 3

I am planning a project to scan paper copies of a newsletter, and I want to create PDF files that are searchable. Are PDF Normal files searchable, or do I need to use the "searchable pdf" format? Also, if PDF Normal files are searchable, is this because OCR is performed?

My Product Information:
Acrobat Pro 7.0.8, Windows

dthanna
ExpertTeam
Registered: Sep 28 2005
Posts: 248
Please don't take this the wrong way, but the question, as posed, doesn't make much sense.

Let me give a lengthy explanation and hopefully that will clarify things a bit.

A PDF is a container. Like most containers, you can put a number of different things inside of it. The big ones are text and graphics. For this discussion, graphics come in two flavors: vector (line art) and bitmap (pictures). Go along with me on this...

Text, as you see it in a PDF, can come in three forms: glyphs of a font based on underlying text, outlines of text (vector graphics), and pictures of text (scanned images/bitmap graphics).

Only the first one (glyphs/fonts/text) is "natively" searchable in a PDF. No OCR needed here. You are searching on the underlying text. The text underneath the font.

Scanned document images must first be OCR'd before being searchable.

Outline text has no underlying text either, and it tends to be difficult to OCR and make searchable.

So, to directly answer your questions -
Are PDF Normal files searchable? Yes - so long as they contain underlying text or have been OCR'd if they are a scanned image.

Do I need to use the "searchable pdf" format? No - this is a search index that is placed inside the PDF to speed up searches of already identified text.

Also, if PDF normal files are searchable, is this because OCR is performed? Depends - If it is actual text, no OCR is needed. If it is a scanned image, you need to OCR it before you can search on it.
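
To make the distinction above concrete, here is a rough sketch of how one might guess whether a PDF carries underlying text at all. It is only a heuristic under stated assumptions, not how Acrobat decides anything: it assumes the content streams are either uncompressed or Flate-compressed, that the file is not encrypted, and that the presence of the text operators BT and Tj/TJ means there is text to search. The function names are made up for illustration. Note that a hit does not prove the text maps to sensible characters, only that text was drawn as text rather than as a picture.

```python
import re
import zlib

# Matches the bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# BT opens a text object; Tj and TJ paint a string or an array of strings.
BEGIN_TEXT_RE = re.compile(rb"\bBT\b")
SHOW_TEXT_RE = re.compile(rb"[)>]\s*Tj|\]\s*TJ")


def iter_stream_bodies(pdf_bytes):
    """Yield each stream body, inflated when it is Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data: plain content, or a filter this sketch ignores.
            yield raw


def has_text_operators(pdf_bytes):
    """Heuristic: True if any stream both opens a text object and shows text."""
    for body in iter_stream_bodies(pdf_bytes):
        if BEGIN_TEXT_RE.search(body) and SHOW_TEXT_RE.search(body):
            return True
    return False
```

You would feed it the raw bytes of a file, for example the result of reading a (hypothetical) newsletter.pdf in binary mode. A scanned page that has not been OCR'd should come back False, because its streams hold only image data. A scan that has been OCR'd should come back True, because the recognized text is stored as real text.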

Does this help?

Douglas Hanna is a member of the Production Print Technology team at Aon.
www.aonhewitt.com

aleluis
Registered: Oct 11 2008
Posts: 4
Hi dthanna, I have a question that may be related to this post. I need to find out whether a PDF is searchable or not. I'm developing an application that follows the IEEE specifications for documents that must be searchable. These specifications say that a PDF, in order to be searchable, must have all fonts embedded and use Built-in encoding, Ansi, MacRoman, etc.
I have a PDF that meets these requirements (checked on the Fonts tab of Document Properties), but parts of the text are not searchable, and some are not even selectable. When you copy some parts of the text into a new file, it shows weird characters instead of the selected words. Is there any detailed specification of whether a PDF is searchable or not?
Thanks.

khushi.sshaikh
Registered: Nov 21 2008
Posts: 1
Hi,

One of my clients has thousands of individual pieces of paper, mostly 8.5 x 11, in hundreds of files. Some are two-sided. They want to scan all of them and digitize them. All can be scanned in B&W. I want to know whether your product can index these scanned files. Will all those scanned files be searchable from a Windows OS? How would it handle pictures in them? How does it handle the contents?
I am confused about which of your products best suits this requirement. Can you please guide me? Can you please send me any case studies or client histories with the same requirements?
What other advantages will I get from buying your product?


Regards
Khushi