I am planning a project to scan paper copies of a newsletter. I want to create PDF files that are searchable. Are PDF Normal files searchable, or do I need to use the "searchable pdf" format? Also, if PDF Normal files are searchable, is this because OCR is performed?
Let me give a lengthy explanation and hopefully that will clarify things a bit.
A PDF is a container. Like most containers, it can hold a number of different things. The biggies are text and graphics. For this discussion, graphics come in two flavors: vector (that would be line art) and bitmap (pictures). Go along with me on this...
Text, as you see it in a PDF, can come in three forms: glyphs of a font based on underlying text, outlines of text (vector graphics), and pictures of text (scanned images/bitmap graphics).
Only the first one (glyphs/fonts/text) is "natively" searchable in a PDF. No OCR needed here. You are searching on the underlying text. The text underneath the font.
Scanned document images must first be OCR'd before being searchable.
Outline text is not searchable as it stands, and it tends to be difficult to OCR.
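The cases described above can be written down as a small decision rule. This is only an illustrative sketch in Python: the names `TextForm` and `is_searchable` are made up for this post and are not part of any PDF library, and `ocr_done` is assumed to mean that OCR actually succeeded in recognizing the text.

```python
from enum import Enum


class TextForm(Enum):
    """Ways text can appear on a PDF page (hypothetical labels)."""
    GLYPHS = "font glyphs backed by underlying text"
    OUTLINES = "vector outlines of letter shapes"
    SCANNED_IMAGE = "bitmap picture of a page"


def is_searchable(form: TextForm, ocr_done: bool = False) -> bool:
    """Return True if a search can find words stored in this form."""
    if form is TextForm.GLYPHS:
        # The underlying text is already in the file; no OCR needed.
        return True
    # Outlines and scanned images are only shapes or pixels until
    # OCR has added recognized text (assumes the OCR pass succeeded).
    return ocr_done
```

For example, `is_searchable(TextForm.SCANNED_IMAGE)` is false for a raw scan, while `is_searchable(TextForm.SCANNED_IMAGE, ocr_done=True)` is true.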
So, to directly answer your questions -
Are PDF Normal files searchable? Yes - so long as they contain underlying text or, if they are scanned images, have been OCR'd.
Do I need to use the "searchable pdf" format? No - this is a search index that is placed inside the PDF to speed up searches of already identified text.
Also, if PDF Normal files are searchable, is this because OCR is performed? Depends - if it is actual text, no OCR is needed. If it is a scanned image, you need to OCR it before you can search it.
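If you want to check which pages of an existing file still need OCR, one rough approach is to try extracting the underlying text from each page and flag the pages that come back empty. The sketch below assumes the third-party pypdf package is installed; the helper names, the `min_chars` threshold, and the file name in the usage note are hypothetical. A page of outline text or an un-OCR'd scan will normally yield no extractable text.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return 1-based page numbers whose extracted text is (nearly) empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the underlying text of each page using pypdf."""
    # Imported here so the helper above works without pypdf installed.
    from pypdf import PdfReader  # assumes: pip install pypdf

    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in reader.pages]
```

Usage would look something like `pages_needing_ocr(extract_page_texts("newsletter.pdf"))`. An empty list suggests every page already has underlying text. Treat this as a heuristic only, since a page can carry a few stray characters and still be mostly a scanned image.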
Does this help?
Douglas Hanna is a member of the Production Print Technology team at Aon.
www.aonhewitt.com