Adobe Acrobat

2010-03-22 09:46:04

kenwiens

Registered: Jan 5 2010

Posts: 15

I have a number of scanned and OCR'd PDF documents. Due to an unusual requirement, I need to ensure that a particular name is not included in the index. For example - let's assume the name is "Smith". I want the entire document to be indexed and searchable (including the PDF search - but primarily Google search) - but a search for the word "Smith" must be unsuccessful. Is there any way to accomplish this (short of modifying every occurrence in the original documents)?

My Product Information:
Acrobat Pro Extended 9.3.1, Windows

2010-03-22 10:05:09

daka630

Registered: Mar 1 2007

Posts: 1420

Hi,

To exclude a word from a Catalog index go into Acrobat Preferences prior to building the Catalog index.
For the category "Catalog" - click the "Stop Words..." button.
In the Stop Words dialog, enter the Word(s). "Add" each.
This builds a list of words to exclude from the index.
However, any user of Adobe Reader or Acrobat may perform a "Find" that would show the presence of the word(s).
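
To illustrate the distinction above, here is a minimal Python sketch of how a stop-word list affects an index lookup but not a plain "Find". This is not Acrobat's Catalog implementation; the function names (`build_index`, `index_search`, `find_in_text`) and the sample page text are hypothetical and exist only for illustration.

```python
import re

def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages, stop_words=()):
    # Map each word to the set of page numbers it appears on,
    # skipping any word in the stop-word list.
    stop = {w.lower() for w in stop_words}
    index = {}
    for page_no, text in enumerate(pages, start=1):
        for word in tokenize(text):
            if word in stop:
                continue
            index.setdefault(word, set()).add(page_no)
    return index

def index_search(index, word):
    # Look the word up in the prebuilt index only.
    return sorted(index.get(word.lower(), ()))

def find_in_text(pages, word):
    # Scan the page text directly, the way a plain "Find" would.
    needle = word.lower()
    return [n for n, text in enumerate(pages, start=1) if needle in text.lower()]

# Hypothetical sample content standing in for two OCR'd pages.
pages = [
    "Annual meeting minutes. Smith presented the budget.",
    "Picnic photos and the treasurer's report.",
]
index = build_index(pages, stop_words=["Smith"])

print(index_search(index, "Smith"))   # the index has no entry for the stop word
print(index_search(index, "budget"))  # other words are still indexed
print(find_in_text(pages, "Smith"))   # a direct scan of the text still finds it
```

The stop word is missing from the index, but it is still in the page text, so anything that reads the text directly will find it.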

As to Google search of a PDF.
If the PDF can be cataloged by Google, then all content is cataloged.

To ensure that a word or text string is not available, the Redaction tool may be called for.
The redaction tool, used properly, will fully remove the word or text string selected.
Thus, nothing for a search engine to harvest.

Be well...


2010-03-22 13:04:34

kenwiens

Registered: Jan 5 2010

Posts: 15

Many thanks for the suggestion and the explanation.

I have been reading over the redaction documentation in my manuals and may have to use this. Unfortunately, that will alter the original documents. The problem here is that this is a collection of 50 years of newsletters from a society, where one person needs to not be found. We want to distribute the newsletters to all the members (as part of the 50-year celebration), and realize that eventually someone will likely put these on the internet. Once Google has indexed them, this person's name would show up in a search. However, if I use redaction to remove the name, then I am altering the original document. I think what I need is a way to OCR the document, but not OCR that particular name. Is there any way to do this? Is there an easy way to replace the word (e.g., "Smith") with, say, an image of a handwritten "Smith" so that the OCR engine couldn't interpret it?

Thanks

2010-03-22 14:14:48

rbogie

Registered: Apr 28 2008

Posts: 432

There is no easy way to remove the invisible OCR'd text layer "smith" while leaving the associated image of the word "smith". Given that you want to remove the searchable text "smith" but retain the image, you can't use the Redaction tool, because it will remove the text and burn a mask over the image. What you want to do can only be done one "smith" at a time. Here's how: first, search the document for all instances of "smith". Next, display the Content panel (navigation pane [F4]). For every page where a hit occurs, expand that page's content tree (click '+'), then expand the text container (the OCR results), locate the string "smith", and press Delete. Repeat for all instances of "smith".
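
Because the steps above are done one page at a time, a checklist of which pages still contain the name can help, both before starting and to re-check afterwards. The following is a rough Python sketch that assumes the text of each page has already been extracted to a string by some other tool; `pages_with_hits` and the sample strings are hypothetical. It matches whole words only and ignores case, so OCR misreadings of the name would not be caught.

```python
import re

def pages_with_hits(page_texts, name):
    # Return {page_number: hit_count} for pages containing the name
    # as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    hits = {}
    for page_no, text in enumerate(page_texts, start=1):
        count = len(pattern.findall(text))
        if count:
            hits[page_no] = count
    return hits

# Hypothetical extracted text for three pages.
page_texts = [
    "Mr. Smith chaired the meeting.",
    "No mention on this page.",
    "smith spoke with Smithson; SMITH then left.",
]

for page_no, count in pages_with_hits(page_texts, "smith").items():
    print(f"page {page_no}: {count} instance(s) to delete")
```

An empty result after editing suggests that no exact matches remain in the extracted text.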


How can I create index exceptions?