
Masked text from InDesign coming up in Acrobat search

scotsman2
Registered: Aug 2 2010
Posts: 16

I have Illustrator documents placed in InDesign with areas masked off. When I create a PDF and perform a search, hidden text behind the masks shows up in the results. How can I prevent this?

Thanks!

Andrew

My Product Information:
Acrobat Pro 9.3.1, Macintosh
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4307
"Masking", like masking tape, is not the same as "redaction". You can use the Redaction tool in Acrobat Pro to find and remove that unwanted data along with other hidden data.

A "Highlight" or a form field placed over the text is not a way to redact text from a PDF either.

George Kaiser

scotsman2
Registered: Aug 2 2010
Posts: 16
Thanks for the reply. I investigated the redaction tool, but unless I have misunderstood it I don't think it will help me.

I want only the text I can see in my document to be searchable, not invisible text. And there is a lot of invisible text in my 100-page doc.

This also goes for selections. I still seem to be able to select text that is not visible, which makes selecting things on the top, visible InDesign layer impossible.

Thanks!
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4307
But if you want that masked text not to be searchable, you have to remove it. Leaving it in the PDF, even if masked, makes it available to the PDF's search tool.

OCR can produce an image-only PDF of a text page, an image PDF with hidden text underneath (the text used for searching and text-to-speech), or a text PDF. Hidden just means not shown; it does not mean not searchable. This is by design, to provide the greatest usability of the PDF format, but it also means the PDF format will not meet 100% of users' needs.
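To illustrate why "hidden" is not the same as "removed", here is a minimal Python sketch. It assumes a tiny, uncompressed, hand-written page content stream (real InDesign exports are normally compressed and far more complex) and assumes the Illustrator mask comes through as a clipping path. `CONTENT_STREAM` and `extract_shown_strings` are hypothetical names for illustration only; they are not part of Acrobat or any library.

```python
import re

# Hypothetical, uncompressed page content stream (illustration only).
# "re W n" sets a clipping rectangle, roughly what a clipping mask becomes.
# The second text object is drawn outside that rectangle, so it is never
# painted, yet its text operator is still in the file. The last text object
# uses text render mode 3 ("3 Tr"), i.e. invisible text as used by OCR layers.
CONTENT_STREAM = b"""
q
100 100 200 50 re W n
BT /F1 12 Tf 110 120 Td (VISIBLE CAPTION) Tj ET
BT /F1 12 Tf 110 400 Td (HIDDEN BEHIND MASK) Tj ET
Q
BT 3 Tr /F1 12 Tf 72 72 Td (INVISIBLE OCR LAYER) Tj ET
"""


def extract_shown_strings(stream: bytes) -> list[str]:
    """Naively collect literal strings passed to the Tj operator.

    Clipping paths, render modes and colours are ignored entirely, which is
    why text that is not visible on the page is still found. This sketch does
    not handle escaped parentheses, hex strings, TJ arrays or compression.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]


for text in extract_shown_strings(CONTENT_STREAM):
    print(text)
```

Running it prints all three strings, including the clipped text and the render-mode-3 text, because the extractor reads the text operators without asking whether anything is actually painted. Removing the text operators themselves (redaction, or converting text to outlines) is what takes the text out of reach.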

George Kaiser

scotsman2
Registered: Aug 2 2010
Posts: 16
Thanks again for the advice. I do want the text removed. I was hoping there was an easy way to do this during PDF creation, or at least an option such as "Do not index hidden text".

It just seems to create a big mess.
scotsman2
Registered: Aug 2 2010
Posts: 16
Also I am not totally sure why allowing text that you cannot see on the page to be searchable and selectable is even a feature in Acrobat. I would imagine that 99% of users would NOT want this.
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4307
Because OCR programs and devices create page images with hidden text underneath the image. This is for use by the Text to Speech accessibility feature for the visually impaired, or for page images that cannot be displayed properly as text, where the hidden text is provided for searching. There is even document-management text placed in PDFs with a white text color so that only "those in the know" can access this data.

It is a good thing you do not know how much information is hidden in MS Word files that you, or Number 10 Downing Street, do not want disclosed but constantly disclose to the public.

George Kaiser

scotsman2
Registered: Aug 2 2010
Posts: 16
I see. Very interesting. These still look like marginal use-cases though, punishing the majority to accommodate the minority. It would be useful to be able to switch this off.
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4307
Not only is this an OCR issue: in the U.S., the federal courts require documents in PDF format. Those documents are public but could contain very sensitive data, such as details about individuals and unfounded accusations against them, confidential informants, or national security items that cannot be released. In the paper world these items were cut out or covered so the data could not be recovered. Allowing this data to be resurrected electronically is unacceptable.

This ability to remove sensitive data also matters to intelligence agencies, newspapers, schools, etc.

As for making it optional, there are tools that can display any content within a PDF. So if you leave any trace of an item you do not want to disclose, it can be recovered; you have to remove that data completely.

This is the way the application and format works. So if it does not meet your needs, find another product.

George Kaiser

scotsman2
Registered: Aug 2 2010
Posts: 16
"So if it does not meet your needs find another product."

One that replaces the PDF format, which has become an industry standard? I don't think so. There is no other way I could communicate with my clients.

"This is the way the application and format works." I don't buy that. Applications change, improve, and evolve to match users' needs. And there is generally a fix somewhere.

Looking again at my issue, if I embedded the Illustrator files as bitmaps rather than vector files, it would solve my problem. However, there would be a trade-off in image quality and a larger file size.

I think I lost you a bit when you started talking about national security. I don't think this really relates to my simple problem. I was just talking about 'Switching off the indexing/searching of non-visible text' post-pdf creation, or an option when creating the pdf to discard any non-visible items.
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
If you're talking specifically about Illustrator files (AI, EPS) with text in them, simply convert the text to vector outlines in Illustrator. That way the text keeps its full quality, but Acrobat won't find any "real" text to index.
scotsman2
Registered: Aug 2 2010
Posts: 16
An excellent idea. Thank you!