
PDFs and SE spiders

john rothwell
Registered: May 19 2009
Posts: 3
Answered

Hi, would someone be able to tell me whether it is possible to prevent search engine (SE) spiders from accessing and indexing PDF documents that we load onto our website as optional documents to read or download?

Is there an option in the PDF settings/properties that can prevent the spider access?

Many Thanks,
John R.

My Product Information:
Acrobat Pro 9.1.1, Windows
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
If the PDF can be opened without a password and it contains real text (i.e. not just a bitmap that looks like text), then Google et al. will try to index it. You can persuade them not to by listing the file or folder in a robots.txt file on the server (which Google obeys), but they may still scan the file, as Google inspects everything it can to decide whether a website should get its "this site may harm your computer" label.
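As a rough illustration of the robots.txt approach, here is a minimal Python sketch that checks whether a given set of rules would ask a crawler to stay away from a PDF. The folder name /docs/pdf/, the example.com URLs and the helper name is_crawl_allowed are all made up for the example; only the standard library's urllib.robotparser is used. Note that this parser matches simple path prefixes, so the sketch blocks a whole folder rather than a wildcard pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /docs/pdf/ folder is an assumption
# for this example; substitute the folder that actually holds your PDFs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/pdf/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/docs/pdf/brochure.pdf",
        "https://www.example.com/index.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

This only tells you what a well-behaved crawler is being asked to do; it cannot stop a robot that ignores robots.txt.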
john rothwell
Registered: May 19 2009
Posts: 3
Thank you for your prompt and clear reply, very much appreciated.

I suppose one thing I could do is turn the PDF into a picture (JPG), so that it can be read by a human but not by the SE spiders?

Kind Regards,
John R.
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
Yes, that would work. Although many spam robots can read text via OCR, Googlebot doesn't. If you have a long text document and don't want the huge file sizes that come with raster pages, there are other options, such as beating Acrobat until it uses a non-standard font encoding, but they tend to be a little obscure and time-consuming to implement. An entry in robots.txt will keep the big search engines from listing the file in their results, and that's what 99% of people care about. It also means you can keep the document accessible (as you're in the accessibility forum, I'd better mention it!).