Adobe Acrobat

2010-04-22 10:10:31

jwheaton

Registered: Apr 22 2010

Posts: 6

Hi,

I have many files which are PDF Normal - they have a text layer. Most of the time, this text layer does not include all of the text on the page, only a logo etc. I need to OCR these files, but there is a problem when the file is PDF Normal.

I would like to only save the image with no text layer, as PDF, so that I can batch OCR these files.

I have tried the Examine Document function in Acrobat Pro 9 and removed everything, which I confirmed by re-opening the file - but the text is still selectable, and the OCR process fails because the file is still "PDF Normal".

Any ideas would be great.

Thanks

(PS - I have access to Acrobat Pro 7.0.5 and 9.3.1)

My Product Information:
Acrobat Pro 9.3.1, Windows

2010-04-22 21:10:36

daka630

Registered: Mar 1 2007

Posts: 1420

Hi,

A PDF 'Normal' (older terminology) is a PDF whose page content came from an authoring application's file that was converted to PDF.
The text is not a 'layer'; rather, it is an inherent part of the PDF page content.
As such, Examine Document cannot remove it.
Nor can OCR process a PDF page containing such renderable text.

With Acrobat 8 or 9 Pro you could use the redaction tool to completely remove desired PDF page content.

You could print the PDF through the Adobe PDF printer and, in the Print dialog, use the Advanced button to open a dialog in which you can select Print as Image and choose a desired resolution.
The text content of the original PDF will then be present only as part of the image held in the new PDF.
You could now OCR that text.

But why? You already have renderable, searchable text.

Now, if all you want are the logos, then redaction of text could give you that.
However, logos are typically graphic objects, and graphic objects usually give OCR nothing it can turn into character output.

Be well...

2010-04-23 10:00:19

jwheaton

Registered: Apr 22 2010

Posts: 6

Well, the problem is that these are engineering drawings, for example. They use our template, they bought something from a vendor, and they have pasted an image of it into the middle of our template. So the stuff they typed into our template is good - searchable. But the more important info is in the pasted image and needs to be OCR'd. That's the best example I can think of.

Most of these are 11x17 or occasionally letter, but almost always, they are landscape orientation. I've tried printing to Adobe PDF with Print as Image chosen, and it works, but it rotates the page to portrait orientation. Our OCR process can already do that - this is what we are trying to get around. I need to find a way to turn the files into PDF while preserving the original orientation (e.g. 11x17 landscape, OR a document with 500 pages with some letter/portrait and some 11x17/landscape).
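
To make the orientation requirement concrete, here is a minimal sketch of the arithmetic any rasterizing step would need to get right. It is not an Acrobat feature or API; the function names are made up for illustration. It assumes page sizes are given in PDF points (1/72 inch) and that 300 dpi is the target resolution. The point is that width and height are scaled independently and never swapped, so a landscape page stays landscape.

```python
POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch


def raster_size(width_pt, height_pt, dpi=300):
    """Pixel size of a page image at the given dpi, orientation unchanged.

    Hypothetical helper: width maps to width and height to height,
    so nothing is rotated.
    """
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def orientation(width_pt, height_pt):
    """Label a page as landscape or portrait from its dimensions."""
    return "landscape" if width_pt > height_pt else "portrait"


# A mixed document like the one described: letter/portrait and 11x17/landscape.
pages = [(612, 792), (1224, 792)]
for w, h in pages:
    print(orientation(w, h), raster_size(w, h))
```

For a mixed document, each page would be sized on its own this way rather than forcing one page size for the whole file.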

2010-04-26 02:04:09

TonyPotter

Registered: Feb 1 2010

Posts: 85

A good way is to convert the PDF to an image using a converter, or to save the PDF as an image using your Acrobat. You can easily make that work well.

I will try my best to help you in PDF conversion fields, objectively and neutrally.

2010-04-26 08:39:08

jwheaton

Registered: Apr 22 2010

Posts: 6

I'm assuming by "save PDF as image" you mean as a TIFF or similar?

I'd like to save it as a PDF.


Convert or change PDF Normal into PDF (image)