Adobe Acrobat

2009-10-01 07:48:38

bryan_unger

Registered: Sep 17 2009

Posts: 14

Answered

I accidentally scanned a large 8 1/2 x 11 document upside down. When I tried to save it as a Searchable PDF, the OCR text layer was rotated 180 degrees relative to the scanned image. This occurred whether I rotated the original scanned PDF 180 degrees before or after saving it as a Searchable PDF.

The PDF software I'm using to make the Searchable PDF is Nuance PDF Converter Professional 5.1, but this seems like such a good forum that I thought I would post here. Solutions using either Acrobat 5 or 7 are also welcome.

Is there any way to rotate the OCR layer 180 degrees so that it is aligned with the scanned image layer? A batch process using JavaScript, perhaps? Please provide the code and instructions on how to run it, as I'm not experienced with programming.

Thanks in advance for any assistance rendered.
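
For anyone unsure what "rotated 180 degrees" means for the text layer, here is a minimal Python sketch of the geometry only. It is not an Acrobat or Nuance script and does not open or modify a PDF. The page size (612 x 792 points, i.e. 8 1/2 x 11 inches at 72 points per inch) and the function name `rotate_180` are assumptions made for illustration.

```python
# Illustration only: geometry of a 180-degree rotation about the page center.
# Assumes a letter-size page measured in points (72 points per inch).
PAGE_WIDTH_PT = 8.5 * 72    # 612 points
PAGE_HEIGHT_PT = 11.0 * 72  # 792 points


def rotate_180(x, y, width=PAGE_WIDTH_PT, height=PAGE_HEIGHT_PT):
    """Return where the point (x, y) lands after the page is turned 180 degrees."""
    return (width - x, height - y)


# A word placed one inch in from the bottom-left corner ends up
# one inch in from the top-right corner.
print(rotate_180(72, 72))
```

Applying the same mapping twice returns every point to where it started, which is why a text layer that is off by 180 degrees lines up again once it is turned by another 180 degrees.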

2009-10-01 18:03:00

rbogie

Registered: Apr 28 2008

Posts: 432

run 'examine document' tool (on document menu in AA8). remove hidden text. reapply OCR. report back to forum.

2009-10-02 05:05:18

bryan_unger

Registered: Sep 17 2009

Posts: 14

I don't have access to AA8, only 5 and 7, as I mentioned in the original post. Thanks anyway. Any other suggestions?

2009-10-02 11:33:15

rbogie

Registered: Apr 28 2008

Posts: 432

i think 'examine document' is supported in AA7. search in 'help' for 'examine'. read under heading 'examine a pdf for hidden content', then follow previous suggestion.

2009-10-06 20:31:26

rbogie

Registered: Apr 28 2008

Posts: 432

beyond my previous suggestions, there is another solution to the 'rotated' issue -- if you are still interested in knowing the range of possible ways to address this issue.

2009-10-08 06:21:46

bryan_unger

Registered: Sep 17 2009

Posts: 14

Much obliged, rbogie. I finally had an opportunity to follow up on your Acrobat 7 suggestion. I was unable to locate the information in the help file. Searching for "examine" returned no results. I tried some other search terms but was unable to find anything similar to your description.

I searched the acrobatusers.com site again and found this:

http://www.acrobatusers.com/tutorials/cleaning-your-pdf-documents

which mentioned that the examine feature in Acrobat 9 is essentially a component of the PDF Optimization process. PDF Optimization is found on the AA7 Advanced menu, so I gave it a shot. There didn't appear to be any settings relevant to the issue except under Discard Objects, which had a "Discard hidden layer content and flatten visible layers" option. I reviewed the help file on this option and didn't think it would help, and when I tried it, this was confirmed. The OCR text remained.

Out of curiosity I pulled up the Content navigation tab. Here, the text objects on each page were visible, along with another object called "annotations". Based on some additional review of the forums, this text can be removed using the navigation tab or through the TouchUp Text tool. Obviously, for a large document this would be quite a tedious process. I also read that redaction is an option, but AA7Pro doesn't have this capability.

Back to searching the forums...

And I stumbled on one of your replies to another poster...

http://www.acrobatusers.com/forums/aucbb/viewtopic.php?id=16461

...which recommended printing to image. I had tried this earlier with the Nuance software and AA5, and it was painfully slow. Having nothing to lose I tried it on AA7Pro and this worked. Anyway, to make a short story long, I've written up the final solution in a separate post and accepted it as an answer. However, there are still some loose ends. Feel free to provide additional suggestions or references to other forum posts.

2009-10-08 06:28:45

bryan_unger

Registered: Sep 17 2009

Posts: 14

1) Print document to image to remove the OCR text. (Advanced button on the print dialog)
2) Run the Document / Recognize Text as OCR command. I generated a Searchable (Compact) PDF but the other two options may work as well.

Caveats
1) Printing to image broke up the image object on each page into about 25 separate objects. This was observed on the Content navigation panel. This seems to be corrected when running the Recognize Text as OCR command; a single image object is shown in the Content navigation panel.
2) The document I was working with had a lot of Container objects. Most of the words were correct, and were searchable, but when I tried to run the Find OCR Suspects command, none of these objects were located, so I'm not sure how to correct them. Suggestions welcome.
3) The file size ballooned after each step: first from 30 MB to 80 MB after printing to image, then from 80 MB to 150 MB after running the OCR. It probably just needs optimization to reduce the file size.
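
To put the third caveat in proportion, here is a small Python sketch that works out the growth at each step from the sizes reported above. The step labels and the helper name `growth_factors` are illustrative assumptions; only the 30, 80, and 150 MB figures come from the post.

```python
# Sizes reported in the post, in megabytes.
sizes_mb = [
    ("original", 30),
    ("after print to image", 80),
    ("after OCR", 150),
]


def growth_factors(steps):
    """Return (label, factor) pairs comparing each step to the one before it."""
    return [
        (label, size / prev_size)
        for (_, prev_size), (label, size) in zip(steps, steps[1:])
    ]


for label, factor in growth_factors(sizes_mb):
    print(f"{label}: {factor:.2f}x the previous size")

overall = sizes_mb[-1][1] / sizes_mb[0][1]
print(f"overall: {overall:.1f}x the original")
```

Most of the growth comes from the print-to-image step, so that is probably where optimization (for example, image downsampling or compression settings) would recover the most space.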

Much thanks to forum member rbogie for his assistance in developing this solution.

2009-10-08 13:09:25

rbogie

Registered: Apr 28 2008

Posts: 432

another method: import the troublesome PDF to MS Office Document Imaging (import and save as TIFF, not MDI). (MODI comes with MS Office.) next, create a new PDF from the TIFF file. then reapply OCR.

2009-10-08 14:27:01

daka630

Registered: Mar 1 2007

Posts: 1420

Hi,
rbogie's last post may be your best course of action as you have Acrobat 7.

It has been a while since I used Acrobat 7 Pro, so I looked it over.
No "Examine Document" to be found. Turns out that this is because it was introduced with Acrobat 8.
http://blogs.adobe.com/acrolaw/2006/12/acrobat_8_new_e.html

fwiw, its features (in Acrobat 8 & 9) are:
Identify the presence of, and if desired remove:
Metadata, File Attachments, Annotations & Comments, Form field logic or actions, Hidden text (e.g., OCR text), Hidden layers, Bookmarks, Embedded search index, and Hidden page and image content.
When you remove selected items, some other things are automatically removed as well:
Digital signatures, document information added by third-party plug-ins & applications, and special features that enable Adobe Reader to review, sign, and fill in PDF documents.

Be well...

2009-10-08 19:27:41

gandaria

Registered: Oct 8 2009

Posts: 2

Image To PDF OCR Converter is a Windows application that can directly convert many image formats, such as TIFF, JPG, GIF, PNG, BMP, PSD, WMF, EMF, PCX, and PIC, into text-searchable PDF. It supports several conversion modes and automatically cleans up and deskews black-and-white images.



Searchable PDF - Rotate OCR Level