OCR export to Acrobat Pro

eRes
Registered: Mar 4 2010
Posts: 4
Answered

I use ABBYY FineReader OCR to scan documents and export them to Acrobat Pro 8.1.7/8.2. I've noticed that these documents now have duplicate sets of tags. When saving as Text (Accessible) (*.txt), only one instance of the text shows up, but the duplicate tag sets make it difficult to tell which set the figures' alt text should be added to. Has anyone else experienced this?

Also, when I add alt text to figures, the alt text does NOT show up in the Text (Accessible) (*.txt) file. Has anyone else found this as well?

My Product Information:
Acrobat Pro 8.1.7, Windows
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi,

I suspect the issue is associated with the "mix" of Abbyy FineReader output and Acrobat edits.

When I Save As to Text (Accessible) (*.txt) from my installs of
Acrobat Pro 7, 8, 3D version 8, or 9 Pro Extended, I do get the Alternate Text description in the output.


Be well...

eRes
Registered: Mar 4 2010
Posts: 4
Thank you for your response. Do you have any suggestions for a fix or work-around?
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi,

Sure, I'd scan direct to/with Acrobat Pro.
I'd keep AFR (ABBYY FineReader) out of it (I have had it and liked it; in days gone by I used it).
I'd use Acrobat Pro for my OCR.
If the PDF had to be a Tagged PDF, I would consider the type and magnitude of the content being scanned.
Letting Acrobat or, for that matter, any application "auto tag" OCR output is a bucket of woe.
If scanned textual content is to be provided via a Tagged PDF (say, for accessibility), then I would use ClearScan and fix up the "suspects". The characters that cannot be "captured" are left as bitmaps.
These bitmaps would have to be "folded" into the Tagged PDF's structure tree manually, using the Figure element and appropriate Alternate Text description entries.
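
To illustrate the check this implies (every Figure element in the structure tree needs an Alternate Text entry), here is a minimal Python sketch. It does not read a PDF or call any Acrobat API. It assumes the structure tree has already been extracted into nested dicts, a hypothetical model that borrows the PDF key names S (structure type), Alt (alternate text) and K (child elements). The function name figures_missing_alt and the sample tree are made up for illustration.

```python
from typing import Iterator, Tuple

def figures_missing_alt(element: dict, path: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the tag path of every Figure element with no usable Alt text.

    Assumption: `element` is a plain dict standing in for a structure
    element, with "S" (type), optional "Alt", and optional "K" (children).
    """
    here = path + (element["S"],)
    # Treat a missing, empty, or whitespace-only Alt entry as "no alt text".
    if element["S"] == "Figure" and not (element.get("Alt") or "").strip():
        yield "/".join(here)
    for child in element.get("K", []):
        yield from figures_missing_alt(child, here)

# Hypothetical tree for a scanned page: one described figure, two that
# still need Alternate Text.
tree = {
    "S": "Document",
    "K": [
        {"S": "P"},
        {"S": "Figure", "Alt": "Scanned signature"},
        {"S": "Sect", "K": [{"S": "Figure"}, {"S": "Figure", "Alt": "  "}]},
    ],
}

missing = list(figures_missing_alt(tree))
for tag_path in missing:
    print("Needs Alternate Text:", tag_path)
```

A list like this only tells you where to look. Writing a meaningful description for each bitmap is still manual work in the Tags panel.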

Doing Tagged PDF from scanned input content is always a "handcrafted" activity.
No shortcuts, no options.
Application imposed "tagging" is someone's "best guess".
A "best guess" on OCR content rarely, if ever, gets it done properly. There is always cleanup,
and cleanup can often take longer than using the source paper as a reference and
re-mastering in an authoring application, followed by output of Tagged PDF and
the requisite QC post-processing with Acrobat Pro.

Be well...
