
OCR Recognition Error

uncletim
Registered: Nov 2 2007
Posts: 2

I get an error when attempting OCR recognition with Acrobat 8 Professional that states: "Acrobat cannot replace the current file with the existing file because of the following error: This file cannot be found.

This could be due to low memory or a low disk space situation."

I have neither low memory nor low disk space, I am an Administrator on the machine I am using, and I can find no other reason for this error. The file is only 52 MB. Please help.

My Product Information:
Acrobat Pro 8, Windows
pddesigner
Registered: Jul 9 2006
Posts: 858
Here are a few notes and tips that may help.

Note: Pages scanned in 24-bit color, 300 ppi, at 8-1/2–by-11 inches (21.59-by-27.94 cm) result in large images (25 MB) prior to compression. Your system may require 50 MB of virtual memory or more to scan the image.
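The 25 MB figure in the note above follows directly from the scan geometry. A quick back-of-the-envelope check (page size and resolution taken from the note; nothing here is Acrobat-specific):

```python
# Uncompressed size of a 24-bit, 300 ppi scan of an 8.5 x 11 inch page.
width_px = int(8.5 * 300)   # 2550 pixels across
height_px = int(11 * 300)   # 3300 pixels down
bytes_per_px = 3            # 24-bit color = 3 bytes per pixel

size_bytes = width_px * height_px * bytes_per_px
size_mb = size_bytes / (1024 * 1024)
print(f"{size_mb:.1f} MB")  # about 24 MB binary, i.e. the ~25 MB cited above
```

The doubling to "50 MB of virtual memory or more" then simply reflects the scanner driver and Acrobat each holding a copy of the image during the transfer.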

Note: If this file is appended to a PDF document and you use "Save", the scan remains uncompressed. If the PDF is saved using "Save As", the scan will be compressed.

Tip: Run a Repair Acrobat Installation and get the latest updates if necessary.

Tip: Check your virtual memory settings.

Suggestion: Scan the paper document as a separate PDF file and add it to the master PDF file.

My favorite quote - "Success is the ability to go from one failure to another with no loss of enthusiasm."

kristacreel
Registered: Dec 2 2011
Posts: 3
I'm using Acrobat 9 Pro. When I run OCR recognition on my scanned PDFs to make them accessible, it appears to work. However, it does not read back correctly and does not find any OCR suspects. Is there a solution to this? I have several scanned PDFs that are grouped into one PDF that need to be made accessible.


daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Acrobat 9 provides three modes of OCR: Searchable Image, Searchable Image (Exact), and ClearScan.
The first two output a layer of invisible/hidden text. ClearScan replaces each recognized character image with an internal font [Fd(nnnn)]. The invisible/hidden text layer is what it is - there are no "suspects". Prior to Acrobat 9's ClearScan there was Formatted Text & Graphics. As with ClearScan, this OCR mode replaced the images of characters. Finding and correcting "suspects" helped clean up Formatted Text & Graphics output. It can do the same for ClearScan output. However, ClearScan is much improved compared to Formatted Text & Graphics. In Acrobat X, ClearScan reflects even more improvement. Consequently, for most "decent" hardcopy sources of textual content you will observe that "Find Suspects" most often finds nothing.
.
As to post-processing the OCR output for accessibility:
Do not rely on a programmatic "make tags"; what you get will be a soup sandwich.
.
In the Tags panel perform "Create Tags Root". From the Options menu use "New Tag"; use [Document].
Use TORU (the Touch Up Reading Order tool) to select each page in turn and, with the entire page marquee-selected, select "Background". This makes everything an artifact. Now, for each page, use TORU to select what you "see" as text (for other than ClearScan output this is the image of characters); as you select, the OCR output gets the blue highlight. Make the desired selection from the choices on TORU (Text, Figure, etc.). If any Figure tags are added you must add appropriate Alternate Text; the same applies if Formula is used. If Table is used you must manually add each row and the cells within each row, and properly designate the header row. As well, for Table, identify the value of Scope for each TH element and provide the Table element with a Table Summary.
.
Once done with TORU, go back into the Tags panel and groom the structure tree. Ensure appropriate use of each element. Example: the "H2" element is required to have a parent "H1". Are the elements (tags) used semantically correctly? Is the element hierarchy in the structure tree correct? While no AT application is currently made to take full advantage of a well-formed structure tree (that is to say, the windfall fruit tends to be what is harvested rather than the full yield of the tree), if the Tags panel structure tree is not at least "workable" you'll be providing an unusable document.
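The H1/H2 rule above generalizes to "no heading level may skip past its parent", and that part of the grooming is mechanical enough to sketch in code. This is only an illustration of the rule; the flat tag-list representation is my own simplification, not any Acrobat API:

```python
def heading_hierarchy_ok(tags):
    """Check that heading levels never jump more than one step deeper:
    an H2 must have an H1 before it, an H3 an H2, and so on."""
    level = 0
    for tag in tags:
        if tag.startswith("H") and tag[1:].isdigit():
            n = int(tag[1:])
            if n > level + 1:   # e.g. an H2 with no preceding H1
                return False
            level = n
    return True

# heading_hierarchy_ok(["H1", "H2", "H2", "H3"]) -> True
# heading_hierarchy_ok(["H1", "H3"])             -> False (H3 skips H2)
```

In Acrobat itself this check is done by eye while walking the Tags panel; the point is only that each heading must nest under the level above it.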
.
"Reading" of the PDF.
First, with a PDF of a scanned image available perform OCR. Do this for three files using each OCR mode.
For each do save as. For each do Export — Text — Text (Plain).
View each text file.
Or (for output from Searchable Image / Searchable Image (Exact), do a Document — Examine Document
.
Once the Examine Document is done expand (click the "+") the "Hidden text" entry.
Click "Show Preview".
.
The more you do this the more you come to understand that OCR does not always correctly recognize characters AND it recognizes "stuff" in the image that has nothing to do with any textual content.
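One way to put a number on that "stuff" is to script a check over the three exported text files. A minimal sketch; the file names are placeholders, and the junk heuristic (counting characters outside the ordinary printable ASCII range) is my own, not anything Acrobat provides:

```python
import string

def summarize_ocr_text(text):
    """Count ordinary printable characters vs. likely OCR junk."""
    printable = set(string.ascii_letters + string.digits
                    + string.punctuation + " \t\n")
    ok = sum(1 for ch in text if ch in printable)
    return {"total": len(text), "printable": ok, "junk": len(text) - ok}

# Hypothetical exports, one per OCR mode:
# for name in ("searchable.txt", "exact.txt", "clearscan.txt"):
#     with open(name, encoding="utf-8") as f:
#         print(name, summarize_ocr_text(f.read()))
```

A high junk count for a given mode is a strong hint that OCR picked up specks, rules, and other non-text marks from the scan.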
.
What you see in an exported text file or via Examine Document is what will be "read".
.
You dump the junk by making everything on a PDF page an artifact (TORU "Background"). You retrieve the OCR output associated with strings of text by the process described above.
.
You can obtain a workable structure tree. It does require time and a comfortable understanding of Section 14 in ISO 32000-1. The larger the page count the greater the time and effort.
.
From experience I know that I can rekey textual content faster in FrameMaker, which provides upfront support for tagged PDF output. Even with the requisite post-processing to provide a well-formed structure tree, it takes much less time than striving for a "workable" structure tree from scans of textual content that have had OCR applied.



Be well...