OCR and Screenreaders

melanie7810
Registered: Jun 11 2010
Posts: 3

I am working with an individual who is blind, and we are trying to use OCR to recognize scanned documents he receives for his job. We followed the directions in Acrobat to convert the scanned PDFs into recognized text, but the conversion seems to fail and the result is not read by JAWS.

Is this possible? If so, what could be causing the failure?

As a note, we are working with a temporary copy of Acrobat Professional. The individual did not want to purchase the product if it did not work for him.

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi,
How did you do the OCR?
Searchable Image, Searchable Image (Exact) or ClearScan?
The first two lay down a hidden layer of characters to accompany the scanned image of the text.
Because OCR is not, at its core, aware of "format" / "layout" / "grammar", it drops in its "best-estimate" of what each character is.
Often this can be very good. Often this can be very confusing.
A lot depends on the quality of the source paper and on the actual scan device.

If you used either "searchable" choice then try a Save As to a text file.
Open the text file - you'll see exactly what the OCR output is.
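
If you want a rough number to go with that eyeball check, here is a minimal Python sketch. It assumes you have already saved the OCR output to a plain text file from Acrobat; the function name `ocr_text_report` and the "clean word" rule are illustrative choices of mine, not anything Acrobat provides. It counts how many whitespace-separated words are made up only of letters (with an optional trailing punctuation mark), which is a crude hint at how garbled the hidden text layer is.

```python
import re
import sys

# Letters only, optional internal apostrophe/hyphen, optional trailing punctuation.
CLEAN_WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*[.,;:!?]?")

def ocr_text_report(text):
    """Return rough quality indicators for OCR text exported from a PDF."""
    words = text.split()
    clean = [w for w in words if CLEAN_WORD.fullmatch(w)]
    ratio = len(clean) / len(words) if words else 0.0
    return {"words": len(words), "clean_words": len(clean), "clean_ratio": ratio}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ocr_report.py exported_text.txt  (file name is up to you)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(ocr_text_report(handle.read()))
```

An empty file, or a low ratio on a page that is mostly ordinary prose, suggests the OCR pass did not produce usable text. Numbers, dates and non-English text count as "not clean" under this rule, so treat the ratio as a hint only.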

Using ClearScan you get replacement characters for what is recognized.
Sometimes ClearScan "thinks" it has it ok but is not sure - this stuff becomes a "suspect".
With Acrobat you can manually correct "suspects".
Sometimes ClearScan just "does not get it" for some characters.
These are left as a bitmapped image.
In such cases, it does not matter if "we" perceive the characters as "ok" - 'cause "we" are not doing the OCR, eh?

AT cannot "read" images (e.g., bitmapped characters). That's what Alternate Text is for.
That brings us to "tagged" PDF.
For any AT (JAWS, Window-Eyes, NVDA, etc.) to be effective with a PDF, the PDF must be a reasonably well-formed "Tagged" PDF.
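
As a quick first look at whether a file claims to be tagged at all, here is a small Python sketch. It is only a heuristic of my own: it scans the raw bytes for the `/StructTreeRoot` and `/MarkInfo` ... `/Marked true` entries that a Tagged PDF carries in its document catalog per ISO 32000-1. The function name `looks_tagged` is hypothetical. Many PDFs store these objects in compressed streams, so a "no" from this check is not conclusive, and a "yes" says nothing about how good the tags are. Acrobat's own accessibility checks remain the real test.

```python
import re
import sys

def looks_tagged(pdf_bytes):
    """Heuristic: report whether Tagged PDF markers appear in the raw file bytes."""
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # /MarkInfo << /Marked true >> marks the document as Tagged PDF.
    marked_true = re.search(rb"/Marked\s*true", pdf_bytes) is not None
    return {
        "struct_tree_root": has_struct_tree,
        "marked_true": marked_true,
        "looks_tagged": has_struct_tree and marked_true,
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tagged_check.py some_file.pdf
    with open(sys.argv[1], "rb") as handle:
        print(looks_tagged(handle.read()))
```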

While one can try to "tag" the hidden text output of either "Searchable Image" choice, it is really not productive (... yes, it is something I've played with - I'm odd that way).
Adobe's online "how-to" for taking scanned text content through OCR to an accessible PDF leads you to use "Formatted Text & Graphics" (Acrobat 8) or ClearScan (Acrobat 9).
You use either of these. Then have Acrobat "tag" the PDF's page content (you get a "best-estimate" - sometimes pretty good, sometimes not - depending on what the content is).
Or, you tag it manually. This can give very good results if you are fairly comfortable with the "rules of the road" for Tagged PDF, as described in the ISO PDF Standard (ISO 32000-1) or the PDF References that came before ISO acceptance, and have some practice doing it.
Sounds harder than it is.
It is really just a function of how simple or complex the PDF page content is.


If there is any way to obtain copies of the source "authoring files", you'd get a better outcome.
Get these into Word or InDesign or FrameMaker. Output accessible tagged PDF from one of these.
Most times, most of us have Word available.
MS Word files (in Office 2007) can produce OK, or better, accessible tagged PDF via Adobe PDFMaker (installed with Acrobat) or via the Office 2007 Save As PDF (with accessibility options "on").
Also, while I'm not a user, I've read that Open Office is also a "player".




Be well...

melanie7810
Registered: Jun 11 2010
Posts: 3
I'm going to try the Save As text method. This might be perfect!
Thank you!