
How to speed up OCR-process?

stronky4p
Registered: Dec 21 2007
Posts: 4

Hi,

I have to scan about 200 documents a day. I use a document feeder, which scans 20 pages per minute. Adobe Acrobat Professional 8 OCRs the scans at 20 pages per 3 minutes. So 20 pages take 4 minutes in total (1 minute to scan plus 3 minutes to OCR), and 200 pages take 40 minutes!

Is there a way to speed up the OCR-process?

I already have a fast computer (Intel Core 2 Quad Q6600 processor and 4 GB RAM) running Windows Vista. I extensively tested Acrobat's different scan options and found that the best results come from: input resolution: 200 dpi; input colours: grayscale; document optimization: high quality (the slider is at the far right); OCR option: Searchable Image (Exact).

The files are legal contracts, so the scans have to have the same shape as the original papers, so scanning as text-and-image is not suitable. And after scanning, a program 'reads' the pdf-files, so they have to be OCR-ed.

Any suggestions on how to speed up the process in Acrobat? Or any suggestions for other software that can scan and OCR documents faster?
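
For reference, the timing figures quoted in the question can be checked with a few lines of Python. This is only a sketch: the function name `batch_minutes` and its parameters are made up for illustration, and it assumes OCR starts only after scanning has finished, which is how the 4-minutes-per-20-pages figure adds up.

```python
def batch_minutes(pages, scan_ppm=20.0, ocr_ppm=20.0 / 3.0):
    """Return (scan, ocr, total) minutes for a batch of pages.

    scan_ppm: feeder speed from the post (20 pages per minute).
    ocr_ppm:  OCR speed from the post (20 pages per 3 minutes).
    Assumes the two stages run one after the other, not overlapped.
    """
    scan = pages / scan_ppm
    ocr = pages / ocr_ppm
    return scan, ocr, scan + ocr


scan, ocr, total = batch_minutes(200)
print(f"scan {scan:.0f} min, OCR {ocr:.0f} min, total {total:.0f} min")
# prints: scan 10 min, OCR 30 min, total 40 min
```

On these numbers OCR accounts for three quarters of the total, so a faster scanner alone would shave off at most 10 of the 40 minutes.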

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Some observations...

If it is in the budget, pick up a Kodak 3520 (or whatever is today's Kodak equivalent).

Canon ImageRunner 5000 series do an extremely fast & good job of providing the scanned images for input to Acrobat. The Canon ImageRunner 5055 will scan & OCR the paper then email/fax/ftp the output. Can select different scan resolutions. Can select Searchable Image (Exact).

When a LAN/WAN environment is available, AdLib does an exceptionally fast (and fine) PDF Searchable Image (Exact). A network scanner could feed its output to AdLib, AdLib processes it and parks it back to your directory on a server.

Adobe Capture Cluster Edition.
Once configured, very nice (and fast OCR).
Can configure to provide PDF Searchable Image (Exact) & a text file holding the OCR'd characters.
Five scanners/desktops can feed the desktop that has Capture Cluster installed.

jmo -
You might be better served (over the long haul) to go with an effective resolution of 300 ppi. 200 ppi will get you in the ballpark for "legal" stuff (I guess); 300 ppi will give a "boost" to the OCR process.
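
To put 200 versus 300 ppi in numbers, here is a rough sketch of the pixel count per page. The page size is an assumption (US Letter, 8.5 x 11 in; the thread does not say what paper the contracts are on), and `pixels_per_page` is just an illustrative helper. The ratio between the two resolutions does not depend on the page size.

```python
def pixels_per_page(ppi, width_in=8.5, height_in=11.0):
    """Pixel count of one scanned page at the given resolution.

    width_in / height_in default to US Letter, which is an assumption.
    """
    return round(width_in * ppi) * round(height_in * ppi)


p200 = pixels_per_page(200)  # 1700 x 2200 pixels
p300 = pixels_per_page(300)  # 2550 x 3300 pixels
print(p200, p300, p300 / p200)
# prints: 3740000 8415000 2.25
```

So each page at 300 ppi carries 2.25 times as many pixels as at 200 ppi, which gives the OCR engine more detail per character but also more image data per page.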

If by "read" of the PDF you mean for Section 508 Accessibility, be advised that the OCR output is not going to be identical to the text on the hard copy.
It is an education to compare the OCR characters with the hard copy (and that is a "clean" hard copy processed by a high end production scanner).

:)

Be well...

lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
Rick Borstein also has some suggestions on his blog to speed up the OCR process at:
http://blogs.adobe.com/acrolaw/2007/06/troubleshooting.php

Lori Kassuba is an AUC Expert and Community Manager for AcrobatUsers.com.