Clearscan and Memory

irap
Registered: Aug 6 2008
Posts: 56

Hi,

How is ClearScan affected by the amount of memory the computer has? We have almost 200,000 PDFs without text.

I'm planning to use a dedicated computer (PC) to batch process the files.

Thanks.

Ira

My Product Information:
Acrobat Pro Extended 9.0, Windows
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi,

The use of a dedicated box would be prudent.
OCR is a resource intensive process.
Do not expect to be able to use a box, doing OCR, for anything else.
Process small quantities of files rather than trying to do all at one time.
(Big chunks will result in a locked box, and it'll take time to figure out which files need to be done again.)
Do not point the installed Acrobat on the dedicated box to stuff out in network space.
Network "burps" will result in lost processing time and, again, time spent figuring out what got through and what did not. Put all the files on the local HDD or an attached USB device.
Start with small numbers of files and bump it up to identify an optimal quantity of files that can get done without spin-crash-burn.
Say this is 20 to 30 files. After about 6 passes stop, close Acrobat, open Acrobat and restart.
Once or twice a day, close out all. Shut down the box. A few minutes later start up and get the work flow going again.
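
For what it's worth, the batch-and-restart routine above can be sketched in a few lines of Python. This is only a rough sketch of the bookkeeping, not of the OCR itself: the batch size of 25, the limit of 6 passes, the log file, and all function names are assumptions for illustration, and running the Acrobat batch sequence on each batch is left to you. The point is to keep a local record of what has finished, so a locked box doesn't leave you guessing.

```python
import json
from pathlib import Path
from typing import Iterator, List, Set


def plan_batches(files: List[str], batch_size: int = 25) -> Iterator[List[str]]:
    """Split the file list into small batches (20 to 30 files is the starting guess)."""
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]


def load_done(log_path: Path) -> Set[str]:
    """Read the set of files already processed; empty if no log exists yet."""
    if not log_path.exists():
        return set()
    return set(json.loads(log_path.read_text()))


def mark_done(log_path: Path, done: Set[str], batch: List[str]) -> None:
    """Record a finished batch on the local disk so a crash loses at most one batch."""
    done.update(batch)
    log_path.write_text(json.dumps(sorted(done)))


def pending(files: List[str], done: Set[str]) -> List[str]:
    """Files that still need OCR, in their original order."""
    return [name for name in files if name not in done]


def restart_due(passes_done: int, passes_per_session: int = 6) -> bool:
    """True when it is time to close and reopen Acrobat (about every 6 passes)."""
    return passes_done > 0 and passes_done % passes_per_session == 0
```

After each pass, call mark_done with the batch that just finished. After a lockup, pending(files, load_done(log_path)) gives the list that still has to go through.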

Yes, a "pain" -- better than the mess and lost time determining where you are when you get a locked box.
Yes, that can and most likely will happen if you bite off more than Acrobat/Windows/the local machine's resources can chew.

Variables of concern:
Local machine resources (are unnecessary processes turned off - something that can be overlooked).
Size of PDFs to undergo OCR - bigger takes more resources and time.
OCR wants as much RAM as it can get and writes frequently to the HDD.
Turn off screen saver, snooze, sleep, etc.
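
Since PDF size is one of the variables, one option is to cap each batch by total bytes rather than by file count. A minimal sketch, assuming you already have (name, size in bytes) pairs for the files; the function name and the cap value are made up for illustration, and the right cap is something to find by trial on your own box.

```python
from typing import Iterable, List, Tuple


def batches_by_size(files: Iterable[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Group files, smallest first, so that no batch exceeds max_bytes in total.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in sorted(files, key=lambda item: item[1]):
        # Close the current batch if adding this file would exceed the cap.
        if current and current_bytes + size > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Sorting smallest first means the big files end up in short batches of their own, which is where a lockup is most likely to show up.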

Alternatives
I've used Adobe Capture Cluster in the past. On a dedicated box. Worked (and still does work) very nicely.
Minimal attendance by a "warm body" is needed.
OCR applications similar to Capture Cluster, on a dedicated box, will perform in a like manner.

But, the best bet for "big", unattended jobs would be Server based applications.
Three "providers" that come to mind are Adobe, ABBYY FineReader, and AdLib.

Be well...

irap
Registered: Aug 6 2008
Posts: 56
Probably a silly question.

The documentation I found doesn't seem to indicate that Capture Cluster does ClearScan. Can you confirm that it does?

Thanks.
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Neither Capture nor Capture Cluster provides ClearScan.

Capture Cluster uses Formatted Text and Graphics rather than ClearScan's custom font set.
Unlike ClearScan, the font(s) used for Formatted Text and Graphics are not a custom set.
So, TouchUp Text tool usage is more immediately available.
Workflows can be configured to compare character images to a collection of fonts selected from those that are on the local machine.
With a proper selection, the OCR output is very usable (readable).

Cluster edition comes with the Reviewer Tool and Capture Assistant.
Reviewer tool permits cleanup and enhancement (such as OCR suspect review, OCR correction, & document touchup).
Capture Assistant permits offload of labor intensive tasks that require a warm body.
With these two tools you have a more robust "tool set" than what is bundled with Acrobat.

http://www.adobe.com/products/acrcapture/pdfs/aacfaq.pdf
http://www.adobe.com/products/acrcapture/comparison.php


Downside -
Application is good through Windows XP only.
Cost.
Older "technology"

Upside -
A solid, reliable application (sort of a "DC-3" piece of work, if you will).
No exposure to network "burps" or server side oops.

Be well...