Automate OCR Batch

nwotter
Registered: Jul 16 2008
Posts: 5

I have several documents; some have OCR text and some do not. I would like to batch OCR the ones that do not have OCR text and have the process skip the documents that already do.

In Acrobat 8 I can see that if I Examine the document, it shows hidden fields for the OCR documents. Can I use JavaScript to execute this command and then decide whether to OCR or not? Or is there a better way?

Thanks.

My Product Information:
Acrobat Pro 8.1.2, Windows
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
More of my natterings here, but I have been able to run Acrobat's OCR across previously OCR'd PDF files.
So, I'd think that you'd be able to run the batch process across the contents of a folder/directory that contains a mix of image-only and OCR'd PDFs. More specifically, the group I processed was a mix of image-only and Searchable Image (Exact) PDFs.

Select Advanced > Document Processing > Batch Processing
In the Batch Sequences dialog, click on "New Sequence..."
Name the sequence (let's call it "OCR"). Click OK.
In the Edit Batch Sequence - OCR dialog, click "Select Commands..."
At the left, locate and select Recognize Text Using OCR, click on the "Add >>" button.
Click the "Edit" button.
In the Recognize Text - Settings dialog, select the desired Primary OCR Language, select the PDF Output Style (Searchable Image, Searchable Image (Exact), or Formatted Text & Graphics), and select a Downsample Images value. Remember, downsampling will alter the image. If you have something that is going to a Federal government agency that in turn submits to NARA, downsampling is a no-no.

When done, click on OK.
To run the batch sequence unattended, leave the box (right side, next to sequence name) empty.
Click OK.
Back to the Edit Batch Sequence - OCR dialog.
Look over items 2 and 3 to see if you want to change the defaults.
Look over Output Options to see if you want to change the defaults.
Click OK to leave the Edit Batch Sequence - OCR dialog.

Back at the Batch Sequences dialog now. So, select Run Sequence and browse to the folder with your PDFs. Let the sequence run, check out the results, and grin.

Be well...

nwotter
Registered: Jul 16 2008
Posts: 5
Well, I kind of figured that I *could* run OCR on previously OCR'd files, but some of them are rather long, there are quite a few documents, and the process takes time.

Just trying NOT to burn unnecessary CPU cycles here... but if that's the only way, then I suppose that's the way.
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
How do they say it down in Maine? "Ayup"
Actually, it is likely some other method exists. I'm still a student of Acrobat myself.

On the up side, the batch sequence does not need you in attendance.
Kill all unnecessary processes, run the batch, and go to Aunt Catfish for lunch.
Job is done when you get back.
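
As for that "other method", here is one rough idea and not a tested recipe. Acrobat JavaScript has doc.numPages and doc.getPageNumWords(), so a script could count the words on each page and treat a file with no words as needing OCR. The function name needsOcr and the minWords threshold are made up for illustration. I am also assuming an image-only scan reports zero words on every page. I do not know of a way to make a sequence skip its own OCR command, so this would only help you log which files need OCR and sort them into a separate folder first.

```javascript
// Sketch only: decide whether a PDF already carries a text layer.
// `doc` is assumed to be an Acrobat Doc object (anything exposing
// numPages and getPageNumWords). `needsOcr` and `minWords` are
// hypothetical names, not part of Acrobat.
function needsOcr(doc, minWords) {
    var threshold = (typeof minWords === "number") ? minWords : 1;
    var total = 0;
    for (var p = 0; p < doc.numPages; p++) {
        total += doc.getPageNumWords(p);
        if (total >= threshold) {
            return false; // enough text found, so OCR can be skipped
        }
    }
    return true; // too few words across all pages
}

// In a batch sequence's Execute JavaScript command, `this` is the
// document being processed, so a logging step might look like:
// console.println(this.documentFileName + (needsOcr(this) ? ": needs OCR" : ": has text"));
```

Run a sequence with only that JavaScript step over the folder, read the list in the JavaScript console, then point the OCR sequence at just the files that were flagged. Raising minWords above 1 might help if some scans carry a stray stamp or page number as real text.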

Be well...