I have several documents; some have OCR text and some do not. I would like to batch OCR the ones that don't, but have the process skip documents that already have OCR text.
In Acrobat 8, if I run Examine Document, it shows hidden fields for the OCR'd documents. Can I use JavaScript to run this command and then decide whether to OCR or not? Or is there a better way?
Thanks.
So, I'd think you'd be able to run the batch process across a folder that contains a mix of image-only and OCR'd PDFs. Specifically, the mixed groups I've processed this way contained image-only and Searchable Image (Exact) PDFs.
Select Advanced > Document Processing > Batch Processing.
In the Batch Sequences dialog, click "New Sequence..."
Name the sequence (let's call it "OCR"). Click OK.
In the Edit Batch Sequence - OCR dialog, click "Select Commands..."
In the list on the left, select Recognize Text Using OCR and click the "Add >>" button.
Click the "Edit" button.
In the Recognize Text - Settings dialog, select the desired Primary OCR Language, select the PDF Output Style (Searchable Image, Searchable Image (Exact), or Formatted Text & Graphics), and select a Downsample Images value. Remember, downsampling alters the image. If you have something going to a federal government agency that in turn submits to NARA, downsampling is a no-no.
When done, click OK.
To run the batch sequence unattended, leave the box (right side, next to the sequence name) unchecked.
Click OK.
You're now back in the Edit Batch Sequence - OCR dialog.
Look over items 2 and 3 to see if you want to change the defaults.
Look over Output Options to see if you want to change the defaults.
Click OK to leave the Edit Batch Sequence - OCR dialog.
You're back at the Batch Sequences dialog now. Select Run Sequence and browse to the folder with your PDFs. Let the sequence run, check the results, and grin.
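If you'd rather have the batch touch only the files that need it, you could pre-sort the folder first and point Run Sequence at the image-only half. The Python below is a rough sketch of that idea. It is not Acrobat's Examine Document and not Acrobat JavaScript; it is a swapped-in heuristic that treats a PDF as already OCR'd if it declares any font resource, checking both the raw file and any Flate (zlib) compressed streams, since newer PDFs can keep their resource dictionaries inside compressed object streams. That assumption holds for typical scanner output versus Acrobat's Searchable Image output (the invisible OCR text needs a font), but a scan with a stamped text header would be misread as "has text", and a file where only some pages were OCR'd also counts as "has text". The function names `has_text_layer` and `sort_folder` and the folder names are hypothetical.

```python
import re
import shutil
import zlib
from pathlib import Path

# Body of each PDF stream object: everything between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _searchable_chunks(data: bytes):
    """Yield the raw file, then every stream that inflates as zlib/Flate data."""
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj ignores the trailing EOL left before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. a JPEG scan) or damaged


def has_text_layer(data: bytes) -> bool:
    """Heuristic (assumption): a PDF that declares any font already has text."""
    return any(b"/Font" in chunk for chunk in _searchable_chunks(data))


def sort_folder(src: Path, needs_ocr: Path, has_text: Path) -> None:
    """Copy each PDF in src into needs_ocr/ or has_text/ (hypothetical layout)."""
    needs_ocr.mkdir(parents=True, exist_ok=True)
    has_text.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(src.iterdir()):
        # Skip subfolders and non-PDF files; match .pdf and .PDF alike.
        if not (pdf.is_file() and pdf.suffix.lower() == ".pdf"):
            continue
        target = has_text if has_text_layer(pdf.read_bytes()) else needs_ocr
        shutil.copy2(pdf, target / pdf.name)
```

For example, `sort_folder(Path("scans"), Path("scans_needs_ocr"), Path("scans_has_text"))` copies the files (the originals are left alone), and you'd then browse to scans_needs_ocr when you click Run Sequence. Copying rather than moving is deliberate, so a wrong guess by the heuristic costs nothing.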
Be well...