
OCR Batch processing

iaingblack
Registered: Aug 27 2007
Posts: 2

Hi,

Quick question.
We have a folder of PDFs that were scanned on our photocopier and saved as PDFs, but they are essentially image files. We have purchased Acrobat 8 Professional so that we can OCR these documents and make them searchable.

I am able to set up a batch processing job to run over our entire folder of PDFs, which is great. However, while testing I found that if more PDFs are added to the folder and I re-run the job, it OCRs ALL of them again, not just the ones that still need it. So I was hoping I could add a piece of JavaScript to the batch job that searches for a common word such as 'the'; if it finds it, the document must already have been OCR'd and can be skipped. However, I'm not sure whether I can chain the batch process like this. There seems to be no way to make it do something conditionally based on the JavaScript's return value, or is there?

I'm just looking for a little help on whether this is possible before I spend lots of time on it. I know there are manual ways around this (batch OCR, move the files, delete the old ones, etc.), but I would like to set it up this way so I can run it once a week and forget about it!

Cheers
Iain

Marceepoo
Registered: Sep 26 2007
Posts: 7
I'm trying to do the same thing, i.e., I want to create a JavaScript that I'll trigger from a VB.Net program. In other words, I want the VB.Net program to tell the JavaScript to OCR a particular file, save the result in .txt format, and then inform the VB.Net program that the job has been completed.

Any tips on how to write such a JavaScript would be much appreciated.

Thanks,

marceepoo
stronky4p
Registered: Dec 21 2007
Posts: 4
To Iain: can't you just use a workaround? Make two folders: one for the 'image' PDFs and another for the OCR'd PDFs. (You don't have to do this manually; just select a different output folder.)
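The two-folder idea also addresses the re-OCR problem: a scan counts as "done" once a file with the same name exists in the output folder. Below is a minimal Python sketch of that check, assuming the batch job writes each OCR'd file to the output folder under its original name. The function name and folder layout are hypothetical, and the OCR itself is still done by the Acrobat batch sequence (not shown).

```python
from pathlib import Path


def pdfs_needing_ocr(input_dir: Path, output_dir: Path) -> list[Path]:
    """Return PDFs in input_dir that have no same-named PDF in output_dir.

    Assumption: the OCR batch job keeps the original file name, so a
    matching name in output_dir means the scan was already processed.
    """
    done: set[str] = set()
    if output_dir.exists():
        # Compare names case-insensitively, since scanners may write ".PDF".
        done = {
            p.name.lower()
            for p in output_dir.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        }
    pending = [
        p
        for p in input_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.name.lower() not in done
    ]
    return sorted(pending)
```

Only the returned files would then need to go through the weekly batch run; how they are handed to Acrobat (for example by copying them to a staging folder) is left open here.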

I also want to do something like what you and marceepoo described:
1. Scan lots of documents (contracts) with a document feeder.
2. OCR each document
(3. if possible: read part of the OCR'd text (i.e. the contract number) and put this contract number in the metadata)
4. Place each scanned document in a separate file
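Step 3 could be prototyped on its own once the OCR'd text is available (for example after saving a document as text). Below is a minimal Python sketch, assuming the contract number follows a label such as "Contract No." or "Contract nr:". The pattern and function name are hypothetical and would need adjusting to the real contracts; writing the value into the PDF's metadata is a separate step not shown here.

```python
import re
from typing import Optional

# Hypothetical label formats: "Contract No. 12345", "CONTRACTNR: AB-2007-001",
# "contract number 98-7". The \b after the label avoids matching words like "note".
CONTRACT_NR = re.compile(
    r"\bcontract\s*(?:no|nr|number)\b\.?\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)


def extract_contract_nr(ocr_text: str) -> Optional[str]:
    """Return the first contract number found in the OCR'd text, or None."""
    match = CONTRACT_NR.search(ocr_text)
    return match.group(1) if match else None
```

Because OCR output can contain misread characters, a result from a sketch like this is worth spot-checking before it is written into the documents.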

Maybe someone has a JavaScript that can scan and OCR documents?

Leonard
Marceepoo
Registered: Sep 26 2007
Posts: 7
Does anyone know of a book that includes JavaScript or VB code we could use to get Acrobat to create a PDF document from a scanner?