
Batch OCR has to re-OCR all files every time...please say it isn't so

lacro
Registered: Jul 21 2010
Posts: 3

I have read quite a bit about this and am truly astounded that I would have to re-OCR every file each time I run a batch OCR on a directory. I am posting here because I am hoping I missed something; it seems unlikely that something so fundamental would be left out.

Let's assume I have 10,000 PDF files located in a directory with 50 subdirectories that have 200 files each. If I were to run a batch OCR on the main directory, it would OCR all of the files in the subdirectories as well. I know because I have done it. Now if I were to add some random, image-only PDF files to some of the subdirectories, would I have to re-OCR EVERYTHING?

That might be fine for 50 files, but with 10,000 PDF files there must be some way to identify which files have already been OCR'd, skip them, and OCR only the files that have never been OCR'd. Otherwise, it would take far too long to OCR the files each time.

Ideas?

Thank you.

gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4308
You need to design a workflow that includes moving the processed files to another folder.

George Kaiser

lacro
Registered: Jul 21 2010
Posts: 3
Unfortunately I can't do that. All of the files are indexed by another master program that points to the .pdf files, so moving them is not an option; they have to stay exactly where they are.
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4308
You could create your batch process to select the files to be OCR'd at run time instead of selecting the folders.

Or you could OCR them somewhere else before loading them into the folder being indexed.

George Kaiser
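
One way to select the files at run time without moving anything is to keep a small record of what has already been OCR'd and build the list of pending files from it before each batch. The following is only a rough sketch in Python, not an Acrobat feature: the manifest file name (`ocr_manifest.json`) and all function names are made up for illustration, and handing the resulting list to the Acrobat batch process is left out because that depends on your setup. It assumes OCR rewrites each file in place, so a file is fingerprinted (size and modification time) only after OCR has finished.

```python
import json
import os
from pathlib import Path


def load_manifest(manifest_path):
    """Return the saved {path: [size, mtime_ns]} record, or {} if none exists yet."""
    path = Path(manifest_path)
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def fingerprint(pdf_path):
    """Size and modification time: a cheap way to notice a new or replaced file."""
    st = os.stat(pdf_path)
    return [st.st_size, st.st_mtime_ns]


def find_unprocessed(root, manifest):
    """Walk root and every subdirectory; return PDFs not recorded in the manifest."""
    pending = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue  # skips the manifest itself and any non-PDF files
            full = os.path.join(dirpath, name)
            # Unknown path, or the file changed since it was recorded.
            if manifest.get(full) != fingerprint(full):
                pending.append(full)
    return sorted(pending)


def mark_processed(paths, manifest, manifest_path):
    """Call after OCR has finished: record each file's current state and save."""
    for p in paths:
        manifest[p] = fingerprint(p)
    with open(manifest_path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2)
```

On each run you would call `load_manifest`, pass the result to `find_unprocessed` to get only the new files, OCR those, and then call `mark_processed` on them. On the very first run every file counts as pending, so if the existing files are already OCR'd you could call `mark_processed` on all of them once to seed the record.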