
Batch extraction of searchable text from PDFs

alaidlaw
Registered: Mar 11 2011
Posts: 1

We have an application that receives a large number of PDF documents, and we need to extract the searchable text from them. Some of them are actually image scans, so they will need to be OCR'd first.
 
I have been searching for tools to do this.
 
It looks like Acrobat X can do this but not in an automated manner.
 
Is this impression correct?
 
Angus Laidlaw

rbogie
Registered: Apr 28 2008
Posts: 432
This tip does not answer the 'automated' part of your question, but it will point you in the right direction. In my experience the best tool for extracting searchable text is Microsoft Office Document Imaging (MODI). MODI comes bundled with MS Office Pro (in the Tools group). Export (convert) the PDF to TIFF and open the TIFF in MODI, then OCR the TIFF and send the text to MS Word.
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
The "Actions" feature of Acrobat X can be used to OCR entire folders of PDFs. There are several videos and articles at this site. Search for "Acrobat X Action".
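
For anyone who wants to script the sorting step outside Acrobat, here is a minimal sketch of one way to split a folder into PDFs that already have a text layer and PDFs that still need OCR. It rests on assumptions that are not from this thread: it assumes Poppler's pdftotext command-line tool is installed, and the names triage_pdfs, pdftotext_extract and min_chars are made up for illustration. The OCR itself is not done here. The files it flags would still go through an Acrobat X Action or another OCR tool.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def pdftotext_extract(pdf_path: Path) -> str:
    """Return the text layer of one PDF.

    Assumption: Poppler's `pdftotext` is installed and on the PATH.
    """
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def triage_pdfs(
    folder: str,
    extract_text: Callable[[Path], str] = pdftotext_extract,
    min_chars: int = 20,  # hypothetical threshold; tune for your documents
) -> Tuple[Dict[str, str], List[str]]:
    """Split the PDFs in `folder` into searchable ones and ones needing OCR.

    Returns (searchable, needs_ocr): a dict of file name -> extracted text,
    and a list of file names whose text layer is empty or nearly empty.
    """
    searchable: Dict[str, str] = {}
    needs_ocr: List[str] = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        text = extract_text(pdf)
        if len(text.strip()) >= min_chars:
            searchable[pdf.name] = text
        else:
            # Little or no text came back, so this is probably an image scan.
            needs_ocr.append(pdf.name)
    return searchable, needs_ocr
```

Calling triage_pdfs("incoming") on a hypothetical "incoming" folder returns the extracted text for the searchable files plus a list of the files to send through OCR. The extractor is passed in as a parameter so that a different text-extraction tool can be swapped in without touching the sorting logic.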

Thom Parker
The source for PDF Scripting Info
www.pdfscripting.com
Very Important - How to Debug Your Script