I have 1000 images that I want to OCR (Searchable Image, Exact) and put in a PDF. The text in the images OCRs much better if they are 1-bit black/white images that have been filtered a bit to clean them up (despeckle, etc.), but they are much more readable in the original color (and any graphics look better in the original).
If I create a PDF from the 1000 b/w images and OCR it, can I then write a JavaScript script to replace all the images with the original color files while preserving the hidden OCR text that allows selection and copying to the clipboard? (The color images have the text in exactly the same location, with the same image size and dpi; they are just easier to read.)
It might be possible with a plugin. I know for sure it's possible with an external application because I've recently created such an application.
You're welcome to have a look at this page from my website:
http://try67.blogspot.com/2010/04/acrobat-batch-editremove-images.php
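Whatever tool ends up doing the swap, the hidden text will only line up if every color image really does match its b/w counterpart in pixel size and resolution. Below is a minimal sketch of a pre-check for that. It assumes the scans are PNG files and that each pair shares a file name across two folders. The function names and folder layout are made up for illustration, and this is not how the application above works internally.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_geometry(data):
    """Return (width, height, ppu_x, ppu_y, unit) read from raw PNG bytes.

    The last three values come from the optional pHYs chunk (pixels per
    unit; unit 1 means metres) and are None when that chunk is absent.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    size = None
    phys = (None, None, None)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            size = struct.unpack(">II", body[:8])
        elif ctype == b"pHYs":
            phys = struct.unpack(">IIB", body[:9])
        elif ctype in (b"IDAT", b"IEND"):
            break  # pHYs, if present, always comes before the image data
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    if size is None:
        raise ValueError("missing IHDR chunk")
    return size + phys


def same_geometry(bw_bytes, color_bytes):
    """True if both PNGs have identical pixel size and resolution."""
    return png_geometry(bw_bytes) == png_geometry(color_bytes)


def mismatched_pairs(bw_dir, color_dir):
    """List b/w file names whose color twin is missing or differs."""
    bad = []
    for bw in sorted(Path(bw_dir).glob("*.png")):
        color = Path(color_dir) / bw.name
        if not color.exists() or not same_geometry(
            bw.read_bytes(), color.read_bytes()
        ):
            bad.append(bw.name)
    return bad
```

Calling `mismatched_pairs("bw", "color")` (folder names are placeholders) before building the PDF gives you a list of pages to fix first. An empty list means every pair agrees on size and resolution.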
- AcrobatUsers Community Expert - Contact me personally at try6767 [at] gmail [dot] com
Check out my custom-made scripts website: http://try67.blogspot.com