Apologies for this being long and quite possibly stupid, but I'm green,
I've looked all over and can't find this precise information. Any help very
gratefully received.
I'm converting a bunch of my old paperbacks to ebooks (to be read
online and on an ebook reader), scanning the pages into PDF, then using
Acrobat Pro 8 on a MacBook (OS X 10.5) to OCR them so the text is
searchable, and saving them at a reduced size without too much quality
loss (closer to 'print' than 'web' quality). I'm dealing with several of these
files at a time so obviously want to use Batch Processing if possible.
However, I'm not techy or designy and don't understand the various
options available in the (far too complicated) 'PDF Optimization' menu.
The files, when they emerge from the initial scan, are huge: between 40MB and
100MB. When I batch-process them to OCR and put PDF Optimization on
for output, using the default settings, they (eventually) emerge as still-huge files, usually *growing*, not shrinking. Plus the fonts look awful.
However, after loooong experimentation, I've found that a good
compromise in terms of size and quality is to *first* run them through the
'Reduce File Size' command on the 'Document' menu, then OCR the
resulting file through the 'Recognize Text using OCR' subcommand on the
'OCR Text Recognition' command on the same menu. The end result is that
their size drops very substantially (down to 6MB or so), they
look perfectly OK and they're text-searchable. I've found that doing the
two steps - OCR & Shrink - the other way round doesn't work so well. I've no idea
why.
I have been told that the 'Reduce File Size' command is just a simplified
version of the 'Optimize Scanned PDF' command with certain settings
inbuilt. But I cannot find what those settings are. I've also
been told that those settings are the same as the defaults in PDF
Optimization, so if I simply run that command (or run PDF Optimization
as an output option on Batch Processing), the result should be the same.
I've tried this and it just is not even close to the same: the files are huge
and the final fonts don't look the same: 'Reduce File Size' almost
always results in a better finish for me, and, frustratingly, that command is not
available in 'Batch Processing'.
So my question is: is there a way to organize a 'Batch Processing'
sequence that will take a bunch of files and for all of them exactly
mimic the result, in file size and appearance, of taking each file separately
and first i) clicking the 'Reduce File Size' command, then ii) clicking
'Recognize Text using OCR'? If there is, I would be so grateful if
someone can tell me how to create it. At the moment I can batch-process OCR a bunch of shrunk files, but cannot shrink them except one at a time.
Thanks so much for any help.
For batch OCR, though, you really might consider using dedicated OCR software such as ABBYY FineReader or Readiris. They are geared for batch work: they let you define areas to OCR, remove spine and punch-hole shadows, correct OCR errors easily, and so on. Output can still be to PDF, with adjustable settings for DPI and JPEG quality.
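If you'd rather stay inside Acrobat, one avenue worth experimenting with is a Batch Processing sequence whose first command is 'Execute JavaScript': Acrobat's JavaScript API can invoke menu items by their internal name via app.execMenuItem(). This is only a sketch, not a tested recipe — the internal name "ReduceFileSize" is an assumption (you can list the real names by running app.listMenuItems() in Acrobat's JavaScript console), some menu items are restricted outside privileged contexts, and the Reduce File Size command may still pop up its compatibility dialog for each file rather than running fully unattended.

```javascript
// Acrobat-only JavaScript — runs inside an Acrobat Batch Processing
// "Execute JavaScript" command, not in a browser or Node.
// Within a batch sequence, `this` is the document currently being processed.

// Step i): invoke the Document > Reduce File Size command by its internal
// menu-item name. NOTE: "ReduceFileSize" is an assumption — verify the exact
// name with app.listMenuItems() from the JavaScript console first.
app.execMenuItem("ReduceFileSize", this);

// Step ii): rather than scripting OCR here, add "Recognize Text using OCR"
// as the *second* command in the same batch sequence — it has its own
// dedicated batch command with its own options dialog, so it needs no script.
```

If the menu item turns out to be blocked or interactive in your version, the fallback is what you're doing now: shrink the files one at a time, then batch-OCR the shrunk copies.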
...if the paperbacks weren't written by you, watch out for copyright!