
Adobe Book Scanning Software

Bookmaker
Registered: Sep 11 2011
Posts: 3
Answered

I would like to create an e-book in pdf format that is searchable. The book to be scanned is 364 pages with both text and images. I have an hp scanjet 8250. What is the best Adobe product to use for this project? Does a project of this nature require more than one Adobe product?

Bookmaker

KellyMcC
Acrobat 9ExpertTeam
Registered: Jul 11 2011
Posts: 389
Accepted Answer
This is a very open-ended question. Scanning a book of that size will take quite a bit of time. Acrobat Pro does have an OCR (Optical Character Recognition) feature, but that isn't the primary focus of the software. The cleanup work on OCR suspects may be a VERY large undertaking. The second part of the question is creating an eBook: eBook files typically originate on the computer; they aren't scanned documents. Many different software applications go into creating an eBook. Here is a video on creating an eBook from Adobe InDesign CS5: http://tv.adobe.com/watch/visual-design-cs5/ebook-document-design-with-cs5-design-premium/

Kelly McCathran
Adobe Community Expert
Certified Technical Trainer+

Bookmaker
Registered: Sep 11 2011
Posts: 3
This information was very helpful. The process of converting a printed book to a digital book is more complicated than I thought. Nevertheless, it appears Acrobat (or Acrobat Plus) is suitable for a project of this nature. The eye-opener for me is the amount of time required to clean text files (by hand) after scanning. I do not yet understand how text files and image files are combined on a digital page to create a duplicate of the original printed page. However, I believe that will fall into place if I can lay out "by the numbers" the sequence of tasks (or steps) required for this project. As I understand it, adding "searchability" to a large pdf document requires software that includes that capability. In this case, the "large pdf document" is a 364-page book.

Bookmaker

KellyMcC
Acrobat 9ExpertTeam
Registered: Jul 11 2011
Posts: 389
Bookmaker,

Yes, the work involved will be massive, but it can be done. You might even want to check out some third-party OCR programs, where OCR is the complete focus of the application. Acrobat will make the document searchable once you run OCR on the images as well.

Kelly McCathran
Adobe Community Expert
Certified Technical Trainer+

Bookmaker
Registered: Sep 11 2011
Posts: 3
In my case, the time required is not a major deterrent (I'm retired). My scanner has OCR capability, which I used just long enough to create a jumbled mixture of words, letters and hieroglyphics. It took about half an hour just to clean one page. I'll try this: (1) use a third-party OCR program to scan a page to my computer; (2) use on-screen text editing to clean the file; and (3) use Acrobat to convert the clean file to a searchable pdf document. I assume Acrobat will somehow merge 364 individual (page) files into a single (book) file. Thanks for your help!
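
Before merging 364 individual page files, it helps to confirm that they are in page order and that none are missing. A minimal Python sketch of that check follows; the file names (`page1.pdf`, `page2.pdf`, ...) and the helper names are hypothetical, and the merge itself would still be done in Acrobat or another tool.

```python
import re


def page_number(filename):
    """Return the first run of digits in a file name as an integer.

    Assumes hypothetical names such as 'page12.pdf'.
    """
    match = re.search(r"\d+", filename)
    if match is None:
        raise ValueError(f"no page number found in {filename!r}")
    return int(match.group())


def order_pages(filenames):
    """Sort by numeric page number so 'page10' comes after 'page2'."""
    return sorted(filenames, key=page_number)


def missing_pages(filenames, total_pages):
    """List page numbers in 1..total_pages that have no file yet."""
    present = {page_number(name) for name in filenames}
    return [n for n in range(1, total_pages + 1) if n not in present]


if __name__ == "__main__":
    scanned = ["page10.pdf", "page2.pdf", "page1.pdf"]
    print(order_pages(scanned))
    print(missing_pages(scanned, 364)[:5])
```

A plain alphabetical sort would put `page10.pdf` before `page2.pdf`, which is why the sketch sorts on the number instead of the name.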

Bookmaker

UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
The OCR engine in Acrobat X is considered best-in-class and outperforms many of the dedicated applications. But no matter how good it is, OCR can never be right all the time, and the results depend on the quality of the scan and the visual appearance of the original. A pulp-fiction style novel should OCR quite well, probably with only one or two suspects per page. If the book uses a more complex serif font, or has complicated layouts with muted colors, background graphics and flowing text (such as a magazine page), it can take a while to sort out the problems.

I'd suggest installing the free trial of Acrobat X Pro, running a couple of pages through and seeing how it behaves with the various options.

The major factor with a scanned book is in how you decide to store the processed results - Acrobat can either store "searchable images" of each page, with hidden "real text" behind the words, or convert the words into "real text" by making a custom ClearScan font. Either way the result will look low-resolution on screen compared to the original scan, unless you use the "searchable image (exact)" conversion settings - and in that case your PDF will end up massive, as it'll be a set of 364 full-resolution images.
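
To get a feel for why 364 full-resolution images add up, here is a rough estimate in Python. The page size (8.5 x 11 in), 300 dpi and 24-bit color are assumptions for illustration, not figures from this thread. The function counts raw pixel data only; Acrobat compresses page images, so a real PDF would be considerably smaller.

```python
def raw_scan_bytes(pages, width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a set of scanned page images."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return pages * width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed settings: letter-size pages at 300 dpi.
    color = raw_scan_bytes(364, 8.5, 11, 300, 24)   # 24-bit color
    mono = raw_scan_bytes(364, 8.5, 11, 300, 1)     # 1-bit black and white
    print(f"24-bit color: {color / 1e9:.2f} GB uncompressed")
    print(f"1-bit mono:   {mono / 1e9:.2f} GB uncompressed")
```

Changing the dpi or bit depth in the call shows how quickly the total moves, which is a cheap way to compare scan settings before committing to all 364 pages.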

One option, again possible with Acrobat X Pro, is to OCR the file and then save it to Word - which will give you just the text and images on the page rather than a photo of the entire sheet of paper - you can then correct any layout issues, and print that Word document to a new, much smaller PDF file. I've got a tutorial in the Learning Center which shows this in action.

Naturally, OCR and exporting features should only be used where the original document is your own, public-domain or where you have the express permission of the copyright owner. Google is finding out even as I type that scanning books without permission will land you in court!