by Donna L. Baker, ACE, Baker Communications
You preserve written documents and images by scanning them into your computer and saving them as images. See how Acrobat 8 takes scans to the next level.
One of several methods for producing an Adobe PDF file is by scanning documents into Acrobat. Over the last few versions of Acrobat, we’ve seen a gradual improvement in the integration of scanner software with Acrobat and improvements in the accuracy of the text capture. Acrobat 8 Professional continues that trend, offering an improved OCR engine, easier file optimization and the ability to create a PDF/A standards-compliant document.
Note: I’m looking at scanning with a simple home or office scanner. For a large archival project, consider using high-speed scanners or dedicated scanning software for efficiency.
Acrobat works with your scanner software directly. Any scanner configured for use with your computer or network can be used with Acrobat.
Follow these steps to scan a document into Acrobat:
1. Choose Document > Scan to PDF or click the Create PDF button and choose From Scanner to open the Acrobat Scan dialog box (Figure 1).
2. Choose from the list of scanners configured for your system from the Scanner drop-down menu.
Note: Click Scanner Options to open a dialog box in which you can select a paper size, and whether to use the scanner’s interface or hide it and use Acrobat only. The defaults are to hide the scanner’s interface and use a USLetter-sized paper.
3. Choose Front Sides or Both Sides from the Sides drop-down menu. The default is Front Sides; choose Both Sides if you are working with two-sided documents and a duplex scanner.
4. In Color Mode, choose from Color, Grayscale or Black and White options. Black and White is optimal for capturing text.
5. Specify a resolution for the scanned PDF. A resolution of 300dpi has enough information for text capture, but not too much as to bloat the file size.
6. Specify an output destination for the scanned page. Choose either a new document or append the scan to an existing file.
7. Select the Make PDF/A Compliant check box to apply standards features to the captured file.
Caution: Choosing the Make PDF/A Compliant option disables OCR choices, described in the section “Choosing an output style”.
8. Balance the size and quality by dragging the slider between the Small Size and High Quality ends of the continuum.
9. Choose a Text Recognition option; the default is Make Searchable (Run OCR); choose whether to include metadata, and accessibility features.
10. Click Scan; a Save Scanned File as dialog box opens. Name the file and specify its location.
11. Click Save to close the dialog box and start the scan.
12. Once the page is scanned, the Acrobat Scan dialog box opens (Figure 2). Select Scan more pages to continue scanning or Scanning complete to finish and then click OK.
Note: If you choose options to add metadata or accessible features, the Document Properties dialog box opens. Add the content as desired.
The quality of the text capture results are directly related to the clarity of the text on the scan. When possible, use black text on a white background. Colored or decorative fonts are difficult for the program to recognize and can lead to search and indexing errors.
A simple font at 12 points in size is optimal for capture. If your document’s text is smaller, increasing the scan resolution may provide enough additional data for interpreting the content reliably.
Older versions of Acrobat, as well as image-capture programs such as Photoshop, create image PDF files. The text and content on the page can be read and printed, but the text on the page hasn’t been defined as text characters.
If you open a document and aren’t sure if it contains images and text or an image of the page, click the page with the Select tool on the Basic toolbar. Select the TouchUp Object tool on the Advanced Editing toolbar and click the text on the page.
If you click a text area in the document and a block is selected, you have an image of the text (Figure 3). If you see the content of each word selected, the page contains searchable text.
Tip: If you hold the tool over the image for a few seconds, the popup menu displays, shown in the figure. The only commands available are to capture the text or copy the image—clearly there isn’t any text on the page.
Follow these steps to capture the content of an image PDF page:
1. Click Edit to open the Recognize Text – Settings dialog box. Choose options for OCR Language, Output Style, and Resolution and click OK to close the dialog box.
Tip: If you choose the Searchable Image (Exact) style, you can’t choose an image downsampling resolution.
2. Click OK to close the Recognize Text dialog box and start the capture process.
In Figure 5, it’s evident the content has been captured: Text has been selected with the Select tool, and the Text object popup menu is shown.
The same three OCR settings are available in the PDF Output Style drop-down menu and the Acrobat Scan dialog box, shown in Figure 1. The options include:
Both searchable image options leave the final PDF easy to read, as the captured page looks the same as the original. A formatted page, on the other hand, often results in character and font substitutions, as you see in Figure 6.
Characters that can’t be processed are called suspects. Choose Document > OCR Text Recognition > Find First/All OCR Suspect(s) commands to examine the converted page. The example document has no identified suspects, although there are a number of font discrepancies and misspellings.
The goal in scanning images is to produce a balance between quality and file size. To optimize a scanned page, click Options (next to the Optimization slider) on the Acrobat Scan dialog box shown in Figure 1, or choose Document > Optimize Scanned PDF to open the dialog box shown in Figure 7.
Choose the settings you want to apply to the scan and click OK. The options and their features are listed in Table 1.
Table 1. Image features to optimize
Image effect or outcome
Automatic or Off
Off, Low, Medium, High
Increases contrast between letters and background for clarity
Edge shadow removal
Off, Cautious, Aggressive
Removes visible shadowing on letters; too much removal makes some letters indistinguishable
Off, Low, Medium, High
Smooths uneven color
Automatic or Off
Removes the appearance of ink dots on the page
On or Off
Removes the glow surrounding text characters
Interestingly, I have experienced a significant difference in the ability of the OCR engines in Acrobat 7 and Acrobat 8. The image in Figure 8, for example, would have required a significant amount of tweaking to capture the text accurately in Acrobat 7. By contrast, capturing the text in Acrobat 8 is flawless.
If you aren’t sure how well the content has been captured, choose Document > Examine Document to open the Examine Document dialog box. In a scanned and captured file, you see Hidden text is selected on the dialog box (Figure 9). Click Preview to open the display.
Scroll through the preview pages to check out the content on the hidden-text layer. You can see the captured content on the preview window, and the actual captured text in the Hidden text field on the dialog box (Figure 10).
Note: You don’t want to delete the hidden text from the file! Read about the Examine Document process in my previous article.
Do you regularly scan documents into Acrobat for producing PDF files? Have you found any methods or tips that are especially useful? How about problems—have you run across any difficult scanning situations? How did you resolve the issue?