Dave Merchant October 15, 2010
The video explains how to convert your scanned PDF files to other file formats such as Word and Excel in Acrobat X.
With Acrobat X Pro we have a significantly improved optical character recognition (OCR) system, and also a class-leading export tool for saving PDF files as Microsoft Office documents.
Putting these together, you can see where we're going - Acrobat X becomes the ideal application to help you in those darkest of moments, when you find your only copy of a vital Word document or spreadsheet is a printout.
Converting back from paper to "original" can never be a totally seamless workflow, as the printout has of course lost some of the electronic structure that makes a table "a table," or identifies your header as being in Myriad Pro Semi-condensed Bold.
Acrobat won't be able to identify your fonts, but it can certainly rebuild the editable text, table cells and a lot of the formatting - and because your documents are saved into a fully editable file, reapplying your styles is really simple.
We can start by scanning to PDF directly with Acrobat using Create > PDF from Scanner.
Acrobat X can now automatically determine the color mode of each page and apply the most suitable compression. If I click Custom Scan, I can also decide whether the scanned image will be OCR-ed into a searchable document and, if so, which options I wish to run. I can also choose to make a PDF/A-1b archive file so the document is ready to be stored long-term.
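To give a feel for what "determining the color mode of each page" involves, here is a minimal Python sketch. It is only an illustration under my own assumptions, not Acrobat's actual algorithm: the function name `classify_color_mode` and the `tolerance` threshold are hypothetical, and the page is represented as a plain list of RGB pixels.

```python
def classify_color_mode(pixels, tolerance=8):
    """Classify a scanned page as 'bitonal', 'grayscale' or 'color'.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    tolerance: hypothetical allowance for scanner noise (an assumption).
    """
    is_bitonal = True
    for r, g, b in pixels:
        # If the channels differ noticeably, the pixel carries real color.
        if max(r, g, b) - min(r, g, b) > tolerance:
            return "color"
        # A mid-level gray pixel rules out a pure black-and-white page.
        level = (r + g + b) // 3
        if tolerance < level < 255 - tolerance:
            is_bitonal = False
    return "bitonal" if is_bitonal else "grayscale"


# Example: a page containing only black and white pixels.
print(classify_color_mode([(0, 0, 0), (255, 255, 255)]))
```

A scanner pipeline could then pick a compression method per page based on the returned mode, which is the general idea behind applying "the most suitable compression".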
I actually have a pre-scanned file in TIFF format, so before I open that, I'll click Edit > Preferences and check how Acrobat is going to handle TIFFs.
Under Convert to PDF, I'll scroll down and click TIFF, and click Edit Settings, and I get pretty much the same options as I saw earlier.
I can turn on Optimization and OCR, and under the settings I can choose to apply the new Adaptive Compression (the best possible method based on the colors of the file). I can decide whether to deskew the page to remove any misalignment, whether to remove the background to deal with colored paper, and whether to improve the quality of scans from newspapers or magazines by sharpening and descreening.
I can also apply OCR automatically and choose the type to apply - I'll stick with Searchable Image in this case.
Click OK, OK, OK, and Acrobat will now remember that.
Now that it knows what I want to do with TIFFs, I can just open one with Create > PDF from File, and choose a TIFF - Acrobat will automatically optimize and OCR the file, and produce a Searchable Image PDF.
The document we're looking at is still visually an image, so if we zoom right in we can see it's still made out of pixels, but it's searchable too - if I open the Find tool (Ctrl+F or Cmd+F) and search for the word "grill", you can see the matches being highlighted.
Now that we have our PDF file, saving to Excel is really easy - just click File > Save As > Spreadsheet > Microsoft Excel Workbook (compatible with Office 2007 and 2010).
I'll click Settings and make sure that we're running OCR again - as this is a searchable image PDF, we need to run OCR to find the table. Click OK and save the file...
...and here's what it looks like in Excel.
We don't export cell colors, but as you can see it's done a great job of finding the cells, it's formatted our header text, and apart from a couple of places where the decision to center or left-justify was too close to call, it's the same table that a moment ago we only had on paper.
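Why would center versus left-justify ever be "too close to call"? The following Python sketch shows one way such a decision could be made from geometry alone. It is an assumption for illustration, not how Acrobat does it: `infer_alignment` and its `tolerance` parameter are hypothetical names, and coordinates are simple horizontal positions in arbitrary units.

```python
def infer_alignment(cell_left, cell_right, text_left, text_right, tolerance=2.0):
    """Guess text alignment inside a table cell from horizontal positions.

    tolerance: hypothetical slack (same units as the coordinates) within
    which the left and right gaps are treated as equal.
    """
    left_gap = text_left - cell_left
    right_gap = cell_right - text_right
    # Roughly equal gaps on both sides look like centered text.
    if abs(left_gap - right_gap) <= tolerance:
        return "center"
    # Otherwise the text hugs whichever edge has the smaller gap.
    return "left" if left_gap < right_gap else "right"


# Short text near the left edge of a wide cell: clearly left-aligned.
print(infer_alignment(0, 100, 3, 40))
# Text that nearly fills the cell: the gaps are almost equal, so the
# guess comes out as "center" even if the original was left-aligned.
print(infer_alignment(0, 100, 3, 96))
```

The second call is the ambiguous case: when the text almost fills the cell, both gaps are tiny and the scan simply doesn't contain enough evidence to tell the two alignments apart.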
This last row even has wrapped flowing text - Acrobat X understands the difference between line breaks and paragraphs.
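The difference between a line break and a paragraph break can be sketched in Python as well. Again, this is a simplified illustration under my own assumptions rather than Acrobat's method: the function `group_lines_into_paragraphs`, the `gap_factor` threshold and the `(top, height, text)` line format are all hypothetical.

```python
def group_lines_into_paragraphs(lines, gap_factor=0.8):
    """Merge recognized text lines into paragraphs.

    lines: list of (top, height, text) tuples sorted top to bottom, where
    'top' grows downward (an assumed, simplified OCR output format).
    gap_factor: hypothetical threshold; a vertical gap larger than
    gap_factor * line height starts a new paragraph.
    """
    paragraphs = []
    current = []
    prev_bottom = None
    for top, height, text in lines:
        # A large vertical gap since the previous line means a new paragraph.
        if prev_bottom is not None and (top - prev_bottom) > gap_factor * height:
            paragraphs.append(" ".join(current))
            current = []
        # Otherwise the line is just wrapped text in the same paragraph.
        current.append(text)
        prev_bottom = top + height
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


lines = [(0, 10, "The grill"), (12, 10, "is hot."), (40, 10, "Next item")]
print(group_lines_into_paragraphs(lines))
```

Lines that sit close together are joined with a space so the text can re-flow, while a larger gap starts a fresh paragraph.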
Of course we're not limited to tables and Excel files - here I have a couple of pages from a magazine. Again, this is a scan of a paper original, but this time I haven't applied any OCR. It's not even particularly straight; it's simply a series of full-page images.
I'll simply click File > Save As > Microsoft Word, and choose the Office Open XML format.
Again, under Settings I'll make sure we're performing OCR - this time we certainly need to - and I'll keep Flowing Text and include images.
Click OK, click Save ...
and here's the result.
Now, we don't of course have exactly the same fonts, but we do have pretty much exactly the same formatting.
Acrobat X happily recognized things like paragraphs in columns, and down here on page 2 we have a headline that spans multiple columns.
All our images are placed properly, along with captions, and Acrobat has adapted the text so it exactly matches the placements within the PDF original - the last word on page 3 is "lifeblood", and here it is again in the PDF file.
We need to tidy up our fonts a bit, but with the help of Acrobat X, we're very close to a workable copy of a document that not so long ago would've been a lost cause, and a long night of typing it all in again.
In the process we've also gained an archive copy of the original document, which we can make fully searchable too.
Acrobat X Pro | Acrobat X Standard | Acrobat X Suite