This tutorial shows you how to work with the Scan and Optimize features in Acrobat X.

Exporting scanned PDF files to other file formats with Acrobat X

Learn how to take a scanned PDF document and convert it to Word and Excel.

By Dave Merchant – October 15, 2010


With Acrobat X Pro, we have a significantly improved optical character recognition (OCR) system, and also a class-leading export tool for saving PDF files as Microsoft Office documents.

Putting these together, you can see where we're going: Acrobat X becomes the ideal application to help you in those darkest of moments, when you find that your only copy of a vital Word document or spreadsheet is a printout.

Converting back from paper to "original" can never be a totally seamless workflow, as the printout has of course lost some of the electronic structure that makes a table "a table," or identifies your header as being in Myriad Pro Semi-condensed Bold.

Acrobat won't be able to identify your fonts, but it can certainly rebuild the editable text, the table cells and a lot of the formatting. Because the result is saved as a fully editable file, reapplying your styles is simple.

We can start by scanning to PDF directly with Acrobat using Create > PDF from Scanner.

Acrobat X can now automatically determine the color mode of each page and apply the most suitable compression. If I click Custom Scan, I can also decide whether the scanned image will be OCRed into a searchable document and, if so, which options to run. I can also choose to make a PDF/A-1b archive file, so the document is ready for long-term storage.

I actually have a pre-scanned file in TIFF format, so before I open it, I'll click Edit > Preferences and check how Acrobat is going to handle TIFF files.

Under Convert to PDF, I'll scroll down, select TIFF and click Edit Settings. I get much the same options as I saw earlier.

I can turn on Optimization and OCR. Under the settings, I can apply the new Adaptive Compression (the best possible method based on the colors in the file). I can also decide whether to deskew the page to remove any misalignment, remove the background to deal with colored paper, or improve the quality of scans from newspapers or magazines by sharpening and descreening.

I can also apply OCR automatically and choose the type to apply; I'll stick with Searchable Image in this case.

Click OK to close each dialog, and Acrobat will now remember these settings.

Now that it knows what I want to do with TIFFs, I can just open one with Create > PDF from File and choose a TIFF. Acrobat will automatically optimize and OCR the file, and produce a Searchable Image PDF.

The document we're looking at is still visually an image, so if we zoom right in, we can see it's still made of pixels. But it's searchable too: if I open the Find tool (Ctrl+F or Cmd+F) and search for the word "grill", you can see the matches being highlighted.

Now that we have our PDF file, saving to Excel is easy: just click File > Save As > Spreadsheet > Microsoft Excel Workbook (compatible with Office 2007 and 2010).

I'll click Settings and make sure OCR is enabled again; as this is a searchable image PDF, we need to run OCR to find the table. Click OK and save the file...

...and here's what it looks like in Excel.

We don't export cell colors, but as you can see, it's done a great job of finding the cells and formatting our header text. Apart from a couple of places where the choice between centering and left-justifying was too close to call, it's the same table that a moment ago we only had on paper.

This last row even has wrapped flowing text - Acrobat X understands the difference between line breaks and paragraphs.

Of course, we're not limited to tables and Excel files. Here I have a couple of pages from a magazine. Again, this is a scan of a paper original, but this time I haven't applied any OCR. It's not even particularly straight; it's simply a series of full-page images.

I'll simply click File > Save As > Microsoft Word, and choose the Office Open XML format.

Again, under Settings I'll make sure we're performing OCR (this time we certainly need to), and I'll keep Flowing Text and Include Images selected.

Click OK, click Save... and here's the result.

Of course, we don't have exactly the same fonts, but the formatting is almost identical.

Acrobat X happily recognized things like paragraphs in columns, and down here on page 2 we have a headline that spans multiple columns.

All our images are placed properly, along with their captions, and Acrobat has laid out the text so that it matches its placement in the PDF original: the last word on page 3 is "lifeblood", and here it is again in the PDF file.

We need to tidy up our fonts a bit, but with the help of Acrobat X, we're very close to a workable copy of a document that not so long ago would have been a lost cause, or a long night of retyping.

In the process we've also gained an archive copy of the original document, which we can make fully searchable too.



Products covered:

Acrobat X


Comments for this tutorial are now closed.

Lori Kassuba

March 2, 2016

Hi,

Another alternative is to scan to TIFF format, then you can open the TIFF file in Acrobat.

Thanks,
Lori

My scanner is scanning only in Word format

February 17, 2016

My scanner, an HP Scanjet 5590, is scanning only in Word format, but I need it in PDF format. Please email me details.

Please help.

Lori Kassuba

January 7, 2016

Hi asif,

Be sure to select the “Prompt for scanning more pages” checkbox in the Custom Scan Dialog.

Thanks,
Lori

asif

January 7, 2016

How do I continue scanning more pages into the same PDF?

Lori Kassuba

May 23, 2014

Hi sarat,

Here is a tutorial that explains how to edit text from a scanned PDF:
https://acrobatusers.com/tutorials/how-to-edit-a-scanned-pdf-file

Thanks,
Lori

sarat

May 20, 2014

I scan documents in PDF or JPG format, but I need to know how to edit them after scanning. Please tell me the process.

Thanking You,
Sarat

David Kastendick

August 16, 2013

Hi Lisa,

This may actually be an issue with the way that the PDF was originally created.  Could you open the PDF and choose File > Document Properties?  What does it say next to the ‘Application’ and ‘PDF Producer’ lines?

Thanks,
David

lisa burley

August 9, 2013

I'm trying to convert a 76-page PDF file to Excel using Acrobat X Pro (File > Save As > Spreadsheet > Microsoft Excel Workbook), but only the right side of the PDF image shows up, in column A of the spreadsheet.
