
Converting PDF to Word Doc - losing formatting

measley
Registered: Aug 7 2009
Posts: 5

I am trying to convert a PDF document to a Word doc. The PDF was originally created in Adobe Acrobat 6.0 and I have Acrobat 8.0. When I save as a Word doc I lose the formatting of the PDF (this is purely a text doc, no images). How can I do the conversion and not lose the formatting?

My Product Information:
Acrobat Standard 9.1.3, Windows
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Whether formatting is preserved when exporting or saving from PDF to MS Word depends on how well-formed the PDF's structure tree is.
It is the content of the structure tree that provides the format information when performing the export or save as.

This, of course, presumes that the PDF's source file was associated with an application that adequately supports a tagged output PDF and
that the final PDF is a tagged PDF with a well-formed structure tree.

To see if the PDF is tagged or not look at the Description tab of the PDF's Document Properties. The bottom left will have
"Tagged PDF: Yes" or "Tagged PDF: No".

If "No", the PDF has no structure tree. Using the Acrobat Professional product you can have it make a "best estimate" of what the structure tree might be and go from there. Expect to do some amount of "clean up" in the Word file.

If "Yes", the PDF ought to have a structure tree and something in it. A "yes" does not mean you have a well-formed structure tree.
Use the Full Check, Adobe PDF checking option, to get a better sense of what is present and what is missing.
(Advanced > Accessibility > Full Check)

A "well-formed" structure tree in a tagged output PDF is the result of a "well-formed" authoring file in an application that provides robust support for producing the tagged output PDF.
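
If you want a rough look at the same thing outside Acrobat, here is a minimal sketch in Python. It assumes the usual markers of a tagged PDF, a /MarkInfo dictionary with /Marked true and a /StructTreeRoot entry in the document catalog, and it simply scans the raw bytes for them. The function name `looks_tagged` is made up for illustration. This is only a heuristic: if the catalog is stored in a compressed object stream, or /MarkInfo is an indirect reference, the scan will miss it, so a negative result proves nothing. And as with "Tagged PDF: Yes", a positive result says nothing about whether the structure tree is well-formed.

```python
import re

def looks_tagged(pdf_bytes: bytes) -> dict:
    """Heuristic scan for tagged-PDF markers in raw, uncompressed PDF bytes."""
    # /MarkInfo << ... /Marked true ... >> written inline in the catalog
    marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None
    # /StructTreeRoot is the catalog entry that points at the structure tree
    has_tree = b"/StructTreeRoot" in pdf_bytes
    return {"marked": marked, "struct_tree_root": has_tree}

# Stand-in catalog fragments; a real file would be read with open(path, "rb").read()
tagged_sample = b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
untagged_sample = b"<< /Type /Catalog /Pages 2 0 R >>"

print(looks_tagged(tagged_sample))
print(looks_tagged(untagged_sample))
```

Acrobat's Document Properties and Full Check remain the better tools; the sketch only shows what the "Tagged PDF" flag is likely keyed on.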

As you have noted, a PDF without a well-formed structure tree can still have its page content copied to Word; formatting then becomes part of the "clean up" activity in Word. A Word template that reflects what you observe of the PDF's page content "format" can help move the clean up activity along.


Be well...


measley
Registered: Aug 7 2009
Posts: 5
Thank you so much!!! The PDF was not tagged so I added tags to it, but the formatting is still lost after conversion. This is a two column document - can you think of anything else I might be able to do so I don't lose the two column formatting? Thanks again for your help!
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi,
On your original PDF, what was the "Application" and "PDF Producer"?
Look at the Description tab of the PDF's Document Properties.

What specific version of Acrobat are you using?

Some observations:

For a scanned image in PDF (simplistic here, but... ) -
If an untagged PDF is just the imported image from a scan, it needs OCR applied as step one toward becoming tagged.
Then, using Acrobat, tag the PDF. Typically, the OCR text layer will be left to right, top down. No columns.

For a PDF with content from some authoring application (e.g., not a scanned image) -
If the PDF is the output of a non-Adobe process then, unfortunately, the PDF may be marginal.
That is to say, the processes used all too often do not adequately adhere to the criteria identified in the ISO standard for PDF or the Adobe PDF Reference
documentation that preceded the ISO standard.
Consequently, what's "painted" onto the PDF page for content may not have the "hook" that identifies that there are columns in play.
Similarly, document hierarchy (heading levels, paragraph styles, lists, tables, etc.) is not identified or is inadequately identified.
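
To illustrate why the missing column "hook" matters, here is a minimal sketch in Python. Everything in it is hypothetical: the `Block` type, the coordinates, and the `split_x` gutter position are made up, and y is measured down from the top of the page for simplicity. It is not how Acrobat works internally; it only shows that a plain left to right, top down reading of the page interleaves the two columns, while a reading that knows where the columns are keeps each column together.

```python
from typing import NamedTuple

class Block(NamedTuple):
    x: float   # left edge of the text block
    y: float   # distance from the top of the page
    text: str

def naive_order(blocks):
    """Left to right, top down: no knowledge of columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]

def column_order(blocks, split_x):
    """Read the whole left column first, then the right column."""
    left = sorted((b for b in blocks if b.x < split_x), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= split_x), key=lambda b: b.y)
    return [b.text for b in left + right]

# A made-up two column page with two lines in each column
page = [
    Block(50, 100, "L1"), Block(320, 100, "R1"),
    Block(50, 120, "L2"), Block(320, 120, "R2"),
]

print(naive_order(page))         # ['L1', 'R1', 'L2', 'R2']
print(column_order(page, 300))   # ['L1', 'L2', 'R1', 'R2']
```

A well-formed structure tree carries the second kind of ordering with the document, so the export does not have to guess.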

Can Acrobat recognize columns in an untagged PDF during "add tags" and provide an export output into Word that reflects the columns?
Yes. Ran some trials. Word file - two columns - filler text. Output was an untagged PDF. Used Acrobat to add tags. Result was fine (expected as the "document" was very simple). Export to Word yielded the two columns. Another trial added Word headings. Acrobat's add tags dealt with these effectively.
Export to Word reflected the headings.

Do note that PDF content to Word typically gets parked in Word "frames". Adds to the clean up level of effort.
With that said, if the content author is using an appropriate authoring application with a measure of rigor vis-a-vis the document's logical hierarchy AND does so in a manner that supports the goal of a well-formed tagged output PDF, then what's in the PDF can be exported into Word with very little clean up required.

Be well..
