Word docx makes huge PDF/A files very slowly

dhillier
Registered: Aug 14 2010
Posts: 8
Answered

By chance I happened to make PDF/A-1b files from both doc and docx Word files with the same complex source material (using the Acrobat 8.2.3 "printer"). The docx PDF took much longer to produce and ended up much larger in size (about 80% bigger). Repeating the comparison with other complex Word files produced comparable discrepancies. I was "printing" from Word 2010 and lack a platform to check to see if Word 2007 does the same.
One wonders what meaningless (hopefully) docx information occupies the extra space, and also perhaps about the reliability of the PDF/A-1b as an archival format. It would be good to know if Acrobat 9 resolves the issue.

UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
If you open the Optimize dialog there's a button labeled "Audit space usage" which will show you what types of content make up what percentage of the data.

PDF/A files are always larger than their PDF counterparts as the standard requires complete font embedding and limits the compression we can apply to binary streams - but directly comparing DOC and DOCX inputs will never be simple, as Word handles even visually-similar content in very different ways when outputting to print.
dhillier
Registered: Aug 14 2010
Posts: 8
Thanks much to UVSAR for the tip about auditing space usage.

Here is a representative example of my experience "printing" a PDF from the same source material using DOC and DOCX file structure (apologies if the table doesn't line up):

                                 DOC         DOCX   INCREASE
Images                    20,946,924   38,096,356        82%
Content Streams               53,600      105,568        97%
Fonts                        218,989      219,039         0%
Document Overhead             18,438       51,064       177%
Color Spaces                   2,718       45,080      1559%
Extended Graphics States          69           69         0%
Cross Reference Table          4,900       10,040       105%
Total                     21,245,638   38,557,216        81%
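
For anyone who wants to redo the arithmetic on their own audit figures, here is a minimal Python sketch. It only recomputes the INCREASE column from the byte counts posted above and estimates how much of the overall growth comes from images. The names `audit`, `percent_increase`, and `image_share` are illustrative and are not anything Acrobat provides.

```python
# Byte counts copied from the audit table above: (DOC, DOCX).
# The dictionary layout and all names here are illustrative assumptions.
audit = {
    "Images": (20_946_924, 38_096_356),
    "Content Streams": (53_600, 105_568),
    "Fonts": (218_989, 219_039),
    "Document Overhead": (18_438, 51_064),
    "Color Spaces": (2_718, 45_080),
    "Extended Graphics States": (69, 69),
    "Cross Reference Table": (4_900, 10_040),
    "Total": (21_245_638, 38_557_216),
}


def percent_increase(before: int, after: int) -> int:
    """Relative growth from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


increases = {
    name: percent_increase(doc, docx) for name, (doc, docx) in audit.items()
}

for name, pct in increases.items():
    print(f"{name:<26}{pct:>6}%")

# Fraction of the total size growth that is attributable to image data,
# using the posted "Total" row as the denominator.
total_growth = audit["Total"][1] - audit["Total"][0]
image_growth = audit["Images"][1] - audit["Images"][0]
image_share = image_growth / total_growth
print(f"Images account for about {image_share:.0%} of the growth")
```

On these numbers the color-space entry grows the most in relative terms, but nearly all of the extra bytes are image data, so the images are the place to look first.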

I realize it's unreasonable to expect older software to handle new formats efficiently. Perhaps Acrobat 9 does better.

But it turns out that saving PDF files directly from Word pretty much resolves my issue. Using the source material from the example above, Word 2010 produced from the DOC file a PDF of 15,734,927 bytes and from the DOCX file a PDF of 15,789,435 bytes. The tiny difference in file size is no more than one would expect from the overhead that supports presumably improved file structure.

And there was a bonus! Both of the PDFs produced by Word passed Acrobat 8's test for PDF/A-1a as well as PDF/A-1b. That's something that I wasn't able to achieve using the Acrobat ver. 8.2.3 "printer".

An attempt to produce a PDF/A file with Word 2007 from the DOCX version produced a slightly larger (16.7 MB) PDF. That document threw one arcane error when tested for PDF/A-1a compliance in Acrobat 8, but no errors when examined for PDF/A-1b.

UVSAR's second point about PDF/A files certainly makes sense, but was not my issue. One would be surprised not to pay a price for meeting an archival standard.
leonardr
Expert
Registered: Feb 14 2006
Posts: 333
dhillier wrote:
By chance I happened to make PDF/A-1b files from both doc and docx Word files with the same complex source material (using the Acrobat 8.2.3 "printer"). The docx PDF took much longer to produce and ended up much larger in size (about 80% bigger). Repeating the comparison with other complex Word files produced comparable discrepancies. I was "printing" from Word 2010 and lack a platform to check to see if Word 2007 does the same.
I am confused how you are using Acrobat 8 with Word 2010 as that environment is NOT supported by Adobe....

Also, while you can use the Adobe PDF printer in general, Adobe recommends the use of the "Create Adobe PDF" buttons/ribbon that we install into Office to produce higher fidelity PDF documents from Office.

Leonard Rosenthol
PDF Standards Architect
Adobe Systems

dhillier
Registered: Aug 14 2010
Posts: 8
Sorry to have been confusing. I am indeed using Acrobat 8 in an environment that is not supported by Adobe and then letting other Acrobat users know how it goes. It would be just great to hear from someone who has explored my particular situation using Acrobat 9, for that would help me decide whether the expense of still another Acrobat upgrade is warranted.

But with respect to leonardr's second point, until recently I was using Acrobat 8.2.3 with Office 2007. It proved not to be robust in the face of operating system upgrades. On two separate occasions I lost the Acrobat buttons on the Office 2007 ribbon. The second time that it happened I took the path of least resistance and simply started using the old reliable Adobe "printer". As you know, reinstalling Acrobat 8 sets in motion a long train of sequential upgrades; not a good use of my time. I like the "printer" a lot, for it can be used effectively with a wide range of software.
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
dhillier wrote:
And there was a bonus! Both of the PDFs produced by Word passed Acrobat 8's test for PDF/A-1a as well as PDF/A-1b. That's something that I wasn't able to achieve using the Acrobat ver. 8.2.3 "printer".
That's entirely as expected: PDF/A-1a requires that the document have a tagged structure, and that can only be done with generation routes that use "export", either from our Office PDFMaker plugin or from the native application. When you print to Distiller, the document structure is not passed through (it is, after all, a virtual "printer", so it is only concerned with the visible layout), and that means you can only ever use it to make PDF/A-1b files, as these don't need tags.
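
As a rough illustration of the difference, a tagged PDF carries a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked true` in its document catalog. The Python sketch below is a crude byte-level heuristic built on that fact, not a PDF/A validator: the function name `looks_tagged` is hypothetical, the check assumes the catalog is stored uncompressed (which should hold for PDF 1.4-based files such as PDF/A-1), and a positive result says nothing about full PDF/A-1a conformance. Acrobat's own preflight check remains the real test.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes contain the tagged-PDF markers.

    Assumption: the document catalog is not hidden inside a compressed
    object stream, so the keys appear literally in the file.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked


# Hand-written catalog fragments for illustration only (not real files).
exported_catalog = (
    b"<< /Type /Catalog /Pages 2 0 R "
    b"/MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
)
printed_catalog = b"<< /Type /Catalog /Pages 2 0 R >>"

print(looks_tagged(exported_catalog))
print(looks_tagged(printed_catalog))
```

To try it on a real file, read the file in binary mode and pass the bytes to the function.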
leonardr
Expert
Registered: Feb 14 2006
Posts: 333
dhillier wrote:
It would be just great to hear from someone who has explored my particular situation using Acrobat 9, for that would help me decide whether the expense of still another Acrobat upgrade is warranted.
Acrobat 9 does not support Office 2010 either. Please see our FAQ at .

Leonard Rosenthol
PDF Standards Architect
Adobe Systems

dhillier
Registered: Aug 14 2010
Posts: 8
Keeping ahead of word processing software upgrades is obviously very difficult. That's what makes the virtual printer approach so attractive, despite its more generic relationship to the document.

So the question that remains is whether Acrobat 9 fully supports the Word DOCX format, which I remember first using with Word 2007.
leonardr
Expert
Registered: Feb 14 2006
Posts: 333
dhillier wrote:
So the question that remains is whether Acrobat 9 fully supports the Word DOCX format, which I remember first using with Word 2007.
Office 2007 is supported by Acrobat 8.1 and later, which includes 9. It will incorporate the necessary "Acrobat Ribbon Bar" into the product to enable higher fidelity conversions of your Office documents into PDF and PDF/A.

As to your size issues - PDF/A-1a (the "accessible") conformance level of PDF/A will always generate larger PDFs than PDF/A-1b (the "basic") level. That has NOTHING to do with their starting as .doc vs. .docx.

Leonard Rosenthol
PDF Standards Architect
Adobe Systems