What are “PDF tags” and why should I care?

By February 1, 2006

 

PDF files contain many things. At a minimum, each contains the text, fonts, graphics, bookmarks, links, form fields and other elements of content that go to make up the electronic document "package" that is a PDF.

The order (in both temporal and spatial senses) of these contents, and why that order should matter to you, isn't obvious, but it is nonetheless important to understand.


What is content order?

There are four basic ways in which the contents of any given PDF page may be ordered for the purpose of expression to a user. The first two are familiar to most users: screen and print. In both cases, the concept of "content order" is meaningful only in terms of the so-called "z-order": which item is "in front of" or "behind" the other items appearing at the same location on the page.

For example, to correctly display a shaded box containing text, the correct z-order places the text "in front of" the shading. If the text appears "behind" the shading, it would disappear on screen and in print — likely not the desired effect. Getting the z-order right, however, says nothing about the linear ordering of the characters, words, lines and paragraphs of the text — that is an entirely different issue.

Up until version 5 of Adobe Acrobat, z-order was the only type of content order that could be applied to PDF-based content. Text and graphics appeared on the screen in an order that best supported accurate display and printing, and that was enough.

In Acrobat 5, Adobe Systems began to address the other two key considerations for content expression. The third type of content order, reading order, denotes the linear ordering of the letters and words of the text. Without the concept of reading order, a PDF file has no way of knowing which letter or word precedes another on the page. All it knows is which z-order to use for the objects appearing at a given coordinate on the page.

figure
TouchUp Reading Order


With reading order, the characters on the page are understood to have a linear sequence quite apart from z-order. When the reading order is correct, it becomes possible to accommodate disabled users who require a simple stream of text, and to support PDAs and other mobile devices that reflow content to display a wide-formatted page on a small, narrow screen.
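
The difference between drawing order and reading order can be sketched in a few lines of Python. This is only a toy model, not how Acrobat stores or derives either order: the `Fragment` class, its `paint` index and the coordinates are all hypothetical, and the "naive" reading order simply sorts top to bottom and then left to right.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float    # left edge in points
    y: float    # baseline in points (PDF pages have their origin at the bottom left)
    paint: int  # hypothetical index: the order in which the fragment is drawn


# Four words on two lines, drawn in an order that suits the renderer, not the reader.
fragments = [
    Fragment("structure.", 200, 700, 0),
    Fragment("Tags", 72, 720, 1),
    Fragment("describe", 110, 720, 2),
    Fragment("document", 72, 700, 3),
]


def paint_order(frags):
    """Text as it comes out when fragments are taken in drawing order."""
    return " ".join(f.text for f in sorted(frags, key=lambda f: f.paint))


def naive_reading_order(frags):
    """Text sorted top to bottom (higher y first), then left to right."""
    return " ".join(f.text for f in sorted(frags, key=lambda f: (-f.y, f.x)))
```

Both orders display identically on screen and in print, because every word lands at the same coordinates. Only the second one yields a sentence when the text is pulled out as a stream, which is what a screen reader or a reflowing viewer needs.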

The fourth type of content order, logical, is an extension of the reading order concept. Logical order makes it possible to identify the relationships between blocks of content, each of which contains text in correct reading order. Logical order allows you to introduce concepts such as tables, lists and headings, as well as provide alternate text for images, descriptive text for links and form fields, and so on. A PDF file that includes logical order has been "tagged"; however, the fact that tags exist in a given PDF is itself no indication that the logical order is valid, or even close to valid. Properly tagging a PDF file is not the simplest of matters, as we'll see.

Taken together, the concepts of reading order and logical order are described as "structure." Properly executed, PDF structure binds all the document's content together as an ordered whole, with consistent, high-quality results in all modes of expression.
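
As a rough illustration of what "structure" means, the sketch below models a tag tree and walks it in order. The `Tag` class and the `linearize` function are hypothetical helpers written for this example; the tag names (`Document`, `H1`, `P`, `Figure`) are standard PDF structure types, but the way a real screen reader announces them will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    kind: str          # structure type, for example "H1", "P" or "Figure"
    text: str = ""     # text content in reading order, if any
    alt: str = ""      # alternate text, used for images
    children: list = field(default_factory=list)


def linearize(tag, depth=0):
    """Walk the tree depth-first and return one indented line per tag."""
    label = tag.alt or tag.text
    line = "  " * depth + tag.kind + (": " + label if label else "")
    lines = [line]
    for child in tag.children:
        lines.extend(linearize(child, depth + 1))
    return lines


doc = Tag("Document", children=[
    Tag("H1", text="Annual Report"),
    Tag("P", text="Revenue grew."),
    Tag("Figure", alt="Bar chart of revenue by year"),
])
```

Calling `linearize(doc)` returns the heading, the paragraph and the figure's alternate text in sequence, each labeled with its role. That combination of sequence and role is what a flat stream of characters cannot provide.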

Why should I care about content order?

Imagine using your computer with the screen turned off, and you'll get some idea of how important logical order is to anyone who needs screen-reader software to read your PDF.

To most users, PDF files seem to work fine just the way they are. They look right and print well, and that is good enough. For many uses and users, however, it is not enough — and it may even be against the law. A PDF file that's not structured correctly is not accessible to users with disabilities, and it won't display properly on a PDA.

Section 508, an amendment to the Rehabilitation Act whose requirements took effect in June 2001, requires all US Federal agencies to ensure that their web-based content is accessible to those who must use assistive technology to access electronic documents. State governments are following with their own accessibility requirements. Section 508 also requires that Federal contractors submit accessible documentation as part of their contract specifications, and many states are also beginning to implement this mandate.

Apart from government-mandated accessibility for electronic content, many businesses and organizations are finding sound business reasons for producing accessible documentation, forms, brochures and other content.

Beyond meeting the needs of the disabled, properly structured PDF files have a number of other qualities that can dramatically expand their usefulness in a variety of applications.

To enable the use of PDFs on mobile devices, Adobe offers versions of its free Adobe Reader software for a range of portable devices and operating systems, including Pocket PC, Palm OS and Symbian OS. For PDF files to display properly and reflow on the small screens of these devices, the files must be structured.

How do I determine and modify content order in a PDF?

Acrobat Professional lets you add structural tags to a PDF, but a degree of quality control, in the form of manual oversight, is still required to ensure the tagging is performed correctly. There is little room for error in document tagging: seemingly small errors in document structure can easily render a page incomprehensible. For example, consider how a page would read if footnote text were to appear after the last paragraph on the page (as implied by left-to-right, top-to-bottom reading order alone) instead of at the footnote location in the document text (as implied by logical order).
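
The footnote case can be made concrete with a small sketch. Everything here is an assumption made for illustration: the block ids, the `y` positions, and the `attached_notes` mapping that records which paragraph cites which footnote. A real tag tree expresses this relationship differently.

```python
# Each block: (id, top position in points, text). Higher y is closer to the top.
blocks = [
    ("p1", 700, "Tagged PDFs carry structure.[1]"),
    ("p2", 600, "Reflow depends on that structure."),
    ("fn1", 80, "[1] A note about structure."),
]

# Hypothetical mapping from a paragraph to the footnotes it cites.
attached_notes = {"p1": ["fn1"]}


def geometric_order(blocks):
    """Order by page position alone: top to bottom."""
    return [block_id for block_id, _, _ in sorted(blocks, key=lambda b: -b[1])]


def logical_order(blocks, attached_notes):
    """Top to bottom, but each footnote follows the paragraph that cites it."""
    note_ids = {n for notes in attached_notes.values() for n in notes}
    ordered = []
    for block_id in geometric_order(blocks):
        if block_id in note_ids:
            continue  # emitted together with its citing paragraph instead
        ordered.append(block_id)
        ordered.extend(attached_notes.get(block_id, []))
    return ordered
```

Position alone gives `p1, p2, fn1`, so a listener hears the note long after the sentence it explains. With the relationship recorded, the order becomes `p1, fn1, p2`.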

First, use Acrobat's "Add Tags" feature (Advanced > Accessibility > Add Tags to Document) to set an initial reading order by automatically generating PDF tags. "Add Tags" generally does a decent job of setting reading order, at least within blocks of text. Unless the document is very simple, however, the automated reading order and tag structure alone are unlikely to produce satisfactory results. "Add Tags" is certainly not a quick fix for Section 508 compliance.

If you create PDFs from Microsoft Office using the "Convert to Adobe PDF" function (installed with Acrobat 7.0x Standard or Professional), you can check "Enable accessibility and reflow" via the Adobe PDF > Change Conversion Settings menu to have your PDFs tagged when converting from Microsoft Word or Excel.

figure
Acrobat PDF Maker dialog. Be sure "Enable accessibility and reflow" is checked!

To give you a sense of the reading experience on a PDA, or with assistive technology that doesn't use tags, Acrobat includes two tools that rely on the structure in PDF files. Reflow (View > Reflow) allows users to fit the document contents to their window, scaling the text to their preference (see the before-and-after example that follows). Read Out Loud (View > Read Out Loud) allows users to listen to the text in its reading order through speakers or a headset.

figure

You can examine (and correct) the automatically generated reading order by starting the TouchUp Reading Order tool (Advanced > Accessibility > TouchUp Reading Order) and then opening the Content panel. But be careful! There is no "undo" for changes made to the content. Save your work after you complete each page so that a mistake never costs you more than a page of work.

figure
Use the Order panel to correct the document's reading order before checking the tags.


As for the tags, open the Tags panel via View > Navigation Tabs > Tags. Drag and drop this panel onto the left-hand edge of the window to dock it with the Bookmarks, Pages and other navigation tabs. (Note: There is simply no substitute for reading the manual. Don't expect to become a PDF-tagging expert overnight!)

figure
Complex documents will require validation in the Tags panel.

Note that reading order and tags do not as yet fully "harmonize" with each other when you make certain adjustments, as they really should. For the moment, it is up to the document author or manager to ensure that the file is validated for the intended usage or for the standards it must meet.
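
Outside Acrobat, a first-pass check that a file claims to be tagged at all can be sketched as follows. The `/MarkInfo`, `/Marked` and `/StructTreeRoot` entries are defined for the document catalog by the PDF specification; representing the catalog as a plain Python dictionary is an assumption of this sketch, and how you obtain that dictionary depends on whichever PDF library you use. The `tagging_status` function is a hypothetical helper.

```python
def tagging_status(catalog):
    """Report whether a document catalog claims to be tagged.

    `catalog` is assumed to be a dict keyed by PDF names, for example
    {"/MarkInfo": {"/Marked": True}, "/StructTreeRoot": {...}}.
    """
    mark_info = catalog.get("/MarkInfo") or {}
    marked = bool(mark_info.get("/Marked", False))
    has_tree = "/StructTreeRoot" in catalog

    if marked and has_tree:
        # Tags exist, but nothing here says the logical order is correct.
        return "tagged (validity unknown)"
    if marked or has_tree:
        return "partially tagged"
    return "untagged"
```

A result of "tagged (validity unknown)" only tells you that tags are present. Whether they describe the document correctly still has to be checked by a person in the Tags panel.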

Conclusion

A PDF file equipped with quality-controlled tags may be read effectively using a screen-reader or other assistive technology that reads PDF tags. If the PDF file is also optimized for reflowing of content, it will read well using assistive technologies that do not use PDF tags, as well as on mobile devices. If accessibility is important (or mandatory), or if you want your files to work well on mobile devices, then you need to learn to tag your PDFs.

Key Takeaways

  • PDF files do not naturally "know" the correct order of text — this information has to be added and verified (that is, the file must be structured to support applications that require the text to be ordered).
  • Users who need assistive technology to read electronic documents require structured PDFs.
  • Mobile devices require structured PDFs.
  • While structure and tags may be automatically generated, they cannot be automatically validated. In most cases, some manual work is required.
