combine tagged pdfs and not lose ordered structure?

katm
Registered: May 23 2011
Posts: 31
Answered

Any way to combine or add tagged pdfs and not lose ordered structure?
I've created a 96 pg tagged pdf and cleaned it up in Acrobat X. Then I needed to replace the first 2 pages. I created 2 separate tagged pdfs and reordered them. Next I combined the pdfs. The new binder came in with Parts. I dragged out the Articles and Stories, but the tags, which I'd reordered in the 96 pg pdf, were partially out of order and will require quite a bit of reordering.
 

Kat

My Product Information:
Acrobat Pro 10.0.2, Macintosh
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Accepted Answer
... anyway to combine

Yes and no. When a Tagged PDF is brought in via replace page(s), insert page(s) and the like, that PDF's structure tree is appended to the end of the existing structure tree.

When manually assembling many PDFs into one you'd open the first (say Chapter_1.pdf) then insert the next in sequence behind the last page of this "first in content" PDF. Continue down the line.


Similarly, using the combine feature you'd need the files to be ordered first to last (top down) in the combine dialog. In the Options dialog you might play with having the accessibility choice unticked. The PDFs to be combined already have their structure. Otherwise the software will add structure of its own.

As to "Part", it is a higher level Grouping element. Remember, grouping elements group other elements into sequences or hierarchies but hold no content directly and have no direct effect on layout.
So, "Part" makes sense (particularly if Combine's accessibility option is "on"). "Part" is the second highest BLSE (block-level structure element) Grouping element (after "Document").
.
--| Document is the whole enchilada.
--| Part groups Articles or Sections.
--| Article is for those self-contained runs of text (the essay, the newsletter article, etc.).
(Article is not to contain other "Article" grouping elements as a "child" element.)
--| Section groups related content (many paragraphs of content within text flow). Section can/does have "Section" repeated as a child (e.g., a chapter's sub-sections).
--| Division (a generic block-level grouping element) for rounding up stuff (sort of akin to HTML's "Div").


There are others but the above are seen rather often.
The above list denotes the "sequence" of the BLSE Grouping elements. "Document" is never a child. "Part" can be a child of "Document" but not of Article or Section.
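
The nesting rules above can be sketched as a small check. This is only a toy model, not anything that reads a real PDF: the tree is a hypothetical nested (tag, children) tuple, the function name is made up for illustration, and the rule table covers only the constraints stated in this post (Document is never a child, Part is not a child of Article or Section, Article does not hold another Article as a child).

```python
# Toy model: a node is (tag, [children]). Nothing here reads a real PDF.
# Only the rules stated in the post are encoded; other element types are ignored.
FORBIDDEN_CHILDREN = {
    "Document": {"Document"},
    "Part": {"Document"},
    "Art": {"Document", "Part", "Art"},
    "Sect": {"Document", "Part"},
    "Div": {"Document"},
}

def find_violations(node, path=()):
    """Return 'Parent > ... > Child' strings for every nesting that breaks a rule."""
    tag, children = node
    here = path + (tag,)
    problems = []
    for child in children:
        child_tag = child[0]
        if child_tag in FORBIDDEN_CHILDREN.get(tag, set()):
            problems.append(" > ".join(here + (child_tag,)))
        # Keep walking so deeper problems are reported too.
        problems.extend(find_violations(child, here))
    return problems

# Hypothetical tree: one Article nested in an Article, one Part nested in a Section.
tree = ("Document", [
    ("Part", [
        ("Art", [("Sect", [("Sect", [])]), ("Art", [])]),
    ]),
    ("Sect", [("Part", [])]),
])

print(find_violations(tree))
```

A Section inside a Section passes, as described above, while the nested Article and the Part under a Section are reported.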

By the way, provided the structure tree within a "Part" tag is ok you'd not have to do anything with "Part".

Pull in the PDF at the link below; it is Adobe's ISO authorized release of ISO 32000-1. Start with Section 14.7. As the information gels you'll find yourself meandering through the grooves of the other sections.
The ISO Standard is on its way to "dash 2" and a freebie may not be available once it is published.

http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
.

Be well...

katm
Registered: May 23 2011
Posts: 31
I tried replace pages. It didn't work; my table came in tagless. So I divided the file in two, dumped the page I wanted to get rid of, then did combine pages with the pdf. It came in in good shape, though each time I do this it adds another Part section. So I get Doc>Part>Part>Article>Story>etc. Still, it's not too much trouble to drag Articles out of Parts and dump the empty Parts. Thanks so much for your help.
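
For anyone picturing what "drag Articles out of Parts and dump the empty Parts" does to the tree, here is a minimal sketch on a toy model. It assumes a hypothetical (tag, children) tuple tree and a made-up function name; it does not touch a PDF, and in Acrobat the same thing is done by hand in the Tags panel.

```python
# Toy model: a node is (tag, [children]). Nothing here reads or writes a PDF.
def hoist_parts(node):
    """Return a copy of the tree where each Part below the root is replaced by its children."""
    tag, children = node
    new_children = []
    for child in children:
        flattened = hoist_parts(child)
        if flattened[0] == "Part":
            # Promote the Part's children into the parent, keeping their order.
            new_children.extend(flattened[1])
        else:
            new_children.append(flattened)
    return (tag, new_children)

# Hypothetical result of combining twice: Doc > Part > Part > Article > Story
combined = ("Document", [
    ("Part", [
        ("Part", [("Art", [("Story", [])])]),
        ("Art", []),
    ]),
])

print(hoist_parts(combined))
```

The order of the Articles is preserved, which is the point of the exercise: only the Part wrappers go away.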

Kat

katm
Registered: May 23 2011
Posts: 31
http://www.mcgraphics.us/source/icon.php

For CVbook_combine6_11_4.40.pdf I divided book into 9 sections then combined. The tables were all re exported pdfs from InDesign.
Still can't reflow page folio 20, which came in from NFM_pg20_21.v1.pdf.

I've not done much reordering, just wanted to work on smaller book section pdfs and see if I could combine and have all tags retained.

What are the tags that JAWS screen reader can read? Can it read Notes, other tags?

Much to do.
Thanks for all help.
kat



Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
CVbook_combine6_11_4.40.pdf is not available. NFM_pg20_21.v1.pdf is; have it.
.


Be well...

katm
Registered: May 23 2011
Posts: 31
http://www.mcgraphics.us/source/Q.php

CVbook_combine6_11_4.40.pdf should download.
Sometimes the simplest things are a bit touchy.

Folio 20, acrobat pg 22 won't reflow when combined. It won't reflow in NFM_pg20_21.v1.pdf
Thanks kat


Kat

katm
Registered: May 23 2011
Posts: 31
Moving thru each section. I think I'm getting the idea.
Folio 20 chart problem seems to be because the chart has the repeating header, that you warned me about on Folio 20 and 21.

In moving things around I deleted url hyperlink's "Link - OBJR". I tried to make a new tag but Acrobat gives me specific labels. I chose Link then tried to edit the tag but couldn't get beyond " Link - OBJR". Do I need a Link - OBJR for go to page hyperlinks?

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Pulling together from this and the other thread:

File: CVbook_combine6_11_4.40; folio 20.
.
The structure tree for folio 20 (looking at the PDF I downloaded) is rather badly fragmented.
As mentioned before, the content mastered in InDesign does not apply a "Table" logical hierarchy.
Rather, it is (at its core) a "layout" table.
Although tagged (and post-processed manually to try to remediate), how well AT can "use" the information provided by the structure tree will be problematic.
Similarly there are issues with reflow.
Folio 21's content reflows.
However, not having a "true" header row/header cell configuration the reflow of folio 21 is missing something.
No header cell content to provide context for the table's data cells' content.
.
The link below is to a PDF I've placed at acrobat.com.
It is composed of two PDFs compiled (2nd inserted into first). As Tagged PDFs, each had a "root" element of [Document].
The resulting (third) PDF contained two [Document] elements, which is not appropriate element/tag semantics.
So, in the structure tree, I moved the second [Article_A] element (role maps to PDF element [Article] (/Art))
and placed it as a child of the remaining [Document] element.
Used Table Editor to establish each [TH] element's Scope attribute.
Language was set for [Document] by the authoring application and passed through to the Tagged output PDF.
I also set Language in the Advanced tab of the PDF's Document Properties.
In the Page Thumbnails panel both pages were selected and Tab Order was set to Use Document Structure.
.
The PDF has 3 tables. Two on page 1 and one on page 2.
For the 2nd table on page 1 I used a FM paragraph tag that centered the entered text.
In reflow, this aligns the data cells' content to the heading cells' content.
This may provide a usability improvement for users of reflow. You can see the difference in presentation of the first and second tables on page 1.
.
Page 2's table reflects, in part, what is in the table on folio 20 of the PDF you provided for download.
.
For "daaTbls_4katm.pdf"

.
.
About folio 20:
The structure tree contains three Tables.
The first is malformed. It contains a truncated [Link] element associated with the PDF Link in the bottom row, 2nd cell.
The second table - the last [TR] in [TBody] does not contain the "kporg/mydoctor." string.
Nor does it contain last row, 3rd column cell content.
The stray [Link] element under the 2nd table entry has the container for the link (mentioned above).
[body_table] holds the " . " that terminates the last sentence of cell in last row, column 2.
.
The [TD] element is improperly located as a child element to parent [_No_paragraph_style_].
This [TD] belongs as the third child of the last [TR] element for the second [Table] element.
.
The last child of [_No_paragraph_style_], [Table] is folio 21's table.
.
The [Link] element:
Yes, [Link-OBJR] is required. It is the "connection" between the structure tree and the link annotation on a PDF page.
It is what AT needs to "find" that link annotation.
Observe the expanded structure tree (in the Tags panel of "daaTbls_4katm.pdf").
I've placed an accessible link on the page 1, table 2's row 3, column 3 cell.
In the structure tree you'll observe the correct configuration.
Note: The text string over which the link annotation rests can be the child of a [Span] element.
The [Span] element must be a child of the parent [Link] element.
Sometimes you will see the container as first child of [Link] then [Link - OBJR].
This is "workable".
.
If having to create an accessible Link in a PDF there is one way to do it.
In the structure tree, locate where the [Link] element belongs relative to the logical hierarchy of the content.
Put the Selection Tool ("I-beam" with Arrow) in play. Select desired text string. Right click for context menu.
Click "Create Link". Walk through the dialog.
In the structure tree, locate the new link and move it to the appropriate location.
.
.
You asked about PDF elements/tags & AT. Contemporary AT, when parsing a Tagged PDF is expecting a structure tree.
AT is looking for the PDF elements to be applied and configured as discussed/described in ISO 32000-1.
Going forward, ISO 32000-1 will go to 32000-2 which will have some new PDF elements/tags.
As well ISO 14289-1 (PDF Universal Accessibility) will be available.
Both can be expected in 2012.
.
AT and PDF elements/tags:
AT will, in time, become more capable as the developers avail themselves of the ISO Standards.
Just now, the core PDF elements/tags are recognized. This presupposes some things.
--| PDF Tags are appropriately assigned.
--| PDF Tags are appropriately placed in the structure tree.
--| Tables observe the configuration discussed in ISO 32000-1.
--| Tables, that are configured IAW ISO 32000-1, that are "complex" have the Header attribute and cell IDs correctly added.
Note that the more complex the table the more problematic it becomes for contemporary AT to work the information.
Same thing for reflow.
If the [Table] build is not per the discussions in ISO 32000-1 it pretty much becomes a toss of the chicken bones.
.
Contemporary AT is still lagging somewhat in making use of ISO 32000-1.
I suggest not hanging your hat on what any specific flavor of AT does or does not do.
Just now, in some regards NVDA is becoming more "Tagged PDF" aware than other AT applications.
Observing the protocols discussed in the ISO Standard puts you on firmer ground.
.


Be well...

katm
Registered: May 23 2011
Posts: 31
I sectioned all exports into 9 pdfs after redoing all charts so there are no repeating page columns. I've cleaned the tags up and redone the reading order.
I combined all pdfs into one book pdf then set all page properties for doc reading order structure, set properties for English and Title.

That done, the Accessibility Report showed 4 errors, several of them link problems. Each was missing a Link - OBJR. But when I try to make one I can only get . If I try delete <> I get Link - OBJR. I thought maybe I could copy yours, but that didn't work. Any way to make the link exactly like your link for data cell r3c3? Lovely pdf, so elegant.
As always thanks for all your time and help.
kat

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Here's a demo for creating an accessible link.
I'd put it together to assist someone a while back.
I used Acrobat Pro 8 in the demo as that's what was being used.
The demo steps out from the presumption that the content that is to have an accessible link annotation is unmarked content (content that was identified as an Artifact to be specific).

For you, the content may already sit within tagged PDF page content.
So, in some circumstances, you might not need to start with tagging selected content.
Look over the demo and play with a "scrape" PDF.
Hopefully it'll provide a "get the toes wet" opportunity.
.
I'll pencil in time to cobble something together from some of your PDF to build a, perhaps, more useful demo.
.
Accessible Link Demo
.
A post "submit" edit - Thank you for the comment re: the PDF (FM lends itself to such. One of many reasons I'm partial to it <g>.)

... fixed the "link"



Be well...

katm
Registered: May 23 2011
Posts: 31
Great movie. I'd been afraid to attempt to LINK, your movie made it easy.

Link didn't let me add http:// to the existing text/link. Nor could I make the Link Accessible. It does work. Did before, too.
I put the 1 page pdf up at http://www.mcgraphics.us/source/Q.php. Tag structure is from the entire CV. But the new link dropped down below document; old link is in Doc structure. There are Comments in the pdf.

Thanks.

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
I've the PDF. I'll look it over and get back to you. I've a video project rendering on another box. After several attempts I got the time down some; but, still a bit "long-of-tooth". Alas, not enough run time with that sort of activity to have the elan of Dave Merchant's productions. I'll get it to my web space and post the link here.
.


Be well...

katm
Registered: May 23 2011
Posts: 31
I'll look forward to seeing your video project, though not clear what it is, if/when you post to your site.
Thanks again for your help.
kat

Kat

katm
Registered: May 23 2011
Posts: 31
Major success. I fixed Links - OBJR
Tags structure dropdown menu>Find "Unmarked Links">Tag Element. I think it works a bit better if you select the hyperlink in the structure, then do the Tag Element. It puts the Links - OBJR below the hyperlink. It has to be dragged above the hyperlink. But this is better than having the Links - OBJR at the very bottom of the structure, then having to find the hyperlink and drag it up.


New Q. Will screen readers read better than Acrobat's Read Out Loud? It reads TTY and ID as words, some TOC leaders as "dot" plus page #s, and periods after a url as DOT.
kat


Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Success is always a most enjoyable event <g>.
I'll have to get back to you re: your question. Have some homework to complete this evening.
Have not forgotten the earlier question - have some thoughts on that; but, later.
.
Link to the video:
.
folio 20's tags tweaked, reflow available..


Be well...

katm
Registered: May 23 2011
Posts: 31
Your video is phenomenal! Your voice instructions/explanations are perfectly paced, neither too fast nor full of dead silence. Great explaining, which made it easy to understand. I learned a lot and intend to re-watch it again and again. So I'm hoping you'll leave it up.

One problem I'm having with reflow is that eventually, I can't get some of my page back, even after quitting and restarting. So I'm keeping reflow use to an absolute minimum.

Here's the info I got on the last Q:

<< read as letters instead of words, >>
The industry is working on ways to do this. It will have to be a solution
from W3C, the accessibility community and AT manufacturers, if they can ever
agree on anything!

For right now, if it's in ALL CAPS, various AT will search their internal
dictionaries for the word and voice it. If it's A.L.L. C.A.P.S. with
periods, the word will be voiced letter by letter. Otherwise, it's a
crapshoot and you can't control it. It's up to the various ATs to figure this
out for their customers.

<< how to get TOC leaders not say dot page #s and periods after a url read
as DOT >>
Not possible. It's up to the AT manufacturers to control how these are
voiced to their customers. Your only solution is to change the visual design
of your project so that it doesn't use dot leaders at all.

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
For your questions - I'd say you established the bounding parameters for the issues rather well.
Understanding that AT vendors must, to some extent, operate in catch up mode mitigates somewhat.
Regardless, the information reinforces why it is not a "best practice" to benchmark accessibility of a PDF on any particular AT software or version thereof.
.
My thought is that a well-formed Tagged PDF that harmonizes well with the ISO 32000-1 (and soon ISO 14289-1) information is the desired goal.
.


Be well...

katm
Registered: May 23 2011
Posts: 31
I'm having a lot of trouble with Acrobat X's screen reflow. After reflowing lots of pages in a book, a page will occasionally lose its correct layout. Page seems to get stuck in reflow mode, and reflow will be grayed out.

Quit and restart don't always help. And sometimes even restarting my MacPro doesn't return the page to its proper layout. I don't suppose there is a way to add a 2nd scratch disk/drive for memory overload the way you can in Photoshop?

My current workaround is to save backups constantly, just in case I lose visual integrity for a page.

Kat

katm
Registered: May 23 2011
Posts: 31
Unbelievable!
After watching your movie — twice so far — I'm actually beginning to understand this.

http://www.mcgraphics.us/source/Q.php In dragging tags, I'm finding a lot of spans. What to do with Spans like the ones in Spans.pdf please? Some go on top of text which is already tagged.

thanks, so cool.

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Remember the <Span> is an in-line grouping element.
It commonly appears when there is something in the line of text that is different from the standard flow of textual characters.
.
It is likely that the container's content is a special hyphen, space, return or other undisplayed character. Examples of these - em space, nonbreaking hyphen, forced return, etc.
So, in InDesign, look at what is specifically present:

--| between the end of "kp.org/medications" and the "1" that starts the footnote.
--| how is the "circle 3" placed on its line? What was present before it was placed? What InD paragraph tag was used for "Speak up when...." ? Is it inserting an undisplayed character? Sometimes en or em space is used in the configuration of paragraph tags/styles. These or similar undisplayed characters would give container "content" such as file "Spans.pdf" depicts.
.
There is a high probability that the <Span> elements' containers have an undisplayed character coming from what was mastered back in InD. Does InDesign, when installed, provide a "character sets" file? I get that with FrameMaker; my hard copy is much used <g>.
.
I'd not be moving what you've shown. I believe you have undisplayed characters in play within the associated lines of text. The <Span> elements simply account for what is present at the location as one traverses the structure tree.
.
Typically, folks are told to remove <Span> elements and containers such as the file shows.
Sometimes doing so can result in run-on reading by screen readers.
So, play with some testing PDF(s) to see what happens & then decide.
.
.
About reflow of "working" PDFs. I'd suggest finishing the structure tree grooming before using reflow.
A malformed structure tree (elements not correctly placed, elements not correctly assigned, element semantics not observed) is akin to beach sand taking up residence in the netbook.
Get a well-formed structure tree. Use Full Check to identify issues. Print to accessible text file for evaluation. Use the Find tool (options menu in the Tags panel) - use all choices to cross check what is on the page(s). Once you are comfortable (or sort of comfortable <g>) give reflow a try.
.
You asked about a scratch disk/drive capability with Acrobat. No out of the box capability for that.
.
The video - glad it is useful. Keep in mind, for the long term, it'd be useful to have that Adobe provided PDF with the ISO 32000-1 content.
When watching the video do keep in mind that, for folio 20, all the puzzle pieces were available. With an understanding of Tag/element function and semantics the placement was all that was needed.
.



Be well...

katm
Registered: May 23 2011
Posts: 31
Thank goodness you've given me a clear understanding of <Span>s, as I'm now supposed to change 157 abbreviations (in one of 12 books) like TTY to <Span> with Alt text (chuckle). Working on this project I'm vacillating between terror and fascination. So of course more questions for you.

Reading Order Panel: what is the correlation between RO and Tags, e.g., how does AT read if Tags are out of order but Reading Order is correct, and vice versa?

Q — Accessibility Check errors:
1.Tab order, which doesn't disappear when I select all pages and select check on "Document Structure" for reading order.
2. Unmarked content, which is all the content I've artifacted in InDesign, like blue backgrounds, master page content. I artifacted all master page content though I wasn't sure if I needed to or not, and made all one page content that I need tag "live" on the page.

Q— How much sloppy structure I can get by with, e.g.:
1. InDesign puts most Link-OBJRs in a separate Link container, usually close to the Link element container, but Tags>Find> doesn't turn up many Unmarked Links.
2. Exported ID pdf "Articles" usually contain multiple Story tags for a single page. Stories could be consolidated into 1 Story, but as I go down the structure tree, Tags are mostly in proper reading order.

Thanks, again for your help and patience.
Kat







Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
A quick reply - lights out approach over here on the eastern zone.

Abbreviations, acronyms, initialisms (&, yes, much of it no more than some work discipline's "jargon").
So you (*manually*) provide Alternate Text description for each of these (shudder). But first, you'll have to isolate the text string within the structure tree (<Span> becomes an intimate friend). Once you've edited the structure tree with success you can loop back top to bottom to manually add Alt Text.
Or, perhaps, for much of it, simply use plain language (aka spell it out). Of course there are many legitimate critters such as TTY that by "common usage" (which means the general population and *not* some discipline specific sub-sub-population) are "understood" - but Alt Text for such is still what is needed. Remember, "I cannot 'see' it." Put yourself into the "shoes" - cut a swath out of some panty hose. Get some of those craft "sticks" (like an ice cream stick piece of wood). Staple the piece of panty hose to the sticks. Staple a rubber band to the sticks. Place over the eyes. Do this for an hour while using the computer.
Back in the day we did this for "there's a fire - put it out" drills on the boats. Anyway, the vision obscuring device (VOD) can give you a sense of what one with a visual impairment must deal with.
.
Terror and fascination — aptly stated. You come to realize how much you take for granted when viewing/reading information.
.
Tab order. Please clarify. Pages panel in Nav pane. Select all page thumbnails. Right click for context menu. Select Properties. Tick "Use Document Structure". Is this not available with Acrobat X on the Mac?
AT uses the structure tree. If no structure tree then Adobe Reader / Acrobat does the "on the fly" development of tagging for AT. Works nice for the simple and not so nice for the complex. Programmatic solutions are simply the implementation of what someone else determines is adequate for the user. Sometimes this results in "thank you, thank you, thank you" and other times "*&!!#* ~`". So, best practice: Content author/PDF post-processor provides a well-formed structure tree for the Tagged PDF (result: Snoopy Dance).
.
Artifact(s) are, if you will, a sub-set of unmarked content. However, content set as an artifact tells AT the content can be ignored. Acrobat Full Check won't "flag" Artifacts as unmarked content. So, if Full Check indicates something is unmarked then some other content is lurking on the page(s).
.
Link / Link-OBJR - want to say something - but, really need to run a trial to make sure what I state reflects reality and not what I'm pulling from the memory bank. So, I'll return to this later.
.




InD "Articles" typically role map to the PDF element <Article> (/Art). InD "Story" typically role maps to PDF element <Section> (/Sect). Semantically, an Article can certainly contain child Sections and Section elements can have child Section elements. It can get busy. Does not hurt to simplify.
What is not semantically correct is nested /Art elements within a parent /Art element.
.
Owe you a video on <Note> element. FrameMaker (5.0 through 9.0) is pretty consistent. MS Word pre-2007 & MSW 2007 differ. In one scenario (with MSW) the Note element is ignored. In essence Word decides to ignore the concept of footnote / endnote. An interesting approach.
.
Later.




Be well...

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
1. InDesign puts most Link-OBJRs in a separate Link container, usually close to the Link element container, but Tags>Find> doesn't turn up many Unmarked Links.
.

If <Link-OBJR> is present (for the associated Link annotation on the PDF page) then Find: Unmarked Links yields no return/hit (the highlight on the PDF page).

fwiw: Proper output from the authoring application, for an accessible link, should provide:
<Link>
--| <Link-OBJR>
--| [container] {text string having the Link annotation over it}
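
The configuration above can be sketched as a check on a toy tree. This is an illustration only: the tree is a hypothetical (tag, children) tuple with plain strings standing in for content containers, the function name is made up, and nothing here inspects a real PDF or replaces Acrobat's own Find: Unmarked Links.

```python
# Toy model: a node is (tag, [children]); a plain string child stands in for a
# content container (the visible link text). Nothing here reads a real PDF.
def links_missing_objr(node, path=()):
    """Return the paths of Link elements that have no Link-OBJR as a direct child."""
    tag, children = node
    here = path + (tag,)
    missing = []
    if tag == "Link":
        child_tags = [c[0] for c in children if isinstance(c, tuple)]
        if "Link-OBJR" not in child_tags:
            missing.append(" > ".join(here))
    for child in children:
        if isinstance(child, tuple):  # skip content containers
            missing.extend(links_missing_objr(child, here))
    return missing

# Hypothetical table cell: the first Link is well formed, the second lacks its OBJR.
cell = ("TD", [
    ("Link", [("Link-OBJR", []), "first link text"]),
    ("Link", [("Span", ["second link text"])]),
])

print(links_missing_objr(cell))
```

The check only looks at direct children of each Link, so a Link-OBJR stranded in a separate Link container would still leave the text-bearing Link flagged.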
.

Be well...

katm
Registered: May 23 2011
Posts: 31
I just discovered that you can use ID layers to control Accessible tagged pdf reading order. 1st read in Acrobat reading order panel is lowest in ID layers.
In ID, the layer hierarchy is a page by page sequence. This is a big-time time saver. And I don't have to do all the acronyms. Learning and moving forward.

Kat

daka630
Expert
Registered: Mar 1 2007
Posts: 1420
You are working out of InD CS5.5, yes?
Impressive improvements there.
With proper content mastering and post-processing of Tagged PDF output the PDF fits on many "shelves" well.
Accessible, solid for reflow on mobile device and optimized for the search engine crawl when on the web.
And then there's ePub output.

New horizons open while having fun <g>.


Be well...

katm
Registered: May 23 2011
Posts: 31
Fun? Not quite there yet ; )

Kat

katm
Registered: May 23 2011
Posts: 31
http://www.mcgraphics.us/source/Q.php

TableAltText.pdf is what I'm supposed to do. It's the blue cells I'm having trouble with.
SpanInstructions.pdf is the how-to.

I don't know if I understand, but
SFM_pg22ChartNoteTest.pdf is my attempt, and I'm not sure if it is working.

Thanks for the help.

Kat