
GetText - order of lines in a table

cl5792
Registered: Jul 17 2008
Posts: 53

I am using GetText to extract the data from a PDF. It works fine, except that the order in which the lines are built does not match a top-to-bottom read of the page.
Also, I have a couple of tables on the page. One seems to read the first column down and then the next column down, but the row order is off. Another table reads correctly, in row order and all the way across the columns for each line.

Is there a specific rule applied as to how the data is read in the PDF?

Thanks

My Product Information:
Acrobat Standard 8.1.2, Windows

thomp
Expert
Registered: Feb 15 2006
Posts: 4411
No, the text data in a PDF can be ordered in any way the design tool that created it sees fit. There is no rule or reason behind it. This is because each string of text has its own coordinates, so the ordering is irrelevant to how the page looks. Text is placed on the PDF page according to its specified X and Y location. Its order of appearance in the content stream has nothing to do with it.

First off, this is not a JavaScript question, and it is very rude of you not to explain how you have implemented your solution. "GetText" is an IAC function, not a JavaScript function. The equivalent in Acrobat JavaScript is "getPageNthWord()". There is also a function in Acrobat JavaScript named "getPageNthWordQuads()", which provides the coordinates for each word on a page. The equivalent in the IAC is "GetBoundingRect()". If you are looking for word ordering, especially in a table, then you need to use the word's quad, and/or rectangle, to determine how it fits into the table.
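
As a rough illustration of using the quads to rebuild reading order, here is a minimal Acrobat JavaScript sketch. It is not the only way to do this. The helper names (quadBounds, collectWords, orderWordsIntoRows) and the tolerance value are made up for this example; only getPageNumWords(), getPageNthWord() and getPageNthWordQuads() are Acrobat calls. It assumes an unrotated page where Y grows upward, uses only the first quad of each word, and treats words whose vertical centers are within the tolerance as being on the same row.

```javascript
// Bounding box of one quad (8 numbers, four x,y pairs).
// Uses min/max so it does not depend on the corner order.
function quadBounds(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return {
    left: Math.min.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}

// Hypothetical helper: gather each word on a page with its position.
// "doc" is an Acrobat Doc object (for example "this" in the console).
function collectWords(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    var quads = doc.getPageNthWordQuads(pageNum, i);
    if (!quads || quads.length === 0) continue;
    var b = quadBounds(quads[0]); // assumption: first quad is enough
    words.push({
      text: doc.getPageNthWord(pageNum, i, true),
      left: b.left,
      midY: (b.top + b.bottom) / 2
    });
  }
  return words;
}

// Hypothetical helper: group words into rows (top to bottom),
// then sort each row left to right.
function orderWordsIntoRows(words, tolerance) {
  var sorted = words.slice().sort(function (a, b) { return b.midY - a.midY; });
  var rows = [];
  for (var i = 0; i < sorted.length; i++) {
    var w = sorted[i];
    var row = rows.length ? rows[rows.length - 1] : null;
    if (row && Math.abs(row.midY - w.midY) <= tolerance) {
      row.words.push(w);
    } else {
      rows.push({ midY: w.midY, words: [w] }); // start a new row
    }
  }
  for (var r = 0; r < rows.length; r++) {
    rows[r].words.sort(function (a, b) { return a.left - b.left; });
  }
  return rows;
}
```

From the Acrobat JavaScript console this could be called as orderWordsIntoRows(collectWords(this, 0), 3) for the first page. The tolerance of 3 points is only a guess and will probably need adjusting for the font size and row spacing of the table. Splitting each row into table cells would additionally need the column boundaries, which this sketch does not try to detect.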

Please do not ask questions here again without specifying the technology that you are using.

Thom Parker
The source for PDF Scripting Info
[url=http://www.pdfScripting.com]pdfscripting.com[/url]

The Acrobat JavaScript Reference, Use it Early and Often
[url=http://www.adobe.com/devnet/acrobat/]http://www.adobe.com/devnet/acrobat/[/url]
