Extracting Text from Specified Quads v. Acrobat Javascript

CJ Flores
Registered: Feb 20 2009
Posts: 4

Hi,

I am an Adobe Acrobat 8.1.3 Professional user, new to JavaScript.

I am extracting K1s from a single large file, which I have been able to do with a batch process. I would like to extract the name of each individual, which is in a specific location on each K1, and use it as part of the filename of that individual's extracted K1.

Through internet searches and perusing the Adobe reference and developer handbooks, I have been able to find the coordinates of the name using getPageNthWordQuads() and related functions, but I am stuck on how to use these coordinates to extract any text.

Can anyone offer some help? I'd appreciate any tips I can get.

Thanks

My Product Information:
Acrobat Pro 8.1.2, Windows
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
The complementary function to "getPageNthWordQuads()" is "getPageNthWord()", which returns the text of the Nth word on the page. Typically, there are a couple of ways you would use these two functions together:

1) If there is a known location on the page for a particular piece of data, look for words whose quads fall inside that location, then extract the text of those words.

2) If a particular piece of data is associated with a keyword, for example "Invoice Number", look for that keyword, then use the keyword's quads to build a search area for the actual data. Then go back through the words and keep the ones whose quads fall inside that area.
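A minimal sketch of approach 1, written in plain, older-style JavaScript (var, plain loops, no console calls) so it should also paste into the Acrobat console. The word records are hypothetical stand-ins for what getPageNthWord() and getPageNthWordQuads() return for each word; quadToBox, isInside and wordsInRegion are illustrative helper names, not Acrobat functions, and the region coordinates are made up.

```javascript
// Approach 1: collect the words whose box lies inside a known page region.
// Each record stands in for one word: in Acrobat, text would come from
// getPageNthWord(page, n) and quads from getPageNthWordQuads(page, n).

// Reduce one quad [x1, y1, x2, y2, x3, y3, x4, y4] to a bounding box.
// Taking min/max means the order of the four corners does not matter.
function quadToBox(q) {
  var xs = [q[0], q[2], q[4], q[6]];
  var ys = [q[1], q[3], q[5], q[7]];
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}

// True when box b lies completely inside region r (PDF space: y grows upward).
function isInside(b, r) {
  return b.left >= r.left && b.right <= r.right &&
         b.bottom >= r.bottom && b.top <= r.top;
}

// Join, in the order given, the text of every word whose first quad is inside region.
function wordsInRegion(words, region) {
  var found = [];
  for (var i = 0; i < words.length; i++) {
    if (isInside(quadToBox(words[i].quads[0]), region)) {
      found.push(words[i].text);
    }
  }
  return found.join(" ");
}

// Hypothetical sample page: two words inside the region, one far below it.
var sampleWords = [
  { text: "JANE", quads: [[50, 700, 80, 700, 50, 690, 80, 690]] },
  { text: "DOE", quads: [[84, 700, 105, 700, 84, 690, 105, 690]] },
  { text: "Total", quads: [[50, 500, 75, 500, 50, 490, 75, 490]] }
];
var region = { left: 40, right: 200, bottom: 685, top: 710 };
var nameText = wordsInRegion(sampleWords, region); // "JANE DOE"
```

The test requires each word to sit entirely inside the region, so a word that only partly overlaps it is skipped; making the region a little larger than the field is one way to tolerate small position differences between pages.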

Thom Parker
The source for PDF Scripting Info
http://www.pdfScripting.com

The Acrobat JavaScript Reference, Use it Early and Often
http://www.adobe.com/devnet/acrobat/javascript.php

CJ Flores
Registered: Feb 20 2009
Posts: 4
Thom,

Thanks for the reply; your posts on this forum are the main reason I have been able to develop this much so far.

I was able to use these functions together to find the coordinates of the name on the K1 with this code (the name on an example K1 is 'Norman'):

/* Find the quads of every word that reads "NORMAN" */
var ckWord, numWords;
for (var i = 0; i < this.numPages; i++)
{
    numWords = this.getPageNumWords(i);
    for (var j = 0; j < numWords; j++)
    {
        ckWord = this.getPageNthWord(i, j);
        // Compare with "=="; a single "=" would assign and match every word
        if (ckWord == "NORMAN")
        {
            console.println(this.getPageNthWordQuads(i, j));
        }
    }
}

The Quads that it returned are:

quads: [[43, 756, 78, 756, 43, 745, 78, 745]]

I want to be able to extract the text located in the above coordinates and put it into the filename for hundreds of K1s.

Unfortunately I am at a standstill and need an example of code that might do this before I can progress further. If anyone has time to draft something or post an existing example, you'd really be helping me out.

Thanks
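For the filename part of the question, a minimal sketch in plain JavaScript, assuming the name has already been pulled off the page as a string. nameToFileName, the "_K1.pdf" suffix, the fallback name, the output folder and the sample input are all hypothetical. In Acrobat the resulting device-independent path could then be given to extractPages() through its cPath parameter; that call is shown only as a comment because it needs Acrobat, and it may be restricted to batch, console or privileged contexts.

```javascript
// Turn extracted name text into a file name that is safe on Windows.
// The "_K1.pdf" suffix and the fallback name are hypothetical choices.
function nameToFileName(nameText) {
  // Collapse runs of whitespace to one space, then trim both ends.
  var name = nameText.replace(/\s+/g, " ").replace(/^\s+|\s+$/g, "");
  // Replace characters that Windows does not allow in file names.
  name = name.replace(/[\\\/:*?"<>|]/g, "_");
  // Use underscores instead of spaces.
  name = name.replace(/ /g, "_");
  return name.length > 0 ? name + "_K1.pdf" : "UNKNOWN_K1.pdf";
}

// Hypothetical output folder, written as an Acrobat device-independent path.
var outFolder = "/C/K1_Output/";
// Hypothetical extracted text with stray spacing.
var outPath = outFolder + nameToFileName("  NORMAN   SMITH ");
// outPath is "/C/K1_Output/NORMAN_SMITH_K1.pdf"

// In Acrobat only (not runnable elsewhere), page p could then be saved with:
//   this.extractPages({ nStart: p, nEnd: p, cPath: outPath });
```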
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
The quads returned are the 4 (x,y) coordinates of the corners of a box surrounding the word "NORMAN". So the question is, how big is the area above the word NORMAN? All you have to do is use this information (the location of NORMAN and the size of the area above it) to calculate the coordinates of a search box. Then go through all the words on the page and keep the ones that fit inside the box.

While simple in concept, this is a time-consuming process, especially without more specific information. I don't think anyone is going to provide you with an exact solution for free; this forum is for providing direction so that you can accomplish the task.
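A minimal sketch of the search-box idea in plain JavaScript. It reads the quad reported for NORMAN earlier in the thread as one quad of eight numbers [x1, y1, ..., x4, y4], turns it into a box, and builds a region directly above it. The region size (200 by 14 points) and the sample words are hypothetical; on a real K1 they would have to be measured from the form. boxAbove and wordsInBox are illustrative names, not Acrobat functions.

```javascript
// Reduce one quad [x1, y1, x2, y2, x3, y3, x4, y4] to a bounding box.
function quadToBox(q) {
  var xs = [q[0], q[2], q[4], q[6]];
  var ys = [q[1], q[3], q[5], q[7]];
  return {
    left: Math.min.apply(null, xs), right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys), top: Math.max.apply(null, ys)
  };
}

// Region that starts at the anchor's top edge and extends `height` points up,
// and `width` points right from the anchor's left edge (PDF y grows upward).
function boxAbove(anchor, width, height) {
  return {
    left: anchor.left, right: anchor.left + width,
    bottom: anchor.top, top: anchor.top + height
  };
}

// Text of every word whose quad lies completely inside box, joined by spaces.
function wordsInBox(words, box) {
  var found = [];
  for (var i = 0; i < words.length; i++) {
    var b = quadToBox(words[i].quad);
    if (b.left >= box.left && b.right <= box.right &&
        b.bottom >= box.bottom && b.top <= box.top) {
      found.push(words[i].text);
    }
  }
  return found.join(" ");
}

// Anchor: the NORMAN quad -> left 43, right 78, bottom 745, top 756.
var anchor = quadToBox([43, 756, 78, 756, 43, 745, 78, 745]);
// Search region: left 43, right 243, bottom 756, top 770 (size is hypothetical).
var searchBox = boxAbove(anchor, 200, 14);

// Hypothetical words: two sit above NORMAN; NORMAN itself is outside the region.
var pageWords = [
  { text: "PARTNER", quad: [45, 768, 95, 768, 45, 758, 95, 758] },
  { text: "INFO", quad: [98, 768, 120, 768, 98, 758, 120, 758] },
  { text: "NORMAN", quad: [43, 756, 78, 756, 43, 745, 78, 745] }
];
var above = wordsInBox(pageWords, searchBox); // "PARTNER INFO"
```

To search below the anchor instead, the region's top would be anchor.bottom and its bottom anchor.bottom minus the height.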

CJ Flores
Registered: Feb 20 2009
Posts: 4
Okay, thanks for the guidance. I'll give it a shot.
TecNik
Registered: Feb 25 2008
Posts: 17
Hi there,

I found the code posted above very useful for something I needed, and thought I ought to post a reply to say that this line needs a tweak:

if(ckWord = "NORMAN")

I think it should have two '=' signs:

if(ckWord == "NORMAN")

The first time I used the code I got a long list of quads, because with a single '=' the condition is always true.
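A small plain-JavaScript demonstration of why a single '=' matches everything, using hypothetical page words in place of getPageNthWord() output; countMatches is an illustrative name. Since both sides are strings here, '===' would work just as well as '=='.

```javascript
// With "=", the if-test assigns "NORMAN" to ckWord, and the value of that
// assignment is the non-empty string "NORMAN", which is truthy, so the
// branch runs for every word. With "==", only real matches are counted.
function countMatches(words, useAssignment) {
  var count = 0;
  for (var i = 0; i < words.length; i++) {
    var ckWord = words[i];
    if (useAssignment) {
      if (ckWord = "NORMAN") { count++; }   // bug: assignment, always true
    } else {
      if (ckWord == "NORMAN") { count++; }  // fix: comparison
    }
  }
  return count;
}

var sample = ["Schedule", "K-1", "NORMAN", "Total"]; // hypothetical page words
var buggy = countMatches(sample, true);  // 4: every word "matches"
var fixed = countMatches(sample, false); // 1: only "NORMAN"
```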

Regards,

Nick
gavin2u
Registered: Jul 3 2009
Posts: 76
Quoting thomp: "The quads returned are the 4 (x,y) coordinates of the corners of a box surrounding the word 'NORMAN'. So the question is, how big is the area above the word NORMAN? All you have to do is use this information, the location of NORMAN and the size of the area above NORMAN, to calculate the coordinates of a search box. Then go through all the words on the page to find the ones that fit in the box."
Mmm, a little complex, I think. :(
ITFLA
Registered: Jan 28 2011
Posts: 3
After a great deal of effort, I was able to use the tips above to print the text under each link, as follows:

for (var page = 0; page < this.numPages; page++)
{
    // Collect the links inside the page's crop box
    var b = this.getPageBox("Crop", page);
    var l = this.getLinks(page, b);
    console.println("Page " + page + " has " + l.length + " links");
    var numWords = this.getPageNumWords(page);
    for (var iLink = 0; iLink < l.length; iLink++)
    {
        // Link rect: [upper-left x, upper-left y, lower-right x, lower-right y]
        var target = l[iLink].rect;
        var result = "";
        for (var i = 0; i < numWords; i++)
        {
            // First quad of word i: [x1, y1, x2, y2, x3, y3, x4, y4]
            var q = this.getPageNthWordQuads(page, i)[0];
            // Keep the word if its box overlaps the link rectangle
            if (target[0] <= q[2] && target[1] >= q[5] &&
                target[2] >= q[0] && target[3] <= q[3])
            {
                result += this.getPageNthWord(page, i) + " ";
            }
        }
        console.println("Page " + page + " Link " + iLink + ": (" + result + ")");
    }
}
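The if-condition in this loop is an overlap test rather than a containment test: a word is kept if any part of it touches the link rectangle. A small plain-JavaScript sketch pulls that condition into a function so it can be tried outside Acrobat. It assumes, as the comparisons imply, that a link's rect is [upper-left x, upper-left y, lower-right x, lower-right y] and that quad indices 0, 2, 3 and 5 are the word's left x, right x, top y and bottom y; linkOverlapsWord and the sample numbers are hypothetical.

```javascript
// rect: link rectangle [left, top, right, bottom]
// q:    word quad [x1, y1, x2, y2, x3, y3, x4, y4] with (x1, y1) upper-left,
//       (x2, y2) upper-right and (x3, y3) lower-left, so
//       q[0] = left, q[2] = right, q[3] = top, q[5] = bottom.
function linkOverlapsWord(rect, q) {
  return rect[0] <= q[2] &&  // link's left edge is at or left of the word's right edge
         rect[1] >= q[5] &&  // link's top is at or above the word's bottom
         rect[2] >= q[0] &&  // link's right edge is at or right of the word's left edge
         rect[3] <= q[3];    // link's bottom is at or below the word's top
}

var linkRect = [40, 760, 100, 740]; // hypothetical link area
var inside = linkOverlapsWord(linkRect, [43, 756, 78, 756, 43, 745, 78, 745]);      // true
var partial = linkOverlapsWord(linkRect, [95, 756, 130, 756, 95, 745, 130, 745]);   // true
var outside = linkOverlapsWord(linkRect, [150, 756, 180, 756, 150, 745, 180, 745]); // false
```

Because partial overlaps count, a word that only grazes the edge of a link is picked up too. Requiring the word's whole box to lie inside the rect (q[0] >= rect[0], q[2] <= rect[2], q[5] >= rect[3], q[3] <= rect[1]) is the stricter alternative.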