Read Text and convert to Field via JavaScript

ysegura
Registered: Feb 16 2011
Posts: 12
Answered

We are printing documents/output from a business application to PDF.
Once the PDF is complete and opens, we run a JavaScript to trigger "Auto Form Recognition". However, some text fails to be recognized as fields.
 
Is there a way to read a specific region of the PDF document, recognize its content and, if needed, create a new field on the fly with the value that was read as its content?
 
Please advise/help!

YS

My Product Information:
Acrobat Pro 8.1.7, Windows

maxwyss
Registered: Jul 25 2006
Posts: 255
If the goal is to find specific text, you can parse the document and find the text with the getPageNthWord() method. With some logic to look at neighboring words (if you are searching for more than a single word), you can then get the word's coordinates with getPageNthWordQuads() and create your field based on them.
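
The approach above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested production script: the helper names (quadsToBox, findWord, fieldFromLabel) are made up for illustration, the value is assumed to be the single word right after the label, and the page is assumed to be unrotated so that the quad coordinates can be passed to addField() unchanged. How Acrobat splits a value such as a vendor name into words is not guaranteed, so check the actual words on your own documents.

```javascript
// Bounding box [xMin, yMin, xMax, yMax] of a quads list as returned by
// getPageNthWordQuads(): an array of quads, each holding 8 numbers (4 x,y pairs).
function quadsToBox(quads) {
    var xMin = Infinity, yMin = Infinity, xMax = -Infinity, yMax = -Infinity;
    for (var q = 0; q < quads.length; q++) {
        for (var i = 0; i < quads[q].length; i += 2) {
            xMin = Math.min(xMin, quads[q][i]);
            xMax = Math.max(xMax, quads[q][i]);
            yMin = Math.min(yMin, quads[q][i + 1]);
            yMax = Math.max(yMax, quads[q][i + 1]);
        }
    }
    return [xMin, yMin, xMax, yMax];
}

// Scan every page for the first word equal to `target` (punctuation stripped).
function findWord(doc, target) {
    for (var p = 0; p < doc.numPages; p++) {
        var count = doc.getPageNumWords(p);
        for (var w = 0; w < count; w++) {
            if (doc.getPageNthWord(p, w, true) === target) {
                return { page: p, index: w };
            }
        }
    }
    return null;
}

// Create a text field over the word that follows `label` and fill it with
// that word. Returns the new field, or null if the label has no following word.
function fieldFromLabel(doc, label, fieldName) {
    var hit = findWord(doc, label);
    if (hit === null || hit.index + 1 >= doc.getPageNumWords(hit.page)) {
        return null;
    }
    var next = hit.index + 1;
    var value = doc.getPageNthWord(hit.page, next, true);
    var box = quadsToBox(doc.getPageNthWordQuads(hit.page, next));
    // addField() expects [upper-left x, upper-left y, lower-right x, lower-right y].
    var field = doc.addField(fieldName, "text", hit.page,
                             [box[0], box[3], box[2], box[1]]);
    field.value = value;
    return field;
}
```

From the JavaScript console or a document-level script, a call might look like fieldFromLabel(this, "INVOICE", "invoiceNo"), where "invoiceNo" is an arbitrary field name chosen for this example.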

I am not aware that something similar exists for graphic elements; for that you would need a custom plug-in.

On the other hand, IMHO, the effort you have to put into fixing and touching up auto-recognized fields exceeds the effort of simply creating those fields manually. And then you do it right, and the way you want it.

HTH.

Max Wyss.

ysegura
Registered: Feb 16 2011
Posts: 12
Max, thx. For clarity, if you have just printed a doc to PDF and notice that the document contains information such as:

INVOICE: 12567
VENDOR: 4START_LANDSCAPING

how would you go about reading (obtaining) the corresponding INVOICE# (12567) and the VENDOR (4START_LANDSCAPING), so you could use this captured information later to name the file, change metadata, create fields, query a database, etc.? Of course, we are not interested in converting the PDF to a form, since it is intended to be used once and then archived.

Is getPageNthWord() still the right function for the job of capturing this info?

YS

ysegura
Registered: Feb 16 2011
Posts: 12
If I have the coordinates (quads) of the location on the page where the data always prints, would it be possible to obtain the content based on those quads?

YS

maxwyss
Registered: Jul 25 2006
Posts: 255
Accepted Answer
If you have the coordinates of the text, it is easier to find the right words. You still have to parse the page, but you can very quickly exclude a word from the process if its coordinates are not within the range you are looking for. And you may not need to do as much guesswork as otherwise.
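
A minimal sketch of that coordinate filter might look like this. The function name wordsInRegion and the region format [xMin, yMin, xMax, yMax] are assumptions made for this example, and the region is assumed to be given in the same coordinate space as the quads returned by getPageNthWordQuads().

```javascript
// Return the words on `page` whose quads lie entirely inside `region`,
// where region is [xMin, yMin, xMax, yMax] (a format chosen for this sketch).
function wordsInRegion(doc, page, region) {
    var found = [];
    var count = doc.getPageNumWords(page);
    for (var w = 0; w < count; w++) {
        var quads = doc.getPageNthWordQuads(page, w);
        var inside = quads.length > 0;
        // Each quad holds 8 numbers: four x,y corner points.
        for (var q = 0; q < quads.length && inside; q++) {
            for (var i = 0; i < quads[q].length; i += 2) {
                var x = quads[q][i];
                var y = quads[q][i + 1];
                if (x < region[0] || x > region[2] ||
                    y < region[1] || y > region[3]) {
                    inside = false; // one corner outside: exclude the word
                    break;
                }
            }
        }
        if (inside) {
            found.push(doc.getPageNthWord(page, w, true));
        }
    }
    return found;
}
```

The words come back in page word order, so if the invoice number always prints in the same box, the result for that box should contain just that value.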

Guesswork means applying regular expressions to the words in order to figure out whether they make sense. This is what you would do if you did not have coordinates to look at.
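
As a rough illustration of that guesswork, the sketch below accepts the word after a label only if it matches a pattern. The function name valueAfterLabel is hypothetical, and the assumption that the value is the single word right after the label may not hold for every layout.

```javascript
// Return the first word that directly follows `label` on `page` and matches
// `pattern`, or null if there is none. Pass a RegExp without the "g" flag,
// since test() on a global RegExp keeps state between calls.
function valueAfterLabel(doc, page, label, pattern) {
    var count = doc.getPageNumWords(page);
    for (var w = 0; w + 1 < count; w++) {
        if (doc.getPageNthWord(page, w, true) !== label) {
            continue;
        }
        var candidate = doc.getPageNthWord(page, w + 1, true);
        if (pattern.test(candidate)) {
            return candidate; // plausible value: stop at the first match
        }
    }
    return null;
}
```

For the invoice example, a call such as valueAfterLabel(this, 0, "INVOICE", /^\d+$/) would skip an occurrence of "INVOICE" that is followed by something other than digits.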

HTH.

Max Wyss.