This tutorial shows you how to work with the JavaScript features in Acrobat X.

Getting external data into Acrobat X JavaScript

Learn about changes to workflow automation scripting in Acrobat X.

By March 6, 2011

 

The newly released Acrobat X SDK contains only a few updates to the Acrobat JavaScript model, but they are important ones. Acrobat now uses the JavaScript 1.8 engine. The security model was changed (it is now stricter), a useful DOM object was deprecated, new digital-signature functions were added for document certification, and the UI was given a major overhaul (covered in this video, Acrobat X Automation Scripting Changes).

The two most significant changes, one exciting and one depressing, concern how Acrobat interacts with external data in the context of workflow automation. These updates completely change how we think about automation scripting in Acrobat, and it is these two opposing updates that are covered here.

Background

The goal of automation is to condense a set of complex, time-consuming, difficult, error-prone and/or repetitive operations into a simple, robust workflow. JavaScript is by far the preferred tool for automating Acrobat. It has hooks into most of the Acrobat functionality, and (for a programming language) it is easy and fast to develop in. Nevertheless, it does have its drawbacks. The Acrobat JavaScript Model is not “complete.” There are holes in the model that can make solutions to complex workflows somewhat awkward. The result is better than the manual process, but not ideal.

One of these holes was the lack of a clean and simple way to access data in an external file. It is common for the data used in a workflow to be stored in some external data source, such as a database, Excel, CSV or XML file. For example, say a document needs to be individually watermarked and labeled for each user, then e-mailed to each user. The user information is maintained in a CSV file on the local file system. Watermarking, labeling and e-mailing a PDF are all simple tasks for an automation script. The awkward part is getting the information out of the CSV file. In earlier versions of Acrobat, there were three different, general-purpose methods for solving this issue.

  1. Connect the CSV file to ODBC, and then use the ADBC object to access the data.

    Awkwardness: Windows-only solution; dependent on having the correct ODBC drivers; requires the extra step of manually hooking the data file into ODBC.

  2. Load the CSV into a file attachment, where the data can then be extracted and parsed.

    Awkwardness: Least awkward, but adding a file attachment is indirect and unnecessarily modifies the PDF file. There are also security restrictions on certain file types.

  3. Build a PDF form to act as an intermediary to capture lines of data.

    Awkwardness: Only works with tab-separated files, so data must be converted. Requires the extra element of the specially built PDF to be maintained and installed with the automation script.

The changes in the new JavaScript SDK have a serious effect on this solution landscape, both good and bad. Let’s take a look at the bad first.

No more database access (sort of)

The missing object in the Acrobat X SDK is ADBC, which is used for accessing an ODBC-connected database. From a purely scripting perspective, direct database access is the simplest, cleanest solution for accessing external data. The ADBC object communicates with the DB through standard SQL, and this is the only method for accessing external data that allows the original data file to be modified.

Given the pure, superior data-connecting power of the ADBC object, it was surprising to learn it is also one of the least used and most difficult to maintain features in Acrobat JavaScript. Plus it’s considered a security risk. The object is severely limited because it only operates on Windows. It’s awkward because the ODBC connection has to be manually set up on each system where the automation script is used. 

Because of the security issues, Adobe took steps in Acrobat 8 to make it more difficult to use. A registry setting was added for turning ADBC on and off. The default value is off. As it turns out, this was also a heads-up to developers: “ADBC is dead.” As of Acrobat X, the ADBC object no longer exists.

Ironically, the same technology as the ADBC object is implemented in the LiveCycle Scripting Model. It has the same limitations and awkwardness (Windows only, etc.), and is less secure because it operates from the document context and does not require a special registry setting.

So, if you are currently using ADBC in your automation scripts, you have the option to move to LiveCycle. Stefan Cameron has written a bit on this topic at his blog, ADBC Now Disabled by Default. I also have some information on LiveCycle database connections in an article titled, Acrobat, PDF, and Excel Spreadsheets.

Direct access to external files

If you had to choose only one thing to improve Acrobat automation scripting, you couldn’t do much better than adding a no-hassles function to read the contents of a disk file, and Adobe did it. The best update made to the SDK is the util.readFileIntoStream() function.

This function reads the contents of a file into a stream object. The “contents” are the data bytes that make up the file. Disk files can contain anything, including Unicode and binary data. JavaScript is a text-based language. The core functionality is designed to handle Unicode, but it doesn’t deal with raw binary data very well. In order to use binary data in the Acrobat JavaScript model, the developers at Adobe created the stream object, which stores binary data. This object is used in any situation where a script needs binary data, for example, to hold icon-image data for the app.addToolButton() function. So, binary data is the reason the util.readFileIntoStream() function reads data into a stream object, instead of something more convenient like a text string.

For most automation scripting purposes the disk files, such as the CSV file used in the example above, will contain plain text. To use the data from the file, the stream needs to be converted into a text string. Fortunately, Acrobat provides an easy solution as shown in the following code:

//Read file data into stream
var stmFileData = util.readFileIntoStream();

// Convert data into a String
var strTextData = util.stringFromStream(stmFileData);

The first line calls the util.readFileIntoStream() function with no input parameters. However, this function does have two optional input parameters: the file path and a Boolean for determining how the data is encoded in the stream object. If the file path is not specified, Acrobat displays the file open dialog to allow the user to browse for the file. If the user selects a file, the contents are returned as a stream object. If the user cancels the dialog, this function returns an “undefined” value. In the next line, the stream object is converted into a String using the util.stringFromStream() function. The data can now be parsed using regular String functions.
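As a rough illustration of the cancel check and the string parsing described above, here is a minimal sketch. The helper names readTextFile and parseSimpleCsv are hypothetical, not part of the Acrobat API. The sketch assumes the CSV file has a header row and contains no quoted fields or embedded commas, and it takes the util object as a parameter so the logic can be tried outside Acrobat.

```javascript
// Hypothetical helper: read a file as text through a util-like object.
// In Acrobat, pass the built-in "util" object. If cPath is omitted,
// Acrobat shows the file open dialog, as described in the article.
function readTextFile(oUtil, cPath) {
  var stm = (cPath === undefined)
    ? oUtil.readFileIntoStream()
    : oUtil.readFileIntoStream(cPath);
  // The user cancelled the dialog (or nothing was read)
  if (stm === undefined || stm === null) {
    return null;
  }
  return oUtil.stringFromStream(stm);
}

// Hypothetical helper: turn simple CSV text into an array of objects,
// one per data row, keyed by the names in the header row.
// Assumption: no quoted fields and no commas inside values.
function parseSimpleCsv(cText) {
  var reTrim = /^\s+|\s+$/g;
  var aLines = cText.split(/\r\n|\r|\n/).filter(function (cLine) {
    return cLine.replace(reTrim, "").length > 0; // skip blank lines
  });
  if (aLines.length === 0) {
    return [];
  }
  var aNames = aLines[0].split(",").map(function (cName) {
    return cName.replace(reTrim, "");
  });
  return aLines.slice(1).map(function (cLine) {
    var aVals = cLine.split(",");
    var oRec = {};
    aNames.forEach(function (cName, i) {
      // Missing trailing values become empty strings
      oRec[cName] = (aVals[i] === undefined) ? "" : aVals[i].replace(reTrim, "");
    });
    return oRec;
  });
}
```

In Acrobat, the two helpers might be combined as readTextFile(util) followed by parseSimpleCsv() on the result, after checking that the result is not null. Real-world CSV files with quoted values would need a more careful parser than this one.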

If the file contains binary data, it cannot be converted directly into a string, so the stream data must be handled differently. The stream object has a read function for extracting the data as hexadecimal-encoded text. Every two text characters read from the stream represent one byte of file data. To parse this data, you’ll need to know how every byte of data in the file is used. This is an advanced topic, but in general, a single byte can be converted into a number or text character with the following code:

// Acquire 1 byte of stream data, 2 text characters (hex encoded)
var cDataByte = stmFileData.read(1);

// Convert data into a number (radix 16 because the text is hexadecimal)
var nVal = parseInt(cDataByte, 16);

// Convert data into a text character
// (assuming that it is in fact a text character)
var cVal = String.fromCharCode(nVal);
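When more than one byte is read at a time, the same conversion can be applied to each pair of hex characters. The following is a minimal sketch of that idea. The helper names hexToBytes and bytesToUint are hypothetical, and the byte order of any multi-byte number depends entirely on the file format being read, so the bLittleEndian flag here is only an illustrative assumption.

```javascript
// Hypothetical helper: convert hex-encoded text (two characters per byte,
// as returned by the stream's read function) into an array of byte values.
function hexToBytes(cHex) {
  var aBytes = [];
  // Step through the text two characters at a time; a dangling odd
  // character at the end is ignored.
  for (var i = 0; i + 1 < cHex.length; i += 2) {
    aBytes.push(parseInt(cHex.substring(i, i + 2), 16));
  }
  return aBytes;
}

// Hypothetical helper: combine a few bytes into one unsigned integer.
// bLittleEndian is an assumption about the file format, not something
// the stream object reports.
function bytesToUint(aBytes, bLittleEndian) {
  var aOrdered = bLittleEndian ? aBytes.slice().reverse() : aBytes;
  var nResult = 0;
  for (var i = 0; i < aOrdered.length; i++) {
    nResult = nResult * 256 + aOrdered[i];
  }
  return nResult;
}

// Possible use inside Acrobat, with stmFileData from the earlier example:
//   var aHeader = hexToBytes(stmFileData.read(4));
```

Keeping the hex decoding separate from the interpretation of the bytes makes it easier to adapt the second step to whatever layout the binary file actually uses.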

Example:

The PopulateFieldsFromXML_Sample.pdf file contains a folder-level JavaScript file and an XML data file. The XML data file is a list of customer information: name, company, and e-mail address. The folder-level script places a toolbar button on the Plug-in Add-ons tools panel in both Acrobat X and Reader X. Pressing this button executes code that reads data from the XML file, parses it into an XML object (using E4X), then displays a menu of names acquired from the XML. When a name is selected from the menu, the script writes the associated customer data into fields on the PDF form, if it is open. Installation instructions are included in the PDF form.

Note that in Adobe Reader X, the Tools Panel is displayed only when the current PDF has been enabled with Reader Rights. However, the JavaScript tool is in fact loaded and running. The main function of the JavaScript tool can be accessed from any scripting context in Reader. For example, a menu item can be added to Reader to execute the function, or even a regular form button on a PDF.

This tool could be extended to work with LiveCycle PDF forms and/or generalized to use data in the XML file to control which fields are populated on the form.

The script, i.e., the main function, is at the top of the “PopulateFieldsFromXML_Tool.js” file. 
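The final step of the example, writing customer data into form fields, can be sketched roughly as follows. This is not the code from the sample file. The populateFields name is hypothetical, and the sketch assumes the form fields have the same names as the properties of the customer record. The document is passed in as a parameter; in Acrobat it would be the Doc object, whose getField() function returns null when a field does not exist.

```javascript
// Hypothetical helper: copy each value in oRecord into the form field
// of the same name. Returns the names that had no matching field.
function populateFields(oDoc, oRecord) {
  var aMissing = [];
  for (var cName in oRecord) {
    // Skip anything inherited from the prototype chain
    if (!Object.prototype.hasOwnProperty.call(oRecord, cName)) {
      continue;
    }
    var oField = oDoc.getField(cName);
    if (oField) {
      oField.value = oRecord[cName];
    } else {
      aMissing.push(cName);
    }
  }
  return aMissing;
}

// Possible use inside Acrobat (field names are assumptions):
//   populateFields(this, { name: "Ann", company: "Acme", email: "ann@example.com" });
```

Returning the unmatched names, rather than throwing an error, lets the calling script decide whether a missing field is a problem for that particular form.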


Products covered:

Acrobat X

Related topics:

JavaScript

4 comments

Comments for this tutorial are now closed.

Michael Gullon

May 27, 2015

Hi Thom,

I’d be interested to know if the xml file can reside on a server where it can be updated and continuously accessed by many users?  So when one or more users opens the template acroform on their computer, they always get the latest linked xml data via the server?
Hope that makes sense!

Thanks for a very interesting article

Thom Parker

February 9, 2015

Ethan,
  On the subject of trust and privilege, Have you read this article:
https://acrobatusers.com/tutorials/using_trusted_functions

Thom Parker

February 9, 2015

Ethan, 
  Acrobat has a rather confusing scripting architecture because scripts are used everywhere Acrobat features are customizable. Actions are just one location where they can be used. Here’s an article I wrote on it:
https://acrobatusers.com/tutorials/scripting-actions

And here is an article on folder level scripts, which are another way to automate Acrobat:
https://acrobatusers.com/tutorials/entering-folder-level-scripts

You can learn a lot more about Acrobat’s scripting setup by watching the videos here:
http://www.pdfscripting.com/public/Free_Videos.cfm

Ethan

February 6, 2015

Hello, Thom!

I’ve been reading a number of articles that you’ve written on this site, and I have some questions. In terms of Trusted Folder Level Scripts and Privileged Objects and all that, I am mostly confused.

The other thing that confuses me is where these scripts need to be installed.

Currently, I test a little script as far as I can in the Javascript Console (as you suggested), and then I paste that code into “Action Wizard > Create New Action > More Tools > Execute Javascript”. I just don’t know how this is all supposed to fit together. Is there a crash course somewhere? I don’t even begin to know what to Google first.

Lori Kassuba

December 13, 2013

Hi Chris Davies,

Open the PopulateFieldsFromXML_Sample.pdf file mentioned in this article. You’ll find the .js file as an attachment to this PDF file.

Thanks,
Lori

Chris Davies

December 10, 2013

hmm I thought I commented but I don’t see it.  Thom, I couldn’t find the .js or .xml files, are they still up?

Chris Davies

December 10, 2013

hey I can’t find the .js or .xml files.  Where are they??

Thom Parker

January 28, 2013

Why yes Esteban, you can use Net.HTTP as another way to acquire external data. You’ll find an example for this at www.pdfscripting.com. However, Net.HTTP is a privileged object, so it cannot be used in a PDF script. It can only be used in a trusted folder-level script.

And to Michael: JavaScript has been incorporated into the full Creative Suite for many years now.

Esteban Vasquez

January 9, 2013

You can use Net.HTTP.request to make requests to a local server running something like Python. This will allow you to read and write to pretty much anything, including files or databases.

Michael Anderson

December 5, 2012

Very interesting article, especially since JavaScript has become so powerful in the last decade that I wouldn’t mind it being incorporated into other Adobe software.
