Acrobat User Community

Getting external data into Acrobat X JavaScript

By Thom Parker, March 6, 2011

The newly released Acrobat X SDK boasts only a few updates to the Acrobat JavaScript model, but overall these updates are important ones. Acrobat now uses the new JavaScript 1.8 engine. The security model changed (it is stricter), a useful DOM object was deprecated, new digital-signature functions were added for document certification, and the UI was given a major overhaul (covered in this video, Acrobat X Automation Scripting Changes).

The two most significant changes, one exciting and one depressing, have to do with how Acrobat interacts with external data in the context of workflow automation. These updates completely change how we think about automation scripting in Acrobat, and it is these two opposing updates that are covered here.

Background

The goal of automation is to condense a set of complex, time-consuming, difficult, error-prone and/or repetitive operations into a simple, robust workflow. JavaScript is by far the preferred tool for automating Acrobat. It has hooks into most of the Acrobat functionality, and (for a programming language) it is easy and fast to develop with. Nevertheless, it does have its drawbacks. The Acrobat JavaScript Model is not “complete.” There are holes in the model that can make solutions to complex workflows somewhat awkward: better than the manual process, but not ideal.

One of these holes was a clean and simple way to access data in an external file. It is common for the data used in a workflow to be stored in some external data source, such as a database, Excel, CSV or XML file. For example, say a document needs to be individually watermarked and labeled for each user, then e-mailed to each user. The user information is maintained in a CSV file on the local file system. Watermarking, labeling and e-mailing a PDF are all simple tasks for an automation script. The awkward part is getting the information out of the CSV file. In earlier versions of Acrobat, there were three different, general-purpose methods for solving this issue.

  1. Connect the CSV file to ODBC, and then use the ADBC object to access the data.

    Awkwardness: Windows-only solution; dependent on having the correct ODBC drivers; requires the extra step of manually hooking the data file into ODBC.

  2. Load the CSV into a file attachment, where the data can then be extracted and parsed.

    Awkwardness: Least awkward, but adding a file attachment is indirect and unnecessarily modifies the PDF file. There are also security restrictions on certain file types.

  3. Build a PDF form to act as an intermediary to capture lines of data.

    Awkwardness: Only works with Tab Separated files, so data must be converted. Requires the extra element of the specially built PDF to be maintained and installed with the automation script.

The changes in the new JavaScript SDK have a serious effect on this solution landscape, both good and bad. Let’s take a look at the bad first.

No more database access (sort of)

The missing object in the Acrobat X SDK is ADBC, which is used for accessing an ODBC-connected database. From a purely scripting perspective, direct database access is the simplest, cleanest solution for accessing external data. The ADBC object communicates with the DB through standard SQL, and this is the only method for accessing external data that allows the original data file to be modified.

Given the pure, superior data-connecting power of the ADBC object, it was surprising to learn it is also one of the least used and most difficult to maintain features in Acrobat JavaScript. Plus it’s considered a security risk. The object is severely limited because it only operates on Windows. It’s awkward because the ODBC connection has to be manually set up on each system where the automation script is used. 

Because of the security issues, Adobe took steps in Acrobat 8 to make it more difficult to use. A registry setting was added for turning ADBC on and off. The default value is off. As it turns out, this was also a heads-up to developers, “ADBC is Dead.” As of Acrobat X, the ADBC object no longer exists.

Ironically, the same technology as the ADBC object is implemented in the LiveCycle Scripting Model. It has the same limitations and awkwardness (Windows only, etc.), and is less secure because it operates from the document context and does not require a special registry setting.

So, if you are currently using ADBC in your automation scripts, you have the option to move to LiveCycle. Stefan Cameron has written a bit on this topic at his blog, ADBC Now Disabled by Default. I also have some information on LiveCycle database connections in an article titled, Acrobat, PDF, and Excel Spreadsheets.

Direct access to external files

If you had to choose only one thing to improve Acrobat automation scripting, you couldn’t do much better than adding a no-hassles function to read the contents of a disk file, and Adobe did it. This is the best update made to the SDK: the util.readFileIntoStream() function.

This function reads the contents of a file into a stream object. The “contents” are the data bytes that make up the file. Disk files can contain anything, including Unicode and binary data. JavaScript is a text-based language. The core functionality is designed to handle Unicode, but it doesn’t deal with raw binary data very well. In order to use binary data in the Acrobat JavaScript model, the developers at Adobe created the stream object, which stores binary data. This object is used in any situation where a script needs binary data, for example to hold icon-image data for the app.addToolButton() function. So, binary data is the reason the util.readFileIntoStream() function reads data into a stream object, instead of something more convenient like a text string.

For most automation scripting purposes the disk files, such as the CSV file used in the example above, will contain plain text. To use the data from the file, the stream needs to be converted into a text string. Fortunately, Acrobat provides an easy solution as shown in the following code:

//Read file data into stream
var stmFileData = util.readFileIntoStream();

// Convert data into a String
var strTextData = util.stringFromStream(stmFileData);

The first line calls the util.readFileIntoStream() function with no input parameters. However, this function does have two input parameters: the file path and a Boolean that determines how the data is encoded in the stream object. If the file path is not specified, Acrobat displays the file open dialog to allow the user to browse for the file. If the user selects a file, the contents are returned as a stream object. If the user cancels the dialog, this function returns undefined. In the next line, the stream object is converted into a String using the util.stringFromStream() function. The data can now be parsed using regular String functions.
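As a minimal sketch of that parsing step, the code below wraps the two calls in a helper that also handles a canceled dialog, then splits simple CSV text into records. The helper names (readTextFile, parseSimpleCsv, stripSpaces) are illustrative assumptions, not part of the Acrobat API, and the parser assumes a header row and no quoted fields or embedded commas.

```javascript
// Hypothetical helper: trim leading and trailing whitespace with a regex,
// so it does not depend on String.prototype.trim being available.
function stripSpaces(cText) {
    return cText.replace(/^\s+|\s+$/g, "");
}

// Hypothetical helper: read a text file through the Acrobat util object.
// oUtil is passed in so the function can be tried outside Acrobat with a
// stub; inside Acrobat, pass the built-in "util" object.
function readTextFile(oUtil, cPath) {
    // With no path, Acrobat shows the file open dialog.
    var stm = (cPath === undefined)
        ? oUtil.readFileIntoStream()
        : oUtil.readFileIntoStream(cPath);
    // A canceled dialog yields undefined; report that to the caller as null.
    if (stm === undefined || stm === null) {
        return null;
    }
    return oUtil.stringFromStream(stm);
}

// Hypothetical helper: turn simple CSV text into an array of objects
// keyed by the names in the first (header) row.
function parseSimpleCsv(cText) {
    var aLines = cText.split(/\r\n|\r|\n/);
    var aRows = [];
    for (var i = 0; i < aLines.length; i++) {
        // Skip blank lines, such as a trailing newline at the end of the file.
        if (/\S/.test(aLines[i])) {
            aRows.push(aLines[i].split(","));
        }
    }
    if (aRows.length === 0) {
        return [];
    }
    var aHeader = aRows[0];
    var aRecords = [];
    for (var r = 1; r < aRows.length; r++) {
        var oRec = {};
        for (var c = 0; c < aHeader.length; c++) {
            // Missing trailing columns become empty strings.
            var cVal = (c < aRows[r].length) ? aRows[r][c] : "";
            oRec[stripSpaces(aHeader[c])] = stripSpaces(cVal);
        }
        aRecords.push(oRec);
    }
    return aRecords;
}
```

Inside Acrobat, a call along the lines of parseSimpleCsv(readTextFile(util) || "") would yield one object per data row, or an empty array if the user cancels the dialog.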

If the file contains binary data, it cannot be converted directly into a string, so the stream data must be handled differently. The stream object has a read function for extracting the data as hexadecimal-encoded text. Every two text characters read from the stream represent one byte of file data. To parse this data, you’ll need to know how every byte of data in the file is used. This is an advanced topic, but in general, a single byte can be converted into a number or text character with the following code:

// Acquire 1 byte of stream data: 2 text characters (hex encoded)
var cDataByte = stmFileData.read(1);

// Convert the hex text into a number, using an explicit radix of 16
var nVal = parseInt(cDataByte, 16);

// Convert the data into a text character
// (assuming that it is in fact a text character)
var cVal = String.fromCharCode(nVal);
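For more than one byte, the same conversion can be applied across a whole run of hex text. The following is a minimal sketch: the helper names hexToBytes and bytesToText are hypothetical, and it assumes the input is the hexadecimal text described above, with two characters per byte and each byte standing for a single text character.

```javascript
// Hypothetical helper: convert hex-encoded text (two characters per byte)
// into an array of byte values in the range 0-255.
function hexToBytes(cHex) {
    if (cHex.length % 2 !== 0) {
        throw new Error("Hex text must contain an even number of characters");
    }
    var aBytes = [];
    for (var i = 0; i < cHex.length; i += 2) {
        var cPair = cHex.substr(i, 2);
        // Validate explicitly; parseInt alone would silently accept "4G" as 4.
        if (!/^[0-9A-Fa-f]{2}$/.test(cPair)) {
            throw new Error("Invalid hex pair at offset " + i + ": " + cPair);
        }
        aBytes.push(parseInt(cPair, 16));
    }
    return aBytes;
}

// Hypothetical helper: treat each byte as one character code.
// Only meaningful when the bytes really are single-byte text.
function bytesToText(aBytes) {
    var cText = "";
    for (var i = 0; i < aBytes.length; i++) {
        cText += String.fromCharCode(aBytes[i]);
    }
    return cText;
}
```

In Acrobat, the input would come from the stream, for example hexToBytes(stmFileData.read(4)) for the next four bytes of the file.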

Example:

The PopulateFieldsFromXML_Sample.pdf file contains a folder-level JavaScript file and an XML data file. The XML data file is a list of customer information: name, company, and e-mail address. The folder-level script places a toolbar button on the Plug-in Add-ons tools panel in both Acrobat X and Reader X. Pressing this button executes code that reads data from the XML file, parses it into an XML object (using E4X), then displays a menu of names acquired from the XML. When a name is selected from the menu, the script writes the associated customer data into fields on the PDF form, if it is open. Installation instructions are included in the PDF form.

Note that in Adobe Reader X, the Tools Panel is displayed only when the current PDF has been enabled with Reader Rights. However, the JavaScript tool is in fact loaded and running. The main function of the JavaScript tool can be accessed from any scripting context in Reader. For example, a menu item can be added to Reader to execute the function, or even a regular form button on a PDF.
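As a rough sketch of the menu-item option, a folder-level script could register an entry with app.addMenuItem. The function name PopulateFieldsFromXML(), the menu item's internal name and label, and the choice of the File menu as parent are all assumptions made for illustration; the real main function name is the one defined in the tool's script file.

```javascript
// Hypothetical call to the tool's main function. The actual function name
// is defined in the tool's .js file and is not given here.
var MAIN_FUNCTION_CALL = "PopulateFieldsFromXML();";

// Build the parameter object for app.addMenuItem.
// cName, cUser and cParent values are illustrative assumptions.
function buildMenuItemParams(cExec) {
    return {
        cName: "PopulateFromXML",           // internal, language-independent name
        cUser: "Populate Fields From XML",  // label shown to the user
        cParent: "File",                    // parent menu (assumed)
        cExec: cExec                        // script text run when selected
    };
}

// Register the item only when running inside Acrobat or Reader,
// where the global "app" object exists.
if (typeof app !== "undefined") {
    app.addMenuItem(buildMenuItemParams(MAIN_FUNCTION_CALL));
}
```

Keeping the parameters in a separate function makes it easy to inspect them, or to reuse the same cExec text for a form button's action.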

This tool could be extended to work with LiveCycle PDF forms and/or generalized to use data in the XML file to control which fields are populated on the form.

The script, i.e., the main function, is at the top of the “PopulateFieldsFromXML_Tool.js” file.