Splitting and rebuilding strings

Learn how to convert or extract information from a string of text.

By Thom Parker – July 7, 2006

 

Scope: All Acrobat versions
Skill Level: Beginner
Prerequisites: Familiarity with the Acrobat JavaScript Console

It’s often necessary to convert or extract information from a string of text. In fact, this is such a common task that JavaScript gives us a whole range of ways to search and manipulate strings. In this tip we’ll cover a simple, easy-to-use, yet powerful method for splitting and rebuilding strings.

Strings of text, no matter what they are used for, almost always follow some simple pattern. This pattern is the key to manipulating the string in a useful way. The most common pattern is groups of characters separated by a delimiter. This pattern is so common, in fact, that JavaScript provides a special function for dealing with it, the split() function. In the following example this function is used to break a sentence into individual words.

var strSentence = "This is a short sentence"; 

// split sentence into an array of words, single space delimiter 
var aWords = strSentence.split(" "); 
var nNumWords = aWords.length;
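The same split can also feed a word-frequency count. The following is a minimal sketch in plain JavaScript; the function name countWords and the lower-casing step are illustrative assumptions, not part of the original tip.

```javascript
// Hypothetical helper: count how often each word appears in a sentence.
function countWords(strSentence) {
    var oCounts = {};
    var aWords = strSentence.split(" ");
    for (var i = 0; i < aWords.length; i++) {
        var strWord = aWords[i].toLowerCase();
        if (strWord === "") {
            continue; // repeated spaces produce empty strings, skip them
        }
        if (Object.prototype.hasOwnProperty.call(oCounts, strWord)) {
            oCounts[strWord] += 1;
        } else {
            oCounts[strWord] = 1;
        }
    }
    return oCounts;
}

var oFreq = countWords("the cat and the hat");
// oFreq.the is 2; oFreq.cat, oFreq.and and oFreq.hat are each 1
```

Note that a single-space delimiter treats two spaces in a row as having an empty word between them, which is why the sketch skips empty strings.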

The strSentence example uses split() to get a word count, but the same call could also be part of a script that builds an index or finds word frequency. Splitting the sentence into an array puts it into a form that is easier to manipulate. Let’s look at another common string pattern, the file path. In Acrobat JavaScript, the components of a file path are separated by the “/” character. In the following example the path is acquired from the current document.

// Acquire path from current document 
var myDocPath = this.path; 

// break into an array of path components 
var aPathComps = myDocPath.split("/"); 
var pathRoot = aPathComps[1]; // second element 
var myFilename = aPathComps[aPathComps.length-1]; // Last element

Assume the disk path to the file is C:\MyPDFFiles\myFile.pdf. The path in Acrobat JavaScript is then /c/MyPDFFiles/myFile.pdf. This is the path returned by the this.path property. It has 3 components, but the split() call actually produces an array of 4 elements. This apparent mismatch arises because the Acrobat path starts with the “/” character, the same character used for the splitting, which leaves an empty string in the first element of the aPathComps array. This is why the line of code that gets the path root, i.e. the hard drive letter, uses the second array entry rather than the first. The last line of code acquires the file name, which is the last entry in the array.
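The leading empty element is easy to see when the split is run on a literal string. This short sketch uses a hard-coded sample path (strSamplePath, an assumed stand-in for this.path) so it can be tried in any JavaScript console.

```javascript
// Sample value standing in for this.path (an assumption for illustration)
var strSamplePath = "/c/MyPDFFiles/myFile.pdf";

var aParts = strSamplePath.split("/");
// aParts is ["", "c", "MyPDFFiles", "myFile.pdf"]

var nParts = aParts.length;               // 4, because of the leading ""
var strRoot = aParts[1];                  // "c"
var strFile = aParts[aParts.length - 1];  // "myFile.pdf"
```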

Now that the path is split into an array, the path elements can be easily manipulated using the array functions. For example, use the following code to find the full path to a different file in the same folder location as the current file.

// First pop the old file name off the end 
aPathComps.pop(); 

// Next, add the new file name to the end of the array 
aPathComps.push("MyOtherFile.pdf"); 

// Put the path back together as a string 
var strNewPath = aPathComps.join("/");
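The pop, push and join steps above can be wrapped in a small reusable function. This is a sketch only; siblingPath is a hypothetical name, and the sample path is assumed rather than read from a document.

```javascript
// Hypothetical helper: path to another file in the same folder.
function siblingPath(strPath, strNewName) {
    var aComps = strPath.split("/");
    aComps.pop();              // drop the old file name
    aComps.push(strNewName);   // append the new one
    // The empty first element makes join() restore the leading "/"
    return aComps.join("/");
}

var strSibling = siblingPath("/c/MyPDFFiles/myFile.pdf", "MyOtherFile.pdf");
// strSibling is "/c/MyPDFFiles/MyOtherFile.pdf"
```

Because the array is created inside the function, the caller's own path array is left untouched.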

This same technique provides a quick way to split a file name from its extension.

var fileExt = myFilename.split(".").pop();

This code works because the pop() function removes and returns the last element of the array. To get the bare file name, the first element of the array, use the shift() function, which removes and returns the first element of the array. In the following statement the file name, acquired in an earlier example, is split into the main file name and the file name extension. The main file name is then shifted off the array, all in one line of code.

var fileNmRoot = myFilename.split(".").shift();
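One caveat: shift() returns only the text before the first dot, so a name such as my.report.v2.pdf would yield just "my". The sketch below splits at the last dot instead. The helper name splitFileName and the sample file names are assumptions for illustration.

```javascript
// Hypothetical helper: split a file name at its last dot.
function splitFileName(strName) {
    var aParts = strName.split(".");
    if (aParts.length === 1) {
        return { root: strName, ext: "" };   // no dot, so no extension
    }
    var strExt = aParts.pop();               // last piece is the extension
    return { root: aParts.join("."), ext: strExt };
}

var oName = splitFileName("my.report.v2.pdf");
// oName.root is "my.report.v2", oName.ext is "pdf"
```

For a name with a single dot, such as myFile.pdf, the result matches the shift() and pop() one-liners.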

The fileNmRoot statement could be used to create a modified file name for saving the document after an editing operation like adding a watermark. In the following code it is assumed that a watermark has just been added to the PDF. The file path is first split into the aPathComps array and the file name removed from the end of the array, as shown in an earlier example. A new file name is created from the old one and pushed back onto the path array to form a file path to a completely new file name for saving the watermarked version of the original file.

// First split document path into an array 
var aPathComps = this.path.split("/"); 

// Get File Name off end of array 
var myFileName = aPathComps.pop(); 

// Get Root of file Name 
var fileNmRoot = myFileName.split(".").shift(); 

// Use original file name as base for new file name 
var fileNmNew = fileNmRoot + "_watermark.pdf"; 

// Add new file name back into array of path elements 
aPathComps.push(fileNmNew); 

// Create new file path by joining the path array 
var strNewPath = aPathComps.join("/"); 

// Save to the new file name 
this.saveAs(strNewPath); 

Assuming that the path returned by this.path is /c/MyPDFFiles/myFile.pdf, the value of strNewPath would be /c/MyPDFFiles/myFile_watermark.pdf.

This string manipulation technique can even be used to rebuild strings into fancier formats, such as converting a date into a registration or account number. For this example a two part ID separated by a dash is created. The first part will be the date, taken from a field on the form, and the second part will be a 4 digit random number.

var strDate = "3/20/2006"; // Starting Date 
var strFirstPart = strDate.split("/").join(""); 
var nSecondPart = Math.floor(Math.random()*10000); // whole number from 0 to 9999 
var strRegNum = util.printf("ID%s-%04d", strFirstPart, nSecondPart);

Notice the split() and join() functions are done in the same line. This works because the return value from the split() is an array which is passed as the base object for the join(), which is an array function. This type of daisy chaining allows for the creation of complex functionality in a single line. It’s convenient here because all that needs to be done is remove the forward slashes from the date string. The result will look like this:

ID3202006-1234
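Simply removing the slashes can make two different dates produce the same digits: 1/12/2006 and 11/2/2006 both become 1122006. If that matters, the month and day can be padded to two digits before they are joined. The following is a sketch in plain JavaScript; zeroPad and buildRegNum are hypothetical helper names, and string concatenation is used in place of util.printf so the code runs outside Acrobat as well.

```javascript
// Hypothetical helper: left-pad a value with zeros to a fixed width.
function zeroPad(value, nWidth) {
    var str = String(value);
    while (str.length < nWidth) {
        str = "0" + str;
    }
    return str;
}

// Build an ID from an m/d/yyyy date string and a number from 0 to 9999.
// nRandom is passed in so that a result can be reproduced.
function buildRegNum(strDate, nRandom) {
    var aDate = strDate.split("/");   // [month, day, year]
    var strFirstPart = zeroPad(aDate[0], 2) + zeroPad(aDate[1], 2) + aDate[2];
    return "ID" + strFirstPart + "-" + zeroPad(nRandom, 4);
}

var strPaddedRegNum = buildRegNum("3/20/2006", Math.floor(Math.random() * 10000));
// buildRegNum("3/20/2006", 42) returns "ID03202006-0042"
```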

For more information on string and array functions consult any standard JavaScript reference. Below is the official core JavaScript web reference.


