Scope: All Acrobat versions
Skill Level: Intermediate
The regular expression /dog/ matches the word "dog." The expression contains no special characters, only standard text characters. It is case sensitive and matches the specified characters verbatim, in the order and case in which they are written, nothing more and nothing less. It will not match "Dog," "DOG," or "doog." Also, it matches only the first occurrence of "dog" in the text to which it is applied. For example, the following sentence includes two occurrences of "dog," and this regular expression will find only the first one.
My dog smells worse than your dog.

The original regular expression can easily be modified to be case insensitive and to match all occurrences by adding some special characters.
var myRegExp = /dog/; // Literal Notation
var myRegExp = new RegExp("dog"); // Object Notation
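As a rough sketch of how the two notations compare (the variable names and the search word below are made up for illustration), both forms produce a pattern that behaves the same way. Object notation takes the pattern as a string, which can be handy when the pattern is only known at run time.

```javascript
// Two ways to build the same pattern; all names here are illustrative only.
var literalPattern = /dog/;             // literal notation
var objectPattern = new RegExp("dog");  // object notation

var notationSample = "My dog smells worse than your dog";
var literalResult = literalPattern.test(notationSample); // true
var objectResult = objectPattern.test(notationSample);   // true

// Object notation can build a pattern from a string held in a variable.
var wordToFind = "pooch";               // hypothetical search word
var builtPattern = new RegExp(wordToFind);
var builtResult = builtPattern.test(notationSample);     // false: no "pooch" here
```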
var myRegExp = /dog/;
var myText = " My dog smells worse than your dog";
if (myRegExp.test(myText))
    app.alert("Found a dog!", 2);
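The behavior described so far can be checked with a small sketch. It assumes nothing beyond the standard test() and search() functions, uses made-up variable names, and stores results in variables instead of calling app.alert() so it can run in any JavaScript engine.

```javascript
// Plain /dog/ is case sensitive and matches the letters verbatim.
var dogPattern = /dog/;
var sampleText = " My dog smells worse than your dog";

var foundLower = dogPattern.test(sampleText);   // true
var foundCapital = dogPattern.test("My Dog");   // false: "D" is not "d"
var foundDoog = dogPattern.test("My doog");     // false: the extra "o" breaks the match

// search() reports where the first match starts (-1 when there is none).
var firstIndex = sampleText.search(dogPattern); // 4, the first "dog"
```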
For the first variation, change the string to:
var myText = " My doggie smells worse than your pooch";
Even though both occurrences of "dog" have changed, the test will still return true. That's because the regular expression doesn't care what's in front of or behind the pattern. It's just looking for the three letters, exactly as they are written in the expression. To find the individual word "dog," the expression needs to be modified to look for word boundaries.
var myRegExp = /\Wdog\W/;
Now things are starting to look cryptic. That's one of the main characteristics of regular expressions: they can look scary. Remember, regular expressions date to the stone age of computing, but they are not as bad as they look. With a little knowledge, writing these expressions will seem easy in just a short time. For example, the "\" character in the expression above is called an "escape," and it tells us the next character has a special meaning. The escape is used a lot. It gives regular characters special meaning and turns special characters into regular characters. The special meaning of the "W" is to match any non-word character, such as a space, a new line, or a punctuation mark.
The current string and regular expression, as we've just modified them, will fail the test because the word "dog" does not exist by itself anywhere in the text. Now let's change the text to:
var myText = " My pooch smells worse than your dog.";
This text will pass the test and display the alert because the word "dog" is preceded by a space, and followed by a period. The period is a non-word character.
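Here is a minimal sketch of the three cases discussed above. One caveat that follows from how "\W" works: it must match an actual character, so the word "dog" at the very end of a text (with nothing after it) is not found. The "\b" word-boundary escape shown at the end is not covered in this article; it is included only as a possible alternative, and the variable names are illustrative.

```javascript
var nonWordPattern = /\Wdog\W/;

// "doggie" contains "dog", but it is followed by a word character.
var inDoggie = nonWordPattern.test(" My doggie smells worse than your pooch"); // false

// Here "dog" has a space in front of it and a period behind it.
var withPeriod = nonWordPattern.test(" My pooch smells worse than your dog."); // true

// With nothing after "dog", the trailing \W has no character to match.
var atVeryEnd = nonWordPattern.test(" My pooch smells worse than your dog");   // false

// Alternative (not from the article): \b matches a position, not a character.
var boundaryPattern = /\bdog\b/;
var boundaryAtEnd = boundaryPattern.test(" My pooch smells worse than your dog");       // true
var boundaryInDoggie = boundaryPattern.test(" My doggie smells worse than your pooch"); // false
```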
Let's make this more complex. Change the text to capitalize "Dog."
var myText = " My pooch smells worse than your Dog!";
The test will now fail, because the upper case "D" in "Dog" does not match the lower case "d" used in the regular expression. To make the expression match both "dog" and "Dog" change it like this:
var myRegExp = /[Dd]og/;
The square brackets "[ ]" enclose a list of acceptable characters for a single-character match. Put as many characters in the square brackets as are needed to cover all the variations. For example:
var myRegExp = /[Ddlgm]og/;
This expression matches "Dog," "dog," "log," "gog" and "mog."
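To see the bracket list at work, the following sketch runs the expression against a handful of words. The word list and variable names are invented for this illustration; "fog" and "og" are included to show what is rejected.

```javascript
var classPattern = /[Ddlgm]og/;
var candidates = ["Dog", "dog", "log", "gog", "mog", "fog", "og"];
var accepted = [];

// Keep only the words the pattern matches.
for (var i = 0; i < candidates.length; i++) {
    if (classPattern.test(candidates[i])) {
        accepted.push(candidates[i]);
    }
}
// accepted is ["Dog", "dog", "log", "gog", "mog"];
// "fog" fails because "f" is not in the list, and "og" has no first letter at all.
```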
But to get back on track, let's say the match must be completely case insensitive. We don't care which, if any, letters are capitalized. In this case, use:
var myRegExp = /dog/i;
The "i" following the end of the expression is called an attribute (also commonly called a flag). There are only a few attributes, and they are generally for more advanced features. But this one is easy: it makes the match case insensitive. Try it with this text:
var myText = " My pooch smells worse than your DOG!";
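A short sketch of the "i" attribute in action, using illustrative variable names. The last two lines assume you want to build the same expression with object notation, where the attribute is passed as a second string argument.

```javascript
var anyCasePattern = /dog/i;
var shoutingText = " My pooch smells worse than your DOG!";

var matchesUpper = anyCasePattern.test(shoutingText); // true
var matchesMixed = anyCasePattern.test("dOg");        // true

// Without the attribute, the same text fails the test.
var strictResult = /dog/.test(shoutingText);          // false

// Object notation takes the attribute as a second argument.
var objectAnyCase = new RegExp("dog", "i");
var objectMatches = objectAnyCase.test(shoutingText); // true
```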
For the next example, we'll change the expression to match multiple characters.
var myRegExp = /do+g/;
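The "+" symbol means match one or more occurrences of the preceding item. In this case, the "+" is preceded by the single "o" character, so the expression will match "dog," "doog," "dooog," or the word "dog" with any number of "o"s. Try it with the sentence below. Note that "Doooog" is capitalized there, so the expression also needs the "i" attribute (/do+g/i) to match it: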
var myText = " My pooch smells worse than your Doooog!";
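The following sketch checks the "+" symbol against a few spellings. The variable names are illustrative. Because the sentence above capitalizes "Doooog," the plain expression does not match it; the version with the "i" attribute does.

```javascript
var repeatPattern = /do+g/;

var oneO = repeatPattern.test("dog");      // true
var manyO = repeatPattern.test("dooooog"); // true
var noO = repeatPattern.test("dg");        // false: "+" needs at least one "o"

var capitalizedText = " My pooch smells worse than your Doooog!";
var caseSensitive = repeatPattern.test(capitalizedText); // false: "D" is not "d"
var caseInsensitive = /do+g/i.test(capitalizedText);     // true
```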
Now let's take a small diversion and look at one of the most common regular expressions that I use, the empty test. I use it mostly to detect empty form field values and empty string variables.
var rgEmpty = /^\s*$/;
This expression looks very cryptic because it is composed entirely of special characters, but it is much simpler than it first appears. The caret symbol "^" matches the beginning of the text, and the dollar sign "$" matches the end. Using both of these special characters means the rest of the pattern must account for the entire text, from the beginning to the end. The rest of the pattern is composed of two elements, the "\s" special character and the asterisk "*" special character. The "\s" matches any white space. White space is anything you can't actually see but that has an effect on the text, such as spaces, tabs, and new lines. The "*" symbol means match zero or more occurrences of the preceding item. So this pattern matches either nothing (an empty string) or a string made up only of white space.
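One way to put the empty test to use is to wrap it in a small helper. The function name isBlank below is hypothetical and is not part of Acrobat or JavaScript; it is only a sketch of the idea.

```javascript
var rgEmpty = /^\s*$/;

// Hypothetical helper: true when the text is empty or only white space.
function isBlank(text) {
    return rgEmpty.test(text);
}

var emptyString = isBlank("");       // true
var onlySpaces = isBlank("   ");     // true
var tabsAndLines = isBlank(" \t\n"); // true
var hasContent = isBlank("  dog  "); // false
```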
var myText = " My dog smells worse than your dog";
var myNewText = myText.replace(/dog/, "pooch");
Notice that the "replace()" function is a member of the String Object, not the Regular Expression Object. The regular expression is the first argument to this function. When this code is run, the result is placed in the variable "myNewText." Try it, and you'll see that only the first occurrence of "dog" is replaced. To replace all occurrences, the regular expression needs to be modified like this:
myNewText = myText.replace(/dog/g,"pooch");
Notice the "g" attribute added to the expression. It means global, so the pattern is applied globally to the text string.
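The difference between the two calls can be seen side by side in this sketch. The variable names are illustrative, and the last example, which combines the "g" and "i" attributes, is an addition not shown in the text above.

```javascript
var originalText = " My dog smells worse than your dog";

var firstOnly = originalText.replace(/dog/, "pooch");
// " My pooch smells worse than your dog"

var everyMatch = originalText.replace(/dog/g, "pooch");
// " My pooch smells worse than your pooch"

// replace() returns a new string; originalText itself is unchanged.

// Attributes can be combined: replace every match, ignoring case.
var mixedCase = "Dog, dog, DOG".replace(/dog/gi, "pooch");
// "pooch, pooch, pooch"
```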
It would be impossible to provide a complete reference for using regular expressions here. They are just too rich for one article. Tables 1 and 2 below show a short list of commonly used special pattern-matching characters.
Table 1 - Character Matching
| Character | Meaning |
|---|---|
| \d | Matches the digits 0-9 |
| \D | Matches anything but 0-9 |
| \s | Matches white space, including spaces, tabs, and new lines |
| \S | Matches anything but white space |
| \w | Matches word characters: a-z, A-Z, 0-9, and the underscore |
| \W | Matches anything but a word character |
| . | Matches any character except a new line |
| ^ | Matches the beginning of a line |
| $ | Matches the end of a line |
Table 2 - Character Repetition
| Character | Meaning |
|---|---|
| ? | Match 0 or 1 occurrence of the previous item |
| * | Match 0 or more occurrences of the previous item |
| + | Match 1 or more occurrences of the previous item |
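As a quick illustration of the two repetition characters not yet demonstrated, here is a sketch that anchors each pattern with "^" and "$" from Table 1 so the whole word has to match. The patterns and variable names are made up for this example.

```javascript
var optionalO = /^do?g$/;    // zero or one "o"
var anyNumberOfO = /^do*g$/; // zero or more "o"s

var optionalNone = optionalO.test("dg");   // true
var optionalOne = optionalO.test("dog");   // true
var optionalTwo = optionalO.test("doog");  // false: "?" allows at most one

var starNone = anyNumberOfO.test("dg");    // true
var starTwo = anyNumberOfO.test("doog");   // true
var starOther = anyNumberOfO.test("dig");  // false: "i" is not "o"
```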
The special characters in Table 2 and the last three in Table 1, as well as other special characters such as the square brackets and parentheses (which weren't discussed), can't be used directly to match their respective characters in a text string because they are themselves special characters. The way around this limitation is to prefix them with the escape character, "\". Here's an example that matches dollar amounts:
var myRegExp = /\$\d?\d\.\d\d/;
var myText = " The hot dog cost $1.75!";
This expression will match the dollar sign, followed by one or two digits, followed by the decimal point (i.e., period), followed by two digits. From Table 2, you can see the "?" character means match 0 or 1 of the preceding item. In this expression, it means match zero or one digits.
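The sketch below tries the dollar-amount expression on a few inputs and then shows why the escape on the period matters. The extra sample strings and variable names are assumptions made for illustration; match() is the standard String function that returns the matched text.

```javascript
var dollarPattern = /\$\d?\d\.\d\d/;
var receiptText = " The hot dog cost $1.75!";

var found = receiptText.match(dollarPattern);
var amount = found ? found[0] : null;                       // "$1.75"

var twoDigits = "Dinner was $12.50".match(dollarPattern)[0]; // "$12.50"
var noCents = dollarPattern.test("It cost $5");              // false: no ".dd" part

// Without the escape, "." matches any character, not just a period.
var unescapedDot = /\d.\d\d/.test("1x75");                   // true
var escapedDot = /\d\.\d\d/.test("1x75");                    // false
```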
There are entire books covering the subject of regular expressions and there's a vast library of information available on the web. Just do a search for "Regular Expression." One of the best sites is this one:
It has a library of cut-and-paste regular expressions for all kinds of common tasks (such as validating a telephone or social security number), as well as tools for building and testing regular expressions.