Using Unicode text

Learn how to work with the built-in Unicode support in Acrobat JavaScript.

By Thom Parker – November 19, 2006

In JavaScript version 1.3 and later, all strings are stored internally as Unicode. This built-in Unicode support gives document developers instant access to a whole world of characters that are not available on the keyboard or in special fonts like Wingdings or Symbol. In fact, nearly every writing system, past and present, has a Unicode encoding, and even Klingon has an unofficial mapping in the Private Use Area through the ConScript Unicode Registry.
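Because strings are Unicode internally, a character written as a \u escape, one built from its numeric code with String.fromCharCode(), and the code read back with charCodeAt() all describe the same thing. The short sketch below uses only core JavaScript (nothing Acrobat-specific), so it should behave the same in the Acrobat JavaScript console as in any other engine; the variable names are purely illustrative.

```javascript
// Three views of the same character: an escape, a numeric code, and the code read back.
var fromEscape = "\u20AC";                    // Euro sign written as a \u escape
var fromNumber = String.fromCharCode(0x20AC); // Euro sign built from its numeric code
var code = fromEscape.charCodeAt(0);          // numeric code read back from the string

var same = (fromEscape === fromNumber);       // true: both hold the same character
var len = fromEscape.length;                  // 1: the escape is one character, not six
```

The point of the comparison is that the escape only exists in the source text; once the script runs, the string simply contains the character.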

In a JavaScript string, a Unicode character is written as the escape sequence \u followed by four hexadecimal digits. Roughly speaking, the two most significant digits indicate a class of related characters and the two least significant select an individual character within that class. For example, the common western European Latin characters all fall in the 0x00 class (\u0000 through \u00FF), which closely matches the standard ANSI set of western European characters. There are character classes for all manner of symbols and languages. Here are a couple of examples:

// Format a currency string with the Euro symbol (in a field's Format script)
event.value = "\u20AC" + event.value;
// Set the document title in Japanese: "ユニコード テスト" ("Unicode test")
this.title = "\u30E6\u30CB\u30B3\u30FC\u30C9 \u30C6\u30B9\u30C8";
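The two-digit split described above can be made concrete with a little bit arithmetic: the high byte of a 16-bit code is its "class" and the low byte picks the character. The helpers splitCode and joinCode below are hypothetical names used only for illustration. Treat the split as a rough guide; official Unicode block boundaries do not always fall on these 256-character lines (Latin Extended-A, for instance, spans \u0100 through \u017F).

```javascript
// Hypothetical helper: split a 16-bit code into its high byte ("class")
// and low byte (character within the class).
function splitCode(code) {
  return { high: (code >> 8) & 0xFF, low: code & 0xFF };
}

// Hypothetical helper: combine a class byte and a character byte back into
// a one-character string.
function joinCode(high, low) {
  return String.fromCharCode((high << 8) | low);
}

var euroParts = splitCode(0x20AC); // high 0x20, low 0xAC
var ni = joinCode(0x30, 0xCB);     // katakana NI, same as "\u30CB"
var eAcute = joinCode(0x00, 0xE9); // Latin small e with acute, same as "\u00E9"
```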

Unicode strings can be applied to form field values and tooltips (the field.userName property), button captions, all annotation text, document properties, and just about anywhere else text is accessible to Acrobat JavaScript. The only real restriction on using Unicode in Acrobat is that Acrobat must have a font containing the characters being displayed. This problem can usually be solved by re-running the Acrobat installer to add the needed font libraries.
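When the characters come from a table of codes rather than typed escapes, a small builder function can assemble the string before it is applied to a field, tooltip, or caption. This is a minimal sketch: fromCodes is a hypothetical helper written with var and a plain loop so it stays friendly to older Acrobat JavaScript engines. The Acrobat calls in the trailing comment (getField, userName, buttonSetCaption) only run inside Acrobat, and the field name "myButton" is an assumption.

```javascript
// Hypothetical helper: build a string from an array of 16-bit Unicode codes.
function fromCodes(codes) {
  var s = "";
  for (var i = 0; i < codes.length; i++) {
    s += String.fromCharCode(codes[i]);
  }
  return s;
}

// "ユニコード テスト" ("Unicode test") in katakana, built from its codes.
var caption = fromCodes([0x30E6, 0x30CB, 0x30B3, 0x30FC, 0x30C9, 0x20,
                         0x30C6, 0x30B9, 0x30C8]);

// Inside Acrobat only; "myButton" is a hypothetical field name:
//   var f = this.getField("myButton");
//   f.userName = caption;          // tooltip text
//   f.buttonSetCaption(caption);   // button face caption
```

Whether the caption actually renders still depends on Acrobat having a font with Japanese glyphs installed.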

The UnicodeStringBuilder.pdf example file uses JavaScript to build Unicode strings and apply them to a variety of objects in the PDF.
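Going the other direction is handy too: given text that already contains special characters, produce a plain-ASCII string literal with \u escapes that can be pasted into a document script. The function below is a hypothetical sketch of that idea and is not taken from UnicodeStringBuilder.pdf; it keeps printable ASCII as is and escapes everything else, including quotes and backslashes, so the result is always a valid literal.

```javascript
// Hypothetical helper: convert text into a double-quoted JavaScript string
// literal that uses \uXXXX escapes for every character that is not plain
// printable ASCII (and for the quote and backslash characters).
function toEscapedLiteral(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var code = text.charCodeAt(i);
    if (code >= 0x20 && code < 0x7F && ch !== "\"" && ch !== "\\") {
      out += ch; // printable ASCII: copy unchanged
    } else {
      // Four uppercase hex digits, zero-padded on the left.
      out += "\\u" + ("0000" + code.toString(16).toUpperCase()).slice(-4);
    }
  }
  return "\"" + out + "\"";
}

var euroLiteral = toEscapedLiteral("\u20AC5"); // the text "\u20AC5" with quotes
```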