
Using fonts with non-Unicode glyph mapping in PDF documents as copy protection

indigo-water
Registered: May 3 2011
Posts: 2

Hi,
 
Is there a safe way to prevent text from being copied from a PDF document? Password protection can be removed very quickly with the appropriate software.
 
In this forum, the user UVSAR proposed using special non-Unicode fonts:
 
“It’s just a font family with non-unicode glyph mapping. We've used the same concept for years.
 
Acrobat handles embedded fonts in a unique way, if the typeface is embedded in the PDF but NOT installed on the reader's system. When rendering the glyphs to screen, it of course uses the embedded font's encoding table; but when copy-n-pasting text it sends the data as plain text, assuming that wherever it's pasted, the encoding is correct. …
 
The disadvantages are that the PDF isn't searchable, it is likely to print incorrectly (many printers cough when sent non-standard soft fonts), screen-readers won't read it and Google won't index it. It's 90% 'secure' against a casual n00b hacker, but the data is still all there if someone wants to spend 5 minutes extracting it.”
 
(http://acrobatusers.com/forum/security/security-embedding-unique-fonts-different-encoding)
  
I have tried hard to find such non-Unicode fonts, but I could not find any.
 
Can anyone give me a hint on where to find such non-Unicode fonts? They should look similar to Arial.
 
Or is there another way? Is it even possible to modify an existing TrueType Arial font accordingly? What software is needed for that, and what steps are required?
 
Is it possible to change the font code page accordingly with a JavaScript program in an existing PDF document?
 
Many thanks in advance.

My Product Information:
Acrobat Pro 9.0, Windows
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
It's not a special type of font; it's a special type of PDF. The Unicode mapping tables within the PDF tell Acrobat how to link the glyphs to the "real" character codes, and if they're missing or corrupted, the text can't be copied and pasted in a readable form. Sometimes the table is lost or damaged while the PDF is being created, which is why people post in our forums asking why a particular PDF can't be copied.

Acrobat itself cannot intentionally destroy the Unicode maps, it has to be done using low-level tools.
indigo-water
Registered: May 3 2011
Posts: 2
Hi UVSAR,

thanks for the quick reply.

Could you be so kind as to tell me which low-level tool is needed to modify the Unicode maps in an existing PDF document, and how to use it?

Thank you.

UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
Within a PDF file, each font stored using Identity-H encoding has a /ToUnicode entry pointing to a stream that contains a plain-text CMap table of hex numbers, with the glyph code in the left column and the Unicode character value in the right column. Acrobat and Reader use this table to build the strings when you copy or export text from the file. If the values are jumbled up (or the same value is used in every row), the PDF still renders perfectly and will pass all preflight checks, but all the text in that font is uncopyable.
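
To make the table's role concrete, here is a minimal Python sketch of how such a CMap turns glyph codes into copied text. It assumes the CMap has already been extracted from the PDF and decompressed to plain text, and it only handles the `beginbfchar` section and the simple three-value form of `beginbfrange` (array-form ranges are skipped). The names `parse_tounicode`, `extract_text` and `SAMPLE_CMAP` are hypothetical illustrations, not part of Acrobat or any of the tools mentioned here, and this is not how Acrobat itself is implemented.

```python
import re

# One "<glyph code> <unicode>" entry per line inside beginbfchar ... endbfchar.
BFCHAR_LINE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
# One "<lo> <hi> <unicode start>" entry per line inside beginbfrange ... endbfrange.
# Array-form lines ("<lo> <hi> [<d1> <d2> ...]") don't match and are skipped.
BFRANGE_LINE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")

# Hypothetical example: glyph codes 0x24/0x25 -> "A"/"B", 0x44..0x46 -> "a".."c".
SAMPLE_CMAP = """/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0024> <0041>
<0025> <0042>
endbfchar
1 beginbfrange
<0044> <0046> <0061>
endbfrange
endcmap
"""


def section_lines(cmap_text: str, name: str):
    """Yield the stripped lines found between begin<name> and end<name>."""
    for block in re.findall(rf"begin{name}(.*?)end{name}", cmap_text, re.S):
        for line in block.splitlines():
            yield line.strip()


def utf16be_hex_to_str(hex_str: str) -> str:
    """Destination values are UTF-16BE code units written as hex."""
    return bytes.fromhex(hex_str).decode("utf-16-be")


def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Build a glyph-code -> Unicode-text table from a ToUnicode CMap."""
    mapping: dict[int, str] = {}
    for line in section_lines(cmap_text, "bfchar"):
        m = BFCHAR_LINE.fullmatch(line)
        if m:
            mapping[int(m[1], 16)] = utf16be_hex_to_str(m[2])
    for line in section_lines(cmap_text, "bfrange"):
        m = BFRANGE_LINE.fullmatch(line)
        if m:
            start = int(m[3], 16)
            # Simplification: assumes each destination is a single BMP character,
            # so consecutive glyph codes map to consecutive code points.
            for offset, code in enumerate(range(int(m[1], 16), int(m[2], 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


def extract_text(glyph_codes: list[int], mapping: dict[int, str]) -> str:
    """Roughly what happens on copy: each glyph code is looked up in the table."""
    return "".join(mapping.get(code, "\ufffd") for code in glyph_codes)


if __name__ == "__main__":
    table = parse_tounicode(SAMPLE_CMAP)
    print(extract_text([0x24, 0x44, 0x45], table))  # prints Aab
    # Same glyph codes, but every row replaced with one value: the page would
    # still draw the same glyphs, yet the copied text is junk.
    scrambled = {code: "X" for code in table}
    print(extract_text([0x24, 0x44, 0x45], scrambled))  # prints XXX
```

Note that nothing in the sketch touches the glyph shapes themselves; only the lookup table changes, which is why a scrambled table leaves the on-screen rendering intact.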

To edit the CMap table you need one of the tools for working with PDF Cos objects, such as Windjack's PDF Can Opener, NIX PDF Surgeon or PDFTron CosEdit. There isn't a plugin available to the public specifically to obfuscate PDF fonts, and Cos editors are relatively expensive if that's all you want to do, but they are powerful general-purpose tools for people who work with PDF files professionally as they can fix many problems that Acrobat can't.


It is possible for someone to reconstruct the CMap manually: export the font table (using Preflight > Options > Create Inventory), work out by eye what each glyph should map to, then go into the Cos object and re-edit the CMap table. Of course, that requires the tools, so for 99% of users the document is uncopyable. It's far harder to fix CMaps than to remove a permissions password, and it cannot be automated, because the CMap table is the only place where the meaning of each character shape is stored. Even if you have the same font installed on your computer, it can't be compared against the embedded one to recover the mapping.

More important than the security aspect of preventing someone from grabbing your work, obfuscating the CMap tables also prevents search engines from indexing the file, since they see only the "junk" Unicode. You may want that, but if you don't, it's an important thing to remember!
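
Once the glyph-to-character mapping has been worked out by eye, it still has to be written back as CMap text before it can be pasted into the Cos object. The Python sketch below shows one way to generate that text. It assumes the hand-built mapping is already in a dict, that the font uses 2-byte glyph codes (as with Identity-H), and it splits entries into `beginbfchar` sections of at most 100 entries, the per-section limit given in Adobe's CMap documentation. `build_tounicode_cmap` and `utf16be_hex` are hypothetical helper names; the header follows the common Adobe-Identity-UCS template, and actually replacing the stream in the PDF still needs a Cos editor as described above.

```python
def utf16be_hex(text: str) -> str:
    """Encode Unicode text as the uppercase UTF-16BE hex used for CMap values."""
    return text.encode("utf-16-be").hex().upper()


def build_tounicode_cmap(mapping: dict[int, str], max_per_section: int = 100) -> str:
    """Write a minimal ToUnicode CMap from a hand-built glyph-code -> text table.

    Assumes 2-byte glyph codes, as used by Identity-H fonts.
    """
    header = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    body = []
    items = sorted(mapping.items())
    # Emit the entries in bfchar sections of at most max_per_section lines each.
    for i in range(0, len(items), max_per_section):
        chunk = items[i:i + max_per_section]
        body.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            body.append(f"<{code:04X}> <{utf16be_hex(text)}>")
        body.append("endbfchar")
    footer = [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(header + body + footer) + "\n"


if __name__ == "__main__":
    # Hypothetical result of identifying glyphs 0x24 and 0x44 by eye.
    print(build_tounicode_cmap({0x24: "A", 0x44: "a"}))
```

The tedious part, deciding what each glyph shape means, happens before this function is called; the sketch only automates the bookkeeping of writing the table in the right syntax.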

We're planning an in-depth tutorial on this in the next couple of weeks; I'll update this thread with the link when it's posted.