Security via Embedding Unique Fonts with different Encoding

Orcun
Registered: Nov 10 2008
Posts: 4

Hello,

I have a PDF file created by Ghostscript that contains some unique fonts, which look like they were created only for that document. They have their own encoding. As a result, I can view and print the document, but when I select text, press Ctrl+C and paste it into another application (MS Word, Notepad; I tried them all), I get garbled text. Not actually text, just symbols, squares, etc.

There are many tools that can remove the copy/print restrictions applied through standard PDF security. But this protection is awesome. There's no way to copy/paste unless you convert the file to TIFF, rebuild the PDF and run OCR, which is a very long way round.

Does anyone know how to reproduce this?

Here are some screenshots from the file (from Ctrl+D, Document Properties). I'm pretty sure they explain it better.

http://img371.imageshack.us/my.php?image=pdf1qb4.jpg

http://img371.imageshack.us/my.php?image=pdf2fo1.jpg

http://img129.imageshack.us/my.php?image=pdf3bg4.jpg

This is how it looks when I copy and paste into Notepad:
http://img371.imageshack.us/my.php?image=pdf4fo5.jpg

Thanks in advance.

My Product Information:
Acrobat Standard 9.0, Windows
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
Can you provide a link to a sample PDF file?

Lori Kassuba is an AUC Expert and Community Manager for AcrobatUsers.com.

Orcun
Registered: Nov 10 2008
Posts: 4
URL http://orcun.baslak.com/KORYO.pdf

It's 5.5 MB (38 pages with images).
Orcun
Registered: Nov 10 2008
Posts: 4
Any updates on this?
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
It appears as though these fonts are bitmap images, so Acrobat doesn't process them as fonts. This means the text isn't searchable, nor can you cut/paste it. If you have the fonts on your system, you can embed them with Acrobat 9 Pro. Otherwise, you'll need to go through the steps you mentioned above.

UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
No, they're not bitmaps; it's just a font family with non-Unicode glyph mapping. We've used the same concept for years.

Acrobat handles embedded fonts in a particular way if the typeface is embedded in the PDF but NOT installed on the reader's system. When rendering the glyphs to screen, it of course uses the embedded font's encoding table; but when copy-n-pasting text, it sends the data as plain text, assuming that wherever it's pasted, the encoding is correct.

With some of the fonts in the example PDF above, all that's happened is that the numeric positions of the glyphs are shifted about by a non-standard lookup table. What appears on the screen as "Apple" may human-read as "Apple", but inside the font it's actually a different character string (say "HG5DRF"). When Acrobat cut-n-pastes the characters, it copies the actual text (HG5DRF), as that's what the string actually says in the PostScript. The same will happen if the file is opened in something like Illustrator.
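
To make the idea concrete, here is a minimal Python sketch that models the font's lookup table as a plain dictionary. The table `FONT_ENCODING`, the stored string and the helper names are all hypothetical; a real embedded font keeps this mapping inside its own encoding tables, not in anything like Python.

```python
# Hypothetical model of a font with a scrambled encoding.
# Keys are the character codes written in the page content;
# values are the glyphs the font actually draws for those codes.
FONT_ENCODING = {"H": "A", "G": "p", "5": "l", "D": "e"}


def render(codes: str) -> str:
    """What the viewer draws: every code is looked up in the font's table."""
    return "".join(FONT_ENCODING[c] for c in codes)


def copy_text(codes: str) -> str:
    """What naive copy/paste hands over: the raw codes, with no lookup."""
    return codes


stored = "HGG5D"          # the string actually written in the page content
print(render(stored))     # what you see on screen
print(copy_text(stored))  # what lands in Notepad
```

The screen and the clipboard disagree only because they read the same codes through different tables (the font's table versus a plain text interpretation).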

If you have the non-standard fonts installed on your system then you can of course copy the text freely, and it'll appear 'correct' as soon as you change it to the same typeface in your other application. You can even type with them, as normal. If you don't have the font, you get garbage - but 'meaningful' garbage as it's a 1-to-1 lookup. It's like pasting something from Arial into Wingdings and wondering why you suddenly get a smiley face.
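
Because the lookup is 1-to-1, the "meaningful garbage" can be undone by applying the font's table, and you can "type with" the font by inverting it. A small sketch, again with a hypothetical table; in practice you first have to recover that table, for example by inspecting the embedded font.

```python
# Hypothetical scrambled table: code stored in the PDF -> glyph the font draws.
FONT_ENCODING = {"H": "A", "G": "p", "5": "l", "D": "e"}

# A 1-to-1 table can be inverted: glyph -> code.
INVERSE = {glyph: code for code, glyph in FONT_ENCODING.items()}
assert len(INVERSE) == len(FONT_ENCODING), "table must be one-to-one"


def decode_pasted(garbage: str) -> str:
    """Turn pasted codes back into readable text; unknown codes pass through."""
    return "".join(FONT_ENCODING.get(c, c) for c in garbage)


def encode_text(text: str) -> str:
    """Readable text -> the codes you must type so the scrambled font shows it."""
    return "".join(INVERSE.get(ch, ch) for ch in text)


print(decode_pasted("HGG5D"))
print(encode_text("Apple"))
```

This is the same reason Arial-to-Wingdings "works": the codes are unchanged; only the table used to draw them differs.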

OCR-ing the PDF will pull out the human-readable text, and there are several software packages that'll either extract the embedded font from the PDF, or re-encode the lookup table.
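
One way to "re-encode the lookup table" without touching the glyphs is to give the font a ToUnicode CMap, which PDF viewers consult when extracting text. Below is a sketch that only builds the CMap text for a single-byte font. The code-to-Unicode mapping passed in is an assumption you would have to work out yourself, and attaching the result as the font's /ToUnicode stream is left to whatever PDF library you use.

```python
def to_unicode_cmap(mapping: dict[int, str]) -> str:
    """Build a ToUnicode CMap for a single-byte font.

    mapping: byte code used in the page content -> the text it should extract
    as (a hypothetical input, e.g. recovered by comparing glyphs by hand).
    """
    for code in mapping:
        if not 0 <= code <= 0xFF:
            raise ValueError(f"code {code} is outside the single-byte range")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # A bfchar block may hold at most 100 entries, so emit it in chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination values are UTF-16BE hex (surrogate pairs included).
            utf16 = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:02X}> <{utf16}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Example: codes 0x48 ("H") and 0x47 ("G") should extract as "A" and "p".
print(to_unicode_cmap({0x48: "A", 0x47: "p"}))
```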

The disadvantages are that the PDF isn't searchable, it is likely to print incorrectly (many printers cough when sent non-standard soft fonts), screen-readers won't read it and Google won't index it. It's 90% 'secure' against a casual n00b hacker, but the data is still all there if someone wants to spend 5 minutes extracting it.
Orcun
Registered: Nov 10 2008
Posts: 4
It appears that the font used in that PDF was created only for it. There are two solutions: either you have the font, or you OCR the PDF.

But I want to make PDFs like this from other documents. Is there a way to reproduce this kind of PDF with some software or with Acrobat?

UVSAR, btw: if you don't put a password on opening a PDF file (e.g. you only password-protect printing and copy/paste), the PDF isn't secure either. It's only 10 minutes of googling. IMHO, this unique encoding is a tougher obstacle for a normal PC user, since it takes a lot of work (OCR) if you don't have the correct font installed, whereas a print-disallowed PDF can be cracked with a <1 MB program in under 5 seconds. Anyway, let's come back to the main question: how can I reproduce this?
plevy
Expert
Registered: Jul 8 2008
Posts: 80
There are tables in the font that map the glyph indices stored in the PDF page content to the individual font glyphs that will be displayed or printed. There can also be Unicode mapping tables in the font that map these indices to Unicode characters. Usually, the glyph indices use the basic characters for Roman fonts, so the index for A is represented by an ASCII A character code. If that convention is followed in the font, copy and paste tend to work correctly even in the absence of the Unicode tables.

So in the case of the Ghostscript docs, the fonts probably don't have Unicode tables, nor do they use a standard encoding for mapping the glyph indices to characters. The result is the garbage you get during copy/paste.
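
A rough sketch of the fallback described above: use a Unicode mapping if one exists, otherwise assume the code is the ASCII value. The data structures are hypothetical simplifications; real viewers apply more heuristics (glyph names, standard encodings) than this.

```python
from typing import Optional


def extract_text(codes: bytes, to_unicode: Optional[dict[int, str]]) -> str:
    """Approximate how a viewer turns character codes into copyable text."""
    out = []
    for code in codes:
        if to_unicode is not None and code in to_unicode:
            # 1. A Unicode mapping table exists for this code: trust it.
            out.append(to_unicode[code])
        else:
            # 2. No mapping: assume the code is the ASCII/Latin-1 value.
            #    Right when the font follows the "A is code 0x41" convention,
            #    garbage when the encoding has been scrambled.
            out.append(chr(code))
    return "".join(out)


print(extract_text(b"Apple", None))   # conventional font, no table
print(extract_text(b"HGG5D", None))   # scrambled font, no table
print(extract_text(b"HGG5D", {0x48: "A", 0x47: "p", 0x35: "l", 0x44: "e"}))
```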

I am not aware of any products that will take a PDF and mangle the page contents and corresponding font tables to give you protection in this way, but the transformation is fairly straightforward. Perhaps someone else knows of such a font tool.
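
As a toy model of that transformation: pick a random permutation of the character codes, rewrite the page text through it, and reorder the font's code-to-glyph table the same way so the rendered result is unchanged. A dict stands in for the font here and all names are hypothetical; doing this to a real PDF means editing the content streams and the embedded font program (and leaving out any ToUnicode entry), which needs a PDF/font library.

```python
import random
import string


def scramble(text: str, font_table: dict[str, str],
             seed: int = 0) -> tuple[str, dict[str, str]]:
    """Re-encode text and font consistently with a random code permutation.

    font_table: code -> glyph for the original (conventional) font.
    Returns (new_text, new_font_table).
    """
    codes = sorted(font_table)
    shuffled = codes[:]
    random.Random(seed).shuffle(shuffled)
    old_to_new = dict(zip(codes, shuffled))
    new_text = "".join(old_to_new[c] for c in text)
    # The glyph that used to sit at code c now sits at old_to_new[c].
    new_font = {old_to_new[c]: glyph for c, glyph in font_table.items()}
    return new_text, new_font


def render(text: str, font_table: dict[str, str]) -> str:
    """What a viewer draws: look every code up in the font's table."""
    return "".join(font_table[c] for c in text)


standard = {c: c for c in string.ascii_letters + string.digits + " "}
page = "Hello World"
new_page, new_font = scramble(page, standard, seed=42)
assert render(new_page, new_font) == page  # looks identical on screen
print(new_page)  # what copy/paste would yield: the permuted codes
```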