
Cut and Paste Becomes Gibberish

se108
Registered: Mar 4 2010
Posts: 8

hi,

I am trying to compile my Visa statements (which are in PDF format) into an Excel spreadsheet. When I copy the table from the PDF and paste it into Excel (or any other program, for that matter), the text comes out as total gibberish. I have been racking my brain all day trying to figure this out, please help!

Thanks

My Product Information:
Acrobat Standard 9.2, Windows
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
It's probably caused by non-standard font encoding in the original PDFs. Depending on how they were created, the mapping between the visible shape of each letter and the "real" character code can be lost, so Acrobat displays something you can read, but the underlying data is mixed up. You can tell if this is the case because the "gibberish" will be a 1:1 substitution (so if "q" becomes "h" when you paste, every q will be an h). There's nothing you can do to fix it from within the PDF, I'm afraid. It's sometimes done intentionally by the PDF producer so files can't be exported or indexed by search engines.
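
If you want to check the 1:1 claim on more than a couple of letters, a small script can do it. This is only a rough sketch in Python, assuming you type out a short line exactly as it appears on screen and paste the matching gibberish next to it; the function name `is_one_to_one` is made up for this example and has nothing to do with Acrobat.

```python
def is_one_to_one(pasted: str, visible: str) -> bool:
    """Check whether `pasted` is a consistent 1:1 substitution of `visible`.

    `pasted` is the gibberish from the clipboard, `visible` is the same
    stretch of text typed by hand from what Acrobat shows on screen.
    """
    if len(pasted) != len(visible):
        raise ValueError("both samples must have the same number of characters")

    forward = {}   # pasted character -> visible character
    backward = {}  # visible character -> pasted character
    for p, v in zip(pasted, visible):
        # setdefault stores the pair on first sight and returns the stored
        # value afterwards, so any mismatch means the mapping is inconsistent.
        if forward.setdefault(p, v) != v:
            return False
        if backward.setdefault(v, p) != p:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sample: every letter shifted by three positions.
    print(is_one_to_one("khoor", "hello"))
```

If this reports False on a sample you are sure you typed correctly, the problem is probably something other than a simple broken mapping.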
se108
Registered: Mar 4 2010
Posts: 8
thanks ... yep. i tested it and it is a 1:1 thing like you say, so there is no workaround for this?
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
Not easily, no. When you copy, Acrobat just reads the character codes from the PDF stream (the gibberish). The fact that the text reads correctly on screen is irrelevant, so re-embedding the font or copy-pasting the contents will just give the same thing again.

The "solution" is to find out what's wrong with the application that made the PDF in the first place, and fix it so the font encoding tables are embedded correctly (assuming whoever's making the PDFs actually wants to, and you can contact them to ask).
se108
Registered: Mar 4 2010
Posts: 8
ouch, so there's no workaround such as printing the PDF to a file and then opening that file with Adobe Distiller or something like that? i've been googling this issue and have seen some noise about that possibility but I honestly can't say I understand it ... i need to get these PDFs into an Excel format or it will cost me major hours of tedious manual copying, argh
donaldin007
Registered: Mar 8 2010
Posts: 1
I am having problems cutting and pasting from PDF into any application. It is so tedious that my work suffers. We have to waste time converting the file to Word and then deleting the items we don't need. Can you please suggest an easy way to avoid this whole exercise?
se108
Registered: Mar 4 2010
Posts: 8
you say "not easily no" ... i would settle for the difficult solution then because my ccard company cannot send me the format i need and this is holding up my whole taxes process, what is the "not easily" workaround??? please?????
UVSAR
Expert
Registered: Oct 29 2008
Posts: 1357
One option if you can't get hold of a "correct" version of the PDF is to run the faulty one through Acrobat's OCR utility, which should be able to turn the visible letters into their correct characters - however you'll lose the original flow of text, so the ability to easily copy/paste non-linear things like tables will suffer.

OCR won't work on a page which has renderable text, so you first need to turn the page into an image: re-distill by printing the PDF to another PDF, and in the print options dialog, under Advanced, choose "print as image" and the highest resolution. Then process the resulting bitmap page via the Document > OCR menu. You can also use Photoshop to open the PDF, rasterize it and re-save as PDF (Photoshop's resolutions are better than the print-as-image ones, so it helps if the text is really small).

At a low level it would be possible to process the PDF data through a hex editor, to re-insert the font mapping data. However, there's currently no commercial software that'll do it automatically, so it's a long and complicated job to manually deduce the corrections for each character. Unless you're copying a novel, it's probably quicker to type the entire thing in again. There's a reason why some people use this broken-mapping effect to secure their documents!
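
A lighter variation on "deduce the corrections for each character" is to leave the PDF alone and correct the pasted text instead. To be clear, this is a different technique from re-inserting the font mapping, and it only works if the gibberish really is a consistent substitution. A minimal Python sketch, assuming you type a sample of the visible text by hand to pair with its pasted gibberish; `build_table` and `decode` are hypothetical helper names, not part of any Adobe tool.

```python
def build_table(pasted_sample: str, visible_sample: str) -> dict:
    """Deduce pasted -> visible character corrections from a paired sample."""
    if len(pasted_sample) != len(visible_sample):
        raise ValueError("both samples must have the same number of characters")
    table = {}
    for p, v in zip(pasted_sample, visible_sample):
        # Refuse to continue if one pasted character needs two different fixes.
        if table.setdefault(p, v) != v:
            raise ValueError(f"{p!r} maps to both {table[p]!r} and {v!r}")
    return table


def decode(pasted: str, table: dict, unknown: str = "?") -> str:
    """Apply the corrections; characters not seen in the sample become `unknown`."""
    return "".join(table.get(ch, unknown) for ch in pasted)


if __name__ == "__main__":
    # Hypothetical sample in which every letter is shifted by three positions.
    table = build_table("WKH TXLFN EURZQ IRA", "THE QUICK BROWN FOX")
    print(decode("IRXU", table))
```

Anything that never appeared in the sample comes out as "?", so for something like a card statement the sample would need to cover all ten digits, the decimal point and the currency symbols as well as the letters.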
se108
Registered: Mar 4 2010
Posts: 8
Thanks a lot, UVSAR. I actually downloaded a trial version of ABBYY FineReader 9.0 Professional Edition and got the job done, whew ...