
Text strings

mh53
Registered: Sep 18 2007
Posts: 2

I am looking for effective ways of storing text strings from a PDF document in a database. Are there good tools for reading text strings out of a PDF, or is there a way to "un-PDF" the document (convert it into text strings)?

My Product Information:
Acrobat Pro 8.1, Windows

dthanna
ExpertTeam
Registered: Sep 28 2005
Posts: 248

To "un-PDF" the document, open it in Acrobat and choose File | Save As. Change the file type in the dropdown to .TXT and save. Done.
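Once you have the .TXT file, getting it into a database is straightforward. The sketch below uses Python's built-in `sqlite3` module and stores each non-blank line as one row. The table name `pdf_text`, its columns, and the helper names are hypothetical, and the file encoding passed to `store_text_file` is an assumption you should check against what your Acrobat version actually writes.

```python
import sqlite3


def store_text_lines(conn: sqlite3.Connection, doc_name: str, text: str) -> int:
    """Store each non-blank line of extracted PDF text as one row.

    Hypothetical schema: pdf_text(doc_name, line_no, content).
    line_no is the line's position in the original text, so blank lines leave gaps.
    Returns the number of rows inserted.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pdf_text (
               doc_name TEXT NOT NULL,
               line_no  INTEGER NOT NULL,
               content  TEXT NOT NULL
           )"""
    )
    rows = [
        (doc_name, line_no, line.strip())
        for line_no, line in enumerate(text.splitlines(), start=1)
        if line.strip()  # skip blank lines
    ]
    # Parameter placeholders (?) let sqlite3 handle quoting of the text.
    conn.executemany(
        "INSERT INTO pdf_text (doc_name, line_no, content) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def store_text_file(conn: sqlite3.Connection, doc_name: str, path: str,
                    encoding: str = "utf-8") -> int:
    # Assumption: the exported .txt encoding; adjust if your export differs.
    with open(path, encoding=encoding) as f:
        return store_text_lines(conn, doc_name, f.read())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "Invoice 1234\n\nTotal due: $56.78\n"
    print(store_text_lines(conn, "invoice.pdf", sample))  # 2
    for row in conn.execute("SELECT line_no, content FROM pdf_text ORDER BY line_no"):
        print(row)  # (1, 'Invoice 1234') then (3, 'Total due: $56.78')
```

Storing one row per line keeps the original line position, which helps if you later need to reassemble or search the text in context.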

As for products, there is one called Text Extraction Toolkit.

A word of warning: whenever possible, work with the data generator to have the output produced in a digestible format for later use. Most composition engines can also produce a line-data version of the document, which saves you this step. I have seen a lot of output where the extracted text was fragmented because of how the composition engine created the document in the first place. The text will appear in the order it was applied to the PDF page, not in the order you would normally read it.
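If you are stuck with fragmented output, one common workaround is to reorder the fragments by their position on the page instead of trusting the paint order. The sketch below is a naive heuristic, not what Acrobat does: it assumes you already have fragments as hypothetical `(x, y, text)` tuples in PDF user-space coordinates (y grows upward), obtained from whatever PDF library you use, and the `line_tolerance` value is an arbitrary assumption. It does not handle multi-column layouts, rotated text or tables.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), y measured upward from the page bottom.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Group fragments into lines top-to-bottom, then order each line left-to-right.

    Fragments whose y differs from the first fragment of the current line by at
    most line_tolerance are treated as part of that line.
    """
    # Larger y is higher on the page, so sort by descending y.
    ordered = sorted(fragments, key=lambda f: -f[1])
    lines: List[List[Fragment]] = []
    for frag in ordered:
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments by x and join them with spaces.
    return [" ".join(t for _, _, t in sorted(line, key=lambda f: f[0])) for line in lines]


if __name__ == "__main__":
    # Fragments in the (arbitrary) order a composition engine might paint them.
    painted = [
        (72.0, 600.0, "Total"),
        (72.0, 720.0, "Invoice"),
        (140.0, 600.5, "due:"),
        (130.0, 720.0, "1234"),
    ]
    print(reading_order(painted))  # ['Invoice 1234', 'Total due:']
```

Read in paint order, the same fragments would come out as "Total Invoice due: 1234", which is the kind of fragmentation described above.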

Douglas Hanna is a member of the Production Print Technology team at Aon.
www.aonhewitt.com