Hi all,
I saw that there's a new IT forum and thought I would use this new place to share with you my experience regarding Java packages for the creation and manipulation of PDF files.
I work as a Java developer of web-based applications, and in addition I dabble in creating custom-made tools for Acrobat and Reader.
One of my goals in doing this research was to break out of the (many) limitations of JavaScript in Acrobat and create independent tools that will allow better manipulation of PDF files, including on-the-fly creation of PDF files in a web application and a dynamic highlighter.
I had a look at several packages, and used the following three:
Apache PDFBox (http://pdfbox.apache.org):
This is the one I used the most so far. It is very extensive and well-documented with plenty of sample applications.
I couldn't find an easy way to use it for a dynamic highlighter that I was trying to create, though.
It supports Adobe's XML highlighting feature, but that is very limited. As far as I could see, there's no way to get the exact location of a word in order to create an annotation above it.
Another downside is that as far as I could see there aren't any forums or community support for this package, and I'm not sure it's still being developed and updated.
However, so far this has been my package of choice, for ease of use and power.
For example, I used it to create a batch processor application which scans a folder (and all sub-folders) for PDF files and then changes links to point to new locations, or creates a list of all link locations.
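To give an idea of the folder-scanning half of such a batch processor, here is a minimal sketch that uses only the Java standard library. The class name `PdfBatchScanner`, the helper `rewriteLink`, and the prefix-swap rule are illustrative assumptions, not code from the original tool. Actually reading and updating the link annotations would have to be done with a PDF library such as PDFBox and is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; names are illustrative only.
class PdfBatchScanner {

    /** Recursively collects every regular file under root whose name ends in ".pdf" (any case). */
    static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Assumed rewrite rule: replace an old URL prefix with a new one, leave other links alone. */
    static String rewriteLink(String uri, String oldPrefix, String newPrefix) {
        return uri.startsWith(oldPrefix)
                ? newPrefix + uri.substring(oldPrefix.length())
                : uri;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(root)) {
            // A real tool would open the file with a PDF library here,
            // pass each link target through rewriteLink, and save the result.
            System.out.println(pdf);
        }
    }
}
```

Running it with a folder path as the only argument prints every PDF found under that folder, including sub-folders.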
iText (http://itextpdf.com):
Very easy to use when creating a new PDF. It offers a handy structure of paragraphs and lines for content. Much easier than PDFBox's relative content positioning system.
Documentation is not bad, but they keep referring to their book and trying to sell it. That's fair enough, I guess, but what isn't so fair is that they recently changed their license to the "GNU Affero General Public License", with a notice that states:
"You can be released from the requirements of the license by purchasing a commercial license. Buying such a license is mandatory as soon as you develop commercial activities involving the iText software without disclosing the source code of your own applications. These activities include: offering paid services to customers as an ASP, serving PDFs on the fly in a web application, shipping iText with a closed source product."
So basically, if you use this package to create on-the-fly documents on the web (which I planned on doing), even if it's not a commercial service, you need to either disclose the source code of your own application or buy a commercial license from them!
Also, the Producer metadata info in any PDF that is created or manipulated with iText must include their name.
So despite its ease of use and other advantages, I've decided against using this package for fear of legal implications.
PDF Clown Project (http://www.stefanochizzolini.it/en/projects/clown/):
This is an interesting one-man project, which shows a lot of potential.
It has some good tools and samples, and an active user community (although the SourceForge forums are not working at the moment).
However, the latest version (0.0.7 Alpha) is more than a year old and requires an update, which the creator (Stefano Chizzolini) promises will arrive soon.
I will continue following this project to see what new features the next versions will include.
If anyone has had any experience with these (or other) packages and would like to share it with me and the other readers, I'd be happy to hear it!
First, the tried and true, Adobe's own PDF library available at www.datalogics.com (watch that trailing 's', unless you want to go buy a barcode scanner). If your product needs to have tried, true and pure Adobe PDF this is the library for you!
The second is PDFlib (www.pdflib.com), another excellent library. One of the beauties of PDFlib is that if your code already generates PostScript, you can snap the PDFlib library in without much effort, since the PostScript operator calls are mirrored in the PDFlib function calls. It is fast and compact, supports many platforms, including IBM z/OS (yes, PDF creation out of COBOL in MVS!), and has a zillion language bindings.
Thanks for the post!
-Doug
Douglas Hanna is a member of the Production Print Technology team at Aon.
www.aonhewitt.com