Ok well what I am trying to do is use VBA code to save an Acrobat Version 7.0 PDF into a text file so it can be pulled into a database using monarch. The actual saving of the file as text works, and here is the code I have used to accomplish that:
Dim AcroXApp As Object
Dim AcroXAVDoc As Object
Dim AcroXPDDoc As Object
Set AcroXApp = CreateObject("AcroExch.App")
AcroXApp.Hide
Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open PDF_PATH & filename, "Acrobat"
AcroXAVDoc.BringToFront
Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
Dim jsObj As Object
Set jsObj = AcroXPDDoc.GetJSObject
jsObj.SaveAs OUTPUT_PATH & OutputFile, "com.adobe.acrobat.plain-text"
AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit
Now as I said the code itself runs perfectly and saves the pdf as a plain text file. The problem I am having is that the original pdf is in neat column format, but the resulting text file loses all the whitespace and creates only a single space between column fields. This is insufficient for my needs as some of the fields have spaces in them, so i cannot use a space-delimited method to import them.
My current work-around is that rather than save the file as text, i save it as a html version 3.20 file by changing "com.adobe.acrobat.plain-text" to "com.adobe.acrobat.html-3-20 "and i use monarch to anaylze that file. This works mostly, but it has had many bugs to work out because of inconsistencies in the way acrobat saves the file as html.
The application i've built does work for the most part, but i am concerned that if it receives some odd data that it will break because as it is built it is somewhat volatile. I need 7 monarch models as it is just to gather all of the data correctly.
I was wondering if there was any way to save the pdf file as a text file but still retain the column format and whitespace from the original pdf. This would alleviate most all of my problems. Also I use Acrobat version 6.0 standard, Monarch Pro 6.00 and MS Access 2000. I appreciate any help anyone could give me.
There is hope. You have a couple of options. First, there are several 3rd party tools for converting PDF into other formats. I'm sure that you could find one that extracts table data.
Second, as you've already figured out, Acrobat will save to other formats, perhaps "Rich Text", or the Word format would be better suited to your purpose?
Third, and this is the fun one. You could write an Acrobat automation script with JavaScript to extract the row and colum values and then pass this information back to your VBA script for writing directly into the DB. This technique is the most direct, and contained, of all the strategies, since data flows directly from the PDF into your Acess DB.
Thom Parker
The source for PDF Scripting Info
[url=http://www.pdfScripting.com]pdfscripting.com[/url]
The Acrobat JavaScript Reference, Use it Early and Often
[url=http://www.adobe.com/devnet/acrobat/]http://www.adobe.com/devnet/acrobat/[/url]
Thom Parker
The source for PDF Scripting Info
www.pdfscripting.com
Very Important - How to Debug Your Script