These forums are now Read Only. If you have an Acrobat question, ask questions and get help from one of our experts.

How to Extract the LC Form Fields Values and Store them to Database.

tarekahf
Registered: May 1 2008
Posts: 91

We have designed a huge LiveCycle Form using Adobe LiveCycle 7.1.

The form will be submitted by the clients using the Standard Email Submit Button, and the PDF Data will be attached to an Email which is sent to a pre-defined Address.

When the Emails are received by the Person in charge, he will save the Data Files each one with a unique name on a local folder.

When finished receiving all the Data Files by Email, he will use Adobe Acrobat Utility from the menu "Advanced/Forms/Create sheet from Data Files", and this tool will allow the user to select all the saved Data Files and it will convert them to CSV File as you may already know.

The format of the Data Elements on the PDF Form is very complex as it has many repeating elements (1-to-may relationship) such as the Applicant Data and list of his Experience, for example:

some name
1/1/1970

Some Organization 1

Some Organization 2

Some Organization 3

When converted such XML Data from the PDF into CSV, the result looked very complex, as the Repeating Elements were converted to columns with index counter. This is making it nearly impossible to write a program to work with such CSV Format and convert it to a Database Structure in the 3rd Normal Form.

Objective:

Write a program or some script to fully automate the extraction of the XML Data from the PDF Forms Files and convert it and store it in a Database Structure in the 3rd Normal Form.

Remember that we will have many files stored in the a folder which has PDF Files with XML Data Embedded in them.

Please help.

Regards ...

Tarek.

My Product Information:
LiveCycle Designer, Windows
tarekahf
Registered: May 1 2008
Posts: 91
I am still looking for help.

One option I found is using OLE or Acrobat IAC from .NET, but I am new to this area, and the project is very urgent.

We will accept professional service as the User in Charge confirm to me that they can contract with 3rd party developers to support us in that regard.

Any one can help will be appreciated.

Following is a quick spec of the required .NET program :

1. Enumerate through all the PDF Files in a given folder (based on Adobe LiveCycle Form),

2.For Each PDF File:

2.1. Extract the XML Data from the form,

2.2. Convert the XML String to XML Document,

2.3. Extract the single fields and repeating group of fields and update the Database,

2.4. Generate Key Values.

2.5. Send email Notification to the sender of the form, to confirm receiving the form and start initial processing.

Regards ...

Tarek.
abell86
Registered: May 6 2009
Posts: 8
Did anything ever result from this problem? It sounds very similar to an issue I'm having with developing an automated timesheet system. My main problem now lies in the format of the exported csv file and I'd like to customize the output so it is compatible for importing into a database.
tarekahf
Registered: May 1 2008
Posts: 91
YES !

Fully working solution was implemented several months ago ! I used Acrobat SDK / IAC from VB.NET Windows Application.

If you like, I can post a sample of the most important parts of the code.

I will be back to work on Saturday, and just to remind me, you can post a reply by then.

Tarek.
Hillyman
Registered: Apr 21 2009
Posts: 15
I would love to hear how you resolved your problem. Please post for the rest of the community.

Hillyman
abell86
Registered: May 6 2009
Posts: 8
Yes a sample would be fantastic. It is essentially the same dilema I'm having with updating a database, just different information. I've got the major stuff with the form submission all pretty much taken care of, I just need to handle the data after all of the returned forms have been collected.
tarekahf
Registered: May 1 2008
Posts: 91
I will provide the sample soon, in the mean time, please keep in mind the following:

If you want to transfer the field values of LiveCycle forms to Database or CSV file or what ever, there are 2 main different methods:

1. On-Line using Web Based Submit: This method is used if the form size is relatively small, and the users of the form will always have on-line connection with the server. In this case, one form will be updated/submitted at a time. To use this method, you have to sue Server Side Technology like ASP.NET or JSP.

2. Off-line using Windows/Client Application: Use this method if the form size is relatively large, takes long time to complete, and the client may not have reliable always on Internet Connection. In this case, the user will fill the form off-line, and send back the results using email file attachments. On the host, use Client Application (such as VB.NET) integrated with Acrobat SDK/IAC or OLE to extract the field values from the PDF by looping over the ready files one at a time from a designated shared folder or local hard-disk.

I will post the sample code for the 2nd method. Please stay tuned ...!

Tarek.
tarekahf
Registered: May 1 2008
Posts: 91
Here is the quick guide to do that:

1. Get doc: Programming Acrobat JavaScript Using Visual Basic (VBJavaScript.pdf). Read it. It is very useful. Search using Google, you should find it.

2. Install Acrobat 7.0.5 or later. Install Acrobat SDK which has all the resources and the samples your may need. But be careful, you will get lost inside SDK.

3. Add a ref. to Interop.Acrobat.dll assembly (Acrobat Type Library) in your VB.NET Project.

4. Following is a listing of the code. Note that it uses background jobs, which is not related to Acrobat. So do not be confused.

        Dim path As StringDim out_path As StringDim oAcroAVDoc As Acrobat.AcroAVDocDim oAcroPDDoc As Acrobat.AcroPDDocDim gapp As New Acrobat.AcroApp Dim XMLdoc As New XmlDocumentDim xmlClass As New XMLTransfeer.XMLClassDim result As StringDim file_name As StringDim updated As StringDim jsObj As New Object Dim DS As New Prize.BusinessEntities.PrizeDS [Shared].ClearListBox(lbxSucceeded)[Shared].ClearTreeView(treeFailed)   Dim percantage As DoubleDim i As Int16Dim files As String()[Shared].SetControlPropertyValue(Label3, "Text", " 0% complteted")[Shared].SetControlPropertyValue(ProgressBar1, "Value", 0) ProgressBar1.Value = 0oAcroAVDoc = CreateObject("AcroExch.AVDoc") If Not Directory.Exists(Input_tb.Text) Or Not Directory.Exists(Output_tb.Text) Or Not Directory.Exists(Pros_text.Text) ThenMsgBox("Please Insert valid paths")Else'TryIf Input_tb.Text <> "" Thenpath = Input_tb.Text

If Output_tb.Text <> "" Thenout_path = Output_tb.Text

files = Directory.GetFiles(path, "*.pdf") [Shared].SetControlPropertyValue(ProgressBar1, "Maximum", files.Length)[Shared].SetControlPropertyValue(ProgressBar1, "Step", 1)[Shared].SetControlPropertyValue(ProgressBar1, "Visible", True)  Dim succeededCount As Integer = 0

Dim failedCount As Integer = 0

 For i = 0 To files.Length - 1

Dim tr As System.Data.SqlClient.SqlTransaction

'
Dim conn As New SqlClient.SqlConnection(My.Settings.PrizeConnectionString)'conn.Open()tr = Nothing

Try

oAcroAVDoc.Open(files(i).ToString, "XML Export") oAcroPDDoc = oAcroAVDoc.GetPDDoc()

jsObj = oAcroPDDoc.GetJSObject()

jsObj.disclosed = True

Dim theXML = jsObj.getXMLData(jsObj)

 XMLdoc.LoadXml(theXML)


updated = XMLdoc.GetElementsByTagName("Form_Id")(0).InnerText  result = xmlClass.Load(XMLdoc, tr, DS)

 file_name = result.Replace("/", "_")jsObj.getField("root[0].pageOne[0].Form_Id[0]").value = resultXMLdoc.GetElementsByTagName("Form_Id")(0).InnerText = resultDim fullOutPath As String = out_path + "\" + file_name + ".pdf"jsObj.Saveas(fullOutPath)


File.Copy(files(i).ToString, Pros_text.Text + files(i).ToString.Replace(path, ""), True)File.Delete(files(i).ToString)


  tr.Commit()

 [Shared].AddItemToListBox(lbxSucceeded, files(i))

 succeededCount += 1


[Shared].SetControlPropertyValue(lblSucceeded, "Text", String.Format("Total Succeeded File Count: ({0})", succeededCount.ToString))  Catch ex As System.Runtime.InteropServices.COMException

If tr IsNot Nothing Then

tr.Rollback()

End If

Dim FileNode As New TreeNode(files(i))

FileNode.ImageIndex = 0

FileNode.SelectedImageIndex = 0

AddExceptionToTreeNode(FileNode, New Exception("Either file schema is missing or file version is old", ex))[Shared].AddNodeToTreeView(treeFailed, FileNode)

failedCount += 1

[Shared].SetControlPropertyValue(lblFailed, "Text", String.Format("Total Failed File Count: ({0})", failedCount.ToString)) Catch ex As Exception

If tr IsNot Nothing Then

tr.Rollback()

End If

Dim FileNode As New TreeNode(files(i))

FileNode.ImageIndex = 0

FileNode.SelectedImageIndex = 0

AddExceptionToTreeNode(FileNode, ex)

[Shared].AddNodeToTreeView(treeFailed, FileNode)

failedCount += 1

[Shared].SetControlPropertyValue(lblFailed, "Text", String.Format("Total Failed File Count: ({0})", failedCount.ToString))'
oAcroAVDoc.Close(1)Finally[Shared].CallControlMethod(ProgressBar1, "PerformStep", Nothing)percantage = ((i + 1) / files.Length) * 100[Shared].SetControlPropertyValue(Label3, "Text", Math.Truncate(percantage).ToString + "% completed") oAcroAVDoc.Close(1)'conn.Close()End Try

 Next

 Else

MsgBox("Please select the output folder that contain the Saved Application Forms")End If

 Else

MsgBox("Please select the input folder that contain the Application Forms")End If

End If

5. The command: "jsObj.getXMLData(jsObj)" uses Folder Level Script of Acrobat. To activate it, create a "MyScript.js" file, and place it inside Acorbat\javascrip folder on the Client PC running the VB.NET Project. It must have the following function:

function getXMLData(theDoc) {//return xfa.data.saveXML();//app.alert("Hello.");return theDoc.xfa.data.saveXML();}

6. Remember, every time you change the .js file in 5 above, make sure to Kill Acrobat.EXE before you run the VB.NET Project. This Folder Level Script file, you can use it to do some advanced Acrobat Scripting which cannot be done easily inside VB.NET, yet, you can call such functions from VB.NET. But, I noticed that some js features are buggy, so be coareful. Meaning, debug your work carefully.


7. The command "result = xmlClass.Load(XMLdoc, tr, DS)" is doing all the Database Access/Update, which uses XML Doc the Source.

8. Once Again, this is using Windows Client VB.NET, and as per the license, you are not supposed to use this method from ASP.NET. I have not tried it, so if some one tried to use it from ASP.NET and it worked (for fun), please let me know.

I hope this will be of help to you all....

Tarek.