Hello, this issue has us contemplating leaving the IT environment to open a food joint. Hope you guys can help.
Background:
We have hundreds of thousands of PDF files created over a 12-year span. None of them contain any metadata, and because they are scans of historical documents, usable OCR text is nearly non-existent. The files are named according to conventions set by historians, so each “collection” has its own naming structure that is completely different from the others.
The Task:
Automatically (through a batch file, script, or third-party software) use each collection's existing naming convention to parse the file names and populate each file's basic metadata fields.
The problem(s):
I could create a script that dumps the directory structure into a CSV file, but from there I'm not sure whether I can parse that into an XML file, or whether it's even possible to generate an XMP or FDF file from it. And assuming it can be done, how do you make a batch process that reads the file containing the directory structure and writes the values into the PDF files themselves?
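For the first step (directory structure to CSV), here is a minimal sketch in Python rather than a batch file; the folder names and output path are placeholders, not anything from your setup:

```python
import csv
import os

SOURCE_ROOT = "collections"   # hypothetical root folder holding all the collections
OUTPUT_CSV = "inventory.csv"  # hypothetical output file

# Walk the directory tree and record each PDF's collection folder, bare
# file name, and full path, so the naming convention can be parsed later.
with open(OUTPUT_CSV, "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["collection", "filename", "path"])
    for dirpath, _dirnames, filenames in os.walk(SOURCE_ROOT):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                collection = os.path.relpath(dirpath, SOURCE_ROOT).split(os.sep)[0]
                writer.writerow([collection, name, os.path.join(dirpath, name)])
```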
Examples:
Collection 1: YYYYMMDD-Pub_Type-Pub_Number
Collection 2: Pub_Number-Pub_Type-Author-Desc-YYYYMMDD
In both examples the data could be entered into the metadata fields manually, but since each file is different, it would take forever and a day to do it that way.
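Because each collection's convention is regular, one regular expression per collection can pull the fields out of the file names. A sketch for the two example conventions above (the collection keys and field names are made up for illustration):

```python
import re

# One regex per collection, keyed by the collection folder name.
PATTERNS = {
    # Collection 1: YYYYMMDD-Pub_Type-Pub_Number
    "collection1": re.compile(
        r"^(?P<date>\d{8})-(?P<pub_type>[^-]+)-(?P<pub_number>[^-]+)\.pdf$", re.I),
    # Collection 2: Pub_Number-Pub_Type-Author-Desc-YYYYMMDD
    "collection2": re.compile(
        r"^(?P<pub_number>[^-]+)-(?P<pub_type>[^-]+)-(?P<author>[^-]+)"
        r"-(?P<desc>[^-]+)-(?P<date>\d{8})\.pdf$", re.I),
}

def parse_filename(collection, filename):
    """Return the fields encoded in a filename as a dict, or None if the
    name does not match its collection's convention."""
    pattern = PATTERNS.get(collection)
    if pattern is None:
        return None
    match = pattern.match(filename)
    return match.groupdict() if match else None

# Example:
# parse_filename("collection1", "19450508-Report-0042.pdf")
# -> {'date': '19450508', 'pub_type': 'Report', 'pub_number': '0042'}
```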
We also contemplated mass murder/suicide, but figured it was better to ask for ideas/help… :-)
You can't write directly to the XMP block with JavaScript, only via Preflight or with plugins.
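Outside of Acrobat's own scripting, a third-party library can write the XMP packet directly. A minimal sketch assuming the pikepdf library is installed and that the `fields` dict comes from a parsing step like the one above; the mapping of fields to XMP properties is just one possible choice:

```python
import pikepdf

def write_metadata(pdf_path, fields):
    """Write parsed filename fields into the PDF's XMP metadata.
    pikepdf also updates the classic DocInfo dictionary by default."""
    with pikepdf.open(pdf_path, allow_overwriting_input=True) as pdf:
        with pdf.open_metadata() as meta:
            # Field-to-property mapping is an assumption; adjust to taste.
            meta["dc:title"] = fields.get("desc", "")
            if fields.get("author"):
                meta["dc:creator"] = [fields["author"]]
            meta["dc:description"] = fields.get("pub_type", "")
            meta["dc:identifier"] = fields.get("pub_number", "")
        pdf.save()
```

Looping that function over the rows of the CSV inventory (or the output of the parsing step) would batch-apply the metadata without ever opening Acrobat.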