
How to Extract Pages Containing Specific Text with JavaScript

RaptorF22AAA
Registered: Sep 30 2010
Posts: 3
Answered

My company runs reports providing monthly performance analyses. These reports are quite lengthy (roughly 200 to 400 pages each). There are specific tables in the reports with unique headings that I would like to extract and compile in Excel. Acrobat X does a beautiful job of converting these pages into a very usable Excel format.
 
I would like a batch process that would go through these reports (about 60 per month), extract the pages containing the specific text, and save them in a separate folder as Excel files.
 
I was able to do this easily in a batch process for reports where the pages I wanted were at the same page number in every report. I pointed "Start With" at the folder, ran the script below, and then had the file saved in Excel format:
 
/* Delete the first two pages: zero-based indices 0 through 1 (nStart defaults to 0). */
this.deletePages({nEnd: 1});
/* Of the pages that remain, delete everything from the second page (index 1) through the last page, so only the original third page is kept. */
this.deletePages(1, this.numPages - 1);
 
So, my question is, instead of this script, is there a script that I can use to extract a page that contains specific text or alternatively, delete all pages of the document except for the page that contains specific text?
 
Your articles have been very insightful and are greatly appreciated.
 
Looking forward to hearing from you.
 
With thanks,
Charles.

MCA

My Product Information:
Acrobat Pro Extended 10.0, Windows
gkaiseril
Expert
Registered: Feb 23 2006
Posts: 4307
I am not aware of an existing script, but you could write one.

The Annotation object has a 'page' property. You can create an array with one element per page, where each element records whether that page has a comment annotation or not. From there you can get a list of all pages without annotations and delete them.
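
As a rough illustration of that idea, here is a minimal sketch. The helper names pagesWithoutAnnots and deletePagesByIndex are made up for this example. It assumes the document object exposes numPages, getAnnots and deletePages the way the Acrobat Doc object does, and it asks for the annotations page by page instead of reading each annotation's 'page' property.

```javascript
// Returns the zero-based indices of pages that carry no annotations.
// `doc` is assumed to behave like an Acrobat Doc object.
function pagesWithoutAnnots(doc) {
  var result = [];
  for (var p = 0; p < doc.numPages; p++) {
    // getAnnots returns null when nothing matches on that page
    var annots = doc.getAnnots({ nPage: p });
    if (!annots || annots.length === 0) {
      result.push(p);
    }
  }
  return result;
}

// Deletes the listed pages (hypothetical helper), working backwards so the
// indices of pages still to be deleted are not shifted by earlier deletions.
function deletePagesByIndex(doc, pages) {
  if (pages.length >= doc.numPages) {
    return false; // a PDF must keep at least one page, so do nothing
  }
  for (var i = pages.length - 1; i >= 0; i--) {
    doc.deletePages(pages[i], pages[i]);
  }
  return true;
}

// Inside Acrobat, `this` is normally the current document:
// deletePagesByIndex(this, pagesWithoutAnnots(this));
```

If no page has an annotation, the sketch leaves the file untouched, since Acrobat will not delete every page of a document.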

George Kaiser

RaptorF22AAA
Registered: Sep 30 2010
Posts: 3
I need help with this.

MCA

thomp
Expert
Registered: Feb 15 2006
Posts: 4411
Accepted Answer
I've actually written a search and extract automation tool, which members can download at www.pdfscripting.com.

But there is a solution you can find right here at this website, with a little work. Look on the Actions Exchange. You'll find two Actions that provide what you need. One is called "Extract Commented Pages". This extracts all pages from a PDF that contain comments. You could further specialize the script to extract only pages that contain certain kinds of annotations.

That takes care of extracting pages. Now you need to place annots on those pages. The Actions list contains another one named "Find and Highlight Words and Phrase". The first bit of this Action is a "Redaction Find" action that searches the PDF for words or phrases and then adds a "Redact" annot to each match. This is the bit you need to add to the "Extract" action, so that it will find the pages that need to be extracted. You'll also need a last step that deletes all the "Redact" annots.
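
For comparison, a different route from the two Actions above is to scan each page's words directly and skip the annots altogether. This is only a sketch under assumptions: the helper names are made up for the example, the heading "Monthly Performance Summary" is a placeholder, and the document object is assumed to expose numPages, getPageNumWords, getPageNthWord and deletePages the way the Acrobat Doc object does.

```javascript
// Returns true when the words of `phrase` appear consecutively on page `p`.
// Matching is case-insensitive and word-based.
function pageContainsPhrase(doc, p, phrase) {
  var target = phrase.toLowerCase().split(/\s+/).filter(function (w) {
    return w.length > 0;
  });
  if (target.length === 0) {
    return false;
  }
  var count = doc.getPageNumWords(p);
  var words = [];
  for (var i = 0; i < count; i++) {
    // Third argument (bStrip) = true strips punctuation and whitespace
    words.push(String(doc.getPageNthWord(p, i, true)).toLowerCase());
  }
  for (var s = 0; s + target.length <= words.length; s++) {
    var hit = true;
    for (var k = 0; k < target.length; k++) {
      if (words[s + k] !== target[k]) {
        hit = false;
        break;
      }
    }
    if (hit) {
      return true;
    }
  }
  return false;
}

// Collects the zero-based indices of every page containing the phrase.
function findPagesWithPhrase(doc, phrase) {
  var hits = [];
  for (var p = 0; p < doc.numPages; p++) {
    if (pageContainsPhrase(doc, p, phrase)) {
      hits.push(p);
    }
  }
  return hits;
}

// Deletes every page not listed in `keep`, from the last page backwards so
// the indices of pages not yet visited stay valid.
function deletePagesExcept(doc, keep) {
  if (keep.length === 0) {
    return false; // no match: a PDF must keep at least one page
  }
  for (var p = doc.numPages - 1; p >= 0; p--) {
    if (keep.indexOf(p) === -1) {
      doc.deletePages(p, p);
    }
  }
  return true;
}

// Inside Acrobat, `this` is normally the current document:
// deletePagesExcept(this, findPagesWithPhrase(this, "Monthly Performance Summary"));
```

Because the words are read with punctuation stripped, a heading that contains punctuation would need to be written without it in the search phrase. Word scanning can also be slow on 400-page reports, so it is worth trying on one file before running it in batch.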

Thom Parker
The source for PDF Scripting Info
www.pdfscripting.com
Very Important - How to Debug Your Script

try67
Expert
Registered: Oct 30 2008
Posts: 2398
I have developed a variety of tools to achieve just that. If you're interested, contact me personally.

- AcrobatUsers Community Expert - Contact me personally at try6767 [at] gmail [dot] com
Check out my custom-made scripts website: http://try67.blogspot.com

RaptorF22AAA
Registered: Sep 30 2010
Posts: 3
Thanks a lot Mr. Thompson.

MCA