Detecting and Deleting blank pages in Acrobat X

JusticeBanker
Registered: Dec 31 2010
Posts: 2
Answered

We were recently upgraded to Acrobat X because we asked for the ability to detect and delete blank pages from MFP-scanned documents. These documents are mostly duplex but have many blank reverse pages. In Acrobat 9 we had to find and delete each blank page manually, but we were promised the ability to do it with X (eliminating the need to buy a PDF cleaner app). We have not been able to figure it out. Have we been had?

My Product Information:
Acrobat Pro 10.0, Windows
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
Accepted Answer
Who told you this? Was it supposed to be a feature of the scan, or of the OCR? I'd suggest you go back to them and ask what they meant.

However, you have always had a way (although not a good one) to detect blank pages on a scan. The question you have to ask is, "what's a blank page?" In the PDF format, every scanned page is a single raster image. To know anything about that page it has to be OCR'd. If the OCR does not find any text you could say that the page is blank. It would be simple enough to write a script to delete any page that does not contain any text. You could even restrict the check to a certain area of the page to exclude headers and footers.
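As a rough sketch of the area-restricted check (not a tested production script), the function below counts only the words whose bounding quads sit inside a vertical band of the page, so header and footer text is ignored. The name `countWordsInBand` and the `yMin`/`yMax` limits are made up for illustration. `doc` stands for the Acrobat document object (`this` in the console), and the sketch assumes `getPageNthWordQuads` returns quads of eight numbers (four x,y corner pairs) in default user space with the origin at the bottom left of an unrotated page.

```javascript
// Sketch only: count the words on a page whose vertical center lies between
// yMin and yMax (in points). Names and limits here are illustrative.
function countWordsInBand(doc, page, yMin, yMax) {
    var count = 0;
    var numWords = doc.getPageNumWords(page);
    for (var w = 0; w < numWords; w++) {
        // Each quad is assumed to be 8 numbers: four x,y corner pairs.
        var quads = doc.getPageNthWordQuads(page, w);
        var lo = Infinity, hi = -Infinity;
        for (var q = 0; q < quads.length; q++) {
            for (var k = 1; k < quads[q].length; k += 2) {   // odd slots hold y values
                lo = Math.min(lo, quads[q][k]);
                hi = Math.max(hi, quads[q][k]);
            }
        }
        var center = (lo + hi) / 2;   // NaN when no quads came back, so not counted
        if (center >= yMin && center <= yMax) count++;
    }
    return count;
}
```

On a US Letter page (792 points tall), `countWordsInBand(this, 0, 72, 720)` would ignore roughly the top and bottom inch; a result of 0 would mark the page as a candidate for deletion.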

This method is flawed because OCR will occasionally convert garbage into text, i.e., "is it a speck, or a period?" Also, it does not detect images. But it's better than nothing. The script could be written to present the user with each page it thinks is blank and ask if it should be deleted.
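A minimal sketch of that confirm-before-delete idea follows. The function names `reviewBlankPages` and `askUser` are hypothetical, and the decision is passed in as a callback so the loop itself does not depend on any dialog. The sketch assumes `app.alert` returns 4 when the user clicks Yes on a Yes/No box, and it stops before removing the only remaining page, since a PDF cannot have zero pages.

```javascript
// Sketch only: walk the pages from last to first and let a callback decide
// whether each text-free page is really deleted.
function reviewBlankPages(doc, confirmDelete) {
    var deleted = [];
    for (var i = doc.numPages - 1; i >= 0; i--) {
        if (doc.getPageNumWords(i) !== 0) continue;   // OCR found text, keep the page
        if (doc.numPages === 1) break;                // never delete the last remaining page
        doc.pageNum = i;                              // bring the candidate into view
        if (confirmDelete(i)) {
            doc.deletePages(i, i);
            deleted.push(i);
        }
    }
    return deleted;   // zero-based page numbers, in the order they were removed
}

// Hypothetical Acrobat-side callback: a Yes/No question box.
// app.alert is assumed to return 4 when the user clicks Yes.
function askUser(i) {
    return app.alert({
        cMsg: "Page " + (i + 1) + " looks blank. Delete it?",
        nIcon: 2,   // question icon
        nType: 2    // Yes / No buttons
    }) === 4;
}
```

From the console this would be started with `reviewBlankPages(this, askUser)`.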

The whole thing could be written as an Action. See the seminar on Jan 19th.


Thom Parker
The source for PDF Scripting Info
www.pdfscripting.com
Very Important - How to Debug Your Script

JusticeBanker
Registered: Dec 31 2010
Posts: 2
Thanks for your reply. Obviously, I had been misinformed. We have multi-function copier/printer/scanners that deliver scanned documents, duplexed with many blank pages, as PDF via email. What I was told is that the latest version of Acrobat could detect and delete a configurable level of text or images, i.e. if a page has less than a certain amount of 'stuff' it is blank and should be deleted. After a bit of research on my part it seems this was a bunch of hooey. I will look into creating an Action that will improve our process. Thanks.
CPA texas
Registered: Feb 6 2011
Posts: 1
I have not found that feature either, but Fujitsu scanners now have ScanSnap software that will do this for you.
user13
Registered: Aug 3 2011
Posts: 3
JusticeBanker - We are looking to do the same thing. We don't have programming/scripting skills but were wondering whether you found a way to create an Action for this. Thom, I've tried using the script below without success and was wondering if you could help. Thank you in advance.
try {
    // save a copy of original document
    var newName = this.path;
    var filename = newName.replace(".pdf","_Original.pdf");
    this.saveAs(filename);
    for (var i = 0; i < this.numPages; i++)
    {
        numWords = this.getPageNumWords(i);
        if (numWords == 0)
        {
            // this page has no text, delete it
            this.deletePages(i,i);
        }
    }
}
catch(e)
{
    app.alert(e);
}
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
What do you mean, "without success"? Did it delete all the pages, no pages, or just some wrong ones?

Before a scanned document can be tested for content it has to be OCR'd. Were the PDFs OCR'd? OCR often interprets speckles as periods and commas, so "getPageNumWords()" will sometimes return a nonzero count when there isn't anything on the page. The blank page test should be expanded to look for actual letters on any page that returns a small number.
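One way that expanded test might look is sketched below. The function name `looksBlank` and the `minWords` threshold are assumptions for illustration, not a built-in Acrobat feature, and the ASCII-only pattern assumes English-language scans. `doc` again stands for the document object (`this` in the console).

```javascript
// Sketch only: treat a page as blank unless OCR found at least minWords
// words that contain a letter or digit. Speckles read as "." or "," fail.
function looksBlank(doc, page, minWords) {
    var real = 0;
    var numWords = doc.getPageNumWords(page);
    for (var w = 0; w < numWords; w++) {
        // bStrip = false keeps punctuation, so the raw OCR result is tested.
        var word = doc.getPageNthWord(page, w, false);
        if (/[A-Za-z0-9]/.test(word)) real++;
        if (real >= minWords) return false;   // enough real text, not blank
    }
    return true;
}
```

Swapping `this.getPageNumWords(i) == 0` for `looksBlank(this, i, 1)` in a delete loop would then ignore pages whose only "words" are stray punctuation; a higher `minWords` makes the test more aggressive.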


user13
Registered: Aug 3 2011
Posts: 3
Hi Thom - I meant that I didn't have success with the script running. I get the error UnsupportedValueError: Value is unsupported. ===> Parameter cPath. Not sure how to set the cPath up in the script. Sorry, I'm NOT a programmer and found the script online. Just trying to play around and figure out how to test it and see if we can even use it. I've tried using the script debugger following your tutorial (http://acrobatusers.com/tutorials/how-debug-your-script) but also get the script as being undefined. Thank you for your rapid response! I will read more and view more tutorials and see if I can figure it out.
thomp
Expert
Registered: Feb 15 2006
Posts: 4411
cPath is the input for the "saveAs" function. It doesn't like your "filename" parameter. To test the page-delete part of the script, remove the file-save part.
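If the backup copy is still wanted later, a slightly more defensive way to build the path is sketched here. The helper name `backupPath` is hypothetical, and this is not a confirmed fix for the error above; it only avoids handing "saveAs" a path that was never changed because the extension was missing or in a different letter case.

```javascript
// Sketch only: derive "<name>_Original.pdf" from a document path.
// Returns null when the path does not end in .pdf (any letter case).
function backupPath(path) {
    if (!/\.pdf$/i.test(path)) return null;
    return path.replace(/\.pdf$/i, "_Original.pdf");
}
```

In Acrobat this might be used as `var p = backupPath(this.path); if (p) this.saveAs(p);`, run from the console or another context where "saveAs" is permitted.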

The delete loop should run backward, starting from the last page. It won't work as written because when it deletes a page, all the later pages shift down in number, causing the loop to skip the next page in line.

Just try running this code in the console window.

for(var i=this.numPages-1;i>=0;i--)
{
   if(this.getPageNumWords(i)==0)
      this.deletePages(i);
}
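To see why the direction of the loop matters, here is a small simulation that uses a plain array in place of a PDF. Each entry is a page's word count, with 0 meaning blank; the function names are made up for the illustration and nothing here touches Acrobat.

```javascript
// Simulation only: each array entry is a page's word count; 0 means blank.
function deleteForward(pages) {
    var p = pages.slice();
    for (var i = 0; i < p.length; i++) {
        if (p[i] === 0) p.splice(i, 1);   // later pages shift down, next one is skipped
    }
    return p;
}

function deleteBackward(pages) {
    var p = pages.slice();
    for (var i = p.length - 1; i >= 0; i--) {
        if (p[i] === 0) p.splice(i, 1);   // only already-checked pages shift
    }
    return p;
}
```

With the word counts `[12, 0, 0, 7]`, `deleteForward` returns `[12, 0, 7]`, leaving the second of two consecutive blank pages in place, while `deleteBackward` returns `[12, 7]`.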


user13
Registered: Aug 3 2011
Posts: 3
You're the best! The script does exactly what we need! Thank you for all your help!