[b]Scenario:[/b]
I have multiple clients who scan locally and then import the (image or PDF) file into our web-based software. Clients have Spicer Image Ax 7.5 installed, which is used to import and rasterize the documents before the software uploads them to our server.
To this point, we haven't had many issues with importing PDF files.
I initially thought a recent issue was due to the scanned PDF containing JBIG2 image compression, which Spicer 7.5 cannot handle. Spicer would load the PDF, but all the pages would be blank. If I open the document in Acrobat, add a command button (or similar visual change) and then save, Spicer will load the PDF and show a blank page with a visible command button, which makes me think it has something to do with the image itself.
A recent wrench in the JBIG2 theory - I was sent more PDFs that had failed, over half of which had CCITT (fax) image compression, which has always imported correctly in the past.
I was sent various files with various Adobe versions and Creators (all will show up blank upon import):
Adobe v1.3 - from PaperPort (CCITT)
Adobe v1.4 - from PaperPort (CCITT)
Adobe v1.5 - from PaperPort (JBIG2)
Adobe v1.6 - from PaperPort (JBIG2)
Adobe v1.7 - from PaperPort (JBIG2)
Adobe v1.3 - Scanned through Acrobat 9 (CCITT)
Adobe v1.4 - Scanned through Acrobat 9 (CCITT)
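Since both compression types show up in the failing files, it may help to tally, per file, the PDF header version and which image filters appear at all. Below is a quick Python sketch for that. It assumes the "Adobe v1.x" values above correspond to the %PDF-1.x header at the start of the file, and it simply counts filter names in the raw bytes rather than parsing the PDF; pdf_profile is a made-up helper name.

```python
import re

# Standard PDF stream filter names to look for.
FILTERS = (b"CCITTFaxDecode", b"JBIG2Decode", b"DCTDecode", b"FlateDecode")

def pdf_profile(pdf_bytes):
    """Return the header version and a raw count of each filter name."""
    header = re.match(rb"%PDF-(\d\.\d)", pdf_bytes)
    profile = {"version": header.group(1).decode() if header else None}
    for name in FILTERS:
        # Plain byte count: crude, but enough to spot an unexpected filter.
        profile[name.decode()] = pdf_bytes.count(b"/" + name)
    return profile

# Usage (the file name is hypothetical):
# with open("failed.pdf", "rb") as f:
#     print(pdf_profile(f.read()))
```

A file that reports CCITT in pre-flight but also shows a non-zero JBIG2Decode count would be worth a closer look.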
Now - I can open any of these in Adobe Reader or Acrobat Pro - and they display just fine. [b]If I use the "Optimize Scanned PDF" feature - the resulting file imports and displays perfectly in our software. Even when I disable all of the extra options on the "Optimize Scanned PDF" window (like despeckle, deskew, etc) - it works just fine.[/b] My attempts with other features, like "PDF Optimizer" and the "Pre-Flight" fixes, haven't seemed to have an effect on the blank page issue.
Yesterday I tested the new version of Spicer (8.3), and all of the original failed documents the client sent me can import correctly with the new version. The Spicer release notes for versions after 7.5 don't list any bug fixes, but I'm contacting them as well. Our next software release is months away - and I have to imagine that there is some kind of setting during the scan process that can be changed to resolve the issue. We have numerous other clients (and developers) who import scanned PDF files without issue every day.
------------------------
[b]The Question:[/b]
I need to determine what changes the "Optimize Scanned PDF" feature makes. I'm hoping that will help me determine which setting needs to be altered during the scan process. I've been trying to compare the original document to one that has been through the "Optimize Scanned PDF" process. The Acrobat "Compare Document" feature seems useless in this case. I've looked at side-by-side Pre-flights and there are some differences:
Property - Original | Optimize Scanned PDF (with deskew, descreen, etc. turned off)
X position on Page - 0 | 0
Y position on Page - 0 | 7.863998
Image Width on Page - 610.000000 | 608.015991
Image Length on Page - 792.000000 | 783.919983
Horizontal Image Resolution - 300.275391 ppi | 301.255249
Vertical Image Resolution - 300.000000 ppi | 301.255249
Width - 2544 | 2544
Height - 3300 | 3280
Bits per color component - 1 | 1
Treated as a mask - False | False
Compression/encoding - CCITTFaxDecode | CCITTFaxDecode
Plates - 1 | 1
Layers - None | None
Other than the resolution differences - nothing jumps out at me as majorly different.
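The resolution figures follow from the other rows: effective resolution is the pixel count divided by the placed size in inches (1 pt = 1/72 in). A small Python check, using only the values from the listing above, reproduces them and also shows that the optimized image has 20 fewer pixel rows, so its image stream must have been rewritten rather than copied. effective_ppi is just an illustrative helper name.

```python
POINTS_PER_INCH = 72.0

def effective_ppi(pixels, size_on_page_pt):
    """Pixels per inch for an image placed at the given size in points."""
    return pixels * POINTS_PER_INCH / size_on_page_pt

# Values taken from the pre-flight listing above.
original = {
    "horizontal": effective_ppi(2544, 610.000000),
    "vertical": effective_ppi(3300, 792.000000),
}
optimized = {
    "horizontal": effective_ppi(2544, 608.015991),
    "vertical": effective_ppi(3280, 783.919983),
}
rows_removed = 3300 - 3280

print(original, optimized, rows_removed)
```

So the resolution change looks like a side effect of a slightly cropped and repositioned image, not a separate setting.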
------------------------
Any help would be appreciated. I've been searching various forums for the last two days - with not much luck. I've previously told the client to scan to TIF instead of PDF before uploading, but they didn't seem to like that answer.
-James
I will also see if the client can scan a dummy document so I can post a link to it without violating any security.
So I've been doing a bit more research, and stumbled upon "Advanced PDF Tools" software (eval) on verypdf.com.
I opened the PDF in question in that program and tried different options. I narrowed it down to a single change that causes the file to work correctly in Spicer 7.5.
If I change the Monochrome image compression from "Original" to "CCITT G4" and save, the document works perfectly. The odd part is (according to Acrobat's pre-flight) - [b]the image compression already is CCITT in the original document![/b]
When I compare the pre-flights of the original to the newly-working PDF, I can't find a single difference! I opened them up in Textpad to compare - their structure is definitely different, though almost all of the (legible) values are the same. I did notice a difference in "Length":
Original - /Filter/CCITTFaxDecode/Height 1888/Length [b]15270[/b]/Subtype/Image/Width 1600>>stream
Working - /Filter/CCITTFaxDecode/Height 1888/Length [b]14382[/b]/Subtype/Image/Width 1600>>stream
No clue if that means anything. My best guess is there is some underlying JBIG2 or some other type of compression hidden within the PDF? The PDF is just a single page with a single scanned image on it - and that scanned image (according to pre-flight) is CCITT.
I'm at a loss.
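One thing that may be worth comparing between the two files is the image's /DecodeParms entry, which is not visible in the excerpts above. In the PDF specification, the /K value there selects the CCITT scheme (negative for Group 4, zero for Group 3 1-D, positive for mixed Group 3), so two streams that both report CCITTFaxDecode can still be encoded differently. Whether that is what trips up Spicer 7.5 is only a guess. The following is a rough Python sketch, not a real PDF parser: it assumes the values are written directly in the dictionary (not as indirect references) and that the stream data never happens to contain the bytes "endobj". The function names are made up for illustration.

```python
import re

def ccitt_image_headers(pdf_bytes):
    """Return the raw text that precedes each CCITT-compressed stream."""
    headers = []
    for chunk in pdf_bytes.split(b"endobj"):
        head, found, _ = chunk.partition(b"stream")
        if found and b"/CCITTFaxDecode" in head:
            headers.append(head)
    return headers

def _int_after(key, head, default=None):
    """Read the integer that directly follows /key, or return default."""
    match = re.search(rb"/" + key + rb"\s*(-?\d+)", head)
    return int(match.group(1)) if match else default

def describe(head):
    """Summarize the CCITT parameters found in one stream header."""
    k = _int_after(b"K", head, default=0)  # spec default is 0
    if k < 0:
        scheme = "Group 4"
    elif k == 0:
        scheme = "Group 3 (1-D)"
    else:
        scheme = "Group 3 (mixed 1-D/2-D)"
    return {
        "K": k,
        "scheme": scheme,
        "Columns": _int_after(b"Columns", head, default=1728),  # spec default
        "Length": _int_after(b"Length", head),
    }

# Usage (the file names are hypothetical):
# for path in ("original.pdf", "working.pdf"):
#     with open(path, "rb") as f:
#         for head in ccitt_image_headers(f.read()):
#             print(path, describe(head))
```

If the original and the working file report different K values, that would be consistent with "CCITT G4" being a real change even though pre-flight shows CCITT for both.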