Searching OCR - ClearScan

cityofcda
Registered: Jan 11 2008
Posts: 10
Answered

I recently changed my OCR PDF Output Style from Searchable Image to ClearScan after learning that it produces clearer and smaller files. However, I just realized that Windows Search on a folder doesn't find text in files OCR'd using ClearScan. I created test folders with both types, and the search only finds my term in the OCR - Searchable Image document, not in the OCR - ClearScan document. What do you make of this (if you can understand what I've explained :-)?

My Product Information:
Acrobat Pro 9.3.1, Windows
DuffJohnson
Expert
Registered: May 30 2006
Posts: 96
Please post a sample problem file.

Duff Johnson
w - http://www.duff-johnson.com
t - http://www.twitter.com/duffjohnson

cityofcda
Registered: Jan 11 2008
Posts: 10
Thank you for your response, duffjohnson. Since I wasn't sure how to post the file, I sent you an email.
rbogie
Registered: Apr 28 2008
Posts: 432
there seems to be no compelling reason to use 'clearscan' as opposed to 'searchable image' or 'searchable image exact'. the main distinction is that 'searchable image' creates a text layer in which the font is invisible. that is, if you delete the image, the page will show no image and no text (the font is devoid of color). with clearscan, on the other hand, if you delete the image, you are left with visible text (the font is black or gray). suggest you experiment with 'content' (on the navigation panel) and draw your own independent conclusions about the utility of clearscan vs. 'searchable image'.
DuffJohnson
Expert
Registered: May 30 2006
Posts: 96
Ok, so I couldn't replicate your problem, cityofcda. Text-search worked fine for me.

rbogie, fundamentally, ClearScan can produce very high-quality results on text compared to Searchable Image, and on many documents it can do so with a substantial reduction in file size. ClearScan also enables a variety of advanced capabilities such as tagging and reflow.

Duff Johnson
w - http://www.duff-johnson.com
t - http://www.twitter.com/duffjohnson

cityofcda
Registered: Jan 11 2008
Posts: 10
My apologies, duffjohnson. I don't know what was going on last week that searching ClearScan documents was not working. I tried it again this morning, and now it seems to be working...go figure! I do appreciate your time.
rbogie
Registered: Apr 28 2008
Posts: 432
points of information: clearscan produces a smaller file size because it strips away the byte-laden bitmap (or portions of it). clearscan is 'sharper' because it generates text with black fill, which may or may not be backstopped by the bitmap or fragments of the bitmap. (the algorithm figures out which portions, if any, of the bitmap to discard and which to retain.) in contrast, 'searchable image' leaves the bitmap intact and generates invisible text (i.e., text without fill), leaving the OCR'd document with the appearance of the scanned source paper. depending on the quality of the bitmap (the image source material), clearscan can come close to producing a PDF that has the appearance of one generated from an authoring word processor. but clearscan is as susceptible to delivering OCR flaws as is any other OCR engine.
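
For anyone who wants to check the invisible-versus-filled distinction described above on their own files, here is a minimal Python sketch. It relies on one fact from the PDF specification: the `Tr` operator sets the text rendering mode, and mode 3 means the text is neither filled nor stroked, i.e. invisible. Everything else is an assumption: the function names are made up for illustration, the page content stream is assumed to be already decompressed into plain text by whatever PDF tool you use, and whether Acrobat's Searchable Image and ClearScan output actually match these two patterns is taken from the description in this thread, not verified.

```python
from __future__ import annotations

import re

# Matches the PDF text rendering mode operator, e.g. "3 Tr".
# Assumption: the content stream is already decompressed to plain text.
_TR_PATTERN = re.compile(r"(?<![\w.])(\d)\s+Tr\b")


def text_render_modes(content_stream: str) -> list[int]:
    """Return every text rendering mode set in a decoded content stream."""
    return [int(m.group(1)) for m in _TR_PATTERN.finditer(content_stream)]


def classify_text_layer(content_stream: str) -> str:
    """Rough guess at how the text on a page is drawn (hypothetical helper)."""
    # "BT" begins a text object; without it the page has no text at all.
    if "BT" not in content_stream.split():
        return "no text"
    modes = text_render_modes(content_stream)
    # Mode 0 (fill) is the default when no Tr operator appears,
    # so only an explicit, consistent mode 3 counts as invisible.
    if modes and all(mode == 3 for mode in modes):
        return "invisible text"
    return "visible text"


# Hand-written sample streams, not output from Acrobat.
hidden = "q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 12 Tf 72 700 Td (Hello) Tj ET"
visible = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET"
image_only = "q 612 0 0 792 0 0 cm /Im0 Do Q"

for name, stream in [("hidden", hidden), ("visible", visible), ("image_only", image_only)]:
    print(name, "->", classify_text_layer(stream))
```

This is a crude token scan, not a PDF parser: a string literal that happens to contain something like "3 Tr" would fool it. Either way, both kinds of text layer are present in the file, which is consistent with search eventually working on the ClearScan documents above.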