Hi KathyS, With Acrobat 9 Professional, try Document > OCR Text Recognition > Recognize Text Using OCR. As in earlier Acrobat versions, you will have some choices for the PDF Output Style. With Acrobat 9 Professional these are: Searchable Image | Searchable Image (Exact) | ClearScan. Try ClearScan. [url]http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS2A3DD1FA-CFA5-4cf6-B993-159299574AB8.w.php[/url]
While the hidden OCR text from Searchable Image and Searchable Image (Exact) can be edited (use the TouchUp Text Tool), these edits do not affect the image of the text that is present. Play with this and use the Examine Document feature to view the hidden text.
Good morning, I am trying to OCR a PDF document, all 23 pages. The issue is with the result: the formatting that is returned is in such a bad state that it's almost useless. What is the best way to do this scan to Word with better formatting?
Hi whatzzup, As you noted, OCR *does not do "formatting/layout"*. Many of the higher-end OCR vendors provide augmented features with their OCR applications that give a semblance of "format/layout". With that said, once in a word processor you will still have more than trivial post-processing to do. Regardless, you may want to do a web search for these. If a vendor offers a trial of the application, you could install it and see if it satisfies your needs.
If the 23 page "image" is not loaded with equations or other such complex content but is, rather, "straight" textual content, then re-typing into Word would take a "fair" typist perhaps 2 hours to transcribe. If you are comfortable with Word templates/headings/styles, then a non-complex layout/format ought not to take more than 1 hour. Doing 'fix-up' in Word will take something greater than 3 hours.
Having made passage through the "fix the OCR'd content in MS Word" morass in days gone by, I prefer the transcription route myself.
Assuming you have MS Office, save the PDF as TIFF; open it in MS Document Imaging (MS Office > MS Office Tools); use the 'send text to word' tool. Alternative: OCR to Word on a TIFF is also supported in the 'Imaging' app that MS bundles (Start > Programs > Accessories) with the OS installation. Acrobat is not designed to be a general OCR tool.
I personally am not a fan of Office's built-in OCR. I use cvision tech's [url=http://www.cvisiontech.com/products/general/maestro-recognition-server.php]PDF OCR software[/url] instead.
I've tried OCR-ing a PDF document to make it text searchable. However, certain words (for instance an uncommon name and numbers) can't be found when "text searching" even after having OCR-ed the document.
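One likely reason for this is that the OCR engine misread a character or two (for example "l" for "i", or the letter "O" for the digit "0"), so an exact text search no longer matches. If you can get the recognized text out of the PDF as plain text, a rough way to check is to look for near-matches rather than exact ones. The snippet below is only a sketch using Python's standard difflib module; the function name fuzzy_find, the sample text, and the 0.75 similarity cutoff are made up for illustration and have nothing to do with Acrobat itself.

```python
import difflib
import re


def fuzzy_find(ocr_text, query, cutoff=0.75):
    """Return words from ocr_text that closely resemble query.

    Hypothetical helper: the cutoff (0..1) is an assumed value, tune as needed.
    """
    # Split the recognized text into words (letters, digits, underscore).
    words = re.findall(r"\w+", ocr_text)
    # Map lowercase form -> original spelling so matching ignores case.
    lowered = {w.lower(): w for w in words}
    # difflib ranks candidates by similarity ratio and drops those below cutoff.
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


# Made-up sample: the name ends in "l" instead of "i", as OCR might misread it.
sample = "Invoice 1O45 issued to Mr. Krzyzanowskl on 12 March"
print(fuzzy_find(sample, "Krzyzanowski"))
```

If a near-match turns up, that is the misrecognized word sitting in the hidden text layer, and it is the one you would need to correct for the search to find it.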
(on the Advanced Editing menu) to edit.
Be well...