
Poor conversions from PDF to DOC

ncralex
Registered: Sep 23 2010
Posts: 9

I scan documents into Acrobat using a printer/copier with no problems. When I try to export the documents into Word, I get funny characters, jumbled text and text boxes all over the place.
 
Why is it that $86 OmniPage 17 can do this flawlessly and $1000 Acrobat Pro Extended cannot? What settings need to be set? With Adobe, the fonts are off, as is the justification!
 

My Product Information:
Acrobat Pro Extended 9.0, Windows
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
What settings are you using when you convert your scan to PDF?

Lori Kassuba is an AUC Expert and Community Manager for AcrobatUsers.com.

ncralex
Registered: Sep 23 2010
Posts: 9
lkassuba wrote:
What settings are you using when you convert your scan to PDF?
It's coming directly from a BizHub C352:

Scan to: PDF
Res: 200 dpi
Side: 1-sided
Scan As: Text/Photo*
Size: Auto
Color: Auto

*This thing won't accept just "text" and I suspect this has something to do with the issue at hand.
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
Under the Configure Presets Dialog, what do you have selected under the Make Searchable Options button?


ncralex
Registered: Sep 23 2010
Posts: 9
lkassuba wrote:
Under the Configure Presets Dialog, what do you have selected under the Make Searchable Options button?
Primary OCR: English
PDF Out: Searchable Image
Downsample: Lowest (600)
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
Try using the ClearScan option instead.


ncralex
Registered: Sep 23 2010
Posts: 9
lkassuba wrote:
Try using the ClearScan option instead.
I still get boxed and jumbled-up text in Word when I convert the PDFs to Word.
lkassuba
ExpertTeam
Registered: Jun 28 2007
Posts: 3636
In the Save As menu when you export to Word, try changing to the Retain Page Layout option.


ncralex
Registered: Sep 23 2010
Posts: 9
lkassuba wrote:
In the Save As menu when you export to Word, try changing to the Retain Page Layout option.
Already done. Does Word 2010 have anything to do with the problem?
rbogie
Registered: Apr 28 2008
Posts: 432
The conversion of a scanned PDF to DOC (MS Word) or another word processor format is a perennial problem, and there is no perfectly satisfactory solution. The best results are obtained by scanning to TIFF with MS Office Document Imaging (MODI is an "office tool" that comes with MS Office Pro). Scan at 300 dpi (400 dpi if the source has tiny alphanumerics). If what you have is a PDF, export it to TIFF and open it in MODI. On MODI's "tools" menu select "send text to Word" (MODI applies OCR and exports the text to MS Word). From there, you'll need to work on formatting, etc. Tip: if the source image material is low quality, the OCR output will be unavoidably flawed; you'll have to correct the flaws in your DOC. Tip 2: Scan to TIFF (not to MDI, MODI's native format) because TIFF is supported in Acrobat and can readily be converted to PDF.
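
For anyone without MODI, the same export-to-TIFF-then-OCR route can be roughed out with other tools. The sketch below is only an illustration, not something tested in this thread: it swaps in the open-source `pdftoppm` (Poppler) and `tesseract` command-line tools in place of Acrobat's TIFF export and MODI, and assumes both are installed. The file names and helper function names are hypothetical. It only builds the command lines, using the 300/400 dpi rule of thumb from above, so you can inspect them before running anything.

```python
import shlex


def choose_dpi(has_tiny_text: bool) -> int:
    """Rule of thumb from the post: 300 dpi normally, 400 dpi for tiny characters."""
    return 400 if has_tiny_text else 300


def pdf_to_tiff_cmd(pdf_path: str, out_prefix: str, dpi: int) -> list:
    """Build a pdftoppm command that renders each PDF page to a TIFF image."""
    return ["pdftoppm", "-r", str(dpi), "-tiff", pdf_path, out_prefix]


def ocr_cmd(tiff_path: str, out_base: str, lang: str = "eng") -> list:
    """Build a tesseract command that writes the recognized text to out_base.txt."""
    return ["tesseract", tiff_path, out_base, "-l", lang]


if __name__ == "__main__":
    # "brief.pdf" is a made-up file name for illustration.
    dpi = choose_dpi(has_tiny_text=False)
    print(shlex.join(pdf_to_tiff_cmd("brief.pdf", "brief", dpi)))
    # The page image name below is assumed; pdftoppm decides the exact
    # numbering of its output files, so check what it actually wrote.
    print(shlex.join(ocr_cmd("brief-1.tif", "brief-1")))
```

As with MODI, this only gets you plain text; the formatting cleanup in Word is still manual.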
ncralex
Registered: Sep 23 2010
Posts: 9
rbogie wrote:
The conversion of a scanned PDF to DOC (MS Word) or another word processor format is a perennial problem, and there is no perfectly satisfactory solution. The best results are obtained by scanning to TIFF with MS Office Document Imaging (MODI is an "office tool" that comes with MS Office Pro). Scan at 300 dpi (400 dpi if the source has tiny alphanumerics). If what you have is a PDF, export it to TIFF and open it in MODI. On MODI's "tools" menu select "send text to Word" (MODI applies OCR and exports the text to MS Word). From there, you'll need to work on formatting, etc. Tip: if the source image material is low quality, the OCR output will be unavoidably flawed; you'll have to correct the flaws in your DOC. Tip 2: Scan to TIFF (not to MDI, MODI's native format) because TIFF is supported in Acrobat and can readily be converted to PDF.
Well, the first issue is that the exact same file that won't convert properly in Acrobat to Word 2010 converts excellently in OmniPage 17. So somewhere along the line, Adobe has made their product way too complex. There's no reason for the "industry leader" to make such a poorly performing product. Also, Office 2010 seems to have eliminated MODI. If I bring something into Word as a .TIFF, that's what it will stay as; there's no way of switching it to text (that I know of). Now Adobe advertises that its software can do this, and if it cannot, then they need to stop advertising it and refund some funds because this is ridiculous.

The document that is giving me problems, amongst other documents, is a simple legal brief. It's 100% text!

Also let me say Adobe is ridiculous. If you dare call customer service, you get dropped somewhere in India where you have to use all your power to understand what they are saying -- I even called their corporate HQ in San Jose, CA and they said outright that they don't have customer service based in the United States.
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
Hi ncralex,
You may find the Windows 7 & Office 2010 information at Adobe's Acrobat FAQ page of interest.

As to export of OCR content to other applications (word processor/page layout/what not): yes, some tools give a nicer semblance of format and layout than others.

Regardless, I've found that none do 100 percent; there is always some cleanup called for (formatting/layout).

However, for 'digital' (as opposed to scanned images) textual and graphic content in a PDF: if the content author bothers to produce a tagged output PDF, then the content can readily be exported (with spot-on format, layout, font info, etc.). Any cleanup that might be called for is a function of how well the content was mastered in the authoring file and how robust the associated tag management facility is.

For a concise overview of this, view the recording of "Tech Talk: Tagging PDF Content for Reuse" at the AUC's Tech Talks.

Be well...

rbogie
Registered: Apr 28 2008
Posts: 432
MODI comes bundled with MS Office Professional. If you want me to convert the brief to DOC, send me the image PDF (or a few pages of same). But it sounds like you are satisfied with the conversion to DOC with OmniPage.
ncralex
Registered: Sep 23 2010
Posts: 9
daka630 wrote:
Hi ncralex,
You may find the Windows 7 & Office 2010 information at Adobe's Acrobat FAQ page of interest.

As to export of OCR content to other applications (word processor/page layout/what not): yes, some tools give a nicer semblance of format and layout than others.

Regardless, I've found that none do 100 percent; there is always some cleanup called for (formatting/layout).

However, for 'digital' (as opposed to scanned images) textual and graphic content in a PDF: if the content author bothers to produce a tagged output PDF, then the content can readily be exported (with spot-on format, layout, font info, etc.). Any cleanup that might be called for is a function of how well the content was mastered in the authoring file and how robust the associated tag management facility is.

For a concise overview of this, view the recording of "Tech Talk: Tagging PDF Content for Reuse" at the AUC's Tech Talks.
I have the same problem with Office 2007 as well...


This isn't a *semblance* of proper form; it IS proper form. If I printed the original PDF and the OmniPage result, you wouldn't be able to tell them apart. How can I post it on here for people to see? In Word, after the Adobe export, text is stacked on top of other text. I'm not asking it to nail each font face; just to get it "right" and stop putting blue boxes around each text area. Adobe also doesn't mention anything about Office 2007, which yields the same result.
ncralex
Registered: Sep 23 2010
Posts: 9
rbogie wrote:
MODI comes bundled with MS Office Professional. If you want me to convert the brief to DOC, send me the image PDF (or a few pages of same). But it sounds like you are satisfied with the conversion to DOC with OmniPage.
Office Pro 07 and 10?

I'm not satisfied because we've ponied up nearly $1000 for the software and it doesn't do what it claims, and returning software is about as easy as pushing a camel through the eye of a needle. Until AA9 and MSO07/10, I never had problems with either of these packages. No one at Adobe can or will help and I can't even post images of what I'm talking about here... if the software isn't 100% compatible with MSO07/10, then they should say so on the *BOX*, not the website.
rbogie
Registered: Apr 28 2008
Posts: 432
post at
https://acrobat.com/features_online_workspaces.php
ncralex
Registered: Sep 23 2010
Posts: 9
rbogie wrote:
post at
https://acrobat.com/features_online_workspaces.php
I'm going FROM pdf to doc though... AA already does the conversion TO pdf.
daka630
Expert
Registered: Mar 1 2007
Posts: 1420
acrobat.com - that's the ticket.
Link to getting a free account is:
Free account sign up
btw - camel and needle are easier.

Be well...

rbogie
Registered: Apr 28 2008
Posts: 432
ncralex said: "I can't even post images of what I'm talking about ..."
answer: post images of what you're talking about at:
https://acrobat.com/features_online_workspaces.php
or
www.youSendit.com
and post the link in this thread