I do a lot of OCR on files using multiple languages, either primarily in English with quotations in Chinese and/or Japanese and Sanskrit (diacritically marked roman text), or primarily in Chinese or Japanese with quotations in English and Sanskrit. I have two questions:
1. Is there any way to get the Sanskrit diacritically marked characters to come out correctly? For example, ā ě ḍ show up on this forum's input page as they should, as a-longmark, e-hacek, and d-underdot (hmm, did I get those descriptions right?) -- but they end up differently after Acrobat 8 OCR. Is Acrobat 9 going to do any better?
2. Last night I was working with articles that have an opening page or so in French and the body of the text in English, with many Japanese quotations. I found that when I ran OCR with Japanese specified, it seemed to recognize everything; the only difference seemed to be slightly different highlighting behavior when searching in the OCR'd text. I am told by someone at Adobe that Acrobat 9 will not change this behavior. OK, but my worry (and second question) is: do I lose anything when doing this? That is, when setting the primary OCR language to Japanese, do I lose access to English dictionaries or in some other way hamstring the OCR process?
Thanks, in advance, for any suggestions and advice. I'm using a MacBook running the latest version of OS X.
In a similar way, I need to OCR lots of documents with Greek, English and, sometimes, Hebrew. I did not know Acrobat could do this. Usually, I OCR with English as the primary language and everything else comes out as rubbish. It sounds like I am missing a trick. How do you make it work right?
thanks and regards,
BW