multilingual (roman/CJKV) ocr

blindcorpse
Registered: Dec 12 2008
Posts: 6

I do a lot of OCR on files using multiple languages, either primarily in English with quotations in Chinese and/or Japanese and Sanskrit (diacritically marked roman text), or primarily in Chinese or Japanese with quotations in English and Sanskrit. I have two questions:

1. Is there any way to get the Sanskrit diacritically marked characters to come out correctly? For example, ā ě ḍ show on this forum's input page as they should, a-longmark, e-hacek, and d-underdot (hmm, did I get those descriptions right?) -- but they come out differently after Acrobat 8 OCR. Is Acrobat 9 going to do any better?

2. Last night I was working with articles that have an opening page or so in French and the body of the text in English, with many Japanese quotations. I realized that if I ran OCR with Japanese specified, it seemed to get everything; the only difference seemed to be slightly different highlighting behavior when searching in the OCR'd text. I have been told by someone at Adobe that Acrobat 9 will not change this behavior. OK, but my worry (and second question) is: do I lose anything by doing this? That is, when I set the primary OCR language to Japanese, do I lose access to English dictionaries or in some other way hamstring the OCR process?
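On the descriptions in question 1: the three characters can be checked against their official Unicode names. Below is a minimal Python sketch using only the standard-library `unicodedata` module. It also shows one possible way to detect that a recognized string has lost its diacritics. The helper names (`describe`, `strip_marks`, `lost_diacritics`) are made up for illustration, and nothing here reflects what Acrobat's OCR actually emits.

```python
import unicodedata

def describe(text):
    """Return (character, Unicode name) pairs for each non-space character."""
    return [(ch, unicodedata.name(ch)) for ch in text if not ch.isspace()]

def strip_marks(text):
    """Decompose to NFD and drop combining marks, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def lost_diacritics(original, ocr_output):
    """True if ocr_output is exactly the original with its diacritics removed."""
    return original != ocr_output and strip_marks(original) == ocr_output

# The three characters from question 1, written as escapes so the
# example does not depend on the file's encoding.
sample = "\u0101 \u011b \u1e0d"

for ch, name in describe(sample):
    print(ch, name)
# ā LATIN SMALL LETTER A WITH MACRON
# ě LATIN SMALL LETTER E WITH CARON
# ḍ LATIN SMALL LETTER D WITH DOT BELOW
```

So "longmark" is officially a macron, "hacek" a caron, and "underdot" a dot below. A call such as `lost_diacritics(sample, "a e d")` returns `True`, which is one way to flag text where the marks were silently dropped; it will not catch characters that were replaced by something else entirely.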

Thanks, in advance, for any suggestions and advice. I'm using a MacBook running the latest version of OS X.

BirksWorks
Registered: Jan 25 2011
Posts: 1
dear BC,

in a similar way, i need to OCR lots of documents with greek, english and, sometimes, hebrew. i did not know Acrobat could do this. usually, i OCR with english as the primary language and everything else comes up rubbish. it sounds like i am missing a trick. how do you make it work right?

thanks and regards,

BW