Hi all,
My client's requirement is to support searching non-English PDFs in the Search module of his e-learning website. When I try to extract the contents of a PDF for indexing, some of the characters are dropped during extraction (those areas show up as empty spaces when I view the indexed contents in Luke). I am having this problem with languages like Tamil and Hindi.
The client is adamant that he wants PDF search.
What is the solution for this? Please give me a ray of light or some guidelines.
Thanks and Regards,
aras
Non-Western text in a PDF relies on an intact Unicode mapping, so unless you view the exported text in the same working space and have fonts installed that support all the glyphs, characters will be missing or corrupted. There are also cases where the PDF has incomplete Unicode maps, so text that is visible on the page cannot be exported. You can test for this by searching for the words within Acrobat itself.
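Another quick check, before the text goes into the index, is to look at which code points your extractor actually produced. Below is a minimal sketch, assuming you already have the extracted text as a Python string; `diagnose_extracted_text` is a hypothetical helper, not part of any PDF or search library. It counts characters in the Tamil and Devanagari (Hindi) Unicode blocks alongside U+FFFD replacement characters and Private Use Area code points, which often show up when a font's Unicode mapping is missing or incomplete.

```python
import unicodedata

# Unicode blocks for the scripts mentioned in the question.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # used for Hindi
    "Tamil": (0x0B80, 0x0BFF),
}


def diagnose_extracted_text(text: str) -> dict:
    """Count code points in extracted PDF text that hint at a broken Unicode mapping.

    Hypothetical helper for illustration only.
    """
    counts = {name: 0 for name in SCRIPT_RANGES}
    counts["replacement"] = 0    # U+FFFD: the extractor could not decode a glyph
    counts["private_use"] = 0    # PUA code points carry no real character meaning
    counts["other_letters"] = 0  # letters outside the scripts listed above
    for ch in text:
        cp = ord(ch)
        if cp == 0xFFFD:
            counts["replacement"] += 1
        elif unicodedata.category(ch) == "Co":
            counts["private_use"] += 1
        else:
            for name, (lo, hi) in SCRIPT_RANGES.items():
                if lo <= cp <= hi:
                    counts[name] += 1
                    break
            else:
                # No script range matched; count any other letter separately.
                if unicodedata.category(ch).startswith("L"):
                    counts["other_letters"] += 1
    return counts


if __name__ == "__main__":
    # "Tamil" written in Tamil script, then a string with typical extraction debris.
    print(diagnose_extracted_text("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"))
    print(diagnose_extracted_text("\uFFFD\uE000abc"))
```

If the page visibly shows Tamil or Hindi but the script counts come out near zero, or the replacement and private-use counts are high, the problem most likely lies in the PDF's encoding rather than in the indexer or Luke, which is the same situation the Acrobat search test reveals.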