
Making image PDFs word searchable on a Mac

hezekiah-1812
Registered: Mar 11 2011
Posts: 5

I am downloading a lot of PDFs from the New York Times archives (c. 1855) for some historical research I'm doing.
 
I assume the Times has OCR'd their images of these old articles because the text is word-searchable online.
 
Here are my questions, in descending order of the elation they might evoke. (BTW, I'm on a Mac, running Leopard.)
 
1. Is the OCR text somehow hidden in the downloaded PDFs and how would I discover and retrieve it, using Acrobat Pro 9?
 
2. Would it be worth trying Acrobat's built-in OCR tools? (Mid-19c. Times type is pretty wavy and clunky.)
 
3. Given that Macs can now word-search documents' body text, can I at least put key words in text box notes or stickies on these PDFs, to help me find important passages? (I've tried both text boxes and stickies but it didn't work.)
 
Thanks.

My Product Information:
Acrobat Pro 9.4.2, Macintosh
AnnaLu
Expert
Registered: Aug 13 2010
Posts: 20
I am not going to do much for your elation meter, but I can give you a little help with number 3. If you put keywords into a PDF's metadata, your Mac can find them. I am running Snow Leopard, and it couldn't find keywords in text boxes or stickies, but it DID find them in metadata. I expect Leopard will behave the same. Go to File > Properties > Description > Keywords to insert keywords into a PDF.

Regarding number 2, I would try a page and see what you think of the results. It's easy enough to try. If you end up with lots of manual correcting, your keyword idea will probably be simpler.

Good luck!
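
For anyone who would rather search those metadata keywords from a script than from the Spotlight menu, here is a minimal Python sketch. It assumes that Spotlight indexes the PDF Keywords field under the `kMDItemKeywords` attribute and that the `mdfind` command-line tool is available, as it is on macOS. The function names and the example folder are hypothetical, made up for illustration only.

```python
import shutil
import subprocess


def build_keyword_query(keywords):
    """Build a Spotlight query that matches files whose keyword metadata
    contains every given word (case- and diacritic-insensitive)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    clauses = []
    for word in keywords:
        # Assumption: keywords are plain words; quoting rules are not handled.
        if '"' in word or "\\" in word:
            raise ValueError("keywords must not contain quotes or backslashes")
        clauses.append(f'kMDItemKeywords == "*{word}*"cd')
    return " && ".join(clauses)


def build_mdfind_command(folder, keywords):
    """Return the argument list for an mdfind search limited to one folder."""
    return ["mdfind", "-onlyin", str(folder), build_keyword_query(keywords)]


def find_pdfs_by_keyword(folder, keywords):
    """Run the search and return matching PDF paths.
    Returns an empty list when mdfind is not installed (non-Mac systems)."""
    if shutil.which("mdfind") is None:
        return []
    result = subprocess.run(
        build_mdfind_command(folder, keywords),
        capture_output=True,
        text=True,
        check=False,
    )
    return [p for p in result.stdout.splitlines() if p.lower().endswith(".pdf")]
```

A call such as `find_pdfs_by_keyword("/Users/me/NYT-1855", ["Kansas", "slavery"])` would list the PDFs in that (hypothetical) folder whose Keywords field mentions both words. Building the command separately from running it makes it easy to print the query and paste it into Terminal to check it by hand.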

hezekiah-1812
Registered: Mar 11 2011
Posts: 5
Thanks, AnnaLu, that worked!

And I may try Acrobat's OCR tool.

Here's an extra credit question. Do you know of any book or online or magazine article that could easily be entitled: "How OCR is revolutionizing historical research." Or, "OCR, the latest desktop-computer software." Or, "How old texts are yielding their secrets to OCR." (Feel free to use that last one.)

I'd've clicked on "Accept this answer," but was afraid I'd lose the opportunity to ask this follow-up.

No biggie, this one.
Merlin
Acrobat 9ExpertTeam
Registered: Mar 1 2006
Posts: 766
Acrobat's OCR tools have helped me with some resurrections:

http://abracadabrapdf.net/articles.php?lng=fr&pg=627
http://abracadabrapdf.net/articles.php?lng=fr&pg=619
http://abracadabrapdf.net/articles.php?lng=fr&pg=618
(Click the yellow links to download.)

;-)
hezekiah-1812
Registered: Mar 11 2011
Posts: 5
Thanks, Merlin. I'll give the OCR tools a try.

Your site and those links look interesting but are in French, which I "no parlay voo."
AnnaLu
Expert
Registered: Aug 13 2010
Posts: 20
I would search on "John Warnock" and "Octavo" and see if any articles come up that meet your needs. Here is an article to start your research:

http://www.adobe.com/ap/epaper/spotlights/octavo/
hezekiah-1812
Registered: Mar 11 2011
Posts: 5
Thanks for the tips. I found two interesting sites:

http://www.adobe.com/ap/epaper/spotlights/octavo/

http://techcrunch.com/2009/05/02/it-turns-out-that-google-even-has-a-competitive-advantage-in-scanning-books/

Signing off on this subject now. I'd close out the thread but don't know how.
Merlin
Acrobat 9ExpertTeam
Registered: Mar 1 2006
Posts: 766
hezekiah-1812 wrote:
Thanks, Merlin. I'll give the OCR tools a try.
Your site and those links look interesting but are in French, which I "no parlay voo."
Yes, Acrobat's OCR cannot automatically translate OCR'd documents…
We should ask Adobe to add this feature!
:-)))


French is not difficult to learn; more than half of English words are shared with French.
;-)

In any case, you can test my portfolios by searching for words like "John Warnock", "Adobe", "Bill Gates", "Microsoft", "Steve Jobs", "Apple", etc. Those words are the same in both languages.



hezekiah-1812
Registered: Mar 11 2011
Posts: 5
I'm back with a follow-up question. I tried Googling this but didn't get a satisfactory answer.

Do Acrobat Pro 9's OCR tools allow one to convert text in a PDF image *on screen*?

I seem to recall reading somewhere that you can scan a hard-copy paper sample that will "teach" the OCR tool to "read" such type on screen thereafter. But I lost track of where I read this, or I'm imagining it!

I'd be willing to buy a plug-in or third party vendor tool. For instance, I saw this online for $113:
ABBYY FineReader Express Edition for Mac. http://www.abbyy.com/finereader_for_mac/
But couldn't figure out from the description if it can do this.

Maybe I can't find an answer online because it can't be done!

I just have too many 1855 New York Times articles to scan them all. If I can't OCR them on screen, I might just as well keyboard the relevant passages I need.