search for text in PDF by VBA with only Adobe Reader installed

silberlöwe2
Registered: Feb 14 2010
Posts: 6

My problem is widely known and frequently posted, for instance:
"Can anyone help me to open and search for a specific text string in a PDF document, return a true or false indicator (and nothing else)?"

The answers mostly refer to and include
Set gApp = CreateObject("AcroExch.App")

which, as I understand, works only with a certain level of Adobe Acrobat being installed.

My question now:
I want to give this type of functionality (via an MSAccess Form, i.e. populate a ComboBox with the PDF filenames that answer YES for certain text occurrences) to - say 20 - users in my company who have Adobe Reader 9.1 installed and nothing more.

Having this number of Adobe Acrobat licenses would be a heavy overkill which I just can't afford.

Any suggestions? Many thanks in advance.

Walter

grayangora
Registered: Mar 18 2010
Posts: 8
It is absolutely possible. I also only use Reader and yesterday wrote code to search for strings. The only issue that I have found is that the document you are searching has to be a legitimate PDF document that contains actual text (as opposed to something scanned to PDF from a copier, say).

I am still learning this and right now, am searching for text between the words "DATE" and "RE" in documents, to capture the report date which would appear as

DATE: January 20, 2010
RE: .....


I want to pick out the "January 20, 2010" from the PDF, though right now the code just searches for WORDS and does not pick up punctuation. I have to figure out the punctuation next. This code runs in Excel right now, and I have specified the path to the PDF file previously in the string variable gPDFPath.

This is an extract of the code I am using. I write the text I find out to a file. You could modify it to just report your boolean instead of writing to the file.

Set gApp = CreateObject("AcroExch.App")
gApp.Hide

'Set PDDoc object
Set gPDDoc = CreateObject("AcroExch.PDDoc")

' open the PDF
If gPDDoc.Open(gPDFPath) Then 'success
    Open strfileout For Append As #ioutfile
    Set jso = gPDDoc.GetJSObject

    bleFound = False

    iNumPages = gPDDoc.GetNumPages
    If Not jso Is Nothing Then
        For j = 0 To iNumPages - 1
            count = jso.getPageNumWords(j) 'number of words on page j (zero-based page index)

            lMarkLeft = 0
            lMarkRight = 0

            For i = 0 To count - 1
                word = jso.getPageNthWord(j, i)
                If LCase(word) = "date" Then lMarkLeft = i + 1
                If LCase(word) = "re" Then lMarkRight = i - 1
                If lMarkLeft > 0 And lMarkRight > 0 Then
                    bleFound = True
                    'print the words between "date" and "re" to file, as this is the report date
                    For m = lMarkLeft To lMarkRight
                        'read from page j, the page being scanned (not always page 0)
                        If m < lMarkRight Then
                            Print #ioutfile, jso.getPageNthWord(j, m); " ";
                        Else
                            Print #ioutfile, jso.getPageNthWord(j, m); ",";
                        End If
                    Next m
                    bleFound = False
                    lMarkLeft = 0
                    lMarkRight = 0
                End If

            Next i

.....
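
For the original true/false question, the same word loop can be reduced to a function that only reports whether a word occurs. This is a minimal sketch, not tested code: PdfContainsWord and its parameter names are made up for illustration, it matches single words only, and it assumes CreateObject("AcroExch.PDDoc") succeeds on the machine in question.

```vba
' Hypothetical helper (not from the posts above): returns True if any word
' in the PDF equals target, ignoring case.
' Assumption: the AcroExch.PDDoc object can be created on this machine.
Public Function PdfContainsWord(ByVal pdfPath As String, ByVal target As String) As Boolean
    Dim pdDoc As Object
    Dim jso As Object
    Dim pageIdx As Long, wordIdx As Long, wordCount As Long

    PdfContainsWord = False
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then Exit Function

    Set jso = pdDoc.GetJSObject
    If Not jso Is Nothing Then
        For pageIdx = 0 To pdDoc.GetNumPages - 1
            wordCount = jso.getPageNumWords(pageIdx)
            For wordIdx = 0 To wordCount - 1
                If LCase(jso.getPageNthWord(pageIdx, wordIdx)) = LCase(target) Then
                    PdfContainsWord = True
                    Exit For
                End If
            Next wordIdx
            If PdfContainsWord Then Exit For   ' stop scanning once a match is found
        Next pageIdx
    End If

    pdDoc.Close
End Function
```

A call such as PdfContainsWord(gPDFPath, "date") would then give the boolean directly, with no output file involved.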
silberlöwe2
Registered: Feb 14 2010
Posts: 6
Hello Grayangora (sounds rather familiar to me as I call myself "Silberlöwe", that is silver lion),

many thanks for your answer. Unfortunately, it contains the very reference I am trying to avoid:

*** Set gApp = CreateObject("AcroExch.App") ***

because I think this method is only available within a rather mighty - say rather costly - developer's environment.

I am beginning to suspect that I am posting a somewhat "politically incorrect" question, but I am not trying to develop a really "Acrobat-based" application, only a selection tool for already existing PDF documents.

So my question remains open, and I am sure you wish me good luck with it!

Sincerely

Silberlöwe2

Walter

grayangora
Registered: Mar 18 2010
Posts: 8
Nice to meet you Silver Lion! I don't have the Acrobat developer's environment so don't worry. I am doing the same thing you are. So ... I am using Excel 2007 to develop my code in. I only have Acrobat Reader, so don't despair. However, you do need the reference to Acrobat, which should already be available with Office 2007. Not sure about earlier versions.

Here are my references in Excel

Visual Basic for Applications
Microsoft Excel 12.0 Object Library
OLE Automation
Adobe Acrobat 9.0 Type Library


Make sure these are checked in your Excel Visual Basic environment under the Tools/References menu. Then paste this code below and see if you can at least open the PDF. Update the gPDFPath to point to your own PDF and let me know what happens.

Private Sub AcrobatFindText(aryTexttoFind() As String, strfileout As String)
    On Error GoTo ErrHandler

    'IAC objects
    Dim gApp As Object 'declared here so the Sub also compiles under Option Explicit
    Dim gAvDoc As Object
    Dim oPagesSrc As Long
    'variables
    Dim Rsp 'For message box responses
    Dim gPDFPath As String
    Dim i As Integer, j As Integer, m As Integer
    Dim sText As String 'String to search for
    Dim sStr As String 'Message string
    Dim foundText As Integer 'Holds return value from "FindText" method
    Dim gPDDoc As Acrobat.CAcroPDDoc
    Dim pg As Integer
    Dim jso As Object
    Dim count As Integer
    Dim word As Variant
    Dim result As Variant
    Dim foundErr As Boolean
    Dim iNumPages As Integer
    Dim strPageString As String
    Dim strSearch As String
    Dim ioutfile As Integer
    Dim bleFound As Boolean
    Dim lMarkLeft As Long
    Dim lMarkRight As Long

    bleFound = False
    ioutfile = FreeFile
    'hard coding for a PDF to open, it can be changed when needed.
    gPDFPath = "C:\Documents and Settings\....." ' FILL IN YOUR PATH HERE

    'Initialize Acrobat by creating App object
    Set gApp = CreateObject("AcroExch.App")
    gApp.Hide

    'Set PDDoc object
    Set gPDDoc = CreateObject("AcroExch.PDDoc")

    ' open the PDF
    If gPDDoc.Open(gPDFPath) Then 'success

        Stop 'pauses in the debugger so you can see that the PDF opened

    End If

    Exit Sub

ErrHandler:
    'the label that On Error GoTo ErrHandler needs; report the error and leave
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbOKOnly
End Sub
silberlöwe2
Registered: Feb 14 2010
Posts: 6
hello Grayangora,
many thanks again; I was in fact typing a second reply to you, in which I acknowledged that your reference to Reader is really interesting to me.

Now I am going to test your suggestions, although I am using only the 2003 environment, and report my results later on.

Thanks again!

Silberlöwe2

Walter

grayangora
Registered: Mar 18 2010
Posts: 8
Yes please let me know! In the meantime, I am going to work on figuring out how to search for a specific string, and also how to look for punctuation. I have a crude string search solution, but am not happy with it.
grayangora
Registered: Mar 18 2010
Posts: 8
Update: I figured out how to include punctuation in the words my code is returning, in case you also need this. For example, I want to include the comma in January 20, 2009, but right now I am just getting first word = 'January', next word = '20', third word = '2009'. Here's a good reference:

http://www.aces.edu/ctu/techref/software/acrobat/5.x/AcroJS.pdf

See the syntax below. I changed my code from word = jso.getPageNthWord(j, i) to word = jso.getPageNthWord(j, i, False)

getPageNthWord
Parameters: [nPage], [nWord], [bStrip]
Returns: String
Returns the nth word on the page.

nPage is the zero-based index of the page to operate on. If nPage is not specified then nPage is the first page in the document.
nWord is the zero-based index of the word to obtain. If nWord is not specified then nWord is the first word on the page.
bStrip is a boolean indicating that punctuation and whitespace should be removed from the word before returning. Default is true.
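
As a rough sketch of how bStrip = False could be used to rebuild a phrase such as the report date: JoinPageWords and its parameters are hypothetical names, and the code assumes that unstripped words come back with their trailing punctuation and whitespace attached. If they do not on your version, add a " " separator in the loop.

```vba
' Hypothetical helper: joins words firstIdx..lastIdx on one page, keeping
' punctuation by passing False for bStrip.
' Assumption: unstripped words include their trailing punctuation and spaces.
Public Function JoinPageWords(ByVal jso As Object, ByVal pageIdx As Long, _
                              ByVal firstIdx As Long, ByVal lastIdx As Long) As String
    Dim k As Long
    Dim buf As String

    For k = firstIdx To lastIdx
        buf = buf & jso.getPageNthWord(pageIdx, k, False)
    Next k

    ' drop leading/trailing spaces left over from the first and last words
    JoinPageWords = Trim$(buf)
End Function
```

With the marker variables from the earlier code, a call like JoinPageWords(jso, j, lMarkLeft, lMarkRight) would return the text between the two markers as one string.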
grayangora
Registered: Mar 18 2010
Posts: 8
OK! I just figured out how to search for a string!!!! So, in my document, I am searching for the string "Funnel of Doubt". You need to declare an Acrobat object of type AVDoc, which is what contains the search capability. A short code excerpt:

Dim gAvDoc As Acrobat.AcroAVDoc
Dim gPDFPath As String
Dim sText As String 'String to search for
Dim sStr As String 'Message string
Dim foundText As Integer 'Holds return value from "FindText" method
Dim Rsp 'For message box responses

gPDFPath = "C:\......" ' FILL IN PATH TO YOUR PDF HERE

'set AVDoc object for searching
Set gAvDoc = CreateObject("AcroExch.AVDoc")

If gAvDoc.Open(gPDFPath, "") Then
    sText = "Funnel of Doubt"
    'FindText params: StringToSearchFor, caseSensitive (1 or 0), WholeWords (1 or 0), ResetSearchToBeginOfDocument (1 or 0)
    foundText = gAvDoc.FindText(sText, 1, 0, 1) 'Returns -1 if found, 0 otherwise

    If foundText = -1 Then
        'compose a message
        sStr = "Found " & sText
        Rsp = MsgBox(sStr, vbOKOnly)
    Else
        ' text was not found, show message
        Rsp = MsgBox("Cannot find " & sText, vbOKOnly)
    End If
Else
    ' if the PDF could not be opened, show error message
    Rsp = MsgBox("Cannot open " & gPDFPath, vbOKOnly)
End If
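
To get back to the original request (a plain true/false per file, used to fill a list of matching PDFs), the FindText call can be wrapped as below. Treat it as a sketch under assumptions rather than tested code: PdfContainsText and FindMatchingPdfs are hypothetical names, folderPath is assumed to end with a backslash, and AcroExch.AVDoc must be creatable on the machine. Note that AVDoc.Open displays the document while it is being searched.

```vba
' Hypothetical wrapper: True if searchText occurs in the PDF (case-insensitive).
' Assumption: CreateObject("AcroExch.AVDoc") works on this machine.
Public Function PdfContainsText(ByVal pdfPath As String, ByVal searchText As String) As Boolean
    Dim avDoc As Object

    PdfContainsText = False
    Set avDoc = CreateObject("AcroExch.AVDoc")
    If avDoc.Open(pdfPath, "") Then
        ' args: text, caseSensitive, wholeWords, resetToStartOfDocument
        PdfContainsText = CBool(avDoc.FindText(searchText, 0, 0, 1))
        avDoc.Close True   ' True = close without saving
    End If
End Function

' Hypothetical helper: names of the PDFs in folderPath that contain searchText.
Public Function FindMatchingPdfs(ByVal folderPath As String, ByVal searchText As String) As Collection
    Dim matches As New Collection
    Dim fileName As String

    fileName = Dir$(folderPath & "*.pdf")
    Do While Len(fileName) > 0
        If PdfContainsText(folderPath & fileName, searchText) Then matches.Add fileName
        fileName = Dir$()
    Loop

    Set FindMatchingPdfs = matches
End Function
```

The returned Collection can then be looped over to populate a ComboBox on a form.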
rbryant
Registered: May 18 2011
Posts: 1
This is a great thread!! I am excited to get it working. I keep getting an error "ActiveX component can't create object." What am I missing?

Getting just a little closer to understanding...Thanks for any help.