How to export to text while keeping empty pages

griddie
Registered: Jan 11 2010
Posts: 8
Answered

Hi,

I'm using Acrobat Pro 9 to export a PDF file with empty pages to text. The empty pages are left out of the text file: no page break is inserted for them. As a result, I cannot use the text file as a search index in another program, because its page count differs from the original PDF :-(
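To illustrate why the missing page breaks matter, here is a minimal Python sketch of how a search index could map a hit back to a page number. It assumes, purely for illustration, that pages in the exported text are separated by a form-feed character (`\f`). That is not a claim about Acrobat's actual export format, and `find_pages` is a hypothetical helper.

```python
def find_pages(index_text: str, term: str, sep: str = "\f") -> list[int]:
    """Return the 1-based page numbers whose text contains `term`.

    Assumption for illustration: pages are separated by `sep` (form feed).
    """
    needle = term.lower()
    pages = index_text.split(sep)
    return [n for n, page in enumerate(pages, start=1) if needle in page.lower()]


# A 4-page PDF whose pages 2 and 3 are blank.
with_blanks = "Intro\f\f\fBudget figures"  # blank pages kept as empty entries
without_blanks = "Intro\fBudget figures"    # blank pages dropped on export

print(find_pages(with_blanks, "budget"))     # [4]: matches the PDF
print(find_pages(without_blanks, "budget"))  # [2]: points at the wrong page
```

Once the blank pages are dropped, every hit after the first blank page is reported on a page number that is too low.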

Isn't there any way to tell Acrobat to keep the empty pages upon export?

Does anyone know of another export format that contains both the text and the page structure in one file?

Why doesn't the XML export contain any page breaks?

Thanks for any advice ;-)
Griddie

thomp
Expert
Registered: Feb 15 2006
Posts: 4411
You should consider adding some token text to the empty pages. One way to do this would be to add a header or footer with page numbers to all pages.

XML is a data format. The PDF is decomposed into parts and organized in XML tags so that it can be intelligently extracted, rebuilt, or repurposed later. I'm pretty sure there is a way to tell which page a piece of text was on.

Thom Parker
The source for PDF Scripting Info
[url=http://www.pdfScripting.com]pdfscripting.com[/url]

The Acrobat JavaScript Reference, Use it Early and Often
[url=http://www.adobe.com/devnet/acrobat/javascript.php]http://www.adobe.com/devnet/acrobat/javascript.php[/url]

The most important JavaScript development tool in Acrobat
[url=http://www.pdfscripting.com/public/34.cfm#JSIntro][b]The Console Window (Video tutorial)[/b][/url]
[url=http://www.acrobatusers.com/tutorials/2006/javascript_console][b]The Console Window(article)[/b][/url]


griddie
Registered: Jan 11 2010
Posts: 8
Thanks for your ideas, but I'm not allowed to change the layout of the PDF in any way. So any token text I inserted would have to be invisible (spaces, maybe). But I'm pretty sure that spaces (or even empty footers) would be left out too if they are the only content on a page...

griddie
rbogie
Registered: Apr 28 2008
Posts: 432
These are the basics. I gather (assume) that when you say "text", you want to convert each page, including blank pages, to a stand-alone TXT file. Let's start with a 10-page PDF in which some pages are blank.
Step 1: Split the PDF so that each resulting PDF has one page. The 10-page PDF will be split into 10 single-page PDF files (set the output options to suit your needs).
Step 2: Create a new batch sequence named "save all as text" that's analogous to the stock "save all as RTF" sequence (set the output options).
Step 3: Run the new batch sequence on the 10 single-page PDFs.
Result: 10 sequentially named TXT files, including blank files corresponding to the blank pages of the original 10-page PDF.
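Steps 1 to 3 leave one TXT file per page. If a single file is still needed for the search index, a minimal Python sketch like the following could stitch the per-page files back together, writing a form feed (`\f`) between pages so that blank pages survive as empty entries. The folder layout, the `*.txt` pattern, the separator, the UTF-8 encoding, and the helper names (`merge_pages`, `natural_key`) are all assumptions for illustration; adjust them to however your batch sequence names and encodes its output.

```python
import re
from pathlib import Path


def natural_key(path: Path) -> list:
    """Sort key that orders 'page_2.txt' before 'page_10.txt'."""
    # re.split with a capture group puts digit runs at odd indices,
    # so positions always compare str-to-str or int-to-int.
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_pages(folder: Path, out_file: Path, sep: str = "\f") -> int:
    """Join per-page TXT files in `folder` into `out_file`, with `sep`
    between consecutive pages (hypothetical layout: one file per page).

    An empty file (a blank page) becomes an empty entry between two
    separators, so splitting the output on `sep` gives one entry per page.
    Returns the number of pages written.
    """
    target = out_file.resolve()
    files = sorted(
        # Skip the output file itself in case it lives in the same folder.
        (f for f in folder.glob("*.txt") if f.resolve() != target),
        key=natural_key,
    )
    # Assumed encoding; errors="replace" avoids crashing on files written
    # in a different code page.
    texts = [f.read_text(encoding="utf-8", errors="replace") for f in files]
    out_file.write_text(sep.join(texts), encoding="utf-8")
    return len(texts)
```

For example, `merge_pages(Path("split_txt"), Path("index.txt"))` should return the page count of the original PDF; if it returns fewer, some per-page files are missing. The natural sort matters because plain string sorting would put `page_10.txt` before `page_2.txt`.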