AN: GuruGram #46 ...

Don Lancaster

.... is now available for free download as
http://www.tinaja.com/glib/extract1.pdf

It is on programmatically extracting text and content from Acrobat .PDF files.

Source code is separately available as
http://www.tinaja.com/glib/extract1.psl

Additional GuruGrams are found at http://www.tinaja.com/gurgrm01.asp

--
Many thanks,

Don Lancaster
Synergetics 3860 West First Street Box 809 Thatcher, AZ 85552
voice: (928)428-4073 email: don@tinaja.com

Please visit my GURU's LAIR web site at http://www.tinaja.com
 
In sci.electronics.design Don Lancaster <don@tinaja.com> wrote:
... is now available for free download as
http://www.tinaja.com/glib/extract1.pdf

It is on programmatically extracting text and content from Acrobat .PDF files.
I've had good results with xpdf.
http://www.foolabs.com/xpdf/
pdftops and pdftotext both work well.
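
For anyone who wants to script the xpdf route, here is a minimal Python sketch. It assumes pdftotext is installed and on the PATH; the function names pdftotext_command and extract_text are made up for illustration and are not part of xpdf. It uses only the documented -layout and -enc options, plus "-" as the text-file argument to send the text to stdout.

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for pdftotext (hypothetical helper name)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]  # ask for UTF-8 text output
    if keep_layout:
        cmd.append("-layout")  # try to keep the physical page layout
    # A text-file argument of "-" sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext and return its output; assumes pdftotext is on the PATH."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like extract_text("extract1.pdf") would then hand back the text of a local copy of that file as one string.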
 
Ian Stirling wrote:
In sci.electronics.design Don Lancaster <don@tinaja.com> wrote:

... is now available for free download as
http://www.tinaja.com/glib/extract1.pdf

It is on programmatically extracting text and content from Acrobat .PDF files.


I've had good results with xpdf.
http://www.foolabs.com/xpdf/
pdftops and pdftotext both work well.
EXTRACT1.PDF also does word frequency analysis and is easily
customizable for virtually any .PDF manipulation.
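
As a rough illustration of what word frequency analysis on extracted text looks like, here is a short Python sketch. It is not the PostScript code in the GuruGram, only a stand-in for the general idea, and word_frequencies is a hypothetical name. It assumes the text has already been pulled out of the PDF by some other means.

```python
import re
from collections import Counter


def word_frequencies(text, top=10):
    """Return the `top` most common words in `text` as (word, count) pairs."""
    # Crude tokenizer: lowercase runs of letters, with an optional
    # internal apostrophe so words like "don't" stay in one piece.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top)


if __name__ == "__main__":
    sample = "The PDF text, the extracted text, and the word counts."
    for word, count in word_frequencies(sample, top=3):
        print(word, count)
```

Real documents would likely want a stop-word list and smarter handling of hyphenated words, which this sketch leaves out.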

--
Many thanks,

Don Lancaster
Synergetics 3860 West First Street Box 809 Thatcher, AZ 85552
voice: (928)428-4073 email: don@tinaja.com

Please visit my GURU's LAIR web site at http://www.tinaja.com
 
On 25 Feb 2005 19:54:26 GMT, Ian Stirling wrote:

In sci.electronics.design Don Lancaster <don@tinaja.com> wrote:
... is now available for free download as
http://www.tinaja.com/glib/extract1.pdf

It is on programmatically extracting text and content from Acrobat .PDF files.

I've had good results with xpdf.
http://www.foolabs.com/xpdf/
pdftops and pdftotext both work well.
The only complaint I have with pdftops is that it turns spot colors
(DeviceN) into process colors. When I have a document with spot colors,
I have to use Acrobat 5.0 (not Reader) Save As PostScript.
 
On Fri, 25 Feb 2005 18:26:46 -0500, the renowned Active8
<reply2group@ndbbm.net> wrote:

On 25 Feb 2005 19:54:26 GMT, Ian Stirling wrote:

In sci.electronics.design Don Lancaster <don@tinaja.com> wrote:
... is now available for free download as
http://www.tinaja.com/glib/extract1.pdf

It is on programmatically extracting text and content from Acrobat .PDF files.

I've had good results with xpdf.
http://www.foolabs.com/xpdf/
pdftops and pdftotext both work well.

<rant>
Acrobat Pro v6.0 works for me, now (POS that it is.)

Either pdf995 or one of those free deals I got has a text extract
facility, but the programs (best used to convert to ps or pdf) all
suck on some features like making bookmarks and some of the other
automagic sh*t I forgot about since getting Acrobat.

Should be AcroFat. It hogs as much or more mem than Orcad,
Dreamweaver, etc. And stoopid! Can't even remember the last save dir
once it shuts down. You can't shut the main window while a pdf is
open in the browser and when you click the back button, if another
pdf is open somewhere, it gives you the choice to exit Acrobat or
leave it running in the browser. But sometimes it's wrong and still
asks, so I leave it open to remember the save dir and so it doesn't
have to load again to get the next pdf file. If I wait too long, it
realizes there are no other instances and shuts itself down,
forgetting everything.
Adobe Illustrator will WYSIWYG edit PDF documents (one page at a
time). You can remove elements for use elsewhere and so on. Very
useful. More indirectly, but cheaper, you can print a PDF to PS, and
using (free) Ghostscript you can convert the PS to Adobe Illustrator
(.ai) format (ps2ai.bat), and then WYSIWYG edit it with Mayura Draw
(free trial and $39 to register).


Luckily Firefox "save as" *always* remembers the save dir and is
*always* able to save a pdf, unlike MSIE. It tries to save an html
page.
</rant>
But it gives errors when Acrobat reader exits about half the time
(Acrobat 5 anyway), and *sometimes* can't handle large PDFs properly
inside a browser window, and sometimes you just have to kill Acrobat
from the task manager and restart it. Not pleasant. Is 6.0 any better
in that regard? Do I have to add another half-gigabyte of RAM to run
it?



Best regards,
Spehro Pefhany
--
"it's the network..." "The Journey is the reward"
speff@interlog.com Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog Info for designers: http://www.speff.com
 
On Fri, 25 Feb 2005 23:15:03 -0500, in sci.electronics.design Active8
<reply2group@ndbbm.net> wrote:

<snip>
<rant>
Acrobat Pro v6.0 works for me, now (POS that it is.)
found this, liposuction for adobe
http://www.theinquirer.net/?article=11041




martin

"An eye for an eye makes the whole world blind"
Gandhi
 
On Sat, 26 Feb 2005 09:23:48 +0100, martin griffith wrote:

On Fri, 25 Feb 2005 23:15:03 -0500, in sci.electronics.design Active8
<reply2group@ndbbm.net> wrote:

<snip>
<rant>
Acrobat Pro v6.0 works for me, now (POS that it is.)


found this, liposuction for adobe
http://www.theinquirer.net/?article=11041

martin

"An eye for an eye makes the whole world blind"
Gandhi
I think that's the same link posted last year, but Acrobat Pro 6 is
*not* Acrobat Reader. It might work, though. I'd really like to pull
out an old CD that shipped with v4 or 5 of the Reader. Maybe I can
associate it with pdf files and only open Pro for editing without
too much registry hacking.
--
Best Regards,
Mike
 
On Sun, 27 Feb 2005 14:45:58 GMT, Torbjorn Lindgren wrote:

Active8 <reply2group@ndbbm.net> wrote:
6.0 takes "forever" to start up in the browser or alone. In the
browser, you can't do anything while that's happening and the first

Most of the time is used to load useless plugins and show the splash
screen (the splash screen shouldn't matter, but it does...)

http://sastools.com/b2/post/79394202
http://hacks.oreilly.com/pub/h/2347

IIRC there are a few plugins that you may want to leave in plug_ins,
because they will almost always be needed, the second link tells you
how to find out what the different plugins do and what dependencies
there are.

Even Acrobat 5.1 had a lot of plugins, wonder what the startup time
would be if I reinstalled that and did the same there :)

[...]
I lost a browser before this post. I changed virtual desktops (to
run the Hamster script that found your reply) while the OS was doing
the pagefile thing (I clicked refresh on TV guide :( - ask Don. He'd
say cut the TV power cord) and the browser did the musical desktop
thing and shut down.

Overload? :)
No. JS Virtual Pager. I'm surprised the thing works as well as it
does (neary frawress) considering the winders OS. I used to play
hell getting 9x to not do stupid things with my GUIs. I actually had
it so that it wouldn't make my task bar fly out/unhide when I closed
a program.

Acrobat is the prime offender with JS Pager. If it's loading a doc,
and I change desktops, I can watch Acrobat's icon jump from screen
to screen on the JS pager monitor.
--
Best Regards,
Mike
 
On Sun, 27 Feb 2005 11:53:30 -0500, Active8 wrote:

Oh. Thanks for the links. "sastools.com"... sounds like something
John Weisman would use.
--
Best Regards,
Mike
 
