OT: Copying text from a PDF

Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:
Rich Grise <richgrise@example.net> wrote:
Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
[...]

Glad you introduced me to it.
Ahem...

Am I right? Any other methods?
3 and 4 let you just type "pdftotext" no matter what directory you're
in:

3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.

4) Copy pdftotext.exe to c:\windows or c:\winnt (which are already in
your $PATH).

5) Install the "Command Prompt Here" powertoy (Google it) which will
allow you to navigate to the folder in Exploder, then get a command
prompt starting in that directory.

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.

Matt Roberds
 
mroberds@worldnet.att.net wrote:

Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:
Rich Grise <richgrise@example.net> wrote:
Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
[...]

Glad you introduced me to it.

Ahem...
Sorry, Matt - it was of course you who gave me the nod on xpdf <g>.

Am I right? Any other methods?

3 and 4 let you just type "pdftotext" no matter what directory you're
in:

3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.
Presumably that would mean adding a PATH statement as the sole entry
in an autoexec.bat file in C:\? It seems I long ago got rid of that
file, as I thought it was an archaism now frowned upon?

4) Copy pdftotext.exe to c:\windows or c:\winnt (which are already in
your $PATH).
That looks great. Will try later today.

5) Install the "Command Prompt Here" powertoy (Google it) which will
allow you to navigate to the folder in Exploder, then get a command
prompt starting in that directory.
Yep, got that already thanks. That's what I used earlier, but then
needs pasting in the filename and putting quotes around it.

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.
Think I'll pass on that for now, thanks!

Ideally, I'd like to be able to r-click the PDF filename, wherever it
is, and choose PDFtoTEXT from a context menu. Is that possible somehow
please? Even maybe deploying a keyboard macro utility to enter some
keystrokes at the appropriate stage?

--
Terry Pinnell
Hobbyist, West Sussex, UK



Matt Roberds
 
On Fri, 03 Jun 2005 08:20:40 GMT, mroberds@worldnet.att.net wrote:

Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:
Rich Grise <richgrise@example.net> wrote:
Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
[...]

Glad you introduced me to it.

Ahem...

Am I right? Any other methods?

3 and 4 let you just type "pdftotext" no matter what directory you're
in:

3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.

4) Copy pdftotext.exe to c:\windows or c:\winnt (which are already in
your $PATH).

5) Install the "Command Prompt Here" powertoy (Google it) which will
allow you to navigate to the folder in Exploder, then get a command
prompt starting in that directory.

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.

Matt Roberds
As you seem familiar with the product, and to save me downloading and testing to
find out, would you consider it - or part of the suite - might work on the
following:

.pdf file, 100MB, all text (a large database snapshot printout)

Security settings of note:
. Printing - fully allowed
. Content copying or extraction - not allowed
. Content accessibility enabled - allowed

I don't want to print 22930 pages and run them through the OCR machine :-(
 
Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:
mroberds@worldnet.att.net wrote:
3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.

Presumably that would mean adding a PATH statement as the sole entry
in an autoexec.bat file in C:\? It seems I long ago got rid of that
file, as I thought it was an archaism now frowned upon?
Putting it in autoexec.bat might work. You can also set environment
variables on one of the tabs in Control Panel->System. I don't
remember which one, and it might be hiding behind an "advanced" button.
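
For anyone wanting to check whether the PATH change actually took, here is a minimal Python sketch. It assumes the install folder mentioned earlier in the thread; prepend_to_path is a hypothetical helper name, the comparison is case-insensitive (which suits Windows), and the change only affects the current process, not the system-wide setting in Control Panel.

```python
import os
import shutil


def prepend_to_path(path_value, directory, sep=os.pathsep):
    """Return path_value with directory added at the front, unless already present."""
    entries = [e for e in path_value.split(sep) if e]
    # Compare with trailing slashes stripped and case ignored (Windows-style)
    wanted = directory.rstrip("\\/").lower()
    if any(e.rstrip("\\/").lower() == wanted for e in entries):
        return sep.join(entries)
    return sep.join([directory] + entries)


if __name__ == "__main__":
    xpdf_dir = r"D:\Program Files\xpdf-3.00pl3"  # folder named in the thread
    os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), xpdf_dir)
    # shutil.which returns the full path if the bare name resolves, else None
    print(shutil.which("pdftotext"))
```

If the last line prints None, the folder is either wrong or does not contain pdftotext.exe.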

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.

Think I'll pass on that for now, thanks!
Get a cheap second computer and try it. Alternatively, download Knoppix,
which runs a Linux system entirely from CD and doesn't touch your hard
drive, and try it out on your main computer.

Ideally, I'd like to be able to r-click the PDF filename, wherever it
is, and choose PDFtoTEXT from a context menu. Is that possible somehow
please?
I don't know. Googling "command prompt here" yielded
http://www.petri.co.il/add_command_prompt_here_shortcut_to_windows_explorer.htm
which describes how to manually add things to the context menu.

Matt Roberds
 
budgie <me@privacy.net> wrote:
[xpdf/pdftotext ability to extract text from a large PDF file]
. Content copying or extraction - not allowed
Most likely it won't work. See http://www.foolabs.com/xpdf/cracking.html
for why.

Matt Roberds
 
mroberds@worldnet.att.net wrote:

Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:

Ideally, I'd like to be able to r-click the PDF filename, wherever it
is, and choose PDFtoTEXT from a context menu. Is that possible somehow
please?

I don't know. Googling "command prompt here" yielded
http://www.petri.co.il/add_command_prompt_here_shortcut_to_windows_explorer.htm
which describes how to manually add things to the context menu.
Maybe I'll ask around in a few more on-topic groups. I already have
'Open Command Window Here' in my context menu. It's automating the
rest of it that's the challenge!

--
Terry Pinnell
Hobbyist, West Sussex, UK
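
One possible way to automate the rest is to register pdftotext as a right-click verb for .pdf files. The Python sketch below is untested guesswork along those lines: the registry location (SystemFileAssociations under the per-user Classes key) and the install path are assumptions, and the verb name PDFtoTEXT is just a label. It relies on pdftotext writing a .txt file next to the PDF when given only the PDF name, as in the commands earlier in the thread.

```python
import sys

VERB = "PDFtoTEXT"  # menu label; any name works
# Per-user key that attaches a verb to every .pdf file (assumed location)
KEY_PATH = (r"Software\Classes\SystemFileAssociations\.pdf\shell"
            + "\\" + VERB + r"\command")


def menu_command(pdftotext_exe):
    """Command line Explorer runs; "%1" is replaced by the clicked file's path."""
    return '"{}" "%1"'.format(pdftotext_exe)


def install(pdftotext_exe):
    """Write the verb to the registry (Windows only)."""
    import winreg  # standard library, available only on Windows
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        # The key's default (unnamed) value holds the command line
        winreg.SetValueEx(key, "", 0, winreg.REG_SZ, menu_command(pdftotext_exe))


if __name__ == "__main__" and sys.platform == "win32":
    install(r"D:\Program Files\xpdf-3.00pl3\pdftotext.exe")
```

The quotes around both the program and "%1" are there so that paths with blanks survive.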
 
On Fri, 03 Jun 2005 08:40:34 +0100, Terry Pinnell wrote:

Rich Grise <richgrise@example.net> wrote:

Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
then scroll down to "Precompiled binaries" and it's probably either
xpdf-3.00pl3-win32.zip (1142558 bytes) for Win32 or
xpdf-3.00pl3-dos6.zip (1775202 bytes) for DOS.
[snip useful extract]

Thanks, got it, and it works a treat. Particularly impressed with its
speed. Glad you introduced me to it.
Actually, I don't deserve credit for it at all - it was someone else,
just up-thread a bit ... Matt Roberds, in <l9dne.38478$Gp.36280@fed1read04>.

Mind you, I'm never too keen on leaving the GUI and getting into a DOS
Command Prompt window <g>. Seem to be two ways of doing it:

1) Open the DOS window in the XPDF program folder and then (assuming
defaults are OK) enter:

pdftotext "D:\long path\probably with some blanks\so needs
quotes\filename.pdf"

2) Open the DOS window in the folder containing the PDF file and
enter:

"D:\Program Files\xpdf-3.00pl3\pdftotext" filename.pdf

Am I right? Any other methods?
Danged if I know. I'm using Linux, so I just open a console window or
a one-line command line ("Run Command" in KDE) and type "xpdf filename.pdf".
You could probably do that with Doze Start/Run... (type command as above).
I can open xpdf all by itself, but it seems to not have any menus - they
seem to have put their programming effort into actually getting the job
done. :) I'm sure I could make it more automatic by tweaking my menus
and my file browser and stuff, but for the number of times I need to
actually copy/paste text from a pdf, it's not worth the effort.

You do know that if you have your folder options set to "display full
path in address bar" in windows explorer, that you can copy and paste the
full path without typing the whole dang thing?

Cheers!
Rich
 
On Fri, 03 Jun 2005 21:42:40 +0800, budgie wrote:
[about xpdf ]
As you seem familiar with the product, and to save me downloading and testing to
find out, would you consider it - or part of the suite - might work on the
following:

.pdf file, 100MB, all text (a large database snapshot printout)

Security settings of note:
. Printing - fully allowed
. Content copying or extraction - not allowed
. Content accessibility enabled - allowed

I don't want to print 22930 pages and run them through the OCR machine :-(
Well, if you can't capture the page as a graphic, you could do a screen
snap; at least that way you wouldn't have to print it to paper - just do
the OCR on the graphic.

But I do admit I'm just guessing here.

Good Luck!
Rich
 
Rich Grise <richgrise@example.net> wrote:

On Fri, 03 Jun 2005 08:40:34 +0100, Terry Pinnell wrote:

Rich Grise <richgrise@example.net> wrote:

Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
then scroll down to "Precompiled binaries" and it's probably either
xpdf-3.00pl3-win32.zip (1142558 bytes) for Win32 or
xpdf-3.00pl3-dos6.zip (1775202 bytes) for DOS.
[snip useful extract]

Thanks, got it, and it works a treat. Particularly impressed with its
speed. Glad you introduced me to it.

Actually, I don't deserve credit for it at all - it was someone else,
just up-thread a bit ... Matt Roberds, in <l9dne.38478$Gp.36280@fed1read04>.
Yes, I've duly apologised to Matt!

Mind you, I'm never too keen on leaving the GUI and getting into a DOS
Command Prompt window <g>. Seem to be two ways of doing it:

1) Open the DOS window in the XPDF program folder and then (assuming
defaults are OK) enter:

pdftotext "D:\long path\probably with some blanks\so needs
quotes\filename.pdf"

2) Open the DOS window in the folder containing the PDF file and
enter:

"D:\Program Files\xpdf-3.00pl3\pdftotext" filename.pdf

Am I right? Any other methods?

Danged if I know. I'm using Linux, so I just open a console window or
a one-line command line ("Run Command" in KDE) and type "xpdf filename.pdf".
You could probably do that with Doze Start/Run... (type command as above).
I can open xpdf all by itself, but it seems to not have any menus - they
seem to have put their programming effort into actually getting the job
done. :) I'm sure I could make it more automatic by tweaking my menus
and my file browser and stuff, but for the number of times I need to
actually copy/paste text from a pdf, it's not worth the effort.

You do know that if you have your folder options set to "display full
path in address bar" in windows explorer, that you can copy and paste the
full path without typing the whole dang thing?
I usually use R-click>Send To>Clipboard As Name. It's a pity that
'DOS' (more accurately WinXP Command Prompt) doesn't accept names with
blanks, necessitating quotes (or use of the old 8-character names).

--
Terry Pinnell
Hobbyist, West Sussex, UK
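
On the quoting nuisance: a script that starts the program directly, rather than through a command prompt, can hand over the path as a single argument and skip the quotes. A rough Python sketch, assuming pdftotext is reachable by bare name; the function names are made up for illustration, and list2cmdline is a lesser-known helper in the subprocess module, used here only to display the equivalent quoted line.

```python
import subprocess


def pdftotext_args(pdf_path, exe="pdftotext"):
    """Build the argument list; each element is passed whole, blanks and all."""
    return [exe, pdf_path]


def to_command_line(args):
    """Show the quoted line a Windows Command Prompt would need for the same call."""
    return subprocess.list2cmdline(args)


def convert(pdf_path, exe="pdftotext"):
    """Run pdftotext on one file; raises CalledProcessError if it fails."""
    # No shell is involved, so the path needs no quoting here
    subprocess.run(pdftotext_args(pdf_path, exe), check=True)


if __name__ == "__main__":
    args = pdftotext_args(r"D:\long path\file name.pdf")
    print(to_command_line(args))
```

The printed line shows the quotes being added only where a path contains blanks.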
 
budgie wrote:
On Fri, 03 Jun 2005 08:20:40 GMT, mroberds@worldnet.att.net wrote:


Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:

Rich Grise <richgrise@example.net> wrote:

Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
[...]

Glad you introduced me to it.

Ahem...


Am I right? Any other methods?

3 and 4 let you just type "pdftotext" no matter what directory you're
in:

3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.

4) Copy pdftotext.exe to c:\windows or c:\winnt (which are already in
your $PATH).

5) Install the "Command Prompt Here" powertoy (Google it) which will
allow you to navigate to the folder in Exploder, then get a command
prompt starting in that directory.

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.

Matt Roberds


As you seem familiar with the product, and to save me downloading and testing to
find out, would you consider it - or part of the suite - might work on the
following:

.pdf file, 100MB, all text (a large database snapshot printout)

Security settings of note:
. Printing - fully allowed
. Content copying or extraction - not allowed
. Content accessibility enabled - allowed

I don't want to print 22930 pages and run them through the OCR machine :-(
Elsewhere I suggested creating a virtual postscript printer. I have
done this myself. _Anything_ that will let me print can be directed to
that printer and thus to a file. It can then be converted with
Ghostscript to other forms.

Ted
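
The Ghostscript step might look roughly like the Python sketch below. It is an assumption-laden illustration, not what Ted actually ran: it presumes a Ghostscript build that includes the txtwrite output device (which postdates this thread), the Windows console executable name gswin32c (plain gs elsewhere), and hypothetical file names.

```python
import subprocess


def ps_to_text_args(ps_file, txt_file, gs_exe="gswin32c"):
    """Build a Ghostscript command that extracts text from a PostScript file."""
    return [
        gs_exe,
        "-q",                        # suppress startup banner
        "-dNOPAUSE",                 # do not wait for a keypress between pages
        "-dBATCH",                   # exit when the input is finished
        "-sDEVICE=txtwrite",         # text output device (assumed available)
        "-sOutputFile=" + txt_file,
        ps_file,
    ]


def convert(ps_file, txt_file, gs_exe="gswin32c"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ps_to_text_args(ps_file, txt_file, gs_exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed
    print(ps_to_text_args("snapshot.ps", "snapshot.txt"))
```

Whether the text comes out in a usable layout depends on how the virtual printer wrote the PostScript.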
 
On Fri, 03 Jun 2005 21:09:44 GMT, Ted Edwards <Ted_Espamless@telus.net> wrote:

budgie wrote:
On Fri, 03 Jun 2005 08:20:40 GMT, mroberds@worldnet.att.net wrote:


Terry Pinnell <terrypinDELETE@thesedial.pipex.com> wrote:

Rich Grise <richgrise@example.net> wrote:

Click "Download" to get to http://www.foolabs.com/xpdf/download.html ,
[...]

Glad you introduced me to it.

Ahem...


Am I right? Any other methods?

3 and 4 let you just type "pdftotext" no matter what directory you're
in:

3) Put "D:\Program Files\xpdf-3.00pl3\" in your $PATH.

4) Copy pdftotext.exe to c:\windows or c:\winnt (which are already in
your $PATH).

5) Install the "Command Prompt Here" powertoy (Google it) which will
allow you to navigate to the folder in Exploder, then get a command
prompt starting in that directory.

6) format c:, then install Linux or FreeBSD. Caution: this process may
lose data.

Matt Roberds


As you seem familiar with the product, and to save me downloading and testing to
find out, would you consider it - or part of the suite - might work on the
following:

.pdf file, 100MB, all text (a large database snapshot printout)

Security settings of note:
. Printing - fully allowed
. Content copying or extraction - not allowed
. Content accessibility enabled - allowed

I don't want to print 22930 pages and run them through the OCR machine :-(

Elsewhere I suggested creating a virtual postscript printer. I have
done this myself. _Anything_ that will let me print can be directed to
that printer and thus to a file. It can then be converted with
Ghostscript to other forms.
Thanks. I will explore that route.
 
On Fri, 03 Jun 2005 19:35:48 GMT, Rich Grise <richgrise@example.net> wrote:

On Fri, 03 Jun 2005 21:42:40 +0800, budgie wrote:
[about xpdf ]
As you seem familiar with the product, and to save me downloading and testing to
find out, would you consider it - or part of the suite - might work on the
following:

.pdf file, 100MB, all text (a large database snapshot printout)

Security settings of note:
. Printing - fully allowed
. Content copying or extraction - not allowed
. Content accessibility enabled - allowed

I don't want to print 22930 pages and run them through the OCR machine :-(

Well, if you can't capture the page as a graphic, you could do a screen
snap; at least that way you wouldn't have to print it to paper - just do
the OCR on the graphic.

But I do admit I'm just guessing here.

Good Luck!
I'll need a lot more than luck, with 22930 pages :-(
 
mroberds@worldnet.att.net wrote:
Most likely it won't work. See http://www.foolabs.com/xpdf/cracking.html
for why.
A Russian guy who wrote a PDF decryptor visited the Black Hat conference
and was arrested and *locked away* for violating the DMCA. Even though
the PDFs were plainly readable on the screen... your legal system is
truly stuffed.
 
