extracting data from pdf

<sr6vtq$rif$1@gioia.aioe.org>

From: pasIci@ailleur.fr (zeneca)
Newsgroups: comp.text.pdf
Subject: extracting data from pdf
Date: Thu, 6 Jan 2022 15:55:52 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sr6vtq$rif$1@gioia.aioe.org>
 by: zeneca - Thu, 6 Jan 2022 14:55 UTC

Hello,
I would like to extract data (account number, name, date, ...) from a
PDF file. Any idea how to do this?
Many thanks in advance,
André

Re: extracting data from pdf

<sr9tp3$lg3$2@dont-email.me>

From: joebeanfish@nospam.duh (Joe Beanfish)
Newsgroups: comp.text.pdf
Subject: Re: extracting data from pdf
Date: Fri, 7 Jan 2022 17:37:39 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <sr9tp3$lg3$2@dont-email.me>
References: <sr6vtq$rif$1@gioia.aioe.org>
 by: Joe Beanfish - Fri, 7 Jan 2022 17:37 UTC

On Thu, 06 Jan 2022 15:55:52 +0100, zeneca wrote:

> Hello,
> I would like to extract data (account number, name, date, ...) from a
> PDF file. Any idea how to do this?
> Many thanks in advance,
> André

Since "account number" isn't a standard PDF metadata field, I assume you
want to extract meaningful data from the content of the page(s) stored in
the PDF. If you're lucky, it's not just a picture of a page and has
actual text behind it. Try "pdftotext" or "pdftohtml" (part of "poppler")
to extract whatever text there is, then use your favorite text-processing
language or utility to pull out the desired portions of the text.
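
The two steps above (dump the text, then pick out the fields) could be
sketched in Python roughly like this. It assumes "pdftotext" is on the
PATH, and the labels "Account number:", "Name:" and "Date:" plus the
ISO date format are purely hypothetical; adjust the patterns to whatever
the text of your PDF actually looks like.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run poppler's pdftotext and return the extracted text.

    "-layout" keeps the physical layout; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical field labels: replace with the ones in your document.
FIELD_PATTERNS = {
    "account_number": re.compile(r"^\s*Account number:\s*(\S+)", re.I | re.M),
    "name": re.compile(r"^\s*Name:\s*(.+)", re.I | re.M),
    "date": re.compile(r"^\s*Date:\s*(\d{4}-\d{2}-\d{2})", re.I | re.M),
}


def extract_fields(text):
    """Return the first match for each field, or None if it is absent."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[key] = match.group(1).strip() if match else None
    return fields
```

Usage would be something like extract_fields(pdf_to_text("statement.pdf")),
where "statement.pdf" is a placeholder file name. If the PDF is only a
scanned image, pdftotext returns little or no text and every field comes
back as None.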

If you happen to want metadata from the PDF, try "pdfinfo", which is also
part of the "poppler" package.
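
For the metadata case, a minimal sketch along the same lines: pdfinfo
prints one "Key: value" pair per line, so splitting each line at the
first colon is usually enough. Again this assumes "pdfinfo" is on the
PATH; the function names are just illustrative.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's "Key: value" lines into a dict of strings."""
    info = {}
    for line in output.splitlines():
        # Split at the first colon only, so values such as timestamps
        # that contain colons stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(pdf_path):
    """Run poppler's pdfinfo on a file and return its parsed output."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_pdfinfo(result.stdout)
```

Which keys show up (Title, Author, Pages, CreationDate and so on) depends
on what the PDF actually contains, so check for a key before using it.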
