
Subject (Author)
* ATTN: GAWK developers. I need help with writing an input filter extension. (Kenny McCormack)
+- Re: ATTN: GAWK developers. I need help with writing an input filter extension. (Spiros Bousbouras)
`* Re: ATTN: GAWK developers. I need help with writing an input filter (Bruce Horrocks)
 `- Re: ATTN: GAWK developers. I need help with writing an input filter (Kenny McCormack)

ATTN: GAWK developers. I need help with writing an input filter extension.

<sdrlk2$3r38$1@news.xmission.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=826&group=comp.lang.awk#826

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gazelle@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.lang.awk
Subject: ATTN: GAWK developers. I need help with writing an input filter extension.
Date: Wed, 28 Jul 2021 13:21:06 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <sdrlk2$3r38$1@news.xmission.com>
Injection-Date: Wed, 28 Jul 2021 13:21:06 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="126056"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Wed, 28 Jul 2021 13:21 UTC

First, note that I have already written one. It provides "readline"-like
capability to GAWK. It uses a package which is similar to, but different
from, "readline", so that you have a scrollback buffer when you are
entering lines at the terminal in GAWK. I wrote it several years ago and
use it extensively. So far, so good.

Basically, what that extension does is, when called, it calls the "getline"
function in the other package, then copies the line read from the buffer of
the "getline" function into the buffer provided by GAWK. GAWK then picks
it up and everything works as expected.

But here's the thing. I want to write one now that will read the line
normally and then do something to the line before returning it to GAWK.
What I don't know how to do is to call GAWK's normal "getline" function
from my extension library. So, what I am thinking of is something like:

/* In my extension code; note that "fd" is passed in as a parameter */
normal_gawk_input(fd,buff);
/* Now examine (and possibly change) buff */
...
/* And return to GAWK */
return awk_true;

Some notes:

0) My target is Linux. Don't care about any other OS or any other
   "portability" or "standards" considerations.
1) One of the sample extensions, readfile, looks like it does something
   similar to what I want. But it includes a function called
   read_file_to_buffer(), that looks more than a little above my pay grade.
   It seems like you shouldn't have to do that. I'd rather call
   whatever code GAWK already uses to read the line.
2) I thought about using the Linux function getline(3). That would
   work, except for one little problem. The problem is that getline
   wants a FILE * object, but GAWK deals in "fd"s. You could use
   fdopen(3) to convert, but that seems messy. It seems wasteful to
   call fdopen() every time the input filter function is called, but I
   don't see any entirely safe way to avoid doing that. It would be
   nice if there were an "fd" version of getline(), but I don't know of
   anything like that. (see footnote below at (*))
3) Alternatively, if there were some way to have GAWK read the input
   line "normally" and then call my function before continuing (i.e.,
   have the extension function be able to examine the line already
   read), then that'd be good. But I don't think there is any
   capability for that in GAWK, as of the current writing.

Finally, another question about these "input filter" functions in general.
The discussion so far has always been in terms of lines - i.e., the usual
line-oriented input model. What happens if RS is set to something other
than the default? Is the input filter function supposed to deal with that
itself or does GAWK provide some kind of handling?

(*) Part of the problem is that it seems clear to me that fdopen(3)
allocates memory (presumably, using malloc() or similar) under the covers
for the FILE * object that it creates. There doesn't seem to be any clean
way to free() that allocated memory.
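
On the footnote above: fclose(3) is the standard call that releases what fdopen(3) allocated, but it also closes the underlying descriptor. The following is only a sketch, assuming it is acceptable to dup(2) the descriptor first so that the caller's fd stays open. The helper name read_line_via_stream is made up for illustration and is not part of GAWK. Note that stdio may read ahead, so the temporary stream can consume more input from the descriptor than the one line it returns, which is why this pattern is safe only when that does not matter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line from fd through a temporary FILE *.
 * Returns a malloc()ed line (caller frees) or NULL on EOF or error. */
static char *read_line_via_stream(int fd)
{
    assert(fd >= 0);

    int copy = dup(fd);            /* fclose() will close this copy, not fd */
    if (copy < 0)
        return NULL;

    FILE *fp = fdopen(copy, "r");
    if (fp == NULL) {
        close(copy);
        return NULL;
    }

    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    fclose(fp);                    /* frees the FILE object and closes copy */

    if (len < 0) {
        free(line);                /* getline() may allocate even on failure */
        return NULL;
    }
    return line;
}
```

Calling this once per record still pays for a dup/fdopen/fclose cycle each time, which is the cost the original note 2 was worried about.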

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/DanaC

Re: ATTN: GAWK developers. I need help with writing an input filter extension.

<zQ66SfL+M+OAShA62@bongo-ra.co>

https://www.rocksolidbbs.com/devel/article-flat.php?id=827&group=comp.lang.awk#827

Path: i2pn2.org!i2pn.org!aioe.org!OC6U9UkZn9R/lnxSpxG5YA.user.46.165.242.91.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.awk
Subject: Re: ATTN: GAWK developers. I need help with writing an input filter extension.
Date: Wed, 28 Jul 2021 17:43:27 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <zQ66SfL+M+OAShA62@bongo-ra.co>
References: <sdrlk2$3r38$1@news.xmission.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="27600"; posting-host="OC6U9UkZn9R/lnxSpxG5YA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Server-Commands: nowebcancel
X-Notice: Filtered by postfilter v. 0.9.2
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Wed, 28 Jul 2021 17:43 UTC

On Wed, 28 Jul 2021 13:21:06 -0000 (UTC)
gazelle@shell.xmission.com (Kenny McCormack) wrote:
> First, note that I have already written one. It provides "readline"-like
> capability to GAWK. It uses a package which is similar to, but different
> from, "readline", so that you have a scrollback buffer when you are
> entering lines at the terminal in GAWK. I wrote it several years ago and
> use it extensively. So far, so good.
>
> Basically, what that extension does is, when called, it calls the "getline"
> function in the other package, then copies the line read from the buffer of
> the "getline" function into the buffer provided by GAWK. GAWK then picks
> it up and everything works as expected.
>
> But here's the thing. I want to write one now that will read the line
> normally and then do something to the line before returning it to GAWK.
> What I don't know how to do is to call GAWK's normal "getline" function
> from my extension library. So, what I am thinking of is something like:
>
> /* In my extension code; note that "fd" is passed in as a parameter */
> normal_gawk_input(fd,buff);
> /* Now examine (and possibly change) buff */
> ...
> /* And return to GAWK */
> return awk_true;
>
> Some notes:

[...]

> 2) I thought about using the Linux function getline(3). That would
> work, except for one little problem. The problem is that getline
> wants a FILE * object, but GAWK deals in "fd"s. You could use
> fdopen(3) to convert, but that seems messy. It seems wasteful to
> call fdopen() every time the input filter function is called, but I
> don't see any entirely safe way to avoid doing that. It would be
> nice if there were an "fd" version of getline(), but I don't know of
> anything like that. (see footnote below at (*))

Isn't it trivial to write your own getline() with the interface you want?
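
One way such a function might look is sketched below. The name fd_getline and its argument order are assumptions chosen to mirror getline(3); nothing like it is provided by GAWK or libc. It reads a byte at a time so that it never consumes input beyond the newline, which keeps it simple at the price of one read(2) call per byte.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fd_getline(): like getline(3) but takes a file descriptor.
 * Reads one byte at a time so it never consumes input past the newline.
 * Returns the line length (newline included), or -1 on EOF with no data
 * or on error. *line is grown with realloc(); the caller frees it. */
static ssize_t fd_getline(char **line, size_t *cap, int fd)
{
    assert(line != NULL && cap != NULL);

    size_t len = 0;
    for (;;) {
        /* Keep room for the next byte plus the terminating NUL. */
        if (*line == NULL || len + 2 > *cap) {
            size_t ncap = (*line != NULL && *cap) ? *cap * 2 : 128;
            char *tmp = realloc(*line, ncap);
            if (tmp == NULL)
                return -1;
            *line = tmp;
            *cap = ncap;
        }

        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)                /* EOF */
            break;

        (*line)[len++] = c;
        if (c == '\n')
            break;
    }

    if (len == 0)
        return -1;                 /* EOF before any data */
    (*line)[len] = '\0';
    return (ssize_t)len;
}
```

A buffered variant would be faster, but it would have to keep the bytes read past the newline somewhere between calls.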

[...]

> (*) Part of the problem is that it seems clear to me that fdopen(3)
> allocates memory (presumably, using malloc() or similar) under the covers
> for the FILE * object that it creates. There doesn't seem to be any clean
> way to free() that allocated memory.

I don't know what your overall setup is or which function calls which, and
when, but you can specify your own buffer using setvbuf(). That way the
buffer is yours to free once you are done with the stream.

--
vlaho.ninja/prog

Re: ATTN: GAWK developers. I need help with writing an input filter extension.

<0da59d24-2344-71d2-ba62-a548e64c0f7c@scorecrow.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=828&group=comp.lang.awk#828

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: 07.013@scorecrow.com (Bruce Horrocks)
Newsgroups: comp.lang.awk
Subject: Re: ATTN: GAWK developers. I need help with writing an input filter
extension.
Date: Wed, 28 Jul 2021 23:43:25 +0100
Lines: 22
Message-ID: <0da59d24-2344-71d2-ba62-a548e64c0f7c@scorecrow.com>
References: <sdrlk2$3r38$1@news.xmission.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 0+AM/IsxqpbJYVCPlT/YgQtctvKDVCd72aqyh5CEP7pX/X0rI/
Cancel-Lock: sha1:70rI9wsdN53+goTl6r7e+LY4//M=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
Gecko/20100101 Thunderbird/78.12.0
In-Reply-To: <sdrlk2$3r38$1@news.xmission.com>
Content-Language: en-GB
 by: Bruce Horrocks - Wed, 28 Jul 2021 22:43 UTC

On 28/07/2021 14:21, Kenny McCormack wrote:
> 2) I thought about using the Linux function getline(3). That would
> work, except for one little problem. The problem is that getline
> wants a FILE * object, but GAWK deals in "fd"s. You could use
> fdopen(3) to convert, but that seems messy. It seems wasteful to
> call fdopen() every time the input filter function is called, but I
> don't see any entirely safe way to avoid doing that. It would be
> nice if there were an "fd" version of getline(), but I don't know of
> anything like that. (see footnote below at (*))

You don't need to call fdopen() every time, if I understand this page
correctly:
<https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html>

I think you need only call it when your XXX_can_take_file() function is
invoked and save the obtained FILE * value in a global static.

So that's once per file not once per record.
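
The once-per-file idea could be sketched roughly as follows. The names stream_for_fd and release_stream are hypothetical, and the sketch assumes only one input file is active at a time; it does not include gawkapi.h or show how the helpers would be wired into the extension's hooks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* One cached stream, on the assumption that only one file is active at a time. */
static FILE *cached_fp = NULL;
static int cached_fd = -1;

/* Hypothetical helper: return the FILE * for fd, calling fdopen() only
 * the first time a given fd is seen. */
static FILE *stream_for_fd(int fd)
{
    assert(fd >= 0);
    if (cached_fp == NULL || cached_fd != fd) {
        /* A real extension would release the old stream before replacing it. */
        cached_fp = fdopen(fd, "r");
        cached_fd = (cached_fp != NULL) ? fd : -1;
    }
    return cached_fp;
}

/* Hypothetical close hook: release the stream (this also closes the fd). */
static void release_stream(void)
{
    if (cached_fp != NULL) {
        fclose(cached_fp);
        cached_fp = NULL;
        cached_fd = -1;
    }
}
```

Every per-record call can then use stream_for_fd(fd) with getline(3), and the FILE object is freed exactly once when the file is finished.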

--
Bruce Horrocks
Surrey, England

Re: ATTN: GAWK developers. I need help with writing an input filter extension.

<sdsu5n$4fsl$1@news.xmission.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=829&group=comp.lang.awk#829

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gazelle@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.lang.awk
Subject: Re: ATTN: GAWK developers. I need help with writing an input filter
extension.
Date: Thu, 29 Jul 2021 00:53:11 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <sdsu5n$4fsl$1@news.xmission.com>
References: <sdrlk2$3r38$1@news.xmission.com> <0da59d24-2344-71d2-ba62-a548e64c0f7c@scorecrow.com>
Injection-Date: Thu, 29 Jul 2021 00:53:11 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="147349"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Thu, 29 Jul 2021 00:53 UTC

In article <0da59d24-2344-71d2-ba62-a548e64c0f7c@scorecrow.com>,
Bruce Horrocks <07.013@scorecrow.com> wrote:
>On 28/07/2021 14:21, Kenny McCormack wrote:
>> 2) I thought about using the Linux function getline(3). That would
>> work, except for one little problem. The problem is that getline
>> wants a FILE * object, but GAWK deals in "fd"s. You could use
>> fdopen(3) to convert, but that seems messy. It seems wasteful to
>> call fdopen() every time the input filter function is called, but I
>> don't see any entirely safe way to avoid doing that. It would be
>> nice if there were an "fd" version of getline(), but I don't know of
>> anything like that. (see footnote below at (*))
>
>You don't need to call fdopen() every time, if I understand this page
>correctly:
><https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html>
>
>I think you need only call it when your XXX_can_take_file() function is
>invoked and save the obtained FILE value in a global static.
>
>So that's once per file not once per record.

Thank you. That makes a lot of sense.

Now, as it happens, I made a boo-boo here. My underlying
assumption about what needed to be implemented was all wrong. Upon
digging into things a bit deeper, I realized that the function you write as
an Input Parser is not a replacement for some line-oriented function like
getline(3), but is, rather, supposed to be a "drop-in" for read(2). Note
that the default value for iobuf->read_func is "read". This is the thing
that you change to point to your new function.

That's why the new function that you are to define is declared as:

static ssize_t XXX_read(int fd, void *buf, size_t nbytes);

which is the same signature as read(2).
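
A function of that shape might look like the sketch below. The name filter_read is hypothetical, and upper-casing is only a stand-in for whatever change is actually wanted. The sketch is standalone: pointing iobuf->read_func at it, as described above, needs gawkapi.h and is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical read(2)-compatible filter: reads raw bytes, then edits them
 * in place before the caller sees them. Upper-casing stands in for whatever
 * transformation is actually wanted. */
static ssize_t filter_read(int fd, void *buf, size_t nbytes)
{
    assert(buf != NULL);

    ssize_t n = read(fd, buf, nbytes);
    if (n <= 0)
        return n;                  /* EOF or error: pass through unchanged */

    unsigned char *p = buf;
    for (ssize_t i = 0; i < n; i++)
        p[i] = (unsigned char)toupper(p[i]);

    return n;                      /* byte count is unchanged by this filter */
}
```

Because the function sees arbitrary chunks of bytes and not whole lines, a transformation that depends on line boundaries would have to carry state between calls, and one that changes the length must never return more than nbytes.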

Once I realized this, everything became quite clear.
It also, incidentally, answers my question about RS.

Anyway, I was able to quickly write the new Input Parser that I had planned.
I will be posting a summary of that new functionality soon.

--
Politics is show business for ugly people.

Sports is politics for stupid people.
