Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1056&group=comp.lang.awk#1056
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: 480-992-1380@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Wed, 9 Feb 2022 22:22:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <20220209141205.801@kylheku.com>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk>
Injection-Date: Wed, 9 Feb 2022 22:22:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6c15947a1f4d9f5630cd9b2dccfd20d6";
logging-data="1201"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/GQfkjw4LMyGmbrrHt1ITTWaK3gnp+evQ="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:7O2vvJd1Upk3YYt9fAOlKg9Xplc=
 by: Kaz Kylheku - Wed, 9 Feb 2022 22:22 UTC

On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
>> On 09.02.2022 08:49, Axel Reichert wrote:
>> [ about an ASCII data mangling Python script ]
>>> [....] I started immediately, cobbled something together (awk featured
>>> prominently among other usual suspects, such as tr, sed, cut, grep).
>>
>> Hmm.. - these four tools are amongst those where I usually say; instead
>> of connecting and running a lot of such processes use just one instance
>> of awk. The functions expressed in those tools are - modulo a few edge
>> cases - basics in Awk and part of its core.
>
> That sometimes works, but the trouble is that once you've used AWK's
> pattern/action feature once, you can't do so again -- you are stuck
> inside the action part. Just the other day I needed to split fields
> within a field after finding the lines I wanted. This was, for me, an
> obvious case for two processes:
>
> awk -F: '/wanted/ { print $3 }' | awk -F, '...'

You can split $3 into fields by assigning its value to $0, after
tweaking FS for the inner field separator:

$ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
wanted two three,a,b,c <- input
three:a:b:c <- output

You have to save and restore FS to do this repeatedly for
different records of the outer file. Another approach is to
use the split function to populate an array, where the pattern
is an argument (only defaulting to FS if omitted).
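
A minimal sketch of the save-and-restore variant, assuming a
POSIX-conforming awk. The wrapper name split_inner and the second sample
record are made up for illustration; field 3 is copied before FS is
touched so the sketch does not depend on when a given awk re-splits the
current record.

```shell
# Re-split field 3 on commas for every matching record, then put the
# outer FS back so the next record is split on whitespace again.
split_inner() {
    awk '
    /wanted/ {
        inner = $3           # grab the field while the outer split is intact
        outer_fs = FS        # remember the outer separator
        FS = ","             # inner separator for the re-split
        $0 = inner           # assigning $0 re-splits with the current FS
        OFS = ":"
        $1 = $1              # force $0 to be rebuilt with OFS
        print
        FS = outer_fs        # restore before the next record is read
    }'
}

printf '%s\n' 'wanted two three,a,b,c' 'wanted x y,z' | split_inner
```

Without the restore, the second record would be split on commas and its
$3 would be empty.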

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1057&group=comp.lang.awk#1057
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Thu, 10 Feb 2022 01:41:16 +0100
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <su1mvc$9a9$1@dont-email.me>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 10 Feb 2022 00:41:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="00fdd5ef5df2acc3c0a17fa769bb621a";
logging-data="9545"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19aX+ZU02oatGBjPauO8bxi"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:ioRDUryJotRoE3T6EUw3IHN5+Oo=
In-Reply-To: <87leyjn42c.fsf@bsb.me.uk>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Thu, 10 Feb 2022 00:41 UTC

On 09.02.2022 22:05, Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
>> On 09.02.2022 08:49, Axel Reichert wrote:
>> [ about an ASCII data mangling Python script ]
>>> [....] I started immediately, cobbled something together (awk featured
>>> prominently among other usual suspects, such as tr, sed, cut, grep).
>>
>> Hmm.. - these four tools are amongst those where I usually say; instead
>> of connecting and running a lot of such processes use just one instance
>> of awk. The functions expressed in those tools are - modulo a few edge
>> cases - basics in Awk and part of its core.
>
> That sometimes works,

My observation is that it usually works smoothly, and only sometimes
(the edge cases, I called them above) not obviously straightforward,
but usually just in a slightly different way. But it works generally.

> but the trouble is that once you've used AWK's
> pattern/action feature once, you can't do so again -- you are stuck
> inside the action part. Just the other day I needed to split fields
> within a field after finding the lines I wanted.

You can always simply split() the fields, no need to invoke another
process just for another implicit loop that awk supports.
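
As a rough sketch of the split() route, using the whitespace-separated
sample record from elsewhere in the thread (the wrapper name
inner_fields and the array name part are illustrative assumptions):

```shell
# Split field 3 on commas into an array; FS itself is never changed,
# so the outer records keep splitting on whitespace.
inner_fields() {
    awk '
    /wanted/ {
        n = split($3, part, ",")      # "," applies to this call only
        line = part[1]
        for (i = 2; i <= n; i++)      # join the pieces with ":"
            line = line ":" part[i]
        print line
    }'
}

printf '%s\n' 'wanted two three,a,b,c' | inner_fields
```

Because FS is left alone, nothing has to be saved or restored between
records.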

> This was, for me, an obvious case for two processes:
>
> awk -F: '/wanted/ { print $3 }' | awk -F, '...'

I understand the impulse to develop commands that way; that usually
leads to such horrible and inflexible cascades of the tools mentioned
above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

And as soon as you need yet more information from the first instance
this approach needs more workarounds, e.g. passing state information
through the OS level.

Of course there's many ways to skin a cat. I just advocate to think
about one-process solutions before following the reflex to construct
inflexible pipeline constructs.

>
> but I could have used grep and cut in place of the first AWK. Maybe I'm
> just not good at remembering the details of all the key functions,

The nice thing about awk - actually already mentioned in context of
the features/complexity vs. power comments - is that you don't need
to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

> but I find I use AWK in pipelines quite a lot.

That's how we learned it; pipelining through simple dedicated tools.
I also still do that. My observation is that whenever a more powerful
tool like awk gets into use, the more primitive tools in the pipeline
can be eliminated, the whole pipeline gets then refactored, typically
for efficiency, flexibility, robustness, and clarity in design.

I want to close my comment with another aspect; the primitive helper
tools are often restricted and incoherent.[*] In GNU context you have
additional options that I'm glad to be able to use, but if you want to
stay standard conforming the tools might not "suffice" or usage gets
more bulky. With awk the standard version supports already the powerful
core.

Janis

[*] If I'd have a remembering issue then it would be how options, e.g.
the delimiters, are (differently) defined in the various tools, since
options are incoherent and inconsistently named across the tools, and
such options have also different semantics. That results in a lot of man
page lookups and more software maintenance issues.

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1058&group=comp.lang.awk#1058
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Thu, 10 Feb 2022 01:07:32 +0000
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <87pmnvleaz.fsf@bsb.me.uk>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <20220209141205.801@kylheku.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="44ac79418358714662faf2c71e7e4cc5";
logging-data="21836"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19U9Ny1imxkz87CS//ror/M68lLf2zxW94="
Cancel-Lock: sha1:xVL14HcLVii3YoToZPg6pdB0zUE=
sha1:9QGbaYunEFdtXLJIcyowyo/WDoM=
X-BSB-Auth: 1.d078379270f3ceaa0eb3.20220210010732GMT.87pmnvleaz.fsf@bsb.me.uk
 by: Ben Bacarisse - Thu, 10 Feb 2022 01:07 UTC

Kaz Kylheku <480-992-1380@kylheku.com> writes:

> On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>
>>> On 09.02.2022 08:49, Axel Reichert wrote:
>>> [ about an ASCII data mangling Python script ]
>>>> [....] I started immediately, cobbled something together (awk featured
>>>> prominently among other usual suspects, such as tr, sed, cut, grep).
>>>
>>> Hmm.. - these four tools are amongst those where I usually say; instead
>>> of connecting and running a lot of such processes use just one instance
>>> of awk. The functions expressed in those tools are - modulo a few edge
>>> cases - basics in Awk and part of its core.
>>
>> That sometimes works, but the trouble is that once you've used AWK's
>> pattern/action feature once, you can't do so again -- you are stuck
>> inside the action part. Just the other day I needed to split fields
>> within a field after finding the lines I wanted. This was, for me, an
>> obvious case for two processes:
>>
>> awk -F: '/wanted/ { print $3 }' | awk -F, '...'
>
> You can split $3 into fields by assigning its value to $0, after
> tweaking FS for the inner field separator:
>
> $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
> wanted two three,a,b,c <- input
> three:a:b:c <- output

Sure, but you don't get to use pattern/action pairs on the result.

> You have to save and restore FS to do this repeatedly for
> different records of the outer file. Another approach is to
> use the split function to populate an array, where the pattern
> is an argument (only defaulting to FS if omitted).

I would much prefer to use split, but only if someone stopped me doing
it the natural way with a pipeline.

I suspected there would be a slew of replies about how to do it in one
command! However, I seriously doubt that there is any Unix programmer
or sysadmin who has not used AWK in a pipeline with commands that could,
relatively easily, be coded into the {} part of one or more actions. I
really don't think my point is very contentious.

--
Ben.

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1059&group=comp.lang.awk#1059
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Thu, 10 Feb 2022 01:37:17 +0000
Organization: A noiseless patient Spider
Lines: 112
Message-ID: <87iltnlcxe.fsf@bsb.me.uk>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="44ac79418358714662faf2c71e7e4cc5";
logging-data="4206"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19w5H/0DeGdhpx8eYg4wHwVsTnCKdTJDqg="
Cancel-Lock: sha1:c9fpVZ+woDDNhFetwHILFjNa99U=
sha1:7Atdr4JuiU0NXz1b9/na4A3W82k=
X-BSB-Auth: 1.0f6f14d7142841bbc9a3.20220210013717GMT.87iltnlcxe.fsf@bsb.me.uk
 by: Ben Bacarisse - Thu, 10 Feb 2022 01:37 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 09.02.2022 22:05, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>
>>> On 09.02.2022 08:49, Axel Reichert wrote:
>>> [ about an ASCII data mangling Python script ]
>>>> [....] I started immediately, cobbled something together (awk featured
>>>> prominently among other usual suspects, such as tr, sed, cut, grep).
>>>
>>> Hmm.. - these four tools are amongst those where I usually say; instead
>>> of connecting and running a lot of such processes use just one instance
>>> of awk. The functions expressed in those tools are - modulo a few edge
>>> cases - basics in Awk and part of its core.
>>
>> That sometimes works,
>
> My observation is that it usually works smoothly, and only sometimes
> (the edge cases, I called them above) not obviously straightforward,
> but usually just in a slightly different way. But it works generally.
>
>> but the trouble is that once you've used AWK's
>> pattern/action feature once, you can't do so again -- you are stuck
>> inside the action part. Just the other day I needed to split fields
>> within a field after finding the lines I wanted.
>
> You can always simply split() the fields, no need to invoke another
> process just for another implicit loop that awk supports.

Yes, there's no need, but why worry about it? Maybe I am alone in
thinking processes are cheap.

But more to the point, a pipeline is an elegant, easily understood, and
often natural way to organise a task. I will keep using them, even if
there is no need.

>> This was, for me, an obvious case for two processes:
>>
>> awk -F: '/wanted/ { print $3 }' | awk -F, '...'
>
> I understand the impulse to develop commands that way; that usually
> leads to such horrible and inflexible cascades of the tools mentioned
> above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).
>
> And as soon as you need yet more information from the first instance
> this approach needs more workarounds, e.g. passing state information
> through the OS level.

A pipeline is not the right structure for such tasks, but there are a
huge number of tasks where combining Unix tools is the simplest
solution.

> Of course there's many ways to skin a cat. I just advocate to think
> about one-process solutions before following the reflex to construct
> inflexible pipeline constructs.
>
>>
>> but I could have used grep and cut in place of the first AWK. Maybe I'm
>> just not good at remembering the details of all the key functions,
>
> The nice thing about awk - actually already mentioned in context of
> the features/complexity vs. power comments - is that you don't need
> to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

But since I use pipelines so much, I rarely use split, patsplit, gsub or
gensub. I find myself checking their arguments pretty much every time I
use them.

>> but I find I use AWK in pipelines quite a lot.
>
> That's how we learned it; pipelining through simple dedicated tools.
> I also still do that.

Why? Serious question. It sounds like a dreadful risk based on your
comments above. Doing so "usually leads to such horrible and inflexible
cascades of the tools" when there is no need "to invoke another
process". What makes you sometimes take the risk of horrible cascades
and pay the price of another process?

I ask because it's possible we disagree only on how frequently it should
be done, and about exactly what circumstances warrant it.

> My observation is that whenever a more powerful
> tool like awk gets into use, the more primitive tools in the pipeline
> can be eliminated,

I think we all agree that it /can/ be done.

> the whole pipeline gets then refactored, typically for efficiency,
> flexibility, robustness, and clarity in design.

That's where I disagree. I often choose a pipeline because it is the
most robust, flexible and clear design. (I rarely care about efficiency
when doing this sort of thing.)

I do it in other contexts too. In Haskell, because of its lazy
evaluation, you can chain function calls that filter and process lists,
even potentially infinite ones. It often results in clear, easy to
modify code.

> I want to close my comment with another aspect; the primitive helper
> tools are often restricted and incoherent.[*] In GNU context you have
> additional options that I'm glad to be able to use, but if you want to
> stay standard conforming the tools might not "suffice" or usage gets
> more bulky. With awk the standard version supports already the powerful
> core.

I agree. That's a shame, but an inevitable cost of piecemeal historical
development.

--
Ben.

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1060&group=comp.lang.awk#1060
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: 480-992-1380@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Thu, 10 Feb 2022 07:59:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <20220209235134.861@kylheku.com>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <20220209141205.801@kylheku.com>
<87pmnvleaz.fsf@bsb.me.uk>
Injection-Date: Thu, 10 Feb 2022 07:59:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e7e23f7bbf81f76a5caf0d98e8b667d5";
logging-data="14255"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oppnIkRWk83nN+GYJgunjED7v6s8qFHY="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:1MbpJji38u1epaZ96jCcAX1ov+4=
 by: Kaz Kylheku - Thu, 10 Feb 2022 07:59 UTC

On 2022-02-10, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Kaz Kylheku <480-992-1380@kylheku.com> writes:
>
>> On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>>
>>>> On 09.02.2022 08:49, Axel Reichert wrote:
>>>> [ about an ASCII data mangling Python script ]
>>>>> [....] I started immediately, cobbled something together (awk featured
>>>>> prominently among other usual suspects, such as tr, sed, cut, grep).
>>>>
>>>> Hmm.. - these four tools are amongst those where I usually say; instead
>>>> of connecting and running a lot of such processes use just one instance
>>>> of awk. The functions expressed in those tools are - modulo a few edge
>>>> cases - basics in Awk and part of its core.
>>>
>>> That sometimes works, but the trouble is that once you've used AWK's
>>> pattern/action feature once, you can't do so again -- you are stuck
>>> inside the action part. Just the other day I needed to split fields
>>> within a field after finding the lines I wanted. This was, for me, an
>>> obvious case for two processes:
>>>
>>> awk -F: '/wanted/ { print $3 }' | awk -F, '...'
>>
>> You can split $3 into fields by assigning its value to $0, after
>> tweaking FS for the inner field separator:
>>
>> $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
>> wanted two three,a,b,c <- input
>> three:a:b:c <- output
>
> Sure, but you don't get to use pattern/action pairs on the result.

But that's largely just syntactic sugar for a glorified case statement.

Instead of

/abc/ { ... }
$2 > $3 { ... }

you have to write

if (/abc/) { ... }
if ($2 > $3) { ... }

kind of thing.
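
To make that concrete, here is a hedged sketch that re-splits field 3 and
then dispatches on the result with plain if statements. The wrapper name
dispatch, the messages, and the sample records are assumptions for
illustration only.

```shell
# After $0 is replaced by the inner field, /abc/ and $2 > $3 refer to
# the re-split record, so the "patterns" become ordinary conditions.
dispatch() {
    awk '
    /wanted/ {
        inner = $3
        outer_fs = FS
        FS = ","
        $0 = inner                    # inner record: fields split on ","
        if (/abc/)   print "has abc: " $0
        if ($2 > $3) print "second > third: " $2 " " $3
        FS = outer_fs                 # restore for the next outer record
    }'
}

printf '%s\n' 'wanted two abc,9,5' 'wanted two xyz,1,2' 'skipped two abc,9,5' | dispatch
```

Only the first record produces output; the second fails both tests and
the third never matches /wanted/.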

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1061&group=comp.lang.awk#1061
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mail@axel-reichert.de (Axel Reichert)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Thu, 10 Feb 2022 18:33:08 +0100
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <87k0e2hbjf.fsf@axel-reichert.de>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="a906fe575102ad7c8548ce19327e4224";
logging-data="4629"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Sm2VV3PKyWjsBKIh93ZZjl4BHcGpzRkE="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:9Gu/I1GTJur4uTal/E22fg+8PCg=
sha1:kHEq9Cksh3PGjRedWKZCBSgeKO4=
 by: Axel Reichert - Thu, 10 Feb 2022 17:33 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> I understand the impulse to develop commands that way; that usually
> leads to such horrible and inflexible cascades of the tools mentioned
> above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).
>
> And as soon as you need yet more information from the first instance
> this approach needs more workarounds, e.g. passing state information
> through the OS level.
>
> Of course there's many ways to skin a cat. I just advocate to think
> about one-process solutions before following the reflex to construct
> inflexible pipeline constructs.

It seems that, like Ben, I am a pipeliner; the igniting spark was probably
"Opening the software toolbox":

https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html

I know that a lot can be done within awk, but it often does not seem to
meet my way of thinking. For example, I might start with a grep. To my
surprise it finds many matches, so further processing is called for, say
awk '{print $3}' or similar. At that point, I will NOT replace the grep
with awk '/.../', because it is easier to just add another pipeline
after fetching the command from history using the up arrow. And so on,
adding pipeline after pipeline (which I also can easily relate to
functional programming). Once the whole dataflow is ready, I will
usually not "refactor" the beast, only in glaringly obvious
cases/optimizations. I might even have started with a (in hindsight)
Useless Use Of Cat. On the more ambitious side, I well remember how
proud I was when plumbing several xargs into a pipeline:

foo | bar | xargs -i baz {} 333 | quux | xargs fubar

By now this is a common idiom for me on the command line.
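
For comparison, a small sketch of the grep-then-awk habit next to the
folded single-process form. The function names, the pattern, and the
sample data are invented for illustration. grep uses basic regular
expressions by default and awk uses extended ones, so the two only line
up this simply for plain literal patterns.

```shell
# Two-stage pipeline, as it tends to grow at the prompt.
two_stage() { grep 'wanted' | awk '{ print $3 }'; }

# The same selection and projection in one awk process.
one_stage() { awk '/wanted/ { print $3 }'; }

sample='wanted two three
other  x   y
wanted a   b'

printf '%s\n' "$sample" | two_stage
printf '%s\n' "$sample" | one_stage
```

Both calls print the third field of the two matching lines.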

But full ACK on passing information from the first instance downstream,
at which point I tend to start using Python. But up to then pipelining
"just flows". That's what they were designed for. (-:

Axel

P. S.: I will keep your advice in memory, though, to avoid my worst
excesses. Point taken.

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1062&group=comp.lang.awk#1062
X-Received: by 2002:a05:620a:a47:: with SMTP id j7mr5562132qka.146.1644555247544;
Thu, 10 Feb 2022 20:54:07 -0800 (PST)
X-Received: by 2002:a25:d246:: with SMTP id j67mr10626079ybg.641.1644555247370;
Thu, 10 Feb 2022 20:54:07 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Thu, 10 Feb 2022 20:54:07 -0800 (PST)
In-Reply-To: <20220209235134.861@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2603:7000:3c3d:41c0:7564:4cef:c36b:d23d;
posting-account=n74spgoAAAAZZyBGGjbj9G0N4Q659lEi
NNTP-Posting-Host: 2603:7000:3c3d:41c0:7564:4cef:c36b:d23d
References: <st6udg$k03$1@dont-email.me> <88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com> <8735ksqy1k.fsf@axel-reichert.de>
<su0n16$od0$1@dont-email.me> <87leyjn42c.fsf@bsb.me.uk> <20220209141205.801@kylheku.com>
<87pmnvleaz.fsf@bsb.me.uk> <20220209235134.861@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <496994e3-d2bd-40bb-aac2-e7df1f7c14fdn@googlegroups.com>
Subject: Re: The Art of Unix Programming - Case Study: awk
From: jason.cy.kwan@gmail.com (Kpop 2GM)
Injection-Date: Fri, 11 Feb 2022 04:54:07 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 6
 by: Kpop 2GM - Fri, 11 Feb 2022 04:54 UTC

one-liner solution to that wanted-three question :

echo 'wanted two three,a,b,c' \
\
| [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'

three:a:b:c
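
Read step by step, the one-liner appears to do three things: replace the
record with its third field, turn the commas into colons, and print. A
more explicit sketch of that reading, assuming any POSIX awk (the wrapper
name unpacked is made up here):

```shell
# Explicit form: the action spells out what the condition-only
# version does through side effects inside the pattern.
unpacked() {
    awk '
    /^wanted/ {
        $0 = $3            # keep only the third whitespace-separated field
        gsub(",", ":")     # rewrite the inner separators in $0
        print
    }'
}

echo 'wanted two three,a,b,c' | unpacked
```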

Re: The Art of Unix Programming - Case Study: awk

https://www.rocksolidbbs.com/devel/article-flat.php?id=1063&group=comp.lang.awk#1063
X-Received: by 2002:a05:620a:430d:: with SMTP id u13mr5511763qko.286.1644556952937;
Thu, 10 Feb 2022 21:22:32 -0800 (PST)
X-Received: by 2002:a25:e6d4:: with SMTP id d203mr9903233ybh.626.1644556952704;
Thu, 10 Feb 2022 21:22:32 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Thu, 10 Feb 2022 21:22:32 -0800 (PST)
In-Reply-To: <20220209235134.861@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2603:7000:3c3d:41c0:7564:4cef:c36b:d23d;
posting-account=n74spgoAAAAZZyBGGjbj9G0N4Q659lEi
NNTP-Posting-Host: 2603:7000:3c3d:41c0:7564:4cef:c36b:d23d
References: <st6udg$k03$1@dont-email.me> <88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com> <8735ksqy1k.fsf@axel-reichert.de>
<su0n16$od0$1@dont-email.me> <87leyjn42c.fsf@bsb.me.uk> <20220209141205.801@kylheku.com>
<87pmnvleaz.fsf@bsb.me.uk> <20220209235134.861@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ed05edaf-fd58-410a-a4ad-a2fbeb9469aen@googlegroups.com>
Subject: Re: The Art of Unix Programming - Case Study: awk
From: jason.cy.kwan@gmail.com (Kpop 2GM)
Injection-Date: Fri, 11 Feb 2022 05:22:32 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 60
 by: Kpop 2GM - Fri, 11 Feb 2022 05:22 UTC

On Thursday, February 10, 2022 at 2:59:45 AM UTC-5, Kaz Kylheku wrote:
> On 2022-02-10, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> > Kaz Kylheku <480-99...@kylheku.com> writes:
> >
> >> On 2022-02-09, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> >>> Janis Papanagnou <janis_pa...@hotmail.com> writes:
> >>>
> >>>> On 09.02.2022 08:49, Axel Reichert wrote:
> >>>> [ about an ASCII data mangling Python script ]
> >>>>> [....] I started immediately, cobbled something together (awk featured
> >>>>> prominently among other usual suspects, such as tr, sed, cut, grep).
> >>>>
> >>>> Hmm.. - these four tools are amongst those where I usually say; instead
> >>>> of connecting and running a lot of such processes use just one instance
> >>>> of awk. The functions expressed in those tools are - modulo a few edge
> >>>> cases - basics in Awk and part of its core.
> >>>
> >>> That sometimes works, but the trouble is that once you've used AWK's
> >>> pattern/action feature once, you can't do so again -- you are stuck
> >>> inside the action part. Just the other day I needed to split fields
> >>> within a field after finding the lines I wanted. This was, for me, an
> >>> obvious case for two processes:
> >>>
> >>> awk -F: '/wanted/ { print $3 }' | awk -F, '...'
> >>
> >> You can split $3 into fields by assigning its value to $0, after
> >> tweaking FS for the inner field separator:
> >>
> >> $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
> >> wanted two three,a,b,c <- input
> >> three:a:b:c <- output
> >
> > Sure, but you don't get to use pattern/action pairs on the result.
> But that's largely just syntactic sugar for a glorified case statement.
>
> Instead of
>
> /abc/ { ... }
> $2 > $3 { ... }
>
> you have to write
>
> if (/abc/) { ... }
> if ($2 > $3) { ... }
>
> kind of thing.

two different one-liner solutions i managed to conjure up, neither of which requires dealing with patterns, or arrays, or patsplit, but both involve assigning back to $0

command 1 is

[ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]

three:a:b:c

command 2 is

[ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]

three:a:b:c
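
Spelled out without the golfing tricks, both commands amount to this: on lines matching /wanted/, replace the record with its last whitespace-separated field, then turn the commas into colons. A minimal sketch, assuming any POSIX awk; the function name wanted_last_field is made up for illustration and is not from the post:

```shell
# Hypothetical helper: a readable equivalent of the two one-liners above.
wanted_last_field() {
    awk '/wanted/ {
        $0 = $NF          # keep only the last field; awk re-splits the record
        gsub(/,/, ":")    # replace every comma in the new record
        print
    }'
}

echo "wanted two three,a,b,c" | wanted_last_field   # prints: three:a:b:c
```

Unlike the FS="," variant quoted earlier, this leaves FS alone, so later input lines are still split on whitespace.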

Re: The Art of Unix Programming - Case Study: awk

<20220210233843.626@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1064&group=comp.lang.awk#1064

 by: Kaz Kylheku - Fri, 11 Feb 2022 07:43 UTC

On 2022-02-11, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
> one-liner solution to that wanted-three question :
>
> echo 'wanted two three,a,b,c' \
> \
> | [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'
>
> three:a:b:c

Are you positively sure that you're taking my example literally enough?

Try this:

sed -e 's/wanted two //' -e 's/,/:/g'

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: The Art of Unix Programming - Case Study: awk

<87wni14xlb.fsf@axel-reichert.de>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1065&group=comp.lang.awk#1065

 by: Axel Reichert - Fri, 11 Feb 2022 08:27 UTC

Kpop 2GM <jason.cy.kwan@gmail.com> writes:

> command 1 is
>
> [ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]
>
> three:a:b:c
>
> command 2 is
>
> [ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]
>
> three:a:b:c

And both seem to me horrendously inelegant compared to

echo "wanted two three,a,b,c" | awk '{print $3}' | tr ',' ':'

But maybe I missed some detail from the original task and '/wanted/' has
to be added as awk pattern.

Best regards

Axel

Re: The Art of Unix Programming - Case Study: awk

<2eeee91f-1e86-446a-825c-0a49d22b71b4n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1066&group=comp.lang.awk#1066

 by: Kpop 2GM - Fri, 11 Feb 2022 08:35 UTC

On Friday, February 11, 2022 at 2:43:21 AM UTC-5, Kaz Kylheku wrote:
> On 2022-02-11, Kpop 2GM <jason....@gmail.com> wrote:
> > one-liner solution to that wanted-three question :
> >
> > echo 'wanted two three,a,b,c' \
> > \
> > | [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'
> >
> > three:a:b:c
>
> Are you positively sure that you're taking my example literally enough?
>
> Try this:
>
> sed -e 's/wanted two //' -e 's/,/:/g'

i think yours doesn't enforce the filter criterion but simply cleans up every line, which could be fixed as such :

echo $'wanted two threeA,a,b,c\nhi wanted two threeB,a,b,c' \
| [mg]awk 'sub("^wanted.+ ","")*gsub(",",":")'
threeA:a:b:c

echo $'wanted two threeA,a,b,c\nhi wanted two threeB,a,b,c' | sed -e 's/wanted two //' -e 's/,/:/g'
threeA:a:b:c
hi threeB:a:b:c
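
The difference comes down to sed applying its substitutions to every input line and printing all of them. If the filter is wanted in sed as well, one option is to suppress the default output with -n and print only lines selected by an address. A sketch, assuming a POSIX sed; anchoring on ^wanted mirrors the awk command, and the function name filter_wanted is invented here:

```shell
# Hypothetical helper: sed variant that filters as well as rewrites.
filter_wanted() {
    sed -n '/^wanted two /{
        s/^wanted two //
        s/,/:/g
        p
    }'
}

printf 'wanted two threeA,a,b,c\nhi wanted two threeB,a,b,c\n' | filter_wanted
# prints: threeA:a:b:c
```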

Re: The Art of Unix Programming - Case Study: awk

<fa5929ef-ddf3-4f61-84d0-3add0a96bd84n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1067&group=comp.lang.awk#1067

 by: Kpop 2GM - Fri, 11 Feb 2022 08:39 UTC

>> echo "wanted two three,a,b,c" | awk '{print $3}' | tr ',' ':'

if elegance is a concern then no pattern and no print statement is more elegant, at least to me

echo "wanted two three,a,b,c" | gawk '($_=$NF)~_' | tr ',' ':'
three:a:b:c

Re: The Art of Unix Programming - Case Study: awk

<0d611d8b-8001-42b0-8d70-5f915e182c05n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1068&group=comp.lang.awk#1068

 by: Kpop 2GM - Fri, 11 Feb 2022 09:05 UTC

On Thursday, February 10, 2022 at 12:33:11 PM UTC-5, Axel Reichert wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>
> > I understand the impulse to develop commands that way; that usually
> > leads to such horrible and inflexible cascades of the tools mentioned
> > above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).
> >
> > And as soon as you need yet more information from the first instance
> > this approach needs more workarounds, e.g. passing state information
> > through the OS level.
> >
> > Of course there's many ways to skin a cat. I just advocate to think
> > about one-process solutions before following the reflex to construct
> > inflexible pipeline constructs.
> It seems that like Ben I am a pipeliner, the igniting spark probably
> "Opening the software toolbox":
>
> https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html
>
> I know that a lot can be done within awk, but it often does not seem to
> meet my way of thinking. For example, I might start with a grep. To my
> surprise it finds many matches, so further processing is called for, say
> awk '{print $3}' or similar. At that point, I will NOT replace the grep
> with awk '/.../', because it is easier to just add another pipeline
> after fetching the command from history using the up arrow. And so on,
> adding pipeline after pipeline (which I also can easily relate to
> functional programming). Once the whole dataflow is ready, I will
> usually not "refactor" the beast, only in glaringly obvious
> cases/optimizations. I might even have started with a (in hindsight)
> Useless Use Of Cat. On the more ambitious side, I well remember how
> proud I was when plumbing several xargs into a pipeline:
>
> foo | bar | xargs -i baz {} 333 | quux | xargs fubar
>
> By now this is a common idiom for me on the command line.
>

speaking of specialized tools for piping at the command line, how many open-source utilities are you aware of that can decode unsigned hex to arbitrary precision, piped in from /dev/stdin, with nothing more than :

gawk -nM '$!_=+$_'

or this variant, pre-negate it for you before outputting : gawk -nM '$!_=-$_'
or this variant, pre-double it for you before outputting : gawk -nM '$!_-=-$_'

or this best one yet :

gawk -nM '$(($!_-=-$_)~_)++'

decoding unsigned hex to arbitrary precision, and returning 2 n + 1 of that input. maybe it's even cleaner in perl i dunno
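
The -n (--non-decimal-data) and -M (--bignum) options are specific to gawk, and -M needs a gawk built with MPFR/GMP support. For comparison, here is a portable sketch of the hex decoding step alone in POSIX awk. It is not arbitrary precision: values are held in doubles, so results are exact only up to 2^53. The function name hex2dec is an illustrative choice, not something from the post.

```shell
# Hypothetical helper: decode one unsigned hex number per line (first field).
hex2dec() {
    awk 'NF {
        s = tolower($1)
        sub(/^0x/, "", s)                 # accept an optional 0x prefix
        n = 0
        for (i = 1; i <= length(s); i++) {
            # position in the digit string, minus one, is the digit value
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            if (d < 0) next               # skip lines that are not valid hex
            n = n * 16 + d
        }
        printf "%.0f\n", n
    }'
}

echo ff | hex2dec      # prints: 255
```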

Syntactic Sugar (Was: The Art of Unix Programming - Case Study: awk)

<su5q43$b08p$1@news.xmission.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1069&group=comp.lang.awk#1069

 by: Kenny McCormack - Fri, 11 Feb 2022 13:59 UTC

In article <20220209235134.861@kylheku.com>,
Kaz Kylheku <480-992-1380@kylheku.com> wrote:
....
>> Sure, but you don't get to use pattern/action pairs on the result.
>
>But that's largely just syntactic sugar for a glorified case statement.
>
>Instead of
>
> /abc/ { ... }
> $2 > $3 { ... }
>
>you have to write
>
> if (/abc/) { ... }
> if ($2 > $3) { ... }
>
>kind of thing.

Of course, it can be (and often has been) argued that everything in any
programming language is just "syntactic sugar" for the underlying machine
code. Everything is really just for programmer convenience.

Personally, I tend to agree with Ben here, that not being able to use the
"automatic input loop" (aka, the "pattern/action" facility for which AWK is
justly famous) on non-traditional input is a Bad Thing. It would be nice
if this were not the case.

In fact, since I like to write my AWK programs as "true AWK programs"
rather than as shell scripts - i.e., my scripts start with:

#!/path/to/gawk -f

rather than:

#!/bin/bash
...
something | gawk ...
...

And I want to be able to use the pattern/action facility on piped-in input
(and also output - see below), I have written an extension library for GAWK
that enables that. That is, I can do:

@load "pipeline"
BEGIN { pipeline("in","Some Shell Command Here") }
/foo/ { bar...}

and it reads from the pipeline as if it had been piped-in from the shell.

I also find it annoying to have to put "| cmd" at the end of every command
that generates output - when I want the output to go to a pipe (e.g.,
"less"). So, I can do:

@load "pipeline"
BEGIN { pipeline("out","less") }
{ print }

With the expected result.
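
The pipeline extension above is Kenny's own library, and its interface is known here only from these two snippets. For comparison, what stock awk offers is reading a command's output with `cmd | getline`. That works, but the loop and the tests have to be written by hand instead of as pattern/action rules. A minimal sketch in POSIX awk, with a stand-in shell command and an invented function name:

```shell
# Hypothetical demo: read a command from inside awk, without the extension.
read_cmd_in_awk() {
    awk 'BEGIN {
        cmd = "echo foo 1; echo bar 2; echo foo 3"   # stand-in for a real command
        while ((cmd | getline line) > 0) {
            # an explicit if replaces what would be a /foo/ { ... } rule
            if (line ~ /foo/) print "matched:", line
        }
        close(cmd)
    }'
}

read_cmd_in_awk
# prints:
# matched: foo 1
# matched: foo 3
```

Fields of each line are not split automatically in this form; assigning the line to $0 or calling split() would be needed for that.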

--
"He is exactly as they taught in KGB school: an egoist, a liar, but talented - he
knows the mind of the wrestling-loving, under-educated, authoritarian-admiring
white male populous."
- Malcolm Nance, p59. -

Re: Syntactic Sugar (Was: The Art of Unix Programming - Case Study: awk)

<20220211083704.537@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1070&group=comp.lang.awk#1070

 by: Kaz Kylheku - Fri, 11 Feb 2022 17:38 UTC

On 2022-02-11, Kenny McCormack <gazelle@shell.xmission.com> wrote:
> In article <20220209235134.861@kylheku.com>,
> Kaz Kylheku <480-992-1380@kylheku.com> wrote:
> ...
>>> Sure, but you don't get to use pattern/action pairs on the result.
>>
>>But that's largely just syntactic sugar for a glorified case statement.
>>
>>Instead of
>>
>> /abc/ { ... }
>> $2 > $3 { ... }
>>
>>you have to write
>>
>> if (/abc/) { ... }
>> if ($2 > $3) { ... }
>>
>>kind of thing.
>
> Of course, it can be (and often has been) argued that everything in any
> programming language is just "syntactic sugar"

Not by me.

> for the underlying machine
> code.

When I say "syntactic sugar", I'm referring to a light transformation to
improve the taste. This is justified for the above example because the
"machine code" differs from the "HLL" counterpart only in that
"if (...)" has been wrapped around the test expressions.

I have a lot of experience implementing complicated language features
as code transformations, which I wouldn't call syntactic sugar.

> Personally, I tend to agree with Ben here, that not being able to use the
> "automatic input loop" (aka, the "pattern/action" facility for which AWK is

Ben didn't need a loop in the specific situation because he just wanted
to take a field and further split it as if it were a record.

You'd benefit from a loop if you wanted to take a string and treat it
as an entire file, separated into records and then fields.

I agree that having to implement that loop around your conditional
statements would start to get more than a little inconvenient.

I made a Lisp version of Awk as a macro. You can use it anywhere you can
use an expression, and cleanly nest it with itself.

E.g. we can use it to make a function which returns a list of objects:

This is the TXR Lisp interactive listener of TXR 273.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
This area is under 24 hour TTY surveillance.
1> (defun wrapped-awk (strings)
     (build ;; establish scope for procedural list construction
            ;; built-up list is returned when scope terminates
       (awk (:inputs strings)
            (:set fs ":") ;; field separator is colon (:)
            (t (fconv - i i)) ;; convert second and third fields to integer
            (#/plus/ (add (+ [f 1] [f 2])))
            (#/minus/ (add (- [f 1] [f 2]))))))
wrapped-awk

Now we just pass a list of strings into this function: they become
records:

2> (wrapped-awk '("plus:1:2" "plus:3:4" "minus:5:5"))
(3 7 0)

The list of sums and differences is returned.

Now, let's use the wrapped-awk function inside another awk loop;
this time we will feed it input from the TTY interactively:

3> (awk (t (prn (wrapped-awk f))))
plus:1:2 plus:3:4 minus:5:5
3 7 0
minus:15:20
-5

"If the 't' condition is true, which is always, then print the list
obtained by passing the delimited fields f into wrapped-awk."
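
For comparison, roughly the same nesting in plain awk has to be written by hand, which is the inconvenience conceded above. The outer input loop is automatic, but the inner pass over each field needs an explicit loop, split(), and if tests. A sketch assuming POSIX awk; nested_sums is an invented name:

```shell
# Hypothetical helper: each whitespace-separated field is itself a
# colon-separated "record" of the form plus:A:B or minus:A:B.
nested_sums() {
    awk '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            split($i, f, ":")             # inner split replaces a nested awk
            if ($i ~ /plus/)  { out = out sep (f[2] + f[3]); sep = " " }
            if ($i ~ /minus/) { out = out sep (f[2] - f[3]); sep = " " }
        }
        print out
    }'
}

echo "plus:1:2 plus:3:4 minus:5:5" | nested_sums   # prints: 3 7 0
```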

> And I want to be able to use the pattern/action facility on piped-in input
> (and also output - see below), I have written an extension library for GAWK
> that enables that. That is, I can do:

> @load "pipeline"
> BEGIN { pipeline("in","Some Shell Command Here") }
> /foo/ { bar...}

Let's pipe the above into a character string.

4> (with-out-string-stream (*stdout*)
     (awk (t (prn (wrapped-awk f)))))
plus:10:20 minus:13:7
[Ctrl-D][Enter]
"30 6\n"

with-out-string-stream creates a scope in which it binds a variable
to a string stream. Everything sent to that stream is appended to
a string, and that string is returned when with-out-string-stream
terminates.

If we choose *stdout* for the variable name, then standard output
is temporarily bound to this string stream. Everything that would go
to standard output goes into the string stream.

Already in the 1980s, people were able to use FFI to define bindings
to foreign libraries, and not have to write any glue code that had to be
compiled.

Here is my "extension lib" for calling size_t wcslen(const wchar_t *str):

5> (with-dyn-lib nil
     (deffi wcslen "wcslen" size-t (wstr)))
wcslen
6> (wcslen "abcd")
4

How about the structure-returning function

lldiv_t lldiv(long long numerator, long long denominator);

Define the foreign structure:

6> (deffi-struct lldiv
     (quot longlong)
     (rem longlong))
#<ffi-type (struct lldiv (quot longlong) (rem longlong))>

Define the foreign function binding:

7> (with-dyn-lib nil
     (deffi lldiv "lldiv" lldiv (longlong longlong)))
lldiv

Test:

8> (lldiv 127 15)
#S(lldiv quot 8 rem 7)

This shit is unsafe, since we are calling C:

9> (lldiv 127 0)
Floating point exception (core dumped)

We could catch that signal. Let's try it again:

1> (deffi-struct lldiv (quot longlong) (rem longlong))
#<ffi-type (struct lldiv (quot longlong) (rem longlong))>
2> (with-dyn-lib nil
     (deffi lldiv "lldiv" lldiv (longlong longlong)))
lldiv

Now install signal handler for SIGFPE, binding it to a
lambda function which throws the yikes symbol as an exception:

3> (set-sig-handler sig-fpe (lambda (signal async-p) (throw 'yikes)))
t

Now:

4> (lldiv 127 15)
#S(lldiv quot 8 rem 7)
5> (lldiv 127 0)
** yikes exception, args: nil
6>

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: The Art of Unix Programming - Case Study: awk

<20220211093859.328@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1071&group=comp.lang.awk#1071

 by: Kaz Kylheku - Fri, 11 Feb 2022 17:40 UTC

On 2022-02-11, Axel Reichert <mail@axel-reichert.de> wrote:
> Kpop 2GM <jason.cy.kwan@gmail.com> writes:
>
>> command 1 is
>>
>> [ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]
>>
>> three:a:b:c
>>
>> command 2 is
>>
>> [ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]
>>
>> three:a:b:c
>
> And both seem to me horrendously inelegant compared to

How nice of you to provide company; now Kpop 2GM doesn't have to feel
he's the only one missing the point.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: Syntactic Sugar

<87y22hjib8.fsf@bsb.me.uk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1072&group=comp.lang.awk#1072

 by: Ben Bacarisse - Fri, 11 Feb 2022 19:48 UTC

Kaz Kylheku <480-992-1380@kylheku.com> writes:

> Ben didn't need a loop in the specific situation because he just wanted
> to take a field and further split it as if it were a record.

I see this sketch of mine now has a life of its own. I wrote '...' as
the second AWK source, so how anyone can know that I did not need a loop
I can't imagine.

The first AWK picked out lines of interest (they matched dates and AWK
would allow me to use date functions in the picking out though I had, on
this occasion, no need for that). The second AWK matched logged 'add'
and 'remove' actions that the second command counted -- anything added
more than it was removed remained for the END action to print.

Of course this could be one AWK command. What I don't get is why such a
simple use should raise so many people's hackles. It took about 30
seconds to write, and it did the job.
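
As a rough illustration of the single-command variant, assume an invented log format (the post does not show the real one): colon-separated lines whose first field is a date and whose third field is "action,item". The date pattern, the field layout, and the name net_added are all hypothetical.

```shell
# Hypothetical sketch: select lines by date, then count add/remove per item.
net_added() {
    awk -F: '$1 ~ /^2022-02/ {
        split($3, part, ",")              # part[1] = action, part[2] = item
        if (part[1] == "add")    count[part[2]]++
        if (part[1] == "remove") count[part[2]]--
    }
    END {
        # anything added more often than removed is still present
        for (item in count)
            if (count[item] > 0) print item
    }'
}

printf '%s\n' '2022-02-01:x:add,foo' '2022-02-02:x:add,bar' \
              '2022-02-03:x:remove,bar' '2021-12-31:x:add,old' | net_added
# prints: foo
```

The order of a for-in loop over an awk array is unspecified, so with several surviving items the output may need a sort.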

--
Ben.

Re: The Art of Unix Programming - Case Study: awk

<su6gs3$mru$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1073&group=comp.lang.awk#1073

 by: Janis Papanagnou - Fri, 11 Feb 2022 20:27 UTC

It just occurred to me that we are discussing basically Unix/shell
issues at the moment, but since there's a strong relation to awk I
abstain from marking it [OT].

On 10.02.2022 18:33, Axel Reichert wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
>> I understand the impulse to develop commands that way; that usually
>> leads to such horrible and inflexible cascades of the tools mentioned
>> above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).
>>
>> [...]
>
> It seems that like Ben I am a pipeliner, the igniting spark probably
> "Opening the software toolbox":
>
> https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html
>
> I know that a lot can be done within awk, but it often does not seem to
> meet my way of thinking. For example, I might start with a grep. To my
> surprise it finds many matches, so further processing is called for, say
> awk '{print $3}' or similar. At that point, I will NOT replace the grep
> with awk '/.../', because it is easier to just add another pipeline
> after fetching the command from history using the up arrow. And so on,
> adding pipeline after pipeline (which I also can easily relate to
> functional programming). Once the whole dataflow is ready, I will
> usually not "refactor" the beast, only in glaringly obvious
> cases/optimizations. I might even have started with a (in hindsight)
> Useless Use Of Cat. On the more ambitious side, I well remember how
> proud I was when plumbing several xargs into a pipeline:
>
> foo | bar | xargs -i baz {} 333 | quux | xargs fubar
>
> By now this is a common idiom for me on the command line.
>
> But full ACK on passing information from the first instance downstream,
> at which point I tend to start using Python. But up to then pipelining
> "just flows". That's what they were designed for. (-:
>
> Axel
>
> P. S.: I will keep your advice in memory, though, to avoid my worst
> excesses. Point taken.
>

Don't get me wrong. Pipelines of tools are not "bad" and I also wrote:

"That's how we learned it; pipelining through simple dedicated tools.
I also still do that. [...]"

Some use cases or programming patterns are also often not that
easily implemented in "monolithic" languages (including awk).
Your double xargs programming pattern is certainly rare, but I have
also used it at times, and it didn't occur to me to try to re-implement
it in awk just for the sake of praising the language or something like
that. The same goes for two instances of awk; it may make sense,
there's no dogma.

It's not a black-or-white issue. And personal preferences vary anyway.
But sometimes the way we're used to using a tool chest gets in the way
of "finding" (actually just writing down) better solutions.

What I was addressing is the use of programs with primitive functions
that awk is providing in a simple and consistent way inherently! My
impression is that Unix folks today learn command-line patterns about
the same way I did decades ago; starting from cat, cut, grep, sed (and
so on), often not even reaching awk - because it's more complex than
the simpler tools dedicated to a task. The combined dedicated Unix tools
are not simpler, though; rather the opposite. Recognizing that made me
change the way I start my tasks; for simple searches it wouldn't
occur to me to use awk, but as soon as a search task gets slightly
more complex, say searching for lines with two matches, /A/&&/B/, I'd
use awk. Or if it is clear from the beginning that the task will be
harder and more clumsy to implement with pipes, e.g. extracting keys
from a file to match records in another file; then I don't even think
about how that (maybe) could be implemented by function compositions
with primitive Unix programs, I'd immediately take awk if appropriate.
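
For illustration, a minimal sketch of the "keys from one file, records from another" case, using the common NR==FNR two-file idiom. The file names and sample data are hypothetical and not from the original post:

```sh
# Hypothetical sample data (not from the thread).
tmp=$(mktemp -d)
printf '%s\n' 'alice' 'carol' > "$tmp/keys.txt"
printf '%s\n' 'alice 10' 'bob 20' 'carol 30' > "$tmp/records.txt"

# While awk reads the first file (NR == FNR), remember each key.
# For the second file, print the records whose first field is a known key.
matched=$(awk 'NR == FNR { want[$1]; next } $1 in want' \
    "$tmp/keys.txt" "$tmp/records.txt")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

The idiom assumes the key file is not empty; with an empty first file, NR == FNR would also hold for the second file.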

There are immediate gains, but also gains that are not obvious at
first. You may observe that, say, /A/&&/B/ returns too many
results; it's no problem to restrict the result set with further
qualifications like, say, $1~/A/ && $1~/B/, adding an FS definition,
and whatnot. Of course grep A | grep B isn't complex (less terse,
okay, but still not complex), but you're often in a dead end if your
demands get extended (e.g. by matching fields instead of lines). For
a one-shot ad hoc task the greps are okay, and I see that I sometimes
start with a command, then add another piped command to the previous
one (from shell history), and then a third one. But as soon as this
goes into a shell script these commands are getting optimized. Lately
I don't even start with pipes if I think the command might get bulky.
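
As a small sketch of that extension step: the same two-pattern test, first on whole lines and then restricted to the first field. The colon-separated sample data is an assumption made up for this example:

```sh
# Hypothetical colon-separated records (not from the thread).
data='A:xB
AB:y
B:A'

# Both patterns anywhere on the line.
both_on_line=$(printf '%s\n' "$data" | awk '/A/ && /B/')

# Both patterns in the first field only, with ":" as field separator.
both_in_field1=$(printf '%s\n' "$data" | awk -F: '$1 ~ /A/ && $1 ~ /B/')

printf 'line match:\n%s\nfield match:\n%s\n' "$both_on_line" "$both_in_field1"
```

The line-based test keeps all three records, while the field-based one keeps only AB:y. Getting the same restriction out of grep A | grep B would need a rewritten regular expression rather than one added condition.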

Note that here I am not advocating for smart awk code patterns as we
occasionally suggest here for fun, to keep things minimalistically terse.
It's just the mundane inherent features I am talking about. Features
that are consistently available with awk. And that's different from the
Unix tool set where I have a hammer, a nail, and a screwdriver, and
think about how to combine them to change the light bulb. - Yes, I'm
exaggerating (at least a bit), but you get the point.

Folks learn by example, and we often see (in shell newsgroups or in
books) bad solutions based on published code that follows the historic
approach, copy/pasted without thinking, and reused as a code pattern in
their future projects, posts, lectures, and own books.

It got a bit pathetic.

Janis

Re: The Art of Unix Programming - Case Study: awk

<su6im0$36h$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1074&group=comp.lang.awk#1074

From: janis_papanagnou@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Fri, 11 Feb 2022 21:58:40 +0100
Organization: A noiseless patient Spider
Lines: 104
Message-ID: <su6im0$36h$1@dont-email.me>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87iltnlcxe.fsf@bsb.me.uk>
 by: Janis Papanagnou - Fri, 11 Feb 2022 20:58 UTC

On 10.02.2022 02:37, Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
>> You can always simply split() the fields, no need to invoke another
>> process just for another implicit loop that awk supports.
>
> Yes, there's no need, but why worry about it? Maybe I am alone in
> thinking processes are cheap.

I just try to avoid unnecessary processes. A dozen is not an issue,
but once you've embedded them in shell loops it might become an issue.

>
> But more to the point, a pipeline is an elegant, easily understood, and
> often natural way to organise a task.

Agreed.

>> [...]
>
> A pipeline is not the right structure for such tasks, but there are a
> huge number of tasks where combining Unix tools is the simplest
> solution.

Agreed.

>>
>> The nice thing about awk - actually already mentioned in context of
>> the features/complexity vs. power comments - is that you don't need
>> to memorize a lot;[*] I think awk is terse and compact enough. YMMV.
>
> But since I use pipelines so much, I rarely use split, patsplit, gsub or
> gensub. I find myself checking their arguments pretty much every time I
> use them.

(Some of the mentioned functions are non-standard, GNU Awk'ish.)

Well, if you don't use them regularly you'll have to look up the docs.
Personally I think the [standard] functions are easy to remember, but
okay. I can easily remember them, if only by thinking about their
optional arguments, which come last: split (what, where [, by-what]);
omitting the "by-what" will use the current FS, and "what", "where"
is, I think, the natural order I'd expect. Similarly with gsub:
gsub (what, by-what [, where]); omitting the "where" will operate on
the whole record ($0), and "what", "by-what" is, I think, the natural order.
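
A short sketch of those trailing optional arguments, using only the standard split() and gsub(). The input strings are made up for the example:

```sh
out=$(printf 'alpha beta gamma\n' | awk '{
    n = split($0, parts)            # no third argument: split on FS
    m = split("a:b:c", f, ":")      # explicit separator
    gsub(/a/, "A")                  # no third argument: operates on $0
    s = "banana"
    gsub(/an/, "AN", s)             # explicit target variable
    print n, m, f[2], $0, s
}')
printf '%s\n' "$out"
```

Both calls follow the same shape: the thing to work on and the pattern come first, and the argument with a sensible default comes last.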

The non-standard GNU functions patsplit() and gensub() I also have to
look up, but I think just because I rarely have a need to use these
functions.

>>
>> That's how we learned it; pipelining through simple dedicated tools.
>> I also still do that.
>
> Why? Serious question. It sounds like a dreadful risk based on your
> comments above. Doing so "usually leads to such horrible and inflexible
> cascades of the tools" when there is no need "to invoke another
> process". What makes you sometimes take the risk of horrible cascades
> and pay the price of another process?

I think this is answered in my previous post, my reply to Axel's post.

It's certainly not something I'd call a risk, because I can control it,
I can make the decisions, based on application case, requirements, and
expertise.

>
> I ask because it's possible we disagree only on how frequently it should
> be done, and about exactly what circumstances warrant it.

I think we should not treat the topic too dogmatically or too strictly. To
quote from my other post:

>> What I was addressing is the use of programs with primitive functions
>> that awk is providing in a simple and consistent way inherently!

>
>> the whole pipeline gets then refactored, typically for efficiency,
>> flexibility, robustness, and clarity in design.
>
> That's where I disagree. I often choose a pipeline because it is the
> most robust, flexible and clear design. (I rarely care about efficiency
> when doing this sort of thing.)

We do not disagree concerning the clarity of the pipe concept. It is
just very _primitive_ (an advantage, and a restriction WRT flexibility).

>
>> I want to close my comment with another aspect; the primitive helper
>> tools are often restricted and incoherent.[*] In GNU context you have
>> additional options that I'm glad to be able to use, but if you want to
>> stay standard conforming the tools might not "suffice" or usage gets
>> more bulky. With awk the standard version supports already the powerful
>> core.
>
> I agree. That's a shame, but an inevitable cost of piecemeal historical
> development.

My summary is that we "mostly"[*] agree. :-)

Janis

[*] Term borrowed from the HHGTTG.

Re: The Art of Unix Programming - Case Study: awk

<87r1893wng.fsf@axel-reichert.de>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1075&group=comp.lang.awk#1075

From: mail@axel-reichert.de (Axel Reichert)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Fri, 11 Feb 2022 22:45:23 +0100
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <87r1893wng.fsf@axel-reichert.de>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87k0e2hbjf.fsf@axel-reichert.de> <su6gs3$mru$1@dont-email.me>
 by: Axel Reichert - Fri, 11 Feb 2022 21:45 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 10.02.2022 18:33, Axel Reichert wrote:
>>
>> P. S.: I will keep your advice in memory, though, to avoid my worst
>> excesses. Point taken.
>>
>
> Don't get me wrong. Pipelines of tools are not "bad" and I also wrote:
>
> "That's how we learned it; pipelining through simple dedicated tools.
> I also still do that. [...]"

Yes, sure, I noticed that. I do think that it's mostly about stylistic
matters at this point of the discussion. I like to compare
shell/Unix/awk/CLI issues with a language: A former boss was impressed
by what was feasible with these strange "words", which to him sounded
like Greek. He wanted me to save these utterances for others to benefit
from my "words of wisdom". I argued that they were not a "quote" to be
put into some anthology, but a spontaneous sentence formed during "live
talk". The point for me was not "memorizing" them (shell script), but
being able to speak.

Of course this analogy is valid only for ad hoc stuff, but that is how I
use them in almost all cases: These are throw-away command lines and
only very rarely do I see the potential for them to be re-used. It is for
those instances that I will try to remember your words of
wisdom/warning. (-:

> Your double xargs programming pattern is certainly rare

Not for me, my guess is that I use it daily.

> as soon as a search task gets slightly more complex, say searching for
> lines with two matches, /A/&&/B/, I'd use awk.

grep A ... | grep B

or, often

grep -E '(A|B)' ...

Again, due to the ad-hoc nature of my shell usage, I will often
notice that two matches are needed only after the first grep command
has been executed: "Ah, I will need another match!"
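
For comparison, a small sketch of the three commands on made-up sample lines. Note that the two grep forms are not equivalent: the pipeline requires both patterns (like awk's /A/&&/B/), while the alternation matches lines containing either one:

```sh
# Hypothetical sample lines (not from the thread).
data='only A here
only B here
A and B here
neither'

and_pipe=$(printf '%s\n' "$data" | grep A | grep B)      # A and B
and_awk=$(printf '%s\n' "$data" | awk '/A/ && /B/')      # A and B
or_grep=$(printf '%s\n' "$data" | grep -E '(A|B)')       # A or B

printf 'AND (grep | grep):\n%s\n' "$and_pipe"
printf 'AND (awk):\n%s\n' "$and_awk"
printf 'OR (grep -E):\n%s\n' "$or_grep"
```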

> more clumsy to implement with pipes, e.g. extracting keys from a file
> to match records in another file; then I don't even think about how
> that (maybe) could be implemented by function compositions with
> primitive Unix programs

But you do know "join"? An often overlooked gem.

> a one-shot ad hoc task the greps are okay, and I see that I sometimes
> start with a command, then add another piped command to the previous
> one (from shell history), and then a third one. But as soon as this
> goes into a shell script these commands are getting optimized.

Yes. And it is at that point that I should try to reduce the number of
tools used. In fact I am a big fan of universal weapons.

Best regards

Axel

Re: The Art of Unix Programming - Case Study: awk

<87sfspj7xf.fsf@bsb.me.uk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1076&group=comp.lang.awk#1076

From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Fri, 11 Feb 2022 23:32:44 +0000
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <87sfspj7xf.fsf@bsb.me.uk>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87iltnlcxe.fsf@bsb.me.uk> <su6im0$36h$1@dont-email.me>
 by: Ben Bacarisse - Fri, 11 Feb 2022 23:32 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 10.02.2022 02:37, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>
>>> You can always simply split() the fields, no need to invoke another
>>> process just for another implicit loop that awk supports.
>>
>> Yes, there's no need, but why worry about it? Maybe I am alone in
>> thinking processes are cheap.
>
> I just try to avoid unnecessary processes. A dozen is not an issue,
> but once you've embedded them in shell loops it might become an issue.

That does not really advance the discussion because almost no processes
(other than one to perform the task) are actually necessary. It's clear
that you attach a different weighting to the various considerations, but
I don't really know more than that. For my part, I can't remember the
last time I even thought about how many processes were involved in a
pipeline.

>> But more to the point, a pipeline is an elegant, easily understood, and
>> often natural way to organise a task.
>
> Agreed.
>
>>> [...]
>>
>> A pipeline is not the right structure for such tasks, but there are a
>> huge number of tasks where combining Unix tools is the simplest
>> solution.
>
> Agreed.
>
>>>
>>> The nice thing about awk - actually already mentioned in context of
>>> the features/complexity vs. power comments - is that you don't need
>>> to memorize a lot;[*] I think awk is terse and compact enough. YMMV.
>>
>> But since I use pipelines so much, I rarely use split, patsplit, gsub or
>> gensub. I find myself checking their arguments pretty much every time I
>> use them.
>
> (Some of the mentioned functions are non-standard, GNU Awk'ish.)
>
> Well, if you don't use them regularly you'll have to look up the docs.
> Personally I think the [standard] functions are easy to remember, but
> okay. I can easily remember them, if only by thinking about their
> optional arguments, which come last: split (what, where [, by-what]);
> omitting the "by-what" will use the current FS, and "what", "where"
> is, I think, the natural order I'd expect. Similarly with gsub:
> gsub (what, by-what [, where]); omitting the "where" will operate on
> the whole record ($0), and "what", "by-what" is, I think, the natural order.
>
> The non-standard GNU functions patsplit() and gensub() I also have to
> look up, but I think just because I rarely have a need to use these
> functions.
>
>>>
>>> That's how we learned it; pipelining through simple dedicated tools.
>>> I also still do that.
>>
>> Why? Serious question. It sounds like a dreadful risk based on your
>> comments above. Doing so "usually leads to such horrible and inflexible
>> cascades of the tools" when there is no need "to invoke another
>> process". What makes you sometimes take the risk of horrible cascades
>> and pay the price of another process?
>
> I think this is answered in my previous post, my reply to Axel's post.
>
> It's certainly not something I'd call a risk, because I can control it,
> I can make the decisions, based on application case, requirements, and
> expertise.

That's an odd answer. Do you think I can't control the risk, or was the
advice about it "usually [leading] to such horrible and inflexible
cascades of the tools" aimed at some unstated group who are not as good
at controlling risk as you and I?

>> I ask because it's possible we disagree only on how frequently it should
>> be done, and about exactly what circumstances warrant it.
>
> I think we should not treat the topic too dogmatically or too strictly. To
> quote from my other post:
>
>>> What I was addressing is the use of programs with primitive functions
>>> that awk is providing in a simple and consistent way inherently!
>
>>
>>> the whole pipeline gets then refactored, typically for efficiency,
>>> flexibility, robustness, and clarity in design.
>>
>> That's where I disagree. I often choose a pipeline because it is the
>> most robust, flexible and clear design. (I rarely care about efficiency
>> when doing this sort of thing.)
>
> We do not disagree concerning the clarity of the pipe concept. It is
> just very _primitive_ (an advantage, and a restriction WRT
> flexibility).

But you also appear to consider it costly and risky. That's the
difference I think.

--
Ben.

Re: The Art of Unix Programming - Case Study: awk

<su8kif$fe6$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1077&group=comp.lang.awk#1077

From: janis_papanagnou@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Sat, 12 Feb 2022 16:43:10 +0100
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <su8kif$fe6$1@dont-email.me>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87k0e2hbjf.fsf@axel-reichert.de> <su6gs3$mru$1@dont-email.me>
<87r1893wng.fsf@axel-reichert.de>
 by: Janis Papanagnou - Sat, 12 Feb 2022 15:43 UTC

On 11.02.2022 22:45, Axel Reichert wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
> Yes, sure, I noticed that. I do think that it's mostly about stylistic
> matters at this point of the discussion. I like to compare
> shell/Unix/awk/CLI issues with a language: A former boss was impressed
> by what was feasible with these strange "words", which to him sounded
> like Greek. He wanted me to save these utterances for others to benefit
> from my "words of wisdom". I argued that they were not a "quote" to be
> put into some anthology, but a spontaneous sentence formed during "live
> talk". The point for me was not "memorizing" them (shell script), but
> being able to speak.
>
> Of course this analogy is valid only for ad hoc stuff, but that is how I
> use them in almost all cases: These are throw-away command lines and
> only very rarely do I see the potential for them to be re-used. [...]

I understand that analogy. And thinking about your and Ben's posts I
concluded that this must be the reason for your approach and view of
the topic. (But please correct me if I misinterpreted your reasons.)

I indeed have two types of tasks: ad hoc tools, and applications that
support regularly occurring tasks. And often the former evolve into the
latter! I already described the former in my previous posts, as the
(often incremental) development of tools: starting with a command and
appending other pipe-connected commands to refine the task, to make
the output more usable, and whatnot. On the other side there's
functionality that goes in the direction of software development; I'm
therefore also reluctant to call what I am creating there a "shell
script", I often call it a "shell program". How the code is written
and designed depends on which of the two types it is.

I'd like to add some anecdotes from Real Life (as you've done as
well), since they probably make it easier to understand what I wrote
and how I am thinking.

Quite some years ago I had a job interview, and since the company ran
on Unix/Linux they asked me to write code for a shell task; basically
log file analysis (as far as my faint memory serves). I heard the task
and quickly typed a simple awk 1-liner, because that appeared to be
the right tool for the given task, and because the requirements for
such tools or tool-applications typically get extended quickly and
grow ("implement this", "now add that", "and this function would help",
"we also forgot to consider that", etc.). The interviewer was
(positively!) astonished, and he told me that he had expected some
solution with 'cut', 'uniq', 'wc', and the like. So I also gave him
the piped variant (luckily even remembering uniq's option -c), but
also pointed out to him its inherent drawbacks and restrictions.
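
The actual interview task is not given in the post. As a purely hypothetical stand-in, the sketch below counts requests per status code in a made-up log format, once with a single awk program and once with the cut/sort/uniq -c style of pipeline:

```sh
# Hypothetical log lines: method, path, status (not from the thread).
log='GET /a 200
GET /b 404
GET /c 200'

# awk variant: one process does the field selection and the counting.
# The order of "for (s in count)" is unspecified, hence the final sort.
by_awk=$(printf '%s\n' "$log" |
    awk '{ count[$3]++ } END { for (s in count) print count[s], s }' | sort)

# Piped variant: cut selects the field, sort groups it, uniq -c counts.
# uniq -c pads the count with leading blanks, whose width varies.
by_pipe=$(printf '%s\n' "$log" | cut -d' ' -f3 | sort | uniq -c)

printf 'awk:\n%s\npipeline:\n%s\n' "$by_awk" "$by_pipe"
```

If the requirement grows, say to counting only GET requests, the awk variant needs one extra condition in front of the action, while the pipeline needs another stage.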

On another occasion we quickly needed an analysis of mobile phone
customer data from a production database. My boss sat next to me and
I told him I'd just get and produce this data right away; one minute
and an awk 1-liner later we had that information. His expectation
was that we'd have needed some high-level programming environment,
and so he was quite fascinated. With piped commands that wouldn't
have been possible to achieve, and with a high-level language it
would have required a lot more time.

It all depends on the specific task and on where the application is
likely to evolve (and of course also on personal preference and
specific experience as to what makes more or less sense).

>
>> as soon as a search task gets slightly more complex, say searching for
>> lines with two matches, /A/&&/B/, I'd use awk.
>
> grep A ... | grep B

I already anticipated that pattern in my previous post, the one you
are replying to here (but didn't quote), so I refer to my original text.

>
>> more clumsy to implement with pipes, e.g. extracting keys from a file
>> to match records in another file; then I don't even think about how
>> that (maybe) could be implemented by function compositions with
>> primitive Unix programs
>
> But you do know "join"? An often overlooked gem.

I know the 'join' command but don't see what that has to do with what I
wrote here. By "function composition" I meant that programs represent
functions; tool x does f, tool y does g, and combining tool x and y by,
say, x|y does g o f, where o is the function connector, a composition
of functionality (and code).
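
A tiny sketch of that reading of a pipe, with sort standing in for f and head -n 1 for g. The choice of tools and the sample input are assumptions made for this example only:

```sh
data='c
a
b'

# x | y applies x first, then y: here g(f(input)), f = sort, g = head -n 1.
g_after_f=$(printf '%s\n' "$data" | sort | head -n 1)

# Swapping the stages gives f(g(input)), which is a different function.
f_after_g=$(printf '%s\n' "$data" | head -n 1 | sort)

printf '%s %s\n' "$g_after_f" "$f_after_g"
```

The first pipeline yields the smallest line and the second the first line, so the order of the composition matters, as it does for functions in general.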

Janis

Re: The Art of Unix Programming - Case Study: awk

<su8mvt$eb$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1078&group=comp.lang.awk#1078

From: janis_papanagnou@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Sat, 12 Feb 2022 17:24:29 +0100
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <su8mvt$eb$1@dont-email.me>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87iltnlcxe.fsf@bsb.me.uk> <su6im0$36h$1@dont-email.me>
<87sfspj7xf.fsf@bsb.me.uk>
 by: Janis Papanagnou - Sat, 12 Feb 2022 16:24 UTC

On 12.02.2022 00:32, Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> On 10.02.2022 02:37, Ben Bacarisse wrote:
>>>
>>> Why? Serious question. It sounds like a dreadful risk based on your
>>> comments above. Doing so "usually leads to such horrible and inflexible
>>> cascades of the tools" when there is no need "to invoke another
>>> process". What makes you sometimes take the risk of horrible cascades
>>> and pay the price of another process?
>>
>> I think this is answered in my previous post, my reply to Axel's post.
>>
>> It's certainly not something I'd call a risk, because I can control it,
>> I can make the decisions, based on application case, requirements, and
>> expertise.
>
> That's an odd answer. Do you think I can't control the risk,

No, Ben. You were (IMO unnecessarily) introducing the (IMO also
inappropriate) term "risk". And I pointed out that I see risks
only in cases where one cannot control the situation or where I
am restricted in any way in my decisions. I was neither saying
nor implying anything about you. (This discussion got a spin in
an [emotional] direction that I don't want to follow.)

In a reply to Axel (a few minutes ago) I've just expanded on my
view; it may (or may not) shed some light on some open questions
or maybe give some insights about possible differences of personal
approaches or mindsets.

Janis

Re: The Art of Unix Programming - Case Study: awk

<87h793k9j1.fsf@bsb.me.uk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1079&group=comp.lang.awk#1079

From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Sat, 12 Feb 2022 22:25:06 +0000
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <87h793k9j1.fsf@bsb.me.uk>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87iltnlcxe.fsf@bsb.me.uk> <su6im0$36h$1@dont-email.me>
<87sfspj7xf.fsf@bsb.me.uk> <su8mvt$eb$1@dont-email.me>
 by: Ben Bacarisse - Sat, 12 Feb 2022 22:25 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 12.02.2022 00:32, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>> On 10.02.2022 02:37, Ben Bacarisse wrote:
>>>>
>>>> Why? Serious question. It sounds like a dreadful risk based on your
>>>> comments above. Doing so "usually leads to such horrible and inflexible
>>>> cascades of the tools" when there is no need "to invoke another
>>>> process". What makes you sometimes take the risk of horrible cascades
>>>> and pay the price of another process?
>>>
>>> I think this is answered in my previous post, my reply to Axel's post.
>>>
>>> It's certainly not something I'd call a risk, because I can control it,
>>> I can make the decisions, based on application case, requirements, and
>>> expertise.
>>
>> That's an odd answer. Do you think I can't control the risk,
>
> No, Ben. You were (IMO unnecessarily) introducing the (IMO also
> inappropriate) term "risk".

It seems to me reasonable to characterise doing something that "usually
leads to such horrible and inflexible cascades of the tools" as taking a
risk. A practice (that you describe as an impulse!) that usually leads
to anything horrible and inflexible is surely risky, isn't it?

> And I pointed out that I see risks
> only in cases where one cannot control the situation or where I
> am restricted in any way in my decisions.

I accept that this is what you intended to say, but it's not what you
said. There was no inclusive "one" in your justification for your use
of pipelines of basic tools.

You had warned me of where following the "impulse" I had demonstrated in
my example usually leads, but when I was surprised that you "still do
that" you tell me it's fine because "I can control it, I can make the
decisions, based on application case, requirements, and expertise". No
"where one can make the decisions", no "where one can control it".

> I was neither saying nor implying anything about you.

If you did not intend your remarks to imply or say things about me, you
phrased them badly. Talking about developing commands "that way",
immediately after an example of mine inevitably includes me in the
remark about it being an (understandable) impulse.

> (This discussion got a spin in
> an [emotional] direction that I don't want to follow.)

Telling someone they are acting on an impulse that usually leads to
horrible and inflexible code is obviously somewhat personal.

--
Ben.

Re: The Art of Unix Programming - Case Study: awk

<87ee464h3a.fsf@axel-reichert.de>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1080&group=comp.lang.awk#1080

From: mail@axel-reichert.de (Axel Reichert)
Newsgroups: comp.lang.awk
Subject: Re: The Art of Unix Programming - Case Study: awk
Date: Sun, 13 Feb 2022 22:00:41 +0100
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <87ee464h3a.fsf@axel-reichert.de>
References: <st6udg$k03$1@dont-email.me>
<88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com>
<8735ksqy1k.fsf@axel-reichert.de> <su0n16$od0$1@dont-email.me>
<87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87k0e2hbjf.fsf@axel-reichert.de> <su6gs3$mru$1@dont-email.me>
<87r1893wng.fsf@axel-reichert.de> <su8kif$fe6$1@dont-email.me>
 by: Axel Reichert - Sun, 13 Feb 2022 21:00 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 11.02.2022 22:45, Axel Reichert wrote:
>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>
>>> more clumsy to implement with pipes, e.g. extracting keys from a file
>>> to match records in another file; then I don't even think about how
>>> that (maybe) could be implemented by function compositions with
>>> primitive Unix programs
>>
>> But you do know "join"? An often overlooked gem.
>
> I know the 'join' command but don't see what that has to do with what I
> wrote here. By "function composition" I meant that programs represent
> functions; tool x does f, tool y does g, and combining tool x and y by,
> say, x|y does g o f, where o is the function connector, a composition
> of functionality (and code).

foo-1.txt:
foo 1 2 3
Foo 4 5 6 7
FOO 8 9

foo-2.txt:
foo 456
Foo 45 67
FOO 89

To me, the first column seems like a key and the whole line like a
record. To get something like

foo-joined.txt:
foo 1 2 456
Foo 4 5 45
FOO 8 9 89

would be a typical job for join. Hence my question. But we digress from
awk. (-:
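
A sketch of how that result could be produced, once with join and once with awk. The field selection (key, first two values from foo-1.txt, first value from foo-2.txt) is a guess at what "something like" means. join expects input sorted on the key, so the files are sorted in the C locale first, which changes the output order:

```sh
tmp=$(mktemp -d)
printf '%s\n' 'foo 1 2 3' 'Foo 4 5 6 7' 'FOO 8 9' > "$tmp/foo-1.txt"
printf '%s\n' 'foo 456' 'Foo 45 67' 'FOO 89'      > "$tmp/foo-2.txt"

# join needs both inputs sorted on the join field; use one fixed locale.
LC_ALL=C sort "$tmp/foo-1.txt" > "$tmp/foo-1.sorted"
LC_ALL=C sort "$tmp/foo-2.txt" > "$tmp/foo-2.sorted"

# -o picks output fields: 0 is the key, 1.2 is field 2 of file 1, etc.
joined=$(LC_ALL=C join -o 0,1.2,1.3,2.2 \
    "$tmp/foo-1.sorted" "$tmp/foo-2.sorted")

# awk variant: load foo-2.txt into an array, then walk foo-1.txt.
# No sorting is needed and the order of foo-1.txt is preserved.
joined_awk=$(awk 'NR == FNR { extra[$1] = $2; next }
                  $1 in extra { print $1, $2, $3, extra[$1] }' \
    "$tmp/foo-2.txt" "$tmp/foo-1.txt")

printf 'join:\n%s\nawk:\n%s\n' "$joined" "$joined_awk"
rm -rf "$tmp"
```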

Axel

