Re: The Art of Unix Programming - Case Study: awk

<7dbf65ae-fa40-4b3e-bb5a-864b56efcad4n@googlegroups.com>

 by: olivier gabathuler - Wed, 16 Feb 2022 22:11 UTC

Hi, thank you for your outstanding contributions and discussions on awk.

I have been working with it for more than 20 years now and am still amazed at the power of this wonderful language!

Here is my modest contribution to shed light on a usage that is too little documented on the Internet, which I named "Record Separator": https://rosettacode.org/wiki/Search_in_paragraph%27s_text

Olivier Gabathuler

Re: The Art of Unix Programming - Case Study: awk

<sukau4$msi$1@dont-email.me>

 by: Janis Papanagnou - Thu, 17 Feb 2022 02:12 UTC

On 13.02.2022 22:00, Axel Reichert wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>
>> On 11.02.2022 22:45, Axel Reichert wrote:
>>> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>>
>>>> more clumsy to implement with pipes, e.g. extracting keys from a file
>>>> to match records in another file; then I don't even think about how
>>>> that (maybe) could be implemented by function compositions with
>>>> primitive Unix programs
>>>
>>> But you do know "join"? An often overlooked gem.
>>
>> I know the 'join' command but don't see what that has to do with what I
>> wrote here. By "function composition" I meant that programs represent
>> functions; tool x does f, tool y does g, and combining tool x and y by,
>> say, x|y does g o f, where o is the function connector, a composition
>> of functionality (and code).
>
> foo-1.txt:
> foo 1 2 3
> Foo 4 5 6 7
> FOO 8 9
>
> foo-2.txt:
> foo 456
> Foo 45 67
> FOO 89
>
> To me, the first column seems like a key and the whole line like a
> record.

Sure. You join two data sets identified by a common key. But so what?

You were probably triggered by the wording of a sample use
("extracting keys from a file to match records in another file") in
my post, where I was aiming more at patterns like

awk '
NR==FNR { map[$1] ; next }
$1 in map
' keys data

(i.e. a filtering task - in a simple form also doable by grep) or like

awk '
NR==FNR { map[$1] = $2 ; next }
{ for (i in map) gsub (i, map[i]) }
1
' mapping data

(i.e. a simple replacement task).
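
A minimal, self-contained sketch of the two patterns above. The file names (keys, data, mapping, text) and their contents are made up for illustration and are not from the thread:

```shell
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
printf '%s\n' 'b' 'd' > "$tmp/keys"
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' > "$tmp/data"
printf '%s\n' 'cat dog' 'red blue' > "$tmp/mapping"
printf '%s\n' 'a red cat' 'no match here' > "$tmp/text"

# Filtering: the first file fills map[], the second file is tested against it.
filtered=$(awk '
NR==FNR { map[$1] ; next }
$1 in map
' "$tmp/keys" "$tmp/data")

# Replacement: the first file maps old strings to new strings,
# which are then substituted in every line of the second file.
replaced=$(awk '
NR==FNR { map[$1] = $2 ; next }
{ for (i in map) gsub (i, map[i]) }
1
' "$tmp/mapping" "$tmp/text")

printf '%s\n' "$filtered" "$replaced"
rm -rf "$tmp"
```

Both commands rely on NR==FNR being true only while the first file is read, which assumes that the first file is not empty.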

One point was that function compositions by piped commands may (or may
not) work for reductions of the original data, but if you need context
from a previous stage you are lost (or rather, you need workarounds).

> To get something like
>
> foo-joined.txt:
> foo 1 2 456
> Foo 4 5 45
> FOO 8 9 89
>
> would be a typical job for join. Hence my question.

Yes, as I said in previous posts as well, I do know join but that was
not what I had been speaking about.
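
For comparison, the join-style merge from the quoted example can also be sketched with the same two-file awk idiom. This is only an illustration: it appends the second field of foo-2.txt to each matching line of foo-1.txt, which is neither exactly what join prints nor exactly the output sketched in the quote, and the temporary directory is an assumption of the sketch:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'foo 1 2 3' 'Foo 4 5 6 7' 'FOO 8 9' > "$tmp/foo-1.txt"
printf '%s\n' 'foo 456' 'Foo 45 67' 'FOO 89' > "$tmp/foo-2.txt"

# Read foo-2.txt first and remember its second field per key,
# then append that value to every foo-1.txt line with a known key.
joined=$(awk '
NR==FNR { extra[$1] = $2 ; next }
$1 in extra { print $0, extra[$1] }
' "$tmp/foo-2.txt" "$tmp/foo-1.txt")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Unlike join, this variant does not need sorted input, because the keys are looked up in an array.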

> But we digress from awk. (-:

I hope I dragged the thread back on topic with the awk samples. ;-)

Janis

>
> Axel
>

Re: The Art of Unix Programming - Case Study: awk

<sukcap$46t$1@dont-email.me>

 by: Janis Papanagnou - Thu, 17 Feb 2022 02:36 UTC

On 16.02.2022 23:11, olivier gabathuler wrote:
> Hi, thank you for your outstanding contributions and discussions on
> awk.
>
> Working with it from more than 20 years now and still amazed at the
> power of this wonderful language !
>
> This is my modest contribution to shed light on a usage too little
> documented on the Internet, I named "Record Separator" :
> https://rosettacode.org/wiki/Search_in_paragraph%27s_text

I had a peek at the awk code and the (unstructured) data sample.

The task is described, not very specifically, as:
"The goal is to verify the presence of a word or regular expression
within several paragraphs of text (structured or not) and to print
the relevant paragraphs on the standard output."

When I saw the code I first wondered about the definition of a
two-newline output record separator, only for the same string to be
defined as the input record separator of the next awk stage. (An
indication of a candidate for refactoring.)

It seems that your code basically extracts, from data organized in
blocks, those blocks that contain a specific string. In addition it
changes the data in a subtle way that goes beyond the stated task
description.

Personally my first attempt for such a task would have been simpler
(using awk's multi-line data blocks feature), something like

awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
/Traceback/ && /SystemError/
' Traceback.txt

with possible extensions to test for the patterns in specific fields
(by adding FS = "\n"), so that the patterns, should they appear
elsewhere in the data, won't compromise correct operation.

(Note that the output of the above code keeps the matched data intact.)
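
A small sketch of that FS = "\n" extension, under stated assumptions: the log content below is invented, and the rule that "Traceback" must start the first line and "SystemError" the last line of a block is just one possible way to pin the patterns to specific fields:

```shell
tmp=$(mktemp -d)
# Hypothetical log with three paragraphs separated by blank lines.
cat > "$tmp/log.txt" <<'EOF'
Traceback (most recent call last):
SystemError: disk gone

Traceback (most recent call last):
ValueError: bad input

INFO: the words Traceback and SystemError appear only in this note
second line
EOF

# RS = "" selects paragraph mode; FS = "\n" makes every line one field,
# so $1 is the first line of a paragraph and $NF its last line.
matched=$(awk 'BEGIN { RS = "" ; FS = "\n" ; ORS = "\n----------------\n" }
$1 ~ /^Traceback/ && $NF ~ /^SystemError/
' "$tmp/log.txt")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

Only the first paragraph is printed. The third one mentions both words, but not in the tested fields, which is the kind of false match the field-specific test is meant to avoid.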

Yes, features relying on the separators allow interesting solutions.
(In the given case it's arguable whether they've been used sensibly.)

Janis

>
> Olivier Gabathuler
>

Re: The Art of Unix Programming - Case Study: awk

<87o835ykpz.fsf@axel-reichert.de>

 by: Axel Reichert - Thu, 17 Feb 2022 08:13 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> Sure. You join two data sets identified by a common key. But so what?

Hey, I was very proud when I discovered this 15 years ago. (-:

> You have probably been triggered by the formulation of a sample use
> ("extracting keys from a file to match records in another file") in my
> post

Yes.

> I hope I dragged the thread back on topic with the awk samples. ;-)

You did, and I am happy to learn more here from, it seems, much more
advanced awk usage than I am used to so far. Thanks!

Axel

Re: The Art of Unix Programming - Case Study: awk

<5bfbf78d-782b-49f2-86fe-469bb9cd1c91n@googlegroups.com>

 by: olivier gabathuler - Fri, 18 Feb 2022 18:08 UTC

On Thursday, 17 February 2022 at 03:36:11 UTC+1, Janis Papanagnou wrote:
> On 16.02.2022 23:11, olivier gabathuler wrote:
> > Hi, thank you for your outstanding contributions and discussions on
> > awk.
> >
> > Working with it from more than 20 years now and still amazed at the
> > power of this wonderful language !
> >
> > This is my modest contribution to shed light on a usage too little
> > documented on the Internet, I named "Record Separator" :
> > https://rosettacode.org/wiki/Search_in_paragraph%27s_text
> I had a peek view into the awk code and (unstructured) data sample.
>
> The task is described not very specific as:
> "The goal is to verify the presence of a word or regular expression
> within several paragraphs of text (structured or not) and to print
> the relevant paragraphs on the standard output."
>
> When I saw the code I first wondered about the definition of a two
> newlines output record separator just to define the same as input
> separator to the next awk stage. (An indication for a candidate to
> be refactored.)
>
> It seems that your code basically extracts from records of blocks
> those blocks that contain a specific string. In addition it changes
> the data in a subtle way beyond the formulated task description.
>
> Personally my first attempt for such a task would have been simpler
> (using awk's multi-line data blocks feature), something like
>
> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> /Traceback/ && /SystemError/
> ' Traceback.txt
>
> with possible extensions to test for the patterns in specific fields
> (by adding FS = "\n") so that the patterns if appearing in the data
> won't compromise the correct function.
>
> (Note that the output of above code keeps the matched data intact.)
>
> Yes, features relying on the separators allow interesting solutions.
> (In the given case it's arguable whether they've been used sensibly.)
>
> Janis
>
> >
> > Olivier Gabathuler
> >

Hi Janis,
thanks for your response :-)

Just to understand, the output with
> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> /Traceback/ && /SystemError/
> ' Traceback.txt
is:
...
----------------
[Tue Jan 21 16:16:19.250245 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] Traceback (most recent call last):
[Tue Jan 21 16:16:19.252221 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] SystemError: unable to access /home/dir
[Tue Jan 21 16:16:19.249067 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Failed to exec Python script file '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
[Tue Jan 21 16:16:19.249609 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Exception occurred processing WSGI script '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
----------------
12/01 19:24:57.726 ERROR| log:0072| post-test sysinfo error: 11/01 18:24:57.727 ERROR| traceback:0013| Traceback (most recent call last): 11/01 18:24:57.728 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/common_lib/log.py", line 70, in decorated_func 11/01 18:24:57.729 ERROR| traceback:0013| fn(*args, **dargs) 11/01 18:24:57.730 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/bin/base_sysinfo.py", line 286, in log_after_each_test 11/01 18:24:57.731 ERROR| traceback:0013| old_packages = set(self._installed_packages) 11/01 18:24:57.731 ERROR| traceback:0013| SystemError: no such file or directory
----------------
...
-> not exactly the output I expected, but as you said, I was not specific enough in describing the output format.
I will fix that.

In fact I just picked this example; in my working life as a sysadmin on 10k+ Linux boxes, I have used RS extensively to parse a lot of logs, so..

Olivier G.

Re: The Art of Unix Programming - Case Study: awk

<sup178$i12$1@dont-email.me>

 by: Janis Papanagnou - Fri, 18 Feb 2022 20:57 UTC

On 18.02.2022 19:08, olivier gabathuler wrote:
> On Thursday, 17 February 2022 at 03:36:11 UTC+1, Janis Papanagnou wrote:
>> On 16.02.2022 23:11, olivier gabathuler wrote:
>>> Hi, thank you for your outstanding contributions and discussions on
>>> awk.
>>>
>>> Working with it from more than 20 years now and still amazed at the
>>> power of this wonderful language !
>>>
>>> This is my modest contribution to shed light on a usage too little
>>> documented on the Internet, I named "Record Separator" :
>>> https://rosettacode.org/wiki/Search_in_paragraph%27s_text
>> I had a peek view into the awk code and (unstructured) data sample.
>>
>> The task is described not very specific as:
>> "The goal is to verify the presence of a word or regular expression
>> within several paragraphs of text (structured or not) and to print
>> the relevant paragraphs on the standard output."
>>
>> When I saw the code I first wondered about the definition of a two
>> newlines output record separator just to define the same as input
>> separator to the next awk stage. (An indication for a candidate to
>> be refactored.)
>>
>> It seems that your code basically extracts from records of blocks
>> those blocks that contain a specific string. In addition it changes
>> the data in a subtle way beyond the formulated task description.
>>
>> Personally my first attempt for such a task would have been simpler
>> (using awk's multi-line data blocks feature), something like
>>
>> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
>> /Traceback/ && /SystemError/
>> ' Traceback.txt
>>
>> with possible extensions to test for the patterns in specific fields
>> (by adding FS = "\n") so that the patterns if appearing in the data
>> won't compromise the correct function.
>>
>> (Note that the output of above code keeps the matched data intact.)
>>
>> Yes, features relying on the separators allow interesting solutions.
>> (In the given case it's arguable whether they've been used sensibly.)
>>
>> Janis
>>
>>>
>>> Olivier Gabathuler
>>>
>
> Hi Janis,
> thanks for your response :-)
>
> Just to understand, the output with
>> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
>> /Traceback/ && /SystemError/
>> ' Traceback.txt
> is :
> ...
> ----------------
> [Tue Jan 21 16:16:19.250245 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] Traceback (most recent call last):
> [Tue Jan 21 16:16:19.252221 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] SystemError: unable to access /home/dir
> [Tue Jan 21 16:16:19.249067 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Failed to exec Python script file '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
> [Tue Jan 21 16:16:19.249609 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Exception occurred processing WSGI script '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
> ----------------
> 12/01 19:24:57.726 ERROR| log:0072| post-test sysinfo error: 11/01 18:24:57.727 ERROR| traceback:0013| Traceback (most recent call last): 11/01 18:24:57.728 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/common_lib/log.py", line 70, in decorated_func 11/01 18:24:57.729 ERROR| traceback:0013| fn(*args, **dargs) 11/01 18:24:57.730 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/bin/base_sysinfo.py", line 286, in log_after_each_test 11/01 18:24:57.731 ERROR| traceback:0013| old_packages = set(self._installed_packages) 11/01 18:24:57.731 ERROR| traceback:0013| SystemError: no such file or directory
> ----------------
> ...
> -> not exactly the output I expect, but as you said, I was not specific enough in the description of the output formatting.
> I will fix that.

Actually, I was saying and implying much more. To expand on it...

From the code and the task description it was unclear whether the
output of your script went beyond the description on the web page
just by accident or deliberately.

If it differed by accident - as a consequence of a convoluted
design based on the record separators - then the simple code above
is an immediate improvement (in more than one respect).

If your task was actually to output the matching lines, but these
matching lines should start from the keyword "Traceback" (with the
leading time stamps suppressed), then you can and should formulate
that in a clean way; not only the description but also the code
should be clearly formulated.

A clean awk expression is simply substr($0,index($0,"Traceback")),
and the resulting code is still clean and comprehensible; instead of
printing the whole record

awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
/Traceback/ && /SystemError/ ## this implies: print $0
' Traceback.txt

you just print the desired part

awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
/Traceback/ && /SystemError/ {
    print substr($0,index($0,"Traceback"))
}
' Traceback.txt

A simple straightforward addition without side-effects or any hard
to follow program logic. No unnecessary awk instances, FS-fiddling,
or anything.
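
To see the substr/index expression in isolation, here is a minimal sketch on a single, shortened record (the log line is a simplified, made-up variant of the sample data):

```shell
# Hypothetical one-line record with a leading time stamp and tag.
record='[Tue Jan 21 16:16:19 2020] [wsgi:error] Traceback (most recent call last):'

# index() yields the position where "Traceback" starts;
# substr() with only a start position returns everything from there on.
trimmed=$(printf '%s\n' "$record" |
  awk '{ print substr($0, index($0, "Traceback")) }')

printf '%s\n' "$trimmed"
```

The /Traceback/ guard in the code above matters: index() returns 0 when the string is absent, so without the guard the start position would not point at a match.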

And this code prints at least the same output as the code you posted
on that web page. That code of yours was

awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' \
RS="Traceback" Traceback.txt |\
awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n"

If you think this code is preferable in any way, I'd be interested in
your explanations. - No, not really, that was just rhetorical.

>
> In fact I took this example, but in my working life on +10k Linux boxes as sysadmin, I used RS extensively to parse a lot of logs, so..

In fact, if the code you posted here is a characteristic sample,
then I doubt that it's a good idea to spread it to +10k Linux
systems.

But that taunt aside, there's nothing wrong with using the awk
separators; it's a basic feature any proficient awk authority will
[sensibly] use. It's their pathological or unnecessary use that I
consider to be problematic.

YMMV.

Janis

>
> Olivier G.
>

Re: The Art of Unix Programming - Case Study: awk

<sup295$77o$1@dont-email.me>

 by: Janis Papanagnou - Fri, 18 Feb 2022 21:15 UTC

On 18.02.2022 21:57, Janis Papanagnou wrote:
> It's its pathological or unnecessary use I consider to be problematic.

It just occurred to me that the circle closes. The thread started
with a book called "The Art of Unix Programming", whose title echoes
that of the classic "The Art of Computer Programming" by Donald
Knuth, from an epoch where there was a clear path away from code
hacked together individually by dedicated computer experts, towards
a more systematic and scientific approach. My feeling is that
we're still balancing on the bleeding edge of software development,
between hacks and - whatever.

Janis

Re: The Art of Unix Programming - Case Study: awk

<3764d924-af33-4b3a-957e-2fea1a454a82n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1095&group=comp.lang.awk#1095

Newsgroups: comp.lang.awk
X-Received: by 2002:a7b:cb44:0:b0:37c:4e2d:3bb2 with SMTP id v4-20020a7bcb44000000b0037c4e2d3bb2mr15443790wmj.96.1645300533756;
Sat, 19 Feb 2022 11:55:33 -0800 (PST)
X-Received: by 2002:a81:5982:0:b0:2d0:ca4d:f10 with SMTP id
n124-20020a815982000000b002d0ca4d0f10mr13163489ywb.276.1645300533211; Sat, 19
Feb 2022 11:55:33 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.88.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 19 Feb 2022 11:55:32 -0800 (PST)
In-Reply-To: <sup178$i12$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=82.64.112.228; posting-account=W6bwFgoAAAAUGIIMp4N_fxEcpFPb09fE
NNTP-Posting-Host: 82.64.112.228
References: <st6udg$k03$1@dont-email.me> <88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com> <8735ksqy1k.fsf@axel-reichert.de>
<su0n16$od0$1@dont-email.me> <87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87k0e2hbjf.fsf@axel-reichert.de> <su6gs3$mru$1@dont-email.me>
<87r1893wng.fsf@axel-reichert.de> <su8kif$fe6$1@dont-email.me>
<87ee464h3a.fsf@axel-reichert.de> <7dbf65ae-fa40-4b3e-bb5a-864b56efcad4n@googlegroups.com>
<sukcap$46t$1@dont-email.me> <5bfbf78d-782b-49f2-86fe-469bb9cd1c91n@googlegroups.com>
<sup178$i12$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3764d924-af33-4b3a-957e-2fea1a454a82n@googlegroups.com>
Subject: Re: The Art of Unix Programming - Case Study: awk
From: ogabathuler@free.fr (olivier gabathuler)
Injection-Date: Sat, 19 Feb 2022 19:55:33 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: olivier gabathuler - Sat, 19 Feb 2022 19:55 UTC

On Friday, 18 February 2022 at 21:57:14 UTC+1, Janis Papanagnou wrote:
> On 18.02.2022 19:08, olivier gabathuler wrote:
> > On Thursday, 17 February 2022 at 03:36:11 UTC+1, Janis Papanagnou wrote:
> >> On 16.02.2022 23:11, olivier gabathuler wrote:
> >>> Hi, thank you for your outstanding contributions and discussions on
> >>> awk.
> >>>
> >>> I have been working with it for more than 20 years now and am still
> >>> amazed at the power of this wonderful language!
> >>>
> >>> This is my modest contribution to shed light on a usage too little
> >>> documented on the Internet, which I named "Record Separator":
> >>> https://rosettacode.org/wiki/Search_in_paragraph%27s_text
> >> I had a peek at the awk code and the (unstructured) data sample.
> >>
> >> The task is described, not very specifically, as:
> >> "The goal is to verify the presence of a word or regular expression
> >> within several paragraphs of text (structured or not) and to print
> >> the relevant paragraphs on the standard output."
> >>
> >> When I saw the code I first wondered about the definition of a two
> >> newlines output record separator just to define the same as input
> >> separator to the next awk stage. (An indication for a candidate to
> >> be refactored.)
> >>
> >> It seems that your code basically extracts from records of blocks
> >> those blocks that contain a specific string. In addition it changes
> >> the data in a subtle way beyond the formulated task description.
> >>
> >> Personally my first attempt for such a task would have been simpler
> >> (using awk's multi-line data blocks feature), something like
> >>
> >> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> >> /Traceback/ && /SystemError/
> >> ' Traceback.txt
> >>
> >> with possible extensions to test for the patterns in specific fields
> >> (by adding FS = "\n") so that the patterns if appearing in the data
> >> won't compromise the correct function.
> >>
> >> (Note that the output of above code keeps the matched data intact.)
> >>
> >> Yes, features relying on the separators allow interesting solutions.
> >> (In the given case it's arguable whether they've been used sensibly.)
> >>
> >> Janis
> >>
> >>>
> >>> Olivier Gabathuler
> >>>
> >
> > Hi Janis,
> > thanks for your response :-)
> >
> > Just to understand, the output with
> >> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> >> /Traceback/ && /SystemError/
> >> ' Traceback.txt
> > is :
> > ..
> > ----------------
> > [Tue Jan 21 16:16:19.250245 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] Traceback (most recent call last):
> > [Tue Jan 21 16:16:19.252221 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] SystemError: unable to access /home/dir
> > [Tue Jan 21 16:16:19.249067 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Failed to exec Python script file '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
> > [Tue Jan 21 16:16:19.249609 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Exception occurred processing WSGI script '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
> > ----------------
> > 12/01 19:24:57.726 ERROR| log:0072| post-test sysinfo error: 11/01 18:24:57.727 ERROR| traceback:0013| Traceback (most recent call last): 11/01 18:24:57.728 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/common_lib/log.py", line 70, in decorated_func 11/01 18:24:57.729 ERROR| traceback:0013| fn(*args, **dargs) 11/01 18:24:57.730 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/bin/base_sysinfo.py", line 286, in log_after_each_test 11/01 18:24:57.731 ERROR| traceback:0013| old_packages = set(self._installed_packages) 11/01 18:24:57.731 ERROR| traceback:0013| SystemError: no such file or directory
> > ----------------
> > ..
> > -> not exactly the output I expect, but as you said, I was not specific enough in the description of the output formatting.
> > I will fix that.
> Actually I was saying and implying much more. To expand on it...
>
> From the code and the task description it was unclear whether the
> output of your script was just by accident or deliberately beyond
> the description on the web page.
>
> If it differed by accident - as a consequence of a convoluted
> design based on the field separators - then above simple code is an
> immediate improvement (in more than one aspect).
>
> If your task was actually to output the matching lines, but these
> matching lines should start from the keyword "Traceback" (and the
> leading time stamps suppressed), then you can and should formulate
> that in a clean way; not only the description but also the code
> should be clearly formulated.
>
> A clean awk function is simply substr($0,index($0,"Traceback"))
> and the resulting code still clean and comprehensible; instead of
> printing the whole record
> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> /Traceback/ && /SystemError/ ## this implies: print $0
> ' Traceback.txt
>
> you just print the desired part
> awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
> /Traceback/ && /SystemError/ {
> print substr($0,index($0,"Traceback"))
> }
> ' Traceback.txt
>
> A simple straightforward addition without side-effects or any hard
> to follow program logic. No unnecessary awk instances, FS-fiddling,
> or anything.
>
> And this code prints at least the same output as the code you posted
> on that web page. That code of yours was
>
> awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' \
> RS="Traceback" Traceback.txt |\
> awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n"
>
> If you think this code is in any way preferable I'd be interested in
> your explanations. - No, not really, that was just rhetorical.
> >
> > In fact I took this example, but in my working life on +10k Linux boxes as sysadmin, I used RS extensively to parse a lot of logs, so..
> In fact, if that posted code you showed here is a characteristic code
> sample, then I doubt that it's a good idea to spread it to +10k Linux
> systems.
>
> But that taunt aside; there's nothing wrong in using the awk separators,
> it's a basic feature any proficient awk authority will [sensibly] use.
> It's its pathological or unnecessary use I consider to be problematic.
>
> YMMV.
>
> Janis
>
> >
> > Olivier G.
> >

Hi Janis,
thank you very much for your explanations, and I apologize for my mistake.
Yes, you stated it properly; this is exactly what I wanted (and using the index function).
I will fix that on rosettacode.org
Have a nice day !

Olivier G.
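
For anyone who wants to try the paragraph-mode approach quoted above without the original log file, here is a minimal sketch. The sample lines and the function name extract_tracebacks are made up for illustration; they are not taken from Traceback.txt or from the Rosetta Code page.

```shell
# Sketch only: extract_tracebacks and the sample data are hypothetical.
# RS = "" puts awk into paragraph mode (records separated by blank lines).
# Each record that matches both patterns is printed starting at "Traceback".
extract_tracebacks() {
    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
         /Traceback/ && /SystemError/ {
             print substr($0, index($0, "Traceback"))
         }'
}

# Two blank-line-separated paragraphs; only the first one should match.
extract_tracebacks <<'EOF'
[ts1] mod_wsgi: Failed to exec script
[ts2] Traceback (most recent call last):
[ts3] SystemError: unable to access /home/dir

[ts4] an unrelated paragraph
[ts5] nothing to report
EOF
```

Because index() returns the position of the first occurrence of "Traceback" in the whole record, everything before it (here the first line and the leading "[ts2] ") is dropped, while later lines keep their prefixes.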

Re: The Art of Unix Programming - Case Study: awk

<7f993d18-54e8-4c45-ab2d-403a45e6c3a1n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1131&group=comp.lang.awk#1131

Newsgroups: comp.lang.awk
X-Received: by 2002:a05:620a:1472:b0:62c:e3fb:337c with SMTP id j18-20020a05620a147200b0062ce3fb337cmr14206555qkl.495.1646150746988;
Tue, 01 Mar 2022 08:05:46 -0800 (PST)
X-Received: by 2002:a05:6902:1087:b0:628:788e:8a51 with SMTP id
v7-20020a056902108700b00628788e8a51mr3504568ybu.242.1646150746743; Tue, 01
Mar 2022 08:05:46 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Tue, 1 Mar 2022 08:05:46 -0800 (PST)
In-Reply-To: <3764d924-af33-4b3a-957e-2fea1a454a82n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2603:7000:3c3d:41c0:0:0:0:3c3;
posting-account=n74spgoAAAAZZyBGGjbj9G0N4Q659lEi
NNTP-Posting-Host: 2603:7000:3c3d:41c0:0:0:0:3c3
References: <st6udg$k03$1@dont-email.me> <88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com> <8735ksqy1k.fsf@axel-reichert.de>
<su0n16$od0$1@dont-email.me> <87leyjn42c.fsf@bsb.me.uk> <su1mvc$9a9$1@dont-email.me>
<87k0e2hbjf.fsf@axel-reichert.de> <su6gs3$mru$1@dont-email.me>
<87r1893wng.fsf@axel-reichert.de> <su8kif$fe6$1@dont-email.me>
<87ee464h3a.fsf@axel-reichert.de> <7dbf65ae-fa40-4b3e-bb5a-864b56efcad4n@googlegroups.com>
<sukcap$46t$1@dont-email.me> <5bfbf78d-782b-49f2-86fe-469bb9cd1c91n@googlegroups.com>
<sup178$i12$1@dont-email.me> <3764d924-af33-4b3a-957e-2fea1a454a82n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7f993d18-54e8-4c45-ab2d-403a45e6c3a1n@googlegroups.com>
Subject: Re: The Art of Unix Programming - Case Study: awk
From: jason.cy.kwan@gmail.com (Kpop 2GM)
Injection-Date: Tue, 01 Mar 2022 16:05:46 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 40
 by: Kpop 2GM - Tue, 1 Mar 2022 16:05 UTC

@Axel :

How many scripting languages do you know of that can compute, modulo 3, the value of a single hexadecimal number 2.07 GB in size, with a log2(x) value of approx. 8.9 billion (~8,898,328,444),

in just 14.3 seconds :

echo; ( time ( nice echo '0x888889999888888888877765432CC111111111111111111111188888888CCCCCCCCC88888888899998088888888877765432111111111111111110000000000000011111888888888DDDDDDDD8888888899998888888FFFFFFFFFFFFFFFFFFFFFFFFFfFFFFFFFFFFf88877765432111111111111111111111188888888888888888999988888888887776543211111111111111111111118888AAA88888888888889B9998888888888777654321111111111111111111111' | mawk2 'sub(/^0[Xx]/,"")+gsub(//,$0,$0)+gsub(/........................../,$0,$0)+sub("^",("0x1")($0)($0),$0)+1' | pvE0 | mawk2 'function mod3(_) {
return \
(sub("^0[Xx]","",_)<(substr(_,76,1)==""))\
? (length(_)<16?(+_):substr(_,+1,15)+substr(_,16,15)+\
substr(_,31,15)+substr(_,46,15)+substr(_,61,15) \
) % 3 \
: gsub("[ -0369CFILORUXcfx_\n]+","",_)*0+\
(length(_)*+2 + \
gsub("[258BEHKNQTWZbe]+","",_)*0-\
length(_)) % 3 } BEGIN { FS=ORS; RS="^$" } END { print mod3($1) }' ) | pvE9 | ggP '[0-9]*'| lgp3 ) |ecp

in0: 0.00 B 0:00:00 [0.00 B/s] [0.00 B/s] [<=> ]
out9: 2.00 B 0:00:14 [ 143miB/s] [ 143miB/s] [<=> ]
in0: 2.07GiB 0:00:01 [1.72GiB/s] [1.72GiB/s] [ <=> ]
( nice echo | mawk2 | pvE 0.1 in0 | mawk2 ; ) 12.21s user 2.25s system 101% cpu 14.299 total
pvE 0.1 out9 0.01s user 0.02s system 0% cpu 14.298 total
ggrep --text -P --color=always '[0-9]*' 0.00s user 0.00s system 0% cpu 14.296 total

1
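
The mod3 function above appears to rely on the fact that 16 leaves remainder 1 when divided by 3, so a hexadecimal number is congruent modulo 3 to the sum of its digits. The following is a much simpler (and much slower) sketch of that same idea, not a reconstruction of the one-liner; the name hex_mod3 is hypothetical, and the input is assumed to contain only valid hex digits with an optional 0x prefix.

```shell
# Sketch only: hex_mod3 is a hypothetical helper, not the posted mod3().
# Since 16 % 3 == 1, every power of 16 is 1 mod 3, so the number is
# congruent mod 3 to the sum of its hex digits.
hex_mod3() {
    printf '%s\n' "$1" | awk '{
        s = toupper($0)
        sub(/^0X/, "", s)                  # drop an optional 0x / 0X prefix
        sum = 0
        for (i = 1; i <= length(s); i++) {
            # index() yields digit value + 1 for valid hex digits
            d = index("0123456789ABCDEF", substr(s, i, 1)) - 1
            sum = (sum + d) % 3            # keep the running sum small
        }
        print sum
    }'
}

hex_mod3 0xFF    # 255 = 3 * 85
hex_mod3 0x10    # 16 = 3 * 5 + 1
```

Reducing the running sum at every step keeps the arithmetic within small integers regardless of how long the input is.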

Re: The Art of Unix Programming - Case Study: awk

<64d9fd98-d143-48b7-89eb-c9eb5e8df16fn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1133&group=comp.lang.awk#1133

Newsgroups: comp.lang.awk
X-Received: by 2002:a05:6214:246b:b0:435:418c:71b6 with SMTP id im11-20020a056214246b00b00435418c71b6mr1303579qvb.57.1646452164101;
Fri, 04 Mar 2022 19:49:24 -0800 (PST)
X-Received: by 2002:a0d:e282:0:b0:2dc:394a:aa2 with SMTP id
l124-20020a0de282000000b002dc394a0aa2mr1424232ywe.215.1646452163845; Fri, 04
Mar 2022 19:49:23 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Fri, 4 Mar 2022 19:49:23 -0800 (PST)
In-Reply-To: <8735ksqy1k.fsf@axel-reichert.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2603:7000:3c3d:41c0:0:0:0:3c3;
posting-account=n74spgoAAAAZZyBGGjbj9G0N4Q659lEi
NNTP-Posting-Host: 2603:7000:3c3d:41c0:0:0:0:3c3
References: <st6udg$k03$1@dont-email.me> <88d1d61e-458f-4364-81b5-7301658ee500n@googlegroups.com>
<947ba8e0-80b6-458d-8caa-dac0764526bcn@googlegroups.com> <8735ksqy1k.fsf@axel-reichert.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <64d9fd98-d143-48b7-89eb-c9eb5e8df16fn@googlegroups.com>
Subject: Re: The Art of Unix Programming - Case Study: awk
From: jason.cy.kwan@gmail.com (Kpop 2GM)
Injection-Date: Sat, 05 Mar 2022 03:49:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 61
 by: Kpop 2GM - Sat, 5 Mar 2022 03:49 UTC

On Wednesday, February 9, 2022 at 2:50:01 AM UTC-5, Axel Reichert wrote:
> Kpop 2GM <> writes:
>
> > i'm an ultra late-comer to awk - only discovering it in 2017-2018. and
> > the moment i found it, i realized nearly all else - perl R python java
> > C# - can be thrown straight into the toilet, if performance is a key
> > criterion for the task at hand
> I would rather go for TCW (Total Cost of Wizardry): A competent Python
> programmer once consulted me on performance tuning for an (ASCII data
> mangling) script he had written (which took him about 30 min). It had
> been running for 10 min, with no end in sight according to a monitor on the
> (transformed) output. After he had explained the task at hand, I replied
> that I would not use Python, but rather some Unix command line tools. I
> started immediately, cobbled something together (awk featured
> prominently among other usual suspects, such as tr, sed, cut, grep). It
> delivered the desired results before his Python script was finished. So
> the final tally was "10 min" versus "> 30 min + 10 min + 10 min".
>
> Once the logic becomes more intricate, I will usually go for Python
> though, so I will use awk mostly for command line use, rarely as a file
> to be run by "awk -f".
>
> I was also a late-comer to this tool. When I started to learn Perl in
> the late 90s, I learned that it was a superset of sed and awk (coming
> even with conversion scripts), and so I gave the older tools another try
> (the "man" pages were completely incomprehensible to me before, I could
> not wrap my head around stream processing). Once it clicked, I rarely
> used Perl anymore.
>
> Same goes for spreadsheet tools, for which I also seldom feel the need.
>
> Best regards
>
> Axel

@Axel :

interesting that you brought up unix CLI utilities : gnu-sed is almost unbelievably slow in this very basic replacement task -

- replacing all odd digits with a 1, and
- replacing all even digits with a 0

Maybe you can tell me what I'm doing wrong in gnu-sed, cuz I can't seem to be able to even get it to within 3x slower than mawk-1.3.4.

f='jwengowengonoewgnwoegn.txt'; gwc -lcm "${f}";
echo;
( time ( pv -q < "${f}" | mawk 'BEGIN {FS=(FS0="[3579]")(FS0);OFS="11"; ORS=""; RS="^$" } (NF=NF)+gsub("[2468][2468]","00")+gsub("[2468]","0")+gsub(FS0,"1")' | pvE 0.25 mid | xxh128sum ) ) | lgp3 ;
sleep 1;
( time ( pv -q < "${f}" | gsed -zE 's/[3579]{2}/11/g;s/[2468]{2}/00/g;s/[3579]/1/g;s/[2468]/0/g' | pvE 0.25 mid | xxh128sum ) ) | lgp3

1 275150613 275150613 jwengowengonoewgnwoegn.txt

mid: 262MiB 0:00:02 [94.8MiB/s] [94.8MiB/s] [<=> ]
( pv -q < "${f}" | mawk | pvE 0.25 mid | xxh128sum; ) 2.52s user 0.35s system 102% cpu 2.788 total
1fee31c387358cc9b13eea846746dfbc stdin

mid: 262MiB 0:00:08 [30.1MiB/s] [30.1MiB/s] [<=> ]
( pv -q < "${f}" | gsed -zE | pvE 0.25 mid | xxh128sum; ) 8.57s user 0.29s system 101% cpu 8.745 total
1fee31c387358cc9b13eea846746dfbc stdin
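
Since this task maps single characters to single characters, a plain character translation with tr may also be worth trying. This is only a sketch: the function name digits_to_parity is made up, and it has not been timed against the mawk or gsed runs above, so no speed claim is implied.

```shell
# Sketch only: digits_to_parity is a hypothetical helper, not benchmarked.
# tr maps each character of the first set to the character at the same
# position in the second set; 0 and 1 are already their own parity.
digits_to_parity() {
    tr '23456789' '01010101'
}

printf '0123456789 abc 42\n' | digits_to_parity
```

Non-digit characters pass through unchanged, which matches what the substitution-based versions do.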

The 4Chan Teller
