comp.lang.awk / Breaking a table of record rows into an array

Subject (Author)
* Breaking a table of record rows into an array (Mr. Man-wai Chang)
`* Re: Breaking a table of record rows into an array (Janis Papanagnou)
 +- Re: Breaking a table of record rows into an array (Mr. Man-wai Chang)
 `* Re: Breaking a table of record rows into an array (Mr. Man-wai Chang)
  +- Re: Breaking a table of record rows into an array (Keith Thompson)
  +- Re: Breaking a table of record rows into an array (Janis Papanagnou)
  `* Re: Breaking a table of record rows into an array (Ed Morton)
   `* Re: Breaking a table of record rows into an array (Aharon Robbins)
    +* Re: Breaking a table of record rows into an array (Keith Thompson)
    |+* Re: Breaking a table of record rows into an array (Kaz Kylheku)
    ||`* Re: Breaking a table of record rows into an array (Keith Thompson)
    || +* Re: Breaking a table of record rows into an array (Kaz Kylheku)
    || |+* Re: Breaking a table of record rows into an array (Ed Morton)
    || ||`- Re: Breaking a table of record rows into an array (Kaz Kylheku)
    || |`- Re: Breaking a table of record rows into an array (Keith Thompson)
    || `- Re: Breaking a table of record rows into an array (Kaz Kylheku)
    |`* Re: Breaking a table of record rows into an array (Aharon Robbins)
    | +- Re: Breaking a table of record rows into an array (Keith Thompson)
    | `* Re: Breaking a table of record rows into an array (Ed Morton)
    |  `* Re: Breaking a table of record rows into an array (Ed Morton)
    |   `- Re: Breaking a table of record rows into an array (Ed Morton)
    `- Re: Breaking a table of record rows into an array (Ed Morton)

Breaking a table of record rows into an array

<urslg4$18isd$1@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1774&group=comp.lang.awk#1774

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Breaking a table of record rows into an array
Date: Fri, 1 Mar 2024 21:33:55 +0800
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <urslg4$18isd$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 1 Mar 2024 13:33:56 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="a5dfc74f3d92357c9c3e5d43cc47f31b";
logging-data="1330061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX189Tz1QQlw87CPSnhRCi7MG"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:SXhLlbo1pRwwWR3llOTDnpeLyoQ=
Content-Language: en-US
 by: Mr. Man-wai Chang - Fri, 1 Mar 2024 13:33 UTC

I am new to Awk programming.

Given a text table with the following sample entry:

[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]
frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

How do you use Awk to quickly & easily break it into:

bssid="04:9F:xx:xx:xx:xx";
ssid[bssid]="[HOME]";
channel[bssid]="6";
frequency[bssid]="2437";
.....
rate[bssid]="450";
enc[bssid]="Group-AES-CCMP CCMP PSK2";

Re: Breaking a table of record rows into an array

<ursq3s$19kef$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1776&group=comp.lang.awk#1776

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Fri, 1 Mar 2024 15:52:42 +0100
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <ursq3s$19kef$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 1 Mar 2024 14:52:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="573b6f3271b5f86b0906bbb1ccc77221";
logging-data="1364431"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xKF6J1U/oYrbBAg78e7tW"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:xnI4MLRBFctrKpEwKHdPq1I0Mjo=
In-Reply-To: <urslg4$18isd$1@toylet.eternal-september.org>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Fri, 1 Mar 2024 14:52 UTC

On 01.03.2024 14:33, Mr. Man-wai Chang wrote:
> I am new to Awk programming.
>
> Given a text table with the following sample entry:
>
> [ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]
> frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
> dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

Is that all on one line? (If it spans multiple lines, you should
provide more context, namely how one record is separated from the
next.)

>
> How do you use Awk to quickly & easily break it into:

The nasty thing is the nested '[...]'.

One quick way is to choose an appropriate field separator. For
example

   BEGIN { FS="] " }
   { for (i=1; i<=NF; i++)
       print $i
   }

will produce, on one data line like the above, the following (it
also works if the data is spread across three lines, but then you
still need to know the record separators):

[ 8
SSID[ [HOME]
BSSID[04:9F:xx:xx:xx:xx
channel[ 6]
frequency[2437
numsta[1
rssi[-63
noise[-75
beacon[98
cap[1411]
dtim[0
rate[450
enc[Group-AES-CCMP CCMP PSK2

If the basic splitting is okay, you can do the formatting: use
sub() or gsub() on $i to remove or replace parts of the text
(e.g. to remove undesired spaces), use string concatenation
(e.g. to add back the "]" that the field splitting removed),
etc., whatever is needed.

Janis

>
> bssid="04:9F:xx:xx:xx:xx";
> ssid[bssid]="[HOME]";
> channel[bssid]="6";
> frequency[bssid]="2437";
> ....
> rate[bssid]="450";
> enc[bssid]="Group-AES-CCMP CCMP PSK2";
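Following Janis's outline (split on "] ", then clean up each piece with sub()/gsub() and concatenation), here is a minimal sketch of the complete transformation, wrapped in a shell function. It assumes each record is on a single line (the three-line sample joined with spaces), that a field name is everything before the first "[" in a piece, and that no SSID contains "] "; the function name wifi_to_awk is hypothetical. It prints the assignments as text, as in the question; inside a larger awk program you would store into arrays such as ssid[bssid] instead of calling printf.

```shell
#!/bin/sh
# wifi_to_awk (hypothetical name): reads scan records, one per line, on stdin
# and prints awk-style assignments keyed by the BSSID, as asked in the question.
# Assumes one record per line and no "] " inside any SSID.
wifi_to_awk() {
    awk '
    BEGIN { FS = "] " }                   # split on "] " as Janis suggests
    {
        sub(/ *]? *$/, "")                # drop the final "]" (and blanks); re-splits $0
        n = 0; bssid = ""
        for (i = 1; i <= NF; i++) {
            p = index($i, "[")            # field name ends at the first "["
            if (p <= 1) continue          # skip the leading "[ 8" index column
            name = tolower(substr($i, 1, p - 1))
            val = substr($i, p + 1)
            gsub(/^ +| +$/, "", val)      # trim padding such as "[ 6" -> "6"
            if (name == "bssid") bssid = val
            k[++n] = name; v[n] = val     # remember order of appearance
        }
        printf "bssid=\"%s\";\n", bssid
        for (i = 1; i <= n; i++)
            if (k[i] != "bssid")
                printf "%s[bssid]=\"%s\";\n", k[i], v[i]
    }'
}

# The sample record from the question, joined onto one line:
printf '%s\n' '[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411] dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]' |
    wifi_to_awk
```

On the joined sample this prints bssid="04:9F:xx:xx:xx:xx"; first, then ssid[bssid]="[HOME]";, channel[bssid]="6"; and so on, ending with enc[bssid]="Group-AES-CCMP CCMP PSK2";. Keeping the name/value pairs in the k[] and v[] arrays preserves input order, which a for (name in array) loop would not guarantee.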

Re: Breaking a table of record rows into an array

<ursvj5$1an2m$3@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1780&group=comp.lang.awk#1780

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Sat, 2 Mar 2024 00:26:12 +0800
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <ursvj5$1an2m$3@toylet.eternal-september.org>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 1 Mar 2024 16:26:13 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="a5dfc74f3d92357c9c3e5d43cc47f31b";
logging-data="1399894"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ZZ0AsAEwH3kgTzQOv3jPe"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:r15fckWIy7n0sxDUn4Ar4u+tQ5U=
In-Reply-To: <ursq3s$19kef$1@dont-email.me>
Content-Language: en-US
 by: Mr. Man-wai Chang - Fri, 1 Mar 2024 16:26 UTC

On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
>
> The nasty thing is the nested '[...]'.
>
> One quick way is to choose an appropriate field separator. For
> example
>

Even nastier is that a Wi-Fi SSID can use any kind of printable
characters, INCLUDING Unicode! :)

Some hardware manufacturers, like Cisco, do restrict the printable
characters you can use when setting the SSID.
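To make the point concrete: with FS="] " an SSID such as "my] net" would be split in two. One possible sketch of a more defensive approach, assuming (as in the sample) that the SSID column is always immediately followed by "] BSSID[" and a MAC address, locates the SSID by position with index() and match() instead of by field splitting. The function name extract_ssid and the test record are made up; the regex admits "x" only because the sample's MAC is redacted, leading blanks are treated as padding, and an SSID that itself contains "] BSSID[" plus a MAC-like string would still fool it.

```shell
#!/bin/sh
# extract_ssid (hypothetical name): prints the SSID of each scan record on
# stdin, found by position rather than by splitting on "] ".
extract_ssid() {
    awk '
    {
        s = index($0, "SSID[")            # first "SSID[" is the SSID column
        if (s == 0) next
        s += length("SSID[")
        # Assumption: the SSID ends where "] BSSID[<mac>]" begins;
        # "x" is allowed only because the sample MAC is redacted.
        if (!match($0, /] BSSID\[[0-9A-Fa-fx:]+]/)) next
        ssid = substr($0, s, RSTART - s)
        sub(/^ +/, "", ssid)              # assumption: leading blanks are padding
        print ssid
    }'
}

# A made-up record whose SSID contains "] " still comes out whole: my] net
printf '%s\n' '[ 9] SSID[ my] net] BSSID[04:9F:xx:xx:xx:xx] channel[ 11]' | extract_ssid
```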

Re: Breaking a table of record rows into an array

<usnfod$3o4es$1@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1909&group=comp.lang.awk#1909

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Tue, 12 Mar 2024 01:41:32 +0800
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <usnfod$3o4es$1@toylet.eternal-september.org>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 11 Mar 2024 17:41:34 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="e10f395b0563ffe3f32c382f0dc3e156";
logging-data="3936732"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191Xpoh2x97tqAgGmacCYf6"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:3hz/TvwyDeya9IyAZ8db7YUwrn0=
Content-Language: en-US
In-Reply-To: <ursq3s$19kef$1@dont-email.me>
 by: Mr. Man-wai Chang - Mon, 11 Mar 2024 17:41 UTC

On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
>
> BEGIN { FS="] " }
> { for (i=1; i<=NF; i++)
> print $i
> }

Use of `NF` in awk command - Stack Overflow
https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command

Re: Breaking a table of record rows into an array

<87a5n4a9ny.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1913&group=comp.lang.awk#1913

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Mon, 11 Mar 2024 11:46:41 -0700
Organization: None to speak of
Lines: 17
Message-ID: <87a5n4a9ny.fsf@nosuchdomain.example.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="803e1d91970c7135fb6b17cffb70db2d";
logging-data="3955985"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+z8MEPGVE42sHD08vZ327I"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:m+5UCtN09ePZ4iqXCtWVypUEwu8=
sha1:oT43inRwWDICDOb2Vjew8uGyayY=
 by: Keith Thompson - Mon, 11 Mar 2024 18:46 UTC

"Mr. Man-wai Chang" <toylet.toylet@gmail.com> writes:
> On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
>> BEGIN { FS="] " }
>> { for (i=1; i<=NF; i++)
>> print $i
>> }
>
> Use of `NF` in awk command - Stack Overflow
> https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command

That's a question about code that overwrites the value of NF.
How is it relevant?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Breaking a table of record rows into an array

<uso2t3$3sfn9$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1914&group=comp.lang.awk#1914

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Tue, 12 Mar 2024 00:08:17 +0100
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <uso2t3$3sfn9$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me> <usnfod$3o4es$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 11 Mar 2024 23:08:19 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9597ebc7963ef34d6ac76127fb37f8e8";
logging-data="4079337"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AX2hyiDIx/thX4oxr+VG7"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:S8rwKmbaGI6+ZZ/ZmCEtzDc3vA0=
In-Reply-To: <usnfod$3o4es$1@toylet.eternal-september.org>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Mon, 11 Mar 2024 23:08 UTC

On 11.03.2024 18:41, Mr. Man-wai Chang wrote:
> On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
>>
>> BEGIN { FS="] " }
>> { for (i=1; i<=NF; i++)
>> print $i
>> }
>
> Use of `NF` in awk command - Stack Overflow

So what?

You want a more cryptic way? - Here it is...

BEGIN { FS="] " ; OFS="\n" }
{ NF=NF } 1

or

BEGIN { FS="] " ; OFS="\n" }
{ $1=$1 } 1

Mind, though, that as a program skeleton for solving your task,
my original code is easier to adapt to your data processing.
You are aware that it's just the first step and needs further
processing, aren't you?

Janis
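Both of Janis's one-liners rely on the same rule: modifying a field (or, in the implementations discussed later in this thread, NF) makes awk rebuild $0 by joining the fields with OFS, here a newline; the trailing 1 is an always-true pattern whose default action prints the rebuilt record. A small demonstration of the $1=$1 variant on the first line of the sample, wrapped in a hypothetical shell function:

```shell
#!/bin/sh
# split_fields (hypothetical name): Janis's $1=$1 one-liner as a function.
split_fields() {
    # Assigning to $1 makes awk rebuild $0 from the fields joined by OFS ("\n");
    # the pattern 1 is always true, so the rebuilt record is printed.
    awk 'BEGIN { FS = "] "; OFS = "\n" } { $1 = $1 } 1'
}

printf '%s\n' '[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]' | split_fields
# prints four lines:
# [ 8
# SSID[ [HOME]
# BSSID[04:9F:xx:xx:xx:xx
# channel[ 6]
```

Note that "channel[ 6]" keeps its "]" because no space follows it at the end of the line, which is also why "channel[ 6]" and "cap[1411]" keep theirs in Janis's first output.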

Re: Breaking a table of record rows into an array

<usqkgn$he7u$2@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1921&group=comp.lang.awk#1921

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Tue, 12 Mar 2024 17:21:09 -0500
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <usqkgn$he7u$2@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me> <usnfod$3o4es$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 12 Mar 2024 22:21:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="259e76a226586613e5049e2673b1ef49";
logging-data="571646"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kOS2Da2AIB2Pwwc/bPmI0"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:03R7dWNGI8KIcgoYARDihasd+3Y=
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 240312-6, 3/12/2024), Outbound message
In-Reply-To: <usnfod$3o4es$1@toylet.eternal-september.org>
 by: Ed Morton - Tue, 12 Mar 2024 22:21 UTC

On 3/11/2024 12:41 PM, Mr. Man-wai Chang wrote:
> On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
>>
>>    BEGIN { FS="] " }
>>    { for (i=1; i<=NF; i++)
>>        print $i
>>    }
>
> Use of `NF` in awk command - Stack Overflow
> https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command

Why did you post that link to an apparently unrelated question which has
all wrong answers (or incomplete at best - the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value)?

Please always provide enough context in your posts for us to be able to
understand why you're posting.

Ed.

Re: Breaking a table of record rows into an array

<65f17028$0$707$14726298@news.sunsite.dk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1925&group=comp.lang.awk#1925

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
References: <urslg4$18isd$1@toylet.eternal-september.org> <ursq3s$19kef$1@dont-email.me> <usnfod$3o4es$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me>
Organization: non
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: arnold@freefriends.org (Aharon Robbins)
Originator: arnold@freefriends.org (Aharon Robbins)
Date: 13 Mar 2024 09:21:44 GMT
Lines: 14
Message-ID: <65f17028$0$707$14726298@news.sunsite.dk>
NNTP-Posting-Host: 5842e0ed.news.sunsite.dk
X-Trace: 1710321704 news.sunsite.dk 707 arnold@skeeve.com/198.99.81.75:59134
X-Complaints-To: staff@sunsite.dk
 by: Aharon Robbins - Wed, 13 Mar 2024 09:21 UTC

In article <usqkgn$he7u$2@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
>the effect of setting `NF` is
>undefined behavior per POSIX and so will do different things in
>different awk variants and even in 1 awk variant can behave differently
>depending on whether you're setting it to a higher or lower than
>original value

This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.

Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
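The behavior Aharon describes is easy to observe. In this sketch, OFS is set to "," only so that the empty fields created by increasing NF are visible; the comments show what gawk, mawk and BusyBox awk do (the implementations Kaz lists below), which is a separate matter from what POSIX guarantees. The function name is hypothetical.

```shell
#!/bin/sh
# nf_demo (hypothetical name): shrink and then grow NF on a four-field record.
nf_demo() {
    echo 'a b c d' | awk 'BEGIN { OFS = "," }
    {
        NF = 2; print    # decreasing NF drops $3 and $4:   a,b
        NF = 4; print    # increasing NF adds empty fields: a,b,,
    }'
}

nf_demo
```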

Re: Breaking a table of record rows into an array

<87y1am5cfo.fsf@nosuchdomain.example.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1928&group=comp.lang.awk#1928

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 09:22:35 -0700
Organization: None to speak of
Lines: 42
Message-ID: <87y1am5cfo.fsf@nosuchdomain.example.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="3a33dee2b2d0e1e9aaa54021fdfdfa61";
logging-data="1098291"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/VO5c40zm8Ru9WCT4QflB4"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:qQSv8PSgtROnj90EhwDmb7Xs4As=
sha1:5o9gMYTmGgP0kS1UAHp4XX5baYw=
 by: Keith Thompson - Wed, 13 Mar 2024 16:22 UTC

arnold@freefriends.org (Aharon Robbins) writes:
> In article <usqkgn$he7u$2@dont-email.me>,
> Ed Morton <mortonspam@gmail.com> wrote:
>>the effect of setting `NF` is
>>undefined behavior per POSIX and so will do different things in
>>different awk variants and even in 1 awk variant can behave differently
>>depending on whether you're setting it to a higher or lower than
>>original value
>
> This is not true. The effect of setting NF was well defined
> by the original awk book and also in POSIX.
>
> Decreasing NF throws away fields. Increasing NF adds the
> intervening fields with the null string as their values
> and rebuilds the record.

I don't see that in the POSIX specification.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"""
NF
The number of fields in the current record. Inside a BEGIN action,
the use of NF is undefined unless a getline function without a var
argument is executed previously. Inside an END action, NF shall
retain the value it had for the last record read, unless a
subsequent, redirected, getline function without a var argument is
performed prior to entering the END action.
"""

I don't see an explicit statement that assigning to NF has undefined
behavior. The last sentence seems to imply, if taken literally, that
assigning to NF doesn't change its value, at least within an END
section. Perhaps it's merely an oversight, or perhaps I've missed
something.

Do you see something in POSIX that defines the behavior of assigning to
NF?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
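The two cases the quoted POSIX text does pin down can be checked directly, without assigning to NF at all: in BEGIN, a getline without a var argument reads the first record and sets NF; in END, NF keeps the value from the last record read. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# nf_begin_end (hypothetical name): NF in BEGIN after getline, and in END.
nf_begin_end() {
    printf 'a b c\nd e\n' | awk '
    BEGIN { getline; print "BEGIN:", NF }   # plain getline sets $0 and NF: 3
    END   { print "END:", NF }              # NF kept from the last record: 2
    '
}

nf_begin_end
```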

Re: Breaking a table of record rows into an array

<20240313110839.989@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1930&group=comp.lang.awk#1930

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 18:24:37 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <20240313110839.989@kylheku.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
Injection-Date: Wed, 13 Mar 2024 18:24:37 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1e487003ba0b33508276944d7fda6b3a";
logging-data="1149945"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18FSAWe33MTWsfu/spPesFYVYBZmnlIxWc="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:4MSW20lZZWRJAd7ujfTiOAA35Ig=
 by: Kaz Kylheku - Wed, 13 Mar 2024 18:24 UTC

On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> arnold@freefriends.org (Aharon Robbins) writes:
>> In article <usqkgn$he7u$2@dont-email.me>,
>> Ed Morton <mortonspam@gmail.com> wrote:
>>>the effect of setting `NF` is
>>>undefined behavior per POSIX and so will do different things in
>>>different awk variants and even in 1 awk variant can behave differently
>>>depending on whether you're setting it to a higher or lower than
>>>original value
>>
>> This is not true. The effect of setting NF was well defined
>> by the original awk book and also in POSIX.
>>
>> Decreasing NF throws away fields. Increasing NF adds the
>> intervening fields with the null string as their values
>> and rebuilds the record.
>
> I don't see that in the POSIX specification.

The key is this:

References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.

NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.

That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
to restore $4 then that would mean it had continued to exist, but was only
hidden.

The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.

I reproduced the behavior carefully in the awk macro of TXR Lisp:

$ echo '1 2 3 4' | txr -e '(awk (t (set nf 1) (set nf 3) (prn [f 1])))'

$ echo '1 2 3 4' | txr -e '(awk (t (set nf 3) (prn [f 1])))'
2
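For readers without TXR, the same two experiments in awk itself, as a sketch of the behavior Kaz reports for gawk, mawk and BusyBox awk. Judging by the output of the second TXR command, [f 1] is the second field, so the awk equivalent is $2; the brackets are added only to make an empty field visible, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical function names; awk versions of the two TXR commands above.
nf_shrink_grow() {
    # Lower NF to 1, then raise it to 3: $2 was destroyed, so it comes back empty.
    echo '1 2 3 4' | awk '{ NF = 1; NF = 3; print "[" $2 "]" }'   # prints []
}
nf_shrink_only() {
    # Only lower NF to 3: $2 is untouched.
    echo '1 2 3 4' | awk '{ NF = 3; print "[" $2 "]" }'           # prints [2]
}

nf_shrink_grow
nf_shrink_only
```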

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
> """
> NF
> The number of fields in the current record. Inside a BEGIN action,
> the use of NF is undefined unless a getline function without a var
> argument is executed previously. Inside an END action, NF shall
> retain the value it had for the last record read, unless a
> subsequent, redirected, getline function without a var argument is
> performed prior to entering the END action.

This looks defective. The value of NF observed in END must obviously
be the last stored one, however it was stored, whether by assignment
or getline.

Note that NF is also recalculated if $0 is assigned, which is
explicitly required in the document; it is glaringly defective to
be appearing to be making an exception for getline but not for
assignment to $0.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
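The rule Kaz cites, that assigning to $0 recomputes NF, can be seen in a couple of lines; a minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# reassign_record (hypothetical name): NF follows an assignment to $0.
reassign_record() {
    echo 'a b' | awk '{
        print NF          # 2 fields as read
        $0 = "x y z"      # assigning $0 re-splits the record
        print NF          # now 3
    }'
}

reassign_record
```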

Re: Breaking a table of record rows into an array

<87h6h96df7.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1931&group=comp.lang.awk#1931

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 14:15:56 -0700
Organization: None to speak of
Lines: 103
Message-ID: <87h6h96df7.fsf@nosuchdomain.example.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<20240313110839.989@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="3a33dee2b2d0e1e9aaa54021fdfdfa61";
logging-data="1224164"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+umJZHegcVa4dfXd5M2abv"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:9DWeerJelzfoeCwPXREsnDzwMUc=
sha1:GMuitHnhtqAJYOayxANeAIXd2Y0=
 by: Keith Thompson - Wed, 13 Mar 2024 21:15 UTC

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> arnold@freefriends.org (Aharon Robbins) writes:
>>> In article <usqkgn$he7u$2@dont-email.me>,
>>> Ed Morton <mortonspam@gmail.com> wrote:
>>>>the effect of setting `NF` is
>>>>undefined behavior per POSIX and so will do different things in
>>>>different awk variants and even in 1 awk variant can behave differently
>>>>depending on whether you're setting it to a higher or lower than
>>>>original value
>>>
>>> This is not true. The effect of setting NF was well defined
>>> by the original awk book and also in POSIX.
>>>
>>> Decreasing NF throws away fields. Increasing NF adds the
>>> intervening fields with the null string as their values
>>> and rebuilds the record.
>>
>> I don't see that in the POSIX specification.
>
> The key is this:
>
> References to nonexistent fields (that is, fields after $NF), shall
> evaluate to the uninitialized value.
>
> NF is assignable, and fields after $NF do not exist. Thus if we
> have four fields and set NF = 3, then $4 doesn't exist.

That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.

> That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
> to restore $4 then that would mean it had continued to exist, but was only
> hidden.
>
> The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.

I accept that most, quite possibly all, implementations of Awk allow
assignment to NF, with the semantics of dropping fields after $NF or
adding new fields if the value decreases or increases, respectively.
And on the basis of that, I accept that POSIX *should* specify the
behavior of assigning to NF -- especially if the original AWK book
defines it. The second edition briefly mentions modifying NF:
"Conversely, if NF changes, $0 is recomputed when its value is needed."

But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?

Looking more closely at
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
it can be argued that assigning to NF *is* well defined, but it could be
much clearer. The syntax for a simple assignment is:

    lvalue '=' expr

where an lvalue is one of:

    NAME
    NAME '[' expr_list ']'
    '$' expr

and:

    The token NAME shall consist of a word that is not a keyword or a
    name of a built-in function and is not followed immediately (without
    any delimiters) by the '(' character.

Which implies that, for example, `NF = 10` is valid.

Also, NF is a "special variable", which weakly implies that it's
assignable.

On the other hand, it also implies that `foo = 42` is valid where `foo`
is the name of a user-defined function (gawk disallows it). It should
say that the name of a user-defined function is not an lvalue.

The POSIX description reads to me as if the authors just didn't think
about whether assigning to NF, or to user-defined function names, should
be permitted. The behavior of adding or removing fields when NF is
modified by assignment is, I suggest, something that should be stated
explicitly.

[...]

>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>> """
>> NF
>> The number of fields in the current record. Inside a BEGIN action,
>> the use of NF is undefined unless a getline function without a var
>> argument is executed previously. Inside an END action, NF shall
>> retain the value it had for the last record read, unless a
>> subsequent, redirected, getline function without a var argument is
>> performed prior to entering the END action.
>
> This looks defective. The value of NF observed in END must obviously
> be the last stored one, however it was stored, whether by assignment
> or getline.
>
> Note that NF is also recalculated if $0 is assigned, which is
> explicitly required in the document; it is glaringly defective to
> be appearing to be making an exception for getline but not for
> assignment to $0.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Breaking a table of record rows into an array

<20240313143157.115@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1932&group=comp.lang.awk#1932

Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 21:49:26 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 75
Message-ID: <20240313143157.115@kylheku.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com> <20240313110839.989@kylheku.com>
<87h6h96df7.fsf@nosuchdomain.example.com>
Injection-Date: Wed, 13 Mar 2024 21:49:26 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1e487003ba0b33508276944d7fda6b3a";
logging-data="1237000"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19wPIji/X7xSJR8Z4Jzj1/5TfvLg9eDTdg="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:CRJNXlB1fvi38VOa4phWPsMJNcU=
 by: Kaz Kylheku - Wed, 13 Mar 2024 21:49 UTC

On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>> arnold@freefriends.org (Aharon Robbins) writes:
>>>> In article <usqkgn$he7u$2@dont-email.me>,
>>>> Ed Morton <mortonspam@gmail.com> wrote:
>>>>>the effect of setting `NF` is
>>>>>undefined behavior per POSIX and so will do different things in
>>>>>different awk variants and even in 1 awk variant can behave differently
>>>>>depending on whether you're setting it to a higher or lower than
>>>>>original value
>>>>
>>>> This is not true. The effect of setting NF was well defined
>>>> by the original awk book and also in POSIX.
>>>>
>>>> Decreasing NF throws away fields. Increasing NF adds the
>>>> intervening fields with the null string as their values
>>>> and rebuilds the record.
>>>
>>> I don't see that in the POSIX specification.
>>
>> The key is this:
>>
>> References to nonexistent fields (that is, fields after $NF), shall
>> evaluate to the uninitialized value.
>>
>> NF is assignable, and fields after $NF do not exist. Thus if we
>> have four fields and set NF = 3, then $4 doesn't exist.
>
> That describes what happens if NF is modified by assignment, but I don't
> see that it implies that such an assignment is allowed.

"The left-hand side of an assignment and the target of increment and
decrement operators can be one of a variable, an array with index, or a
field selector."

NF is described as a variable. Some unique remarks are made about NF,
but none deny that it's assignable like any other variable.

> But I can imagine a hypothetical awk-like language in which assigning to
> NF has undefined behavior. My question is, how does the POSIX
> specification not describe that language?

That language is failing to support an instance of a variable
being the left operand of an assignment, which a variable "can be".

It looks like the violation of a requirement.

> On the other hand, it also implies that `foo = 42` is valid where `foo`
> is the name of a user-defined function (gawk disallows it).

POSIX does say that "[t]he same name shall not be used as both a
function parameter name and as the name of a function or a special awk
variable." So foo = 42 isn't valid if foo is already a function.

Also: "The same name shall not be used both as a variable name with
global scope and as the name of a function. The same name shall not be
used within the same scope both as a scalar variable and as an array."

All that said, the business of the NF tail wagging the $1, $2, ...
legs of the dog should be the target of at least one clarifying remark,
and the other defects should also be corrected:

- In a BEGIN clause NF should be undefined unless any action
whatsoever is executed that sets its value: direct assignment,
use of getline or assignment to $0.

- At the start of the execution of an END clause, NF retains
its current value (or undefined status, if it was never set);
the END clause has no implicit effect on NF.
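To compare these proposals with current practice, here is a quick way to see what an installed awk reports for NF at the start of BEGIN and in END. This is a minimal sketch that assumes a POSIX shell and a common awk such as gawk or mawk. The helper name nf_begin_end and the sample input are illustrative only. gawk and mawk print 0 in BEGIN and, in END, the field count of the last record read. Other implementations may differ.

```shell
#!/bin/sh
# Sketch: report NF before any record is read (BEGIN) and after the
# last record (END). nf_begin_end is a hypothetical helper name.
nf_begin_end() {
  # Two records: "a b c" (3 fields), then "d e" (2 fields).
  printf 'a b c\nd e\n' | awk 'BEGIN { print NF } END { print NF }'
}

nf_begin_end   # gawk and mawk print 0, then 2
```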

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Breaking a table of record rows into an array

<ustcp3$1734f$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1933&group=comp.lang.awk#1933

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 18:27:30 -0500
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <ustcp3$1734f$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me> <usnfod$3o4es$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 13 Mar 2024 23:27:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="42ac1681d1342478fe121f06fe65a95c";
logging-data="1281167"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+jfxv4uB9hFV6E7qSSziuE"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wydKlnKuLDQd9KDghVWcTJJets0=
X-Antivirus-Status: Clean
X-Antivirus: Avast (VPS 240313-12, 3/13/2024), Outbound message
Content-Language: en-US
In-Reply-To: <65f17028$0$707$14726298@news.sunsite.dk>
 by: Ed Morton - Wed, 13 Mar 2024 23:27 UTC

On 3/13/2024 4:21 AM, Aharon Robbins wrote:
> In article <usqkgn$he7u$2@dont-email.me>,
> Ed Morton <mortonspam@gmail.com> wrote:
>> the effect of setting `NF` is
>> undefined behavior per POSIX and so will do different things in
>> different awk variants and even in 1 awk variant can behave differently
>> depending on whether you're setting it to a higher or lower than
>> original value
>
> This is not true. The effect of setting NF was well defined
> by the original awk book and also in POSIX.
>
> Decreasing NF throws away fields. Increasing NF adds the
> intervening fields with the null string as their values
> and rebuilds the record.

Arnold - I don't know about the original awk book, but POSIX
(https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html)
only defines what happens if you populate $X, not what happens if you
populate NF. If you set $X, awk rebuilds the record, and if X is higher
than the current value of NF, awk adds the intervening fields with the
null string as their values; but POSIX doesn't specify what happens if
you set NF to any value.

If I'm wrong about that I'd love for you or anyone else to point me to
the section that defines it as I've scoured the standard several times
looking for it over the years.

Ed.

Re: Breaking a table of record rows into an array

<ustdr6$17a4o$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1934&group=comp.lang.awk#1934

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 18:45:41 -0500
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <ustdr6$17a4o$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me> <usnfod$3o4es$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com> <20240313110839.989@kylheku.com>
<87h6h96df7.fsf@nosuchdomain.example.com> <20240313143157.115@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 13 Mar 2024 23:45:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="42ac1681d1342478fe121f06fe65a95c";
logging-data="1288344"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/3aFVg3gxTxH1RnegmyXcV"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:+XtTUerI4Et0AxqIQ7GnSvicoYE=
X-Antivirus: Avast (VPS 240313-12, 3/13/2024), Outbound message
In-Reply-To: <20240313143157.115@kylheku.com>
X-Antivirus-Status: Clean
Content-Language: en-US
 by: Ed Morton - Wed, 13 Mar 2024 23:45 UTC

On 3/13/2024 4:49 PM, Kaz Kylheku wrote:
> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>>> arnold@freefriends.org (Aharon Robbins) writes:
>>>>> In article <usqkgn$he7u$2@dont-email.me>,
>>>>> Ed Morton <mortonspam@gmail.com> wrote:
>>>>>> the effect of setting `NF` is
>>>>>> undefined behavior per POSIX and so will do different things in
>>>>>> different awk variants and even in 1 awk variant can behave differently
>>>>>> depending on whether you're setting it to a higher or lower than
>>>>>> original value
>>>>>
>>>>> This is not true. The effect of setting NF was well defined
>>>>> by the original awk book and also in POSIX.
>>>>>
>>>>> Decreasing NF throws away fields. Increasing NF adds the
>>>>> intervening fields with the null string as their values
>>>>> and rebuilds the record.
>>>>
>>>> I don't see that in the POSIX specification.
>>>
>>> The key is this:
>>>
>>> References to nonexistent fields (that is, fields after $NF), shall
>>> evaluate to the uninitialized value.
>>>
>>> NF is assignable, and fields after $NF do not exist. Thus if we
>>> have four fields and set NF = 3, then $4 doesn't exist.

That's a bit like the argument from an old episode of the UK comedy TV show
"Yes, Prime Minister" where the Prime Minister's aide says (paraphrased) "Some
country has done X, so we must do something. War is something, therefore we
must go to war".

Being able to set NF to 3 does not mean you must delete $4. Why not
delete $1 or $2 instead? You'd still end up with 3 fields to satisfy the
value of NF. Plenty of things you can do have effects that POSIX leaves
undefined, however sensible some of those effects may seem; assigning a
value to NF is just one more of them.

You could say that "$0 holds the last record read, you can use $0 in the
END section, therefore in the END section $0 must contain the value of
the last record read". Except that's not true. From the gawk manual
(https://www.gnu.org/software/gawk/manual/html_node/I_002fO-And-BEGIN_002fEND.html#I_002fO-And-BEGIN_002fEND):

----
Most probably due to an oversight, the standard does not say that $0 is
also preserved, although logically one would think that it should be. In
fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in
END rules. Be aware, however, that some other implementations and many
older versions of Unix awk do not.
----
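You can test the behavior the manual describes with a single line. This is a minimal sketch that assumes a POSIX shell and one of the awks the manual names (BWK awk, mawk, gawk). The helper name last_record_in_end is hypothetical. Per the quoted passage, some other or older awks may print an empty line instead.

```shell
#!/bin/sh
# Sketch: does $0 still hold the last record inside END?
# last_record_in_end is a hypothetical helper name.
last_record_in_end() {
  printf 'first\nlast\n' | awk 'END { print $0 }'
}

last_record_in_end   # "last" in the awks the gawk manual lists
```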

>>
>> That describes what happens if NF is modified by assignment, but I don't
>> see that it implies that such an assignment is allowed.
>
> "The left-hand side of an assignment and the target of increment and
> decrement operators can be one of a variable, an array with index, or a
> field selector."
>
> NF is described as a variable. Some unique remarks are made about NF,
> but none deny that it's assignable like any other variable.
>
>> But I can imagine a hypothetical awk-like language in which assigning to
>> NF has undefined behavior. My question is, how does the POSIX
>> specification not describe that language?
>
> That language is failing to support an instance of a variable
> being the left operand of an assignment, which a variable "can be".
>
> It looks like the violation of a requirement.
>
>> On the other hand, it also implies that `foo = 42` is valid where `foo`
>> is the name of a user-defined function (gawk disallows it).
>
> POSIX does say that "[t]he same name shall not be used as both a
> function parameter name and as the name of a function or a special awk
> variable." So foo = 42 isn't valid if foo is already a function.
>
> Also: "The same name shall not be used both as a variable name with
> global scope and as the name of a function. The same name shall not be
> used within the same scope both as a scalar variable and as an array."
>
> All that said, the business of the NF tail wagging the $1, $2, ...
> legs of the dog should be the target of at least one clarifying remark,
> and the other defects should also be corrected:
>
> - In a BEGIN clause NF should be undefined unless any action
> whatsoever is executed that sets its value: direct assignment,
> use of getline or assignment to $0.
>
> - At the start of the execution of an END clause, NF retains
> its current value (or undefined status, if it was never set);
> the END clause has no implicit effect on NF.
>

All of the above argues that POSIX lets you assign a value to NF.
That may or may not be correct (I expect it is), but I don't care,
because nothing above, nor anything in the POSIX spec, states what the
IMPACT of assigning a value to NF is. As far as I can see there is
absolutely nothing in the POSIX spec that says anything like "if you set
NF to a higher value fields will be created and if you set NF to a lower
value fields will be removed", but I'd honestly love to be proven wrong
and shown the section that does define the impact of assigning a higher
or lower value to NF.
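For the "higher value" direction, here is a quick check of what a given awk does. This is a minimal sketch that assumes a POSIX shell and gawk or mawk. It illustrates implementation behavior and is not a citation of POSIX text. The helper name nf_increase is hypothetical. In those awks the extra fields are created empty, and $0 is rebuilt with OFS.

```shell
#!/bin/sh
# Sketch: grow NF from 4 to 6 and look at the rebuilt record.
# nf_increase is a hypothetical helper name.
nf_increase() {
  echo 'a b c d' | awk -v OFS=: '{
    NF = 6    # ask for two more fields than the record has
    print     # rebuilt with OFS; the new fields are empty
    print NF
  }'
}

nf_increase   # gawk and mawk print a:b:c:d::, then 6
```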

Ed.

Re: Breaking a table of record rows into an array

<20240313170627.517@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1935&group=comp.lang.awk#1935

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Thu, 14 Mar 2024 00:17:48 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <20240313170627.517@kylheku.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com> <20240313110839.989@kylheku.com>
<87h6h96df7.fsf@nosuchdomain.example.com> <20240313143157.115@kylheku.com>
<ustdr6$17a4o$1@dont-email.me>
Injection-Date: Thu, 14 Mar 2024 00:17:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5144a5b70d2242e56a2fa01db8244180";
logging-data="1300199"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1ccvvKd3/oVQM6ciALEVxhm+BsXE36V4="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:mFHRATevvsFzZQusZSo+lCO9Wzs=
 by: Kaz Kylheku - Thu, 14 Mar 2024 00:17 UTC

On 2024-03-13, Ed Morton <mortonspam@gmail.com> wrote:
> On 3/13/2024 4:49 PM, Kaz Kylheku wrote:
>> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>>> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>>>> arnold@freefriends.org (Aharon Robbins) writes:
>>>>>> In article <usqkgn$he7u$2@dont-email.me>,
>>>>>> Ed Morton <mortonspam@gmail.com> wrote:
>>>>>>> the effect of setting `NF` is
>>>>>>> undefined behavior per POSIX and so will do different things in
>>>>>>> different awk variants and even in 1 awk variant can behave differently
>>>>>>> depending on whether you're setting it to a higher or lower than
>>>>>>> original value
>>>>>>
>>>>>> This is not true. The effect of setting NF was well defined
>>>>>> by the original awk book and also in POSIX.
>>>>>>
>>>>>> Decreasing NF throws away fields. Increasing NF adds the
>>>>>> intervening fields with the null string as their values
>>>>>> and rebuilds the record.
>>>>>
>>>>> I don't see that in the POSIX specification.
>>>>
>>>> The key is this:
>>>>
>>>> References to nonexistent fields (that is, fields after $NF), shall
>>>> evaluate to the uninitialized value.
>>>>
>>>> NF is assignable, and fields after $NF do not exist. Thus if we
>>>> have four fields and set NF = 3, then $4 doesn't exist.
>
> That's a bit like the argument from an old episode of the comedy TV show
> "Yes, Prime Minister"

But that show is the reference model for how ISO and IEEE standardization
works.

> in the UK where his aide says (paraphrased) "Some
> country has done X, we must do something. War is something, therefore we
> must go to war".
>
> Being able to set NF to 3 does not mean you must delete $4.

The passage says that fields do not exist beyond $NF. So if NF
is 3, $4 doesn't exist.

> Why not
> delete $1 or $2 instead?
> You'd still end up with 3 fields to satisfy the
> value of NF.

Because $1 and $2 are numbered below 3, the value in NF, they exist.
$2 and $3 exist while NF is originally 4, and continue to
exist if it is decremented to 3. Why would $2 be victimized,
when at no point had NF been less than 2?

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Breaking a table of record rows into an array

<20240313171753.836@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1936&group=comp.lang.awk#1936

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Thu, 14 Mar 2024 00:22:56 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <20240313171753.836@kylheku.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com> <20240313110839.989@kylheku.com>
<87h6h96df7.fsf@nosuchdomain.example.com>
Injection-Date: Thu, 14 Mar 2024 00:22:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5144a5b70d2242e56a2fa01db8244180";
logging-data="1300199"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19/pLaFmjM4K4GOsM6WbQKxxHSETlCLdRs="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:4ckx9eHQtAx7syNKcZ+d7/xbobg=
 by: Kaz Kylheku - Thu, 14 Mar 2024 00:22 UTC

On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> That describes what happens if NF is modified by assignment, but I don't
> see that it implies that such an assignment is allowed.

Here is a problem. In numerous implementations, when you set NF, not
only does that set the number of fields, but $0 is recomputed.
So instead of $1=$1 you can use NF=NF.

$ echo '1 2 3 4' | awk -v OFS=: '{ NF=NF; print $0; }'
1:2:3:4

$ echo '1 2 3 4' | awk -v OFS=: '{ NF=2; print $0; }'
1:2

We can continue to infer that if setting NF causes certain fields to
exist, and not others, then $0 must be reconstituted accordingly,
just like when a field is assigned, according to the idea that Awk
implements a kind of "reactive programming" paradigm whereby $0
and the fields are kept in sync.

But that's going a little uncomfortably far out on the proverbial limb,
without assurance from the text.
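The $1=$1 idiom can be placed side by side with NF=NF to confirm that they produce the same rebuilt record on a given awk. This is a minimal sketch that assumes a POSIX shell and gawk or mawk. The helper names are hypothetical. As discussed above, the NF=NF result reflects implementation behavior rather than explicit POSIX wording.

```shell
#!/bin/sh
# Sketch: force a rebuild of $0 with OFS in two ways.
# rebuild_via_field and rebuild_via_nf are hypothetical helper names.
rebuild_via_field() {
  echo '1 2 3 4' | awk -v OFS=: '{ $1 = $1; print }'
}
rebuild_via_nf() {
  echo '1 2 3 4' | awk -v OFS=: '{ NF = NF; print }'
}

rebuild_via_field   # 1:2:3:4
rebuild_via_nf      # 1:2:3:4 in gawk and mawk
```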

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Breaking a table of record rows into an array

<874jd961gc.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1937&group=comp.lang.awk#1937

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 18:34:27 -0700
Organization: None to speak of
Lines: 63
Message-ID: <874jd961gc.fsf@nosuchdomain.example.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<ursq3s$19kef$1@dont-email.me>
<usnfod$3o4es$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<20240313110839.989@kylheku.com>
<87h6h96df7.fsf@nosuchdomain.example.com>
<20240313143157.115@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="c21b1ebd01dd0ccb23ead5461c6d1b5d";
logging-data="1326684"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/SDqJGklhNMRt/wVc258Ib"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:dY4Aa2eQRIw6SnoT8S6nVhtbeoA=
sha1:sS7NeWIzoq6mBoRykA+9RWZZyeo=
 by: Keith Thompson - Thu, 14 Mar 2024 01:34 UTC

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>>> arnold@freefriends.org (Aharon Robbins) writes:
>>>>> In article <usqkgn$he7u$2@dont-email.me>,
>>>>> Ed Morton <mortonspam@gmail.com> wrote:
>>>>>>the effect of setting `NF` is
>>>>>>undefined behavior per POSIX and so will do different things in
>>>>>>different awk variants and even in 1 awk variant can behave differently
>>>>>>depending on whether you're setting it to a higher or lower than
>>>>>>original value
>>>>>
>>>>> This is not true. The effect of setting NF was well defined
>>>>> by the original awk book and also in POSIX.
>>>>>
>>>>> Decreasing NF throws away fields. Increasing NF adds the
>>>>> intervening fields with the null string as their values
>>>>> and rebuilds the record.
>>>>
>>>> I don't see that in the POSIX specification.
>>>
>>> The key is this:
>>>
>>> References to nonexistent fields (that is, fields after $NF), shall
>>> evaluate to the uninitialized value.
>>>
>>> NF is assignable, and fields after $NF do not exist. Thus if we
>>> have four fields and set NF = 3, then $4 doesn't exist.
>>
>> That describes what happens if NF is modified by assignment, but I don't
>> see that it implies that such an assignment is allowed.
>
> "The left-hand side of an assignment and the target of increment and
> decrement operators can be one of a variable, an array with index, or a
> field selector."
>
> NF is described as a variable. Some unique remarks are made about NF,
> but none deny that it's assignable like any other variable.

OK, I concede. It can be inferred from the POSIX specification that
assigning to NF is allowed.

And the specification is in serious need of a definition of what
assigning to NF actually *does*, other than changing the value of NF.

>> But I can imagine a hypothetical awk-like language in which assigning to
>> NF has undefined behavior. My question is, how does the POSIX
>> specification not describe that language?
>
> That language is failing to support an instance of a variable
> being the left operand of an assignment, which a variable "can be".
>
> It looks like the violation of a requirement.

Agreed. I think.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Breaking a table of record rows into an array

<65f296fc$0$713$14726298@news.sunsite.dk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1939&group=comp.lang.awk#1939

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
References: <urslg4$18isd$1@toylet.eternal-september.org> <usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk> <87y1am5cfo.fsf@nosuchdomain.example.com>
Organization: non
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: arnold@freefriends.org (Aharon Robbins)
Originator: arnold@freefriends.org (Aharon Robbins)
Date: 14 Mar 2024 06:19:40 GMT
Lines: 18
Message-ID: <65f296fc$0$713$14726298@news.sunsite.dk>
NNTP-Posting-Host: 1b35ded3.news.sunsite.dk
X-Trace: 1710397180 news.sunsite.dk 713 arnold@skeeve.com/198.99.81.75:49296
X-Complaints-To: staff@sunsite.dk
 by: Aharon Robbins - Thu, 14 Mar 2024 06:19 UTC

In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>Do you see something in POSIX that defines the behavior of assigning to
>NF?

In the section "Variables and Special Values"

| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.

It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
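The case the quoted passage does cover, assigning past $NF, looks like this in practice. This is a minimal sketch that assumes a POSIX shell and any awk that follows that passage. The helper name assign_past_nf is hypothetical. NF rises to 5, the intervening $4 is created empty, and $0 is recomputed with OFS.

```shell
#!/bin/sh
# Sketch: assign to a nonexistent field, as in the quoted POSIX text.
# assign_past_nf is a hypothetical helper name.
assign_past_nf() {
  echo 'a b c' | awk -v OFS=: '{
    $(NF + 2) = 5   # NF was 3, so this creates $4 (empty) and $5
    print           # a:b:c::5
    print NF        # 5
  }'
}

assign_past_nf
```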

Re: Breaking a table of record rows into an array

<87zfv148ky.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1940&group=comp.lang.awk#1940

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Wed, 13 Mar 2024 23:43:25 -0700
Organization: None to speak of
Lines: 40
Message-ID: <87zfv148ky.fsf@nosuchdomain.example.com>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me>
<65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<65f296fc$0$713$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="c21b1ebd01dd0ccb23ead5461c6d1b5d";
logging-data="1548247"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180svOpwvIq1FymsR6cR0uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:BbdJAmfkufJIJCQ0aLgGUHSjFMU=
sha1:HvbueYV3HC+Oj2doa6JyEzSqvA4=
 by: Keith Thompson - Thu, 14 Mar 2024 06:43 UTC

arnold@freefriends.org (Aharon Robbins) writes:
> In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>Do you see something in POSIX that defines the behavior of assigning to
>>NF?
>
> In the section "Variables and Special Values"
>
> | References to nonexistent fields (that is, fields after $NF), shall
> | evaluate to the uninitialized value. Such references shall not create
> | new fields. However, assigning to a nonexistent field (for example,
> | $(NF+2)=5) shall increase the value of NF; create any intervening fields
> | with the uninitialized value; and cause the value of $0 to be
> | recomputed, with the fields being separated by the value of OFS. Each
> | field variable shall have a string value or an uninitialized value when
> | created.
>
> It doesn't say what happens when you do NF -= 2; nonetheless, all
> traditional awks throw away fields when you do something like that.

Kaz already addressed this. It's not sufficiently explicit about this
behavior, but:

""" Kaz:
The key is this:

References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.

NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
"""

(At the time I wasn't convinced that POSIX requires NF to be
assignable.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Breaking a table of record rows into an array

<usukp4$1idsu$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1942&group=comp.lang.awk#1942

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Thu, 14 Mar 2024 05:50:11 -0500
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <usukp4$1idsu$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<65f296fc$0$713$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 14 Mar 2024 10:50:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="42ac1681d1342478fe121f06fe65a95c";
logging-data="1652638"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+YOrj+dAViC5sAZKi3Z0mX"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:hDjw7qGFDVAvUST+Vu1mW2WAiiQ=
In-Reply-To: <65f296fc$0$713$14726298@news.sunsite.dk>
Content-Language: en-US
X-Antivirus-Status: Clean
X-Antivirus: Avast (VPS 240314-0, 3/13/2024), Outbound message
 by: Ed Morton - Thu, 14 Mar 2024 10:50 UTC

On 3/14/2024 1:19 AM, Aharon Robbins wrote:
> In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Do you see something in POSIX that defines the behavior of assigning to
>> NF?
>
> In the section "Variables and Special Values"
>
> | References to nonexistent fields (that is, fields after $NF), shall
> | evaluate to the uninitialized value. Such references shall not create
> | new fields. However, assigning to a nonexistent field (for example,
> | $(NF+2)=5) shall increase the value of NF; create any intervening fields
> | with the uninitialized value; and cause the value of $0 to be
> | recomputed, with the fields being separated by the value of OFS. Each
> | field variable shall have a string value or an uninitialized value when
> | created.
>
> It doesn't say what happens when you do NF -= 2; nonetheless, all
> traditional awks throw away fields when you do something like that.

It doesn't say what happens when you do NF += 2 either. All I'm saying
is that changing the value of NF is undefined behavior per POSIX.

I'm not sure which awks would be considered "traditional" vs otherwise,
but AFAIK POSIX is descriptive, i.e. it describes how X behaves rather
than dictating the behavior of X. So if the appropriate set of awk
variants all behave the same way for a case like this that's currently
undefined by POSIX (changing the value of NF, the value of $0 in the END
section, and field splitting with a null FS being the 3 most commonly
used such cases IMO), then maybe the folks who write that spec
could/should update it to describe that behavior. I just don't know
which awks behave the same way in those cases, nor whether that's enough
of them for POSIX to adopt a definition.
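Of the three cases, the null-FS one is the easiest to probe on a given awk. This is a minimal sketch that assumes a POSIX shell and gawk or mawk, both of which split each character into its own field when FS is empty. The helper name null_fs_split is hypothetical, and other awks may behave differently since POSIX leaves this case open.

```shell
#!/bin/sh
# Sketch: split a record with an empty FS.
# null_fs_split is a hypothetical helper name.
null_fs_split() {
  echo 'abc' | awk 'BEGIN { FS = "" } { print NF, $2 }'
}

null_fs_split   # gawk and mawk print: 3 b
```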

Ed.

Re: Breaking a table of record rows into an array

<usupeo$1jeon$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1943&group=comp.lang.awk#1943

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Thu, 14 Mar 2024 07:09:59 -0500
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <usupeo$1jeon$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<65f296fc$0$713$14726298@news.sunsite.dk> <usukp4$1idsu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 14 Mar 2024 12:10:00 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bb28aeefde6f9f2b33574faabed4c910";
logging-data="1686295"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+0g3B+A5Ir5oHZJHi4wz8V"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:PP+ZeMaJYWjNhuGntTnsfSZB0dA=
X-Antivirus-Status: Clean
X-Antivirus: Avast (VPS 240314-2, 3/14/2024), Outbound message
Content-Language: en-US
In-Reply-To: <usukp4$1idsu$1@dont-email.me>
 by: Ed Morton - Thu, 14 Mar 2024 12:09 UTC

On 3/14/2024 5:50 AM, Ed Morton wrote:
> On 3/14/2024 1:19 AM, Aharon Robbins wrote:
>> In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
>> Keith Thompson  <Keith.S.Thompson+u@gmail.com> wrote:
>>> Do you see something in POSIX that defines the behavior of assigning to
>>> NF?
>>
>> In the section "Variables and Special Values"
>>
>> | References to nonexistent fields (that is, fields after $NF), shall
>> | evaluate to the uninitialized value. Such references shall not create
>> | new fields. However, assigning to a nonexistent field (for example,
>> | $(NF+2)=5) shall increase the value of NF; create any intervening fields
>> | with the uninitialized value; and cause the value of $0 to be
>> | recomputed, with the fields being separated by the value of OFS. Each
>> | field variable shall have a string value or an uninitialized value when
>> | created.
>>
>> It doesn't say what happens when you do NF -= 2; nonetheless, all
>> traditional awks throw away fields when you do something like that.
>
> It doesn't say what happens when you do NF += 2 either. All I'm saying
> is that changing the value of NF is undefined behavior per POSIX.
>
> I'm not sure which awks would be considered "traditional" vs otherwise,
> but AFAIK POSIX is descriptive, i.e. it describes how X behaves rather
> than dictating how X must behave. So if the appropriate set of awk
> variants all behave the same way for behavior that's currently
> undefined by POSIX (changing the value of NF, the value of $0 in the END
> section, and field splitting with a null FS being the 3 most commonly
> used cases IMO), then maybe the folks who write that spec could/should
> update it to describe that behavior. I don't know which awks all behave
> the same way for those cases, though, nor whether that's enough of them
> for POSIX to add a definition.
>
>     Ed.

I couldn't find any existing tickets, so I've created tickets with the
Austin Group requesting that definitions for the 3 cases I listed above
be added to the POSIX spec:

1) Changing the value of NF =
https://www.austingroupbugs.net/view.php?id=1820
2) The value of $0, $1, etc. in an END section =
https://www.austingroupbugs.net/view.php?id=1821
3) Splitting using a null field separator =
https://www.austingroupbugs.net/view.php?id=1822
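
Until (or unless) such definitions are added, a portable script can sidestep all three cases by relying only on behavior POSIX does define. A minimal sketch follows; the function names drop_last_two, last_line and chars are hypothetical, and these are workarounds, not what the tickets propose.

```shell
#!/bin/sh
# Hypothetical helpers; each relies only on POSIX-defined awk behavior.

# Case 1 (assigning to NF): instead of NF -= 2, rebuild the record
# from the fields to keep, joined with OFS.
drop_last_two() {
  awk '{
    out = ""
    for (i = 1; i <= NF - 2; i++)
      out = out (i > 1 ? OFS : "") $i
    print out
  }'
}

# Case 2 ($0 in END): save the last record explicitly instead of
# expecting $0 to still hold it in the END section.
last_line() {
  awk '{ last = $0 } END { print last }'
}

# Case 3 (null FS): split a line into characters with substr()
# instead of setting FS to the empty string. Empty lines print nothing.
chars() {
  awk '{
    n = length($0)
    for (i = 1; i <= n; i++)
      printf "%s%s", substr($0, i, 1), (i < n ? " " : "\n")
  }'
}

echo 'a b c d e' | drop_last_two   # prints: a b c
printf 'x\ny\n' | last_line       # prints: y
echo abc | chars                  # prints: a b c
```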

Obviously I've no idea whether they'll be implemented, but AFAIK it
doesn't hurt to ask. I said "in most modern awks..." in each of them, so
if anyone knows which specific awks behave in the ways I described (or
which don't), please comment on the issues if you can. I just don't have
access to multiple awk variants at this time.

Regards,

Ed.
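
By contrast, the field-assignment case POSIX does pin down (the "Variables and Special Values" text quoted above) is easy to check. A minimal sketch, with extend_record as a hypothetical wrapper name: per that text, assigning to $(NF+2) on a three-field record should make NF 5, create an empty $4, and rebuild $0 with the default OFS of a single space, so two spaces end up before the 5.

```shell
#!/bin/sh
# Hypothetical wrapper around the POSIX-defined case: assigning past $NF
# raises NF, creates the intervening empty field ($4), and rebuilds $0
# using OFS (a single space by default).
extend_record() {
  awk '{ $(NF+2) = 5; print NF ": " $0 }'
}

echo 'a b c' | extend_record   # prints: 5: a b c  5
```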

Re: Breaking a table of record rows into an array

<usuqpe$1joik$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1944&group=comp.lang.awk#1944

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: Breaking a table of record rows into an array
Date: Thu, 14 Mar 2024 07:32:45 -0500
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <usuqpe$1joik$1@dont-email.me>
References: <urslg4$18isd$1@toylet.eternal-september.org>
<usqkgn$he7u$2@dont-email.me> <65f17028$0$707$14726298@news.sunsite.dk>
<87y1am5cfo.fsf@nosuchdomain.example.com>
<65f296fc$0$713$14726298@news.sunsite.dk> <usukp4$1idsu$1@dont-email.me>
<usupeo$1jeon$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 14 Mar 2024 12:32:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bb28aeefde6f9f2b33574faabed4c910";
logging-data="1696340"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/l7mA+2WyUJkbqPEoSxx59"
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <usupeo$1jeon$1@dont-email.me>
 by: Ed Morton - Thu, 14 Mar 2024 12:32 UTC

On 3/14/2024 7:09 AM, Ed Morton wrote:
> On 3/14/2024 5:50 AM, Ed Morton wrote:
>> On 3/14/2024 1:19 AM, Aharon Robbins wrote:
>>> In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
>>> Keith Thompson  <Keith.S.Thompson+u@gmail.com> wrote:
>>>> Do you see something in POSIX that defines the behavior of assigning to
>>>> NF?
>>>
>>> In the section "Variables and Special Values"
>>>
>>> | References to nonexistent fields (that is, fields after $NF), shall
>>> | evaluate to the uninitialized value. Such references shall not create
>>> | new fields. However, assigning to a nonexistent field (for example,
>>> | $(NF+2)=5) shall increase the value of NF; create any intervening
>>> | fields with the uninitialized value; and cause the value of $0 to be
>>> | recomputed, with the fields being separated by the value of OFS. Each
>>> | field variable shall have a string value or an uninitialized value
>>> | when created.
>>>
>>> It doesn't say what happens when you do NF -= 2; nonetheless, all
>>> traditional awks throw away fields when you do something like that.
>>
>> It doesn't say what happens when you do NF += 2 either. All I'm saying
>> is that changing the value of NF is undefined behavior per POSIX.
>>
>> I'm not sure which awks would be considered "traditional" vs otherwise,
>> but AFAIK POSIX is descriptive, i.e. it describes how X behaves rather
>> than dictating how X must behave. So if the appropriate set of awk
>> variants all behave the same way for behavior that's currently
>> undefined by POSIX (changing the value of NF, the value of $0 in the
>> END section, and field splitting with a null FS being the 3 most
>> commonly used cases IMO), then maybe the folks who write that spec
>> could/should update it to describe that behavior. I don't know which
>> awks all behave the same way for those cases, though, nor whether
>> that's enough of them for POSIX to add a definition.
>>
>>      Ed.
>
> I couldn't find any existing tickets, so I've created tickets with the
> Austin Group requesting that definitions for the 3 cases I listed above
> be added to the POSIX spec:
>
> 1) Changing the value of NF =
> https://www.austingroupbugs.net/view.php?id=1820
> 2) The value of $0, $1, etc. in an END section =
> https://www.austingroupbugs.net/view.php?id=1821
> 3) Splitting using a null field separator =
> https://www.austingroupbugs.net/view.php?id=1822

I've just added one final ticket for the other undefined behavior I
fairly often see people relying on (e.g. when building multi-line
records by reading one line at a time to handle quoted fields that
contain newlines, without using gawk --csv):

4) Changing the value of NR or FNR =
https://www.austingroupbugs.net/view.php?id=1823
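
For illustration, here is a minimal sketch of that multi-line-record technique written so that it never modifies NR or FNR; it keeps its own logical-record counter instead. The function name join_csv_records and the sample input are hypothetical, and the sketch assumes a record is complete once it contains an even number of double quotes (so doubled "" escapes don't upset the count).

```shell
#!/bin/sh
# Hypothetical helper: join physical lines into logical CSV records when a
# quoted field contains a newline. recno counts logical records, so NR is
# left untouched and still reports physical line numbers.
join_csv_records() {
  awk '{
    rec = (pending == "" ? $0 : pending "\n" $0)
    tmp = rec
    # gsub() returns the number of quotes removed from the copy;
    # an odd count means a quoted field is still open.
    if (gsub(/"/, "", tmp) % 2) {
      pending = rec
      next
    }
    pending = ""
    recno++
    printf "%d (ends at line %d): %s\n", recno, NR, rec
  }'
}

printf '%s\n' '1,"foo"' '2,"bar' 'baz"' '3,qux' | join_csv_records
# prints:
# 1 (ends at line 1): 1,"foo"
# 2 (ends at line 3): 2,"bar
# baz"
# 3 (ends at line 4): 3,qux
```

Note that an unterminated quote at end of input is silently dropped by this sketch; a real script would report it.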

> Obviously I've no idea whether they'll be implemented, but AFAIK it
> doesn't hurt to ask. I said "in most modern awks..." in each of them, so
> if anyone knows which specific awks behave in the ways I described (or
> which don't), please comment on the issues if you can. I just don't have
> access to multiple awk variants at this time.
>
> Regards,
>
>     Ed.
