Rocksolid Light


Subject                       Author
* Re: "sed" question          Grant Taylor
`* Re: "sed" question         Keith Thompson
 +* Re: "sed" question        Mr. Man-wai Chang
 |`* Re: "sed" question       Janis Papanagnou
 | +* Re: "sed" question      Grant Taylor
 | |+* Re: "sed" question     Mr. Man-wai Chang
 | ||`* Re: "sed" question    Geoff Clare
 | || `* Re: "sed" question   Aharon Robbins
 | ||  `- Re: "sed" question  Geoff Clare
 | |`- Re: "sed" question     Janis Papanagnou
 | `* Re: "sed" question      Mr. Man-wai Chang
 |  `- Re: "sed" question     Janis Papanagnou
 `- Re: "sed" question        Kaz Kylheku

Re: "sed" question

<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1805&group=comp.lang.awk#1805
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!tncsrv06.tnetconsulting.net!tncsrv09.home.tnetconsulting.net!.POSTED.omega.home.tnetconsulting.net!not-for-mail
From: gtaylor@tnetconsulting.net (Grant Taylor)
Newsgroups: comp.unix.shell,comp.lang.awk
Subject: Re: "sed" question
Followup-To: comp.lang.awk
Date: Thu, 7 Mar 2024 20:38:28 -0600
Organization: TNet Consulting
Message-ID: <usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 02:38:28 -0000 (UTC)
Injection-Info: tncsrv09.home.tnetconsulting.net; posting-host="omega.home.tnetconsulting.net:198.18.1.140";
logging-data="19543"; mail-complaints-to="newsmaster@tnetconsulting.net"
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <87bk7poa7u.fsf@nosuchdomain.example.com>
 by: Grant Taylor - Fri, 8 Mar 2024 02:38 UTC

On 3/7/24 18:09, Keith Thompson wrote:
> I know that's what awk does, but I don't think I would have expected
> it if I didn't know about it.

Okay. I think that's a fair observation.

> $0 is the current input line.

Or $0 is the current /record/ in awk parlance.

> If you don't change anything, or if you modify $0 itself, whitespace
> between fields is preserved.

> If you modify any of the fields, $0 is recomputed and whitespace
> between tokens is collapsed.

I don't agree with that.

% echo 'one   two   three' | awk '{print $0; print $1,$2,$3}'
one   two   three
one two three

I didn't /modify/ anything and awk does print the fields with different
white space.

> awk *could* have been defined to preserve inter-field whitespace even
> when you modify individual fields,

I question the veracity of that. Specifically when lengthening or
shortening the value of a field. E.g. replacing "two" with "fifteen".
This is particularly germane when you look at $0 as a fixed width
formatted output.

> and I think I would have found that more intuitive.

I don't agree.

> (And ideally there would be a way to refer to that inter-field
> whitespace.)

Remember, awk is meant for working on fields of data in a record. By
default, the fields are delimited by white space characters. I'll say
it this way, awk is meant for working on the non-white space characters.
Or yet another way, awk is not meant for working on white space characters.

> The fact that modifying a field has the side effect of messing up $0
> seems counterintuitive.

Maybe.

But I think it's one that is acceptable for what awk is intended to do.

> Perhaps the behavior matches your intuition better than it matches
> mine.

I sort of feel like you are wanting to / trying to use awk in places
where sed might be better. sed just sees a string of text and is
ignorant of any structure without a carefully crafted RE to provide it.

Conversely awk is quite happy working with an easily identified field
based on the count with field separators of one or more white space
characters.

Consider the output of `netstat -an` wherein you have multiple columns
of IP addresses.

Please find a quick way, preferably that doesn't involve negation
(because what needs to be negated may be highly dynamic) that lists
inbound SMTP connections on an email server but doesn't list outbound
SMTP connections.

awk makes it trivial to identify and print records that have the SMTP
port in the local IP column, thus ignoring outbound connections with
SMTP in the remote column.

Aside: Yes, I know that ss and the likes have more features for this,
but this is my example and ss is not installed everywhere.
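
As a rough sketch of the netstat case described above: the sample lines below are made up in the Linux `netstat -an` layout (column positions differ on other systems), and the helper names `sample_netstat` and `inbound_smtp` are hypothetical. The idea is only that the local address is one field and the remote address is another, so a test on one field ignores the other.

```shell
# Hypothetical sample in the Linux `netstat -an` layout; real output varies by OS.
sample_netstat() {
cat <<'EOF'
tcp        0      0 192.0.2.10:25           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:40022        203.0.113.5:25          ESTABLISHED
tcp        0      0 192.0.2.10:22           198.51.100.9:50000      ESTABLISHED
EOF
}

# Keep only TCP records whose local address (field 4) ends in port 25,
# then print the local and remote address fields.
inbound_smtp() {
    awk '$1 ~ /^tcp/ && $4 ~ /[:.]25$/ { print $4, $5 }'
}

sample_netstat | inbound_smtp
```

The outbound connection (port 25 in field 5) never matches, with no negation needed.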

I sort of view awk as somewhat akin to SQL wherein fields in awk are
like columns in SQL.

I'd be more than a little bit surprised to find an SQL interface that
preserved white space /between/ columns. -- Many will do it /within/
columns.

awk makes it trivial to take field oriented output from commands and
apply some logic / parsing / action on specific fields in records.

> (And perhaps this should be moved to comp.lang.awk if it doesn't die
> out soon.

comp.lang.awk added and followup pointed there.

> Though sed and awk are both languages in their own right
> and tools that can be used from the shell, so I'd argue there's a
> topicality overlap.)

;-)

--
Grant. . . .

Re: "sed" question

<87zfv9mkpj.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1806&group=comp.lang.awk#1806
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Thu, 07 Mar 2024 20:06:00 -0800
Organization: None to speak of
Lines: 118
Message-ID: <87zfv9mkpj.fsf@nosuchdomain.example.com>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me>
<usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me>
<usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="ba4161de3c6afa3b79edb2bdfdc78ddd";
logging-data="1595146"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19KFinAlTdW9wNwv5c2WrKr"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:iMaEYp5Kx6N/HqgDr/ctrD8w6Z8=
sha1:eNFtYocnCVISnHsFja50eMdOQA8=
 by: Keith Thompson - Fri, 8 Mar 2024 04:06 UTC

Grant Taylor <gtaylor@tnetconsulting.net> writes:
> On 3/7/24 18:09, Keith Thompson wrote:
>> I know that's what awk does, but I don't think I would have expected
>> it if I didn't know about it.
>
> Okay. I think that's a fair observation.
>
>> $0 is the current input line.
>
> Or $0 is the current /record/ in awk parlance.

Yes.

>> If you don't change anything, or if you modify $0 itself, whitespace
>> between fields is preserved.
>
>> If you modify any of the fields, $0 is recomputed and whitespace
>> between tokens is collapsed.
>
> I don't agree with that.
>
> % echo 'one   two   three' | awk '{print $0; print $1,$2,$3}'
> one   two   three
> one two three
>
> I didn't /modify/ anything and awk does print the fields with
> different white space.

That's just the semantics of print with comma-delimited arguments, just
like:

% awk 'BEGIN{a="foo"; b="bar"; print a, b}'
foo bar

Printing the values of $1, $2, and $3 doesn't change $0. Writing to any
of $1, $2, $3, even with the same value, does change $0.

$ echo 'one   two   three' | awk '{print $0; print $1,$2,$3; print $0; $2 = $2; print $0}'
one   two   three
one two three
one   two   three
one two three

>> awk *could* have been defined to preserve inter-field whitespace
>> even when you modify individual fields,
>
> I question the veracity of that. Specifically when lengthening or
> shortening the value of a field. E.g. replacing "two" with
> "fifteen". This is particularly germane when you look at $0 as a fixed
> width formatted output.

But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.

If awk *purely* dealt with input lines only as lists of tokens, then
this:

echo 'one   two   three' | awk '{print $0}'

would print "one two three" rather than "one   two   three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are
touched feels arbitrary to me.

>> and I think I would have found that more intuitive.
>
> I don't agree.
>
>> (And ideally there would be a way to refer to that inter-field
>> whitespace.)
>
> Remember, awk is meant for working on fields of data in a record. By
> default, the fields are delimited by white space characters. I'll say
> it this way, awk is meant for working on the non-white space
> characters. Or yet another way, awk is not meant for working on
> white space characters.

Awk has strong builtin support for working on whitespace-delimited
fields, and that support tends to ignore the details of that whitespace.
But you can also write awk code that just deals with $0.

One trivial example:

awk '{ count += length + 1 } END { print count }'

behaves similarly to `wc -c`, and counts whitespace characters just like
any other characters.

>> The fact that modifying a field has the side effect of messing up $0
>> seems counterintuitive.
>
> Maybe.
>
> But I think it's one that is acceptable for what awk is intended to do.

It's also the existing behavior, and changing it would break things, so
I wouldn't suggest changing it.

>> Perhaps the behavior matches your intuition better than it matches
>> mine.
>
> I sort of feel like you are wanting to / trying to use awk in places
> where sed might be better. sed just sees a string of text and is
> ignorant of any structure without a carefully crafted RE to provide it.

Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.

Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: "sed" question

<usek8o$1jmgf$1@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1808&group=comp.lang.awk#1808
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 17:03:19 +0800
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <usek8o$1jmgf$1@toylet.eternal-september.org>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 09:03:20 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="f033bd73c96e7571dcae851ee67936cb";
logging-data="1694223"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ZUddmkskLasyC3tNq+6aW"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:AGW9SpsKH+JFEYy1zSaN+LPIduc=
Content-Language: en-US
In-Reply-To: <87zfv9mkpj.fsf@nosuchdomain.example.com>
 by: Mr. Man-wai Chang - Fri, 8 Mar 2024 09:03 UTC

On 8/3/2024 12:06 pm, Keith Thompson wrote:
>
> Not really. I'm just remarking on one particular awk feature that I
> find a bit counterintuitive.
>
> Awk is optimized for working on records consisting of fields, and not
> caring much about how much whitespace there is between fields. But it's
> flexible enough to do *lots* of other things.

The original Awk doesn't support regular expressions, right? Because
regex was not yet talked about back then??

Re: "sed" question

<20240307203151.441@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1809&group=comp.lang.awk#1809
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 09:38:50 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <20240307203151.441@kylheku.com>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me>
<usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me>
<usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 8 Mar 2024 09:38:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5d62f51bcdd3fa281d3b0240a90726ac";
logging-data="1707761"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18touClo9hz0S7VF6q0/FMZFV3ZsKvRhLA="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:IMjsEnExOTyuXfRgOC5BwBXEO5Y=
 by: Kaz Kylheku - Fri, 8 Mar 2024 09:38 UTC

On 2024-03-08, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> But awk doesn't work with fixed-width data. The length of each field,
> and the length of $0, is variable.

GNU Awk, however, can. It has a FIELDWIDTHS variable where you can
specify column widths that then work instead of the FS field separator.
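
For awks without FIELDWIDTHS, a portable approximation is to slice $0 with substr(). This is a swapped-in technique, not the gawk feature itself; the column widths (5, 3, rest), the sample record and the name `fixed_cols` are assumptions chosen for illustration.

```shell
# Split a fixed-width record into three columns using POSIX awk only.
# Roughly what gawk would do with:
#   gawk -v FIELDWIDTHS='5 3 5' '{ print $1 "|" $2 "|" $3 }'
fixed_cols() {
    awk '{
        a = substr($0, 1, 5)   # columns 1-5
        b = substr($0, 6, 3)   # columns 6-8
        c = substr($0, 9)      # column 9 to end of record
        print a "|" b "|" c
    }'
}

printf 'alphaXYZomega\n' | fixed_cols
```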

There is also FPAT (see below).

> If awk *purely* dealt with input lines only as lists of tokens, then
> this:
>
> echo 'one   two   three' | awk '{print $0}'
>
> would print "one two three" rather than "one   two   three" (and awk would
> lose the ability to deal with arbitrarily formatted input). The fact
> that the inter-field whitespace is reset only when individual fields are
> touched feels arbitrary to me.

There is no inter-field whitespace.

There is the original record in $0, and parsed out fields in $1, $2, ..

The fields don't have any space.

The space comes from the value of the OFS variable.

$ echo 'one two three' | awk -v OFS=: '{ $1=$1; print }'
one:two:three

GNU Awk has a FPAT mechanism by which we can specify the positive
regex for recognizing fields as tokens. By means of that, we can
save the separating whitespace, turning it into a field:

$ echo 'one two three' | \
awk -v FPAT='[^ ]+|[ ]+' -v OFS= \
'{ $1=$1; print; print NF }'
one two three
5

There you go. We now have 5 fields. The interstitial space is a field.
We set OFS to empty and so $1=$1 doesn't collapse the separation.
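
Without FPAT, something similar can be approximated in POSIX awk by leaving the fields alone and walking $0 with match(). A minimal sketch, assuming blanks and tabs are the only separators; the name `replace_field` and the sample input are made up for illustration:

```shell
# Replace the Nth blank/tab-delimited token of each record,
# copying the separators through unchanged.
replace_field() {
    awk -v n="$1" -v new="$2" '{
        rest = $0; out = ""; i = 0
        while (match(rest, /[^ \t]+/)) {
            i++
            out = out substr(rest, 1, RSTART - 1)      # separator before the token
            tok = substr(rest, RSTART, RLENGTH)        # the token itself
            out = out (i == n + 0 ? new : tok)         # swap in the new value for token n
            rest = substr(rest, RSTART + RLENGTH)      # continue after the token
        }
        print out rest                                 # rest holds any trailing separator
    }'
}

printf 'one   two    three\n' | replace_field 2 fifteen
```

Because no field is ever assigned, $0 is never rebuilt with OFS; the cost is that the program does its own tokenizing.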

> Not really. I'm just remarking on one particular awk feature that I
> find a bit counterintuitive.

The proposed feature of preserving the whitespace separation is a niche
use case in relation to Awk's orientation toward tabular data.

In tabular data that is not formatted into nice columns for a monospaced
font, the whitespace doesn't matter. Awk's behavior is that it will
normalize the separation.

In tabular data that is aligned visually, preserving the whitespace
will not work, if any of your field edits change a field width.
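
For visually aligned data, one common workaround is to re-emit the columns with printf instead of relying on $0. A small sketch under assumed column widths (8 characters each); `realign` and the input line are hypothetical:

```shell
# Edit field 2, then re-align all three columns with fixed printf widths.
realign() {
    awk '{ $2 = "fifteen"; printf "%-8s %-8s %s\n", $1, $2, $3 }'
}

printf 'one      two      three\n' | realign
```

This keeps the columns lined up only as long as every value fits the chosen width.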

I believe that your niche use case has a value though.

That's why, in the TXR Lisp Awk macro, I implemented something which
helps with that use case: the "kfs" variable (keep field separators).
This Boolean variable, if true, causes the separating character
sequences to be retained, and turned into fields.

$ echo 'one two three' | txr -e '(awk (:set kfs t) (t))'
one two three

We can see the list f instead, printed machine readably:

$ echo 'one two three' | txr -e '(awk (:set kfs t) (t (prinl f)))'
("" "one" " " "two" " " "three" "")

There is a leading and trailing empty separator. There is a
very good reason for that, in that the default strategy in Awk
is that, for instance " a b c " produces three fields.
For consistency, if we retain separation, we should always have
five fields where there would be three.

My FPAT approach above in the GNU Awk example won't do this correctly;
more work is needed.

This kfs variable is not so recent; I implemented it in 2016.

Re: "sed" question

<usf8bt$1odaj$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1886&group=comp.lang.awk#1886
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 15:46:20 +0100
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <usf8bt$1odaj$1@dont-email.me>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 14:46:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3f9f75a81ddf5efa60d1bc0d0d20978a";
logging-data="1848659"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xowDT67a/Adq6axCxxE9D"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:paxfhnPc9srabbLSS51YdmXftP4=
In-Reply-To: <usek8o$1jmgf$1@toylet.eternal-september.org>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Fri, 8 Mar 2024 14:46 UTC

On 08.03.2024 10:03, Mr. Man-wai Chang wrote:
>
> The original Awk doesn't support regular expressions, right?

Where did you get that from? - Awk without regexps makes little sense;
mind that the basic syntax of Awk programs is described as
/pattern/ { action }
What would remain if there's no regexp patterns; string comparisons?

> Because regex was not yet talked about back then??

Stable Awk (1985) was released 1987. The (initial) old Awk (1977) was
released 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt.
According to the authors Awk was designed to see how Sed and Grep could
be generalized.

Janis

Re: "sed" question

<usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1887&group=comp.lang.awk#1887
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!tncsrv06.tnetconsulting.net!tncsrv09.home.tnetconsulting.net!.POSTED.omega.home.tnetconsulting.net!not-for-mail
From: gtaylor@tnetconsulting.net (Grant Taylor)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 09:12:05 -0600
Organization: TNet Consulting
Message-ID: <usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 15:12:05 -0000 (UTC)
Injection-Info: tncsrv09.home.tnetconsulting.net; posting-host="omega.home.tnetconsulting.net:198.18.1.140";
logging-data="21947"; mail-complaints-to="newsmaster@tnetconsulting.net"
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <usf8bt$1odaj$1@dont-email.me>
 by: Grant Taylor - Fri, 8 Mar 2024 15:12 UTC

On 3/8/24 08:46, Janis Papanagnou wrote:
> Awk without regexps makes little sense;

I think this comes down to what is a regular expression and what is not
a regular expression.

> mind that the basic syntax of Awk programs is described as
> pattern { action }

I'm guessing that 40-60% of the awk that I use doesn't use what I would
consider to be regular expressions.

(NF == 5){print $3}
(NF == 8){print $4}

Or:

{total+=$5}
END{print total}

I usually think of regular expressions when I'm doing a sub(/re/, ...)
type thing or a (... ~ /re/) type conditional. More specifically things
between the // in both of those statements are the REs.
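
To make the distinction concrete, here is a small sketch with three rules: one plain comparison, one RE matched against $0, and one RE matched against a single field. The name `classify` and the input lines are invented for the example.

```shell
# Three kinds of awk condition in one program.
classify() {
    awk '
        NF == 3         { print "three fields: " $0 }       # comparison, no RE involved
        /^#/            { print "comment: " $0 }            # RE tested against $0
        $2 ~ /^[0-9]+$/ { print "numeric 2nd field: " $2 }  # RE tested against one field
    '
}

printf 'a b c\n# note\nx 42\n' | classify
```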

Maybe I have an imprecise understanding / definition.

--
Grant. . . .

Re: "sed" question

<usfcfm$1pd84$1@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1888&group=comp.lang.awk#1888
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 23:56:38 +0800
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <usfcfm$1pd84$1@toylet.eternal-september.org>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 15:56:38 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="f033bd73c96e7571dcae851ee67936cb";
logging-data="1881348"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+YfTAQ73P5W4HSjH2JmFpF"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:bMvGC+3wQxcHaDupdfLnZ7nV90g=
In-Reply-To: <usf8bt$1odaj$1@dont-email.me>
Content-Language: en-US
 by: Mr. Man-wai Chang - Fri, 8 Mar 2024 15:56 UTC

On 8/3/2024 10:46 pm, Janis Papanagnou wrote:
> On 08.03.2024 10:03, Mr. Man-wai Chang wrote:
>>
>> The original Awk doesn't support regular expressions, right?
>
> Where did you get that from? - Awk without regexps makes little sense;
> mind that the basic syntax of Awk programs is described as
> /pattern/ { action }
> What would remain if there's no regexp patterns; string comparisons?
>
>> Because regex was not yet talked about back then??
>
> Stable Awk (1985) was released 1987. The (initial) old Awk (1977) was
> released 1979. Before that tool we had Sed (1974), and before that we
> had Ed and Grep (1973). My perception is that regexps were there as a
> basic concept of UNIX in all these tools, so why should Awk be exempt.
> According to the authors Awk was designed to see how Sed and Grep could
> be generalized.

That part of history is beyond me. Sorry... my fault for not doing a check.

Re: "sed" question

<usfcle$1pd84$2@toylet.eternal-september.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1889&group=comp.lang.awk#1889
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!toylet.eternal-september.org!.POSTED!not-for-mail
From: toylet.toylet@gmail.com (Mr. Man-wai Chang)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Fri, 8 Mar 2024 23:59:42 +0800
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <usfcle$1pd84$2@toylet.eternal-september.org>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
<usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Mar 2024 15:59:42 -0000 (UTC)
Injection-Info: toylet.eternal-september.org; posting-host="f033bd73c96e7571dcae851ee67936cb";
logging-data="1881348"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19tgmAIVezChwQU5Vz28/la"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:qWBIunfoXbrEHx1AWOswntTgGd4=
Content-Language: en-US
In-Reply-To: <usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
 by: Mr. Man-wai Chang - Fri, 8 Mar 2024 15:59 UTC

On 8/3/2024 11:12 pm, Grant Taylor wrote:
>
> I usually think of regular expressions when I'm doing a sub(/re/, ...)
> type thing or a (... ~ /re/) type conditional. More specifically things
> between the // in both of those statements are the REs.
>
> Maybe I have an imprecise understanding / definition.

Do Linux and Unix have a ONE AND ONLY ONE STANDARD regex library?

It seemed that tools and programming languages have their own
implementations, let alone different versions among them.

Re: "sed" question

<usgg8t$20kn1$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1893&group=comp.lang.awk#1893
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Sat, 9 Mar 2024 03:07:24 +0100
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <usgg8t$20kn1$1@dont-email.me>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
<usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 9 Mar 2024 02:07:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0d232eda33724788bef0bf992dc6522c";
logging-data="2118369"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Arc92MJn581MpWh5ghkdK"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:32iuCaTm8OpQlCLB6zSsbdoBqQQ=
In-Reply-To: <usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sat, 9 Mar 2024 02:07 UTC

On 08.03.2024 16:12, Grant Taylor wrote:
> On 3/8/24 08:46, Janis Papanagnou wrote:
>> Awk without regexps makes little sense;
>
> I think this comes down to what is a regular expression and what is not
> a regular expression.
>
>> mind that the basic syntax of Awk programs is described as
>> pattern { action }
>
> I'm guessing that 40-60% of the awk that I use doesn't use what I would
> consider to be regular expressions.
> [...]
> Maybe I have an imprecise understanding / definition.

Your definition matches the common naming, from which I deliberately
deviate. (I think that "pattern" is an inferior naming and
"condition" should better be used, where a 'condition' can also
be a regexp that I regularly write as '/regexp/' or '/pattern/'
in explanations.) So I agree that it's likely that this alone
doesn't serve well as explanation for the existence of regexps
in Awk. The rationale is better seen in the statement "Awk was
designed to see how Sed and Grep could be generalized." that I
quoted (not literally, but from the original Awk book).

Janis

Re: "sed" question

<usggn5$20n50$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1894&group=comp.lang.awk#1894
Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Sat, 9 Mar 2024 03:15:00 +0100
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <usggn5$20n50$1@dont-email.me>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
<usfcfm$1pd84$1@toylet.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 9 Mar 2024 02:15:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0d232eda33724788bef0bf992dc6522c";
logging-data="2120864"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18YfFNkjvTKXYHuU0tdTD1p"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:wJGJ7NhM754guy11dCDDtY5rIxQ=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <usfcfm$1pd84$1@toylet.eternal-september.org>
 by: Janis Papanagnou - Sat, 9 Mar 2024 02:15 UTC

On 08.03.2024 16:56, Mr. Man-wai Chang wrote:
> On 8/3/2024 10:46 pm, Janis Papanagnou wrote:
>>
>> Stable Awk (1985) was released 1987. The (initial) old Awk (1977) was
>> released 1979. Before that tool we had Sed (1974), and before that we
>> had Ed and Grep (1973). My perception is that regexps were there as a
>> basic concept of UNIX in all these tools, so why should Awk be exempt.
>> According to the authors Awk was designed to see how Sed and Grep could
>> be generalized.
>
> That part of history is beyond me. Sorry... my fault for not doing a check.

The mistake may stem from a myth (I heard it before already); it may
have been misinterpreted where it's said that in the first Awk there
was no match function (which is true, but it means the concrete match()
function not the abstract function of a (regexp) pattern match).

Janis

Re: "sed" question

<tv26ck-3qt.ln1@ID-313840.user.individual.net>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1917&group=comp.lang.awk#1917
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: geoff@clare.See-My-Signature.invalid (Geoff Clare)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Tue, 12 Mar 2024 13:47:09 +0000
Lines: 21
Message-ID: <tv26ck-3qt.ln1@ID-313840.user.individual.net>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
<usagql$j9bc$1@dont-email.me>
<usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
<usb6pa$ncok$1@dont-email.me>
<usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
<87bk7poa7u.fsf@nosuchdomain.example.com>
<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
<87zfv9mkpj.fsf@nosuchdomain.example.com>
<usek8o$1jmgf$1@toylet.eternal-september.org> <usf8bt$1odaj$1@dont-email.me>
<usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
<usfcle$1pd84$2@toylet.eternal-september.org>
Reply-To: netnews@gclare.org.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net BcExBxBWR+RlSdABP2H2twsAt7Ic0PPNaGtHXDj0vZHAmwQW81
X-Orig-Path: ID-313840.user.individual.net!not-for-mail
Cancel-Lock: sha1:DBEZrd/lF807Iby1yHouDto1a3g= sha256:rvGwk5rG9CmDZtcNw4oKKXSBLAMmJGEkdnS0QsT/N0s=
User-Agent: Pan/0.154 (Izium; 517acf4)
 by: Geoff Clare - Tue, 12 Mar 2024 13:47 UTC

Mr. Man-wai Chang wrote:

> Do Linux and Unix have a ONE AND ONLY ONE STANDARD regex library?
>
> It seemed that tools and programming languages have their own
> implementations, let alone different versions among them.

In the POSIX/UNIX standard the functions used for handling regular
expressions are regcomp() and regexec() (and regerror() and regfree()).
They are in the C library, not a separate "regex library".

They support different RE flavours via flags. The standard requires
that "basic regular expressions" (default) and "extended regular
expressions" (with REG_EXTENDED flag) are supported. Implementations
can support other flavours with non-standard flags.

POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
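
As a rough illustration of the flag mechanism described above, here is
a minimal C sketch. The helper name re_matches() is made up for this
example; regcomp(), regexec(), regfree(), REG_EXTENDED and REG_NOSUB
are the standard <regex.h> interfaces. It assumes an implementation
that treats an unescaped + as an ordinary character in basic regular
expressions, which is what POSIX specifies.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern`, 0 if it
   does not, and -1 if regcomp() rejects the pattern. `cflags` selects
   the flavour: 0 for basic REs, REG_EXTENDED for extended REs. */
int re_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, re_matches("ab+c", "abbc", 0) should report no match,
because + is a literal character in a basic RE, while
re_matches("ab+c", "abbc", REG_EXTENDED) should report a match, because
+ means "one or more" in an extended RE.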

--
Geoff Clare <netnews@gclare.org.uk>

Re: "sed" question

<65f0a63c$0$716$14726298@news.sunsite.dk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1919&group=comp.lang.awk#1919

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
References: <us9vka$fepq$1@dont-email.me> <usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net> <usfcle$1pd84$2@toylet.eternal-september.org> <tv26ck-3qt.ln1@ID-313840.user.individual.net>
Organization: non
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: arnold@freefriends.org (Aharon Robbins)
Originator: arnold@freefriends.org (Aharon Robbins)
Date: 12 Mar 2024 19:00:13 GMT
Lines: 8
Message-ID: <65f0a63c$0$716$14726298@news.sunsite.dk>
NNTP-Posting-Host: 2ea3cda8.news.sunsite.dk
X-Trace: 1710270013 news.sunsite.dk 716 arnold@skeeve.com/198.99.81.75:45524
X-Complaints-To: staff@sunsite.dk
 by: Aharon Robbins - Tue, 12 Mar 2024 19:00 UTC

In article <tv26ck-3qt.ln1@ID-313840.user.individual.net>,
Geoff Clare <netnews@gclare.org.uk> wrote:
>POSIX requires that awk uses extended regular expressions (i.e. the
>same as regcomp() with REG_EXTENDED).

There is the additional requirement that \ inside [....] can
be used to escape characters, so that [abc\]def] is valid in
awk but not in other uses of REG_EXTENDED.
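
To make the difference concrete, here is a small C sketch of what plain
regcomp() with REG_EXTENDED does with that pattern. The names
ere_matches() and bracket_pattern are made up for the example. It
assumes the implementation follows the POSIX bracket-expression rules,
under which a backslash inside [....] is an ordinary character, so the
bracket expression ends at the first ] and the rest of the pattern is
the literal text def].

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the extended RE
   `pattern`, 0 if it does not, -1 if regcomp() rejects the pattern. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* The pattern from the post. The C string literal doubles the
   backslash, so regcomp() receives [abc\]def] */
const char *const bracket_pattern = "[abc\\]def]";
```

Under those assumptions, ere_matches(bracket_pattern, "adef]") should
report a match (one of a, b, c or backslash, followed by def]), while
ere_matches(bracket_pattern, "]") should not. In awk, where the
backslash escapes the ], the same pattern is a single bracket
expression and a lone ] would match.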

Re: "sed" question

<dvn8ck-pdr.ln1@ID-313840.user.individual.net>

https://www.rocksolidbbs.com/devel/article-flat.php?id=1926&group=comp.lang.awk#1926

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: geoff@clare.See-My-Signature.invalid (Geoff Clare)
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Wed, 13 Mar 2024 13:57:33 +0000
Lines: 18
Message-ID: <dvn8ck-pdr.ln1@ID-313840.user.individual.net>
References: <us9vka$fepq$1@dont-email.me>
<usf9s5$ldr$1@tncsrv09.home.tnetconsulting.net>
<usfcle$1pd84$2@toylet.eternal-september.org>
<tv26ck-3qt.ln1@ID-313840.user.individual.net>
<65f0a63c$0$716$14726298@news.sunsite.dk>
Reply-To: netnews@gclare.org.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net e8OJNVx0bGsrmfi2prmVSA6fKaS7T0SHGzeJrZPzaV4o65N4Dg
X-Orig-Path: ID-313840.user.individual.net!not-for-mail
Cancel-Lock: sha1:Np64dP3ni+iu4shk1clQ9XPRvL8= sha256:7Lwmn4FDpFEYiYKvkvEcGykZzvvXJWskikORgna/WY8=
User-Agent: Pan/0.154 (Izium; 517acf4)
 by: Geoff Clare - Wed, 13 Mar 2024 13:57 UTC

Aharon Robbins wrote:

> In article <tv26ck-3qt.ln1@ID-313840.user.individual.net>,
> Geoff Clare <netnews@gclare.org.uk> wrote:
>>POSIX requires that awk uses extended regular expressions (i.e. the
>>same as regcomp() with REG_EXTENDED).
>
> There is the additional requirement that \ inside [....] can
> be used to escape characters,

Yes, awk effectively has an extra "layer" of backslash escaping
before the ERE rules kick in, both inside and outside [....].
I didn't mention this so as not to overload the OP with
information - he seemed more interested in the different flavours
of RE than in nitty gritty details like that.

--
Geoff Clare <netnews@gclare.org.uk>
