devel / comp.lang.awk / expressive iteration with macros

Subject  Author
* expressive iteration with macros  Kaz Kylheku
`* Re: expressive iteration with macros  Kpop 2GM
 `* Re: expressive iteration with macros  Kaz Kylheku
  `* Re: expressive iteration with macros  Kpop 2GM
   `* Re: expressive iteration with macros  Kaz Kylheku
    `* Re: expressive iteration with macros  Kpop 2GM
     +- Re: expressive iteration with macros  Kpop 2GM
     `* Re: expressive iteration with macros  Kaz Kylheku
      `* Re: expressive iteration with macros  Kpop 2GM
       `- Re: expressive iteration with macros  Kpop 2GM

expressive iteration with macros

 by: Kaz Kylheku - Sun, 24 Apr 2022 02:43 UTC

$ cppawk '
#include <iter.h>
#include <cons.h>

#define NAME $1
#define UID $3

BEGIN {
  FS = ":"
  loop (records("/etc/passwd"),
        maximizing(max_uid, UID),
        argmax(longest_name_uid, UID, length(NAME)),
        argmax(longest_name, NAME, length(NAME)),
        from(line, 1),
        counting(roots, UID == 0),
        argmax(longest_name_line, line, length(NAME)),
        if (NAME ~ /^r/, collect(start_with_r, NAME)))
    ; // empty
  print "highest observed UID:", max_uid
  print "number of superuser aliases:", roots
  print "UID with longest name:", longest_name_uid
  print "line in file with longest name:", longest_name_line
  print "longest name:", longest_name
  print "names starting with 'r':", sexp(start_with_r)
}'
highest observed UID: 65534
number of superuser aliases: 1
UID with longest name: 123
line in file with longest name: 43
longest name: gnome-initial-setup
names starting with r: ("root" "rtkit")
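
For comparison, here is a rough plain POSIX awk equivalent of the loop above, as a sketch only. It assumes that maximizing, argmax, counting and collect behave the way the printed output suggests; the function name passwd_summary and the inline sample data are made up for illustration and are not part of cppawk.

```shell
# Hypothetical helper: summarizes passwd-style records read from stdin.
passwd_summary() {
  awk -F: '
  {
    uid = $3 + 0
    # maximizing(max_uid, UID): track the largest UID seen
    if (NR == 1 || uid > max_uid) max_uid = uid
    # counting(roots, UID == 0): count records whose UID is zero
    if (uid == 0) roots++
    # the three argmax clauses: remember values from the record
    # with the longest name (first one wins on ties)
    if (length($1) > best_len) {
      best_len = length($1)
      longest_name = $1
      longest_name_uid = uid
      longest_name_line = NR
    }
    # if (NAME ~ /^r/, collect(...)): gather matching names in order
    if ($1 ~ /^r/) r_names = r_names (r_names == "" ? "" : " ") $1
  }
  END {
    print "highest observed UID:", max_uid
    print "number of superuser aliases:", roots + 0
    print "UID with longest name:", longest_name_uid
    print "line in file with longest name:", longest_name_line
    print "longest name:", longest_name
    print "names starting with r:", r_names
  }'
}

# Made-up sample records standing in for /etc/passwd.
passwd_summary <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
rtkit:x:111:117:RealtimeKit:/proc:/usr/sbin/nologin
gnome-initial-setup:x:123:65534::/run/gnome-initial-setup/:/bin/false
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
EOF
```

Each loop clause turns into a few lines of bookkeeping in the plain version, which is the verbosity the macro form avoids.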

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: expressive iteration with macros

 by: Kpop 2GM - Sat, 28 May 2022 11:42 UTC

so you implemented something resembling the functionality of SQL SELECT statement GROUP BY ?

Re: expressive iteration with macros

 by: Kaz Kylheku - Sat, 28 May 2022 16:43 UTC

On 2022-05-28, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
> so you implemented something resembling the functionality of SQL
> SELECT statement GROUP BY ?

No such thing appears in the example you replied to, so funny you should
mention it; but in fact I have a group_by function in the <array.h>
header, which is still undocumented.

https://www.kylheku.com/cgit/cppawk/tree/cppawk-include/array.h

I don't know SQL, but this is like the group-by function you
find in some dynamic programming languages.

Here is a quick demo. First, a background warmup. Let's write
an unconditional action which builds a list of cons cell pairs
made from fields $1 and $2, pushing them onto the lst variable:

./cppawk '
#include <cons.h>
#include <array.h>
{ push(cons($1, $2), lst) }
END { print sexp(lst) }'
a 1
a 2
a 3
b 1
a 4
c 2
c 3
[Ctrl-D][Enter]
(("c" . 3) ("c" . 2) ("a" . 4) ("b" . 1) ("a" . 3) ("a" . 2) ("a" . 1))

OK, now let's introduce group-by:

./cppawk '
#include <cons.h>
#include <array.h>
#include <fun.h>
{ push(cons($1, $2), lst) }
END { group_by(fun(car), lst, arr);
      for (i in arr) print i, sexp(arr[i]) }'
a 1
a 2
a 3
b 1
a 4
c 2
c 3
[Ctrl-D][Enter]
a (("a" . 4) ("a" . 3) ("a" . 2) ("a" . 1))
b (("b" . 1))
c (("c" . 3) ("c" . 2))

group_by has populated the array arr with keys a, b, c,
each one tied to a list of those cons pair items which
have that key.

group_by(fun(car), lst, arr) means: for each item x in lst,
apply the car function to extract the key k as if by k = car(x).
Then collect the item x into a list that is specific to k.
Each such collected list then appears as arr[k] in the array.
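
The same grouping idea can be sketched in plain POSIX awk, without cons cells or indirect functions. This is only an illustration, not how <array.h> implements group_by: the key extraction is hard-wired to field 1, each group is a space-separated string rather than a list, and the name group_by_first_field is made up.

```shell
# Hypothetical helper: groups "key value" lines from stdin by their first field.
group_by_first_field() {
  awk '
  {
    k = $1                # key extraction, playing the role of car(x)
    item = $1 ":" $2      # flat stand-in for the cons pair
    # Prepend, as push() does, so each group lists its newest item first.
    if (k in groups) groups[k] = item " " groups[k]
    else             groups[k] = item
  }
  END { for (k in groups) print k, groups[k] }' | sort
}

printf '%s\n' 'a 1' 'a 2' 'a 3' 'b 1' 'a 4' 'c 2' 'c 3' | group_by_first_field
```

The output is piped through sort because awk leaves the traversal order of for (k in groups) unspecified.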

Re: expressive iteration with macros

 by: Kpop 2GM - Sat, 28 May 2022 20:36 UTC

impressive library indeed. i took a look at your Git tree.

I guess i come from a completely different angle in terms of adding features to awk - i made mine to be

- all still at scripting level,
- make close to zero amount of external calls (other than benchmarking utility - only mawk2 gives me sub-second timestamps, the rest i need to go to gnu-date)

- same code base being able to all run from at least 4 variants of awk that i have (so it can't leverage any of the extra goodies from gawk, and i have to devise equivalent ones),

that includes having them self-identify which awk variant they're running on, fingerprinted entirely from intrinsic behavior of that awk that cannot be tricked by setting a variable (at the shell, in awk, or in a file) or by renaming the binary, and without relying on what ARGV[ 0 ] says

- need my functions to account for the nuisances and caveats of each awk variant, with a single unified function that can work around their unique weaknesses (like the stupid 2^31-1 limit of mawk 1.3.4), and

- regardless of locale setting, i designed my library such that byte-mode awks are fully unicode aware, while gawk in unicode mode can handle any arbitrary combination of unsafe bytes, and process them without incurring any warning messages (without needing to force suppress them) -

i even managed to have it stay entirely within gawk unicode mode, not make any external calls, and base64 decode out a mp3 file byte-for-byte, entirely from my own library's features.

Re: expressive iteration with macros

 by: Kaz Kylheku - Sun, 29 May 2022 01:17 UTC

On 2022-05-28, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
> I guess i come from a completely different angle in terms off adding on features to awk - i made mine to be
>
> - all still at scripting level,
> - make close to zero amount of external calls (other than benchmarking
> utility - only mawk2 gives me sub-second timestamps, the rest i need
> to go to gnu-date)

cppawk is a shell script, and calls the preprocessor and awk; but you
can capture the preprocessor output and then you just have an awk script
you can pass to awk; basically you can use it like a "compiler" to
produce a single "executable" out of one or more files and/or a
command-line program.

That would be a use case when preparing something for an embedded
system, where you might not want the preprocessor, or if you
don't want the preprocessing overhead each time you run some
frequently run program.

> - same code base being able to all run from at least 4 variants of awk
> that i have (so it can't leverage any of the extra goodies from gawk,
> and i have to devise equivalent ones),

Not only do I have that, but in cppawk you can test which Awk you're
using at preprocessing time:

#if __gawk__
...
#else
...
#endif

That is, #if can test which Awk you're running on. There are command
line options to tell cppawk which Awk to generate code for and execute.

One big exception to portability is Gawk indirect functions which
group_by depends on. So group_by will not be available if you don't
have Gawk.

Take a look at <case.h>; it provides a portable case statement syntax
that becomes switch if you have Gawk, or else portable code for other
Awks.

Indirect function stuff failing on mawk:

$ ./cppawk --awk=mawk '
#include <cons.h>
#include <array.h>
#include <fun.h>
{ push(cons($1, $2), lst) }
END { group_by(fun(car), lst, arr);
for (i in arr) print i, sexp(arr[i]) }'
In file included from ./cppawk-include/fun.h:32:0,
from <stdin>:4:
../cppawk-include/fun-priv.h:40:2: warning: #warning "<fun.h> requires an Awk with function indirection like newer GNU Awk" [-Wcpp]
#warning "<fun.h> requires an Awk with function indirection like newer GNU Awk"
^~~~~~~
mawk: /dev/fd/63: line 835: function group_by never defined

Things not requiring <fun.h> are good to go, though.

Re: expressive iteration with macros

 by: Kpop 2GM - Sun, 29 May 2022 12:26 UTC

> > - same code base being able to all run from at least 4 variants of awk
> > that i have (so it can't leverage any of the extra goodies from gawk,
> > and i have to devise equivalent ones),
> I not only have that, but you in cppawk you can test which Awk you're
> using at preprocessing time:
>
> #if __gawk__
> ...
> #else
> ...
> #endif
>
> can test using #if which Awk you're running on. There are command
> line options to tell cppawk which Awk to generate code for and execute.

i actually meant being able to tell whether gawk was invoked with the -c, -n, -P or -M flag, multiply all that by unicode-ness - without relying on looking at the invocation call, or peeking at "ps" output.

e.g. my system has

awk is /usr/local/bin/awk
awk is /usr/bin/awk
awk is /opt/homebrew/bin/awk

nawk is /usr/local/bin/nawk
gawk is /usr/local/bin/gawk
gawk is /opt/homebrew/bin/gawk

mawk is /usr/local/bin/mawk
mawk is /opt/homebrew/bin/mawk
mawk2 is /usr/local/bin/mawk2

#1 and #3 are aliases to gawk, but #2 is nawk, so i clearly can't rely on the binary name alone. nawk is close to pure junk at this point, its ONLY usefulness being a debugging interface like no other.

An example of where detection matters is measuring array length. Gawk -P mode (posix) disables length(array), so calling it like that errors out. So my library auto-detects gawk -P and routes it to counting with a for-loop, but since the looping method is very slow, i send every other invocation combo to plain length(array).
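
A minimal sketch of the for-loop fallback mentioned above, in POSIX awk. The names alen and count_demo are made up for illustration and are not taken from the library being described; the detection and routing logic is not shown.

```shell
# Hypothetical demo: counts array elements without calling length(array).
count_demo() {
  awk '
  # Portable element count: walks the array, so it also works where
  # length(array) is rejected. Extra parameters act as local variables.
  function alen(arr,    k, n) {
    n = 0
    for (k in arr) n++
    return n
  }
  BEGIN {
    split("alpha beta gamma delta", words, " ")
    print alen(words)     # four elements after the split
    delete words[2]
    print alen(words)     # three elements after deleting one
  }'
}

count_demo
```

The loop visits every element, so its cost grows with the array size, which fits the remark that this route is only worth taking where length(array) is unavailable.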

to detect just gawk -P, i came up with this strange test that only hits for gawk -P and no other

function _testawk_util8(){ # only gawk -P
return \
("x\4"<"\x4")
}

For now, i could detect these splits properly :

## gawk -e    |- 06    gawk -ne   |- 01    gawk -nMbe  |- 93
## gawk -be   |- 90    gawk -nbe  |- 85    mawk1 --    |- 29
## gawk -ce   |- 49    gawk -Me   |- 76    mawk2 --    |- 21
## gawk -cbe  |- 33    gawk -Mbe  |- 98    nawk[UTF8]  |- 12
## gawk -Pe   |- 39    gawk -nMe  |- 09    nawk[byte]  |- 11

i can't get the god damn mpfr extension to compile properly on M1, c'est la vie

Re: expressive iteration with macros

 by: Kpop 2GM - Sun, 29 May 2022 15:04 UTC

@Kaz : here's a quick illustration of what my library does (among other things) - this was run on mawk2, which is completely unicode-blind in its own right :

- it could map hangul to latin letter syllables
- calculate CRC32 on it
- URL encoding, base64 encoding, and dump out the byte composition in octal
- emulate "xxd -ps" with a pure hex dump
- make it split $0 into individual UTF-8 characters (for something that's not UTF-8 aware)
- and show a fully decomposed view of them (per UTF-8 NFC/NFD setup) :

mawk2x 'BEGIN { OFS="\f" } { print NF, $0, $1, $NF, hangulk2e($NF); print crc32($0), urlencode($0), base64enc($0); print xxdps($0); print strdump($0) } { split0uc(); OFS="\f"; print NF, $0; NF=NF; print } { for(_^=_<_;_<=NF;_++) { print _,$_, ordC($_), decomposeHangul($_) } }' <<<'오복녀'

1 오복녀
오복녀
오복녀
오복녀=Oh Bok-Nyeo=Oh Bok-Nyeo 오복녀crc32#0x7A39C92A
%EC%98%A4%EB%B3%B5%EB%85%80
7Jik67O164WA
ec98a4ebb3b5eb8580
\354\230\244\353\263\265\353\205\200
3 오복녀



1 오
50724
U+C624:오:50724:S:6692:ᄋ:L:11:ᅩ:V:8:~_:_O_:~_:_sLV:T:0
2 복
48373
U+BCF5:복:48373:S:4341:ᄇ:L:7:ᅩ:V:8:B_:_O_:K_:ᆨ:T:1
3 녀
45376
U+B140:녀:45376:S:1344:ᄂ:L:2:ᅧ:V:6:N_:YEO:~_:_sLV:T:0

Another different chunk of my library pertains to my own big int functions instead of relying on GMP, something like this

echo 127 | mawk2x '{ timerF(!(_______=___=__=$NF)); for(____=_^=_<_;_<11;) {__=pow(__,_+=1); print timerF(2,1),sprintf("%s ^ %6.f",_______,____*=_),(___=(___~"\\^")? (___)(" * ")_:___"^("_)")",anylog(__,2,14) } }'

0.0000260=127 ^ 2=127^(2)=13.97736937354433
0.0000600=127 ^ 6=127^(2 * 3)=41.93210812063299
0.0000910=127 ^ 24=127^(2 * 3 * 4)=167.72843248253198
0.0002110=127 ^ 120=127^(2 * 3 * 4 * 5)=838.64216241265990
0.0025120=127 ^ 720=127^(2 * 3 * 4 * 5 * 6)=5031.85297447595985
0.1265000=127 ^ 5040=127^(2 * 3 * 4 * 5 * 6 * 7)=35222.97082133172079
3.5375310=127 ^ 40320=127^(2 * 3 * 4 * 5 * 6 * 7 * 8)=281783.76657065376639

The slowness only starts to show once log2 of the result is 125K+
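
To illustrate the general idea of big-integer arithmetic at script level without GMP, here is a minimal sketch of schoolbook multiplication on decimal strings in POSIX awk. It is not the library shown above: the name bigmul is made up, and it assumes non-negative integers written as plain digit strings.

```shell
# Hypothetical helper: multiplies two non-negative decimal integers of any length.
bigmul() {
  awk -v a="$1" -v b="$2" '
  BEGIN {
    la = length(a); lb = length(b)
    # digit[i] accumulates the coefficient of 10^(i-1) in the product.
    for (i = 1; i <= la; i++) {
      da = substr(a, la - i + 1, 1)
      for (j = 1; j <= lb; j++)
        digit[i + j - 1] += da * substr(b, lb - j + 1, 1)
    }
    # Propagate carries from the least significant position upward.
    n = la + lb
    carry = 0
    for (k = 1; k <= n; k++) {
      t = digit[k] + carry
      digit[k] = t % 10
      carry = int(t / 10)
    }
    # Emit most significant digit first, then drop leading zeros.
    out = ""
    for (k = n; k >= 1; k--) out = out digit[k]
    sub(/^0+/, "", out)
    print (out == "" ? "0" : out)
  }'
}

bigmul 127 127                                    # 127^2
bigmul "$(bigmul 127 127)" "$(bigmul 127 127)"    # 127^4
```

Every intermediate value stays small, so no step depends on awk's floating-point precision; the price is quadratic time in the number of digits, which is one reason large exponents get slow at script level.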

Re: expressive iteration with macros

 by: Kaz Kylheku - Sun, 29 May 2022 16:16 UTC

On 2022-05-29, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
>
>> > - same code base being able to all run from at least 4 variants of awk
>> > that i have (so it can't leverage any of the extra goodies from gawk,
>> > and i have to devise equivalent ones),
>> I not only have that, but you in cppawk you can test which Awk you're
>> using at preprocessing time:
>>
>> #if __gawk__
>> ...
>> #else
>> ...
>> #endif
>>
>> can test using #if which Awk you're running on. There are command
>> line options to tell cppawk which Awk to generate code for and execute.
>
> i actually meant it as being able to tell whether gawk was invoked
> with -c flag or -n flag or -P flag or -M flag , multiply all that by

I could add support for that in cppawk. It parses all the options in
order to support a few cpp options, and a couple of its own. The
rest are passed to awk. I could have it recognize -c being passed
to gawk, to set some preprocessor symbol.

> unicode-ness - without relying on looking at the invocation call, or
> peek at "ps" output.

The advantage of having a preprocessing layer is that it may be
in many situations acceptable that the output of preprocessing just
*assumes* it is running on a certain brand of awk, of a certain
version, invoked in a certain way. Then you don't have any run-time
detection and switching overheads in the code.

I suspect that in many cases, the user of a portable Awk library
is actually just using one specific awk, and doesn't care whether
the preprocessed output works with other awks.

Or else if they do care about their code running on other awks also,
many users may be accepting of the limitations of doing it statically:
being able to generate efficient code that is tuned to a particular awk,
or else inefficient code that works with more awks, rather than one body
of code which switches at run-time.

Re: expressive iteration with macros

 by: Kpop 2GM - Mon, 30 May 2022 20:09 UTC

there's only one single user of that "portable library" - me

i wrote it for myself only.

it's not properly documented, it's not fully debugged, and as much as i tried to give it a best shot, i still have a handful of locations where i couldn't figure out the place i originally got the idea from, and give proper credit, so i refrained from sharing it in full for the sake of propriety

previously i wrote a small library specific to gawk features (mostly the true multi-dimensional array and unicode bits), and their built-in sorting.

so when i rewrote the entire library to account for all those awks, it took some creative thinking on how to circumvent it, but it's totally worth the effort, cuz now my mawks can perform unicode substring-ing sometimes even faster than gawk does

i also have no use-case at all for something like arabic, so i wouldn't be wasting time implementing those right to left text in unicode

but i *do* now have a module that allows typing in the 2 letter country code of any nation, and get back its emoji flag

but if you type the codes for russia or belarus, it force overrides it and outputs the Ukrainian flag instead =p

Re: expressive iteration with macros

 by: Kpop 2GM - Thu, 4 Aug 2022 08:02 UTC

@Kaz : here's what I meant by awk variant tester - the objective is simple -

get as many of them as possible to print out a different value for the exact same function call, then use those differences to create indicator flags, so the code can properly account for each variant's behavior.

The example isn't perfect because some variants still print out the same values, but it already covers a wide swath :

Using this code, one could automate their testing on multiple awk variants, with a built-in 5-minute auto-timeout for each variant if the testing still hasn't finished by then.
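For readers who want the automation without the rest of the pipeline, here is a stripped-down sketch of the same idea. It assumes `timeout` from GNU coreutils (or a compatible one) is installed; the function name run_variants and the sample program are made up for illustration. Unlike the one-liner in this post, the timeout here wraps the awk invocation itself.

```shell
# Hypothetical helper: run one awk program under every interpreter named
# on the command line, skipping the ones that are not installed.
run_variants() {
  prog=$1; shift
  for awk_bin in "$@"; do
    if ! command -v "$awk_bin" >/dev/null 2>&1; then
      printf ' %-6s :: (not installed)\n' "$awk_bin"
      continue
    fi
    # 300 seconds matches the 5-minute limit described above.
    out=$(timeout 300 "$awk_bin" "$prog" </dev/null 2>&1) || out="$out (exit $?)"
    printf ' %-6s :: %s\n' "$awk_bin" "$out"
  done
}

# Sample probe: does length() count bytes or characters?
run_variants 'BEGIN { print length("\333\222") }' gawk nawk mawk mawk2
```

Per-variant flag sets, like the gawk option list in the one-liner, could be layered on with an inner loop.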

The first test, the leading 0/1 in each result, is true only when GMP is invoked (the gawk runs with M in their flags), because gawk then prints "-nan" where everything else prints "+nan" or "nan".
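The first test can be pulled out and run on its own. A minimal sketch, assuming an awk is reachable as `awk` (or passed as the first argument); the function name nan_probe is made up for illustration. It prints both NaN strings and then 1 if they differ, 0 if not. The result depends on the awk and the platform, so no expected output is shown.

```shell
# Hypothetical helper: show how an awk prints the two NaNs from the first test.
nan_probe() {
  "${1:-awk}" 'BEGIN {
    inf = -log(0)               # log(0) is -inf in IEEE arithmetic, so inf is +inf
    a = toupper(-inf / inf)     # NaN from the negated numerator
    b = toupper(inf / inf)      # NaN from the plain quotient
    printf "%s %s %d\n", a, b, (a != b)
  }'
}

nan_probe
```

Passing a different interpreter, for example `nan_probe mawk`, allows a side-by-side comparison.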

The second test folds in many of the variants' individual quirks, each of which changes the exponent applied to a base of 127.
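A more readable sketch of the same idea, using three of the probes that appear in the exponent and packing them into a small bitmask instead of a power of 127. This is a simplification for illustration, not the code from this post; the function name probe_flags and the bit assignments are made up. The value printed depends on the awk and the locale, so no expected output is shown.

```shell
# Hypothetical helper: collect a few awk behavior differences as bit flags.
probe_flags() {
  "${1:-awk}" 'BEGIN {
    s = "\333\222"                             # same two-byte string as in the post
    flags = 0
    if (length(s) == 2)          flags += 1    # length() counted bytes, not characters
    if (("0x10" + 0) == 16)      flags += 2    # the string "0x10" converted as hexadecimal
    if (sprintf("%u", 3E10) % 2) flags += 4    # %u of 3E10 came back odd, so not 30000000000
    print flags
  }'
}

probe_flags
```

A bitmask can be tested with integer arithmetic in any awk, while the power-of-127 form packs the same information into one printable number.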

============================
cmd='function ____(_,__,___) { __="\333\222"; ___=(toupper(-(_=-log(_<_))/_) != toupper(_/_)); ___=___ "......" sprintf("%c%c%c",12,8,8); ___=(___)\
((_=(_+=_^=_=_<_)+(++_+--_)^++_)^((_%100)+length(__)+(0x4)+8*(sprintf("%u",3E10)%2)+("0x10")+32*("x\4"<"\x4")+64*(__~"[^"(__)"]"))); return ___ } BEGIN { print ____() }'; echo "\n\n code tested ::\n\n$( gawk -o- -e "${cmd}" | mawk 'sub("^",(_)_)^_+gsub(/\14/,"\\f")+gsub(/\13/,"\\v")+gsub(/\11/,"\\t")+gsub(/\10/,"\\b")+gsub(/\7/,"\\a")+gsub(/\15/,"\\r")+gsub(/\33/,"\\33")+gsub(/\34/,"\\34")+gsub(/\177/,"\\177")+gsub(/\0/,"\\0")' FS='^$' \_=' ' | mawk 3 FS='^$' RS='(\n )+\n' ORS='\n\n' | gsed -zE 's/ ([+<*/^>=%-][=]|[/^*=~]) /\1/g' | mawk 'gsub(/\t/,(_=" ")_ (_)_)+gsub((_)_," \140")^_+gsub(/\t/,"\140 \140 ")' ORS= RS='^$' FS='^$' )\n"; for idx in 1 ; do for awk0 in gawk nawk mawk mawk2 ; do for flg in $( <<< "${awk0}" mawk '{ print (($_)=="gawk") ? "te Mte e b Se Sbe ce cbe ne nbe Me Mbe nMe nMbe Pe MPe " : "-" }' ); do timeout --foreground 300 printf ' %-6s -%-5s :: %s\n' "${awk0}" "${flg}" "$( eval " \"\${awk0}\" -\"\${flg}\" \"\${cmd}\" " )" ; done ; done ; done | lgp3 4 | gcat -b | mawk 'gsub(/\t/,(_=" ")_ (_)_)+gsub((_)_," \140")^_' RS='^$' FS='^$'

code tested ::

` `BEGIN {
` ` print ____()
` `}

` `function ____(_, __, ___)
` `{
` ` __="\333\222"
` ` ___=(toupper(-(_=-log(_ < _))/_) != toupper(_/_))
` ` ___=___ "......" sprintf("%c%c%c", 12, 8, 8)
` ` ___=(___) ((_=(_+=_^=_=_ < _) + (++_ + --_)^++_)^((_ % 100) + length(__) + (0x4) + 8*(sprintf("%u", 3E10) % 2) + ("0x10") + 32*("x\4" < "\x4") + 64*(__~("[^" (__) "]"))))
` ` return ___
` `}

gawk: cmd. line:1: warning: `function' is not supported in old awk
gawk: cmd. line:1: warning: `toupper' is not supported in old awk
gawk: cmd. line:2: warning: operator `^=' is not supported in old awk
gawk: cmd. line:2: warning: operator `^' is not supported in old awk
gawk: cmd. line:2: warning: `return' is not supported in old awk
gawk: cmd. line:1: warning: `function' is not supported in old awk
gawk: cmd. line:1: warning: `toupper' is not supported in old awk
gawk: cmd. line:2: warning: operator `^=' is not supported in old awk
gawk: cmd. line:2: warning: operator `^' is not supported in old awk
gawk: cmd. line:2: warning: `return' is not supported in old awk

` ` 1 ` ` gawk ` -te ` `:: 0......
20975825942850833350709513021069607858612813027289575844399267446784
` ` 2 ` ` gawk ` -Mte ` :: 1......
20975825942850835435487946371038619427046538071435595461971578449921
` ` 3 ` ` gawk ` -e ` ` :: 0......
20975825942850833350709513021069607858612813027289575844399267446784
` ` 4 ` ` gawk ` -b ` ` :: 0......
2663929894742055667923408371469246315099621159900709173939115476910080

` ` 5 ` ` gawk ` -Se ` `:: 0......
20975825942850833350709513021069607858612813027289575844399267446784
` ` 6 ` ` gawk ` -Sbe ` :: 0......
2663929894742055667923408371469246315099621159900709173939115476910080
` ` 7 ` ` gawk ` -ce ` `:: 0......
80631397449585884603480471012312324983606434321474214952960
` ` 8 ` ` gawk ` -cbe ` :: 0......
10240187476097406396860348881012181757649990571271861294858240

` ` 9 ` ` gawk ` -ne ` `:: 0......
96067968254367458374461558004074452761005365968933701617309045473364966403804172795092679764568178688
` `10 ` ` gawk ` -nbe ` :: 0......
12200631968304669482593883986169010334579188647924663063729436788016311555400024520016975449783684562944
` `11 ` ` gawk ` -Me ` `:: 1......
20975825942850835435487946371038619427046538071435595461971578449921
` `12 ` ` gawk ` -Mbe ` :: 1......
2663929894742056100306969189121904667234910335072320623670390463139967

` `13 ` ` gawk ` -nMe ` :: 1......
96067968254367481276365696967779773877012643204635128378812984311724350641524683124538450810649569281
` `14 ` ` gawk ` -nMbe `:: 1......
12200631968304670122098443514908031282380605686988661304109249007588992531473634756816383252952495298687
` `15 ` ` gawk ` -Pe ` `:: 0......
7746094530492103116603130830240466614297156825794967015584458761805042780817132149091242541651768498124851548434645116915842513649999335390349056080566380658688
` `16 ` ` gawk ` -MPe ` :: 1......
1691310158431340276446001387259617718107722316775496316977546955395900025960587599577785437677679318686381164900905921415897601

` `17 ` ` nawk ` -- ` ` :: 0......
4.68994168836430963037851433017e+94
` `18 ` ` mawk ` -- ` ` :: 0......
3.17393e+111
` `19 ` ` mawk2 `-- ` ` :: 0......
4.50553e+195
