Generic transformations of arbitrary data entities

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Generic transformations of arbitrary data entities
Date: Thu, 12 Oct 2023 14:04:56 +0200
Message-ID: <ug8nd9$2fv55$1@dont-email.me>

In a recent thread I posted an Awk code pattern that picks out words
matching a pattern and conditionally transforms them; it relied only
on POSIX Awk features. It is, however, a generally usable code
pattern: with standard Awk you can substitute the entity pattern and
the function that transforms the matched data entities as necessary.
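
A minimal POSIX-only sketch of that idea, for comparison with the GNU Awk
version. The function name highlight_entities is made up for illustration
and this is not the code from the earlier thread; the pattern is passed as
a string (a dynamic regexp) and the transformation is hard-wired.

```awk
# POSIX-only sketch (hypothetical name, not the code from the other thread).
# The pattern is a string used as a dynamic regexp; the transformation is
# hard-wired because POSIX Awk has no indirect function calls.
# Assumes the pattern never matches the empty string; otherwise the loop
# would not advance.
function highlight_entities(line, pattern,    out)
{
    while (match(line, pattern)) {
        # copy the text before the match unchanged, then the wrapped match
        out = out substr(line, 1, RSTART-1)
        out = out "\033[7m" substr(line, RSTART, RLENGTH) "\033[0m"
        # continue with the remainder of the line
        line = substr(line, RSTART+RLENGTH)
    }
    return out line
}
```

A rule such as { print highlight_entities($0, "[[:alpha:]]+") } would then
highlight every word of each input line. Changing the transformation means
editing the function body, which is the limitation the GNU Awk features
remove.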

GNU Awk supports a couple of newer features that make this generalization
more explicit: first-class patterns (strongly typed regexp constants)
and indirect function calls.

# generic function to transform specified data entities
function trent (line, pattern, transform,    out)
{
    for (line=$0; match(line, pattern);
                  line=substr(line, RSTART+RLENGTH))
    {
        out = out substr(line, 1, RSTART-1) \
                  @transform(substr(line, RSTART, RLENGTH))
    }
    out = out line
    return out
}

With a transformation function like

function highlight (str)
{
    return "\033[7m" str "\033[0m"
}

a sample usage can be

BEGIN { words = @/[[:alpha:]]+/ }
{
    print trent($0, words, "highlight")
}

Applied to the task from the other thread you can provide

function isogram_highlight (str)
{
    return (isogram(str) ? "\033[7m" str "\033[0m" : str)
}

using Mike's (only slightly changed by me) isogram() algorithm

function isogram(str,    c, x, y) {
    y = length(str)
    for (x = 1; x < y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c)) return 0
    }
    return 1
}

in a context like

BEGIN { words = @/[[:alpha:]]+/ }
{
    print trent($0, words, "highlight")
    print trent($0, words, "isogram_highlight")
}

Note again that this solution, based on a generalized algorithm,
uses GNU Awk-specific features and does not conform to POSIX!

Janis

Re: Generic transformations of arbitrary data entities

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: Generic transformations of arbitrary data entities
Date: Thu, 12 Oct 2023 18:23:58 +0200
Message-ID: <ug96iv$2jd1g$1@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me>

On 12.10.2023 14:04, Janis Papanagnou wrote:
> [...]
>
> # generic function to transform specified data entities
> function trent (line, pattern, transform, out)
> {
> for (line=$0; match(line, pattern);

The line=$0 assignment was a remnant of an earlier version. You
don't want it here, since 'line' is passed as a function parameter.
So make that just

for ( ; match(line, pattern);

> line=substr(line, RSTART+RLENGTH))
> {
> out = out substr(line, 1, RSTART-1) \
> @transform(substr(line, RSTART, RLENGTH))
> }
> out = out line
> return out
> }
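
Put together, the function with this correction applied could read as
follows. This is a sketch assembled from the quoted code and the fix
above; it needs GNU Awk (indirect calls via @transform and a typed
regexp passed as the pattern argument). The highlight() helper is
repeated only to keep the sketch self-contained.

```awk
# trent() with the correction applied: 'line' comes from the argument,
# not from $0. Requires GNU Awk.
function trent(line, pattern, transform,    out)
{
    for ( ; match(line, pattern); line = substr(line, RSTART+RLENGTH))
    {
        # text before the match is kept, the match itself is transformed
        out = out substr(line, 1, RSTART-1) \
                  @transform(substr(line, RSTART, RLENGTH))
    }
    return out line
}

# transformation used in the thread: wrap str in reverse-video escapes
function highlight(str)
{
    return "\033[7m" str "\033[0m"
}
```

With the stray assignment gone, trent() also works on strings other
than the current record, e.g. trent($2, words, "highlight").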
> [...]

Re: Generic transformations of arbitrary data entities

From: porkchop@invalid.foo (Mike Sanders)
Newsgroups: comp.lang.awk
Subject: Re: Generic transformations of arbitrary data entities
Date: Thu, 12 Oct 2023 19:00:13 -0000 (UTC)
Message-ID: <ug9fnt$2lcu4$2@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me>

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

> [...]

Good stuff. Adding this to my notes in fact. I really was hoping
others would see some value in using hilite(). It's handy on my end too.

--
:wq
Mike Sanders

Re: Generic transformations of arbitrary data entities

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: Generic transformations of arbitrary data entities
Date: Fri, 13 Oct 2023 09:33:05 +0200
Message-ID: <ugarri$3225c$1@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me> <ug9fnt$2lcu4$2@dont-email.me>

On 12.10.2023 21:00, Mike Sanders wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> [...]
>>
>> BEGIN { words = @/[[:alpha:]]+/ }
>> {
>> print trent($0, words, "highlight")
>> print trent($0, words, "isogram_highlight")
>> }
>>
>>
>> Note again that this solution based on a generalized algorithm
>> uses GNU Awk specific features and is not conforming to POSIX!
>
> Good stuff. Adding this to my notes in fact. I really was hoping
> others would see some value in using hilite(). It's handy on my end too.

I use ANSI escapes from time to time, and did so just recently,
e.g. for coloring.

But my point here was more the generalization. The task of changing
some entities on a line while preserving the spacing, delimiters,
and other information is quite common. I have used it a couple of
times and always reprogrammed the two-line loop with a different
pattern for each transformation. That's why I think that GNU Awk's
features - too bad you cannot use them! - are valuable; they can
emulate quite nicely what other languages do with real function
arguments.

I expanded my test program[*] with a few more simple applications,
which leads to

BEGIN {
    ...
    words = @/[[:alpha:]]+/
    numbers = @/[[:digit:]]+/
    names = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
} {
    print trent($0, words, "highlight")
    print trent($0, words, "isogram_highlight")
    print trent($0, numbers, "black_out")
    print trent($0, names, "black_out")
    print trent($0, names, "anonymize")
}

This just demonstrates the point through possible combinations of
patterns (which can of course easily be refined) and functions
(identified by their names).
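
The bodies of black_out and anonymize are not shown in this post. As a
rough sketch of what such transformation functions could look like, here
are two hypothetical stand-ins; they are assumptions for illustration,
not the versions from the linked test program, and they use only
POSIX Awk.

```awk
# Hypothetical stand-ins; the functions in the linked test program may differ.

# Replace every character of str by a filler character, keeping the length.
function black_out(str,    n, out)
{
    for (n = length(str); n > 0; n--)
        out = out "#"
    return out
}

# Reduce a name to its first character followed by a period.
function anonymize(str)
{
    return substr(str, 1, 1) "."
}
```

Because trent() receives the function name as a string, either function
can be swapped in without touching the loop itself.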

Janis

[*] Extended test program: volatile.gridbug.de/transform_words

Re: Generic transformations of arbitrary data entities

From: porkchop@invalid.foo (Mike Sanders)
Newsgroups: comp.lang.awk
Subject: Re: Generic transformations of arbitrary data entities
Date: Sat, 14 Oct 2023 00:39:14 -0000 (UTC)
Message-ID: <ugcnvh$3g1d6$2@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me> <ug9fnt$2lcu4$2@dont-email.me> <ugarri$3225c$1@dont-email.me>

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

> I use ANSI escapes from time to time, and did so just recently,
> e.g. for coloring.

Myself as well...

<https://drive.google.com/file/d/1tf_X3U3TwJQz67z3gdFBSZo2oKW2vcao/view>

> But my point here was more the generalization. The task of changing
> some entities on a line while preserving the spacing, delimiters,
> and other information is quite common. I have used it a couple of
> times and always reprogrammed the two-line loop with a different
> pattern for each transformation. That's why I think that GNU Awk's
> features - too bad you cannot use them! - are valuable; they can
> emulate quite nicely what other languages do with real function
> arguments.

I hope to soon =) For a while longer, though, I can't.

> I expanded my test program[*] with a few more simple applications,
> which leads to
>
> BEGIN {
> ...
> words = @/[[:alpha:]]+/
> numbers = @/[[:digit:]]+/
> names = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
> }

That is so cool!

> [*] Extended test program: volatile.gridbug.de/transform_words

Will you have an index page of your projects/snippets
in the future Janis?

--
:wq
Mike Sanders

[meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: [meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)
Date: Sat, 14 Oct 2023 13:17:42 +0200
Message-ID: <ugdtcn$3r0ut$1@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me> <ug9fnt$2lcu4$2@dont-email.me> <ugarri$3225c$1@dont-email.me> <ugcnvh$3g1d6$2@dont-email.me>

On 14.10.2023 02:39, Mike Sanders wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>
>> [*] Extended test program: volatile.gridbug.de/transform_words
>
> Will you have an index page of your projects/snippets
> in the future Janis?

Unfortunately(?), no. - I've never[*] started to systematically publish
any code (and I don't intend to do so). My approach has been discussion
in Usenet, sharing knowledge, and sharing code only on demand or where
it supports the topics under discussion. There's also too much stuff
that has accumulated over the decades; it would require quite some
effort to provide it in a form of sufficient quality. My view was that
anything useful that I posted could eventually be retrieved using some
search engine[**]. The ideas (those that are worth it) and insights can
still spread (or be forgotten). For me it's "Open Ideas", something like
Open Source for non-code contributions. Occasionally I drop some code
on gridbug.de ('volatile' for stuff I might delete, 'random' for stuff
that might stay available), but that's just a small fraction of the
stuff I have on my disks. These two sub-domains thus have no index
page[***]; their content is tied to a post (or an email), but links
previously posted in Usenet might still lead to the information.

For the purpose of my previous post the code for the sample functions
was unnecessary, but I wanted to provide it as an "amendment" for folks
who want to see some complete and runnable code.

Feel free to ask if you need something specific.

Janis

[*] "never" = only rarely, or only in specific cases.

[**] Sadly whenever I now try to find some older stuff I often cannot
find it any more (using Google).

[***] Other sub-domains for specific topics do have an organized form
with an index.

Re: [meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)

From: porkchop@invalid.foo (Mike Sanders)
Newsgroups: comp.lang.awk
Subject: Re: [meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)
Date: Mon, 16 Oct 2023 18:49:23 -0000 (UTC)
Message-ID: <ugk0ji$1jtts$1@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me> <ug9fnt$2lcu4$2@dont-email.me> <ugarri$3225c$1@dont-email.me> <ugcnvh$3g1d6$2@dont-email.me> <ugdtcn$3r0ut$1@dont-email.me>

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

> Unfortunately(?), no...

No? I say 'yes'. Much to read/learn...

Me? I think I will in fact. Index by the end of the week
and lots of interesting (at least to me) items on the way.

You only live once Janis, I hope someday you'll reconsider
for the benefit of others =)

--
:wq
Mike Sanders

Re: [meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: [meta] Index page for projects/snippets (was Re: Generic transformations of arbitrary data entities)
Date: Mon, 16 Oct 2023 21:38:40 +0200
Message-ID: <ugk3g2$1rdhi$1@dont-email.me>
References: <ug8nd9$2fv55$1@dont-email.me> <ug9fnt$2lcu4$2@dont-email.me> <ugarri$3225c$1@dont-email.me> <ugcnvh$3g1d6$2@dont-email.me> <ugdtcn$3r0ut$1@dont-email.me> <ugk0ji$1jtts$1@dont-email.me>

On 16.10.2023 20:49, Mike Sanders wrote:
>
> You only live once Janis, I hope someday you'll reconsider
> for the benefit of others =)

For the benefit of others, spread the word... - with or without
an index. :-)

I promise I will reconsider it in my next life! ;-)

Janis

Re: Generic transformations of arbitrary data entities

From: jason.cy.kwan@gmail.com (Kpop 2GM)
Newsgroups: comp.lang.awk
Subject: Re: Generic transformations of arbitrary data entities
Date: Fri, 27 Oct 2023 12:37:56 -0700 (PDT)
Message-ID: <2e1284b3-700e-48f4-acf8-cc57b8f293bdn@googlegroups.com>
References: <ug8nd9$2fv55$1@dont-email.me>

On Thursday, October 12, 2023 at 8:05:01 AM UTC-4, Janis Papanagnou wrote:
> [...]
>
> using Mike's (only slightly changed by me) isogram() algorithm
>
> function isogram(str, c, x, y) {
> y = length(str)
> for (x = 1; x < y; x++) {
> c = substr(str, x, 1)
> if (index(substr(str, x + 1), c)) return 0
> }
> return 1
> }
>
> [...]

Hmm... a heterogram is a string in which the number of unique chars equals the string length, but isogram technically just means that all chars within it show up at the same frequency.

For example, "DODO" is an isogram, but the function above returns FALSE (0). The code below should reconcile the differing test cases. The updated function adds 2 rapid exit criteria: (a) the input string is empty or only 1 character long, or (b) the total input length isn't an integer multiple of the number of copies of the leftmost character. From there on, the frequency count returned by each subsequent gsub(…) must match that of the leftmost char.

 #  input         orig  new
 1  FRR             0    0
 2  DODO            0    1  <-----
 3  ECBFADEDCFAB    0    1  <-----
 4  KWNAWKAN        0    1  <-----
 5  BAIDU           1    1
 6  BLACKHORSE      1    1
 7  DUBAI           1    1
 8  DUMBWAITER      1    1
 9  ISOGRAM         1    1
10  PATHFINDER      1    1

=====================================
function isogram_new(__, _, ___) {

    if ( ! ((_ = (___ = length(__)) <= !!___) ||
            ___ % (___ = gsub(substr(__, ++_, _--), "", __))))

        for (_++; __; )
            ___ == gsub(substr(__, _, _), "", __) || _ *= __ = ""

    return _
}
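
For readers who find the compact version hard to follow, here is a
readable sketch of the same equal-frequency test. It is not the poster's
code: the name isogram_freq and the counting-array approach are
assumptions chosen for illustration, and it uses only POSIX Awk.

```awk
# Readable sketch of the "equal frequency" definition (hypothetical name).
# Returns 1 if every distinct character occurs equally often, else 0.
# Empty and one-character strings count as isograms, as in the table above.
function isogram_freq(str,    i, n, c, count, expected)
{
    n = length(str)
    # tally how often each character occurs
    for (i = 1; i <= n; i++)
        count[substr(str, i, 1)]++
    # every tally must equal the first one seen
    for (c in count) {
        if (!expected)
            expected = count[c]
        else if (count[c] != expected)
            return 0
    }
    return 1
}
```

Since this version counts characters in an array rather than passing
them to gsub() as a regexp, characters such as "." or "*" in the input
are compared literally.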

— The 4Chan Teller
