Rocksolid Light

Welcome to RetroBBS

mail  files  register  newsreader  groups  login

Message-ID:  

Somebody's terminal is dropping bits. I found a pile of them over in the corner.


devel / comp.lang.awk / djb2 hash using awk

SubjectAuthor
* djb2 hash using awkMichael Sanders
+* Re: djb2 hash using awkJanis Papanagnou
|+- Re: djb2 hash using awkJanis Papanagnou
|+* Re: djb2 hash using awkMichael Sanders
||`- Re: djb2 hash using awkJanis Papanagnou
|`* Re: djb2 hash using awkMichael Sanders
| `* Re: djb2 hash using awkJanis Papanagnou
|  `* Re: djb2 hash using awkManuel Collado
|   `* Re: djb2 hash using awkMichael Sanders
|    `- Re: djb2 hash using awkBen Bacarisse
+* Re: djb2 hash using awkManuel Collado
|`- Re: djb2 hash using awkMichael Sanders
+* Re: djb2 hash using awkBen Bacarisse
|`* Re: djb2 hash using awkMichael Sanders
| +* Re: djb2 hash using awkJanis Papanagnou
| |+* Re: djb2 hash using awkMichael Sanders
| ||`- Re: djb2 hash using awkJanis Papanagnou
| |`- Re: djb2 hash using awkJanis Papanagnou
| `* Re: djb2 hash using awkBen Bacarisse
|  `* Re: djb2 hash using awkMichael Sanders
|   +* Re: djb2 hash using awkJanis Papanagnou
|   |`* Re: djb2 hash using awkKenny McCormack
|   | `- Re: djb2 hash using awkJanis Papanagnou
|   `* Re: djb2 hash using awkBen Bacarisse
|    `* Re: djb2 hash using awkMichael Sanders
|     `* Re: djb2 hash using awkBen Bacarisse
|      `* Re: djb2 hash using awkMichael Sanders
|       `* Re: djb2 hash using awkBen Bacarisse
|        `- Re: djb2 hash using awkMike Sanders
+- Re: djb2 hash using awkMichael Sanders
+- Re: djb2 hash using awkMichael Sanders
`* [Complete] Re: djb2 hash using awkMike Sanders
 `* Re: [Complete] Re: djb2 hash using awkKeith Thompson
  +* Re: [Complete] Re: djb2 hash using awkMike Sanders
  |`* Re: [Complete] Re: djb2 hash using awkKeith Thompson
  | `* Re: [Complete] Re: djb2 hash using awkMike Sanders
  |  `* Re: [Complete] Re: djb2 hash using awkKeith Thompson
  |   `* Re: [Complete] Re: djb2 hash using awkMike Sanders
  |    `- Re: [Complete] Re: djb2 hash using awkMike Sanders
  `* Re: [Complete] Re: djb2 hash using awkJanis Papanagnou
   +- Re: [Complete] Re: djb2 hash using awkMike Sanders
   `- Re: [Complete] Re: djb2 hash using awkMichael Sanders

Pages:12
djb2 hash using awk

<69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1521&group=comp.lang.awk#1521

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:622a:c7:b0:40c:8ba5:33e7 with SMTP id p7-20020a05622a00c700b0040c8ba533e7mr4731qtw.1.1695412209807;
Fri, 22 Sep 2023 12:50:09 -0700 (PDT)
X-Received: by 2002:a05:6808:189e:b0:3ae:1799:9a50 with SMTP id
bi30-20020a056808189e00b003ae17999a50mr355853oib.11.1695412209551; Fri, 22
Sep 2023 12:50:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Fri, 22 Sep 2023 12:50:09 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
Subject: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Fri, 22 Sep 2023 19:50:09 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael Sanders - Fri, 22 Sep 2023 19:50 UTC

Greetings folks, just sharing the knowledge (& test posting using google groups).

Beware wordwrap...

function djb2(str, hash, i, char) {

# initialize hash to an arbitrary prime number
hash = 5381
n = length(str)

# iterate over characters and compute hash
for(i = 1; i <= n; i++) {
char = substr(str, i, 1)
# modulo simulates 32-bit unsigned integer
hash = (hash * 33 + ord(char)) % 2147483647
}

return hash
}

function ord(char) {

return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))

}

BEGIN {

for(i=32;i<=126;i++) ord[sprintf("%c", i)] = i

print djb2("my key is...")

}

notes:

https://stackoverflow.com/questions/1579721/why-are-5381-and-33-so-important-in-the-djb2-algorithm

https://stackoverflow.com/questions/10696223/reason-for-the-number-5381-in-the-djb-hash-function

https://groups.google.com/g/comp.lang.c/c/lSKWXiuNOAk

https://en.wikipedia.org/wiki/Daniel_J._Bernstein

Re: djb2 hash using awk

<uem8ij$nn2f$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1522&group=comp.lang.awk#1522

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 10:45:06 +0200
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <uem8ij$nn2f$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 08:45:07 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ca1baf94a0b8b3955fd3a0c48ead1120";
logging-data="777295"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19IAqouCQqbq8aiiZlW1TYU"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:iz7DSFgZxVPEBwVOnfKKlaWeW8s=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
 by: Janis Papanagnou - Sat, 23 Sep 2023 08:45 UTC

On 22.09.2023 21:50, Michael Sanders wrote:
>
> Greetings folks, just sharing the knowledge (& test posting using google groups).

Hello,

the post through Google groups seems technically fine - it certainly
wasn't in the past -, only Usenet posting conventions seem violated
(line length).

But the Awk code rises a couple questions...

> Beware wordwrap...
>
> function djb2(str, hash, i, char) {
>
> # initialize hash to an arbitrary prime number
> hash = 5381
> n = length(str)

Variable 'n' not declared as local (just for good measure, and in
case you want to use the code in other code contexts).

>
> # iterate over characters and compute hash
> for(i = 1; i <= n; i++) {
> char = substr(str, i, 1)
> # modulo simulates 32-bit unsigned integer

The previous comment suggests that the subsequent code should use
the constant 2147483648 if you want a 32-bit unsigned integer. On
the other hand such formulas often use primes (and 2147483647 is
one). So it leaves me with uncertainty what is actually intended.

> hash = (hash * 33 + ord(char)) % 2147483647

Is it guaranteed that the expression evaluates correctly in all
awks? The intermediate results can become as large as 70866960411
(0x108000001B, which is larger than 32 bit), which needs to be
representable in awk (not sure what POSIX awk defines/requires
here).

> }
>
> return hash
> }
>
> function ord(char) {
>
> return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))

Why use this costly pattern match and conditional expression
if you can anyway simply access the ord[] array where all data
is already present?

(I would also try to avoid name clashes like ord() and ord[],
BTW. Here it's simple, because the function is superfluous;
just replace the ord(char) at the calling side by ord[char]
and remove the function completely.)

>
> }
>
> BEGIN {
>
> for(i=32;i<=126;i++) ord[sprintf("%c", i)] = i

What will the code produce with, say, "extended" ASCII, or the
common ISO 8859-x family of character sets? Is there any reason
to restrict it here?

What are the criteria for the chosen character subset? How will
or how should a TAB character in the data change the result?

>
> print djb2("my key is...")
>
> }

As I said, just a couple of more or less obvious questions.

Janis

Re: djb2 hash using awk

<uem8pq$no3f$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1523&group=comp.lang.awk#1523

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 10:48:57 +0200
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <uem8pq$no3f$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 08:48:58 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ca1baf94a0b8b3955fd3a0c48ead1120";
logging-data="778351"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18fRpRbYp2dyhWAam7bv5Uf"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:dWaO5woD1gDD4ocf1u0SXZR+mO0=
In-Reply-To: <uem8ij$nn2f$1@dont-email.me>
 by: Janis Papanagnou - Sat, 23 Sep 2023 08:48 UTC

On 23.09.2023 10:45, Janis Papanagnou wrote:
> On 22.09.2023 21:50, Michael Sanders wrote:
>> [...]
>> # modulo simulates 32-bit unsigned integer
>
> The previous comment suggests that the subsequent code should use
> the constant 2147483648 if you want a 32-bit unsigned integer. [...]

Erm..., _unsigned_ 32 bit integer would of course mean 4294967296.

Janis

Re: djb2 hash using awk

<0f3ddfdb-c23c-4e58-b041-6d551ecdd9ea@gmail.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1524&group=comp.lang.awk#1524

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mcollado2011@gmail.com (Manuel Collado)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 10:54:25 +0200
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <0f3ddfdb-c23c-4e58-b041-6d551ecdd9ea@gmail.com>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="f290a4508b4b6aecd6dc25327707ff85";
logging-data="780013"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185OmHDTh9eMFTG4544OS5vrpwrDrVzYKc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:X4u1eLdhJvsycQDJgmGhXsKCG2o=
Content-Language: en-US, es-ES
In-Reply-To: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
 by: Manuel Collado - Sat, 23 Sep 2023 08:54 UTC

There is something wrong:

$ LC_ALL=C gawk 'function ord(char) {
return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))
}

BEGIN {print ord("x")}'
gawk: cmd. line:3: error: function `ord' called with space between name
and `(',
or used as a variable or an array

Please note ord[char] vs. ord(char)

HTH

El 22/9/23 a las 21:50, Michael Sanders escribió:
>
> Greetings folks, just sharing the knowledge (& test posting using google groups).
>
> Beware wordwrap...
>
> function djb2(str, hash, i, char) {
>
> # initialize hash to an arbitrary prime number
> hash = 5381
> n = length(str)
>
> # iterate over characters and compute hash
> for(i = 1; i <= n; i++) {
> char = substr(str, i, 1)
> # modulo simulates 32-bit unsigned integer
> hash = (hash * 33 + ord(char)) % 2147483647
> }
>
> return hash
> }
>
> function ord(char) {
>
> return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))
>
> }
>
> BEGIN {
>
> for(i=32;i<=126;i++) ord[sprintf("%c", i)] = i
>
> print djb2("my key is...")
>
> }
>
> notes:
>
> https://stackoverflow.com/questions/1579721/why-are-5381-and-33-so-important-in-the-djb2-algorithm
>
> https://stackoverflow.com/questions/10696223/reason-for-the-number-5381-in-the-djb-hash-function
>
> https://groups.google.com/g/comp.lang.c/c/lSKWXiuNOAk
>
> https://en.wikipedia.org/wiki/Daniel_J._Bernstein

--
Manuel Collado - http://mcollado.z15.es

Re: djb2 hash using awk

<f6617bee-2425-4927-968c-47e35b418042n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1525&group=comp.lang.awk#1525

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:620a:601a:b0:774:793:e7cc with SMTP id dw26-20020a05620a601a00b007740793e7ccmr15323qkb.1.1695469146257;
Sat, 23 Sep 2023 04:39:06 -0700 (PDT)
X-Received: by 2002:a05:6808:148f:b0:3ae:16a8:f441 with SMTP id
e15-20020a056808148f00b003ae16a8f441mr1171678oiw.11.1695469145958; Sat, 23
Sep 2023 04:39:05 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 23 Sep 2023 04:39:05 -0700 (PDT)
In-Reply-To: <uem8ij$nn2f$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com> <uem8ij$nn2f$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f6617bee-2425-4927-968c-47e35b418042n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sat, 23 Sep 2023 11:39:06 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3636
 by: Michael Sanders - Sat, 23 Sep 2023 11:39 UTC

On Saturday, September 23, 2023 at 3:45:10 AM UTC-5, Janis Papanagnou wrote:

> Variable 'n' not declared as local (just for good measure, and in
> case you want to use the code in other code contexts).

Aye, I spotted this *after* I posted (of course...). Updated
function signature below:

function djb2(str, hash, n, i, char)

> Is it guaranteed that the expression evaluates correctly in all
> awks? The intermediate results can become as large as 70866960411
> (0x108000001B, which is larger than 32 bit), which needs to be
> representable in awk (not sure what POSIX awk defines/requires
> here).

Huh? From my initial post in this thread:

# modulo simulates 32-bit unsigned integer
hash = (hash * 33 + ord(char)) % 2147483647

Modulo wraps at 2147483647, it'll never outgrow
it limits. Maybe I'm not grokking what you meant.

And down-thread you remarked:

> Erm..., _unsigned_ 32 bit integer would of course mean 4294967296

That's the max, but certainly not the only possibility...

> [...]
>
> (I would also try to avoid name clashes like ord() and ord[],
> BTW. Here it's simple, because the function is superfluous;
> just replace the ord(char) at the calling side by ord[char]
> and remove the function completely.)

And yet, this rational assumes an extension to the language,
which implies *only* gawk though right? What about the *standard*
awks (tested dbj2() using only mawk & busybox's awk thus far -
the smallest implementations out there). Still, have you by chance
invoked gawk as:

gawk --traditional

I cant on my end as the hard drive died on porkchop.bsd,
my OpenBSD box...

> What will the code produce with, say, "extended" ASCII, or the
> common ISO 8859-x family of character sets? Is there any reason
> to restrict it here?
>
> What are the criteria for the chosen character subset? How will
> or how should a TAB character in the data change the result?

No issues with TAB (see my function ord()), & non 7bit character
sets? Dunno... only need lower ASCII here, hence the hard-coded
limit:

min: 32, max: 126

If so compelled, test other sets & post your findings here.
Would be interested to read what you come up with.

> As I said, just a couple of more or less obvious questions.

Thanks Janis, as always, appreciate your insights. Hoping to get
back to using tin as my news reader soon, the google groups
interface its perhaps not my cup of tea. :/

Re: djb2 hash using awk

<37a10951-8fc5-4b17-a809-9b5c2f3e0227n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1526&group=comp.lang.awk#1526

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:ac8:5796:0:b0:417:fcf8:905c with SMTP id v22-20020ac85796000000b00417fcf8905cmr16665qta.10.1695469373027;
Sat, 23 Sep 2023 04:42:53 -0700 (PDT)
X-Received: by 2002:a9d:62d7:0:b0:6c0:b1b2:9504 with SMTP id
z23-20020a9d62d7000000b006c0b1b29504mr557052otk.3.1695469372818; Sat, 23 Sep
2023 04:42:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 23 Sep 2023 04:42:52 -0700 (PDT)
In-Reply-To: <0f3ddfdb-c23c-4e58-b041-6d551ecdd9ea@gmail.com>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com> <0f3ddfdb-c23c-4e58-b041-6d551ecdd9ea@gmail.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <37a10951-8fc5-4b17-a809-9b5c2f3e0227n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sat, 23 Sep 2023 11:42:53 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1837
 by: Michael Sanders - Sat, 23 Sep 2023 11:42 UTC

Hi Manuel.

Question, have you tried: --traditional or --posix options?

See also:

https://www.gnu.org/software/gawk/manual/html_node/Compatibility-Mode.html

On Saturday, September 23, 2023 at 3:54:29 AM UTC-5, Manuel Collado wrote:

> There is something wrong:
>
> $ LC_ALL=C gawk 'function ord(char) {
> return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))
> }
> BEGIN {print ord("x")}'
> gawk: cmd. line:3: error: function `ord' called with space between name
> and `(',
> or used as a variable or an array
>
> Please note ord[char] vs. ord(char)
>
> HTH

Re: djb2 hash using awk

<22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1527&group=comp.lang.awk#1527

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:620a:601a:b0:774:793:e7cc with SMTP id dw26-20020a05620a601a00b007740793e7ccmr16362qkb.1.1695471940736;
Sat, 23 Sep 2023 05:25:40 -0700 (PDT)
X-Received: by 2002:a05:6870:955b:b0:1dc:27f6:b866 with SMTP id
v27-20020a056870955b00b001dc27f6b866mr886001oal.1.1695471940520; Sat, 23 Sep
2023 05:25:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 23 Sep 2023 05:25:39 -0700 (PDT)
In-Reply-To: <uem8ij$nn2f$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com> <uem8ij$nn2f$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sat, 23 Sep 2023 12:25:40 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1566
 by: Michael Sanders - Sat, 23 Sep 2023 12:25 UTC

On Saturday, September 23, 2023 at 3:45:10 AM UTC-5, Janis Papanagnou wrote:

> (I would also try to avoid name clashes like ord() and ord[],
> BTW. Here it's simple, because the function is superfluous;
> just replace the ord(char) at the calling side by ord[char]
> and remove the function completely.)

One more quick question... does gawk have an ord() builtin
function or not?

Re: djb2 hash using awk

<uemohd$q6tp$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1528&group=comp.lang.awk#1528

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 15:17:32 +0200
Organization: A noiseless patient Spider
Lines: 103
Message-ID: <uemohd$q6tp$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me>
<f6617bee-2425-4927-968c-47e35b418042n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 13:17:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ca1baf94a0b8b3955fd3a0c48ead1120";
logging-data="859065"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+86BN5gGZnAfX3FfDQfqc9"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:kgAc4lo7UzJRoO/zgk3SvJ1icRM=
In-Reply-To: <f6617bee-2425-4927-968c-47e35b418042n@googlegroups.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sat, 23 Sep 2023 13:17 UTC

On 23.09.2023 13:39, Michael Sanders wrote:
> On Saturday, September 23, 2023 at 3:45:10 AM UTC-5, Janis Papanagnou wrote:
> [...]
>> Is it guaranteed that the expression evaluates correctly in all
>> awks? The intermediate results can become as large as 70866960411
>> (0x108000001B, which is larger than 32 bit), which needs to be
>> representable in awk (not sure what POSIX awk defines/requires
>> here).
>
> Huh? From my initial post in this thread:
>
> # modulo simulates 32-bit unsigned integer
> hash = (hash * 33 + ord(char)) % 2147483647
>
> Modulo wraps at 2147483647, it'll never outgrow
> it limits. Maybe I'm not grokking what you meant.

With this formula the hash variable may (because of the modulus)
get values up to 2147483647-1, so within the expression the value
may grow beyond that limit to (2147483647-1)*33+126 because of
the previous (potentially large) hash value calculated.

(There's a good chance that modern systems and languages calculate
that without problem, but if we just assume "32-bit integer" then
it will (typically implicitly) overflow and produce wrong results.
A comment in the code may be useful and is often sufficient here,
so that folks using/porting the code to their environment may have
a visible caveat and may check that. - Myself I haven't inspected
the POSIX specs on that, that's why I formulated it as question.)

>> [...]
>>
>> (I would also try to avoid name clashes like ord() and ord[],
>> BTW. Here it's simple, because the function is superfluous;
>> just replace the ord(char) at the calling side by ord[char]
>> and remove the function completely.)
>
> And yet, this rational assumes an extension to the language,
> which implies *only* gawk though right? [...]

Maybe I misunderstand you. My suggestion here is not relying on
extensions; it's basically just a change of the expression '#<<'

for(i = 1; i <= n; i++) {
char = substr(str, i, 1)
hash = (hash * 33 + ord[char]) % 2147483647 #<<
}

# function ord(char) ## deleted

# once in the BEGIN section (as you've done it)
for(i=32;i<=126;i++) ord[sprintf("%c", i)] = i

>
>> What will the code produce with, say, "extended" ASCII, or the
>> common ISO 8859-x family of character sets? Is there any reason
>> to restrict it here?
>>
>> What are the criteria for the chosen character subset? How will
>> or how should a TAB character in the data change the result?
>
> No issues with TAB (see my function ord()), & non 7bit character
> sets? Dunno... only need lower ASCII here, hence the hard-coded
> limit:
>
> min: 32, max: 126

I understand that. My question with TAB was aiming at what this
sample text should calculate

Hello<Blank>brave<Blank>new<Tab>World!<Newline>

Shall the Blank be part of the hash but the Tab not? - Appears to
me to be inconsistent, and if it is "as designed" it should have
an explicit and clear comment on that difference (and also that
(and maybe also why) e.g. [other] control characters [deliberately]
evaluate to 0).

>
> If so compelled, test other sets & post your findings here.

Not sure what that means - by "sets" you mean "code-sets"? - ...

> Would be interested to read what you come up with.

....if so then it would be simple; I'd include the whole range of
8-bit characters (including control characters), i.e. 0..255 to
cover all octet streams (and take also means to not exclude the
\n, or RS and FS in Awk parlance).

>> As I said, just a couple of more or less obvious questions.
>
> Thanks Janis, as always, appreciate your insights. Hoping to get
> back to using tin as my news reader soon, the google groups
> interface its perhaps not my cup of tea. :/

In your private environment at least you are free to choose. The
horror is when a company forces you to switch from Unix to Windows,
when all applications are getting Web based, and if you're not
allowed to install "local" tools.

Janis

Re: djb2 hash using awk

<uemoo7$q8bk$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1529&group=comp.lang.awk#1529

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 15:21:10 +0200
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <uemoo7$q8bk$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me>
<22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Sep 2023 13:21:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ca1baf94a0b8b3955fd3a0c48ead1120";
logging-data="860532"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NlL713IZ9O8rPh9dro4UG"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:/yHUyS7BWaL0y4wXKTUnY7M2oY4=
In-Reply-To: <22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
 by: Janis Papanagnou - Sat, 23 Sep 2023 13:21 UTC

On 23.09.2023 14:25, Michael Sanders wrote:
> On Saturday, September 23, 2023 at 3:45:10 AM UTC-5, Janis Papanagnou wrote:
>
>> (I would also try to avoid name clashes like ord() and ord[],
>> BTW. Here it's simple, because the function is superfluous;
>> just replace the ord(char) at the calling side by ord[char]
>> and remove the function completely.)
>
> One more quick question... does gawk have an ord() builtin
> function or not?

Not that I know of. - I seem to recall that when I once needed that
I implemented an array (as you did). - There's a chance that GNU Awk
has something in some extension libraries (but I'm too lazy to check).

Janis

Re: djb2 hash using awk

<8734z4innl.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1530&group=comp.lang.awk#1530

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sat, 23 Sep 2023 22:29:34 +0100
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <8734z4innl.fsf@bsb.me.uk>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="55b414175deac24d53d7dcd307fb4e06";
logging-data="1039639"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Iq+S0liVtK1aGE4VFrVxF65sETcMj+DA="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:usbMYRpLu7O1X7xvp8aX3+iPJZ0=
sha1:YqKyS0jOWji4rat5kp9qdlw7CAw=
X-BSB-Auth: 1.85a427e6a2ec387d0861.20230923222934BST.8734z4innl.fsf@bsb.me.uk
 by: Ben Bacarisse - Sat, 23 Sep 2023 21:29 UTC

Michael Sanders <chrome.tty@gmail.com> writes:

> Greetings folks, just sharing the knowledge (& test posting using
> google groups).
>
> Beware wordwrap...
>
> function djb2(str, hash, i, char) {
>
> # initialize hash to an arbitrary prime number
> hash = 5381
> n = length(str)
>
> # iterate over characters and compute hash
> for(i = 1; i <= n; i++) {
> char = substr(str, i, 1)
> # modulo simulates 32-bit unsigned integer
> hash = (hash * 33 + ord(char)) % 2147483647

This modulus does nor do what you claim you want to do (as has been
pointed out). To simulate 32-bit unsigned arithmetic you need

hash = (hash * 33 + ord(char)) % 4294967296

here. Mind you, there is no formal definition of what the DJB2 hash is.
The only reliable reference is a C function that uses unsigned long int
calculations. This could be 32-bits wide, but is often in wider.

> }
>
> return hash
> }
>
> function ord(char) {
>
> return (char ~ /[[:digit:]]/ ? int(char) : int(ord[char]))

What version of AWK, run with what flags, accepts this? I can't get it
past gawk, nawk, mawk nor original-awk.

> }
>
> BEGIN {
>
> for(i=32;i<=126;i++) ord[sprintf("%c", i)] = i
>
> print djb2("my key is...")
>
> }

--
Ben.

Re: djb2 hash using awk

<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1531&group=comp.lang.awk#1531

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:ac8:5b87:0:b0:416:5f26:b8e0 with SMTP id a7-20020ac85b87000000b004165f26b8e0mr26043qta.0.1695522052081;
Sat, 23 Sep 2023 19:20:52 -0700 (PDT)
X-Received: by 2002:a05:6870:8c34:b0:1d6:9bd7:d849 with SMTP id
ec52-20020a0568708c3400b001d69bd7d849mr1374680oab.11.1695522051670; Sat, 23
Sep 2023 19:20:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 23 Sep 2023 19:20:51 -0700 (PDT)
In-Reply-To: <8734z4innl.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com> <8734z4innl.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sun, 24 Sep 2023 02:20:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 23
 by: Michael Sanders - Sun, 24 Sep 2023 02:20 UTC

On Saturday, September 23, 2023 at 4:29:40 PM UTC-5, Ben Bacarisse wrote:

> This modulus does nor do what you claim you want to do (as has been
> pointed out).

No sir, it does, in fact.

# modulo simulates 32-bit unsigned integer

https://en.wikipedia.org/wiki/2,147,483,647

https://www.bing.com/search?FORM=U523DF&PC=U523&q=is+this+2147483647+a+32-bit+number

https://www.google.com/search?q=is+this+2147483647+a+32-bit+number

> What version of AWK, run with what flags, accepts this? I can't get it
> past gawk, nawk, mawk nor original-awk.

https://gnuwin32.sourceforge.net/packages/mawk.htm

https://frippery.org/busybox/

Re: djb2 hash using awk

<3564ba3d-be5c-45d9-baab-e4f01af48d10n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1532&group=comp.lang.awk#1532

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:622a:164a:b0:412:2d47:701d with SMTP id y10-20020a05622a164a00b004122d47701dmr25908qtj.0.1695523842787;
Sat, 23 Sep 2023 19:50:42 -0700 (PDT)
X-Received: by 2002:a05:6808:2006:b0:3a9:b9eb:9990 with SMTP id
q6-20020a056808200600b003a9b9eb9990mr3429359oiw.0.1695523842384; Sat, 23 Sep
2023 19:50:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sat, 23 Sep 2023 19:50:41 -0700 (PDT)
In-Reply-To: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3564ba3d-be5c-45d9-baab-e4f01af48d10n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sun, 24 Sep 2023 02:50:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1902
 by: Michael Sanders - Sun, 24 Sep 2023 02:50 UTC

On Friday, September 22, 2023 at 2:50:10 PM UTC-5, Michael Sanders wrote:

> Greetings folks, just sharing the knowledge (& test posting using google groups).

Note to self: built a version for gawk & tested online here:

https://ideone.com/ZZVQl2

script follows:

function djb2gawk(str, n, i, hash, ascii) {

hash = 5381
n = length(str)

for(i = 1; i <= n; ++i) {
char = substr(str, i, 1)
ascii = alphanum(char)
hash = (hash * 33 + ascii) % ((2^32) - 1)
}

return hash
}

function alphanum(char) {

return index("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", char)
}

BEGIN {
str = "comp.lang.awk"
print djb2gawk(str)
}

Re: djb2 hash using awk

<ueoeh4$17d4t$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1533&group=comp.lang.awk#1533

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 06:38:59 +0200
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <ueoeh4$17d4t$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Sep 2023 04:39:00 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="20fe7a765c5427ab0bac7301e974d9ce";
logging-data="1291421"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+RnvYiLld5x8i7SIryalxN"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:QgIzqdR9ab6jktTFBMoQdav8MjE=
In-Reply-To: <bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sun, 24 Sep 2023 04:38 UTC

On 24.09.2023 04:20, Michael Sanders wrote:
> On Saturday, September 23, 2023 at 4:29:40 PM UTC-5, Ben Bacarisse wrote:
>
>> This modulus does nor do what you claim you want to do (as has been
>> pointed out).
>
> No sir, it does, in fact.
>
> # modulo simulates 32-bit unsigned integer

For unsigned ints with a range that is implementable by binary words
x % 4294967296 (modulo) is equivalent to x & 4294967295 (bitwise-and),
x % 2147483648 (modulo) is equivalent to x & 2147483647 (bitwise-and).
(Mind the differing last digit of modulo and bitwise-and expressions!)
The expressions on the first line are for 32-bit unsigned integer and
the expressions on the second line are for 31-bit unsigned integer
calculations.

Your sample (the fact I pointed out in an earlier post and Ben repeated
here) uses x % 2147483647 which doesn't match a "32-bit simulation".
(Compare it with the four forms above and think about it.)

To add something new with this post I want to point out that a modulus
with primes (like 2147483647, a Mersenne prime) is regularly done e.g.
in random number generators or in cryptography. In these areas you may
have large numbers x and a comparable small prime modulus. Since you
don't have such large registers for the x you can iterate over parts
of the number sequentially (using a loop) to perform the operation.
This insight can also be applied in case of the issue you have with
your original code where internal overflows can arise; in case that
the prime number _2147483647_ was *intended* (as opposed to 31/32 bit
unsigned integer operations) you can avoid these computation errors
by that iterative method (here, in fact, a simple split of x in only
two parts would suffice for the "iteration").

Janis

Re: djb2 hash using awk

<714b8baa-d89c-497a-89c3-28493b9248fb@gmail.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1534&group=comp.lang.awk#1534

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mcollado2011@gmail.com (Manuel Collado)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 10:27:43 +0200
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <714b8baa-d89c-497a-89c3-28493b9248fb@gmail.com>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me>
<22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
<uemoo7$q8bk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="5edd6494317ad3216efe20cf541279bb";
logging-data="1352889"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Eg4OkYr5Mn1d/eLC9JZXl3bSg9pdfPRk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:MGcqKVUnqnOds+QxBu+eefq1vHQ=
Content-Language: en-US, es-ES
In-Reply-To: <uemoo7$q8bk$1@dont-email.me>
 by: Manuel Collado - Sun, 24 Sep 2023 08:27 UTC

El 23/9/23 a las 15:21, Janis Papanagnou escribió:
> On 23.09.2023 14:25, Michael Sanders wrote:
>> On Saturday, September 23, 2023 at 3:45:10 AM UTC-5, Janis Papanagnou wrote:
>>
>> One more quick question... does gawk have an ord() builtin
>> function or not?
>
> Not that I know of. - I seem to recall that when I once needed that
> I implemented an array (as you did). - There's a chance that GNU Awk
> has something in some extension libraries (but I'm too lazy to check).

Yes, ord() and chr() are included in the gawk distribution as a dynamic
extension.

gawk -l ordchr 'BEGIN{print ord("x")}'
120

HTH
--
Manuel Collado - http://mcollado.z15.es

Re: djb2 hash using awk

<ed711e3e-142e-4cb7-b11c-40e2afce8d64n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1535&group=comp.lang.awk#1535

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:6214:1911:b0:655:dad8:378b with SMTP id er17-20020a056214191100b00655dad8378bmr27566qvb.9.1695551426810;
Sun, 24 Sep 2023 03:30:26 -0700 (PDT)
X-Received: by 2002:a05:6830:468e:b0:6bc:6658:2d3f with SMTP id
ay14-20020a056830468e00b006bc66582d3fmr1617235otb.1.1695551426622; Sun, 24
Sep 2023 03:30:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sun, 24 Sep 2023 03:30:26 -0700 (PDT)
In-Reply-To: <ueoeh4$17d4t$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk> <bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<ueoeh4$17d4t$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ed711e3e-142e-4cb7-b11c-40e2afce8d64n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sun, 24 Sep 2023 10:30:26 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 28
 by: Michael Sanders - Sun, 24 Sep 2023 10:30 UTC

On Saturday, September 23, 2023 at 11:39:04 PM UTC-5, Janis Papanagnou wrote:

> Your sample (the fact I pointed out in an earlier post and Ben repeated
> here) uses x % 2147483647 which doesn't match a "32-bit simulation".

That's not correct & there is is no error using 2147483647, it works just
as I intended. But elsewhere... I do think my choice of naming ord()
& ord[] were poor, as these may be intrinsic to some versions of
a given awk's namespace.

> (Mind the differing last digit of modulo and bitwise-and expressions!)

See the dbj2gawk() version I posted elsewhere in this thread. But of
course the older awks lack bitwise...

> Since you don't have such large registers...

Exactly, hence the signed int masquerading as an unsigned int.
We dont want to get sidetracked on that point, there are lots of
other 32-bit numbers that can be chosen in any event (& ought to
be to for security reasons). I took one that works with mawk,
busybox awk. Just think about this & you'll see what I mean.

> in case that the prime number _2147483647_ was *intended*...

Yes, intended.

Thanks Janis.

Re: djb2 hash using awk

<e76f0d00-2955-46cb-8963-0f7894af10cbn@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1536&group=comp.lang.awk#1536

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:6214:440b:b0:65b:a1f:e338 with SMTP id oj11-20020a056214440b00b0065b0a1fe338mr4044qvb.4.1695551504951;
Sun, 24 Sep 2023 03:31:44 -0700 (PDT)
X-Received: by 2002:a05:6830:19a:b0:6ab:8d3:5209 with SMTP id
q26-20020a056830019a00b006ab08d35209mr1355089ota.5.1695551504655; Sun, 24 Sep
2023 03:31:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sun, 24 Sep 2023 03:31:44 -0700 (PDT)
In-Reply-To: <714b8baa-d89c-497a-89c3-28493b9248fb@gmail.com>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me> <22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
<uemoo7$q8bk$1@dont-email.me> <714b8baa-d89c-497a-89c3-28493b9248fb@gmail.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e76f0d00-2955-46cb-8963-0f7894af10cbn@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sun, 24 Sep 2023 10:31:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1624
 by: Michael Sanders - Sun, 24 Sep 2023 10:31 UTC

On Sunday, September 24, 2023 at 3:27:47 AM UTC-5, Manuel Collado wrote:

Yes it does help. Earnest thanks Manuel!

> Yes, ord() and chr() are included in the gawk distribution as a dynamic
> extension.
>
> gawk -l ordchr 'BEGIN{print ord("x")}'
> 120
>
> HTH

Re: djb2 hash using awk

<uepdh3$1c7fk$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1537&group=comp.lang.awk#1537

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 15:28:02 +0200
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <uepdh3$1c7fk$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<ueoeh4$17d4t$1@dont-email.me>
<ed711e3e-142e-4cb7-b11c-40e2afce8d64n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Sep 2023 13:28:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="20fe7a765c5427ab0bac7301e974d9ce";
logging-data="1449460"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19z362bajU9obE1TlQiL3jQ"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:QLz73FYk+pK36+je4hNxdSHhb+8=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <ed711e3e-142e-4cb7-b11c-40e2afce8d64n@googlegroups.com>
 by: Janis Papanagnou - Sun, 24 Sep 2023 13:28 UTC

On 24.09.2023 12:30, Michael Sanders wrote:
> On Saturday, September 23, 2023 at 11:39:04 PM UTC-5, Janis Papanagnou wrote:
>
>> Your sample (the fact I pointed out in an earlier post and Ben repeated
>> here) uses x % 2147483647 which doesn't match a "32-bit simulation".
>
> That's not correct & there is is no error using 2147483647, it works just
> as I intended.

I cannot see how "it works" if it is not "working" for the valid
31-bit number 2147483647; the modulus 2147483647 % 2147483647 == 0
thus it doesn't "simulate" 31-bit arithmetic (and also not 32-bit
arithmetic) with that modulus. But maybe you just have a different
understanding when speaking about unsigned integer "simulation".

For an operation x % 2147483647 you are (as I previously mentioned,
similar as in some coding theory applications) doing an arithmetic
reduction (or "folding") of longer numbers to smaller ones. That's
fine per se, but you are not "simulating" unsigned integer math. If
you wanted to say that you are restricting computation and numeric
output to 31 bit values then it's okay. With the already mentioned
flaw that the expressions exceeds 31 and 32 bit in the expression,
so the math processor needs to support larger integers or FP math
with sufficient resolution in the mantissa. Which is another point
why your expression is not [reliably] working [correctly] within a
32 bit range. To illustrate let's assume that the expression will
(as shown to be possible) overflow the 32 bit, then the result will
effectively have implicitly undergone an implicit x % 2147483648
operation and then an explicit x % 2147483647 operation; together
res = x % 2147483648 % 2147483647. By this there is algorithmically
a discontinuity introduced which is almost always not desired.

> But elsewhere... I do think my choice of naming ord()
> & ord[] were poor, as these may be intrinsic to some versions of
> a given awk's namespace.
>
>> (Mind the differing last digit of modulo and bitwise-and expressions!)
>
> See the dbj2gawk() version I posted elsewhere in this thread. But of
> course the older awks lack bitwise...

Yes, that's why you'd use the modulus; but x % 2147483648 are
necessary to handle all 31 bit unsigned integer values and not
x % 2147483647.

Notwithstanding that the use the latter may be algorithmically
needed. Such algorithms may choose also other primes p in x % p.
But still note the undesired introduced discontinuity mentioned
above in case of the overflows within expressions.

Janis

Re: djb2 hash using awk

<uepe2h$1cads$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1538&group=comp.lang.awk#1538

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 15:37:20 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <uepe2h$1cads$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<ueoeh4$17d4t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Sep 2023 13:37:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="20fe7a765c5427ab0bac7301e974d9ce";
logging-data="1452476"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18CBmvVBcbjwd1D5OueQqpY"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:N2STK1TESbePmZSIamodqU9PDGA=
In-Reply-To: <ueoeh4$17d4t$1@dont-email.me>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sun, 24 Sep 2023 13:37 UTC

On 24.09.2023 06:38, Janis Papanagnou wrote:
>
> To add something new with this post I want to point out that a modulus
> with primes (like 2147483647, a Mersenne prime) is regularly done e.g.
> in random number generators or in cryptography. In these areas you may
> have large numbers x and a comparable small prime modulus. Since you
> don't have such large registers for the x you can iterate over parts
> of the number sequentially (using a loop) to perform the operation. [...]

I just found on an old disk some Awk code where I used this method
to compute and check IBANs some years ago. Here's the diff of two
approaches, one using external multiple-precision arithmetic using
'bc' co-process (GNU Awk) and one working on chunks of the numeric
data string.

$ diff iban1 iban2
59a60,72
> function mod(n, s, l, z, m) {
> while (length(s) > 4) {
> z = substr(s, 1, 4)
> m = z % n
> s = m substr(s, 5)
> }
> return s % n
> }
>
> function mod97(s) {
> return mod(97, s)
> }
>
66,67c79
< print "98 - ( " iban " % 97 )" |& "bc"
< "bc" |& getline checksum
---
> checksum = 98 - mod97(iban)
72,73c84
< print iban " % 97 == 1" |& "bc"
< "bc" |& getline ok
---
> ok = mod97(iban) == 1

(Just for illustration purposes in case someone's interested in that
method.)

Janis

Re: djb2 hash using awk

<87r0mngy2n.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1539&group=comp.lang.awk#1539

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 20:39:44 +0100
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <87r0mngy2n.fsf@bsb.me.uk>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="35830dbda12fe6f2404008a3688ccf07";
logging-data="1577316"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18vV8qVutYbUnXHWR/7AN39QWwg97PK7qc="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:AGEzsjFxtGrBMd+3wqt2xvGKaRs=
sha1:Zn8lL8lGjLMPASauWy2z7GrRU4I=
X-BSB-Auth: 1.0a2c19c5af7f0f7c5f66.20230924203944BST.87r0mngy2n.fsf@bsb.me.uk
 by: Ben Bacarisse - Sun, 24 Sep 2023 19:39 UTC

Michael Sanders <chrome.tty@gmail.com> writes:

> On Saturday, September 23, 2023 at 4:29:40 PM UTC-5, Ben Bacarisse wrote:
>
>> This modulus does nor do what you claim you want to do (as has been
>> pointed out).
>
> No sir, it does, in fact.

I can only repeat that it does not. A simple test shows that it does
not, so it is probable that you don't know what "simulates 32-bit
unsigned integer" means.

Positive integers are not "unsigned". 32-bit unsigned arithmetic has no
sign representation at all, and the operations are all done modulo
2**32.

Even if you interpreted "32-bit unsigned integers" to mean that the
range should be that of the positive (but signed) 32-bit integers, your
modulus is still wrong. For that, you would need to use 2147483648 as
the modulus, giving 0 to 2147483647 inclusive.

> # modulo simulates 32-bit unsigned integer
>
> https://en.wikipedia.org/wiki/2,147,483,647
>
> https://www.bing.com/search?FORM=U523DF&PC=U523&q=is+this+2147483647+a+32-bit+number
>
> https://www.google.com/search?q=is+this+2147483647+a+32-bit+number

What do you think these links show other than the obvious and
uncontested fact that the largest signed 32-bit number is a 32-bit
number?

>> What version of AWK, run with what flags, accepts this? I can't get it
>> past gawk, nawk, mawk nor original-awk.
>
> https://gnuwin32.sourceforge.net/packages/mawk.htm

What version of mawk is that? On my system:

$ mawk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
$ mawk -f hash.awk
mawk: hash.awk: line 19: syntax error at or near [
mawk: hash.awk: line 25: syntax error at or near [

Lines 19 and 25 have 'ord['... where ord has been previously defined as
a function.

> https://frippery.org/busybox/

Ah, yes, busybox awk accepts it. I wonder why? It's definitely an
outlier. Don't use ord as the name of a function and of an array.

--
Ben.

Re: djb2 hash using awk

<87lecvgxt1.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1540&group=comp.lang.awk#1540

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 20:45:30 +0100
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <87lecvgxt1.fsf@bsb.me.uk>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<uem8ij$nn2f$1@dont-email.me>
<22d88cf9-4566-42d3-ab0d-0aed262fc642n@googlegroups.com>
<uemoo7$q8bk$1@dont-email.me>
<714b8baa-d89c-497a-89c3-28493b9248fb@gmail.com>
<e76f0d00-2955-46cb-8963-0f7894af10cbn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="35830dbda12fe6f2404008a3688ccf07";
logging-data="1577316"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ukqIHttR4MllI0Y+wB+61dTw/r5ob3Ds="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:JdZQLD17tm0fgmzX8l82EMSgB94=
sha1:GPUmtk8+G/pZzeo5Ikarm+GE7Tw=
X-BSB-Auth: 1.7978f137072c0b8358c9.20230924204530BST.87lecvgxt1.fsf@bsb.me.uk
 by: Ben Bacarisse - Sun, 24 Sep 2023 19:45 UTC

Michael Sanders <chrome.tty@gmail.com> writes:

> On Sunday, September 24, 2023 at 3:27:47 AM UTC-5, Manuel Collado wrote:
>
> Yes it does help. Earnest thanks Manuel!

Note that the gawk library extension 'ord' will not do what your ord
function does, particularly with digits. But treating digits
differently to other characters is not what DJB2 does, so the library
extension ord will fix a bug.

--
Ben.

Re: djb2 hash using awk

<9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1541&group=comp.lang.awk#1541

  copy link   Newsgroups: comp.lang.awk
X-Received: by 2002:a05:620a:8181:b0:773:f2a0:fda5 with SMTP id ot1-20020a05620a818100b00773f2a0fda5mr34191qkn.4.1695590251205;
Sun, 24 Sep 2023 14:17:31 -0700 (PDT)
X-Received: by 2002:a05:6808:200c:b0:3a9:d030:5023 with SMTP id
q12-20020a056808200c00b003a9d0305023mr3049786oiw.3.1695590250999; Sun, 24 Sep
2023 14:17:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Sun, 24 Sep 2023 14:17:30 -0700 (PDT)
In-Reply-To: <87r0mngy2n.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=38.172.116.217; posting-account=aT9GTAoAAABt4sMQrwgB4gcMvkAfx1kx
NNTP-Posting-Host: 38.172.116.217
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk> <bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<87r0mngy2n.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>
Subject: Re: djb2 hash using awk
From: chrome.tty@gmail.com (Michael Sanders)
Injection-Date: Sun, 24 Sep 2023 21:17:31 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2109
 by: Michael Sanders - Sun, 24 Sep 2023 21:17 UTC

On Sunday, September 24, 2023 at 2:39:49 PM UTC-5, Ben Bacarisse wrote:

[...]

Ben, listen the number wraps (or folds as stated elsewhere)
to zero at its max using modulo. Thats obvious. I, you, Janis
(& likely others) see that.

Now the thing is, the comment I placed in the script was the
word 'simulates', if you want to fritter away your time debating
that, have at it. Me? Too busy to deal with the snarky attitude.

The scripts I posted, both dbj2() & dbj2gawk() work fine & correctly
on my end & it seems rich to me that you cant get them to run but
otherwise know that I'm wrong. Hmmm... None of this is the point,
we simply want a large number (32bit - 1 as is the case) to lesson
the chance of collisions in the hash & that's it... I cant believe
you & Janis cant see that for what it is. Good grief, I should've known...

Re: djb2 hash using awk

<ueqe6u$1i163$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1542&group=comp.lang.awk#1542

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Mon, 25 Sep 2023 00:45:50 +0200
Organization: A noiseless patient Spider
Lines: 6
Message-ID: <ueqe6u$1i163$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<87r0mngy2n.fsf@bsb.me.uk>
<9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Sep 2023 22:45:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9815563ed436bf900b0e0de1a29957e6";
logging-data="1639619"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19UKT2+z9P2wofjOn/WPktZ"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:x+5ItLcPBnIu3H9SLkOwv0fYF5U=
In-Reply-To: <9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>
 by: Janis Papanagnou - Sun, 24 Sep 2023 22:45 UTC

On 24.09.2023 23:17, Michael Sanders wrote:
> [...]

You are right, we all wasted our time.

Re: djb2 hash using awk

<ueqh6h$1aij4$1@news.xmission.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1543&group=comp.lang.awk#1543

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gazelle@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Sun, 24 Sep 2023 23:36:49 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <ueqh6h$1aij4$1@news.xmission.com>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com> <87r0mngy2n.fsf@bsb.me.uk> <9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com> <ueqe6u$1i163$1@dont-email.me>
Injection-Date: Sun, 24 Sep 2023 23:36:49 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="1395300"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Sun, 24 Sep 2023 23:36 UTC

In article <ueqe6u$1i163$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>On 24.09.2023 23:17, Michael Sanders wrote:
>> [...]
>
>You are right, we all wasted our time.
>

And you've got no one to blame but yourself.

You & Ben.

It was pretty clear OP wasn't interested in the usual "human compiler"
treatment, but you guys went ahead with it anyway.

--
The book "1984" used to be a cautionary tale;
Now it is a "how-to" manual.

Re: djb2 hash using awk

<ueqkpb$1j3ch$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1544&group=comp.lang.awk#1544

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Mon, 25 Sep 2023 02:38:02 +0200
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <ueqkpb$1j3ch$1@dont-email.me>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<87r0mngy2n.fsf@bsb.me.uk>
<9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>
<ueqe6u$1i163$1@dont-email.me> <ueqh6h$1aij4$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 25 Sep 2023 00:38:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9815563ed436bf900b0e0de1a29957e6";
logging-data="1674641"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19lCpghJ5FYpRm1mSvbMMZ1"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:fFlUCFWVtucnyZxDLerps3KBmms=
In-Reply-To: <ueqh6h$1aij4$1@news.xmission.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Mon, 25 Sep 2023 00:38 UTC

On 25.09.2023 01:36, Kenny McCormack wrote:
> In article <ueqe6u$1i163$1@dont-email.me>,
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> On 24.09.2023 23:17, Michael Sanders wrote:
>>> [...]
>>
>> You are right, we all wasted our time.
>>
>
> And you've got no one to blame but yourself.

Sure.

>
> You & Ben.
>
> It was pretty clear OP wasn't interested in the usual "human compiler"
> treatment, but you guys went ahead with it anyway.
>

The OP initially said about his interest it's "sharing the knowledge".
Anyone tell us what that would be in this thread. - Rhetorical, don't
bother.

He could as well have contributed BEGIN { print 42 } . "It's working."
And it's not rising all the questions the OP's code did and still does.

And (with your contribution and my reply) we continue wasting our time,
don't we, Kenny?

Re: djb2 hash using awk

<87edinghym.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=1545&group=comp.lang.awk#1545

  copy link   Newsgroups: comp.lang.awk
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Subject: Re: djb2 hash using awk
Date: Mon, 25 Sep 2023 02:27:45 +0100
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <87edinghym.fsf@bsb.me.uk>
References: <69a6c6d6-624b-47a9-8119-873dffdcb161n@googlegroups.com>
<8734z4innl.fsf@bsb.me.uk>
<bd073233-0b83-43fc-a415-2fd2f02e7da0n@googlegroups.com>
<87r0mngy2n.fsf@bsb.me.uk>
<9f8d5437-e1e5-40bb-bf98-1b9940984cb9n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="89473c2f9de4172bd4ed25b23f610392";
logging-data="1812973"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/MAgyDvCGCGVnquARDehUwdqXP+Yf0UTc="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:l5cjz5Bch4ErY7GBmxTa18VcJKQ=
sha1:O+nQUIPqBn+GCeQMOyYdYCyhhIQ=
X-BSB-Auth: 1.3a220e4600c3d91609f3.20230925022745BST.87edinghym.fsf@bsb.me.uk
 by: Ben Bacarisse - Mon, 25 Sep 2023 01:27 UTC

Michael Sanders <chrome.tty@gmail.com> writes:

> On Sunday, September 24, 2023 at 2:39:49 PM UTC-5, Ben Bacarisse wrote:
>
> [...]
>
> Ben, listen the number wraps (or folds as stated elsewhere)
> to zero at its max using modulo. Thats obvious. I, you, Janis
> (& likely others) see that.

Indeed we do all know that.

> Now the thing is, the comment I placed in the script was the
> word 'simulates', if you want to fritter away your time debating
> that, have at it. Me? Too busy to deal with the snarky attitude.

No, I have no interest in debating it. I thought you might want to know
how to implement the behaviour described by the comment you wrote,
especially as that behaviour was a key part of how the original DJB hash
was defined.

> The scripts I posted, both dbj2() & dbj2gawk() work fine & correctly
> on my end & it seems rich to me that you cant get them to run but
> otherwise know that I'm wrong.

Of course I can get your code to run. It's trivial to fix the bug of
using ord for two incompatible purposes, but I find it rich that you
think it at all odd that someone could know your code is wrong without
running it. That's what programmers do every day.

> Hmmm... None of this is the point,
> we simply want a large number

Both the subject line of the thread, and the links you gave in the
original post, suggest you wanted to implement a specific hash function:
DJB's 32-bit "multiply by 33" hash. But your code does not do that.
For example, the DJB hash of the string "abcdef" is 4048079738, but your
code gives 1900599327. Your hash can go no higher than 2**31 - 2, but
the original one uses the full range of 32-bit unsigned int.

> (32bit - 1 as is the case) to lesson
> the chance of collisions in the hash & that's it...

This is why halving the range by using only 31 bits is not a good idea.
It's not a /terrible/ idea because 31 bits is still a huge range, but
since 32-bit wrapping is free on most hardware, there is absolutely no
upside to using modulo 2147483647 arithmetic for a hash.

And your saying "32bit - 1 as is the case" means you still don't know
what the code you wrote is really doing.

> I cant believe
> you & Janis cant see that for what it is. Good grief, I should've
> known...

Well now you know!

1) DJB's has hash usually uses 32-bit unsigned arithmetic. (Very old
implementation might use 16-bit arithmetic, and some newer ones might
use 64 bits.)

2) You can do that in many modern awks with modulo 4294967296
arithmetic.

3) In gawk you could use its logical 'and' function along with its
hexadecimal numbers to make it very clear what's going on

hash = and(hash * 33 + ord(c), 0xFFFFFFFF)

--
Ben.

Pages:12
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor