
Subject                         Author
* [gawk] FP precision           Janis Papanagnou
`* Re: [gawk] FP precision      Keith Thompson
 `* Re: [gawk] FP precision     Janis Papanagnou
  `* Re: [gawk] FP precision    Keith Thompson
   `- Re: [gawk] FP precision   Kaz Kylheku

[gawk] FP precision

<tqdog3$20o01$1@dont-email.me>


https://www.rocksolidbbs.com/devel/article-flat.php?id=1358&group=comp.lang.awk#1358

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: [gawk] FP precision
Date: Fri, 20 Jan 2023 10:56:19 +0100
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <tqdog3$20o01$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Jan 2023 09:56:19 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="6d6e352c778efe51d2083ab8b7093e77";
logging-data="2121729"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/f14eNoXqlr4kDHg7JzQAw"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:B4C8XkM7MZHeXMTBGAHJCKyCBfo=
X-Enigmail-Draft-Status: N1110
X-Mozilla-News-Host: news://news.eternal-september.org:119
 by: Janis Papanagnou - Fri, 20 Jan 2023 09:56 UTC

In an article about "AWK As A Major Systems Programming Language"[*]
(in chapter 5.3 "Future Work") we can read:
"Some issues are known and may not be resolvable. For example,
64-bit integer values such as the timestamps in stat() data on
modern systems don’t fit into awk’s 64-bit double-precision
numbers which only have 53 bits of significand. This is also a
problem for the bit-manipulation functions."
I was a bit astonished to read that; I thought that IEEE 80-bit FP
(with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
or, generally not in applications?

This (and other answers) on a SO post[**] may address the question:
"That is, you may have 32-bit or 64-bit variables, but when they
are loaded into the FPU registers, they are converted to 80 bit;
the FPU then (by default) performs all calculations in 80 bit;
after the calculation, the result is stored back into a 32-bit
or 64-bit variable."
So it's standard only in FPUs and losses are accepted when passing
values from FPUs to memory entities (presumably for performance
reasons)?

Janis

[*] http://www.skeeve.com/awk-sys-prog.html

[**]
https://stackoverflow.com/questions/612507/what-are-the-applications-benefits-of-an-80-bit-extended-precision-data-type

Re: [gawk] FP precision

<87cz79jdnf.fsf@nosuchdomain.example.com>


https://www.rocksolidbbs.com/devel/article-flat.php?id=1360&group=comp.lang.awk#1360

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: [gawk] FP precision
Date: Fri, 20 Jan 2023 09:53:40 -0800
Organization: None to speak of
Lines: 30
Message-ID: <87cz79jdnf.fsf@nosuchdomain.example.com>
References: <tqdog3$20o01$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader01.eternal-september.org; posting-host="356714a9dd42e22ebdc83653058250ea";
logging-data="2288052"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/B8/n8D5qIRzfFK1hmBgWo"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:FMF2mmhmoBrfWiGQKanmJM0YF4I=
sha1:OPe/nCI0Cohr0/IS/XANOcCClEg=
 by: Keith Thompson - Fri, 20 Jan 2023 17:53 UTC

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
> In an article about "AWK As A Major Systems Programming Language"[*]
> (in chapter 5.3 "Future Work") we can read:
> "Some issues are known and may not be resolvable. For example,
> 64-bit integer values such as the timestamps in stat() data on
> modern systems don’t fit into awk’s 64-bit double-precision
> numbers which only have 53 bits of significand. This is also a
> problem for the bit-manipulation functions."
> I was a bit astonished to read that; I thought that IEEE 80-bit FP
> (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
> or, generally not in applications?

True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
without loss of information as long as it's not too big. The smallest
positive integer that can't be represented exactly in a 64-bit IEEE
double is 2**53+1; as a time_t, that's around 285 million years in the
future.
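
As a rough check of that time span, the arithmetic can be sketched in awk itself. This assumes an awk whose numbers are IEEE 754 doubles and a mean year of 365.25 days; `years_until_limit` is a hypothetical helper name used only for this illustration.

```shell
# Sketch only: assumes awk numbers are IEEE 754 doubles.
# years_until_limit is a hypothetical helper, not part of any awk.
years_until_limit() {
    awk 'BEGIN {
        limit = 2^53                             # seconds since the epoch
        secs_per_year = 365.25 * 24 * 60 * 60    # mean Julian year
        printf "%.0f\n", limit / secs_per_year / 1e6   # in millions of years
    }'
}
years_until_limit
```

Under those assumptions this prints 285, that is, roughly 285 million years.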

Other 64-bit integer values can be a problem. Large values will lose
their low-order bits.

$ gawk 'BEGIN{print(2**53-1); print(2**53); print(2**53+1)}'
9007199254740991
9007199254740992
9007199254740992
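
The same limit can also be found empirically rather than assumed. The sketch below doubles a value until adding 1 no longer changes it; it assumes arithmetic is done in IEEE 754 doubles (builds that keep intermediates in wider registers may behave differently), and `first_absorbing_power` is a hypothetical name.

```shell
# Sketch only: assumes awk arithmetic is carried out in IEEE 754 doubles.
# first_absorbing_power is a hypothetical helper name.
first_absorbing_power() {
    awk 'BEGIN {
        n = 1
        while (n + 1 != n)    # true as long as n + 1 does not round back to n
            n *= 2
        printf "%.0f\n", n    # %.0f avoids awk-specific integer output limits
    }'
}
first_absorbing_power
```

With 53-bit significands the loop stops at 2**53 and prints 9007199254740992, matching the gawk output above.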

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Re: [gawk] FP precision

<tqeo3a$26eld$1@dont-email.me>


https://www.rocksolidbbs.com/devel/article-flat.php?id=1361&group=comp.lang.awk#1361

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: [gawk] FP precision
Date: Fri, 20 Jan 2023 19:55:37 +0100
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <tqeo3a$26eld$1@dont-email.me>
References: <tqdog3$20o01$1@dont-email.me>
<87cz79jdnf.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Jan 2023 18:55:38 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="9fa005cd61d3716316ce1a368e3eed67";
logging-data="2308781"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QAzb5cInSaeoIZ//Wwz0U"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:9vo+VaBN2ufyKDVL1xk16Dq2DGY=
In-Reply-To: <87cz79jdnf.fsf@nosuchdomain.example.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Fri, 20 Jan 2023 18:55 UTC

On 20.01.2023 18:53, Keith Thompson wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>> In an article about "AWK As A Major Systems Programming Language"[*]
>> (in chapter 5.3 "Future Work") we can read:
>> "Some issues are known and may not be resolvable. For example,
>> 64-bit integer values such as the timestamps in stat() data on
>> modern systems don’t fit into awk’s 64-bit double-precision
>> numbers which only have 53 bits of significand. This is also a
>> problem for the bit-manipulation functions."
>> I was a bit astonished to read that; I thought that IEEE 80-bit FP
>> (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
>> or, generally not in applications?
>
> True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
> without loss of information as long as it's not too big.

Yes. - My point was that a [standard] 80 bit FP number would have a
64 bit mantissa that allows lossless storage (in the mantissa) while
simply ignoring the sign/exponent parts; for gawk 64 bit operations
and 64 bit time_t. If the implementation [of gawk] technically used
an 80 bit "carrier" (instead of a 64 bit "long integer") the issues
mentioned might not be an issue. Or are you saying that the "problem"
mentioned by the article is just a gawk implementation choice not to
use the "64 bit carrier" in a more sophisticated way (and instead
unnecessarily use only the 53 bit mantissa or the long integer)?

> The smallest
> positive integer that can't be represented exactly in a 64-bit IEEE
> double is 2**53+1; as a time_t, that's around 285 million years in the
> future.
>
> [...]

Janis

Re: [gawk] FP precision

<87tu0lhu56.fsf@nosuchdomain.example.com>


https://www.rocksolidbbs.com/devel/article-flat.php?id=1362&group=comp.lang.awk#1362

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Subject: Re: [gawk] FP precision
Date: Fri, 20 Jan 2023 11:40:21 -0800
Organization: None to speak of
Lines: 54
Message-ID: <87tu0lhu56.fsf@nosuchdomain.example.com>
References: <tqdog3$20o01$1@dont-email.me>
<87cz79jdnf.fsf@nosuchdomain.example.com>
<tqeo3a$26eld$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader01.eternal-september.org; posting-host="356714a9dd42e22ebdc83653058250ea";
logging-data="2315865"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19RzYmYPWVbn5defT6zFqdB"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:gTiAZe7s1GZys0FzPV+itQTSZE0=
sha1:/jqNGdn/48/f/myK8gHlQEQBfDw=
 by: Keith Thompson - Fri, 20 Jan 2023 19:40 UTC

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
> On 20.01.2023 18:53, Keith Thompson wrote:
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>> In an article about "AWK As A Major Systems Programming Language"[*]
>>> (in chapter 5.3 "Future Work") we can read:
>>> "Some issues are known and may not be resolvable. For example,
>>> 64-bit integer values such as the timestamps in stat() data on
>>> modern systems don’t fit into awk’s 64-bit double-precision
>>> numbers which only have 53 bits of significand. This is also a
>>> problem for the bit-manipulation functions."
>>> I was a bit astonished to read that; I thought that IEEE 80-bit FP
>>> (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
>>> or, generally not in applications?
>>
>> True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
>> without loss of information as long as it's not too big.
>
> Yes. - My point was that a [standard] 80 bit FP number would have a
> 64 bit mantissa that allows lossless storage (in the mantissa) while
> simply ignoring the sign/exponent parts; for gawk 64 bit operations
> and 64 bit time_t. If the implementation [of gawk] technically used
> an 80 bit "carrier" (instead of a 64 bit "long integer") the issues
> mentioned might not be an issue. Or are you saying that the "problem"
> mentioned by the article is just a gawk implementation choice not to
> use the "64 bit carrier" in a more sophisticated way (and instead
> unnecessarily use only the 53 bit mantissa or the long integer)?
>
>> The smallest
>> positive integer that can't be represented exactly in a 64-bit IEEE
>> double is 2**53+1; as a time_t, that's around 285 million years in the
>> future.
>>
>> [...]

Different implementations of awk might use different floating-point
representations on different platforms. I don't think the
characteristics of floating-point are defined by the language.
(Is there even a formal language definition?)

There aren't many systems these days that don't use IEEE floating-point,
but awk could probably be supported on such systems. There's a VMS port
of gawk, and VAX floating-point probably uses a different mantissa size.

My point, I guess, is that typical awk implementations store numbers in
64-bit IEEE floating-point, which means they can't store full 64-bit
integers (and can lose precision silently) -- but time_t values are
probably not the best illustration of that issue, because while time_t is
typically a signed 64-bit integer, most time_t values fit in 32 bits (33
starting in 2038, which still won't be a problem for awk).
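
The bit counts mentioned here can be checked with a small sketch in POSIX awk. It counts the magnitude bits of a value and adds one for the sign bit of a signed integer; `signed_bits` is a hypothetical helper name, and the two arguments are the last second before and the first second at the 2038 boundary (2**31 - 1 and 2**31).

```shell
# Sketch only: signed_bits is a hypothetical helper name.
# Prints how many bits a signed integer type needs to hold the value $1.
signed_bits() {
    awk -v n="$1" 'BEGIN {
        b = 0
        while (n >= 1) { n = int(n / 2); b++ }   # count magnitude bits
        print b + 1                              # plus one sign bit
    }'
}
signed_bits 2147483647   # 2**31 - 1
signed_bits 2147483648   # 2**31
```

This prints 32 and then 33, both far below the 53 bits that a double can hold exactly.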

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Re: [gawk] FP precision

<20230121002050.427@kylheku.com>


https://www.rocksolidbbs.com/devel/article-flat.php?id=1363&group=comp.lang.awk#1363

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.awk
Subject: Re: [gawk] FP precision
Date: Sat, 21 Jan 2023 08:33:35 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <20230121002050.427@kylheku.com>
References: <tqdog3$20o01$1@dont-email.me>
<87cz79jdnf.fsf@nosuchdomain.example.com> <tqeo3a$26eld$1@dont-email.me>
<87tu0lhu56.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 21 Jan 2023 08:33:35 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="1ef615aa6043aaf8fc3b1a68c2965024";
logging-data="2656289"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19LoKj5GVYiSowhKtRsgE6PB/li5AUkQeI="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:fBmDTrIp8d3uoirkZ6d0nI+/3mM=
 by: Kaz Kylheku - Sat, 21 Jan 2023 08:33 UTC

On 2023-01-20, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Different implementations of awk might use different floating-point
> representations on different platforms. I don't think the
> characteristics of floating-point are defined by the language.
> (Is there even a formal language definition?)

In defining Awk, the POSIX spec defers a lot of details to C,
which seems like handwaving, but actually pins things down.

E.g. in the area of numeric constants:

The token NUMBER shall represent a numeric constant. Its form and
numeric value shall either be equivalent to the
decimal-floating-constant token as specified by the ISO C standard, or
it shall be a sequence of decimal digits and shall be evaluated as an
integer constant in decimal. In addition, implementations may accept
numeric constants with the form and numeric value equivalent to the
hexadecimal-constant and hexadecimal-floating-constant tokens as
specified by the ISO C standard.

If the value is too large or too small to be representable (see
Concepts Derived from the ISO C Standard), the behavior is undefined.

This "Concepts Derived from the ISO C Standard" section is a general
one, outside of the Awk chapter.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_01_02

It's a large section, whose earliest paragraphs are relevant to Awk
numerics.

1.1.2 Concepts Derived from the ISO C Standard

Some of the standard utilities perform complex data manipulation using
their own procedure and arithmetic languages, as defined in their
EXTENDED DESCRIPTION or OPERANDS sections. Unless otherwise noted, the
arithmetic and semantic concepts (precision, type conversion, control
flow, and so on) shall be equivalent to those defined in the ISO C
standard, as described in the following sections. Note that there is no
requirement that the standard utilities be implemented in any particular
programming language.

Arithmetic Precision and Operations

Integer variables and constants, including the values of operands and
option-arguments, used by the standard utilities listed in this volume
of POSIX.1-2017 shall be implemented as equivalent to the ISO C standard
signed long data type; floating point shall be implemented as equivalent
to the ISO C standard double type. Conversions between types shall be as
described in the ISO C standard. All variables shall be initialized to
zero if they are not otherwise assigned by the input to the application.

[ ... ]
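
Whether a given awk really behaves like a C double can be probed from within awk. The following is a minimal sketch that halves a value until adding it to 1 has no effect, which yields the significand width; it assumes results are rounded to the stored number type (on builds that keep intermediates in x87 extended precision the answer may differ), and `significand_bits` is a hypothetical name.

```shell
# Sketch only: assumes each result is rounded to the awk number type.
# significand_bits is a hypothetical helper name.
significand_bits() {
    awk 'BEGIN {
        eps = 1; bits = 0
        while (1 + eps / 2 != 1) { eps /= 2; bits++ }   # ends at machine epsilon
        print bits + 1     # fraction bits plus the implicit leading bit
    }'
}
significand_bits
```

For an IEEE 754 double this prints 53, the significand size quoted at the start of the thread.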

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
