devel / comp.lang.c / Re: A Famous Security Bug

Re: A Famous Security Bug

<20240322083037.20@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36071&group=comp.lang.c#36071

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 15:33:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <20240322083037.20@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
Injection-Date: Fri, 22 Mar 2024 15:33:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3158152"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+IMgzCWWuLARqlPrqSyangIsldyoYragw="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:z8nv+VqU62+Y+H2av7H2TlqdbWY=
 by: Kaz Kylheku - Fri, 22 Mar 2024 15:33 UTC

On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
> On 21/03/2024 21:21, Kaz Kylheku wrote:
>
>> Eliminating dead stores is a very basic dataflow-driven optimization.
>>
>> Because memset is part of the C language, the compiler knows
>> exactly what effect it has (that it's equivalent to setting
>> all the bytes to zero, like a sequence of assignments).
>>
>
> Yes.
>
>> If you don't want a call to be optimized away, call your
>> own function in another translation unit.
>
> No.
>
> There are several ways that guarantee your code will carry out the
> writes here (though none that guarantee the secret data is not also
> stored elsewhere). Using a function in a different TU is not one of
> these techniques. You do people a disfavour by recommending it.

It demonstrably is.

>> (And don't turn
>> on nonconforming cross-translation-unit optimizations.)
>>
>
> If I knew of any non-conforming cross-translation-unit optimisations in
> a compiler, I would avoid using them until the compiler vendor had fixed
> the bug in question.

They are not fixable. Translation units are separate, subject
to separate semantic analysis, which is settled prior to linkage.

The semantic analysis of one translation unit must be carried out in the
absence of any information about what is in another translation unit.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<20240322083648.539@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36072&group=comp.lang.c#36072

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 15:50:00 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 116
Message-ID: <20240322083648.539@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 22 Mar 2024 15:50:00 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3168614"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Ek98d4QYVvUzBSD+Nha26A+GmNGKJEdw="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:LcEymcPU7Hdn/lZc2Z3bGpOMMRQ=
 by: Kaz Kylheku - Fri, 22 Mar 2024 15:50 UTC

On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>> A "famous security bug":
>>>>>
>>>>> void f( void )
>>>>> { char buffer[ MAX ];
>>>>> /* . . . */
>>>>> memset( buffer, 0, sizeof( buffer )); }
>>>>>
>>>>> . Can you see what the bug is?
>>>>
>>>> I don't know about "the bug", but conditions can be identified under
>>>> which that would have a problem executing, like MAX being in excess
>>>> of available automatic storage.
>>>>
>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>> code, where the memset is intended to obliterate secret information,
>>>> of course, that obliteration is not required to work.
>>>>
>>>> After the memset, the buffer has no next use, so all the assignments
>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>> be elided.
>>>>
>>>> To securely clear memory, you have to use a function for that purpose
>>>> that is not susceptible to optimization.
>>>>
>>>> If you're not doing anything stupid, like link time optimization, an
>>>> external function in another translation unit (a function that the
>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>> ought to suffice.
>>>
>>> Using LTO is not "stupid". Relying on people /not/ using LTO, or not
>>> using other valid optimisations, is "stupid".
>>
>> LTO is a nonconforming optimization. It destroys the concept that
>> when a translation unit is translated, the semantic analysis is
>> complete, such that the only remaining activity is resolution of
>> external references (linkage), and that the semantic analysis of one
>> translation unit does not use information about another translation
>> unit.
>>
>> This has not yet changed in last April's N3096 draft, where
>> translation phases 7 and 8 are:
>>
>> 7. White-space characters separating tokens are no longer significant.
>> Each preprocessing token is converted into a token. The resulting
>> tokens are syntactically and semantically analyzed and translated
>> as a translation unit.
>>
>> 8. All external object and function references are resolved. Library
>> components are linked to satisfy external references to functions
>> and objects not defined in the current translation. All such
>> translator output is collected into a program image which contains
>> information needed for execution in its execution environment.
>>
>> and before that, the Program Structure section says:
>>
>> The separate translation units of a program communicate by (for
>> example) calls to functions whose identifiers have external linkage,
>> manipulation of objects whose identifiers have external linkage, or
>> manipulation of data files. Translation units may be separately
>> translated and then later linked to produce an executable program.
>>
>> LTO deviates from the model that translation units are separate,
>> and the conceptual steps of phases 7 and 8.
> [...]
>
> Link time optimization is as valid as cross-function optimization *as
> long as* it doesn't change the defined behavior of the program.

It always does; the interaction of a translation unit with another
is an externally visible aspect of the C program. (That can be inferred
from the rules which forbid semantic analysis across translation
units, only linkage.)

That's why we can have a real world security issue caused by zeroing
being optimized away.

The rules spelled out in ISO C allow us to unit test a translation
unit by linking it to some harness, and be sure it has exactly the
same behaviors when linked to the production program.

If I have some translation unit in which there is a function foo, such
that when I call foo, it then calls an external function bar, that's
observable. I can link that unit to a program which supplies bar,
containing a printf call, then call foo and verify that the printf call
is executed.

Since ISO C says that the semantic analysis has been done (that
unit having gone through phase 7), we can take it for granted as a
done-and-dusted property of that translation unit that it calls bar
whenever its foo is invoked.

> Say I have a call to foo in main, and the definition of foo is in
> another translation unit. In the absence of LTO, the compiler will have
> to generate a call to foo. If LTO is able to determine that foo doesn't
> do anything, it can remove the code for the function call, and the
> resulting behavior of the linked program is unchanged.

There are always situations in which optimizations that have been forbidden
don't cause a problem, and are even desirable.

If you have LTO turned on, you might be programming in GNU C or Clang C
or whatever, not standard C.

Sometimes programs have the same interpretation in GNU C and standard
C, or the same interpretation to someone who doesn't care about certain
differences.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<87le6az0s8.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36075&group=comp.lang.c#36075

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 09:31:03 -0700
Organization: None to speak of
Lines: 168
Message-ID: <87le6az0s8.fsf@nosuchdomain.example.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com>
<87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="d7de0737ea53c76359917fb7cbce40ac";
logging-data="3189370"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+XxOadGbx+xfy/6XiEedhL"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:DVjokyNvL66piN0QHeWauNqzmqQ=
sha1:HU95ZRBUWxmGryoU5mu8wM9X0EM=
 by: Keith Thompson - Fri, 22 Mar 2024 16:31 UTC

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>>> A "famous security bug":
>>>>>>
>>>>>> void f( void )
>>>>>> { char buffer[ MAX ];
>>>>>> /* . . . */
>>>>>> memset( buffer, 0, sizeof( buffer )); }
>>>>>>
>>>>>> . Can you see what the bug is?
>>>>>
>>>>> I don't know about "the bug", but conditions can be identified under
>>>>> which that would have a problem executing, like MAX being in excess
>>>>> of available automatic storage.
>>>>>
>>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>>> code, where the memset is intended to obliterate secret information,
>>>>> of course, that obliteration is not required to work.
>>>>>
>>>>> After the memset, the buffer has no next use, so all the assignments
>>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>>> be elided.
>>>>>
>>>>> To securely clear memory, you have to use a function for that purpose
>>>>> that is not susceptible to optimization.
>>>>>
>>>>> If you're not doing anything stupid, like link time optimization, an
>>>>> external function in another translation unit (a function that the
>>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>>> ought to suffice.
>>>>
>>>> Using LTO is not "stupid". Relying on people /not/ using LTO, or not
>>>> using other valid optimisations, is "stupid".
>>>
>>> LTO is a nonconforming optimization. It destroys the concept that
>>> when a translation unit is translated, the semantic analysis is
>>> complete, such that the only remaining activity is resolution of
>>> external references (linkage), and that the semantic analysis of one
>>> translation unit does not use information about another translation
>>> unit.
>>>
>>> This has not yet changed in last April's N3096 draft, where
>>> translation phases 7 and 8 are:
>>>
>>> 7. White-space characters separating tokens are no longer significant.
>>> Each preprocessing token is converted into a token. The resulting
>>> tokens are syntactically and semantically analyzed and translated
>>> as a translation unit.
>>>
>>> 8. All external object and function references are resolved. Library
>>> components are linked to satisfy external references to functions
>>> and objects not defined in the current translation. All such
>>> translator output is collected into a program image which contains
>>> information needed for execution in its execution environment.
>>>
>>> and before that, the Program Structure section says:
>>>
>>> The separate translation units of a program communicate by (for
>>> example) calls to functions whose identifiers have external linkage,
>>> manipulation of objects whose identifiers have external linkage, or
>>> manipulation of data files. Translation units may be separately
>>> translated and then later linked to produce an executable program.
>>>
>>> LTO deviates from the model that translation units are separate,
>>> and the conceptual steps of phases 7 and 8.
>> [...]
>>
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
>
> It always does; the interaction of a translation unit with another
> is an externally visible aspect of the C program. (That can be inferred
> from the rules which forbid semantic analysis across translation
> units, only linkage.)
>
> That's why we can have a real world security issue caused by zeroing
> being optimized away.
>
> The rules spelled out in ISO C allow us to unit test a translation
> unit by linking it to some harness, and be sure it has exactly the
> same behaviors when linked to the production program.
>
> If I have some translation unit in which there is a function foo, such
> that when I call foo, it then calls an external function bar, that's
> observable. I can link that unit to a program which supplies bar,
> containing a printf call, then call foo and verify that the printf call
> is executed.
>
> Since ISO C says that the semantic analysis has been done (that
> unit having gone through phase 7), we can take it for granted as a
> done-and-dusted property of that translation unit that it calls bar
> whenever its foo is invoked.

We can take it for granted that the output performed by the printf call
will be performed, because output is observable behavior. If the
external function bar is modified, the LTO step has to be redone.

>> Say I have a call to foo in main, and the definition of foo is in
>> another translation unit. In the absence of LTO, the compiler will have
>> to generate a call to foo. If LTO is able to determine that foo doesn't
>> do anything, it can remove the code for the function call, and the
>> resulting behavior of the linked program is unchanged.
>
> There are always situations in which optimizations that have been forbidden
> don't cause a problem, and are even desirable.
>
> If you have LTO turned on, you might be programming in GNU C or Clang C
> or whatever, not standard C.
>
> Sometimes programs have the same interpretation in GNU C and standard
> C, or the same interpretation to someone who doesn't care about certain
> differences.

Are you claiming that a function call is observable behavior?

Consider:

main.c:
    #include "foo.h"
    int main(void) {
        foo();
    }

foo.h:
    #ifndef FOO_H
    #define FOO_H
    void foo(void);
    #endif

foo.c:
    void foo(void) {
        // do nothing
    }

Are you saying that the "call" instruction generated for the function
call is *observable behavior*? If an implementation doesn't generate
that "call" instruction because it's able to determine at link time that
the call does nothing, that optimization is forbidden?

I presume you'd agree that omitting the "call" instruction is allowed if
the call and the function definition are in the same translation unit.
What wording in the standard requires a "call" instruction to be
generated if they're in different translation units?

That's a trivial example, but other link time optimizations that don't
change a program's observable behavior (insert weasel words about
unspecified behavior) are also allowed.

In phase 8:
All external object and function references are resolved. Library
components are linked to satisfy external references to functions
and objects not defined in the current translation. All such
translator output is collected into a program image which contains
information needed for execution in its execution environment.

I don't see anything about required CPU instructions.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: A Famous Security Bug

<utkc18$311sb$2@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36077&group=comp.lang.c#36077

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 12:35:52 -0400
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <utkc18$311sb$2@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 16:36:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3180427"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19kYu6MspfNq4x8BDzoJvDWAwJKQ3EqTAU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Ad9eROWxIwmmJQ3tUFIZHKh+oiE=
Content-Language: en-US
In-Reply-To: <87a5mr1ffp.fsf@nosuchdomain.example.com>
 by: James Kuyper - Fri, 22 Mar 2024 16:35 UTC

On 3/21/24 16:46, Keith Thompson wrote:
...
> Link time optimization is as valid as cross-function optimization *as
> long as* it doesn't change the defined behavior of the program.

Minor adjustment: due to unspecified behavior, some code can have
multiple permitted behaviors. LTO could be conforming even if it changed
the behavior, as long as it changes it to one of the other permitted
behaviors. For implementation-defined behavior, the fact that the change
could happen would have to be documented.

Re: A Famous Security Bug

<utkdpd$311sb$3@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36078&group=comp.lang.c#36078

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:05:49 -0400
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <utkdpd$311sb$3@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 22 Mar 2024 17:05:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3180427"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+RLyrjqw1bina1NnIfyrcC8UdKCZTWmlk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:XWLvKxzBbcnPXa9ye36nGkYyXr4=
In-Reply-To: <20240322083648.539@kylheku.com>
Content-Language: en-US
 by: James Kuyper - Fri, 22 Mar 2024 17:05 UTC

On 3/22/24 11:50, Kaz Kylheku wrote:
> On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
...
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
>
> It always does; the interaction of a translation unit with another
> is an externally visible aspect of the C program.

The standard makes no use of the concept of "externally visible aspects".

"The least requirements on a conforming implementation are:
— Volatile accesses to objects are evaluated strictly according to the
rules of the abstract machine.
— At program termination, all data written into files shall be identical
to the result that execution of the program according to the abstract
semantics would have produced.
— The input and output dynamics of interactive devices shall take place
as specified in 7.23.3.
The intent of these requirements is that unbuffered or line-buffered
output appear as soon as possible, to ensure that prompting messages
appear prior to a program waiting for input.
This is the observable behavior of the program." (5.1.2.3p6).

The term "observable behavior" is italicized, an ISO convention
indicating that the sentence in which that term is italicized
constitutes the official definition of that term. Note, in particular,
that the term does NOT mean "behavior which can be observed", which
would otherwise be closely connected to your concept of "externally
visible aspects".

Note that "observable behavior" does NOT include function calls, not
even calls to functions defined in different translation units.

The standard explicitly permits optimizations which violate the abstract
semantics, so long as they result in the same observable behavior as if
the abstract semantics had been obeyed. Being able to express that
concept is the only reason that the term "observable behavior" exists.

> ... (That can be inferred
> from the rules which forbid semantic analysis across translation
> units, only linkage.)

I see no wording forbidding such analysis. The section you cite permits
separate translation, but does not forbid whole-program translation.

...
> If I have some translation unit in which there is a function foo, such
> that when I call foo, it then calls an external function bar, that's
> observable.

Not in the sense of "observable behavior" as that term is defined by the
C standard.

...
> Since ISO C says that the semantic analysis has been done (that
> unit having gone through phase 7),

A footnote makes it clear that the translation phases are purely
conceptual, identifying the precedence between the different semantic
rules that they specify. An implementation is not prohibited from
intermingling the translation phases, so long as it produces the same
observable behavior as if it had not intermingled them.

...
> If you have LTO turned on, you might be programming in GNU C or Clang C
> or whatever, not standard C.

True, but you also could be programming in standard C.

Re: A Famous Security Bug

<utkea9$31sr2$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36079&group=comp.lang.c#36079

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:14:49 -0400
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <utkea9$31sr2$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:14:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3208034"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/iFVO8OHBya6sGTju2kCV8f7wdDzA6AMw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:nwe2Oq/POjgC3HI7JhsNbaq3ZG0=
In-Reply-To: <20240321211306.779b21d126e122556c34a346@gmail.moc>
Content-Language: en-US
 by: James Kuyper - Fri, 22 Mar 2024 17:14 UTC

On 3/21/24 14:13, Anton Shepelev wrote:
...
> I think this behavior (of a C compiler) rather stupid. In a
> low-level imperative language, the compiled program shall
> do whatever the programmer commands it to do.

C is NOT that low a level of language. The standard explicitly allows
implementations to use any method they find convenient to produce
observable behavior which is consistent with the requirements of the
standard. Despite describing how that behavior might be produced by the
abstract machine, it explicitly allows an implementation to achieve that
behavior by other means.

If you want to tell a system not only what a program must do, but also
how it must do it, you need to use a lower-level language than C. That's
not what C is for.

Re: A Famous Security Bug

<utkeb3$31sr2$2@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36080&group=comp.lang.c#36080

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:15:15 -0400
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <utkeb3$31sr2$2@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:15:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3208034"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LFDyq/NCEvhCkhDhsKVY3kcIQ3TSoCd0="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:uEYJj+cR9YNK7Cvaa6/WLRAFcSo=
Content-Language: en-US
In-Reply-To: <20240322083037.20@kylheku.com>
 by: James Kuyper - Fri, 22 Mar 2024 17:15 UTC

On 3/22/24 11:33, Kaz Kylheku wrote:
....
> They are not fixable. Translation units are separate, subject
> to separate semantic analysis, which is settled prior to linkage.
>
> The semantic analysis of one translation unit must be carried out in the
> absence of any information about what is in another translation unit.

The standard imposes no such requirement. It permits separate
compilation. It does not mandate it.

Re: A Famous Security Bug

<20240322094449.555@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36081&group=comp.lang.c#36081
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 17:20:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 139
Message-ID: <20240322094449.555@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 22 Mar 2024 17:20:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3210798"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19hDZd8jZOex4BLrJvylQhxB/1JK8zVhDM="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:lnLseEdtpdHEI/elmvn6HGDer7U=
 by: Kaz Kylheku - Fri, 22 Mar 2024 17:20 UTC

On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> Since ISO C says that the semantic analysis has been done (that
>> unit having gone through phase 7), we can take it for granted as a
>> done-and-dusted property of that translation unit that it calls bar
>> whenever its foo is invoked.
>
> We can take it for granted that the output performed by the printf call
> will be performed, because output is observable behavior. If the
> external function bar is modified, the LTO step has to be redone.

That's what undeniably has to be done in the LTO world. Nothing that
is done brings that world into conformance, though.

>>> Say I have a call to foo in main, and the definition of foo is in
>>> another translation unit. In the absence of LTO, the compiler will have
>>> to generate a call to foo. If LTO is able to determine that foo doesn't
>>> do anything, it can remove the code for the function call, and the
>>> resulting behavior of the linked program is unchanged.
>>
>> There are always situations in which optimizations that have been forbidden
>> don't cause a problem, and are even desirable.
>>
>> If you have LTO turned on, you might be programming in GNU C or Clang C
>> or whatever, not standard C.
>>
>> Sometimes programs have the same interpretation in GNU C and standard
>> C, or the same interpretation to someone who doesn't care about certain
>> differences.
>
> Are you claiming that a function call is observable behavior?

Yes. It is the observable behavior of an unlinked translation unit.

It can be observed by linking a harness to it, with a main() function
and all else that is required to make it a complete program.

That harness becomes an instrument for observation.

> Consider:
>
> main.c:
> #include "foo.h"
> int main(void) {
> foo();
> }
>
>
> foo.h:
> #ifndef FOO_H
> #define FOO_H
> void foo(void);
> #endif
>
>
> foo.c:
> void foo(void) {
> // do nothing
> }
>
>
> Are you saying that the "call" instruction generated for the function
> call is *observable behavior*?

Of course; it can be observed externally, without doing any reverse
engineering on the translated unit.

External linkage is called "external" for a reason!

> If an implementation doesn't generate
> that "call" instruction because it's able to determine at link time that
> the call does nothing, that optimization is forbidden?

The text says so. Translation units are separate; semantic analysis is
finished in translation phase 7; linking in 8.

Out of translation phases 1-7 we get a concrete artifact: the translated
unit. That has externally visible features, like what symbols it
requires. Its behavior with regard to those symbols can be empirically
observed, validated by tests and expected to hold thereafter.

Since semantic analysis is complete, any observable behavior can be
taken to be a fact about that translated unit, a property of it, which
will not change when it is subject to linkage. The truth cannot be
clawed back, according to the way things are defined in the standard,
and this is a good thing.

> I presume you'd agree that omitting the "call" instruction is allowed if
> the call and the function definition are in the same translation unit.

Yes.

And that's a way to get the effect of LTO portably, in a conforming
way, in any implementation going back decades. Instead of linkage use
#include "foo.c", #include "bar.c" (taking steps to ensure your internal
names don't clash).

LTO is more convenient in that you don't have to use an unusual
program structure, and it keeps your internal linkage scopes separate.
Just don't pretend it's conforming to standard C, any more than
-ffast-math.

LTO is "voodoo" though. The translation units contain intermediate
code, not target code. The intermediate code continues to be subject
to compiler passes when the translation units are brought together.
Thus translation is going on, but the units are gone.

> What wording in the standard requires a "call" instruction to be
> generated if they're in different translation units?
>
> That's a trivial example, but other link time optimizations that don't
> change a program's observable behavior (insert weasel words about
> unspecified behavior) are also allowed.

An example would be the removal of material that is not referenced,
like functions not called anywhere, or entire translation units
whose external names are not referenced. That can cause issues too,
and I've run into them, but I can't call that nonconforming.
Nothing is semantically analyzed across translation units, only the
linkage graph itself, which may be found to be disconnected.

> In phase 8:
> All external object and function references are resolved. Library
> components are linked to satisfy external references to functions
> and objects not defined in the current translation. All such
> translator output is collected into a program image which contains
> information needed for execution in its execution environment.
>
> I don't see anything about required CPU instructions.

I don't see anything about /removing/ instructions that have to be
there according to the semantic analysis performed in order to
translate those units from phases 1 - 7, and that can be confirmed
to be present with a test harness.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<20240322102255.834@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36082&group=comp.lang.c#36082
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 17:28:08 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <20240322102255.834@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<utkc18$311sb$2@dont-email.me>
Injection-Date: Fri, 22 Mar 2024 17:28:08 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3210798"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18XiTAQeAntgpsfx0Qsx/3akx2g5orX2dY="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:jnMIdeF0uxqenxMcwog7HjiLLnA=
 by: Kaz Kylheku - Fri, 22 Mar 2024 17:28 UTC

On 2024-03-22, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> On 3/21/24 16:46, Keith Thompson wrote:
> ...
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
>
> Minor adjustment: due to unspecified behavior, some code can have
> multiple permitted behaviors. LTO could be conforming even if it changed
> the behavior, as long as it changes it to one of the other permitted
> behaviors. For implementation-defined behavior, the fact that the change
> could happen would have to be documented.

Some unspecified behaviors can change at execution time, like the
unspecified value of an uninitialized unsigned char object in
a malloc-ed block.

If the unspecified behavior of a translation unit is changed to another in
a way that obviously requires semantic analysis (such that a change
occurs in the translated unit that amounts to it having been
re-translated) then that appears to violate the requirements in ISO C
about semantic analysis being done in phase 7, and not any later.

I think translation units can be retained in a form that has not
completely gone through translation phase 7. Such that before linkage,
analysis can take place which completes phase 7, before 8 begins.

However, that analysis has to be done in isolation. The standard
describes translation units as being separate.

If we take N translation units from phases 1 to 6, and halfway through
7, and then to complete the semantic analysis of phase 7, the translator
peeks across all N units, then that is no longer proper separation of
translation units right through phase 7. Combination of translation
units can only begin in 8, by which time semantic analysis is done.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utkfm6$311sb$4@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36083&group=comp.lang.c#36083
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:38:13 -0400
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <utkfm6$311sb$4@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:38:14 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3180427"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19s4ppJlq6IxFlos/bDR9M/2qH+A87q/Mc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:iNwLSkKjQwCwH3SPnWSAkx4oeDk=
Content-Language: en-US
In-Reply-To: <20240322094449.555@kylheku.com>
 by: James Kuyper - Fri, 22 Mar 2024 17:38 UTC

On 3/22/24 13:20, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
....
>> Are you claiming that a function call is observable behavior?
>
> Yes. It is the observable behavior of an unlinked translation unit.

In the context of the C standard, "observable behavior" is a term with a
precisely specified meaning which is NOT "behavior which can be
observed". That definition does not cover function calls, not even those
with external linkage. What the standard says about what optimizations
are permitted is in terms of "observable behavior", NOT "behavior which
can be observed".

>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
>
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

And the C standard imposes no requirement that such behavior occur as
described by the abstract semantics. Only actual observable behavior, as
that term is defined by the C standard, must occur as if those semantics
were followed - whether or not they actually were.

....
>> If an implementation doesn't generate
>> that "call" instruction because it's able to determine at link time that
>> the call does nothing, that optimization is forbidden?
>
> The text says so. Translation units are separate; semantic analysis is
> finished in translation phase 7; linking in 8.

Translation phases are specified solely for the purpose of expressing
the precedence of the corresponding semantic rules. The standard
explicitly allows for the phases to be intermingled or even done out of
order, so long as the observable behavior is behavior that would be
permitted if they had been done in the order specified.

> Out of translation phases 1-7 we get a concrete artifact: the translated
> unit. That has externally visible features, like what symbols it
> requires. Its behavior with regard to those symbols can be empirically
> observed, validated by tests and expected to hold thereafter.

And the standard imposes no requirements on those externally visible
features, only on some (but not ALL) of the behavior that results from
executing the program.

Re: A Famous Security Bug

<utkfn0$311sb$5@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36084&group=comp.lang.c#36084
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:38:40 -0400
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <utkfn0$311sb$5@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<utkc18$311sb$2@dont-email.me> <20240322102255.834@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:38:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a1b08a0f50df3424e24db82030d31985";
logging-data="3180427"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18KpyXJ4z6XG0FtoHM6Q5HgAsNh5QLE0kw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:W6KRPWwefLjsvSnA3X36zWjKxGQ=
In-Reply-To: <20240322102255.834@kylheku.com>
Content-Language: en-US
 by: James Kuyper - Fri, 22 Mar 2024 17:38 UTC

On 3/22/24 13:28, Kaz Kylheku wrote:
....
> If the unspecified behavior of a translation unit is changed to another in
> a way that obviously requires semantic analysis (such that a change
> occurs in the translated unit that amounts to it having been
> re-translated) then that appears to violate the requirements in ISO C
> about semantic analysis being done in phase 7, and not any later.

There is no such requirement. The translation phases are explicitly not
required to be done in the specified order, so long as the result is one
that would be permitted by doing them in that order.

Re: A Famous Security Bug

<utkftr$32ahu$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36085&group=comp.lang.c#36085
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 18:42:19 +0100
Organization: A noiseless patient Spider
Lines: 225
Message-ID: <utkftr$32ahu$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:42:19 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="07a44ddb34b47981e78dc5e82c11a0d6";
logging-data="3222078"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18uefwZcUm2cCEj3jrdgM4JxbwXK+Mld/c="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:OCnegnn6Ziepk9fiXvxjAK1lonc=
In-Reply-To: <20240322083648.539@kylheku.com>
Content-Language: en-GB
 by: David Brown - Fri, 22 Mar 2024 17:42 UTC

On 22/03/2024 16:50, Kaz Kylheku wrote:
> On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>>> A "famous security bug":
>>>>>>
>>>>>> void f( void )
>>>>>> { char buffer[ MAX ];
>>>>>> /* . . . */
>>>>>> memset( buffer, 0, sizeof( buffer )); }
>>>>>>
>>>>>> . Can you see what the bug is?
>>>>>
>>>>> I don't know about "the bug", but conditions can be identified under
>>>>> which that would have a problem executing, like MAX being in excess
>>>>> of available automatic storage.
>>>>>
>>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>>> code, where the memset is intended to obliterate secret information,
>>>>> of course, that obliteration is not required to work.
>>>>>
>>>>> After the memset, the buffer has no next use, so all the assignments
>>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>>> be elided.
>>>>>
>>>>> To securely clear memory, you have to use a function for that purpose
>>>>> that is not susceptible to optimization.
>>>>>
>>>>> If you're not doing anything stupid, like link time optimization, an
>>>>> external function in another translation unit (a function that the
>>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>>> ought to suffice.
>>>>
>>>> Using LTO is not "stupid". Relying on people /not/ using LTO, or not
>>>> using other valid optimisations, is "stupid".
>>>
>>> LTO is a nonconforming optimization. It destroys the concept that
>>> when a translation unit is translated, the semantic analysis is
>>> complete, such that the only remaining activity is resolution of
>>> external references (linkage), and that the semantic analysis of one
>>> translation unit does not use information about another translation
>>> unit.
>>>
>>> This has not yet changed in last April's N3096 draft, where
>>> translation phases 7 and 8 are:
>>>
>>> 7. White-space characters separating tokens are no longer significant.
>>> Each preprocessing token is converted into a token. The resulting
>>> tokens are syntactically and semantically analyzed and translated
>>> as a translation unit.
>>>
>>> 8. All external object and function references are resolved. Library
>>> components are linked to satisfy external references to functions
>>> and objects not defined in the current translation. All such
>>> translator output is collected into a program image which contains
>>> information needed for execution in its execution environment.
>>>
>>> and before that, the Program Structure section says:
>>>
>>> The separate translation units of a program communicate by (for
>>> example) calls to functions whose identifiers have external linkage,
>>> manipulation of objects whose identifiers have external linkage, or
>>> manipulation of data files. Translation units may be separately
>>> translated and then later linked to produce an executable program.
>>>
>>> LTO deviates from the model that translation units are separate,
>>> and the conceptual steps of phases 7 and 8.
>> [...]
>>
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
>
> It always does; the interaction of a translation unit with another
> is an externally visible aspect of the C program.

The C standards don't define a term "externally visible". They define
"observable behaviour", and require that a conforming implementation
generates a program that matches the "observable behaviour". This is in
5.1.2.3p6. Interaction between translation units is not part of the
observable behaviour of a program, because it is not relevant to the
concept of /running/ a program - it is only relevant when translating
the source to the program image.

Thus the "as if" rules apply - the compiler can do whatever it wants -
up to and including asking ChatGPT for an exe file - as long as the
result is a /program/ that gives the same "observable behaviour" as you
would get from an abstract machine.

You should read the footnotes to 5.1.1.2 "Translation phases".
Footnotes are not normative, but they are helpful in explaining the
meaning of the text. They note that compilers don't have to follow the
details of the translation phases, and that source files, translation
units, and translated translation units don't have to have one-to-one
correspondences.

The standard also does not say what the output of "translation" is - it
does not have to be assembly or machine code. It can happily be an
internal format, as used by gcc and clang/llvm. It does not define what
"linking" is, or how the translated translation units are "collected
into a program image" - combining the partially compiled units,
optimising, and then generating a program image is well within that
definition.

> (That can be inferred
> from the rules which forbid semantic analysis across translation
> units, only linkage.)

The rules do not forbid semantic analysis across translation units -
they merely do not /require/ it. You are making an inference without
any justification that I can see.

>
> That's why we can have a real world security issue caused by zeroing
> being optimized away.

No, it is not. We have real-world security issues for all sorts of
reasons, including people mistakenly thinking they can force particular
types of code generation by calling functions in different source files.

(To be clear here, before LTO became common, that was a strategy that
worked. There is a long history in C programming of dilemmas between
writing code that you know works efficiently on current tools, or
writing code that you know is guaranteed correct by the standards but is
inefficient with current tools.)

>
> The rules spelled out in ISO C allow us to unit test a translation
> unit by linking it to some harness, and be sure it has exactly the
> same behaviors when linked to the production program.
>

No, they don't.

If the unit you are testing calls something outside that unit, you may
get different behaviours when testing and when used in production. The
only thing you can be sure of from testing is that if you find a bug
during testing, you have a bug in the code. You can never use testing
to be sure that the code works (with the exception of exhaustive testing
of all possible inputs, which is rarely practical).

> If I have some translation unit in which there is a function foo, such
> that when I call foo, it then calls an external function bar, that's
> observable.

5.1.2.3p6 lists the three things that C defines as "observable
behaviour". Function calls - internal or external - are not amongst these.
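
As a rough illustration of that distinction, here is a minimal sketch.
The names report_sum, add and status_register are made up for this
example and do not come from the thread. The calls to add() are not
observable behaviour, so a compiler may inline, fold or remove them,
while the volatile store and the text written to stdout fall into the
categories the standard does list.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a device register; accesses to a volatile
 * object are one of the listed kinds of observable behaviour. */
volatile int status_register;

/* An ordinary helper. Whether a "call" instruction is ever emitted for
 * it is not something the standard constrains. */
static int add(int a, int b)
{
    return a + b;
}

int report_sum(int n)
{
    assert(n >= 0);

    int total = 0;
    for (int i = 1; i <= n; i++) {
        total = add(total, i);  /* not observable: may be inlined or folded */
    }

    status_register = total;    /* volatile access: observable */
    printf("%d\n", total);      /* output: observable */
    return total;
}
```

Only the value stored to status_register and the line of output have to
match what the abstract machine would produce; how total is computed
along the way is left to the implementation.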

> I can link that unit to a program which supplies bar,
> containing a printf call, then call foo and verify that the printf call
> is executed.

Yes, you can. The printf call - or, more exactly, the "input and output
dynamics" - are observable behaviour. The call to "bar", however, is not.

The compiler, when compiling the source of "foo", will include a call to
"bar" when it does not have the source code (or other detailed semantic
information) for "bar" available at the time. But you are mistaken to
think it does so because the call is "observable" or required by the C
standard. It does so because it cannot prove that /running/ the
function "bar" contains no observable behaviour, or otherwise affects
the observable behaviour of the program. The compiler cannot skip the
call unless it can be sure it is safe to do so - and if it knows nothing
about the implementation of "bar", it must assume the worst.
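
A small sketch of that reasoning, under the assumption that a function
pointer can stand in for "an external function whose body the compiler
cannot see" so that everything fits in one file. The names bar_fn,
noisy_bar, empty_bar and bar_effects are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the external function "bar" from the discussion. */
typedef void bar_fn(void);

/* Counts how often the bar with side effects has actually run. */
int bar_effects;

/* foo knows nothing about what bar does, so a compiler translating foo
 * on its own has to keep the call: bar might produce output or touch
 * volatile objects. */
void foo(bar_fn *bar)
{
    assert(bar != NULL);
    bar();
}

/* A bar whose execution matters: it produces output. */
void noisy_bar(void)
{
    bar_effects++;
    puts("bar ran");
}

/* A bar that does nothing. If the implementation can see this body
 * (same unit, or link-time optimisation), dropping the call changes
 * nothing that the program outputs. */
void empty_bar(void)
{
}
```

With noisy_bar the output has to appear; with empty_bar there is nothing
for a running program to show, whether or not a call instruction exists.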

Sometimes the compiler may have additional information - such as if it
is declared with the gcc "const" or "pure" attributes (or the standardised
"unsequenced" and "reproducible" attributes in the draft for the next C
version after C23). This may allow a compiler to re-arrange calls,
duplicating them, eliminating them, or re-ordering them in various ways.
(The C2y draft includes running such functions once at startup for
each input value, and preserving the results for later use, as a
permissible optimisation. It does this without having changed the
description of translation phases or observable behaviour. But of
course it is still just a draft.)
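
For illustration, a minimal sketch of the gcc-style attributes mentioned
above. The macro names ATTR_CONST and ATTR_PURE and the functions
square, count_char, twice_square and twice_count are hypothetical; the
attributes are only applied when a GNU-compatible compiler is detected.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#  define ATTR_CONST __attribute__((const))
#  define ATTR_PURE  __attribute__((pure))
#else
#  define ATTR_CONST
#  define ATTR_PURE
#endif

/* const: the result depends only on the argument value. */
ATTR_CONST int square(int x);

/* pure: may read memory (here the string), but has no side effects. */
ATTR_PURE size_t count_char(const char *s, char c);

int square(int x)
{
    return x * x;
}

size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}

/* With the declarations above, a compiler is allowed to evaluate each
 * repeated call once and reuse the result, even across separately
 * compiled units. */
int twice_square(int x)
{
    return square(x) + square(x);
}

size_t twice_count(const char *s, char c)
{
    assert(s != NULL);
    return count_char(s, c) + count_char(s, c);
}
```

Whether the duplicate calls are actually merged depends on the compiler
and the optimisation level; the results are the same either way.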


Re: A Famous Security Bug

<utkgd2$32aj7$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36086&group=comp.lang.c#36086
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 18:50:26 +0100
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <utkgd2$32aj7$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:50:26 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="07a44ddb34b47981e78dc5e82c11a0d6";
logging-data="3222119"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+LbYk8WoQoQ17FRuE/kUAho49GXyLrTus="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:Qg3T/MfVk/gOcZfpk88pDjHw3/U=
In-Reply-To: <20240322083037.20@kylheku.com>
Content-Language: en-GB
 by: David Brown - Fri, 22 Mar 2024 17:50 UTC

On 22/03/2024 16:33, Kaz Kylheku wrote:
> On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
>> On 21/03/2024 21:21, Kaz Kylheku wrote:
>>
>>> Eliminating dead stores is a very basic dataflow-driven optimization.
>>>
>>> Because memset is part of the C language, the compiler knows
>>> exactly what effect it has (that it's equivalent to setting
>>> all the bytes to zero, like a sequence of assignments).
>>>
>>
>> Yes.
>>
>>> If you don't want a call to be optimized away, call your
>>> own function in another translation unit.
>>
>> No.
>>
>> There are several ways that guarantee your code will carry out the
>> writes here (though none that guarantee the secret data is not also
>> stored elsewhere). Using a function in a different TU is not one of
>> these techniques. You do people a disservice by recommending it.
>
> It demonstrably is.

It depends on your compiler and the options you use. That is not a good
choice - especially when better ones are available.
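
One commonly used alternative, sketched here under assumptions: the
helper name secure_zero is hypothetical, and the approach relies on
compilers honouring stores made through a volatile-qualified lvalue,
which mainstream compilers do in practice. Where they are available,
memset_explicit (C23), memset_s (the optional Annex K) or explicit_bzero
(glibc and the BSDs) serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite len bytes at buf with zeros, one volatile store per byte,
 * so the stores are not treated as dead assignments even when the
 * buffer is never read again. */
void secure_zero(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
}
```

In the function from the original post, the final line would then read
secure_zero( buffer, sizeof( buffer )); instead of calling memset. As
noted earlier in the thread, this still gives no guarantee that copies
of the secret do not exist elsewhere, for example in registers.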

>
>>> (And don't turn
>>> on nonconforming cross-translation-unit optimizations.)
>>>
>>
>> If I knew of any non-conforming cross-translation-unit optimisations in
>> a compiler, I would avoid using them until the compiler vendor had fixed
>> the bug in question.
>
> They are not fixable. Translation units are separate, subject
> to separate semantic analysis, which is settled prior to linkage.
>
> The semantic analysis of one translation unit must be carried out in the
> absence of any information about what is in another translation unit.
>

"Proof by repeated assertion" does not hold.

I have tried to explain the reality of what the C standards say in a
couple of posts (including one that I had not posted before you wrote
this one). I have tried to make things as clear as possible, and
hopefully you will see the point.

If not, then you must accept that you interpret the C standards in a
different manner from the main compiler vendors, as well as some "big
names" in this group. That is, of course, not proof in itself - but you
must realise that for practical purposes you need to be aware of how
others interpret the standard, both for your own coding and for the
advice or recommendations you give to others.

Re: A Famous Security Bug

<utkho0$32p2g$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36087&group=comp.lang.c#36087
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 19:13:19 +0100
Organization: A noiseless patient Spider
Lines: 111
Message-ID: <utkho0$32p2g$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 18:13:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="07a44ddb34b47981e78dc5e82c11a0d6";
logging-data="3236944"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19sY81ooWNESs1EsWm6rEqnB6kw26h1vH0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:kqBM29KfS5Hf92SHZ1ZPmjT+9tY=
Content-Language: en-GB
In-Reply-To: <20240322094449.555@kylheku.com>
 by: David Brown - Fri, 22 Mar 2024 18:13 UTC

On 22/03/2024 18:20, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:

>>
>> Are you claiming that a function call is observable behavior?
>
> Yes. It is the observable behavior of an unlinked translation unit.
>
> It can be observed by linking a harness to it, with a main() function
> and all else that is required to make it a complete program.
>
> That harness becomes an instrument for observation.

That is "observable" in the same sense that the size of a compiled
object file is "observable" by executing "ls -l". It is not "observable
behaviour" as defined by the C standards.

C defines "observable behaviour" for /programs/. Not for translation
units, or translated translation units (what one might call an "object
file" - be it assembly, machine code, or internal compiler-specific
formats).

For C, it makes no sense to talk about "observable behaviour" for a
unit. It is only by linking the unit to your test harness that you get
a "program", which then has "observable behaviour".
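
A minimal sketch of the unit-plus-harness situation, for illustration only. The split into "unit.c" and "harness.c" and the names run_harness and bar_calls are hypothetical; both parts are shown in one file so that it compiles on its own. The only thing the standard classifies as observable here is the output produced once everything is linked into a program.

```c
#include <stdio.h>

/* --- hypothetical unit.c: the translation unit under test --- */
void bar(void);                 /* defined in some other unit */

void foo(void)
{
    bar();                      /* the call under discussion */
}

/* --- hypothetical harness.c: supplies bar() and a driver --- */
static int bar_calls;

void bar(void)
{
    ++bar_calls;
    puts("bar called");         /* output of the linked program */
}

/* Calls foo() once and reports how many times bar() ran. */
int run_harness(void)
{
    bar_calls = 0;
    foo();
    return bar_calls;
}
```

Calling run_harness() from a main() gives a complete program whose output shows that bar ran. Whether a separate call instruction exists in the image is not something that output can tell you.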

>>
>>
>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
>
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

The contents of an object file - or the instructions used in a complete
program - are not "observable behaviour" in C. Again, I refer you to
5.1.2.3p6.

>
>> If an implementation doesn't generate
>> that "call" instruction because it's able to determine at link time that
>> the call does nothing, that optimization is forbidden?
>
> The text says so. Translation units are separate; semantic analysis is
> finished in translation phase 7; linking in 8.

The text also says (in footnotes) that the phases are for conceptual
description only, and in practice they are typically folded together.

>> What wording in the standard requires a "call" instruction to be
>> generated if they're in different translation units?
>>
>> That's a trivial example, but other link time optimizations that don't
>> change a program's observable behavior (insert weasel words about
>> unspecified behavior) are also allowed.
>
> An example would be the removal of material that is not referenced,
> like functions not called anywhere, or entire translation units
> whose external names are not referenced. That can cause issues too,
> and I've run into them, but I can't call that nonconforming.
> Nothing is semantically analyzed across translation units, only the
> linkage graph itself, which may be found to be disconnected.
>

Removal of unreferenced material at link time is very common. In some
fields, it is standard practice to use compiler and linker flags geared
at making this easier. It is not really any different than using static
libraries - the linker will load all requested static libraries, then
throw out all parts that are not transitively reachable from non-library
code.

The inclusion or not of material in the program image is not directly
observable behaviour in C - there is no way to write portable C code to
determine if the function "foo" has been included in the image despite
never being referenced. (You can, of course, have the linker include
information about the image inside the image itself and read that with
volatile accesses from within the program.)

In small-systems embedded programming, "-ffunction-sections" and
"-fdata-sections", along with "-Wl,--gc-sections", are almost invariably
used for gcc to reduce the size of the final image. It makes it much
more practical to write re-usable code even if not all functions are
used in any given application. I have never heard of it "causing
issues", and I cannot see how it might be non-conforming. (And if it is
not a conformance issue, how is it relevant here?)
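
A small sketch of the kind of code this helps with. The function names and the file name app.c are made up for the example; the flags in the comment are the ones named above.

```c
#include <stdio.h>

/*
 * Hypothetical reusable module. A build along the lines of
 *   gcc -Os -ffunction-sections -fdata-sections -Wl,--gc-sections app.c
 * places each function in its own section, so the linker can discard
 * the sections that nothing references.
 */

/* Never called by this application: a candidate for removal at link time. */
int unused_helper(int x)
{
    return x * 3 + 1;
}

/* The only helper the application actually uses. */
int used_helper(int x)
{
    return x + 1;
}

/* Hypothetical application entry point, called from main(). */
int app_run(void)
{
    int r = used_helper(41);
    printf("%d\n", r);
    return r;
}
```

Whether unused_helper ends up in the image changes its size, not anything the program prints or returns.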

>> In phase 8:
>> All external object and function references are resolved. Library
>> components are linked to satisfy external references to functions
>> and objects not defined in the current translation. All such
>> translator output is collected into a program image which contains
>> information needed for execution in its execution environment.
>>
>> I don't see anything about required CPU instructions.
>
> I don't see anything about /removing/ instructions that have to be
> there according to the semantic analysis performed in order to
> translate those units from phases 1 - 7, and that can be confirmed
> to be present with a test harness.
>

The C standard doesn't deal with CPU instructions. It does not have a
concept of "running" a translated translation unit - you can only run a
complete program, at which point there is no distinction between the
translation units that are "collected" into the program image. It's all
fused together into one big lump, with one set of observable behaviours.

Re: A Famous Security Bug

<87cyrmyvnv.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36088&group=comp.lang.c#36088
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 11:21:40 -0700
Organization: None to speak of
Lines: 97
Message-ID: <87cyrmyvnv.fsf@nosuchdomain.example.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com>
<87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com>
<87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="d7de0737ea53c76359917fb7cbce40ac";
logging-data="3241002"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JNyB04lHROpvk0kOmK/Pd"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:IzC8kiP/D9on/w8e/p1Ai/shCCs=
sha1:UffK3Q4Zuwb0KhE9KnzlZISXToU=
 by: Keith Thompson - Fri, 22 Mar 2024 18:21 UTC

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> Since ISO C says that the semantic analysis has been done (that
>>> unit having gone through phase 7), we can take it for granted as a
>>> done-and-dusted property of that translation unit that it calls bar
>>> whenever its foo is invoked.
>>
>> We can take it for granted that the output performed by the printf call
>> will be performed, because output is observable behavior. If the
>> external function bar is modified, the LTO step has to be redone.
>
> That's what undeniably has to be done in the LTO world. Nothing that
> is done brings that world into conformance, though.
>
>>>> Say I have a call to foo in main, and the definition of foo is in
>>>> another translation unit. In the absence of LTO, the compiler will have
>>>> to generate a call to foo. If LTO is able to determine that foo doesn't
>>>> do anything, it can remove the code for the function call, and the
>>>> resulting behavior of the linked program is unchanged.
>>>
>>> There are always situations in which optimizations that have been forbidden
>>> don't cause a problem, and are even desirable.
>>>
>>> If you have LTO turned on, you might be programming in GNU C or Clang C
>>> or whatever, not standard C.
>>>
>>> Sometimes programs have the same interpretation in GNU C and standard
>>> C, or the same interpretation to someone who doesn't care about certain
>>> differences.
>>
>> Are you claiming that a function call is observable behavior?
>
> Yes. It is the observable behavior of an unlinked translation unit.

An unlinked translation unit has no observable behavior in the way that
term is defined by the standard.

> It can be observed by linking a harness to it, with a main() function
> and all else that is required to make it a complete program.
>
> That harness becomes an instrument for observation.

And a "call" instruction in a program consisting of a single translation
unit can be observed in a variety of ways. That doesn't make it
"observable behavior".

Are you using the phrase "observable behavior" in a sense other than
what's defined in N1570 5.1.2.3?

[...]

>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
>
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

Is the "call" instruction *observable behavior* as defined in 5.1.2.3?

[...]

>> In phase 8:
>> All external object and function references are resolved. Library
>> components are linked to satisfy external references to functions
>> and objects not defined in the current translation. All such
>> translator output is collected into a program image which contains
>> information needed for execution in its execution environment.
>>
>> I don't see anything about required CPU instructions.
>
> I don't see anything about /removing/ instructions that have to be
> there according to the semantic analysis performed in order to
> translate those units from phases 1 - 7, and that can be confirmed
> to be present with a test harness.

The standard doesn't mention either adding or removing instructions.

Running a program under a test harness is effectively running a
different program. Of course it can yield information about the
original program, but in effect you're linking the program with a
different set of libraries.

I can use a test harness to observe whether a program uses an add or inc
instruction to evaluate `i++` (assuming the CPU has both instructions).
The standard doesn't care how the increment happens, as long as the
result is correct. It doesn't care *whether* the increment happens
unless the result affects the program's *observable behavior*.
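
A rough illustration of that distinction, with invented names (dead_increment, live_increment, ticks). It assumes nothing beyond standard C.

```c
/* The incremented value is never used, so an implementation need not
 * perform the increment at all, let alone choose between add and inc. */
int dead_increment(int x)
{
    int i = x;
    i++;
    return x;
}

/* Accesses to a volatile object are part of the observable behavior,
 * so this increment has to be carried out as written. */
volatile int ticks;

void live_increment(void)
{
    ticks++;    /* one volatile read followed by one volatile write */
}
```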

What in the description of translation phases 7 and 8 makes
behavior-preserving optimizations valid in phase 7 and forbidden in
phase 8? (Again, insert weasel words about unspecified behavior.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: A Famous Security Bug

<20240322105321.365@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36089&group=comp.lang.c#36089
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 18:55:15 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 228
Message-ID: <20240322105321.365@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <utkftr$32ahu$1@dont-email.me>
Injection-Date: Fri, 22 Mar 2024 18:55:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3255872"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Xx5s3CrvOw/8Km61E0Nz2frTUIdwrCsg="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:wf5Qsrpq5fDQGrr2KPJy+kkzAr4=
 by: Kaz Kylheku - Fri, 22 Mar 2024 18:55 UTC

On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
> You should read the footnotes to 5.1.1.2 "Translation phases".
> Footnotes are not normative, but they are helpful in explaining the
> meaning of the text. They note that compilers don't have to follow the
> details of the translation phases, and that source files, translation
> units, and translated translation units don't have to have one-to-one
> correspondences.

Yes, I'm aware of that. For instance preprocessing can all be jumbled
into one process. But it has to produce that result.

Even if translation phases 7 and 8 are combined, the semantic analysis
of the individual translation unit has to appear to be settled before
linkage. So for instance a translation unit could incrementally emerge
from the semantic analysis steps, and those parts of it already analyzed
(phase 7) could start to be linked to other translation units (phase 8).

I'm just saying that certain information leakage is clearly permitted,
regardless of how the phases are integrated.

> The standard also does not say what the output of "translation" is - it
> does not have to be assembly or machine code. It can happily be an
> internal format, as used by gcc and clang/llvm. It does not define what
> "linking" is, or how the translated translation units are "collected
> into a program image" - combining the partially compiled units,
> optimising, and then generating a program image is well within that
> definition.
>
>> (That can be inferred
>> from the rules which forbid semantic analysis across translation
>> units, only linkage.)
>
> The rules do not forbid semantic analysis across translation units -
> they merely do not /require/ it. You are making an inference without
> any justification that I can see.

Translation phase 7 is clearly about a single translation unit in
isolation:

"The resulting tokens are syntactically and semantically analyzed
and translated as a translation unit."

Not: "as a combination of multiple translation units".

5.1.1.1 clearly refers to "[t]he separate translation units of a
program".

LTO pretends that the program is still divided into the same translation
units, while mingling them together in ways contrary to all those
chapter 5 descriptions.

The conforming way to obtain LTO is to actually combine multiple
preprocessing translation units into one.

>> That's why we can have a real world security issue caused by zeroing
>> being optimized away.
>
> No, it is not. We have real-world security issues for all sorts of
> reasons, including people mistakenly thinking they can force particular
> types of code generation by calling functions in different source files.

In fact, that code generation is forced, when people do not use LTO,
which is not enabled by default.
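
For reference, a sketch of the zeroing problem and one commonly used workaround. The names wipe and use_key are hypothetical. Writing through a volatile-qualified pointer is a widely used idiom rather than something the standard's wording unambiguously guarantees for an object that was not itself declared volatile, so treat it as an assumption about compiler behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Clears n bytes through a volatile-qualified pointer, so that each
 * store is a volatile access from the compiler's point of view. */
void wipe(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Uses a short-lived key buffer and clears it before returning. */
int use_key(void)
{
    unsigned char key[16];
    int sum = 0;

    memset(key, 0xA5, sizeof key);      /* stand-in for real key material */
    for (size_t i = 0; i < sizeof key; i++)
        sum += key[i];

    /* A plain memset(key, 0, sizeof key) here would be a dead store:
     * key is never read again, so the compiler may drop it. */
    wipe(key, sizeof key);
    return sum;
}
```

Putting the clearing function in another source file only hides it from the optimiser while nothing looks across files, which is the point being argued in this subthread.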

>> The rules spelled out in ISO C allow us to unit test a translation
>> unit by linking it to some harness, and be sure it has exactly the
>> same behaviors when linked to the production program.
>
> No, they don't.
>
> If the unit you are testing calls something outside that unit, you may
> get different behaviours when testing and when used in production.

Yes; if you do nonconforming things.

> The only thing you can be sure of from testing is that if you find a bug
> during testing, you have a bug in the code. You can never use testing
> to be sure that the code works (with the exception of exhaustive testing
> of all possible inputs, which is rarely practical).

LTO will break translation units that are simple enough to be trivially
proven to have a certain behavior.

>> If I have some translation unit in which there is a function foo, such
>> that when I call foo, it then calls an external function bar, that's
>> observable.
>
> 5.1.2.3p6 lists the three things that C defines as "observable
> behaviour". Function calls - internal or external - are not amongst these.

External calls are de facto observable, because we can take it for
granted that when a translation unit calls a certain function, we can
supply another translation unit which provides that function. In
that function we can communicate with the host environment to confirm
that it was called.

>> I can link that unit to a program which supplies bar,
>> containing a printf call, then call foo and verify that the printf call
>> is executed.
>
> Yes, you can. The printf call - or, more exactly, the "input and output
> dynamics" - are observable behaviour. The call to "bar", however, is not.

If foo does not call bar, then the observable behavior of
printf doesn't occur either; they are linked by logic / cause-and-effect.

A behavior that is not itself formally classified as observable can be
discovered by logical linkage to be necessary for the production of
observable behavior. It can be an "if, and only if" linkage.

If an observable behavior B occurs if, and only if, some behavior A
occurs, then the fact of whether A occurs or not is de facto observable.

> The compiler, when compiling the source of "foo", will include a call to
> "bar" when it does not have the source code (or other detailed semantic
> information) for "bar" available at the time.

Translation phases 1 to 7 forbid processing material from another
translation unit. Conforming semantic analysis of a translation unit has
nothing but that translation unit.

> But you are mistaken to
> think it does so because the call is "observable" or required by the C
> standard.

Sure; let's say that the call can be tied to observable behavior
elsewhere such that the call occurs if and only if the observable
behavior occurs.

> It does so because it cannot prove that /running/ the
> function "bar" contains no observable behaviour, or otherwise affects
> the observable behaviour of the program. The compiler cannot skip the
> call unless it can be sure it is safe to do so - and if it knows nothing
> about the implementation of "bar", it must assume the worst.

The compiler cannot do any of this if it is in a conforming mode.

But sure, the nonconforming LTO paradigm does have to adhere
to sane rules, which more or less follow what would have to happen if
multiple preprocessing translation units were merged at the token level
and thus analyzed together.

> Sometimes the compiler may have additional information - such as if it
> is declared the gcc "const" or "pure" attributes (or the standardised
> "unsequenced" and "reproducible" attributes in the draft for the next C
> version after C23).

If the declarations are available only in another translation unit,
they cannot be taken into account when analyzing this translation unit.
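
For readers unfamiliar with the attribute mentioned in the quoted text, a small sketch using the GCC/Clang "pure" extension. The names sum3 and twice_sum and the PURE macro are invented for the example, and the declaration is shown in the same file where a header would normally carry it.

```c
/* "pure" is a GCC/Clang extension; fall back to nothing elsewhere. */
#if defined(__GNUC__)
#define PURE __attribute__((pure))
#else
#define PURE
#endif

/* Declaration as a caller would see it, for example via a header.
 * It promises that the function has no side effects and that its
 * result depends only on its arguments and readable memory. */
PURE int sum3(const int *v);

int twice_sum(const int *v)
{
    /* With the attribute visible, the compiler may evaluate sum3(v)
     * once and reuse the result instead of calling it twice. */
    return sum3(v) + sum3(v);
}

/* Definition, which could live in a different translation unit. */
int sum3(const int *v)
{
    return v[0] + v[1] + v[2];
}
```

Here the extra information reaches the caller through a declaration that is part of its own translation unit, which is a different route from link-time analysis.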

>> Since ISO C says that the semantic analysis has been done (that
>> unit having gone through phase 7), we can take it for granted as a
>> done-and-dusted property of that translation unit that it calls bar
>> whenever its foo is invoked.
>
> No, we can't - see above. Nothing in the C standards forbids any
> additional analysis, or using other information in code generation.

Any semantic analysis performed must be that which is stated in translation
phase 7, which happens for one translation unit, before considering
linkage to other translation units.

What forbids it is that no semantic analysis activity is described as
taking place in translation phase 8, other than linkage.

>>> Say I have a call to foo in main, and the definition of foo is in
>>> another translation unit. In the absence of LTO, the compiler will have
>>> to generate a call to foo. If LTO is able to determine that foo doesn't
>>> do anything, it can remove the code for the function call, and the
>>> resulting behavior of the linked program is unchanged.
>>
>> There are always situations in which optimizations that have been forbidden
>> don't cause a problem, and are even desirable.
>>
>
> Can you give examples?
>
> You already mentioned "-ffast-math" (and by implication, its various
> subflags in gcc, clang and icc). These are clearly documented as
> allowing some violations of the C standards (and not least, the IEEE
> floating point standards, which are stricter than those of C).

Yes, and some people want that, learn how it works, and get their
programs working with it, all the while knowing that it's
nonconforming to IEEE and ISO C.

Another tool in the box.

> (While I don't much like an "appeal to authority" argument, I think it's
> worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
> all support link-time optimisation. They also all work together with
> both the C and C++ standards committees. It would be quite the scandal
> if there were any truth in your claims and these compiler vendors were
> all breaking the rules of the languages they help to specify!)


Re: A Famous Security Bug

<20240322115519.204@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36090&group=comp.lang.c#36090
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 19:27:32 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <20240322115519.204@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <utkfm6$311sb$4@dont-email.me>
Injection-Date: Fri, 22 Mar 2024 19:27:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3270917"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Q3TcdSgB6xgHO5bgzcQS/pIZCbP6AvZU="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:ZvQSlxATVnVnZ0bIAoxdIRo4nX8=
 by: Kaz Kylheku - Fri, 22 Mar 2024 19:27 UTC

On 2024-03-22, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> And the C standard imposes no requirement that such behavior occur as
> described by the abstract semantics. Only actual observable behavior, as
> that term is defined by the C standard, must occur as if those semantics
> were followed - whether or not they actually were.

But there is something. Though not normative text, EXAMPLE 1 gives
the range of possibilities for optimization:

EXAMPLE 1 An implementation might define a one-to-one correspondence
between abstract and actual semantics: at every sequence point, the
values of the actual objects would agree with those specified by the
abstract semantics. The keyword volatile would then be redundant.

Alternatively, an implementation might perform various optimizations
within each translation unit, such that the actual semantics would agree
with the abstract semantics only when making function calls across
translation unit boundaries.

I believe the intent of this example is to give the two extremes
representing the full range of what is envisioned as permissible.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utkmpk$33puj$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36091&group=comp.lang.c#36091
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 12:39:33 -0700
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <utkmpk$33puj$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <uti2am$2d2ts$1@dont-email.me>
<ZE0LN.84950$_a1e.38190@fx16.iad> <uti8ve$2ekbr$1@dont-email.me>
<Hf3LN.544352$Ama9.472059@fx12.iad> <utijv2$2h3up$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 22 Mar 2024 19:39:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1dc3221c1e8ed953d6c9b3654fc414ab";
logging-data="3270611"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+GRt7md4cPh47wZ6UIdXmyu3ZIFFI7b4w="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ZqZUIlGnF46SGvAZWdCoaCVXN4I=
Content-Language: en-US
In-Reply-To: <utijv2$2h3up$1@dont-email.me>
 by: Chris M. Thomasson - Fri, 22 Mar 2024 19:39 UTC

On 3/21/2024 5:38 PM, Chris M. Thomasson wrote:
> On 3/21/2024 4:19 PM, Scott Lurndal wrote:
>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>> On 3/21/2024 1:21 PM, Scott Lurndal wrote:
>>>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>>>
>>>>> "All of its “critical-sequences” are contained in externally assembled
>>>>> functions ( read all ) in order to prevent a rouge C compiler from
>>>>
>>>> As opposed to a viridian C compiler?
>>>
>>> I was worried about "overly aggressive" LTO messing around with my ASM.
>>
>> And you missed the oblique reference to the misspelling of 'rogue' as
>> 'rouge'.
>
> Yup! I sure did. I have red on my face!

I wonder if I have a bit of dyslexia. Sometimes when I am typing along
without looking at the keyboard, I can make a mistake that is backwards
wrt two letters.

For instance, spelling the word "careful" as "carfeul", car fuel? lol...
The mistake I made with rogue vs rouge is that same swapping error as
well. This is a "bad" one because spell checker does not flag it.

It's strange because when I look at the keyboard while I am typing,
well, that does not occur.

Re: A Famous Security Bug

<20240322123323.805@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36092&group=comp.lang.c#36092
Path: i2pn2.org!i2pn.org!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 19:43:04 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <20240322123323.805@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 22 Mar 2024 19:43:04 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
logging-data="3270966"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+g+JVPl5NgJtgnUMvX96llfqFOy2hHvEM="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:GN/kXJNP7Fmxwf6cfNb27xBZnTo=
 by: Kaz Kylheku - Fri, 22 Mar 2024 19:43 UTC

On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Is the "call" instruction *observable behavior* as defined in 5.1.2.3?

No it isn't. The Boolean fact whether or not that call is taken can be
tied to observable behavior elsewhere, though.

>
> [...]
>
>>> In phase 8:
>>> All external object and function references are resolved. Library
>>> components are linked to satisfy external references to functions
>>> and objects not defined in the current translation. All such
>>> translator output is collected into a program image which contains
>>> information needed for execution in its execution environment.
>>>
>>> I don't see anything about required CPU instructions.
>>
>> I don't see anything about /removing/ instructions that have to be
>> there according to the semantic analysis performed in order to
>> translate those units from phases 1 - 7, and that can be confirmed
>> to be present with a test harness.
>
> The standard doesn't mention either adding or removing instructions.
>
> Running a program under a test harness is effectively running a
> different program. Of course it can yield information about the
> original program, but in effect you're linking the program with a
> different set of libraries.

It's a different program, but the retained translation unit must be the
same, except that the external references it makes are resolved to
different entities.

If in one program we have an observable behavior which implies that a
call took place (that itself not being directly observable, by
definition, I again acknowledge) then under the same conditions in
another program, that call also has to take place, by the fact that the
translation unit has not changed.

> I can use a test harness to observe whether a program uses an add or inc
> instruction to evaluate `i++` (assuming the CPU has both instructions).
> The standard doesn't care how the increment happens, as long as the
> result is correct. It doesn't care *whether* the increment happens
> unless the result affects the program's *observable behavior*.

If i is an object with external linkage defined outside of some
translation unit and some function in the translation unit
unconditionally increments i (without further using its value), then
that has to happen, even in a program in which nothing else uses i.

By this blackbox method I'm describing, no, we cannot confirm whether
it's by an inc instruction or whatever. Just, does it happen.

In one test program we can tie that to observable behavior, like
printing the value of i before and after calling that function.

Though the increment isn't observable behavior (unless i is volatile?),
since it has been confirmed that the translation unit does that, it does
that.
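
The scenario just described might look roughly like this. The names bump and observe_bump and the file split are hypothetical, and the two "units" are shown together so the sketch compiles as a single file.

```c
#include <stdio.h>

/* --- hypothetical unit.c --- */
extern int i;                   /* defined outside this unit */

void bump(void)
{
    i++;                        /* unconditional, value not used further */
}

/* --- hypothetical harness.c --- */
int i;

/* Prints i before and after bump() and returns the difference. */
int observe_bump(void)
{
    int before = i;
    bump();
    printf("%d -> %d\n", before, i);
    return i - before;
}
```

In this test program the printed values tie the increment to output. In a program where nothing ever reads i, no output depends on it.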

> What in the description of translation phases 7 and 8 makes
> behavior-preserving optimizations valid in phase 7 and forbidden in
> phase 8? (Again, insert weasel words about unspecified behavior.)

That translation phase 7 is described as completing semantic analysis
(moreover, analysis of a single unit, not multiple), resulting in a
translated unit which may be retained, and that phase 8 is described
as only resolving references and linking.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utkphe$34l73$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36093&group=comp.lang.c#36093
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 21:26:22 +0100
Organization: A noiseless patient Spider
Lines: 478
Message-ID: <utkphe$34l73$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <utkftr$32ahu$1@dont-email.me>
<20240322105321.365@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 20:26:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ce13a9b3027441d1758690182a14dc57";
logging-data="3298531"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19YogSSpriwIG/wIjTATmWqMCioW5EChDU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:g21nXuuKnSDcyIaTokkOU7Su+r0=
In-Reply-To: <20240322105321.365@kylheku.com>
Content-Language: en-GB
 by: David Brown - Fri, 22 Mar 2024 20:26 UTC

On 22/03/2024 19:55, Kaz Kylheku wrote:
> On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
>> You should read the footnotes to 5.1.1.2 "Translation phases".
>> Footnotes are not normative, but they are helpful in explaining the
>> meaning of the text. They note that compilers don't have to follow the
>> details of the translation phases, and that source files, translation
>> units, and translated translation units don't have to have one-to-one
>> correspondences.
>
> Yes, I'm aware of that. For instance preprocessing can all be jumbled
> into one process. But it has to produce that result.
>
> Even if translation phases 7 and 8 are combined, the semantic analysis
> of the individual translation unit has to appear to be settled before
> linkage. So for instance a translation unit could incrementally emerge
> from the semantic analysis steps, and those parts of it already analyzed
> (phase 7) could start to be linked to other translation units (phase 8).
>

Again, you are inferring far too much here. The standard is /not/
limiting like this.

Compilers can make use of all sorts of additional information. They
have always been able to do so. They can use extra information provided
by compiler extensions - such as gcc attributes. They can use
information from profiling to optimise based on real-world usage. They
can analyse source code files and use that analysis for optimisation
(and hopefully also static error checking).

Consider this:

A compiler can happily analyse each source code file in all kinds of
ways, completely independently of what the C standards say (or perhaps, by
happy coincidence, using the same types of pre-processing and
interpretation). This analysis can be stored in files or some other
storage place. Do you agree that this is allowed, or do you think the C
standards somehow ban it? Note that we are calling this "analysis" -
not C compilation.

Now the compiler starts the "real" compilation, passing through the
translation phases one by one. When it gets to phase 7, it reads all
this stored analysis information. (Nothing in the standards says the
compiler can't pull in extra information - it is quite normal, for
example, to pull in code snippets as part of the compilation process.)
For each translation unit, it produces two outputs (in one "fat" object
file) - one part is a relatively dumb translation that does not make use
of the analysis, the other uses the analysis information to generate
more optimal code. Both parts make up the "translator output" for the
translation unit. Again, can you point to anything in the C standards
that would forbid this?

Then we come to phase 8. The compiler (or linker) reads all the
"translator output" files needed for the complete program. It checks
that it has the same set of input files as were used during the
pre-compilation analysis. If they are all the same, then the analysis
information about the different units is valid, and thus the
optimisations using that extra information are valid. The "dumb
translation" versions can be used as a fallback if the analysis was not
valid - otherwise they are thrown out, and the more optimised versions
are linked together.

There is nothing in the description of the translation phases that
hinders this. All the compiler has to do is ensure that the final
program - not any individual translation units - has correct observable
behaviour.

I would also refer you to section 1 of the C standards - "Scope". In
particular, note that "This document does /not/ specify the mechanism by
which C programs are transformed for use by a data-processing system".
(Emphasis mine.) The workings of the compiler are not part of the standard.

> I'm just saying that certain information leakage is clearly permitted,
> regardless of how the phases are integrated.
>
>> The standard also does not say what the output of "translation" is - it
>> does not have to be assembly or machine code. It can happily be an
>> internal format, as used by gcc and clang/llvm. It does not define what
>> "linking" is, or how the translated translation units are "collected
>> into a program image" - combining the partially compiled units,
>> optimising, and then generating a program image is well within that
>> definition.
>>
>>> (That can be inferred
>>> from the rules which forbid semantic analysis across translation
>>> units, only linkage.)
>>
>> The rules do not forbid semantic analysis across translation units -
>> they merely do not /require/ it. You are making an inference without
>> any justification that I can see.
>
> Translation phase 7 is clearly about a single translation unit in
> isolation:
>
> "The resulting tokens are syntactically and semantically analyzed
> and translated as a translation unit."
>
> Not: "as a combination of multiple translation units".

The point is that many things are local to a translation unit, such as
statics, type definitions, and so on. These are valid within the
translation unit (within their scope, of course), and independent of
identically named items in other translation units. It is about
defining a kind of "unit of compilation" for the language semantics - it
is /not/ restricting the behaviour of a compiler.

LTO does not change the language semantics in any way. The language
semantics determine the observable behaviour of the program, and we have
already established that this must be unchanged. Generated instructions
for a target are not part of the language semantics.

>
> 5.1.1.1 clearly refers to "[t]he separate translation units of a
> program".

It does so all in terms of what a compiler /may/ do.

And there is never any specification of the result of a "translation".
It can happily be byte-code, or internal toolchain-specific formats.

>
> LTO pretends that the program is still divided into the same translation
> units, while mingling them together in ways contrary to all those
> chapter 5 descriptions.

No.

>
> The conforming way to obtain LTO is to actually combine multiple
> preprocessing translation units into one.
>

You could do that if you like (after manipulating things to handle
statics, type definitions, etc.).

And you would then find that if "foo()" in "foo.c" called "bar()" in
"bar.c", the call to "bar()" might be inlined, or omitted, or otherwise
optimised, just as it could be if they were both defined in the same
translation unit.

The result would be the same kind of object code as you get with LTO -
one in which the observable behaviour is as expected, but you might get
different details in the generated code.

I don't know why you would think that this kind of combination of units
is conforming, but LTO is not. It's all the same thing in principle -
the only difference is that real-world implementations of LTO are
designed to be scalable, do as much as possible in parallel, and avoid
re-doing work for files that don't change.

Some link-time optimisation or "whole program optimisation" toolchains
are aimed at small code bases (such as might fit into a small
microcontroller) and combine all the source code together then handle it
all at once. Again, the principles and the semantics are not any
different from gcc LTO - it's just a different way of splitting up the work.

>>> That's why we can have a real world security issue caused by zeroing
>>> being optimized away.
>>
>> No, it is not. We have real-world security issues for all sorts of
>> reasons, including people mistakenly thinking they can force particular
>> types of code generation by calling functions in different source files.
>
> In fact, that code generation is forced, when people do not use LTO,
> which is not enabled by default.
>

No, it is not.

The C standards don't talk about LTO, or whether or not it is enabled,
or what is "default", or even what kind of code generation you get.

If the compiler knows that a function call will not have or affect
observable behaviour, it can omit that call. It does not matter how it
knows this. LTO is a very practical way to get this information, but it
might not be the only way. Profile-guided optimisation information may
provide the same information. So could attributes given in the function
declaration (and a future C standard will likely support such attributes).

But if the compiler doesn't know for sure that it is safe to omit the
call, then it must generate it. Correctness trumps optimisation!

>>> The rules spelled out in ISO C allow us to unit test a translation
>>> unit by linking it to some harness, and be sure it has exactly the
>>> same behaviors when linked to the production program.
>>
>> No, they don't.
>>
>> If the unit you are testing calls something outside that unit, you may
>> get different behaviours when testing and when used in production.
>
> Yes; if you do nonconforming things.


Re: A Famous Security Bug

<utktul$35ng8$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36094&group=comp.lang.c#36094

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 21:41:43 +0000
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <utktul$35ng8$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 21:41:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a033777650dc9b21c103a31b20734e74";
logging-data="3333640"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18nqw3cIGS0nUudc1IAenUy"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:v++5ShG6pbWrAXF6iFRzpbuH1aY=
In-Reply-To: <utkea9$31sr2$1@dont-email.me>
Content-Language: en-GB
 by: bart - Fri, 22 Mar 2024 21:41 UTC

On 22/03/2024 17:14, James Kuyper wrote:
> On 3/21/24 14:13, Anton Shepelev wrote:
> ...
>> I think this behavior (of a C compiler) rather stupid. In a
>> low-level imperative language, the compiled program shall
>> do whatever the programmer commands it to do.
>
> C is NOT that low a level of language. The standard explicitly allows
> implementations to use any method they find convenient to produce
> observable behavior which is consistent with the requirements of the
> standard. Despite describing how that behavior might be produced by the
> abstract machine, it explicitly allows an implementation to achieve that
> behavior by other means.
>
> If you want to tell a system not only what a program must do, but also
> how it must do it, you need to use a lower-level language than C.

Which one?

I don't think anyone seriously wants to switch to assembly for the sort
of tasks they want to use C for.

I agree with AS that a program should do what it's told by the
programmer and the compiler should not get too smart.

When /I/ implement such a language, then that's pretty much what happens.

However, people also expect a reasonable amount of optimisation, which
can involve taking some short-cuts or not doing precisely what the
programmer wrote, in the detail.

So the line isn't clearly defined as to what is or isn't acceptable.

But in this example where somebody has clearly requested an object to be
zeroed, ignoring that instruction has crossed the line to unacceptable IMO.

Re: A Famous Security Bug

<875xxdzvxj.fsf@nosuchdomain.example.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36097&group=comp.lang.c#36097

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 16:30:32 -0700
Organization: None to speak of
Lines: 30
Message-ID: <875xxdzvxj.fsf@nosuchdomain.example.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="82e56ab6d287cd951989ac0885bf2be2";
logging-data="3381784"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Suya+NOLlLmOJQgVITmv7"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:lLTx6xJMcGuzGL/5OAr8jjzZ/iY=
sha1:hACIEongP4rDHdenbJzxS7sgMqE=
 by: Keith Thompson - Fri, 22 Mar 2024 23:30 UTC

bart <bc@freeuk.com> writes:
> On 22/03/2024 17:14, James Kuyper wrote:
[...]
>> If you want to tell a system not only what a program must do, but
>> also how it must do it, you need to use a lower-level language than
>> C.
>
> Which one?

Good question.

> I don't think anyone seriously wants to switch to assembly for the
> sort of tasks they want to use C for.

Agreed. What some people seem to be looking for is a language that's
about as portable as C, but where every language construct is required
to result in generated code that performs the specified operation.
There's a lot of handwaving in that description. "C without
optimization", maybe?

I'm not aware that any such language exists, at least in the mainstream
(and I've looked at a *lot* of programming languages). I conclude that
there just isn't enough demand for that kind of thing.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: A Famous Security Bug

<20240322170425.543@kylheku.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36099&group=comp.lang.c#36099

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 00:09:46 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <20240322170425.543@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<875xxdzvxj.fsf@nosuchdomain.example.com>
Injection-Date: Sat, 23 Mar 2024 00:09:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bc8ead67574eda43cc8acb80cc4a36a2";
logging-data="3396061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/R4qB7/NrgkBr6hsaGt1q/fudUZ4tNvNs="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:Rfn5gtReileWhhBWWsiLgv5XB8s=
 by: Kaz Kylheku - Sat, 23 Mar 2024 00:09 UTC

On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> bart <bc@freeuk.com> writes:
>> On 22/03/2024 17:14, James Kuyper wrote:
> [...]
>>> If you want to tell a system not only what a program must do, but
>>> also how it must do it, you need to use a lower-level language than
>>> C.
>>
>> Which one?
>
> Good question.
>
>> I don't think anyone seriously wants to switch to assembly for the
>> sort of tasks they want to use C for.
>
> Agreed. What some people seem to be looking for is a language that's
> about as portable as C, but where every language construct is required
> to result in generated code that performs the specified operation.
> There's a lot of handwaving in that description. "C without
> optimization", maybe?
>
> I'm not aware that any such language exists, at least in the mainstream
> (and I've looked at a *lot* of programming languages). I conclude that
> there just isn't enough demand for that kind of thing.

I think you can more or less get something like that with the following
strategy:

- all memory accesses through pointers are performed as written.
- local variables are aggressively optimized into registers.
- basic optimizations:
  - constant folding, dead code elimination.
  - basic control flow ones: jump threading and the like.
  - basic data flow optimizations.
  - peephole, good instruction selection.

In that environment, the way the programmer writes the code is the rest
of the optimization. Want loop unrolling? Write it yourself.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utm06k$3glqc$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36108&group=comp.lang.c#36108

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 03:26:11 -0400
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <utm06k$3glqc$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Mar 2024 07:26:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f103032d536191836a86a1ef17ad2258";
logging-data="3692364"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+BMiRh6PoVfVlBOwR328mWrGm6JNLClGA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:TKl65HCb8NSFgQxxhO9xfHyZ+wY=
In-Reply-To: <utktul$35ng8$1@dont-email.me>
Content-Language: en-US
 by: James Kuyper - Sat, 23 Mar 2024 07:26 UTC

bart <bc@freeuk.com> writes:
> On 22/03/2024 17:14, James Kuyper wrote:
[...]
>> If you want to tell a system not only what a program must do, but
>> also how it must do it, you need to use a lower-level language than
>> C.
>
> Which one?

That's up to you. The point is, C is NOT that language.

> I don't think anyone seriously wants to switch to assembly for the
> sort of tasks they want to use C for.

Why not? Assembly provides the kind of control you're looking for; C
does not. If that kind of control is important to you, you have to find
a language which provides it. If not assembler or C, what would you use?

Re: A Famous Security Bug

<wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>

https://www.rocksolidbbs.com/devel/article-flat.php?id=36111&group=comp.lang.c#36111

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: invalid@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 09:20:43 +0000
Organization: terraraq NNTP server
Message-ID: <wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com> <utkgd2$32aj7$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="22858"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:P/OnGPUI0bm4/aCKEzTz4bCH3tk=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Sat, 23 Mar 2024 09:20 UTC

David Brown <david.brown@hesbynett.no> writes:
> I have tried to explain the reality of what the C standards say in a
> couple of posts (including one that I had not posted before you wrote
> this one). I have tried to make things as clear as possible, and
> hopefully you will see the point.
>
> If not, then you must accept that you interpret the C standards in a
> different manner from the main compiler vendors, as well as some "big
> names" in this group. That is, of course, not proof in itself - but
> you must realise that for practical purposes you need to be aware of
> how others interpret the standard, both for your own coding and for
> the advice or recommendations you give to others.

Agreed that the ship has sailed on whether LTO is a valid optimization.
But it’s understandable why someone might reach a different conclusion.

- Phase 7 says the tokens are “semantically analyzed and translated as a
translation unit”.

- Phase 8 does not use either verb, “analyzed” or “translated”.

- At least two steps (in the abstract, as-if model) are explicitly
happening in the “as a translation unit” level but not in any wider
context.

- The result of those two steps (“translator output”) is then
“collected”.

- Unless you somehow understand that “collected” implicitly includes
further analysis and translation, it does not seem unnatural to
conclude that many of the whole-program optimizations done by LTO
implementations would be outside the spec.

This would be very easy to address, by replacing “collected” with a word
or phrase that makes clear that further analysis and translation can
happen outside the “as a translation unit” context.

Obviously this would violate the principle from the rationale that
existing code (that uses TU boundaries to get memset to “work”) is
important and existing implementations (LTO) are not, but C
standardization has never actually behaved as if that is true anyway.

--
https://www.greenend.org.uk/rjk/

