Re: Code gen - calling sequences

<sgg0ol$che$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=596&group=comp.lang.misc#596

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (Bart)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 14:06:27 +0100
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <sgg0ol$che$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfu91$qdg$1@z-news.wcss.wroc.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 29 Aug 2021 13:06:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7fc5023b536d3ef58b0d98a2b34c015f";
logging-data="12846"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19C4y+Ej+Gy90eyRccht4IcvGWA5Jaa23Q="
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:eAhyyqC6IqN0/o/y8JVulhmChYA=
In-Reply-To: <sgfu91$qdg$1@z-news.wcss.wroc.pl>
X-Antivirus-Status: Clean
Content-Language: en-GB
X-Antivirus: AVG (VPS 210829-4, 29/8/2021), Outbound message
 by: Bart - Sun, 29 Aug 2021 13:06 UTC

On 29/08/2021 13:24, antispam@math.uni.wroc.pl wrote:
> Bart <bc@freeuk.com> wrote:

>> A rule of thumb I've sometimes observed is that, for x64 anyway, 1 line
>> of source code maps to about 10 bytes of binary machine code.
>
> Depends on the language. For C it may be lower, for some other
> languages much higher.
>
>> So 10 million lines of code represents a single 100MB program,
>> approximately.
>
> I work on a program whose executable is 64 M. However, a significant
> part of the executable code is in loadable modules that take another
> 64 M. Guess how big the source is?

By my metric it would be about 6M lines of source code, if most of the
64MB was executable x64 code (rather than initialised data, embedded
data files, or other exe overheads).

That assumes a certain proportion of declaration lines to lines of
executable code.
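
As a minimal sketch of that rule of thumb (Python, with a made-up function name; the 10 bytes per line default is only the rough x64 figure mentioned above, not a measured constant):

```python
def estimate_source_lines(code_bytes, bytes_per_line=10):
    """Estimate a source line count from the size of the machine code.

    bytes_per_line=10 is the rough x64 rule of thumb from this thread;
    the real ratio depends on the language and the compiler.
    """
    return code_bytes // bytes_per_line

MB = 1_000_000  # decimal megabytes, to keep the numbers round

print(estimate_source_lines(64 * MB))   # 6400000, i.e. "about 6M lines"
print(estimate_source_lines(200 * MB))  # 20000000
```

The estimate only holds if most of the file really is code; data and other overheads would have to be subtracted first.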

Now you're going to tell me it's either a lot fewer or a lot more.

If the language is C, then I guess that could be anything: you can have
macros that expand to many times their size, and are instantiated at
multiple sites; include files that can do the same trick. Or a lot of
boilerplate code that reduces to nothing.

Or there is lots of inlining that pushes the size the other way again.

>> And it might be faster than you think: on a decent machine, unoptimised
>> code (or mildly optimised like mine) can probably be generated at
>> 5-10MB/second, using a single core. So there is plenty of capacity to do
>> interprocedural optimisation without it taking forever.
>
> Well, there is also the issue of memory size. SmartEiffel used (uses???)
> whole-program optimization and compiled very fast. But for a really
> large program it used to run out of memory. I am not sure if this is
> still a problem on modern machines, but a reasonable estimate is that,
> keeping all needed info in memory, you may need 1000 times as much
> memory as the source takes. So you need to carefully optimize space use...

3 compilers of mine I've just tested use memory equivalent to 15x (C
compiler), 20x (Interpreter), and 80x (my systems language) the source size.

But they all use persistent data structures, especially the last which
creates arrays of tokens, a bad idea I've since dropped. All those
include the source itself.

All the memory is recovered on program termination. If it becomes an
issue, then unneeded data structures can be destroyed earlier.

But if we say 40x source size, then capacity of 8GB means /currently/
being able to deal with source code of something over 10M lines,
depending on code density.
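
That estimate can be written out as a small Python sketch. The function name is made up, and the average of 20 bytes per source line is purely an assumption standing in for "code density"; it is not a figure from any of the compilers mentioned.

```python
def max_source_lines(ram_bytes, memory_factor=40, bytes_per_source_line=20):
    """Largest source, in lines, whose compiler data fits in ram_bytes.

    memory_factor: compiler memory use as a multiple of the source size
                   (the 40x figure used above).
    bytes_per_source_line: assumed average line length (hypothetical).
    """
    return ram_bytes // (memory_factor * bytes_per_source_line)

GB = 1_000_000_000

print(max_source_lines(8 * GB))                            # 10000000
print(max_source_lines(8 * GB, bytes_per_source_line=40))  # 5000000
```

Denser source (longer lines) lowers the line count that fits, which is why the figure can only be "something over 10M lines" rather than an exact number.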

It just means being more resourceful, and reintroducing long-forgotten
techniques of working with memory-limited hardware.

ATM, 10M lines is 200 times the size of my typical projects.

Re: Code gen - calling sequences

<sgg13d$hck$2@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=598&group=comp.lang.misc#598

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 15:12:13 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgg13d$hck$2@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfkej$qhf$1@dont-email.me> <sgfopa$uon$1@gioia.aioe.org>
<sgfpjv$rfn$1@dont-email.me> <sgfrbk$4b1$1@gioia.aioe.org>
<sgfvpk$5qt$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="17812"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Dmitry A. Kazakov - Sun, 29 Aug 2021 13:12 UTC

On 2021-08-29 14:49, Bart wrote:
> On 29/08/2021 12:34, Dmitry A. Kazakov wrote:
>> On 2021-08-29 13:04, Bart wrote:
>>
>>> BTW what peripheral device needs 200MB of code?
>>
>> Modern protocols are extremely complicated, as are the end devices.
>> Consider a radiator thermostat. It is a very simple device. Yet it has
>> a hundred parameters, a dozen modes, a weekly schedule you must be
>> able to query and program. So you can imagine the complexity of its
>> protocol. If you are very lucky that would be a vendor-specific
>> protocol. If it is a "standard" protocol you are in deep trouble.
>> The standard protocols are gigantic piles of cra*p. You can take a
>> look at AMQP or any of the ASN.1-based protocols to get an impression.
>> The ASN.1 description of certificate files is almost comical, if you
>> do not need to implement it.
>>
>> Worse, you cannot throw the useless stuff out, because you must
>> certify your implementation of the protocol.
>>
>> On top of that comes the configuration stuff you must address in the
>> GUI and in the persistent storage, the on-line data you have to handle
>> and log, and so on, as well as procedures to replace a defective
>> device or flash the device's firmware.
>>
>> Then you have not just one device, you have an array of, e.g. several
>> radiator thermostats and a dozen other device types, e.g. shutter
>> contacts, wall panels, sensors etc.
>>
>
> By my measure, 200MB would equate to (very roughly) 20M lines of code

You must count the language run-time and other system libraries. E.g.
libc is 1.6MB, SQLite3 is 1.3MB, GTK is about 25MB and so on.
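
A rough way to do that accounting, as a Python sketch. The sizes are just the ones quoted above and vary by platform and version; the helper name is made up.

```python
# Sizes in MB as quoted in this post; real sizes depend on platform/version.
library_sizes_mb = {"libc": 1.6, "SQLite3": 1.3, "GTK": 25.0}

def own_code_mb(total_mb, libraries):
    """Code size left over after subtracting the listed libraries."""
    return total_mb - sum(libraries.values())

remaining = own_code_mb(200.0, library_sizes_mb)
print(f"{remaining:.1f} MB left after these three libraries")  # 172.1 MB ...
```

Three libraries already take about 28MB; every further protocol library comes off the remainder in the same way.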

> Exactly how complicated is that thermostat again? How tall was the pile
> of documents that constitutes the datasheet?

A datasheet has nothing to do with technical documentation. Typically, if
it exists, it is many thousands of pages.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgg2km$pji$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=600&group=comp.lang.misc#600

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (Bart)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 14:38:28 +0100
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <sgg2km$pji$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfkej$qhf$1@dont-email.me> <sgfopa$uon$1@gioia.aioe.org>
<sgfpjv$rfn$1@dont-email.me> <sgfrbk$4b1$1@gioia.aioe.org>
<sgfvpk$5qt$1@dont-email.me> <sgg13d$hck$2@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 29 Aug 2021 13:38:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7fc5023b536d3ef58b0d98a2b34c015f";
logging-data="26226"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX188q4z1rXo9Ur1vrL51ChLxzCtCCGGPUjc="
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:W7WsHXbhFXT2KDU5i7W4qlAGSdU=
In-Reply-To: <sgg13d$hck$2@gioia.aioe.org>
X-Antivirus-Status: Clean
Content-Language: en-GB
X-Antivirus: AVG (VPS 210829-4, 29/8/2021), Outbound message
 by: Bart - Sun, 29 Aug 2021 13:38 UTC

On 29/08/2021 14:12, Dmitry A. Kazakov wrote:
> On 2021-08-29 14:49, Bart wrote:
>> On 29/08/2021 12:34, Dmitry A. Kazakov wrote:
>>> On 2021-08-29 13:04, Bart wrote:
>>>
>>>> BTW what peripheral device needs 200MB of code?
>>>
>>> Modern protocols are extremely complicated, as are the end
>>> devices. Consider a radiator thermostat. It is a very simple device.
>>> Yet it has a hundred parameters, a dozen modes, a weekly schedule
>>> you must be able to query and program. So you can imagine the
>>> complexity of its protocol. If you are very lucky that would be a
>>> vendor-specific protocol. If it is a "standard" protocol you are in
>>> deep trouble. The standard protocols are gigantic piles of cra*p. You
>>> can take a look at AMQP or any of the ASN.1-based protocols to get an
>>> impression. The ASN.1 description of certificate files is almost
>>> comical, if you do not need to implement it.
>>>
>>> Worse, you cannot throw the useless stuff out, because you must
>>> certify your implementation of the protocol.
>>>
>>> On top of that comes the configuration stuff you must address in the
>>> GUI and in the persistent storage, the on-line data you have to
>>> handle and log, and so on, as well as procedures to replace a
>>> defective device or flash the device's firmware.
>>>
>>> Then you have not just one device, you have an array of, e.g. several
>>> radiator thermostats and a dozen other device types, e.g. shutter
>>> contacts, wall panels, sensors etc.
>>>
>>
>> By my measure, 200MB would equate to (very roughly) 20M lines of code
>
> You must count the language run-time and other system libraries. E.g.
> libc is 1.6MB, SQLite3 is 1.3MB, GTK is about 25MB and so on.

GTK would be statically linked into an application (which I thought you
said was to do with peripherals)?

That doesn't make any sense. So if 50 apps all needed GTK, each would
carry their own copies. And if several are running at the same time,
there will be multiple copies of the code in memory.

(I've just downloaded the GTK runtime, which was rather hard to find.

There are about 100 DLLs totalling 55MB, out of a total installation of
9000 files. So even if statically incorporated into an application, it
would still need a home directory with all the other junk.)

However, suppose 50MB of that 200MB /was/ GTK. It seems GTK itself
already is logically divided into dozens of separate libraries.

This is the point I made some posts ago.

Re: Code gen - calling sequences

<sgg3oi$1ph0$1@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=601&group=comp.lang.misc#601

Path: i2pn2.org!i2pn.org!news.niel.me!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 15:57:38 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgg3oi$1ph0$1@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfkej$qhf$1@dont-email.me> <sgfopa$uon$1@gioia.aioe.org>
<sgfpjv$rfn$1@dont-email.me> <sgfrbk$4b1$1@gioia.aioe.org>
<sgfvpk$5qt$1@dont-email.me> <sgg13d$hck$2@gioia.aioe.org>
<sgg2km$pji$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="58912"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Dmitry A. Kazakov - Sun, 29 Aug 2021 13:57 UTC

On 2021-08-29 15:38, Bart wrote:
> On 29/08/2021 14:12, Dmitry A. Kazakov wrote:
>> On 2021-08-29 14:49, Bart wrote:
>>> On 29/08/2021 12:34, Dmitry A. Kazakov wrote:
>>>> On 2021-08-29 13:04, Bart wrote:
>>>>
>>>>> BTW what peripheral device needs 200MB of code?
>>>>
>>>> Modern protocols are extremely complicated, as are the end
>>>> devices. Consider a radiator thermostat. It is a very simple device.
>>>> Yet it has a hundred parameters, a dozen modes, a weekly schedule
>>>> you must be able to query and program. So you can imagine the
>>>> complexity of its protocol. If you are very lucky that would be a
>>>> vendor-specific protocol. If it is a "standard" protocol you are in
>>>> deep trouble. The standard protocols are gigantic piles of cra*p.
>>>> You can take a look at AMQP or any of the ASN.1-based protocols to
>>>> get an impression. The ASN.1 description of certificate files is
>>>> almost comical, if you do not need to implement it.
>>>>
>>>> Worse, you cannot throw the useless stuff out, because you must
>>>> certify your implementation of the protocol.
>>>>
>>>> On top of that comes the configuration stuff you must address in the
>>>> GUI and in the persistent storage, the on-line data you have to
>>>> handle and log, and so on, as well as procedures to replace a
>>>> defective device or flash the device's firmware.
>>>>
>>>> Then you have not just one device, you have an array of, e.g.
>>>> several radiator thermostats and a dozen other device types, e.g.
>>>> shutter contacts, wall panels, sensors etc.
>>>>
>>>
>>> By my measure, 200MB would equate to (very roughly) 20M lines of code
>>
>> You must count the language run-time and other system libraries. E.g.
>> libc is 1.6MB, SQLite3 is 1.3MB, GTK is about 25MB and so on.
>
> GTK would be statically linked into an application (which I thought you
> said was to do with peripherals)?

GTK cannot be linked statically.

> That doesn't make any sense. So if 50 apps all needed GTK, each would
> carry their own copies. And if several are running at the same time,
> there will be multiple copies of the code in memory.

You run 50 GUIs at a time? But no, GTK is linked dynamically due to some
licensing decisions, I believe. I do not remember.

> However, suppose 50MB of that 200MB /was/ GTK.

No, it is not only GTK. It was an example showing that 200MB is very
modest given the number of protocols a typical application uses. Each
protocol comes with several libraries, each of which might be 1MB or so.
And as I said, on top of that there are layers of application code
necessary to run the protocol stack, to configure, to store/restore
configurations, to visualize etc.

It seems that you think that a typical application reads from the
keyboard and prints on a printer. That has not been so for many
decades, actually.

> It seems GTK itself
> already is logically divided into dozens of separate libraries.

Yes, it is.

> This is the point I made some posts ago.

Maybe. My comment was that 200MB of code is not that much.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgg739$pje$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=608&group=comp.lang.misc#608

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 16:54:32 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sgg739$pje$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfkej$qhf$1@dont-email.me> <sgfopa$uon$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 29 Aug 2021 14:54:33 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4d2087e7145ef416b70be102073fa784";
logging-data="26222"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19RIsKN/txibqkqADTBHaqZ41lGaV2v/HI="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:RKNcJFUX62BhWl4u0NNsvjE0mRE=
In-Reply-To: <sgfopa$uon$1@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Sun, 29 Aug 2021 14:54 UTC

On 29/08/2021 12:50, Dmitry A. Kazakov wrote:
> On 2021-08-29 11:36, David Brown wrote:
>> On 29/08/2021 02:01, Bart wrote:
>
>>> So 10 million lines of code represents a single 100MB program,
>>> approximately.
>>
>> The biggest single executable I see on my machine (without digging too
>> hard) is 25 MB.  I have also found a shared library at 125 MB.
>
> If you use GCC and generic instances put in a shared library, you easily
> come to such numbers. GCC generates lots of stuff.
>
> Funny thing, you cannot even build some of such shared libraries under
> Windows because the number of exported symbols easily exceeds 2**16-1
> (Windows limit). You must split the library into parts...
>

I didn't know of that limit. I did know that Windows was still limited
by its 16-bit ancestry, but not that specific one.
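
For what it's worth, the "split the library into parts" step might look something like this Python sketch. It takes the 2**16-1 figure as quoted, uses made-up symbol names, and ignores the real difficulty, which is that symbols depending on each other have to end up in libraries that can see one another.

```python
MAX_EXPORTS = 2**16 - 1  # limit as stated in the quoted post

def split_exports(symbols, limit=MAX_EXPORTS):
    """Split a list of exported symbol names into groups of at most `limit`."""
    return [symbols[i:i + limit] for i in range(0, len(symbols), limit)]

# Hypothetical library with 150,000 exported symbols.
symbols = [f"sym_{n}" for n in range(150_000)]
parts = split_exports(symbols)
print([len(p) for p in parts])  # [65535, 65535, 18930]
```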

>> I also did not mean to imply that these big builds result in a single
>> binary - they are often split into multiple "shared" libraries.  (I put
>> "shared" in quotations, because the libraries are typically dedicated to
>> the program rather than shared by other applications.)  This can be
>> convenient during development, building and testing.
>
> 100-200MB is a medium-sized production application: peripheral devices,
> HTTP server, database, cloud connectivity, user management, things start
> to explode quickly.
>

Re: Code gen - calling sequences

<sggc23$9mn$1@z-news.wcss.wroc.pl>

https://www.rocksolidbbs.com/devel/article-flat.php?id=611&group=comp.lang.misc#611

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!newsfeed.neostrada.pl!unt-exc-02.news.neostrada.pl!newsfeed.pionier.net.pl!pwr.wroc.pl!news.wcss.wroc.pl!not-for-mail
From: antispam@math.uni.wroc.pl
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 16:19:15 +0000 (UTC)
Organization: Politechnika Wroclawska
Lines: 90
Message-ID: <sggc23$9mn$1@z-news.wcss.wroc.pl>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org> <sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me> <sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me> <sgfu91$qdg$1@z-news.wcss.wroc.pl> <sgg0ol$che$1@dont-email.me>
NNTP-Posting-Host: hera.math.uni.wroc.pl
X-Trace: z-news.wcss.wroc.pl 1630253955 9943 156.17.86.1 (29 Aug 2021 16:19:15 GMT)
X-Complaints-To: abuse@news.pwr.wroc.pl
NNTP-Posting-Date: Sun, 29 Aug 2021 16:19:15 +0000 (UTC)
Cancel-Lock: sha1:eUTs/0pHJLXaji3ubxOGw8kYbd0=
User-Agent: tin/2.4.3-20181224 ("Glen Mhor") (UNIX) (Linux/4.19.0-10-amd64 (x86_64))
X-Received-Bytes: 5240
 by: antispam@math.uni.wroc.pl - Sun, 29 Aug 2021 16:19 UTC

Bart <bc@freeuk.com> wrote:
> On 29/08/2021 13:24, antispam@math.uni.wroc.pl wrote:
> > Bart <bc@freeuk.com> wrote:
>
> >> A rule of thumb I've sometimes observed is that, for x64 anyway, 1 line
> >> of source code maps to about 10 bytes of binary machine code.
> >
> > Depends on the language. For C it may be lower, for some other
> > languages much higher.
> >
> >> So 10 million lines of code represents a single 100MB program,
> >> approximately.
> >
> > I work on a program whose executable is 64 M. However, a significant
> > part of the executable code is in loadable modules that take another
> > 64 M. Guess how big the source is?
>
> By my metric it would be about 6M lines of source code, if most of the
> 64MB was executable x64 code (rather than initialised data, embedded
> data files, or other exe overheads).
>
> That assumes a certain proportion of declaration lines to lines of
> executable code.
>
> Now you're going to tell me it's either a lot fewer or a lot more.

40 M in the executable is "statically" linked code from outside, probably
corresponding to 0.5M lines of source. 24 M corresponds to about 80 K
lines. 64 M in loadable modules corresponds to 210 K lines (the actual
count of code lines is closer to 120 K; the rest is comments and empty
lines).

It is hard to distinguish between executable code and data. Due
to semantics initialized data needs executable code to perform
initialization. There are dispatch tables, all data and code is
tagged (has identifying headers). There is runtime type info.
OTOH, there is a lot of code due to the compiler aggressively optimizing
for speed at the cost of code size. There is exception handling code
inserted by the compiler.

> If the language is C, then I guess that could be anything: you can have
> macros that expand to many times their size, and are instantiated at
> multiple sites; include files that can do the same trick. Or a lot of
> boilerplate code that reduces to nothing.
>
> Or there is lots of inlining that pushes the size the other way again.

The compiler may compile the same code multiple times, each time with
different assumptions about types (effectively producing several
specialized variants from the same code).

> > Well, there is also the issue of memory size. SmartEiffel used (uses???)
> > whole-program optimization and compiled very fast. But for a really
> > large program it used to run out of memory. I am not sure if this is
> > still a problem on modern machines, but a reasonable estimate is that,
> > keeping all needed info in memory, you may need 1000 times as much
> > memory as the source takes. So you need to carefully optimize space use...
>
> 3 compilers of mine I've just tested use memory equivalent to 15x (C
> compiler), 20x (Interpreter), and 80x (my systems language) the source size.
>
> But they all use persistent data structures, especially the last which
> creates arrays of tokens, a bad idea I've since dropped. All those
> include the source itself.

ATM I have to keep the parse tree of a large part of the program in memory.
The parse tree is about 8 times larger than the corresponding source.
The representation of the parse tree is unoptimized and in principle
a packed representation could be smaller. OTOH this is just the parse
tree, without any extra data like types or source locations.
Once the compiler collects enough data to do interesting optimizations,
the data structures may be much larger...

> All the memory is recovered on program termination. If it becomes an
> issue, then unneeded data structures can be destroyed earlier.
>
> But if we say 40x source size, then capacity of 8GB means /currently/
> being able to deal with source code of something over 10M lines,
> depending on code density.
>
> It just means being more resourceful, and reintroducing long-forgotten
> techniques of working with memory-limited hardware.
>
> ATM, 10M lines is 200 times the size of my typical projects.

I deal with code written by other folks. And I like generating
code. You may easily end up with quite a large amount of code
to compile.

--
Waldek Hebisch

Re: Code gen - calling sequences

<sggim3$od9$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=613&group=comp.lang.misc#613

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (Bart)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Sun, 29 Aug 2021 19:12:16 +0100
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <sggim3$od9$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfu91$qdg$1@z-news.wcss.wroc.pl> <sgg0ol$che$1@dont-email.me>
<sggc23$9mn$1@z-news.wcss.wroc.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 29 Aug 2021 18:12:19 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7fc5023b536d3ef58b0d98a2b34c015f";
logging-data="25001"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/MfCrLdmXov7BzQ2wN4qxFYfZfTIqHM2Y="
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:psWQF5XzGqEagEIexvCMzMK0yM4=
In-Reply-To: <sggc23$9mn$1@z-news.wcss.wroc.pl>
X-Antivirus-Status: Clean
Content-Language: en-GB
X-Antivirus: AVG (VPS 210829-8, 29/8/2021), Outbound message
 by: Bart - Sun, 29 Aug 2021 18:12 UTC

On 29/08/2021 17:19, antispam@math.uni.wroc.pl wrote:
> Bart <bc@freeuk.com> wrote:

>>> I work on a program whose executable is 64 M. However, a significant
>>> part of the executable code is in loadable modules that take another
>>> 64 M. Guess how big the source is?
>>
>> By my metric it would be about 6M lines of source code, if most of the
>> 64MB was executable x64 code (rather than initialised data, embedded
>> data files, or other exe overheads).
>>
>> That assumes a certain proportion of declaration lines to lines of
>> executable code.
>>
>> Now you're going to tell me it's either a lot fewer or a lot more.
>
> 40 M in the executable is "statically" linked code from outside, probably
> corresponding to 0.5M lines of source. 24 M corresponds to about 80 K
> lines. 64 M in loadable modules corresponds to 210 K lines (the actual
> count of code lines is closer to 120 K; the rest is comments and empty
> lines).

Those are some very large ratios of output bytes to code lines:
some 80:1, 300:1 and (assuming 150K lines, to allow for /some/ blank
lines and comments) about 400:1.
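
Those ratios can be recomputed from the quoted figures with a few lines of Python. This is only a sketch: the 150K line count for the loadable modules is the guess used above, "M" is taken as a decimal megabyte, and the names are made up.

```python
MB = 1_000_000

def bytes_per_line(code_bytes, source_lines):
    """Ratio of output bytes to source lines."""
    return code_bytes / source_lines

# (description, code bytes, source lines) as given in the quoted post.
parts = [
    ("statically linked outside code", 40 * MB, 500_000),
    ("own code in the executable", 24 * MB, 80_000),
    ("loadable modules", 64 * MB, 150_000),
]

for name, code_bytes, lines in parts:
    print(f"{name}: {bytes_per_line(code_bytes, lines):.0f} bytes per line")
# statically linked outside code: 80 bytes per line
# own code in the executable: 300 bytes per line
# loadable modules: 427 bytes per line
```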

The largest I've come across is 2500:1, for a program (not mine) with
some very deeply nested macros.

It makes it harder to get an idea of the true complexity of a 1MB
program for example; would it be 100K lines (my 10:1 code), or 2.5K
lines (your 400:1 code), or something between the two?

But I think that even C code is typically more like mine than yours. If
I take the 230Kloc file sqlite3.c, which is very comment-heavy, and
strip the comments but leave the blank lines, then I get 170Kloc.

I compile that to a 1.1MB object file, which is between 6:1 and 7:1
bytes per line of source.

If I take one of my 740KLoc benchmark programs (fannkuch() repeated
10,000 times), I get executables of 6MB to 8MB, so bytes:lines ratios of
8:1 to 11:1 (optimising on/off).

If you applied that 400:1 ratio to the 10Mloc programs David was talking
about, then you'd end up with 4GB of code per 10Mloc. My 40Kloc compiler
would be 16MB in size instead of 0.4MB!

So I'd say that your programs are rather atypical.

> It is hard to distinguish between executable code and data. Due
> to semantics initialized data needs executable code to perform
> initialization. There are dispatch tables, all data and code is
> tagged (has identifying headers).

That sounds more like my interpreted languages. If I take that same
740Kloc benchmark, which is 670Kloc in this language, it uses 30MB of
64-bit bytecode, so 45:1 here, ignoring all other requirements.

> ATM I have to keep the parse tree of a large part of the program in memory.
> The parse tree is about 8 times larger than the corresponding source.

I think only 8 times larger is pretty good. Although it does depend on
whether you like long or short identifiers...

Re: Code gen - calling sequences

<sgi3vd$c3c$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=630&group=comp.lang.misc#630

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 10:13:32 +0200
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <sgi3vd$c3c$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 30 Aug 2021 08:13:33 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="15a0b6fced2eb349af46ab413288d451";
logging-data="12396"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+sY3ncHfx04kAr54uZesc+LAEnklJjy2A="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:dU15dbnA26XZucGeuKikf0/B+7I=
In-Reply-To: <sgdhjh$l3$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Mon, 30 Aug 2021 08:13 UTC

On 28/08/2021 16:35, James Harris wrote:
> On 24/08/2021 21:56, David Brown wrote:
>> On 24/08/2021 21:06, Bart wrote:
>>> On 24/08/2021 18:25, James Harris wrote:
>>>
>>>> These days why use calling conventions at all? Perhaps they are only
>>>> needed for when there's complete ignorance of the callee. The
>>>> traditional concept of calling conventions may be pass\acute/e. ;-)
>>
>> James, aren't you using Linux?  The compose key makes it easy to write
>> letters like é - it's just compose, ´, e - "passé".  (It's even easier
>> if you have a non-English keyboard layout, in Windows or Linux, as these
>> usually have "dead keys" for accents.)
>
> Thanks, I've now enabled the compose key though I wrote passé in the way
> I did as it's the way I am thinking of for my language - which, as it
> was unfamiliar to others was why I added the smiley.
>

I don't imagine anyone is going to want to write "pass\acute/e" as an
identifier in any language. And the last thing anyone needs is another
way to write that kind of thing.

There are, I think, only two sensible options here:

1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.

If you desperately want to allow some way to write non-ASCII characters
without UTF-8, then please do not invent your own new way to do it.
There are more than enough standards here already - use HTML/XML names,
or Unicode descriptions.

Re: Code gen - calling sequences

<sgi5dr$h90$1@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=631&group=comp.lang.misc#631

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 10:38:18 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgi5dr$h90$1@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="17696"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Dmitry A. Kazakov - Mon, 30 Aug 2021 08:38 UTC

On 2021-08-30 10:13, David Brown wrote:

> There are, I think, only two sensible options here:
>
> 1. Disallow any identifier letters outside of ASCII.
> 2. Make everything UTF-8.

Yes. Though people preferring #2 are usually English speakers who are
not really aware of the consequences. Like having E, Ε, Е three
different identifiers. One could try to maintain language-defined
homographs in order to prevent mess, introducing even bigger mess...
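
To make the three-letter example concrete, here is a minimal Python sketch (not part of the original post) that prints the code points behind the three look-alike capitals. The variable names are illustrative; the only library used is the standard unicodedata module.

```python
import unicodedata

# Latin, Greek and Cyrillic capitals that look the same in most fonts
lookalikes = ["\u0045", "\u0395", "\u0415"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Three distinct strings, so a compiler comparing code points
# would treat them as three distinct identifiers
distinct = len(set(lookalikes))

# Unicode normalisation does not merge them either
normalised = {unicodedata.normalize("NFKC", ch) for ch in lookalikes}
```

Normalisation only unifies different encodings of the same character, so detecting this kind of clash needs a separate table of confusable characters.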

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgi9lp$eff$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=633&group=comp.lang.misc#633

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 11:50:48 +0200
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <sgi9lp$eff$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 30 Aug 2021 09:50:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="15a0b6fced2eb349af46ab413288d451";
logging-data="14831"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Jhm+JsxHujJTA6N31dIWubFX7eV1/bdw="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:jYGdQkwjjm1ifuw8cOx7h6GxC5c=
In-Reply-To: <sgi5dr$h90$1@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Mon, 30 Aug 2021 09:50 UTC

On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
> On 2021-08-30 10:13, David Brown wrote:
>
>> There are, I think, only two sensible options here:
>>
>> 1. Disallow any identifier letters outside of ASCII.
>> 2. Make everything UTF-8.
>
> Yes. Though people preferring #2 are usually English speakers who are
> not really aware of the consequences. Like having E, Ε, Е three
> different identifiers. One could try to maintain language-defined
> homographs in order to prevent mess, introducing even bigger mess...
>

I'm an English speaker, and a Norwegian speaker (we have three extra
letters, åøæ). And I am well aware of the potential complication of
different Unicode code points with very similar (or even identical) glyphs.

It can also be difficult for people to type, which can quickly be a pain
for collaboration. How would you type "bøk", for example? That's
"book" in Norwegian, and I have a key labelled "ø". James, on Linux,
can use compose + / + o to get the letter. But for you on Windows, with
a German keyboard layout (I'm guessing from your email address), I
expect you are stuck with copy-and-paste from my post, or using the
"character map" utility, or typing "alt+0248".

Then there is the question of displaying the characters. I have a font
that includes vast numbers of obscure symbols, so I could use ↀ for the
Roman numeral for 1000 (using the traditional symbol, rather than the
modern replacement of M). Other people reading this might not see it.

All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
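
As a small illustration of the "alt+0248" route mentioned above, the following Python sketch (an addition, not from the original post) shows that 248 is the code point of "ø" and also its byte value in the Windows-1252 code page. That Alt codes with a leading zero use that code page on Western European Windows setups is an assumption here.

```python
code = 248

# 248 is the Unicode code point of the letter (U+00F8) ...
letter = chr(code)

# ... and also its single-byte value in Windows-1252
from_cp1252 = bytes([code]).decode("cp1252")

# In a UTF-8 source file the same letter takes two bytes
utf8_bytes = letter.encode("utf-8")

print(letter, from_cp1252, utf8_bytes)
```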

Re: Code gen - calling sequences

<sgift1$1p16$2@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=641&group=comp.lang.misc#641

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 13:37:05 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgift1$1p16$2@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="58406"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Dmitry A. Kazakov - Mon, 30 Aug 2021 11:37 UTC

On 2021-08-30 11:50, David Brown wrote:
> On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
>> On 2021-08-30 10:13, David Brown wrote:
>>
>>> There are, I think, only two sensible options here:
>>>
>>> 1. Disallow any identifier letters outside of ASCII.
>>> 2. Make everything UTF-8.
>>
>> Yes. Though people preferring #2 are usually English speakers who are
>> not really aware of the consequences. Like having E, Ε, Е three
>> different identifiers. One could try to maintain language-defined
>> homographs in order to prevent mess, introducing even bigger mess...
>
> I'm an English speaker, and a Norwegian speaker (we have three extra
> letters, åøæ). And I am well aware of the potential complication of
> different Unicode code points with very similar (or even identical) glyphs.
>
> It can also be difficult for people to type, which can quickly be a pain
> for collaboration. How would you type "bøk", for example? That's
> "book" in Norwegian, and I have a key labelled "ø". James, on Linux,
> can use compose + / + o to get the letter. But for you on Windows, with
> a German keyboard layout (I'm guessing from your email address), I
> expect you are stuck with copy-and-paste from my post, or using the
> "character map" utility, or typing "alt+0248".

Right, character map is what I use.

Germans have it the easy way: you can drop diacritical marks (ä=ae, ö=oe,
ü=ue) and the ligature SZ (ß=ss).
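
The substitutions above are mechanical enough to sketch in a few lines of Python (an illustrative addition; the table and function names are made up for this example, and the capital-letter entries are an assumption beyond what the post lists):

```python
# Conventional ASCII fallbacks for German letters
GERMAN_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def to_ascii_german(text):
    """Replace German umlauts and sharp s with their ASCII spellings."""
    return text.translate(GERMAN_FALLBACK)

print(to_ascii_german("Größe"))  # Groesse
```

The mapping is not reversible in general, since "ue" or "ss" can also occur as ordinary letter pairs.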

> Then there is the question of displaying the characters. I have a font
> that includes vast numbers of obscure symbols, so I could use ↀ for the
> Roman numeral for 1000 (using the traditional symbol, rather than the
> modern replacement of M). Other people reading this might not see it.

It is a lesser problem now than it was before. I remember the time
Windows was unable to display most of special symbols.

> All in all, non-ASCII letters in identifiers can pose a lot of
> challenges. But they are nonetheless important for people around the
> world, and despite the disadvantages, UTF-8 is far and away the best
> choice. You simply have to trust programmers to be sensible in their
> usage. (You need to do that anyway, even with ASCII - in many fonts, l,
> 1 and I can be hard to distinguish, as can O and 0.)

Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.

You might be able to remember a German or even a Czech word. Cyrillic
would be rather more challenging. But what would you do with Armenian or
Chinese?

And the least common denominator is English.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgihgc$u1$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=643&group=comp.lang.misc#643

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: james.harris.1@gmail.com (James Harris)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 13:04:28 +0100
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sgihgc$u1$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 30 Aug 2021 12:04:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f3cbc23dbd01a4b94ff0965a2b8ec696";
logging-data="961"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/2l/vD7FcfTDxS3wIm5tq107DZYTasYc0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:PhGAzvgHGIgPchMV6P6eSZlWBlU=
In-Reply-To: <sgi3vd$c3c$1@dont-email.me>
Content-Language: en-GB
 by: James Harris - Mon, 30 Aug 2021 12:04 UTC

On 30/08/2021 09:13, David Brown wrote:
> On 28/08/2021 16:35, James Harris wrote:
>> On 24/08/2021 21:56, David Brown wrote:
>>> On 24/08/2021 21:06, Bart wrote:
>>>> On 24/08/2021 18:25, James Harris wrote:
>>>>
>>>>> These days why use calling conventions at all? Perhaps they are only
>>>>> needed for when there's complete ignorance of the callee. The
>>>>> traditional concept of calling conventions may be pass\acute/e. ;-)
>>>
>>> James, aren't you using Linux?  The compose key makes it easy to write
>>> letters like é - it's just compose, ´, e - "passé".  (It's even easier
>>> if you have a non-English keyboard layout, in Windows or Linux, as these
>>> usually have "dead keys" for accents.)
>>
>> Thanks, I've now enabled the compose key though I wrote passé in the way
>> I did as it's the way I am thinking of for my language - which, as it
>> was unfamiliar to others was why I added the smiley.
>>
>
> I don't imagine anyone is going to want to write "pass\acute/e" as an
> identifier in any language.

It's for string literals!

IMO programs and identifiers should use ascii, even in non-English
languages.

--
James Harris

Re: Code gen - calling sequences

<sgj73n$olh$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=651&group=comp.lang.misc#651

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 20:13:10 +0200
Organization: A noiseless patient Spider
Lines: 78
Message-ID: <sgj73n$olh$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 30 Aug 2021 18:13:11 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="542e4aadc720c2c61e42fa7c4cb638f0";
logging-data="25265"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19nIm71xR2sndu2i11Xz3H6VKaLV85wbvw="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:JluOEcpIioUqMva5mMP+OqWdJU4=
In-Reply-To: <sgift1$1p16$2@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Mon, 30 Aug 2021 18:13 UTC

On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
> On 2021-08-30 11:50, David Brown wrote:
>> On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
>>> On 2021-08-30 10:13, David Brown wrote:
>>>
>>>> There are, I think, only two sensible options here:
>>>>
>>>> 1. Disallow any identifier letters outside of ASCII.
>>>> 2. Make everything UTF-8.
>>>
>>> Yes. Though people preferring #2 are usually English speakers who are
>>> not really aware of the consequences. Like having E, Ε, Е three
>>> different identifiers. One could try to maintain language-defined
>>> homographs in order to prevent mess, introducing even bigger mess...
>>
>> I'm an English speaker, and a Norwegian speaker (we have three extra
>> letters, åøæ).  And I am well aware of the potential complication of
>> different Unicode code points with very similar (or even identical)
>> glyphs.
>>
>> It can also be difficult for people to type, which can quickly be a pain
>> for collaboration.  How would you type "bøk", for example?  That's
>> "book" in Norwegian, and I have a key labelled "ø".  James, on Linux,
>> can use compose + / + o to get the letter.  But for you on Windows, with
>> a German keyboard layout (I'm guessing from your email address), I
>> expect you are stuck with copy-and-paste from my post, or using the
>> "character map" utility, or typing "alt+0248".
>
> Right, character map is what I use.
>
> Germans have it the easy way: you can drop diacritical marks (ä=ae, ö=oe,
> ü=ue) and the ligature SZ (ß=ss).
>

You can do that too in Norwegian (though people are not always
consistent about their choices of transliteration), if you can't use the
proper letters (you can also substitute the Swedish versions). But the
preference is to use the correct letters.

>> Then there is the question of displaying the characters.  I have a font
>> that includes vast numbers of obscure symbols, so I could use ↀ for the
>> Roman numeral for 1000 (using the traditional symbol, rather than the
>> modern replacement of M).  Other people reading this might not see it.
>
> It is a lesser problem now than it was before. I remember the time
> Windows was unable to display most of special symbols.

Slowly, in some ways, Windows has been catching up with the *nix world.

>
>> All in all, non-ASCII letters in identifiers can pose a lot of
>> challenges.  But they are nonetheless important for people around the
>> world, and despite the disadvantages, UTF-8 is far and away the best
>> choice.  You simply have to trust programmers to be sensible in their
>> usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
>> 1 and I can be hard to distinguish, as can O and 0.)
>
> Actually, this is again sort of Europocentric POV. In reality, if you
> have a truly international team with speakers outside Western Europe,
> you must agree on some strict rules regarding comments and identifiers.
>

If you have an international team, then it is standard practice to keep
everything in English. But most teams are not international. Why
should a group of Greek or Japanese programmers be forced to write
everything in a foreign language? You can view the keywords as fixed -
almost like symbols, rather than words - but they may prefer to have
other parts written in their own language.

> You might be able to remember a German or even a Czech word. Cyrillic
> would be rather more challenging. But what would you do with Armenian or
> Chinese?
>
> And the least common denominator is English.
>

It is the least common denominator for most international groups, but
not for most national teams.

Re: Code gen - calling sequences

<sgj79k$pvn$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=652&group=comp.lang.misc#652

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 20:16:19 +0200
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <sgj79k$pvn$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgihgc$u1$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 30 Aug 2021 18:16:20 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="542e4aadc720c2c61e42fa7c4cb638f0";
logging-data="26615"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX189MDJv0oV5bxsRW3VHddSY7j2zLR2iM30="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:BesaWMqp8GcbxwPOOMhphohXgBw=
In-Reply-To: <sgihgc$u1$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Mon, 30 Aug 2021 18:16 UTC

On 30/08/2021 14:04, James Harris wrote:
> On 30/08/2021 09:13, David Brown wrote:
>> On 28/08/2021 16:35, James Harris wrote:
>>> On 24/08/2021 21:56, David Brown wrote:
>>>> On 24/08/2021 21:06, Bart wrote:
>>>>> On 24/08/2021 18:25, James Harris wrote:
>>>>>
>>>>>> These days why use calling conventions at all? Perhaps they are only
>>>>>> needed for when there's complete ignorance of the callee. The
>>>>>> traditional concept of calling conventions may be pass\acute/e. ;-)
>>>>
>>>> James, aren't you using Linux?  The compose key makes it easy to write
>>>> letters like é - it's just compose, ´, e - "passé".  (It's even easier
>>>> if you have a non-English keyboard layout, in Windows or Linux, as
>>>> these
>>>> usually have "dead keys" for accents.)
>>>
>>> Thanks, I've now enabled the compose key though I wrote passé in the way
>>> I did as it's the way I am thinking of for my language - which, as it
>>> was unfamiliar to others was why I added the smiley.
>>>
>>
>> I don't imagine anyone is going to want to write "pass\acute/e" as an
>> identifier in any language.
>
> It's for string literals!
>
> IMO programs and identifiers should use ascii, even in non-English
> languages.
>

See the rest of the thread for a discussion on non-ASCII identifiers.
(I am not suggesting that you implement them, or don't implement them -
that's your choice. Some languages go one way, others go the other way.)

But don't make up your own language for special characters in strings or
comments. Again, UTF-8 is far and away the best option. If you feel
that is a problem, then at least stick to an existing standard -
HTML/XML character entities would almost certainly be the most
convenient choice: "pass&eacute;".
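
For what it is worth, existing tooling already understands that notation. A minimal Python sketch (an addition for illustration; the helper name is hypothetical) using the standard html module:

```python
import html

escaped = "pass&eacute;"

# Decode HTML character references into the real characters
decoded = html.unescape(escaped)
print(decoded)

def to_ascii_entities(text):
    """Keep ASCII as-is; write every other character as a numeric reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

round_trip = html.unescape(to_ascii_entities(decoded))
```

Numeric references such as "&#233;" have the advantage of being valid in both HTML and XML without needing a table of names.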

Re: Code gen - calling sequences

<sgjafr$3lr$1@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=653&group=comp.lang.misc#653

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 21:10:52 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgjafr$3lr$1@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="3771"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Dmitry A. Kazakov - Mon, 30 Aug 2021 19:10 UTC

On 2021-08-30 20:13, David Brown wrote:
> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

>> It is a lesser problem now than it was before. I remember the time
>> Windows was unable to display most of special symbols.
>
> Slowly, in some ways, Windows has been catching up with the *nix world.

I must defend Windows. Linux adopted UTF-8 very late. I well remember
the mess it had with 8-bit code pages.

BTW, there still exist file utilities to check filenames in Linux. I had
an old filesystem with some file names in German encoded in Latin-1. It
was connected to a FreeNAS (BSD-based). These files caused mysterious
FreeNAS crashes when a remote host tried to browse files over a network
share. Once I fixed the names it almost stopped crashing. I ditched
FreeNAS anyway in favor of Ubuntu.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgjatm$idn$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=654&group=comp.lang.misc#654

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 21:18:13 +0200
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <sgjatm$idn$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sgjafr$3lr$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 30 Aug 2021 19:18:14 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="542e4aadc720c2c61e42fa7c4cb638f0";
logging-data="18871"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/s1Mok1KHrKCUsKwF1wuT3mE27dYWYbGY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:JsYTqRlTE/Kc+RFlvBWuyqz8aNc=
In-Reply-To: <sgjafr$3lr$1@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Mon, 30 Aug 2021 19:18 UTC

On 30/08/2021 21:10, Dmitry A. Kazakov wrote:
> On 2021-08-30 20:13, David Brown wrote:
>> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
>
>>> It is a lesser problem now than it was before. I remember the time
>>> Windows was unable to display most of special symbols.
>>
>> Slowly, in some ways, Windows has been catching up with the *nix world.
>
> I must defend Windows. Linux adopted UTF-8 very late. I well remember
> the mess it had with 8-bit code pages.

Windows also had a mess with 8-bit code pages.

Windows /was/ earlier with Unicode, that's true - unfortunately, they
picked UCS-2 and then got stuck with that instead of UTF-8. Linux
picked UTF-8 by laziness, as pretty much everything involving strings
(except displaying them) just works as before. There is no need to
re-invent everything in a 16-bit manner, as Windows did, and there are
no problems when it turns out 16 bits are not enough.
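
Both points can be illustrated with a short Python sketch (an addition, not from the original post). The choice of U+1F600 as a character above U+FFFF is arbitrary; any such code point would do.

```python
ascii_text = "hello"

# ASCII text has identical bytes in UTF-8, so byte-oriented
# string handling carries on working unchanged
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

roman_1000 = "\u2180"        # fits in one 16-bit unit
outside_bmp = "\U0001F600"   # needs more than 16 bits

def utf16_units(ch):
    """Number of 16-bit code units used by UTF-16 for this string."""
    return len(ch.encode("utf-16-le")) // 2

def utf8_len(ch):
    """Number of bytes used by UTF-8 for this string."""
    return len(ch.encode("utf-8"))

for ch in (roman_1000, outside_bmp):
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} UTF-16 unit(s), "
          f"{utf8_len(ch)} UTF-8 byte(s)")
```

A pure UCS-2 system has no way to represent the second character at all, which is what forced the later move to UTF-16 with surrogate pairs.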

>
> BTW, there still exist file utilities to check filenames in Linux. I had
> an old filesystem with some file names in German encoded in Latin-1. It
> was connected to a FreeNAS (BSD-based). These files caused mysterious
> FreeNAS crashes when a remote host tried to browse files over a network
> share. Once I fixed the names it almost stopped crashing. I ditched
> FreeNAS anyway in favor of Ubuntu.
>

FreeNAS is BSD, which is not Linux. Not that BSD has any problems with
non-ASCII filenames either. An application might be made ASCII only,
however, regardless of the system.
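
One plausible way such an application trips up, sketched in Python as an illustration only (this is an assumption about the general failure mode, not a diagnosis of FreeNAS; the file name is made up): a Latin-1 encoded name is simply not valid UTF-8, so code that decodes names without handling errors will fail on it.

```python
name = "Grüße.txt"

# How the name would be stored on an old Latin-1 filesystem
latin1_bytes = name.encode("latin-1")

# An application assuming UTF-8 cannot decode those bytes
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Repair: reinterpret the bytes as Latin-1, then store them as UTF-8
fixed = latin1_bytes.decode("latin-1").encode("utf-8")

print(latin1_bytes, valid_utf8, fixed)
```

The filesystem itself is indifferent, since it only stores bytes; the trouble starts in whichever layer decides those bytes must be text in a particular encoding.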

Re: Code gen - calling sequences

<sgjc00$phv$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=655&group=comp.lang.misc#655

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (Bart)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 20:36:27 +0100
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <sgjc00$phv$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 30 Aug 2021 19:36:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="d34df35c3798e2c34a1b8f95ab27c825";
logging-data="26175"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX184usvS6Od/8fTn6vXNf4cznO2PN+PZxdg="
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:5CzZO4qhusN6VxNjR0Jyxu+IRMs=
In-Reply-To: <sgj73n$olh$1@dont-email.me>
X-Antivirus-Status: Clean
Content-Language: en-GB
X-Antivirus: AVG (VPS 210830-0, 30/8/2021), Outbound message
 by: Bart - Mon, 30 Aug 2021 19:36 UTC

On 30/08/2021 19:13, David Brown wrote:
> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

>> Actually, this is again sort of Europocentric POV. In reality, if you
>> have a truly international team with speakers outside Western Europe,
>> you must agree on some strict rules regarding comments and identifiers.
>>
>
> If you have an international team, then it is standard practice to keep
> everything in English. But most teams are not international. Why
> should a group of Greek or Japanese programmers be forced to write
> everything in a foreign language? You can view the keywords as fixed -
> almost like symbols, rather than words - but they may prefer to have
> other parts written in their own language.
>
>> You might be able to remember a German or even a Czech word. Cyrillic
>> would be rather more challenging. But what would you do with Armenian or
>> Chinese?
>>
>> And the least common denominator is English.
>>
>
> It is the least common denominator for most international groups, but
> not for most national teams.

If they are using a mainstream language, then it's about more than using
Unicode in identifiers:

* Keywords are likely to be in English still

* Standard type names will be English-based (and, in C, codes like %ll
and -LL and INT_MAX)

* The function names in the standard library will probably be English-based

* Compiler option names may be English-based (eg. --version)

* Error messages from the compiler may be in English (I don't know how
internationalised such programs are)

* Most of the exported functions and enums of general-purpose libraries
are likely to be in English (eg. SDL_BUTTON_LEFT)

So I'd say it's hard to get away from English even if they wanted.

But string literals and comments in source code: they can be anything;
the language just needs to allow UTF-8.

Re: Code gen - calling sequences

<sgjc1o$rrf$1@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=656&group=comp.lang.misc#656

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 21:37:29 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgjc1o$rrf$1@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sgjafr$3lr$1@gioia.aioe.org>
<sgjatm$idn$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="28527"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Dmitry A. Kazakov - Mon, 30 Aug 2021 19:37 UTC

On 2021-08-30 21:18, David Brown wrote:
> On 30/08/2021 21:10, Dmitry A. Kazakov wrote:
>> On 2021-08-30 20:13, David Brown wrote:
>>> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
>>
>>>> It is a lesser problem now than it was before. I remember the time
>>>> Windows was unable to display most of special symbols.
>>>
>>> Slowly, in some ways, Windows has been catching up with the *nix world.
>>
>> I must defend Windows. Linux adopted UTF-8 very late. I well remember
>> the mess it had with 8-bit code pages.
>
> Windows also had a mess with 8-bit code pages.

Oh, yes.

If I remember correctly, you needed "professional" rather than "home" in
order to switch the system default code page.

> Windows /was/ earlier with Unicode, that's true - unfortunately, they
> picked UCS-2 and then got stuck with that instead of UTF-8.

Worse, later they changed UCS-2 to UTF-16 under the rug. All system
calls are duplicated, one ASCII A-call, another UTF-16 W-call.

> Linux
> picked UTF-8 by laziness, as pretty much everything involving strings
> (except displaying them) just works as before. There is no need to
> re-invent everything in a 16-bit manner, as Windows did, and there are
> no problems when it turns out 16 bits are not enough.

It is UTF-16 now. But of course, UTF-16 is a monstrosity compared with
UTF-8. Fortunately third party libraries ignore the mess. E.g. GTK port
for Windows converts all filenames to UTF-8.
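
A minimal Python sketch of the "just works as before" point, using only
the standard codecs. The sample string "version" is an arbitrary choice,
and the variable names are made up for illustration:

```python
# ASCII text has identical bytes in UTF-8 and in plain ASCII, so code
# that handles strings as bytes keeps working unchanged.
sample = "version"
utf8_bytes = sample.encode("utf-8")
ascii_bytes = sample.encode("ascii")
same_as_ascii = utf8_bytes == ascii_bytes

# The same text in UTF-16 (little-endian, no byte order mark) takes two
# bytes per character and contains zero bytes, which upsets code that
# treats a zero byte as a string terminator.
utf16_bytes = sample.encode("utf-16-le")
has_zero_byte = 0 in utf16_bytes

print(same_as_ascii)                      # True
print(len(utf8_bytes), len(utf16_bytes))  # 7 14
print(has_zero_byte)                      # True
```

This is why a 16-bit API needs a second set of entry points, while a
UTF-8 one can reuse the existing byte-string calls.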

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgjcus$jh$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=657&group=comp.lang.misc#657

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: james.harris.1@gmail.com (James Harris)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 20:52:59 +0100
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <sgjcus$jh$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgihgc$u1$1@dont-email.me>
<sgj79k$pvn$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 30 Aug 2021 19:53:00 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f3cbc23dbd01a4b94ff0965a2b8ec696";
logging-data="625"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+gRnmi62A95lxUxNm2xSughvR7C27bCvU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:dHwSwTD22cKgnyJRXcOSHtavsK4=
In-Reply-To: <sgj79k$pvn$1@dont-email.me>
Content-Language: en-GB
 by: James Harris - Mon, 30 Aug 2021 19:52 UTC

On 30/08/2021 19:16, David Brown wrote:
> On 30/08/2021 14:04, James Harris wrote:
>> On 30/08/2021 09:13, David Brown wrote:

....

>>> I don't imagine anyone is going to want to write "pass\acute/e" as an
>>> identifier in any language.
>>
>> It's for string literals!
>>
>> IMO programs and identifiers should use ascii, even in non-English
>> languages.
>>
>
> See the rest of the thread for a discussion on non-ASCII identifiers.
> (I am not suggesting that you implement them, or don't implement them -
> that's your choice. Some languages go one way, others go the other way.)
>
> But don't make up your own language for special characters in strings or
> comments. Again, UTF-8 is far and away the best option. If you feel
> that is a problem, then at least stick to an existing standard -
> HTML/XML character entities would almost certainly be the most
> convenient choice: "pass&eacute;".

Any UTF is no good for source code - e.g. for reasons Dmitry mentioned.
In addition, characters which people cannot identify or recognise should
not be part of source code because they make it unreadable.

I am considering allowing external identifier names to include unusual
characters so as to link with routines which use such characters - but
the programmer would have to write the identifiers in ascii characters.

I doubt I'd use HTML entities as they are a mess (e.g. having multiple
names for the same character) but I would need the names to come from an
online database.
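
A small Python sketch of both points, assuming the standard library's
html module is an acceptable stand-in for "HTML entities" here. The
helper dictionary names_for is a made-up name for illustration:

```python
import html
from html.entities import html5

# Decode the entity form suggested above into the actual character.
decoded = html.unescape("pass&eacute;")
print(decoded)  # passé

# Group the HTML5 entity names by the character they stand for, to see
# that one character can have several names.
names_for = {}
for name, value in html5.items():
    if name.endswith(";"):  # skip the legacy forms without a semicolon
        names_for.setdefault(value, []).append(name)

print(sorted(names_for["&"]))
```

The second print lists every name that maps to the ampersand, which
includes both "amp;" and "AMP;".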

--
James Harris

Re: Code gen - calling sequences

<sgjda9$31b$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=658&group=comp.lang.misc#658

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: james.harris.1@gmail.com (James Harris)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Mon, 30 Aug 2021 20:59:05 +0100
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <sgjda9$31b$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg9eo0$1os5$2@gioia.aioe.org>
<sgaut4$rij$1@dont-email.me> <sgbgm9$9l7$1@dont-email.me>
<sgdnqs$bd2$1@dont-email.me> <sgeiom$5jg$1@dont-email.me>
<sgfkej$qhf$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 30 Aug 2021 19:59:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f3cbc23dbd01a4b94ff0965a2b8ec696";
logging-data="3115"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+jc81gJ315y82S8AVV9qPu6cbOJpoy3wI="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:jlns7pIzWD++ISjfs3VnlSqwmaA=
In-Reply-To: <sgfkej$qhf$1@dont-email.me>
Content-Language: en-GB
 by: James Harris - Mon, 30 Aug 2021 19:59 UTC

On 29/08/2021 10:36, David Brown wrote:
> On 29/08/2021 02:01, Bart wrote:
>> On 28/08/2021 17:21, David Brown wrote:
>>> On 27/08/2021 22:07, Bart wrote:
>>
>>> As James suggested, the object files are basically just the internal
>>> representation of the compilation before code generation.
>>
>> Then 'object file' is a complete misnomer.
>
> Yes, that's a fair comment. "Linking" is also a misnomer in link-time
> optimisation. The names are historical, rather than technically accurate.

This is a first: three of us in agreement!

In my outline design the IR does a lot of the heavy lifting, including
being the preferred form for distributing software.

--
James Harris

Re: Code gen - calling sequences

<sgkm6j$me6$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=668&group=comp.lang.misc#668

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Tue, 31 Aug 2021 09:36:51 +0200
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <sgkm6j$me6$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sgjafr$3lr$1@gioia.aioe.org>
<sgjatm$idn$1@dont-email.me> <sgjc1o$rrf$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 31 Aug 2021 07:36:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="49ac68b1e4db6024a98609a80b6239eb";
logging-data="22982"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/H9zoKPYTcXPlfIsXO+L+M0fA34nP1+I0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:yExQMnQJsP+wTwFte4IBJgJGfIs=
In-Reply-To: <sgjc1o$rrf$1@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Tue, 31 Aug 2021 07:36 UTC

On 30/08/2021 21:37, Dmitry A. Kazakov wrote:
> On 2021-08-30 21:18, David Brown wrote:

>> Windows /was/ earlier with Unicode, that's true - unfortunately, they
>> picked UCS-2 and then got stuck with that instead of UTF-8.
>
> Worse, later they changed UCS-2 to UTF-16 under the rug. All system
> calls are duplicated, one ASCII A-call, another UTF-16 W-call.
>
>> Linux
>> picked UTF-8 by laziness, as pretty much everything involving strings
>> (except displaying them) just works as before.  There is no need to
>> re-invent everything in a 16-bit manner, as Windows did, and there are
>> no problems when it turns out 16 bits are not enough.
>
> It is UTF-16 now. But of course, UTF-16 is a monstrosity compared with
> UTF-8. Fortunately third party libraries ignore the mess. E.g. GTK port
> for Windows converts all filenames to UTF-8.
>

My understanding (which may be wrong, as I don't do much Windows
programming) is that there is a gradual move to UTF-8 support in
Windows. These things take time of course, and while there is no doubt
that Microsoft backed the wrong horse here with 16-bit encodings, they
made the right choice at the time. I blame MS for a lot of bad things,
but not this one! And they are not alone - Java, QT and Python are
other big players that picked UCS-2, leading to much regret and slow
progress towards a changeover to UTF-8.

Re: Code gen - calling sequences

<sgkt23$1og2$1@gioia.aioe.org>

https://www.rocksolidbbs.com/devel/article-flat.php?id=670&group=comp.lang.misc#670

Path: i2pn2.org!i2pn.org!aioe.org!Hx95GBhnJb0Xc8StPhH8AA.user.46.165.242.91.POSTED!not-for-mail
From: mailbox@dmitry-kazakov.de (Dmitry A. Kazakov)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Tue, 31 Aug 2021 11:33:55 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sgkt23$1og2$1@gioia.aioe.org>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sgjafr$3lr$1@gioia.aioe.org>
<sgjatm$idn$1@dont-email.me> <sgjc1o$rrf$1@gioia.aioe.org>
<sgkm6j$me6$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="57858"; posting-host="Hx95GBhnJb0Xc8StPhH8AA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Dmitry A. Kazakov - Tue, 31 Aug 2021 09:33 UTC

On 2021-08-31 09:36, David Brown wrote:

> My understanding (which may be wrong, as I don't do much Windows
> programming) is that there is a gradual move to UTF-8 support in
> Windows.

I think you are right. Actually they could proclaim A-calls UTF-8 as
they did with W-calls. That would break some legacy code, only French
will be annoyed. Germans will be apathetic, small European countries
resigned, I guess...

> These things take time of course, and while there is no doubt
> that Microsoft backed the wrong horse here with 16-bit encodings, they
> made the right choice at the time.

> I blame MS for a lot of bad things,
> but not this one! And they are not alone - Java, QT and Python are
> other big players that picked UCS-2, leading to much regret and slow
> progress towards a changeover to UTF-8.

I believe that UTF-8 was introduced later. It is impossible that
everybody was wrong. E.g. Ada also adopted UCS-2 in 1995. Later on Ada
added UCS-4. Just the same mess as with Windows, alas. But most Ada
programmers ignore UCS-2/4 and use UTF-8 where the standard mandates
Latin-1.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Re: Code gen - calling sequences

<sgl2fb$2k8$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=672&group=comp.lang.misc#672

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Tue, 31 Aug 2021 13:06:18 +0200
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <sgl2fb$2k8$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sgjafr$3lr$1@gioia.aioe.org>
<sgjatm$idn$1@dont-email.me> <sgjc1o$rrf$1@gioia.aioe.org>
<sgkm6j$me6$1@dont-email.me> <sgkt23$1og2$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 31 Aug 2021 11:06:19 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="49ac68b1e4db6024a98609a80b6239eb";
logging-data="2696"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/FCwnFyYIVq42ztOovkSEzaAZDFZVet0w="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:goOf65ExZoq6N+F7Vi85bvFSqlU=
In-Reply-To: <sgkt23$1og2$1@gioia.aioe.org>
Content-Language: en-GB
 by: David Brown - Tue, 31 Aug 2021 11:06 UTC

On 31/08/2021 11:33, Dmitry A. Kazakov wrote:
> On 2021-08-31 09:36, David Brown wrote:
>
>> My understanding (which may be wrong, as I don't do much Windows
>> programming) is that there is a gradual move to UTF-8 support in
>> Windows.
>
> I think you are right. Actually they could proclaim A-calls UTF-8 as
> they did with W-calls. That would break some legacy code, only French
> will be annoyed. Germans will be apathetic, small European countries
> resigned, I guess...

You are just listing the advantages :-)

>
>> These things take time of course, and while there is no doubt
>> that Microsoft backed the wrong horse here with 16-bit encodings, they
>> made the right choice at the time.
>
>> I blame MS for a lot of bad things,
>> but not this one!  And they are not alone - Java, QT and Python are
>> other big players that picked UCS-2, leading to much regret and slow
>> progress towards a changeover to UTF-8.
>
> I believe that UTF-8 was introduced later.

Yes. Unicode was first conceived as 16-bit, with UCS-2. Then they
started extending it beyond 16-bit, and had to make UCS-4. UTF-16 was
developed as a way to access the rest of the characters with 16-bit code
units, and then I think UTF-8 came after that. (UTF-32 is the same as
UCS-4.)
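
A short Python sketch of how the encodings differ for a character that
does not fit in 16 bits. The code point U+1F600 is just an example
choice, and the variable names are illustrative only:

```python
# A code point beyond the 16-bit range that UCS-2 could cover.
ch = "\U0001F600"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")   # big-endian, no byte order mark
utf32 = ch.encode("utf-32-be")

# UTF-16 needs two 16-bit code units (a surrogate pair) for it.
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

# UTF-32 stores the code point directly in one 32-bit unit.
direct = int.from_bytes(utf32, "big")

print(len(utf8), len(utf16), len(utf32))  # 4 4 4
print(hex(high), hex(low))                # 0xd83d 0xde00
print(hex(direct))                        # 0x1f600
```

Code that assumes one 16-bit unit per character sees two "characters"
here, which is the practical cost of the UCS-2 to UTF-16 change.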

> It is impossible that
> everybody was wrong.

They were not wrong at the time - it was later changes that made them
wrong. It is a sometimes unfortunate fact of life that backwards
compatibility is king, and it's hard to undo decisions even when we know
things could have been better. (That's why x86 is popular, despite
being an appallingly bad architecture, it's why we have Windows, it's
why we have qwerty keyboards, it's why we all use English with its silly
inconsistent spelling.)

> E.g. Ada also adopted UCS-2 in 1995. Later on Ada
> added UCS-4. Just the same mess as with Windows, alas. But most Ada
> programmers ignore UCS-2/4 and use UTF-8 where the standard mandates
> Latin-1.
>

Re: Code gen - calling sequences

<sglstf$1ft$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=687&group=comp.lang.misc#687

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: james.harris.1@gmail.com (James Harris)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Tue, 31 Aug 2021 19:37:35 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <sglstf$1ft$1@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 31 Aug 2021 18:37:35 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0afa2f044f78014862f417d8919390f7";
logging-data="1533"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+dob98p98KRbpOeUDtfS8TcOzgx3xEOWg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:3Y3Rht230w1FMNFxE6OvR/BVPSk=
In-Reply-To: <sgj73n$olh$1@dont-email.me>
Content-Language: en-GB
 by: James Harris - Tue, 31 Aug 2021 18:37 UTC

On 30/08/2021 19:13, David Brown wrote:
> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
>> On 2021-08-30 11:50, David Brown wrote:

....

>>> All in all, non-ASCII letters in identifiers can pose a lot of
>>> challenges.  But they are nonetheless important for people around the
>>> world, and despite the disadvantages, UTF-8 is far and away the best
>>> choice.  You simply have to trust programmers to be sensible in their
>>> usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
>>> 1 and I can be hard to distinguish, as can O and 0.)
>>
>> Actually, this is again sort of Europocentric POV. In reality, if you
>> have a truly international team with speakers outside Western Europe,
>> you must agree on some strict rules regarding comments and identifiers.
>>
>
> If you have an international team, then it is standard practice to keep
> everything in English. But most teams are not international. Why
> should a group of Greek or Japanese programmers be forced to write
> everything in a foreign language? You can view the keywords as fixed -
> almost like symbols, rather than words - but they may prefer to have
> other parts written in their own language.

AISI: Have the master copy of /all/ programs in American English, and
support translation of identifier names, comments, string literals etc
to other languages.

--
James Harris

Re: Code gen - calling sequences

<sgncsd$bsj$2@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=692&group=comp.lang.misc#692

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.misc
Subject: Re: Code gen - calling sequences
Date: Wed, 1 Sep 2021 10:16:13 +0200
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <sgncsd$bsj$2@dont-email.me>
References: <sg3a2l$dqb$1@dont-email.me> <sg3g0m$nta$1@dont-email.me>
<sg3md3$687$1@dont-email.me> <sgdhjh$l3$1@dont-email.me>
<sgi3vd$c3c$1@dont-email.me> <sgi5dr$h90$1@gioia.aioe.org>
<sgi9lp$eff$1@dont-email.me> <sgift1$1p16$2@gioia.aioe.org>
<sgj73n$olh$1@dont-email.me> <sglstf$1ft$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 1 Sep 2021 08:16:13 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bd3032bcc3f4b28e1491a96e584c5814";
logging-data="12179"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+/ahp7EyOGkZ+Z1Z4TSvj82cB4zD1B59o="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:GypEzJzzI8RjC2lKDWbV8wpNx64=
In-Reply-To: <sglstf$1ft$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Wed, 1 Sep 2021 08:16 UTC

On 31/08/2021 20:37, James Harris wrote:
> On 30/08/2021 19:13, David Brown wrote:
>> On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
>>> On 2021-08-30 11:50, David Brown wrote:
>
> ...
>
>>>> All in all, non-ASCII letters in identifiers can pose a lot of
>>>> challenges.  But they are nonetheless important for people around the
>>>> world, and despite the disadvantages, UTF-8 is far and away the best
>>>> choice.  You simply have to trust programmers to be sensible in their
>>>> usage.  (You need to do that anyway, even with ASCII - in many
>>>> fonts, l,
>>>> 1 and I can be hard to distinguish, as can O and 0.)
>>>
>>> Actually, this is again sort of Europocentric POV. In reality, if you
>>> have a truly international team with speakers outside Western Europe,
>>> you must agree on some strict rules regarding comments and identifiers.
>>>
>>
>> If you have an international team, then it is standard practice to keep
>> everything in English.  But most teams are not international.  Why
>> should a group of Greek or Japanese programmers be forced to write
>> everything in a foreign language?  You can view the keywords as fixed -
>> almost like symbols, rather than words - but they may prefer to have
>> other parts written in their own language.
>
> AISI: Have the master copy of /all/ programs in American English, and
> support translation of identifier names, comments, string literals etc
> to other languages.
>

Why would anyone choose the dialect of one particular ex-colony, rather
than using /real/ English?

I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of the
world begs to differ.

