Encoding madness

<20230411014437.0aef1026@wibble.sysadmininc.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1626&group=news.software.nntp#1626

Path: i2pn2.org!i2pn.org!news.endofthelinebbs.com!.POSTED.47.189.156.66!not-for-mail
From: sysop@endofthelinebbs.com (Nigel Reed)
Newsgroups: news.software.nntp
Subject: Encoding madness
Date: Tue, 11 Apr 2023 01:44:37 -0500
Organization: End Of The Line BBS
Message-ID: <20230411014437.0aef1026@wibble.sysadmininc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: www.sysadmininc.com; posting-host="47.189.156.66";
logging-data="3670381"; mail-complaints-to="usenet@www.sysadmininc.com"
X-Newsreader: Claws Mail 4.1.1git47 (GTK 3.24.33; x86_64-pc-linux-gnu)
 by: Nigel Reed - Tue, 11 Apr 2023 06:44 UTC

Hi all,

I'm trying to sync up the active and newsgroups files from 15 peers and
it's proving to be a bit of a challenge.

The first bit is done, which is mainly getting rid of groups that have
invalid names (those that end in a period, contain illegal characters,
and the like).

Next is a little more of a challenge: trying to sync the descriptions.
It wouldn't be so bad if everyone used the same encoding, however the
majority are using ISO-8859-1, a couple are using UTF-8, some using
ASCII and one is Non-ISO extended-ASCII.

This becomes a problem when trying to do a diff or other operations
trying to match group names.
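
One way to make the files comparable is to decode every peer's file to text before diffing. The following is only a minimal sketch: the names `decode_line` and `parse_newsgroups` are hypothetical, and the cp1252 fallback is an assumption that will mangle files written in other legacy encodings. Plain ASCII input needs no special case because it is already valid UTF-8.

```python
def decode_line(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode one line: strict UTF-8 first, then a single-byte fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: whatever is not valid UTF-8 is Western single-byte text.
        # errors="replace" turns undefined bytes into U+FFFD instead of failing.
        return raw.decode(fallback, errors="replace")


def parse_newsgroups(data: bytes) -> dict[str, str]:
    """Return {group name: description}, with every description as str."""
    groups = {}
    for raw in data.splitlines():
        # Group names contain no whitespace, so split once on the first gap.
        fields = decode_line(raw).split(None, 1)
        if not fields:
            continue  # blank line
        name = fields[0]
        description = fields[1].strip() if len(fields) > 1 else ""
        groups[name] = description
    return groups
```

With two peers parsed this way, `set(a) ^ set(b)` lists the group names that only one of them carries, and the descriptions can be compared as ordinary strings.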

I know there isn't a standard encoding for the newsgroup file but that
may have been a bit of an oversight now that some people are trying to
run a clean server.

Going forward, maybe the powers that be can get their heads together
and enforce a certain coding standard for innd (and whatever else is
out there) that is at least maintained. Personally, I don't care which
one we end up with; ISO-8859 seems to be by far the most popular (7
servers), followed by ASCII (4), then UTF-8 (3).

I guess we'd need all the servers to want to agree and update their
files accordingly.

Somehow, I feel I'll be shot down here since it's been like this since
1986.

Thoughts?
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23

Re: Encoding madness

<wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1627&group=news.software.nntp#1627

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: invalid@invalid.invalid (Richard Kettlewell)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 08:49:41 +0100
Organization: terraraq NNTP server
Message-ID: <wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="29692"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:BhO+s1lUt/a7njmv8NBGmwwjR8w=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Tue, 11 Apr 2023 07:49 UTC

Nigel Reed <sysop@endofthelinebbs.com> writes:
> I know there isn't a standard encoding for the newsgroup file but that
> may have been a bit of an oversight now that some people are trying to
> run a clean server.
>
> Going forward, maybe the powers that be can get their heads together
> and enforce a certain coding standard for innd (and whatever else is
> out there) that is at least maintained. Personally, I don't care which
> one we end up with, ISO-8859 seems to be the far more popular (7
> servers) followed by ASCII (4) then UTF-8 (3) .

If there’s going to be a global choice of encoding then it has to be
UTF-8.

> I guess we'd need all the servers to want to agree and update their
> files accordingly.

That’s the hard bit...

--
https://www.greenend.org.uk/rjk/

Re: Encoding madness

<DLEZ+IfTYqQ@news.spitfire-nntp.fr>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1628&group=news.software.nntp#1628

Path: i2pn2.org!i2pn.org!paganini.bofh.team!pasdenom.info!news.usenet.ovh!news.spitfire-nntp.fr!.POSTED!not-for-mail
Message-ID: <DLEZ+IfTYqQ@news.spitfire-nntp.fr>
Date: Tue, 11 Apr 2023 09:51:12 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.9.1
Subject: Re: Encoding madness
Newsgroups: news.software.nntp
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
Content-Language: fr
From: franck@email.invalid (Franck)
In-Reply-To: <20230411014437.0aef1026@wibble.sysadmininc.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 11 Apr 2023 07:51:12 +0000 (UTC)
Injection-Info: news.spitfire-nntp.fr; logging-data="C643511701e1f85bc";
mail-complaints-to="abuse(at)spitfire-nntp.fr"
Cancel-Lock: sha1:TL0CR9nxUEXjiQuVs4DfEibe7G4=
sha256:6f0MGe5w5b60L9dfCRBLfzBEEiaq9f7Vs9v2jbSGZqQ=
sha1:LyEXUYChZRg9WM7E1eMIlFUqktE=
sha256:63NSyp/wbzBC13CbpKCV1tK6eAzs+4WgBUzj/ZWn65Q=
Organization: Home of Spitfire News Server, Montpellier (France)
 by: Franck - Tue, 11 Apr 2023 07:51 UTC

Hello,

> I'm trying to sync up the active and newsgroups file from 15 peers and
> it's proving to be a bit of a challenge.

Hmm, I'd rather say: "It's a f***ing challenge"!

> Next is a little more of a challenge. Trying to sync the descriptions.

> I know there isn't a standard encoding for the newsgroup file but that
> may have been a bit of an oversight now that some people are trying to
> run a clean server.
Since my server has a graphical interface that displays group names and
descriptions, I need to be able to know which encoding is used.

To solve the problem, I use a small file that records the hierarchies
with the encoding used. Either I get it from the cmsg checkgroup of some
hierarchies that mention it (notably fr), or I fix it by hand.

This file is very simple. It mentions the hierarchy and the encoding
used, separated by a TAB.

Maybe shipping a similar file alongside the "active" and "newsgroups"
files would do the trick?

##
## This file lists the charsets used by hierarchies.
##
## Format: hierarchy<TAB>charset
cn Big5
fr UTF-8

And so on.
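
Reading such a file could look like the sketch below. It assumes the format described above (lines starting with `#` are comments, one TAB between hierarchy and charset); the function names and the UTF-8 default are hypothetical and are not taken from SNS.

```python
def load_charsets(text: str) -> dict[str, str]:
    """Parse 'hierarchy<TAB>charset' lines, skipping comments and blanks."""
    charsets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hierarchy, sep, charset = line.partition("\t")
        if not sep:
            continue  # no TAB on this line: ignore it rather than guess
        charsets[hierarchy.strip()] = charset.strip()
    return charsets


def charset_for(group: str, charsets: dict[str, str], default: str = "utf-8") -> str:
    """Look up a group's charset, trying the longest matching prefix first."""
    parts = group.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in charsets:
            return charsets[prefix]
    return default  # assumption: unlisted hierarchies are treated as UTF-8
```

Matching the longest prefix first would let a sub-hierarchy override its parent, should that ever be needed.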

?

Franck

Re: Encoding madness

<CWk1+5dhgSo@news.spitfire-nntp.fr>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1629&group=news.software.nntp#1629

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!news.spitfire-nntp.fr!.POSTED!not-for-mail
Message-ID: <CWk1+5dhgSo@news.spitfire-nntp.fr>
Date: Tue, 11 Apr 2023 09:52:41 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.9.1
Subject: Re: Encoding madness
Content-Language: fr
Newsgroups: news.software.nntp
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>
From: franck@email.invalid (Franck)
In-Reply-To: <wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 07:52:41 +0000 (UTC)
Injection-Info: news.spitfire-nntp.fr; logging-data="C643511701e1f85bc";
mail-complaints-to="abuse(at)spitfire-nntp.fr"
Cancel-Lock: sha1:4hEZjtuil1geQRmrriUcF0fdgo8=
sha256:rfc+350Yzc2c4Hlw+YHBx0xCBVvTZDU6psY78UkErJo=
sha1:X48YR8EC9kPeiSnmA7o99dolwbI=
sha256:3job4spZNgAuQFbV5tEKRcX07VGpp8DXH7WYi4KmV2o=
Organization: Home of Spitfire News Server, Montpellier (France)
 by: Franck - Tue, 11 Apr 2023 07:52 UTC

Hello,

> If there’s going to be a global choice of encoding then it has to be
> UTF-8.

+1

Re: Encoding madness

<Cceo2hUjGT8@news.spitfire-nntp.fr>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1630&group=news.software.nntp#1630

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!news.spitfire-nntp.fr!.POSTED!not-for-mail
Message-ID: <Cceo2hUjGT8@news.spitfire-nntp.fr>
Date: Tue, 11 Apr 2023 10:04:12 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.9.1
Subject: Re: Encoding madness
Newsgroups: news.software.nntp
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<DLEZ+IfTYqQ@news.spitfire-nntp.fr>
Content-Language: fr
From: franck@email.invalid (Franck)
In-Reply-To: <DLEZ+IfTYqQ@news.spitfire-nntp.fr>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 11 Apr 2023 08:04:12 +0000 (UTC)
Injection-Info: news.spitfire-nntp.fr; logging-data="C643513747f9665b9";
mail-complaints-to="abuse(at)spitfire-nntp.fr"
Cancel-Lock: sha1:hNPE2NbTnyD3sUNx9N8t5nBeJ7U=
sha256:icj4pODasMsg3n1AxcTfsH96YLmia6q5r7qgFijCLM4=
sha1:eBFfGjM17CnlrDn+EUZFnmF4M4Y=
sha256:B2qUBVNg7ODOi58axE0InEAHAa28RfTg5vCXWl1maEU=
Organization: Home of Spitfire News Server, Montpellier (France)
 by: Franck - Tue, 11 Apr 2023 08:04 UTC

Hello,

> Since my server has a graphical interface that displays group names and
> descriptions, I need to be able to know which encoding is used.

It looks like this: https://i.ibb.co/LYcsQVQ/Console.png

Re: Encoding madness

<u13bpj$188ar$3@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1631&group=news.software.nntp#1631

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 12:12:03 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u13bpj$188ar$3@news.trigofacile.com>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 10:12:03 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="1319259"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.9.1
Cancel-Lock: sha1:IpmRG7QC43E8UZyezqNlsULr9GI= sha256:inIudjpqJ9HcQGIpHJ+nDSXiwm3cn7QL9eqz1cWOqvk=
sha1:F/sgGmVIFplXlIARB9GyVsMwX/4= sha256:nMXAY/ofjf1jX0dwngz19c553dpOrnWJkSLNRj8358w=
In-Reply-To: <20230411014437.0aef1026@wibble.sysadmininc.com>
 by: Julien ÉLIE - Tue, 11 Apr 2023 10:12 UTC

Hi Nigel,

> Next is a little more of a challenge. Trying to sync the descriptions.
> It wouldn't be so bad if everyone used the same encoding, however the
> majority are using ISO-8859-1, a couple are using UTF-8, some using
> ASCII and one is Non-ISO extended-ASCII.

FWIW, the descriptions encoded in UTF-8 from the ftp.isc.org newsgroup
file are here:
http://usenet.trigofacile.com/hierarchies/data/newsgroups.utf8

It may make your life easier :-)

The conversions I found to work are:
- cn.* and han.* are encoded in gb18030;
- fido7.*, medlux.* and relcom.* in koi8-r;
- ukr.* in koi8-u;
- nctu.*, ncu.* and tw.* in big5;
- scout.forum.chinese and scout.forum.korean in big5;
- eternal-september.*, fido.* and fr.* in utf-8;
- all the others fit well in cp1252.
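
The list above can be turned into a lookup table. This is a rough sketch, not the script used to build newsgroups.utf8: the names `ENCODINGS`, `encoding_for` and `to_utf8` are hypothetical, and `errors="replace"` is an assumption made so that the few byte values cp1252 leaves undefined do not abort the conversion.

```python
# (prefixes, Python codec name), taken from the list above.
ENCODINGS = [
    (("cn.", "han."), "gb18030"),
    (("fido7.", "medlux.", "relcom."), "koi8-r"),
    (("ukr.",), "koi8-u"),
    (("nctu.", "ncu.", "tw."), "big5"),
    (("scout.forum.chinese", "scout.forum.korean"), "big5"),
    (("eternal-september.", "fido.", "fr."), "utf-8"),
]
DEFAULT_ENCODING = "cp1252"  # "all the others"


def encoding_for(group: str) -> str:
    """Return the codec for a group name based on its hierarchy prefix."""
    for prefixes, codec in ENCODINGS:
        if group.startswith(prefixes):
            return codec
    return DEFAULT_ENCODING


def to_utf8(group: str, raw_description: bytes) -> bytes:
    """Re-encode a raw description to UTF-8 using the table above."""
    text = raw_description.decode(encoding_for(group), errors="replace")
    return text.encode("utf-8")
```

The trailing dot in each prefix matters: it keeps `fido7.*` (koi8-r) from being confused with `fido.*` (utf-8).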

> Going forward, maybe the powers that be can get their heads together
> and enforce a certain coding standard for innd (and whatever else is
> out there) that is at least maintained.

UTF-8 is the expected encoding for the descriptions returned by a LIST
NEWSGROUPS command in the NNTP protocol.

--
Julien ÉLIE

« And now, the ball is in the slalom skiers' court. »

Re: Encoding madness

<DI5qTE/AFXc@news.spitfire-nntp.fr>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1632&group=news.software.nntp#1632

Path: i2pn2.org!i2pn.org!paganini.bofh.team!pasdenom.info!news.usenet.ovh!news.spitfire-nntp.fr!.POSTED!not-for-mail
Message-ID: <DI5qTE/AFXc@news.spitfire-nntp.fr>
Date: Tue, 11 Apr 2023 14:34:50 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.9.1
Subject: Re: Encoding madness
Content-Language: fr
Newsgroups: news.software.nntp
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<u13bpj$188ar$3@news.trigofacile.com>
From: franck@email.invalid (Franck)
In-Reply-To: <u13bpj$188ar$3@news.trigofacile.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 12:34:50 +0000 (UTC)
Injection-Info: news.spitfire-nntp.fr; logging-data="C643553ea692c351a";
mail-complaints-to="abuse(at)spitfire-nntp.fr"
Cancel-Lock: sha1:qiRf/J2RJeovB8HJJIZ+YR+rMrk=
sha256:n4n+FygdzjueY6qrysRJnF+qff/Ya8SXcKasx5J5xlM=
sha1:WyD5iCCy6THCnzm0zIH6XCBkvLg=
sha256:mk2wRZPGjY4j13ph5dLgsGyz2jrlv+vdNDU65rLdE1E=
Organization: Home of Spitfire News Server, Montpellier (France)
 by: Franck - Tue, 11 Apr 2023 12:34 UTC

Hi Julien.

> FWIW, the descriptions encoded in UTF-8 from the ftp.isc.org newsgroup
> file are here:
>   http://usenet.trigofacile.com/hierarchies/data/newsgroups.utf8

And you're only telling us now!?!?!? ;-)

Why not put it on ftp.isc.org?

I had looked on your site, notably at the list of public managed Usenet
hierarchies, but I had not found this file, which is why I coded the
conversion in SNS. Maybe I didn't look hard enough...

> It may facilitate your life :-)

I think so, I'll use it instead of the one listed at ftp.isc.org and
will remove some lines of code in SNS.

Franck

Re: Encoding madness

<u13te4$ae2$1$arnold@news.chmurka.net>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1633&group=news.software.nntp#1633

Path: i2pn2.org!i2pn.org!news.chmurka.net!.POSTED.s.v.chmurka.net!not-for-mail
From: gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 15:13:09 -0000 (UTC)
Organization: news.chmurka.net
Message-ID: <u13te4$ae2$1$arnold@news.chmurka.net>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <u13bpj$188ar$3@news.trigofacile.com>
NNTP-Posting-Host: s.v.chmurka.net
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 15:13:09 -0000 (UTC)
Injection-Info: news.chmurka.net; posting-account="arnold"; posting-host="s.v.chmurka.net:172.24.44.20";
logging-data="10690"; mail-complaints-to="abuse-news.(at).chmurka.net"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.32-v7+ (armv7l))
Cancel-Lock: sha1:eVCnsDsHV4Nu/gGvGVegAMrKdOc=
 by: Adam W. - Tue, 11 Apr 2023 15:13 UTC

Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

> The conversions I found out to work are:

In Poland (pl.*) we traditionally used iso-8859-2 for posting (now I think
utf-8 has become a de facto standard, but iso-8859-2 is still accepted),
but I can see that group descriptions for pl.* are just transliterated
(there are no national characters used, all are 7-bit, or us-ascii).

The main question is if currently used readers can handle utf-8 in group
descriptions. If yes, I'd stick with utf-8. If not, then I think it would
be safest to transliterate the descriptions to us-ascii (if it can be done
for all encodings; for Polish national characters it's perfectly fine, but
I don't know how it works with non-Latin alphabets like Russian or
Chinese).
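
A small experiment shows where transliteration works and where it stops. This is a sketch under stated assumptions: `to_ascii` is a hypothetical helper, the explicit mapping covers only the Polish letter that has no Unicode decomposition, and anything still outside ASCII is replaced with `?`.

```python
import unicodedata

# "ł"/"Ł" do not decompose into a base letter plus accent, so map them by hand.
EXTRA = str.maketrans({"ł": "l", "Ł": "L"})


def to_ascii(text: str) -> str:
    """Strip diacritics; characters with no ASCII base letter become '?'."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("Zażółć gęślą jaźń"))  # Zazolc gesla jazn
print(to_ascii("Привет"))             # ??????
```

The Polish pangram survives in readable form, while the Cyrillic word is lost entirely; a non-Latin script would need a real romanization table, which this sketch does not attempt.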

Re: Encoding madness

<u13ut8$2lff$1@cabale.usenet-fr.net>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1634&group=news.software.nntp#1634

Path: i2pn2.org!i2pn.org!news.nntp4.net!news.gegeweb.eu!gegeweb.org!usenet-fr.net!.POSTED!not-for-mail
From: om+news@miakinen.net (Olivier Miakinen)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 17:38:16 +0200
Organization: There's no cabale
Lines: 27
Message-ID: <u13ut8$2lff$1@cabale.usenet-fr.net>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk> <u13tj6$2llds$1@dont-email.me>
NNTP-Posting-Host: 200.89.28.93.rev.sfr.net
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: cabale.usenet-fr.net 1681227496 87535 93.28.89.200 (11 Apr 2023 15:38:16 GMT)
X-Complaints-To: abuse@usenet-fr.net
NNTP-Posting-Date: Tue, 11 Apr 2023 15:38:16 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
Firefox/52.0 SeaMonkey/2.49.4
In-Reply-To: <u13tj6$2llds$1@dont-email.me>
 by: Olivier Miakinen - Tue, 11 Apr 2023 15:38 UTC

Hello Adam,

On 11/04/2023 17:15, Adam H. Kerman wrote:
>
>>If there’s going to be a global choice of encoding then it has to be
>>UTF-8.
>
> ASCII's advantage over UTF-8 is its universality.

I would have said exactly the opposite: UTF-8's advantage over ASCII
is its universality, because UTF-8 can express any character from any
language.

But of course ASCII's advantage over UTF-8 is that it is recognized by
all Usenet software.

> [...]
>
> Gee whiz. The ASCII apostrophe was used ambiguously as single close quote
> AND as the combining diacritical mark for the acute accent since 1967.

Oh? Which software does that weird thing? Surely it is not a standard use
of ASCII.

--
Olivier Miakinen

Re: Encoding madness

<20230411114357.7e326ac7@wibble.sysadmininc.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1635&group=news.software.nntp#1635

Path: i2pn2.org!i2pn.org!news.endofthelinebbs.com!.POSTED.47.189.156.66!not-for-mail
From: sysop@endofthelinebbs.com (Nigel Reed)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 11:43:57 -0500
Organization: End Of The Line BBS
Message-ID: <20230411114357.7e326ac7@wibble.sysadmininc.com>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<u13bpj$188ar$3@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: www.sysadmininc.com; posting-host="47.189.156.66";
logging-data="3670381"; mail-complaints-to="usenet@www.sysadmininc.com"
X-Newsreader: Claws Mail 4.1.1git47 (GTK 3.24.33; x86_64-pc-linux-gnu)
 by: Nigel Reed - Tue, 11 Apr 2023 16:43 UTC

On Tue, 11 Apr 2023 12:12:03 +0200
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

> > Going forward, maybe the powers that be can get their heads together
> > and enforce a certain coding standard for innd (and whatever else is
> > out there) that is at least maintained.
>
> UTF-8 is the expected encoding for the descriptions returned by a
> LIST NEWSGROUPS command in the NNTP protocol.

That's good to know. I've been converting everything to fit into my
newsgroups file which is ISO-8859 so it looks like I've been going the
wrong way. Back to the drawing board now that my scripts are almost
done lol.

--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23

Re: Encoding madness

<u142p9$2m3va$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1636&group=news.software.nntp#1636

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 16:44:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <u142p9$2m3va$1@dont-email.me>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk> <u13tj6$2llds$1@dont-email.me> <u13ut8$2lff$1@cabale.usenet-fr.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 16:44:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="34e68e3891eed7817c8a8e0390cf771b";
logging-data="2822122"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+8Frjxl2CO4cRhDfA+DCQWMdxNncOgJdY="
Cancel-Lock: sha1:y4Glo2tM/DtONGoD/amesqPOOBA=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Tue, 11 Apr 2023 16:44 UTC

Olivier Miakinen <om+news@miakinen.net> wrote:
>04/11/2023 17:15, Adam H. Kerman wrote:

>>>If there’s going to be a global choice of encoding then it has to be
>>>UTF-8.

>>ASCII's advantage over UTF-8 is its universality.

>I would have said exactly the opposite : UTF-8's advantage over ASCII
>is its universality, because UTF-8 can express any character from any
>language.

Well, yes, if one's setup displays UTF-8, but every setup can use
ASCII.

>But of course ASCII's advantage over UTF-8 is that it is recognized by
>all usenet softwares.

>>[...]

>>Gee whiz. The ASCII apostrophe was used ambiguously as single close quote
>>AND as the combining diacritical mark for the acute accent since 1967.

>Oh? Which software does that weird thing? Surely it is not a standard use
>of ASCII.

ASCII was the 7-bit encoding used for teletypewriters, an improvement
over 5-bit Baudot code. Backspace/overstrike sequences were the way
diacritic marks were combined with the alphabetic character on a
teletypewriter. No, generally this wasn't implemented in computer
software.

But the notion that ASCII wasn't intended for Latin alphabets beyond
English used in America was always wrong.

Re: Encoding madness

<u144bp$c6i$1@freeq.furie.org.uk>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1637&group=news.software.nntp#1637

Path: i2pn2.org!i2pn.org!news.furie.org.uk!.POSTED.2001:470:1f1d:50e::11c!not-for-mail
From: tom@furie.org.uk (Tom Furie)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 17:11:21 -0000 (UTC)
Organization: Little to None
Message-ID: <u144bp$c6i$1@freeq.furie.org.uk>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<u13bpj$188ar$3@news.trigofacile.com>
<20230411114357.7e326ac7@wibble.sysadmininc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 17:11:21 -0000 (UTC)
Injection-Info: freeq.furie.org.uk; posting-host="2001:470:1f1d:50e::11c";
logging-data="12498"; mail-complaints-to="usenet@furie.org.uk"
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:WW/NEvvniXWYlZSQtyBmIp9lriE=
 by: Tom Furie - Tue, 11 Apr 2023 17:11 UTC

On 2023-04-11, Nigel Reed <sysop@endofthelinebbs.com> wrote:
> On Tue, 11 Apr 2023 12:12:03 +0200
> Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:
>
>> UTF-8 is the expected encoding for the descriptions returned by a
>> LIST NEWSGROUPS command in the NNTP protocol.
>
> That's good to know. I've been converting everything to fit into my
> newsgroups file which is ISO-8859 so it looks like I've been going the
> wrong way. Back to the drawing board now that my scripts are almost
> done lol.

At least the conversion from ISO-8859 to UTF-8 will be much more
straightforward than the conversion from <whatever encoding> to ISO-8859
;)
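
The asymmetry can be shown in a few lines. This is only an illustration with hypothetical helper names: every ISO-8859-1 byte maps to exactly one Unicode code point, so that direction never fails, whereas the reverse direction fails for any character outside the Latin-1 repertoire.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """ISO-8859-1 to UTF-8: always succeeds, each byte is one code point."""
    return raw.decode("iso-8859-1").encode("utf-8")


def fits_latin1(text: str) -> bool:
    """Report whether text could be stored in ISO-8859-1 without loss."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True
```

For example, `fits_latin1("café")` is true while `fits_latin1("Привет")` is false, which is why converting arbitrary descriptions down to ISO-8859 needs a fallback policy and converting up to UTF-8 does not.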

Cheers,
Tom

Re: Encoding madness

<877cuiw68y.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1638&group=news.software.nntp#1638

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 12:47:41 -0700
Organization: The Eyrie
Message-ID: <877cuiw68y.fsf@hope.eyrie.org>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<87fs96wd4j.fsf@hope.eyrie.org> <u145ij$2mbgj$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="401"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:Fxw8I8C85sBzKUhLtMFladHLmfI=
 by: Russ Allbery - Tue, 11 Apr 2023 19:47 UTC

"Adam H. Kerman" <ahk@chinet.com> writes:

> If you do that, may I request that ASCII equivalents be substituted for
> UTF-8 punctuation in brief descriptions? Pretty please?

The goal of all of that machinery is that the hierarchy administrators
should be canonical for the newsgroups entries for their hierarchy.
Encoding is one of those things where we need to standardize in order to,
say, comply with the NNTP standard, but I'm not willing to make any other
editorial judgments because it gets into too much annoying work. So this
is something you should take up with the hierarchy administrators.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Encoding madness

<u14ecu$2mso6$5@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1639&group=news.software.nntp#1639

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 20:02:38 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <u14ecu$2mso6$5@dont-email.me>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <87fs96wd4j.fsf@hope.eyrie.org> <u145ij$2mbgj$1@dont-email.me> <877cuiw68y.fsf@hope.eyrie.org>
Injection-Date: Tue, 11 Apr 2023 20:02:38 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="34e68e3891eed7817c8a8e0390cf771b";
logging-data="2847494"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/vCFvA29RXa3WjB6VhjIYp6qGdaIVNyUY="
Cancel-Lock: sha1:NNYaVueiBHLkz015d438WOMh9vI=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Tue, 11 Apr 2023 20:02 UTC

Russ Allbery <eagle@eyrie.org> wrote:
>"Adam H. Kerman" <ahk@chinet.com> writes:

>>If you do that, may I request that ASCII equivalents be substituted for
>>UTF-8 punctuation in brief descriptions? Pretty please?

>The goal of all of that machinery is that the hierarchy administrators
>should be canonical for the newsgroups entries for their hierarchy.
>Encoding is one of those things where we need to standardize in order to,
>say, comply with the NNTP standard, but I'm not willing to make any other
>editorial judgments because it gets into too much annoying work. So this
>is something you should take up with the hierarchy administrators.

I apologize for suggesting additional programming work for you. I change
my request to asking for an amendment to your README in which you might
urge a proponent or hierarchy administrator not to use UTF-8 punctuation
for which ASCII punctuation would suffice, to avoid needlessly turning a
description into UTF-8.
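
A hierarchy administrator who wanted to follow that advice could check descriptions with something like the sketch below. The mapping is an assumed, incomplete selection of typographic punctuation, and the function names are hypothetical.

```python
# Typographic punctuation with an obvious ASCII stand-in (assumed selection).
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark, often typed as apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
})


def asciify_punctuation(description: str) -> str:
    """Replace typographic punctuation; letters are left untouched."""
    return description.translate(PUNCTUATION)


def needs_utf8(description: str) -> bool:
    """True if the description still contains non-ASCII characters."""
    return not description.isascii()
```

A description that only needed UTF-8 because of curly quotes becomes pure ASCII after `asciify_punctuation`, while one with accented letters still reports `needs_utf8(...)` as true and is left for the administrator to decide.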

Re: Encoding madness

<u13tj6$2llds$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1640&group=news.software.nntp#1640

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 15:15:50 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <u13tj6$2llds$1@dont-email.me>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Apr 2023 15:15:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="34e68e3891eed7817c8a8e0390cf771b";
logging-data="2807228"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/JOLCqBRdY7ZvbWEyg19yL3su1ebYijYI="
Cancel-Lock: sha1:LvbrnyaP5aCoQjbA8xHTUOvwZUI=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Tue, 11 Apr 2023 15:15 UTC

Richard Kettlewell <invalid@invalid.invalid> wrote:
>Nigel Reed <sysop@endofthelinebbs.com> writes:

>>I know there isn't a standard encoding for the newsgroup file but that
>>may have been a bit of an oversight now that some people are trying to
>>run a clean server.

>>Going forward, maybe the powers that be can get their heads together
>>and enforce a certain coding standard for innd (and whatever else is
>>out there) that is at least maintained. Personally, I don't care which
>>one we end up with, ISO-8859 seems to be the far more popular (7
>>servers) followed by ASCII (4) then UTF-8 (3) .

>If there’s going to be a global choice of encoding then it has to be
>UTF-8.

ASCII's advantage over UTF-8 is its universality.

You yourself just used the UTF-8 character code for single close quote
ambiguously as an apostrophe. The character has been used ambiguously
in the current version of Unicode, replacing another character code that
was used to indicate a glottal stop as a letter modifier.

Gee whiz. The ASCII apostrophe was used ambiguously as single close quote
AND as the combining diacritical mark for the acute accent since 1967.
Where is the UTF-8 advantage if there continues to be ambiguously-used
character codes for such common punctuation marks?

If there's going to be a global choice, then stop using UTF-8 character
codes to substitute for ASCII in plain text communication. Use open and
close single and double quotes ONLY in typography, not email and not
Usenet. This thwarts communication. It makes a difference as ASCII is
universal and UTF-8 is not.

I'm pointing out again that ASCII had combining characters but it didn't
include all possible diacritical marks like umlaut, but it has acute,
grave, circumflex, tilde, slash, cedilla, and I'm sure I've forgotten.

Teletypewriters could perform the combining action with a backspace/
overstrike sequence but terminals didn't usually display them.

>>I guess we'd need all the servers to want to agree and update their
>>files accordingly.

>That’s the hard bit...

Re: Encoding madness

<u14jqj$296$1$arnold@news.chmurka.net>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1641&group=news.software.nntp#1641

Path: i2pn2.org!i2pn.org!news.chmurka.net!.POSTED.s.v.chmurka.net!not-for-mail
From: gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 21:35:15 -0000 (UTC)
Organization: news.chmurka.net
Message-ID: <u14jqj$296$1$arnold@news.chmurka.net>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <u13bpj$188ar$3@news.trigofacile.com> <u13te4$ae2$1$arnold@news.chmurka.net> <87bkjuwd27.fsf@hope.eyrie.org>
NNTP-Posting-Host: s.v.chmurka.net
Injection-Date: Tue, 11 Apr 2023 21:35:15 -0000 (UTC)
Injection-Info: news.chmurka.net; posting-account="arnold"; posting-host="s.v.chmurka.net:172.24.44.20";
logging-data="2342"; mail-complaints-to="abuse-news.(at).chmurka.net"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.32-v7+ (armv7l))
Cancel-Lock: sha1:aZJKiTVsdX8sgcv9DOiTrcZJE4w=
 by: Adam W. - Tue, 11 Apr 2023 21:35 UTC

Russ Allbery <eagle@eyrie.org> wrote:

> It definitely cannot. It's rare to find a language where that can be done
> without losing information (only in Europe, essentially).

Well, to be honest, you lose some information, but it's very rare and can
usually be deduced from context.

Re: Encoding madness

<1681245447.bystand@zzo38computer.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1642&group=news.software.nntp#1642

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: news@zzo38computer.org.invalid
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 15:00:29 -0700
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <1681245447.bystand@zzo38computer.org>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
MIME-Version: 1.0
Injection-Info: dont-email.me; posting-host="18bb7c95cb2ade6da2882704bfd96bc5";
logging-data="2870094"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19FnRyXnhSbwdXDYKXwVTpL"
User-Agent: bystand/1.3.0pre1
Cancel-Lock: sha1:Pyilmis3dXi+WfF4s/nJox+Z1JQ=
 by: news@zzo38computer.org.invalid - Tue, 11 Apr 2023 22:00 UTC

Nigel Reed <sysop@endofthelinebbs.com> wrote:
> This becomes a problem when trying to do a diff or other operations
> trying to match group names.

My opinion is that newsgroup names should be purely ASCII (there are many
benefits to this, and using non-ASCII characters in newsgroup names and
domain names and commands and configuration files can cause many problems,
including security issues (especially if any Unicode-based encoding is
used; non-Unicode has fewer security issues, but still is not worth it to
use non-ASCII in these cases), comparisons, input, etc).

Descriptions for non-English newsgroups, and non-English articles in such
newsgroups, should probably use the appropriate encodings for those
languages, rather than ASCII only (it is not as problematic to use
non-ASCII characters in newsgroup descriptions, and there are clearly
benefits to doing so in some cases, so it should be permitted in
descriptions and in articles; however, if the text is English only then
it is almost always more beneficial to stick to ASCII only, I think).

> Going forward, maybe the powers that be can get their heads together
> and enforce a certain coding standard for innd (and whatever else is
> out there) that is at least maintained. Personally, I don't care which
> one we end up with, ISO-8859 seems to be the far more popular (7
> servers) followed by ASCII (4) then UTF-8 (3) .

Well, fortunately, ISO-8859-1 and UTF-8 are both supersets of ASCII, so
if you use ASCII as much as possible then it will still work. (But, I
really hate Unicode; it is full of problems, including Han unification
and other complications; and it is a stateful character set even though
the encoding is stateless. TRON character code is better in some ways
(especially for Japanese text), and I have done some work using this.)
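
The superset point can be checked directly. The following is a minimal
Python sketch, not anything taken from INN; the sample description line is
made up for illustration. Pure-ASCII bytes decode to the same text under
all three charsets, and the byte sequences only diverge once a non-ASCII
character appears.

```python
# A hypothetical newsgroups-file line containing only ASCII bytes.
ascii_bytes = b"news.software.nntp\tDiscussion of NNTP software."

# The same bytes decode to identical text under all three charsets.
as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("iso-8859-1")
as_utf8 = ascii_bytes.decode("utf-8")
same_everywhere = as_ascii == as_latin1 == as_utf8

# With a non-ASCII character the encodings no longer agree byte for byte.
word = "café"
latin1_bytes = word.encode("iso-8859-1")  # é becomes one byte (0xE9)
utf8_bytes = word.encode("utf-8")         # é becomes two bytes (0xC3 0xA9)

print(same_everywhere)
print(latin1_bytes, utf8_bytes)
```

This is also why a byte-wise diff of two newsgroups files only disagrees on
the lines that contain non-ASCII descriptions.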

However, also, enforcing a certain coding standard (regardless of what it
might be, whether it is Unicode or TRON or something else) can be a problem
when you will need other encodings for a reason not previously known by
whoever enforced them. Making recommendations can be helpful though, but I
think that ASCII should be used when possible, and in some contexts (e.g.
the names of the commands, etc, in the computer programming) should be
required to be ASCII only.

It might also be worth mentioning which character encodings a server uses in the
CAPABILITIES, on servers where that is applicable. (The RFC says that it
should be UTF-8, but I think that this is a mistake in the design of the
protocol. Capabilities and commands should be pure ASCII, but this should
not mean that any text in articles, descriptions, MOTD, etc has to be pure
ASCII; it can use other character sets, including the possibility of ones
which might be incompatible with Unicode, and including TRON codes too.)

Many people just want to put Unicode in everything, without actually
understanding Unicode or international text or security or anything else,
and this just makes a mess (especially since Unicode itself is messy, but
even if using something else, just putting it in without any consideration,
does not substitute for actual understanding). So, don't do that, please.

(And, for Usenet client programs intended for PC (if using DOS or other
text-mode programs), use of PC character set may be beneficial.)

(I run a NNTP server with my own newsgroups, which are not (currently)
considered part of Usenet, and currently have no need for non-ASCII
descriptions, but in future if it does, then I will consider what to do.
However, I also don't use INN, anyways.)

--
Don't laugh at the moon when it is day time in France.

Re: Encoding madness

<u14mq3$2nkma$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1643&group=news.software.nntp#1643

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 22:26:11 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <u14mq3$2nkma$1@dont-email.me>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <u13te4$ae2$1$arnold@news.chmurka.net> <87bkjuwd27.fsf@hope.eyrie.org> <u14jqj$296$1$arnold@news.chmurka.net>
Injection-Date: Tue, 11 Apr 2023 22:26:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="595da4453ea6f81ea9a07e161a9c1d7b";
logging-data="2872010"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/3cNPPIu44Pg46EEz57b9lgds0PU+i57w="
Cancel-Lock: sha1:qShlG+J0iWpSuaGb7UssnOMe3DM=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Tue, 11 Apr 2023 22:26 UTC

Adam W. <gof-cut-this-news@cut-this-chmurka.net.invalid> wrote:
>Russ Allbery <eagle@eyrie.org> wrote:

>>It definitely cannot. It's rare to find a language where that can be done
>>without losing information (only in Europe, essentially).

>Well, to be honest, you lose some information, but it's very rare and can
>usually be deduced from context.

In a language that doesn't use the Latin alphabet? C'mon.

Re: Encoding madness

<u1444s$28hn1$3@news.xmission.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1644&group=news.software.nntp#1644

Followup: alt.flame
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: legalize+jeeves@mail.xmission.com (Richard)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Followup-To: alt.flame
Date: Tue, 11 Apr 2023 17:07:40 -0000 (UTC)
Organization: multi-cellular, biological
Sender: legalize+jeeves@mail.xmission.com
Message-ID: <u1444s$28hn1$3@news.xmission.com>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk>
Reply-To: (Richard) legalize+jeeves@mail.xmission.com
Injection-Date: Tue, 11 Apr 2023 17:07:40 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:2607:fa18:0:beef::4";
logging-data="2377441"; mail-complaints-to="abuse@xmission.com"
X-Reply-Etiquette: No copy by email, please
Mail-Copies-To: never
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: legalize@shell.xmission.com (Richard)
 by: Richard - Tue, 11 Apr 2023 17:07 UTC

[Please do not mail me a copy of your followup]

Richard Kettlewell <invalid@invalid.invalid> spake the secret code
<wwvh6tmg8oa.fsf@LkoBDZeT.terraraq.uk> thusly:

>If there's going to be a global choice of encoding then it has to be
>UTF-8.

You mean, so you can use a gratuitously fancy apostrophe character
instead of the ASCII ' character that serves exactly the same purpose
with fewer problems?

UTF-8 is great for non-Latin codepoints like Asian languages and
Klingon.

Where UTF-8 fails is in using fancy codepoints for the functional
equivalent of the same ASCII character.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Re: Encoding madness

<87fs96wd4j.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1645&group=news.software.nntp#1645

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 10:19:08 -0700
Organization: The Eyrie
Message-ID: <87fs96wd4j.fsf@hope.eyrie.org>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="29232"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:yPOqgZ+jE1xCvfQC9s8x4Dz8feo=
 by: Russ Allbery - Tue, 11 Apr 2023 17:19 UTC

Nigel Reed <sysop@endofthelinebbs.com> writes:

> Going forward, maybe the powers that be can get their heads together and
> enforce a certain coding standard for innd (and whatever else is out
> there) that is at least maintained. Personally, I don't care which one
> we end up with, ISO-8859 seems to be the far more popular (7 servers)
> followed by ASCII (4) then UTF-8 (3) .

It's been on my list for years to encode the ftp.isc.org newsgroups file
uniformly in UTF-8, which I think is a prerequisite for enforcing
something in innd, but it's a bunch of tedious work and I haven't found
the time yet.
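
To illustrate why this is tedious, here is a minimal Python sketch of a
per-line conversion. It is not how the ftp.isc.org file is or will be
processed. The function name to_utf8 and the ISO-8859-1 fallback are
assumptions made for the example. Decoding as ISO-8859-1 never fails, so a
line that is really in another legacy charset (ISO-8859-2, KOI8-R, ...)
would be converted silently and wrongly; a real conversion needs to know
the charset used by each hierarchy.

```python
def to_utf8(raw_line: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw_line re-encoded as UTF-8.

    Lines that already decode as UTF-8 (pure ASCII included) pass through
    unchanged. Anything else is assumed to be in `fallback`, which is a
    guess and not something the file itself records.
    """
    try:
        text = raw_line.decode("utf-8")
    except UnicodeDecodeError:
        text = raw_line.decode(fallback)
    return text.encode("utf-8")


# Hypothetical description lines: one in ISO-8859-1, one already in UTF-8.
legacy = b"example.test\tcaf\xe9"
modern = "example.test\tcafé".encode("utf-8")
print(to_utf8(legacy) == modern)
print(to_utf8(modern) == modern)
```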

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Encoding madness

<87bkjuwd27.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1646&group=news.software.nntp#1646

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 10:20:32 -0700
Organization: The Eyrie
Message-ID: <87bkjuwd27.fsf@hope.eyrie.org>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<u13bpj$188ar$3@news.trigofacile.com>
<u13te4$ae2$1$arnold@news.chmurka.net>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="29232"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:sm6sETOiXPUzRayoV2XxQLFyOsE=
 by: Russ Allbery - Tue, 11 Apr 2023 17:20 UTC

gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.) writes:

> The main question is if currently used readers can handle utf-8 in group
> descriptions. If yes, I'd stick with utf-8. If not, then I think it would
> be safest to transliterate the descriptions to us-ascii (if it can be done
> for all encodings;

It definitely cannot. It's rare to find a language where that can be done
without losing information (only in Europe, essentially).

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Encoding madness

<u145ij$2mbgj$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1647&group=news.software.nntp#1647

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Tue, 11 Apr 2023 17:32:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <u145ij$2mbgj$1@dont-email.me>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <87fs96wd4j.fsf@hope.eyrie.org>
Injection-Date: Tue, 11 Apr 2023 17:32:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="34e68e3891eed7817c8a8e0390cf771b";
logging-data="2829843"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xXC1VAazKrYRwiQJqhEosAYL4B6d/S1U="
Cancel-Lock: sha1:xQbYrG6SVhULVShFZhPx3963gNQ=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Tue, 11 Apr 2023 17:32 UTC

Russ Allbery <eagle@eyrie.org> wrote:
>Nigel Reed <sysop@endofthelinebbs.com> writes:

>>Going forward, maybe the powers that be can get their heads together and
>>enforce a certain coding standard for innd (and whatever else is out
>>there) that is at least maintained. Personally, I don't care which one
>>we end up with, ISO-8859 seems to be the far more popular (7 servers)
>>followed by ASCII (4) then UTF-8 (3) .

>It's been on my list for years to encode the ftp.isc.org newsgroups file
>uniformly in UTF-8, which I think is a prerequisite for enforcing
>something in innd, but it's a bunch of tedious work and I haven't found
>the time yet.

If you do that, may I request that ASCII equivalents be substituted for
UTF-8 punctuation in brief descriptions? Pretty please?
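
A substitution like that could be sketched in Python as below. The mapping
table and the name ascii_punctuation are assumptions for illustration; the
list of characters is not exhaustive, and nothing here is an existing INN
feature. Letters with diacritics are deliberately left alone, since only
punctuation is being replaced.

```python
# Typographic punctuation mapped to plain ASCII stand-ins.
PUNCT_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (often used as apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def ascii_punctuation(text: str) -> str:
    """Replace typographic punctuation; leave every other character as is."""
    return text.translate(PUNCT_TO_ASCII)


print(ascii_punctuation("That\u2019s the hard bit\u2026"))
```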

Re: Encoding madness

<u15pee$1aneb$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1648&group=news.software.nntp#1648

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Wed, 12 Apr 2023 10:17:18 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u15pee$1aneb$1@news.trigofacile.com>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<1681245447.bystand@zzo38computer.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 12 Apr 2023 08:17:18 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="1400267"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.9.1
Cancel-Lock: sha1:IkFwtnTa65oan40IwpbuBRrrkDA= sha256:Zde84nN2BPnI4M80E6UjHTKCU/GCK1mV7QHPza1GNVQ=
sha1:BdJAq3T46uPr0iQlwt/uO63JZ2k= sha256:9BzBNAgSBGAZvF6nNJLQL+RpE6yMB6iocvN+PVUGUjs=
In-Reply-To: <1681245447.bystand@zzo38computer.org>
 by: Julien ÉLIE - Wed, 12 Apr 2023 08:17 UTC

Hi all,

> The RFC says that it
> should be UTF-8, but I think that this is a mistake in the design of the
> protocol. Capabilities and commands should be pure ASCII, but this should
> not mean that any text in articles, descriptions, MOTD, etc has to be pure
> ASCII; it can use other character sets, including the possibility of ones
> which might be incompatible with Unicode, and including TRON codes too.

We need an interoperable way to provide texts.
Please note RFC 2277 (BCP 18) about charsets:

Protocols MUST be able to use the UTF-8 charset, which consists of
the ISO 10646 coded character set combined with the UTF-8 character
encoding scheme, as defined in [10646] Annex R (published in
Amendment 2), for all text.

Protocols MAY specify, in addition, how to use other charsets or
other character encoding schemes for ISO 10646, such as UTF-16, but
lack of an ability to use UTF-8 is a violation of this policy; such a
violation would need a variance procedure ([BCP9] section 9) with
clear and solid justification in the protocol specification document
before being entered into or advanced upon the standards track.

For existing protocols or protocols that move data from existing
datastores, support of other charsets, or even using a default other
than UTF-8, may be a requirement. This is acceptable, but UTF-8
support MUST be possible.

> (I run a NNTP server with my own newsgroups, which are not (currently)
> considered part of Usenet, and currently have no need for non-ASCII
> descriptions, but in future if it does, then I will consider what to do.
> However, I also don't use INN, anyways.)

FWIW, INN does not enforce UTF-8 in the descriptions of newsgroups. You
can use any encoding you want for them.

--
Julien ÉLIE

« Ils ont refusé une offre de Normand ?!? » (Astérix)

Re: Encoding madness

<871qkpqhyy.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1649&group=news.software.nntp#1649

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Wed, 12 Apr 2023 07:43:17 -0700
Organization: The Eyrie
Message-ID: <871qkpqhyy.fsf@hope.eyrie.org>
References: <20230411014437.0aef1026@wibble.sysadmininc.com>
<u14jqj$296$1$arnold@news.chmurka.net> <u14mq3$2nkma$1@dont-email.me>
<u166re$nl1$1$arnold@news.chmurka.net> <u168us$31asn$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: hope.eyrie.org;
logging-data="25020"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:vyJpq060NnUoqIUyHu6tJEPgyro=
 by: Russ Allbery - Wed, 12 Apr 2023 14:43 UTC

"Adam H. Kerman" <ahk@chinet.com> writes:
> Adam W. <gof-cut-this-news@cut-this-chmurka.net.invalid> wrote:
>> Adam H. Kerman <ahk@chinet.com> wrote:

>>>> Well, to be honest, you lose some information, but it's very rare and
>>>> can usually be deduced from context.

>>> In a language that doesn't use the Latin alphabet? C'mon.

>> No, I'm only talking about Polish.

> You are. Russ wasn't.

Yeah, but I understood Adam was only talking about Polish in his reply.

It's common in a lot of European languages to be able to transliterate to
ASCII using various schemes without losing *too* much information. German
has a standard scheme, some Scandinavian languages have an old scheme that
used to be used when it was hard to find anything other than ASCII, some
other European languages are still comprehensible if all the diacritic
marks are stripped even though it looks weird, etc. Apparently Polish is
one of those (I know very little about Polish, sadly).

Arguably, English is itself a case of being able to transliterate to ASCII
without losing too much information, depending on how you feel about the
correct spelling of Zoë and naïve (I think everyone but the New Yorker has
given up on coöperate), or how much you care about reproducing English
poetry containing words like learnèd.

But if one gets too far beyond Europe, or even farther into eastern Europe
and non-Romance languages, the transliterations get more and more dubious
or simply nonexistent.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Encoding madness

<u16idd$e96$1$arnold@news.chmurka.net>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1650&group=news.software.nntp#1650

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!weretis.net!feeder8.news.weretis.net!news.chmurka.net!.POSTED.s.v.chmurka.net!not-for-mail
From: gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.)
Newsgroups: news.software.nntp
Subject: Re: Encoding madness
Date: Wed, 12 Apr 2023 15:23:26 -0000 (UTC)
Organization: news.chmurka.net
Message-ID: <u16idd$e96$1$arnold@news.chmurka.net>
References: <20230411014437.0aef1026@wibble.sysadmininc.com> <u14jqj$296$1$arnold@news.chmurka.net> <u14mq3$2nkma$1@dont-email.me> <u166re$nl1$1$arnold@news.chmurka.net> <u168us$31asn$2@dont-email.me> <871qkpqhyy.fsf@hope.eyrie.org>
NNTP-Posting-Host: s.v.chmurka.net
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 12 Apr 2023 15:23:26 -0000 (UTC)
Injection-Info: news.chmurka.net; posting-account="arnold"; posting-host="s.v.chmurka.net:172.24.44.20";
logging-data="14630"; mail-complaints-to="abuse-news.(at).chmurka.net"
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.32-v7+ (armv7l))
Cancel-Lock: sha1:/WTx14RcMYFsbW9d0lSg3zT8LD8=
 by: Adam W. - Wed, 12 Apr 2023 15:23 UTC

Russ Allbery <eagle@eyrie.org> wrote:

> Yeah, but I understood Adam was only talking about Polish in his reply.

Yes, exactly. That's the only non-English language I know.

> German has a standard scheme,

Do you mean substituting umlauts with their Latin equivalents and adding
"e"?

ä = ae
ö = oe
ü = ue

At least that's what I found:

https://blogs.transparent.com/german/writing-the-letters-%E2%80%9Ca%E2%80%9D-%E2%80%9Co%E2%80%9D-and-%E2%80%9Cu%E2%80%9D-without-a-german-keyboard/

I also know that their ß (scharfes S) can be substituted with ss.
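
As a small Python sketch, that scheme looks like the following. The
function name is made up, and the upper-case rows are an assumption added
for completeness; only the lower-case letters and ß are listed above.

```python
import unicodedata

# Lower-case rows follow the list above; upper-case rows are an assumption.
GERMAN_TO_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate_german(text: str) -> str:
    """Apply the ae/oe/ue/ss substitutions to German text."""
    # Normalise to NFC first so that "u" + combining diaeresis becomes "ü".
    return unicodedata.normalize("NFC", text).translate(GERMAN_TO_ASCII)


print(transliterate_german("Straße"))
print(transliterate_german("Müller"))
```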

> some other European languages are still comprehensible if all the
> diacritic marks are stripped even though it looks weird, etc.
> Apparently Polish is one of those (I know very little about Polish,
> sadly).

It is. There are some word plays, because some words have different
meanings with and without diacritics (for example, "łaska" and "laska"
mean different things), but they're rare and correct meaning can be
deduced from context.
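
Stripping the diacritics can be sketched in Python as below; the function
name and the choice of test words are assumptions for illustration. One
detail worth knowing: "ł" has no Unicode decomposition into "l" plus a
combining mark, so it needs an explicit mapping, while letters such as "ż",
"ó" and "ć" can be handled by decomposing and dropping the combining marks.

```python
import unicodedata

# ł and Ł do not decompose under NFD, so map them by hand.
STROKE_LETTERS = str.maketrans({"ł": "l", "Ł": "L"})


def strip_diacritics(text: str) -> str:
    """Drop diacritic marks, keeping only the base letters."""
    decomposed = unicodedata.normalize("NFD", text.translate(STROKE_LETTERS))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Both words collapse to the same ASCII string, which is the lost information.
print(strip_diacritics("łaska"), strip_diacritics("laska"))
```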
