
Subject                               Author
* Why SOME chars nonASCII?            qhsgrant
`* Re: Why SOME chars nonASCII?       Eli the Bearded
 +* Re: Why SOME chars nonASCII?      Mike Spencer
 |+* Re: Why SOME chars nonASCII?     Eli the Bearded
 ||+- Re: Why SOME chars nonASCII?    Mike Spencer
 ||`* Re: Why SOME chars nonASCII?    Sylvain Robitaille
 || `* Re: Why SOME chars nonASCII?   Eli the Bearded
 ||  `- Re: Why SOME chars nonASCII?  Sylvain Robitaille
 |`- Re: Why SOME chars nonASCII?     Richmond
 `* Re: Why SOME chars nonASCII?      Richard Kettlewell
  `* Re: Why SOME chars nonASCII?     Eli the Bearded
   `- Re: Why SOME chars nonASCII?    Richard Kettlewell

Why SOME chars nonASCII?

<sb0fdv$jtd$1@gioia.aioe.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=416&group=alt.os.linux.slackware#416

Path: i2pn2.org!i2pn.org!aioe.org!dI73xztL/dAfdMKxMh0rkQ.user.gioia.aioe.org.POSTED!not-for-mail
From: qhsgrant@outlook.com
Newsgroups: alt.os.linux.slackware
Subject: Why SOME chars nonASCII?
Date: Wed, 23 Jun 2021 23:17:19 +0000 (UTC)
Organization: yes please
Lines: 5
Message-ID: <sb0fdv$jtd$1@gioia.aioe.org>
NNTP-Posting-Host: dI73xztL/dAfdMKxMh0rkQ.user.gioia.aioe.org
X-Complaints-To: abuse@aioe.org
X-Notice: Filtered by postfilter v. 0.9.2
 by: qhsgrant@outlook.com - Wed, 23 Jun 2021 23:17 UTC

It seems absurd to me that a recent [few years] fad is to make some
chars 2-bytes, amongst existing one-byte-ASCII-strings.
What is the motive for this?
-- CRG

Re: Why SOME chars nonASCII?

<eli$2106232057@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=418&group=alt.os.linux.slackware#418

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Thu, 24 Jun 2021 00:57:09 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2106232057@qaz.wtf>
References: <sb0fdv$jtd$1@gioia.aioe.org>
Injection-Date: Thu, 24 Jun 2021 00:57:09 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="11118"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 24 Jun 2021 00:57 UTC

In alt.os.linux.slackware, <qhsgrant@outlook.com> wrote:
> It seems absurd to me that a recent [few years] fad is to make some
> chars 2-bytes, amongst existing one-byte-ASCII-strings.
> What is the motive for this?

Mostly it is because of all of the non-English languages that don't fit
in the seven bits of ASCII. Even the eight bit ISO-8859-x family doesn't
cover lots of well-used languages. UTF-8 gives you most living languages
and many dead ones. UTF-8 isn't strictly "2-bytes", it is a variable
width encoding with ASCII compatibility for ASCII characters. High bit
sequences can be two, three, or four octets.
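
To make the variable width concrete, here is a minimal sketch (an illustration added alongside the post, not code from mailx) of how a single code point maps to one to four octets. The name utf8_encode is hypothetical, and the choice to refuse surrogates and values above U+10FFFF is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * out[] must have room for 4 octets.
 * Returns the number of octets written, or 0 if cp cannot be encoded
 * (a UTF-16 surrogate, or a value above U+10FFFF).
 */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80) {                      /* 0bbbbbbb: plain ASCII */
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {                     /* 110bbbbb 10bbbbbb */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp >= 0xD800 && cp <= 0xDFFF) {   /* surrogates are not characters */
    return 0;
  }
  if (cp < 0x10000) {                   /* 1110bbbb 10bbbbbb 10bbbbbb */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                 /* 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                             /* beyond the Unicode range */
}
```

With this sketch, 'A' (U+0041) stays a single octet 41, while the right single quote U+2019 becomes the three octets E2 80 99.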

C encoding verifier I wrote:

/* Bit patterns for legitimate UTF-8:
*
* non-highbit:
* 0bbbbbbb
* two octet highbit:
* 110bbbbb 10bbbbbb
* three octet highbit:
* 1110bbbb 10bbbbbb 10bbbbbb
* four octet highbit:
* 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
*/

/* low bit (no highbit)
* 0bbbbbbb
* note that null is low bit
*/
#define UTF8_LOWBIT(oct) (0x00 == ((oct) & 0x80))

/* any continuation octet
* 10bbbbbb
*/
#define UTF8_CONTINUATION(oct) (0x80 == ((oct) & 0xC0))

/* start of two octet
* 110bbbbb
*/
#define UTF8_SEQUENCE_2(oct) (0xC0 == ((oct) & 0xE0))

/* start of three octet
* 1110bbbb
*/
#define UTF8_SEQUENCE_3(oct) (0xE0 == ((oct) & 0xF0))

/* start of four octet
* 11110bbb
*/
#define UTF8_SEQUENCE_4(oct) (0xF0 == ((oct) & 0xF8))

/* checks a string str of length len for legit UTF-8 bit patterns.
* null will not terminate the string -- those are legit 7bit ASCII.
* returns byte offset of first non-legit sequence or -1 if 100% okay.
*/
int
check_utf8(str, len)
  unsigned char* str;
  int len;
{
  int seq, pos, run, octet;
  run = 0;

  for(pos = 0; pos < len; pos++) {
    octet = str[pos];

    /* start of a sequence */
    if(run == 0) {
      seq = pos;
    }

    if(UTF8_LOWBIT(octet)) {
      if( run != 0 ) {
        /* whoops, wanted highbit there */
        return seq;
      }
      continue;
    }

    if(UTF8_CONTINUATION(octet)) {
      if( run ) {
        /* one of our expected run */
        run--;
        continue;
      }
      /* whoops, not the right spot for this */
      return seq;
    }

    if( run ) {
      /* whoops, should have had a continuation octet above */
      return seq;
    }

    if(UTF8_SEQUENCE_2(octet)) {
      run = 1; /* one more */
      continue;
    }

    if(UTF8_SEQUENCE_3(octet)) {
      run = 2; /* two more */
      continue;
    }

    if(UTF8_SEQUENCE_4(octet)) {
      run = 3; /* three more */
      continue;
    }

    /* yikes! fall through! */
    return seq;
  }

  return -1;
} /* check_utf8() */

https://github.com/Eli-the-Bearded/eli-mailx/blob/master/utf-8.c

Elijah
------
using K&R style to match the rest of mailx

Re: Why SOME chars nonASCII?

<87k0mkkqef.fsf@bogus.nodomain.nowhere>

https://www.rocksolidbbs.com/computers/article-flat.php?id=420&group=alt.os.linux.slackware#420

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mds@bogus.nodomain.nowhere (Mike Spencer)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: 23 Jun 2021 22:22:32 -0300
Organization: Bridgewater Institute for Advanced Study - Blacksmith Shop
Lines: 38
Message-ID: <87k0mkkqef.fsf@bogus.nodomain.nowhere>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf>
Injection-Info: reader02.eternal-september.org; posting-host="9270457200ae38251cb095b1e531bb8a";
logging-data="20499"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+BJOLA3bq032orozZ3PVfa2AIA9TfKVok="
Cancel-Lock: sha1:f6hGD/7YixLhREycGuKguqzRJzc=
X-Newsreader: Gnus v5.7/Emacs 20.7
X-Clacks-Overhead: 4GH GNU Terry Pratchett
 by: Mike Spencer - Thu, 24 Jun 2021 01:22 UTC

Eli the Bearded <*@eli.users.panix.com> writes:

> In alt.os.linux.slackware, <qhsgrant@outlook.com> wrote:
>
>> It seems absurd to me that a recent [few years] fad is to make some
>> chars 2-bytes, amongst existing one-byte-ASCII-strings.
>> What is the motive for this?
>
> Mostly it is because of all of the non-English languages that don't fit
> in the seven bits of ASCII. Even the eight bit ISO-8859-x family doesn't
> cover lots of well-used languages.

Several of my correspondents (using Mac or Windoes) writing in
English do this in their own text and in text/articles copied from the
net.

Oddly, the non-ASCII chars are almost all punctuation: left & right
double & single quotes, em dash, ellipses and the degree symbol. Very
occasionally, there are French or Spanish names with non-ASCII chars
but the big nuisance is the punctuation. And of course, they send it
as quoted-printable.

I have an Emacs macro that finds the QP strings for the punctuation
and reverts them to ASCII before rmail-decode-quoted-printable but
it's a PITA.

> UTF-8 gives you most living languages and many dead ones. UTF-8
> isn't strictly "2-bytes", it is a variable width encoding with ASCII
> compatibility for ASCII characters. High bit sequences can be two,
> three, or four octets.
>
> C encoding verifier I wrote:
>
> [snip]
--
Mike Spencer Nova Scotia, Canada

Re: Why SOME chars nonASCII?

<eli$2106232206@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=421&group=alt.os.linux.slackware#421

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Thu, 24 Jun 2021 02:19:09 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2106232206@qaz.wtf>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf> <87k0mkkqef.fsf@bogus.nodomain.nowhere>
Injection-Date: Thu, 24 Jun 2021 02:19:09 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="1551"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 24 Jun 2021 02:19 UTC

In alt.os.linux.slackware, Mike Spencer <mds@bogus.nodomain.nowhere> wrote:
> Several of my correspondents (using Mac or Windoes) writing in
> English do this in their own text and in text/articles copied from the
> net.

Yes, a related problem. With UTF-8 come many more punctuation options,
and a large number of programs silently "correct" things. Some people
believe very dearly the fancy punctuation is better, others believe
the opposite.

> Oddly, the non-ASCII chars are almost all punctuation: left & right
> double & single quotes, em dash, ellipses and the degree symbol. Very
> occasionally, there are French or Spanish names with non-ASCII chars
> but the big nuisance is the punctuation. And of course, they send it
> as quoted-printable.

Quoted printable is how to make UTF-8 seven bit safe and _mostly_
readable. That's the real goal of QP, making it _mostly_ readable if you
don't have software that can display it. Base64 is not readable and gets
used sometimes.
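
As an illustration of why QP stays mostly readable (a sketch added alongside the post, not anyone's production code): ASCII text passes through untouched and only the high-bit octets and "=" itself turn into =XX hex triplets. The name qp_encode is hypothetical, and this simplified version assumes no soft line breaks at 76 columns and no special handling of trailing whitespace, both of which a real encoder needs.

```c
#include <stddef.h>

/* Hypothetical helper: quoted-printable encode in[0..len) into out.
 * out receives a NUL-terminated string; outsz is its total size.
 * Returns the encoded length, or -1 if out is too small.
 * Simplified: no 76-column soft breaks, no trailing-space rules.
 */
int
qp_encode(const unsigned char *in, size_t len, char *out, size_t outsz)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0, i;

  if (outsz == 0) {
    return -1;
  }

  for (i = 0; i < len; i++) {
    unsigned char c = in[i];
    /* printable ASCII except '=' is copied as-is; so are tab and newline */
    int literal = (c >= 32 && c <= 126 && c != '=') || c == '\t' || c == '\n';
    size_t need = literal ? 1 : 3;

    if (o + need + 1 > outsz) {         /* keep room for the NUL */
      return -1;
    }
    if (literal) {
      out[o++] = (char)c;
    } else {
      out[o++] = '=';
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 0x0F];
    }
  }

  out[o] = '\0';
  return (int)o;
}
```

Fed the UTF-8 octets for "It’s" (49 74 E2 80 99 73), this sketch produces It=E2=80=99s, which is the kind of string the punctuation-fixing macros in this thread have to hunt for.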

> I have an Emacs macro that finds the QP strings for the punctuation
> and reverts them to ASCII before rmail-decode-quoted-printable but
> it's a PITA.

I have vim settings for the same purpose, and a simple Perl script
for use outside of vim. The Perl script looks for my vim
configuration first and, if it doesn't find it, falls back to a
built-in set of rules.

https://qaz.wtf/tmp/textify

I basically only try to fix punctuation issues I've encountered.
I do not try to replace accented vowels, for example.

Elijah
------
knows German rules for that, but not, say, French ones

Re: Why SOME chars nonASCII?

<87bl7vltov.fsf@bogus.nodomain.nowhere>

https://www.rocksolidbbs.com/computers/article-flat.php?id=422&group=alt.os.linux.slackware#422

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mds@bogus.nodomain.nowhere (Mike Spencer)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: 24 Jun 2021 02:26:08 -0300
Organization: Bridgewater Institute for Advanced Study - Blacksmith Shop
Lines: 11
Message-ID: <87bl7vltov.fsf@bogus.nodomain.nowhere>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf> <87k0mkkqef.fsf@bogus.nodomain.nowhere> <eli$2106232206@qaz.wtf>
Injection-Info: reader02.eternal-september.org; posting-host="9270457200ae38251cb095b1e531bb8a";
logging-data="30670"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ihVnrgEAK4qZjTNqZWIZFr+rx1A/AiGM="
Cancel-Lock: sha1:jzWuTn4iesCmDVOkX2gN+7paIGs=
X-Newsreader: Gnus v5.7/Emacs 20.7
X-Clacks-Overhead: 4GH GNU Terry Pratchett
 by: Mike Spencer - Thu, 24 Jun 2021 05:26 UTC

Eli the Bearded <*@eli.users.panix.com> writes:

> I basically only try to fix punctuation issues I've encountered.
> I do not try to replace accented vowels, for example.

Same. After undoing QP, the UTF8 punctuation appears in Emacs as 3
escaped octal digits making for hard reading.

--
Mike Spencer Nova Scotia, Canada

Re: Why SOME chars nonASCII?

<871r8rhf4y.fsf@LkoBDZeT.terraraq.uk>

https://www.rocksolidbbs.com/computers/article-flat.php?id=425&group=alt.os.linux.slackware#425

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.nntp.terraraq.uk!not-for-mail
From: invalid@invalid.invalid (Richard Kettlewell)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Thu, 24 Jun 2021 08:54:05 +0100
Organization: terraraq NNTP server
Message-ID: <871r8rhf4y.fsf@LkoBDZeT.terraraq.uk>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: mantic.terraraq.uk; posting-host="nntp.terraraq.uk:2a00:1098:0:86:1000:3f:0:2";
logging-data="6920"; mail-complaints-to="usenet@mantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:Rgf4fKWSKWl8Ko5V3t9310IXyPg=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Thu, 24 Jun 2021 07:54 UTC

Eli the Bearded <*@eli.users.panix.com> writes:

> In alt.os.linux.slackware, <qhsgrant@outlook.com> wrote:
>> It seems absurd to me that a recent [few years] fad is to make some
>> chars 2-bytes, amongst existing one-byte-ASCII-strings.
>> What is the motive for this?
>
> Mostly it is because of all of the non-English languages that don't fit
> in the seven bits of ASCII. Even the eight bit ISO-8859-x family doesn't
> cover lots of well-used languages. UTF-8 gives you most living languages
> and many dead ones. UTF-8 isn't strictly "2-bytes", it is a variable
> width encoding with ASCII compatibility for ASCII characters. High bit
> sequences can be two, three, or four octets.
>
> C encoding verifier I wrote:
[...]
> https://github.com/Eli-the-Bearded/eli-mailx/blob/master/utf-8.c

That has several bugs...

1) It accepts non-minimal sequences such as F0808080.

2) It accepts sequences mapping to UTF-16 surrogates, such as EDA080.

3) It accepts sequences mapping outside the Unicode code point range,
such as F7808080.

See https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf D92 for the
specification.
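
For illustration only (a sketch added alongside the post, not the fix that went into mailx, if any): the three cases above can be rejected by narrowing the allowed range of the second octet for a few lead octets, following the table of well-formed byte sequences in that chapter. The name check_utf8_strict is hypothetical. This sketch also treats a sequence cut off at the end of the buffer as an error.

```c
#include <stddef.h>

/* Hypothetical stricter check of str[0..len) for well-formed UTF-8.
 * Returns the byte offset of the first ill-formed sequence,
 * or -1 if the whole buffer is well-formed.
 */
long
check_utf8_strict(const unsigned char *str, size_t len)
{
  size_t pos = 0;

  while (pos < len) {
    unsigned char c = str[pos];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd octet */
    size_t need, i;                     /* continuation octets expected */

    if (c < 0x80) {                     /* ASCII */
      pos++;
      continue;
    } else if (c >= 0xC2 && c <= 0xDF) {
      need = 1;                         /* C0 and C1 would be non-minimal */
    } else if (c >= 0xE0 && c <= 0xEF) {
      need = 2;
      if (c == 0xE0) lo = 0xA0;         /* rejects non-minimal forms */
      if (c == 0xED) hi = 0x9F;         /* rejects UTF-16 surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
      need = 3;
      if (c == 0xF0) lo = 0x90;         /* rejects non-minimal forms */
      if (c == 0xF4) hi = 0x8F;         /* rejects values above U+10FFFF */
    } else {
      return (long)pos;                 /* stray continuation, C0, C1, F5..FF */
    }

    if (len - pos <= need) {
      return (long)pos;                 /* sequence truncated by end of buffer */
    }
    if (str[pos + 1] < lo || str[pos + 1] > hi) {
      return (long)pos;
    }
    for (i = 2; i <= need; i++) {
      if ((str[pos + i] & 0xC0) != 0x80) {
        return (long)pos;
      }
    }
    pos += need + 1;
  }

  return -1;
}
```

Under these rules F0808080, EDA080 and F7808080 are each reported at offset 0, while ordinary text such as E28099 (the right single quote) still passes.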

--
https://www.greenend.org.uk/rjk/

Re: Why SOME chars nonASCII?

<eli$2107011643@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=434&group=alt.os.linux.slackware#434

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Thu, 1 Jul 2021 20:44:00 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2107011643@qaz.wtf>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf> <871r8rhf4y.fsf@LkoBDZeT.terraraq.uk>
Injection-Date: Thu, 1 Jul 2021 20:44:00 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="4777"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 1 Jul 2021 20:44 UTC

In alt.os.linux.slackware, Richard Kettlewell <invalid@invalid.invalid> wrote:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> C encoding verifier I wrote:
>> https://github.com/Eli-the-Bearded/eli-mailx/blob/master/utf-8.c
>
> That has several bugs...
>
> 1) It accepts non-minimal sequences such as F0808080.
> 2) It accepts sequences mapping to UTF-16 surrogates, such as EDA080.
> 3) It accepts sequences mapping outside the Unicode code point range,
> such as F7808080.

Interesting critique. I may fix those, but I'm not sure they'll ever be
relevant to the level of strictness I need. I'm looking to catch
mislabeled "charset"s not devious attacks.

> See https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf D92 for the
> specification.

The Unicode website was briefly down, so I had put off responding to
this until I could check that. Would have been nice if you had included
a page number since that document is large and doesn't include a TOC.
Page 123 as numbered in document, page 54 as numbered by my PDF reader.

Elijah
------
recalls now non-minimal UTF-8 being used to escape Apache document root once

Re: Why SOME chars nonASCII?

<87a6n5ds0x.fsf@LkoBDZeT.terraraq.uk>

https://www.rocksolidbbs.com/computers/article-flat.php?id=435&group=alt.os.linux.slackware#435

Path: i2pn2.org!i2pn.org!news.niel.me!news.gegeweb.eu!gegeweb.org!nntp.terraraq.uk!.POSTED.nntp.terraraq.uk!not-for-mail
From: invalid@invalid.invalid (Richard Kettlewell)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Fri, 02 Jul 2021 09:44:14 +0100
Organization: terraraq NNTP server
Message-ID: <87a6n5ds0x.fsf@LkoBDZeT.terraraq.uk>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf>
<871r8rhf4y.fsf@LkoBDZeT.terraraq.uk> <eli$2107011643@qaz.wtf>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: mantic.terraraq.uk; posting-host="nntp.terraraq.uk:2a00:1098:0:86:1000:3f:0:2";
logging-data="28473"; mail-complaints-to="usenet@mantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:jlaqoi6fKw4M5JQnu0o+M0HrlRI=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Fri, 2 Jul 2021 08:44 UTC

Eli the Bearded <*@eli.users.panix.com> writes:
> Richard Kettlewell <invalid@invalid.invalid> wrote:
>> Eli the Bearded <*@eli.users.panix.com> writes:
>>> C encoding verifier I wrote:
>>> https://github.com/Eli-the-Bearded/eli-mailx/blob/master/utf-8.c
>>
>> That has several bugs...
>>
>> 1) It accepts non-minimal sequences such as F0808080.
>> 2) It accepts sequences mapping to UTF-16 surrogates, such as EDA080.
>> 3) It accepts sequences mapping outside the Unicode code point range,
>> such as F7808080.
>
> Interesting critique. I may fix those, but I'm not sure they'll ever be
> relevant to the level of strictness I need. I'm looking to catch
> mislabeled "charset"s not devious attacks.

It was advertized as checking for “legitimate UTF-8”, not “UTF-8 but
also some other stuff that is not UTF-8”.

>> See https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf D92 for the
>> specification.
>
> The Unicode website was briefly down, so I had put off responding to
> this until I could check that. Would have been nice if you had
> included a page number since that document is large and doesn't
> include a TOC. Page 123 as numbered in document, page 54 as numbered
> by my PDF reader.

I didn’t think D92 would be hard to search for.

--
https://www.greenend.org.uk/rjk/

Re: Why SOME chars nonASCII?

<slrnsec734.bn1.syl@elvira.therockgarden.ca>

https://www.rocksolidbbs.com/computers/article-flat.php?id=447&group=alt.os.linux.slackware#447

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!fdc2.netnews.com!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx12.iad.POSTED!not-for-mail
Newsgroups: alt.os.linux.slackware
From: syl@encs.concordia.ca (Sylvain Robitaille)
Subject: Re: Why SOME chars nonASCII?
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf>
<87k0mkkqef.fsf@bogus.nodomain.nowhere> <eli$2106232206@qaz.wtf>
User-Agent: slrn/1.0.2 (Linux)
Message-ID: <slrnsec734.bn1.syl@elvira.therockgarden.ca>
Lines: 19
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 07 Jul 2021 21:28:04 UTC
Date: Wed, 07 Jul 2021 21:28:04 GMT
X-Received-Bytes: 1418
 by: Sylvain Robitaille - Wed, 7 Jul 2021 21:28 UTC

On 2021-06-24, Eli the Bearded wrote:

>> I have an Emacs macro that finds the QP strings for the punctuation
>> and reverts them to ASCII before rmail-decode-quoted-printable but
>> it's a PITA.
>
> I have vim settings for the same purpose, ...

Care to share your vim settings? I see that your Perl script reads it
in, or defaults to its own, but I'm certainly curious about what you've
done in vim ...

--
----------------------------------------------------------------------
Sylvain Robitaille syl@encs.concordia.ca
Systems analyst / AITS Concordia University
Faculty of Engineering and Computer Science Montreal, Quebec, Canada
----------------------------------------------------------------------

Re: Why SOME chars nonASCII?

<84bl7ddci5.fsf@example.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=448&group=alt.os.linux.slackware#448

Path: i2pn2.org!i2pn.org!aioe.org!4+1/sO+hjgXu65daMoAjkg.user.gioia.aioe.org.POSTED!not-for-mail
From: richmond@criptext.com (Richmond)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Wed, 07 Jul 2021 22:45:22 +0100
Organization: Frantic
Lines: 18
Message-ID: <84bl7ddci5.fsf@example.com>
References: <sb0fdv$jtd$1@gioia.aioe.org> <eli$2106232057@qaz.wtf>
<87k0mkkqef.fsf@bogus.nodomain.nowhere>
NNTP-Posting-Host: 4+1/sO+hjgXu65daMoAjkg.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:1NsBn9iXwR6dnYehsgD1PaN6ACI=
X-Notice: Filtered by postfilter v. 0.9.2
 by: Richmond - Wed, 7 Jul 2021 21:45 UTC

Mike Spencer <mds@bogus.nodomain.nowhere> writes:

> Several of my correspondents (using Mac or Windoes) writing in
> English do this in their own text and in text/articles copied from the
> net.
>
> Oddly, the non-ASCII chars are almost all punctuation: left & right
> double & single quotes, em dash, ellipses and the degree symbol. Very
> occasionally, there are French or Spanish names with non-ASCII chars
> but the big nuisance is the punctuation. And of course, they send it
> as quoted-printable.
>

Surely as most of the web is utf-8 it is good to use that as a standard.

There is no £ in seven bit ascii, there is in extended ascii, and in
iso, but it causes confusion when email programs do not state the
encoding used.
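
A small sketch of that confusion (added alongside the post as an illustration): in ISO-8859-1 the pound sign is the single octet A3, while in UTF-8 the same character is the two octets C2 A3, so a reader that guesses the wrong charset shows garbage. The name latin1_to_utf8 is hypothetical, and the sketch assumes the input really is ISO-8859-1.

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 octets in[0..len) to UTF-8.
 * out must have room for 2 * len + 1 octets (worst case plus NUL).
 * Returns the number of octets written, not counting the NUL.
 */
size_t
latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
  size_t o = 0, i;

  for (i = 0; i < len; i++) {
    unsigned char c = in[i];

    if (c < 0x80) {
      out[o++] = c;                     /* ASCII is identical in both */
    } else {
      /* 80..FF map to U+0080..U+00FF, always a two octet sequence */
      out[o++] = (unsigned char)(0xC0 | (c >> 6));
      out[o++] = (unsigned char)(0x80 | (c & 0x3F));
    }
  }

  out[o] = '\0';
  return o;
}
```

Going the other way is not always possible, since most of what UTF-8 can carry has no ISO-8859-1 octet at all, which is one reason an unlabeled message is hard to repair after the fact.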

Re: Why SOME chars nonASCII?

<eli$2107081314@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=456&group=alt.os.linux.slackware#456

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: alt.os.linux.slackware
Subject: Re: Why SOME chars nonASCII?
Date: Thu, 8 Jul 2021 17:15:26 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2107081314@qaz.wtf>
References: <sb0fdv$jtd$1@gioia.aioe.org> <87k0mkkqef.fsf@bogus.nodomain.nowhere> <eli$2106232206@qaz.wtf> <slrnsec734.bn1.syl@elvira.therockgarden.ca>
Injection-Date: Thu, 8 Jul 2021 17:15:26 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="131"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 8 Jul 2021 17:15 UTC

In alt.os.linux.slackware, Sylvain Robitaille <syl@encs.concordia.ca> wrote:
> On 2021-06-24, Eli the Bearded wrote:
> > I have vim settings for the same purpose, ...
> Care to share your vim settings? I see that your Perl script reads it
> in, or defaults to its own, but I'm certainly curious about what you've
> done in vim ...

The complete vim settings are basically the same as in the perl script,
but here:

base64 -d <<_B64_VIMRC > highbit_vimrc
IiBzbWFydCBxdW90ZXMKbWFwISDigJkgJwptYXAhIOKAmCAnCm1hcCEg4oCcICIKbWFwISDi
gJ0gIgptYXAhIOKAsyAiCiIgYnVsbGV0Cm1hcCEg4pePICoKIiBlbGxpcHNpcwptYXAhIOKA
piAuLi4KIiBuLWRhc2gKbWFwISDigJMgLS0KIiBtLWRhc2gKbWFwISDigJQgLS0KIiBVKzIy
MTIgbWludXMKbWFwISDiiJIgLQoiIFUrMjAxMCBoeXBoZW4KbWFwISDigJAgLQoiIGx5bngg
YnJva2VuIFVURi04Cm1hcCEgw6LCgMKcICIKbWFwISDDosKAwp0gIgptYXAhIMOiwoDCmSAn
Cm1hcCEgw6LCgMKUIC0tCm1hcCEgw6LCgMKmIC4uLgoiCiIgZmluZCBub24tYXNjaWkKbWFw
IDxGNT4gL1teCSAtfl08Y3I+Cg==
_B64_VIMRC

Elijah
------
yay for multiple encodings raw in one file

Re: Why SOME chars nonASCII?

<slrnsepi9k.p29.syl@elvira.therockgarden.ca>

https://www.rocksolidbbs.com/computers/article-flat.php?id=487&group=alt.os.linux.slackware#487

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
Newsgroups: alt.os.linux.slackware
From: syl@encs.concordia.ca (Sylvain Robitaille)
Subject: Re: Why SOME chars nonASCII?
References: <sb0fdv$jtd$1@gioia.aioe.org>
<87k0mkkqef.fsf@bogus.nodomain.nowhere> <eli$2106232206@qaz.wtf>
<slrnsec734.bn1.syl@elvira.therockgarden.ca> <eli$2107081314@qaz.wtf>
User-Agent: slrn/1.0.2 (Linux)
Message-ID: <slrnsepi9k.p29.syl@elvira.therockgarden.ca>
Lines: 24
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 12 Jul 2021 22:59:00 UTC
Date: Mon, 12 Jul 2021 22:59:00 GMT
X-Received-Bytes: 1750
 by: Sylvain Robitaille - Mon, 12 Jul 2021 22:59 UTC

On 2021-07-08, Eli the Bearded wrote:

> The complete vim settings are basically the same as in the perl script,
> but here:
>
> base64 -d <<_B64_VIMRC > highbit_vimrc
> IiBzbWFydCBxdW90ZXMKbWFwISDigJkgJwptYXAhIOKAmCAnCm1hcCEg4oCcICIKbWFwISDi
> gJ0gIgptYXAhIOKAsyAiCiIgYnVsbGV0Cm1hcCEg4pePICoKIiBlbGxpcHNpcwptYXAhIOKA
> piAuLi4KIiBuLWRhc2gKbWFwISDigJMgLS0KIiBtLWRhc2gKbWFwISDigJQgLS0KIiBVKzIy
> MTIgbWludXMKbWFwISDiiJIgLQoiIFUrMjAxMCBoeXBoZW4KbWFwISDigJAgLQoiIGx5bngg
> YnJva2VuIFVURi04Cm1hcCEgw6LCgMKcICIKbWFwISDDosKAwp0gIgptYXAhIMOiwoDCmSAn
> Cm1hcCEgw6LCgMKUIC0tCm1hcCEgw6LCgMKmIC4uLgoiCiIgZmluZCBub24tYXNjaWkKbWFw
> IDxGNT4gL1teCSAtfl08Y3I+Cg==
> _B64_VIMRC

Beautiful. Thank you.

--
----------------------------------------------------------------------
Sylvain Robitaille syl@encs.concordia.ca
Systems analyst / AITS Concordia University
Faculty of Engineering and Computer Science Montreal, Quebec, Canada
----------------------------------------------------------------------
