Subject                              Author
* Reading UTF8-files                 Arjan
`* Re: Reading UTF8-files            Arjen Markus
 `* Re: Reading UTF8-files           eugene_...@yahoo.com
  +* Re: Reading UTF8-files          Lynn McGuire
  |`* Re: Reading UTF8-files         Gary Scott
  | `- Re: Reading UTF8-files        Gary Scott
  `* Re: Reading UTF8-files          Arjen Markus
   `* Re: Reading UTF8-files         Gary Scott
    `* Re: Reading UTF8-files        Lynn McGuire
     `* Re: Reading UTF8-files       Gary Scott
      `* Re: Reading UTF8-files      Peter Klausler US
       `- Re: Reading UTF8-files     Gary Scott

Reading UTF8-files

<7f730695-40d1-44ea-9ade-e32466e5032cn@googlegroups.com>

Newsgroups: comp.lang.fortran
 by: Arjan - Wed, 28 Jun 2023 11:24 UTC

Somehow, my compiler (GNU Fortran (MinGW.org GCC-6.3.0-1) 6.3.0) has issues reading text files in UTF-8 format. The files read fine in Notepad++, but my programs give undesired results. At the moment, I convert the UTF-8 files to ANSI, but this requires my intervention. Is there a way to just read the files without worrying about whether they are UTF-8 or ANSI?

Re: Reading UTF8-files

<0329ba2a-e1d4-4a35-88c3-c6d6dcc1459cn@googlegroups.com>

 by: Arjen Markus - Wed, 28 Jun 2023 12:07 UTC

On Wednesday, June 28, 2023 at 1:24:33 PM UTC+2, Arjan wrote:
> Somehow, my compiler (GNU Fortran (MinGW.org GCC-6.3.0-1) 6.3.0) has issues reading text files in UTF-8 format. The files read fine in Notepad++, but my programs give undesired results. At the moment, I convert the UTF-8 files to ANSI, but this requires my intervention. Is there a way to just read the files without worrying about whether they are UTF-8 or ANSI?

You should be able to open them with the keyword "ENCODING='UTF-8'". Mind you, on Windows many editors add a "BOM" to the file, that is, a byte-order mark that the editors hide but that occupies two bytes at the start of the file. This may mess things up if you do not skip over it.

Regards,

Arjen
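
A minimal sketch of the ENCODING='UTF-8' suggestion, assuming a compiler that supports the ISO 10646 character kind (gfortran does; whether the MinGW 6.3.0 build behaves the same way is not confirmed here). The file name input.txt and the 200-character line length are placeholders, not anything from the original question.

```fortran
program read_utf8
  implicit none
  ! Kind for 4-byte ISO 10646 characters; -1 if the compiler lacks it,
  ! in which case the declaration below will not compile.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  character(kind=ucs4, len=200) :: line   ! 200 is an arbitrary limit
  integer :: u, ios

  ! 'input.txt' is a hypothetical file name.
  open(newunit=u, file='input.txt', status='old', action='read', &
       encoding='UTF-8', iostat=ios)
  if (ios /= 0) stop 'cannot open input.txt'

  do
     read(u, '(a)', iostat=ios) line
     if (ios /= 0) exit
     ! Each element of line is one code point, not one byte.
     print *, 'characters in line:', len_trim(line)
  end do
  close(u)
end program read_utf8
```

A byte-order mark is not necessarily skipped by the runtime; if the file has one, it may arrive as the character U+FEFF at the start of the first line and would then need to be dropped by hand.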

Re: Reading UTF8-files

<a57985f6-d136-4f77-8251-7ec5749e88afn@googlegroups.com>

 by: eugene_...@yahoo.com - Wed, 28 Jun 2023 13:47 UTC

> two bytes at the start of the file

Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark

Re: Reading UTF8-files

<u7iee0$1sum0$1@dont-email.me>

 by: Lynn McGuire - Wed, 28 Jun 2023 23:06 UTC

On 6/28/2023 8:47 AM, eugene_...@yahoo.com wrote:
>> two bytes at the start of the file
>
> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark

That is gross. Anyone opening any text file should expect UTF-8 now.

Lynn

Re: Reading UTF8-files

<u7ighe$1t3m6$1@dont-email.me>

 by: Gary Scott - Wed, 28 Jun 2023 23:42 UTC

On 6/28/2023 6:06 PM, Lynn McGuire wrote:
> On 6/28/2023 8:47 AM, eugene_...@yahoo.com wrote:
>>>   two bytes at the start of the file
>>
>> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark
>
> That is gross.  Anyone opening any text file should expect UTF-8 now.
>
> Lynn
>
All of my text files will be plain 8-bit extended ASCII, or else.

Re: Reading UTF8-files

<u7igj5$1t3m6$2@dont-email.me>

 by: Gary Scott - Wed, 28 Jun 2023 23:43 UTC

On 6/28/2023 6:42 PM, Gary Scott wrote:
> On 6/28/2023 6:06 PM, Lynn McGuire wrote:
>> On 6/28/2023 8:47 AM, eugene_...@yahoo.com wrote:
>>>>   two bytes at the start of the file
>>>
>>> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark
>>
>> That is gross.  Anyone opening any text file should expect UTF-8 now.
>>
>> Lynn
>>
> All of my text files will be plain 8-bit extended ASCII, or else.
...or EBCDIC

Re: Reading UTF8-files

<b437c60b-3521-4759-abff-fba50232da07n@googlegroups.com>

 by: Arjen Markus - Thu, 29 Jun 2023 07:14 UTC

On Wednesday, June 28, 2023 at 3:47:12 PM UTC+2, eugene wrote:
> > two bytes at the start of the file
>
> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark

So, simply opening a text file that is encoded in UTF-8 requires a delicate procedure:
- There may be a byte-order mark. There may not be one.
- If there is, it may be two bytes, in which case there are still two possibilities you need to check.
- The third possibility is that there are three bytes that confer the same information about the byte ordering.

And there is no way to accurately predict which situation you are facing ... I think I am getting old.

Regards,

Arjen

Re: Reading UTF8-files

<u7k1p3$25ue5$1@dont-email.me>

 by: Gary Scott - Thu, 29 Jun 2023 13:42 UTC

On 6/29/2023 2:14 AM, Arjen Markus wrote:
> On Wednesday, June 28, 2023 at 3:47:12 PM UTC+2, eugene wrote:
>>> two bytes at the start of the file
>>
>> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark
>
> So, simply opening a text file that is encoded in UTF-8 requires a delicate procedure:
> - There may be a byte-order mark. There may not be one.
> - If there is, it may be two bytes, in which case there are still two possibilities you need to check.
> - The third possibility is that there are three bytes that confer the same information about the byte ordering.
>
> And there is no way to accurately predict which situation you are facing ... I think I am getting old.
>

It's not so hard, just always read the first 3 bytes and figure out what
they are and proceed as required.

> Regards,
>
> Arjen
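
One way the "read the first 3 bytes" idea could look, as a sketch rather than a tested recipe: open the file with stream access, read up to three bytes, and compare them with the byte patterns listed on the Wikipedia page linked above (EF BB BF for UTF-8, FE FF and FF FE for the two UTF-16 byte orders). The module name bom_sniff, the function bom_length and the file name input.txt are made up for illustration, and the code assumes the default character kind is one byte wide, as it is in gfortran.

```fortran
module bom_sniff
  implicit none
contains
  ! Return the number of BOM bytes at the start of head (0 if none).
  pure function bom_length(head) result(n)
    character(len=*), intent(in) :: head
    integer :: n
    n = 0
    if (len(head) >= 3) then
       if (head(1:3) == char(239)//char(187)//char(191)) then
          n = 3                      ! EF BB BF: UTF-8
          return
       end if
    end if
    if (len(head) >= 2) then
       if (head(1:2) == char(254)//char(255) .or. &   ! FE FF: UTF-16, big-endian
           head(1:2) == char(255)//char(254)) n = 2   ! FF FE: UTF-16, little-endian
    end if
  end function bom_length
end module bom_sniff

program sniff
  use bom_sniff
  implicit none
  character(len=3) :: head
  integer :: u, ios, fsize, nhead

  ! 'input.txt' is a hypothetical file name.
  open(newunit=u, file='input.txt', status='old', action='read', &
       access='stream', form='unformatted', iostat=ios)
  if (ios /= 0) stop 'cannot open input.txt'

  ! Never ask for more bytes than the file holds.
  inquire(unit=u, size=fsize)
  nhead = min(3, fsize)
  head = ''
  if (nhead > 0) read(u) head(1:nhead)

  print *, 'BOM bytes to skip:', bom_length(head(1:nhead))
  close(u)
end program sniff
```

With stream access there is no need to rewind afterwards: a later read with pos=n+1, where n is the value returned by bom_length, starts directly after the mark.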

Re: Reading UTF8-files

<u7ktn0$28vna$1@dont-email.me>

 by: Lynn McGuire - Thu, 29 Jun 2023 21:39 UTC

On 6/29/2023 8:42 AM, Gary Scott wrote:
> On 6/29/2023 2:14 AM, Arjen Markus wrote:
>> On Wednesday, June 28, 2023 at 3:47:12 PM UTC+2, eugene wrote:
>>>> two bytes at the start of the file
>>>
>>> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark
>>
>> So, simply opening a text file that is encoded in UTF-8 requires a
>> delicate procedure:
>> - There may be a byte-order mark. There may not be one.
>> - If there is, it may be two bytes, in which case there are still two
>> possibilities you need to check.
>> - The third possibility is that there are three bytes that confer the
>> same information about the byte ordering.
>>
>> And there is no way to accurately predict which situation you are
>> facing ... I think I am getting old.
>>
>
> It's not so hard, just always read the first 3 bytes and figure out what
> they are and proceed as required.
>
>
>> Regards,
>>
>> Arjen

But those 3 bytes are not required to be there for UTF-8. So, those
three bytes will probably not be there.

We are assuming that all text files are UTF-8 now.

Lynn

Re: Reading UTF8-files

<u7mm9c$2i8cb$1@dont-email.me>

 by: Gary Scott - Fri, 30 Jun 2023 13:45 UTC

On 6/29/2023 4:39 PM, Lynn McGuire wrote:
> On 6/29/2023 8:42 AM, Gary Scott wrote:
>> On 6/29/2023 2:14 AM, Arjen Markus wrote:
>>> On Wednesday, June 28, 2023 at 3:47:12 PM UTC+2, eugene wrote:
>>>>> two bytes at the start of the file
>>>>
>>>> Or three bytes for UTF-8: https://en.wikipedia.org/wiki/Byte_order_mark
>>>
>>> So, simply opening a text file that is encoded in UTF-8 requires a
>>> delicate procedure:
>>> - There may be a byte-order mark. There may not be one.
>>> - If there is, it may be two bytes, in which case there are still two
>>> possibilities you need to check.
>>> - The third possibility is that there are three bytes that confer the
>>> same information about the byte ordering.
>>>
>>> And there is no way to accurately predict which situation you are
>>> facing ... I think I am getting old.
>>>
>>
>> It's not so hard, just always read the first 3 bytes and figure out
>> what they are and proceed as required.
>>
>>
>>> Regards,
>>>
>>> Arjen
>
> But those 3 bytes are not required to be there for UTF-8.  So, those
> three bytes will probably not be there.
>
> We are assuming that all text files are UTF-8 now.
Of course, once you've checked that they aren't there, you can either
rewind and read again or prepend the 3 characters to the next read
operation.
>
> Lynn
>
>

Re: Reading UTF8-files

<8ee45a42-a163-4e8d-a159-1b9d524501cen@googlegroups.com>

 by: Peter Klausler US - Fri, 30 Jun 2023 16:11 UTC

On Friday, June 30, 2023 at 6:45:20 AM UTC-7, Gary Scott wrote:
> Of course, once you've checked that they aren't there, you can either
> rewind and read again or prepend the 3 characters to the next read
> operation.

Hard to rewind a TTY or socket or pipe, though.
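
For input that cannot be rewound, one possible approach (a sketch under assumptions, not a method anyone in the thread proposed in this form) is to avoid peeking at raw bytes at all: read the first line normally and cut a leading UTF-8 mark off the text that was read. The names bom_strip and strip_utf8_bom and the 1024-character buffer are illustrative; the code assumes one-byte default characters and a unit opened without ENCODING='UTF-8', so the three bytes arrive unchanged.

```fortran
module bom_strip
  implicit none
  ! The UTF-8 byte-order mark: EF BB BF.
  character(len=3), parameter :: utf8_bom = char(239)//char(187)//char(191)
contains
  ! Return line without a leading UTF-8 BOM; anything else passes through.
  pure function strip_utf8_bom(line) result(res)
    character(len=*), intent(in) :: line
    character(len=:), allocatable :: res
    res = line
    if (len(line) >= 3) then
       if (line(1:3) == utf8_bom) res = line(4:)
    end if
  end function strip_utf8_bom
end module bom_strip

program first_line
  use, intrinsic :: iso_fortran_env, only: input_unit
  use bom_strip
  implicit none
  character(len=1024) :: buf   ! longer lines are truncated in this sketch
  integer :: ios
  logical :: first

  first = .true.
  do
     read(input_unit, '(a)', iostat=ios) buf
     if (ios /= 0) exit
     if (first) then
        ! Only the very first line can carry the mark.
        print '(a)', strip_utf8_bom(trim(buf))
        first = .false.
     else
        print '(a)', trim(buf)
     end if
  end do
end program first_line
```

Because nothing is read twice, the same loop works whether standard input is a file, a pipe or a terminal.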

Re: Reading UTF8-files

<u7ph56$2vbpo$1@dont-email.me>

 by: Gary Scott - Sat, 1 Jul 2023 15:36 UTC

On 6/30/2023 11:11 AM, Peter Klausler US wrote:
> On Friday, June 30, 2023 at 6:45:20 AM UTC-7, Gary Scott wrote:
>> Of course, once you've checked that they aren't there, you can either
>> rewind and read again or prepend the 3 characters to the next read
>> operation.
>
> Hard to rewind a TTY or socket or pipe, though.
Sure... there are ways to handle it.
