devel / comp.lang.mumps / German collation routines for YottaDB UTF-8 mode

German collation routines for YottaDB UTF-8 mode

<b2c4f844-8f0f-4066-8a1a-2c24f04a0672n@googlegroups.com>

 by: K.S. Bhaskar - Sun, 21 Nov 2021 20:20 UTC

Unicode code-point order is often not the linguistically or culturally correct order. For example, from YottaDB in UTF-8 mode:

YDB>set sz="ß",SZ="ẞ" write $ascii(sz)," ",$ascii(SZ)
223 7838
YDB>set umch="äëïöüÿÄËÏÖÜŸ" for i=1:1:$length(umch) write $ascii($extract(umch,i))," "
228 235 239 246 252 255 196 203 207 214 220 376
YDB>write "Öhman"]"Pfaff"," ","Ohman"]"Pfaff"
1 0
YDB>write "Öhman"]]"Pfaff"," ","Ohman"]]"Pfaff"
1 0
YDB>

Has anyone developed collation routines (https://docs.yottadb.com/ProgrammersGuide/internatn.html#creating-the-alternate-collation-routines) so that YottaDB can correctly display German words and names? Thank you very much.

Regards
– Bhaskar

Re: German collation routines for YottaDB UTF-8 mode

<1dffde9f-e902-4b88-b7d8-29050c4a1480n@googlegroups.com>

 by: ed de moel - Mon, 22 Nov 2021 20:13 UTC

I don't have any code for this "on the shelf", but I'd start by going through the strings and replacing all the compound characters with their components, e.g. translating "ä" into "ae", "ß" into "sz", etc., and then comparing them in the "old-fashioned" way.
(That would work for most cases. My German isn't too good, but I am aware that "ß" sometimes should become "sz" and sometimes "ss"...)
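
A minimal sketch of this expand-then-compare idea, written in Python rather than M purely for illustration. The table `EXPANSIONS` and the helpers `expand` and `follows` are hypothetical names, and mapping "ß" to "ss" is just one of the two choices mentioned above:

```python
import unicodedata

# Hypothetical expansion table (an assumption, not an established standard).
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def expand(text: str) -> str:
    """Replace each umlaut or ß with its spelled-out components."""
    text = unicodedata.normalize("NFC", text)  # ensure precomposed characters
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)

def follows(a: str, b: str) -> bool:
    """Like M's `]` (follows) operator, but applied to the expanded strings."""
    return expand(a) > expand(b)

raw = unicodedata.normalize("NFC", "Öhman")
print(raw > "Pfaff")              # True: plain code-point comparison puts Ö after P
print(follows("Öhman", "Pfaff"))  # False: "Oehman" sorts before "Pfaff"
```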

Hope this works as a starting point,
Ed

Re: German collation routines for YottaDB UTF-8 mode

<14d4bfe0-9cd3-49c3-a40b-980de7ff49b2n@googlegroups.com>

 by: K.S. Bhaskar - Mon, 22 Nov 2021 20:39 UTC

On Monday, November 22, 2021 at 3:13:17 PM UTC-5, ed de moel wrote:
> I don't have any code for this "on the shelf", but I'd start by going through the strings and replacing all the compound characters with their components, e.g. translating "ä" into "ae", "ß" into "sz", etc., and then comparing them in the "old-fashioned" way.
> (That would work for most cases. My German isn't too good, but I am aware that "ß" sometimes should become "sz" and sometimes "ss"...)
>
> Hope this works as a starting point,
> Ed

Thanks, Ed. That's a good suggestion, but for performance reasons, the database engine doesn't quite work that way. It requires a forward transformation, which should be fairly straightforward (e.g., ä→ae), but the reverse transformation is not always clear (e.g., should all occurrences of ae in subscripts be converted to ä?). But this gives me something to think about.
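
To make the reverse-transformation problem concrete, here is a small Python sketch (not M, and not YottaDB's actual collation interface). The names `FORWARD`, `forward`, and `naive_reverse` are hypothetical, and only lowercase letters are handled for brevity:

```python
import unicodedata

# Hypothetical forward table: each special character becomes a letter pair.
FORWARD = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def forward(text: str) -> str:
    """Forward transformation: straightforward, one character at a time."""
    text = unicodedata.normalize("NFC", text)
    return "".join(FORWARD.get(ch, ch) for ch in text)

def naive_reverse(key: str) -> str:
    """Blindly map every letter pair back; this is the step that goes wrong."""
    for ch, expansion in FORWARD.items():
        key = key.replace(expansion, ch)
    return key

# Print original -> key -> attempted round trip for a few words.
for word in ["Bäcker", "Baecker", "aktuell", "Masse", "Maße"]:
    key = forward(word)
    print(f"{word!r} -> {key!r} -> {naive_reverse(key)!r}")
```

"Bäcker" and "Baecker" produce the same key, as do "Masse" and "Maße", so the key alone cannot say which original was stored, and a word such as "aktuell" that legitimately contains "ue" comes back altered. A reversible scheme would need to carry extra information beyond the expanded letters.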

Regards
– Bhaskar

Re: German collation routines for YottaDB UTF-8 mode

<815bffc1-08db-4c60-88b8-769be4131afdn@googlegroups.com>

 by: Jens - Tue, 23 Nov 2021 13:13 UTC

I'm German, but I wasn't sure about the correct sort order.
It seems that there are two options:

1. ä=a, ö=o, ü=u, ß=ss ------ Example: Bäcker->Bader->Bäder->Busse->Buße
2. ä=ae,ö=oe,ü=ue, ß=ss ---- Example: Bader->Bäcker->Bäder->Busse->Buße

Both versions are in use: MS Word uses option 1, while phone books are sorted like option 2.
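
The two options can be compared with a short Python sketch (Python rather than M, for illustration only). The names `OPTION_1`, `OPTION_2`, and `sort_key` are hypothetical, and using the original spelling as a tie-breaker is an assumption chosen so that "Bader" precedes "Bäder" when their keys are equal:

```python
import unicodedata

OPTION_1 = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}     # as in MS Word
OPTION_2 = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}  # as in phone books

def sort_key(word: str, table: dict) -> tuple:
    """Build a (primary, tie-breaker) key for one word under one option."""
    word = unicodedata.normalize("NFC", word)
    primary = "".join(table.get(ch, ch) for ch in word.lower())
    return (primary, word)  # assumption: the raw word only breaks ties

words = ["Buße", "Bäder", "Busse", "Bader", "Bäcker"]

print(sorted(words, key=lambda w: sort_key(w, OPTION_1)))
# ['Bäcker', 'Bader', 'Bäder', 'Busse', 'Buße']
print(sorted(words, key=lambda w: sort_key(w, OPTION_2)))
# ['Bader', 'Bäcker', 'Bäder', 'Busse', 'Buße']
```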

Hope this helps.

Jens

Re: German collation routines for YottaDB UTF-8 mode

<a6dd107a-bbe7-4781-9fb7-1a385d16942an@googlegroups.com>

 by: K.S. Bhaskar - Tue, 23 Nov 2021 15:16 UTC

On Tuesday, November 23, 2021 at 8:13:10 AM UTC-5, Jens wrote:
> I'm German, but I wasn't sure about the correct sort order.
> It seems that there are two options:
>
> 1. ä=a, ö=o, ü=u, ß=ss ------ Example: Bäcker->Bader->Bäder->Busse->Buße
> 2. ä=ae,ö=oe,ü=ue, ß=ss ---- Example: Bader->Bäcker->Bäder->Busse->Buße
>
> Both versions are in use: MS Word uses option 1, while phone books are sorted like option 2.
>
> Hope this helps.
>
> Jens

Thanks Jens. I was trying to help a German friend who uses YottaDB to index some text as a personal, post-retirement project. It seems that things are not as simple as I thought! Out of curiosity, what sorting order do dictionaries use?

Regards
– Bhaskar

Re: German collation routines for YottaDB UTF-8 mode

<52c4be35-fab5-46ec-b9c0-e9f8e154bba3n@googlegroups.com>

 by: Jens - Tue, 23 Nov 2021 15:30 UTC

On Tuesday, November 23, 2021 at 16:16:16 UTC+1, K.S. Bhaskar wrote:
> Thanks Jens. I was trying to help a German friend who uses YottaDB to index some text as a personal, post-retirement project. It seems that things are not as simple as I thought! Out of curiosity, what sorting order do dictionaries use?
>
> Regards
> – Bhaskar
I just looked in a German/English dictionary, and it is sorted like option 1.

Regards Jens

PS: If I can help your friend in any way, I would gladly do so. I still like coding in M.
PPS: I'm currently working on the Visual Studio Code extension to check correct NEWing of M local variables. :-)

Re: German collation routines for YottaDB UTF-8 mode

<308fbcd6-b147-482f-993b-b7680ef65405n@googlegroups.com>

 by: K.S. Bhaskar - Tue, 23 Nov 2021 19:48 UTC

On Tuesday, November 23, 2021 at 10:30:33 AM UTC-5, Jens wrote:
> I just looked in a German/English dictionary, and it is sorted like option 1.
>
> Regards Jens
>
> PS: If I can help your friend in any way, I would gladly do so. I still like coding in M.
> PPS: I'm currently working on the Visual Studio Code extension to check correct NEWing of M local variables. :-)

Jens –

My friend would be glad of any assistance. Would you please send your e-mail address to me: bhaskar at yottadb dot com? Thank you very much in advance.

Regards
– Bhaskar
