
Subject  Author
* [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
+* Re: [vim] Jumping from current Unicode string to next/prev appearance  Eli the Bearded
|+- Re: [vim] Jumping from current Unicode string to next/prev appearance  Julieta Shem
|`* Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
| `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
|  `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Eli the Bearded
|   `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
|    `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Eli the Bearded
|     `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
|      `* Re: [vim] Jumping from current Unicode string to next/prev appearance  Eli the Bearded
|       `- Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou
`- Re: [vim] Jumping from current Unicode string to next/prev appearance  Janis Papanagnou

[vim] Jumping from current Unicode string to next/prev appearance

<umikdj$496s$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=82&group=comp.editors#82
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 02:52:50 +0100
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <umikdj$496s$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2023 01:52:51 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cb6050195e1fb42308e82c985667a1d1";
logging-data="140508"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/KDozl5gq1pgGH9/jjMJo6"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:rTsRznHiO7ydfcYWPmkrp9ZlMRI=
X-Enigmail-Draft-Status: N1110
X-Mozilla-News-Host: news://news.eternal-september.org:119
 by: Janis Papanagnou - Thu, 28 Dec 2023 01:52 UTC

In Vim I frequently jump from string to the next equal string using the
commands '*' (forward search'n'jump) and '#' (backward search'n'jump).

With Unicode characters that doesn't seem to always work (at least not
per default).

In the following (UTF-8 encoded) test sample there is one subset of
Omega words where * and # work correctly and one where they don't
(starting with the cursor on the first letter of any word)

Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega

The difference is only the encoding of the first character of that
word ('\x03A9' versus '\x2126'). For words with Ω=\x03A9 it works but
not for words with Ω=\x2126.

Is there a way to fix or achieve that function for all UTF-8 encoded
words?

Janis

Re: [vim] Jumping from current Unicode string to next/prev appearance

<eli$2312272135@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=83&group=comp.editors#83
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 02:36:58 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2312272135@qaz.wtf>
References: <umikdj$496s$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Date: Thu, 28 Dec 2023 02:36:58 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="25571"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 28 Dec 2023 02:36 UTC

In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> In Vim I frequently jump from string to the next equal string using the
> commands '*' (forward search'n'jump) and '#' (backward search'n'jump).
>
> With Unicode characters that doesn't seem to always work (at least not
> per default).
>
> In the following (UTF-8 encoded) test sample there is one subset of
> Omega words where * and # work correctly and one where they don't
> (starting with the cursor on the first letter of any word)
>
> Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega

This is like complaining that a search for "MISS" does not also match
"МІЅЅ". They are completely different strings that just happen to look
alike with certain font choices. Some of those are "ohm sign", "Latin
small letter m", "Latin small letter e", "Latin small letter g", "Latin
small letter a" and the others are "Greek capital letter omega",
"Latin small letter m", "Latin small letter e", "Latin small letter g",
"Latin small letter a".

Your "difference is only the encoding" fails to grasp that Unicode is
semiotics aware, even if users might not be.

Elijah
------
https://www.unicode.org/reports/tr36/#visual_spoofing
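
The code-point distinction described above can be checked outside Vim with a short Python sketch using the standard unicodedata module. It assumes the two kinds of sample words differ only in U+2126 versus U+03A9, as stated in the original post; the helper name unify_omega is made up for illustration. Unicode defines a canonical mapping from the ohm sign to the Greek letter, so NFC normalization is one possible way to make both spellings identical before editing:

```python
import unicodedata

OHM_WORD = "\u2126mega"    # word starting with U+2126
OMEGA_WORD = "\u03a9mega"  # word starting with U+03A9


def describe(text):
    """Return (code point, Unicode name) for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def unify_omega(text):
    """Hypothetical helper: NFC-normalize text, which folds U+2126 into U+03A9."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(describe(OHM_WORD)[0])
    print(describe(OMEGA_WORD)[0])
    print(OHM_WORD == OMEGA_WORD)
    print(unify_omega(OHM_WORD) == unify_omega(OMEGA_WORD))
```

Whether normalizing a file this way is acceptable depends on whether the ohm sign is meant to stay a distinct symbol in the text.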

Re: [vim] Jumping from current Unicode string to next/prev appearance

<874jg3avv0.fsf@yaxenu.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=84&group=comp.editors#84
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Wed, 27 Dec 2023 23:45:07 -0300
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <874jg3avv0.fsf@yaxenu.org>
References: <umikdj$496s$1@dont-email.me> <eli$2312272135@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="0f55de256f731b22a5fb9b4cc890dfdd";
logging-data="149495"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+F124LlR3oTb062+ejlRnhw8jDmeLcP5c="
Cancel-Lock: sha1:e2TuULcsoR0JgOCN/7nfmazKF3w=
sha1:KYMSxKYTtpsWhA7/DHXsn/VFROI=
 by: Julieta Shem - Thu, 28 Dec 2023 02:45 UTC

Eli the Bearded <*@eli.users.panix.com> writes:

> In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> In Vim I frequently jump from string to the next equal string using the
>> commands '*' (forward search'n'jump) and '#' (backward search'n'jump).
>>
>> With Unicode characters that doesn't seem to always work (at least not
>> per default).
>>
>> In the following (UTF-8 encoded) test sample there is one subset of
>> Omega words where * and # work correctly and one where they don't
>> (starting with the cursor on the first letter of any word)
>>
>> Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
>
> This is like complaining that a search for "MISS" does not also match
> "МІЅЅ". They are completely different strings that just happen to look
> alike with certain font choices.

They look very much alike in the ``Fira Code'' font.

> Some of those are "ohm sign", "Latin small letter m", "Latin small
> letter e", "Latin small letter g", "Latin small letter a" and the
> others are "Greek capital letter omega", "Latin small letter m",
> "Latin small letter e", "Latin small letter g", "Latin small letter
> a".
>
> Your "difference is only the encoding" fails to grasp that Unicode is
> semiotics aware, even if users might not be.

There's a package for GNU Emacs that implements the search as the OP
desires. You can invoke it by typing

C-u 42 S E M I O T I C A W A R E RET C-c A I RET A W Y E A H RET

into the minibuffer. (Then press * and # as you wish.)

Re: [vim] Jumping from current Unicode string to next/prev appearance

<umiqnt$8pa2$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=85&group=comp.editors#85
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 04:40:44 +0100
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <umiqnt$8pa2$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me> <eli$2312272135@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2023 03:40:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cb6050195e1fb42308e82c985667a1d1";
logging-data="288066"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Dac+Pxw3IdCOHdpPQKV/F"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:gh2WiZwB7y8beUEdyEwsBhRDiSo=
In-Reply-To: <eli$2312272135@qaz.wtf>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Thu, 28 Dec 2023 03:40 UTC

On 28.12.2023 03:36, Eli the Bearded wrote:
> In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> In Vim I frequently jump from string to the next equal string using the
>> commands '*' (forward search'n'jump) and '#' (backward search'n'jump).
>>
>> With Unicode characters that doesn't seem to always work (at least not
>> per default).
>>
>> In the following (UTF-8 encoded) test sample there is one subset of
>> Omega words where * and # work correctly and one where they don't
>> (starting with the cursor on the first letter of any word)
>>
>> Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
>
> This is like complaining that a search for "MISS" does not also match
> "МІЅЅ". They are completely different strings that just happen to look
> alike with certain font choices.

No, unfortunately you seem to have MISSed the point. It's not about
strings that look the same but differ. It's about different behavior of
the same Vim operations (* and #) on _two types_ of words.

Try to copy/paste the line into a Vim session, then move the cursor
onto the first character of the first word, then type * repeatedly.
Then do the same starting with the first character of the third word,
and observe the difference! - Tell me what you think about that.

(You can adjust the test-case to use these two letters in different
contexts, or work on single characters.)

Janis

> Some of those are "ohm sign", "Latin
> small letter m", "Latin small letter e", "Latin small letter g", "Latin
> small letter a" and the others are "Greek capital letter omega",
> "Latin small letter m", "Latin small letter e", "Latin small letter g",
> "Latin small letter a".
>
> Your "difference is only the encoding" fails to grasp that Unicode is
> semiotics aware, even if users might not be.
>
> Elijah
> ------
> https://www.unicode.org/reports/tr36/#visual_spoofing
>

Re: [vim] Jumping from current Unicode string to next/prev appearance

<umirkf$8sbd$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=86&group=comp.editors#86
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 04:55:59 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <umirkf$8sbd$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me> <eli$2312272135@qaz.wtf>
<umiqnt$8pa2$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2023 03:55:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cb6050195e1fb42308e82c985667a1d1";
logging-data="291181"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX187V4JQSOjUtVt56ycepKuO"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:uny+cpWHq98YXSh65PCOHET4k3o=
In-Reply-To: <umiqnt$8pa2$1@dont-email.me>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Thu, 28 Dec 2023 03:55 UTC

On 28.12.2023 04:40, Janis Papanagnou wrote:
>
> Try to copy/paste the line into a Vim session, then move the cursor
> onto the first character of the first word, then type * repeatedly.
> Then do the same starting with the first character of the third word,
> and observe the difference! - Tell me what you think about that.

Here's the effect visualized, where ^ indicates the cursor position
after a '*' operation

Case 1 (cursor starting at first character of the _third_ word):

Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
^ ^ ^ ^

(All okay, the four matching words are addressed correctly.)

Case 2 (cursor starting at first character of the _first_ word):

Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
^ ^ ^ ^ first turn
^ ^ ^ ^ second turn

(Not okay: in all subsequent words the first character is skipped.)

This is what annoys me and where I am looking for a solution (or a
hint that this is, maybe, an unavoidable flaw).

Janis

Re: [vim] Jumping from current Unicode string to next/prev appearance

<umismo$90ji$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=87&group=comp.editors#87
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 05:14:16 +0100
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <umismo$90ji$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2023 04:14:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cb6050195e1fb42308e82c985667a1d1";
logging-data="295538"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8Pgq29MehtGyODoyOLqMH"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:JKZUhxmajpQ3gLC/T7uYNro0OUM=
In-Reply-To: <umikdj$496s$1@dont-email.me>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Thu, 28 Dec 2023 04:14 UTC

On 28.12.2023 02:52, Janis Papanagnou wrote:
> In Vim I frequently jump from string to the next equal string using the
> commands '*' (forward search'n'jump) and '#' (backward search'n'jump).
>
> With Unicode characters that doesn't seem to always work (at least not
> per default).
>
> In the following (UTF-8 encoded) test sample there is one subset of
> Omega words where * and # work correctly and one where they don't
> (starting with the cursor on the first letter of any word)
>
> Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
>
> The difference is only the encoding of the first character of that
> word ('\x03A9' versus '\x2126'). For words with Ω=\x03A9 it works but
> not for words with Ω=\x2126.
>
> Is there a way to fix or achieve that function for all UTF-8 encoded
> words?

I noticed that the effect does not depend on Unicode characters; it
behaves similarly in this ASCII-only test case

'help' 'help' 'help'

If the cursor starts at the first quote we see the same effect

'help' 'help' 'help'
^ ^ ^ first turn
^ ^ ^ second turn

The quote seems to be excluded from consideration of the * command,
and the cursor jumps to the next word part. - Can this be explained?

So one of the Unicode characters mentioned above is not considered
part of the word while the other one is. And only words seem to be
considered, at least in this case.

But on the other hand, I can navigate with * also within non-alpha
characters like

§%" §%" §%" §%"
^ ^ ^ ^

So this also works.

I'm not pleased by that behavior. It also looks inconsistent to me.

Janis

Re: [vim] Jumping from current Unicode string to next/prev appearance

<eli$2312280310@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=88&group=comp.editors#88
Path: i2pn2.org!i2pn.org!paganini.bofh.team!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 08:13:21 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2312280310@qaz.wtf>
References: <umikdj$496s$1@dont-email.me> <eli$2312272135@qaz.wtf> <umiqnt$8pa2$1@dont-email.me> <umirkf$8sbd$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Date: Thu, 28 Dec 2023 08:13:21 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="14993"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Thu, 28 Dec 2023 08:13 UTC

In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> Case 2 (cursor starting at first character of the _first_ word):
>
> Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
> ^ ^ ^ ^ first turn
> ^ ^ ^ ^ second turn
>

:help *

                                                *star* *E348* *E349*
*                       Search forward for the [count]'th occurrence of the
                        word nearest to the cursor.  The word used for the
                        search is the first of:
                                1. the keyword under the cursor |'iskeyword'|
                                2. the first keyword after the cursor, in the
                                   current line
                        ...

:help iskeyword
                                                *'iskeyword'* *'isk'*
'iskeyword' 'isk'       string (Vim default for MS-DOS and Win32:
                                            "@,48-57,_,128-167,224-235"
                                   otherwise:  "@,48-57,_,192-255"
                                Vi default: "@,48-57,_")
                        local to buffer
        Keywords are used in searching and recognizing with many commands:
        "w", "*", "[i", etc.  It is also used for "\k" in a |pattern|.  See
        'isfname' for a description of the format of this option.  For '@'
        characters above 255 check the "word" character class.
        For C programs you could use "a-z,A-Z,48-57,_,.,-,>".
        ...

I think it is a bug that "word" is not a link to somewhere in pattern.txt.

In any case, it is clear that # and * recognize alphabetic characters
like Greek capital *letter* omega differently from non-alphabet symbol
characters like ohm *sign*. If you move along the line with "w" to jump
between "words" you see the differences. The # and * searches use word
boundaries, so word definitions are very important there.

You are still looking at an ohm sign and thinking of a letter which is
the trap of Unicode "look alikes", not something vim is doing wrong.

Elijah
------
has vim's * remapped to _ and nearly used that writing this

Re: [vim] Jumping from current Unicode string to next/prev appearance

<umk5nc$e5vo$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=89&group=comp.editors#89
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Thu, 28 Dec 2023 16:54:19 +0100
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <umk5nc$e5vo$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me> <eli$2312272135@qaz.wtf>
<umiqnt$8pa2$1@dont-email.me> <umirkf$8sbd$1@dont-email.me>
<eli$2312280310@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2023 15:54:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cb6050195e1fb42308e82c985667a1d1";
logging-data="464888"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/VSMPsql7zNdMCTsXZ5VE2"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:GdgbZnKMu0JBZnnJmEHO8MBRvSY=
In-Reply-To: <eli$2312280310@qaz.wtf>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Thu, 28 Dec 2023 15:54 UTC

On 28.12.2023 09:13, Eli the Bearded wrote:
> [snip]
>
> In any case, it is clear that # and * recognize alphabetic characters
> like Greek capital *letter* omega differently from non-alphabet symbol
> characters like ohm *sign*. If you move along the line with "w" to jump
> between "words" you see the differences. The # and * searches use word
> boundaries, so word definitions are very important there.

Right.

>
> You are still looking at an ohm sign and thinking of a letter which is
> the trap of Unicode "look alikes", not something vim is doing wrong.

Erm, no. (I already explained elsethread that it's not about characters
that look alike; the issue turned out not to be about Unicode,
although it became apparent there. That's why I changed the test sample
to a plain ASCII test case.)

Your quotes (from the Vim help) help explain the behavior with the
'help' sample I posted: 'help' 'help' 'help'

I still think the behavior of Vim's * command is counterintuitive and
inconsistent. See this example (a file with two lines):

§%" §%" *+*+ §%" §%"
§%" a §%" a *+*+ §%" a §%" a

Starting from the first character of the first word we see the command
'*' jump words as depicted by the ^ symbols:

§%" §%" *+*+ §%" §%"
^ ^ ^ ^ # search-jumps on first line
§%" a §%" a *+*+ §%" a §%" a
^ ^ ^ ^ # continuing/changing on second line
^ ^ ^ ^

It means that * is first identifying the §%" string, and it continues
the search on the next line. But after it located the first §%" on the
second line it ad hoc changes the search pattern. - I would call that
undesired and inconsistent behavior.

We can "explain" (sort of) what happens. As in, say,
"If no alpha character is on the line * tries to match the next string
that matches the current one, but as soon as this search reaches or is
on a line that contains an alpha character the search pattern changes
and * jumps to the next alpha character on that line."

Okay, it is as it is. But shouldn't that feature be straightened out? It's
not the first time that I missed a more coherent behavior in contexts
of non-alpha character strings, and I think that it would be generally
useful. - Is there, on the other hand, some sensible use-case for that
current [inconsistent] behavior (of ad hoc changing the pattern)?

Janis
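
The pattern change described above follows from the order in which * picks its search word. Vim's help for * lists, after the two keyword rules quoted earlier in the thread, two fallbacks: the non-blank word under the cursor, then the first non-blank word after the cursor in the current line. The Python sketch below is a simplified model of that selection, not Vim's implementation: is_keyword is an assumed stand-in for 'iskeyword' that treats only ASCII letters, digits and underscore as keyword characters, and runs and star_word are made-up names.

```python
def is_keyword(ch):
    """Assumed stand-in for 'iskeyword': ASCII letters, digits, underscore."""
    return ch.isascii() and (ch.isalnum() or ch == "_")


def runs(line, pred):
    """Yield (start, end) for each maximal run of characters satisfying pred."""
    start = None
    for i, ch in enumerate(line):
        if pred(ch):
            if start is None:
                start = i
        elif start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(line)


def star_word(line, col):
    """Return the word a simplified '*' would pick with the cursor at col."""
    # Keyword runs are tried first; non-blank runs are only a fallback.
    for pred in (is_keyword, lambda ch: not ch.isspace()):
        for start, end in runs(line, pred):
            if end > col:  # run is under the cursor or is the first one after it
                return line[start:end]
    return None


if __name__ == "__main__":
    print(star_word("'help' 'help' 'help'", 0))
    print(star_word('§%" §%" *+*+ §%" §%"', 0))
    print(star_word('§%" a §%" a *+*+ §%" a §%" a', 0))
```

In this model the quoted sample yields help, the keyword-free line yields §%", and the second line yields a, because a keyword anywhere after the cursor on the current line takes precedence over the non-blank word under the cursor. That matches the jumps reported above.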

Re: [vim] Jumping from current Unicode string to next/prev appearance

<eli$2312282053@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=90&group=comp.editors#90
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Fri, 29 Dec 2023 01:53:33 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2312282053@qaz.wtf>
References: <umikdj$496s$1@dont-email.me> <umirkf$8sbd$1@dont-email.me> <eli$2312280310@qaz.wtf> <umk5nc$e5vo$1@dont-email.me>
Injection-Date: Fri, 29 Dec 2023 01:53:33 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="14722"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Fri, 29 Dec 2023 01:53 UTC

In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> Is there, on the other hand, some sensible use-case for that
> current [inconsistent] behavior (of ad hoc changing the pattern)?

It is a keyword search tool, not a random object search tool. The word
boundaries should be the indicator.

Elijah
------
printf, eg, is different than sprintf

Re: [vim] Jumping from current Unicode string to next/prev appearance

<ummp28$sf5r$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=91&group=comp.editors#91
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Fri, 29 Dec 2023 16:36:39 +0100
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <ummp28$sf5r$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me> <umirkf$8sbd$1@dont-email.me>
<eli$2312280310@qaz.wtf> <umk5nc$e5vo$1@dont-email.me>
<eli$2312282053@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 29 Dec 2023 15:36:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9a88f9971171da36d21591f54ad7b25f";
logging-data="933051"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+SO7/9F+Ic6s7l6slydMWW"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:sd4j/an1Cf/Sf35CIE2Q4cQxDA0=
In-Reply-To: <eli$2312282053@qaz.wtf>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Fri, 29 Dec 2023 15:36 UTC

On 29.12.2023 02:53, Eli the Bearded wrote:
> In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> Is there, on the other hand, some sensible use-case for that
>> current [inconsistent] behavior (of ad hoc changing the pattern)?
>
> It is a keyword search tool, not a random object search tool.

Yes, obviously. And that's IMO an unnecessary restriction.
YMMV, of course.

And even as an artificially restricted "keyword search tool"
it does not work consistently when applied to the two lines of
test data that I posted.

I suppose there's little use in discussing that since it won't
change unless it is widely accepted as a useful generalization of
the * and # commands.

In my book it was certainly often a nuisance in its restricted
and inconsistent form, and I would have appreciated it if it also
worked on other (non-alphanumeric) keywords (i.e. on strings).

> The word boundaries should be the indicator.

Janis

PS: Historically (IIRC), in Vi, there was just the # command
(but not the * which I saw later in Vim). A typical use was to
jump from a C function call backwards to find its declaration.
Application of Vi(m) broadened since then, and yet more useful
features and changes entered the Vim command base.

Re: [vim] Jumping from current Unicode string to next/prev appearance

<eli$2312300156@qaz.wtf>

https://www.rocksolidbbs.com/computers/article-flat.php?id=93&group=comp.editors#93
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Sat, 30 Dec 2023 07:00:12 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2312300156@qaz.wtf>
References: <umikdj$496s$1@dont-email.me> <umk5nc$e5vo$1@dont-email.me> <eli$2312282053@qaz.wtf> <ummp28$sf5r$1@dont-email.me>
Injection-Date: Sat, 30 Dec 2023 07:00:12 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="6943"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Sat, 30 Dec 2023 07:00 UTC

In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> PS: Historically (IIRC), in Vi, there was just the # command
> (but not the * which I saw later in Vim).

I do not believe you. For starters, nvi has a completely different
function bound to #, and nvi tries to be backwards compatible with vi.

> jump from a C function call backwards to find its declaration.
> Application of Vi(m) broadened since then, and yet more useful
> features and changes entered the Vim command base.

It occurs to me that you may like the boundary free versions of * and #:
prefix them with a g.

:noremap * g*
:noremap # g#

Elijah
------
uses very few of the g_ library of commands
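
The difference between * and g* is only whether the search word is wrapped in word boundaries (\< and \> in Vim). A rough Python analogue follows, using \b as an approximation of Vim's boundaries and the printf/sprintf pair mentioned earlier in the thread. The sample text and the function names are made up for illustration.

```python
import re

# Made-up sample line containing the word on its own and as a substring.
TEXT = "printf sprintf printf snprintf"


def whole_word_hits(word, text):
    """Start offsets of whole-word matches, roughly like Vim's '*'."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]


def substring_hits(word, text):
    """Start offsets of all matches, roughly like Vim's 'g*'."""
    return [m.start() for m in re.finditer(re.escape(word), text)]


if __name__ == "__main__":
    print(whole_word_hits("printf", TEXT))
    print(substring_hits("printf", TEXT))
```

The boundary-free variant changes which occurrences count as hits; it does not change how the word under the cursor is chosen.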

Re: [vim] Jumping from current Unicode string to next/prev appearance

<umpnu2$1c587$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=95&group=comp.editors#95
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.editors
Subject: Re: [vim] Jumping from current Unicode string to next/prev appearance
Date: Sat, 30 Dec 2023 19:35:45 +0100
Organization: A noiseless patient Spider
Lines: 85
Message-ID: <umpnu2$1c587$1@dont-email.me>
References: <umikdj$496s$1@dont-email.me> <umk5nc$e5vo$1@dont-email.me>
<eli$2312282053@qaz.wtf> <ummp28$sf5r$1@dont-email.me>
<eli$2312300156@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 30 Dec 2023 18:35:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="464f9a33e26b1c4509e5548488ecb17b";
logging-data="1447175"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19OQ9+g2EFwyXSKgg8+uIlG"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:MohUC+8qkEgjaH63ZnnHuzTi+Jw=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <eli$2312300156@qaz.wtf>
 by: Janis Papanagnou - Sat, 30 Dec 2023 18:35 UTC

On 30.12.2023 08:00, Eli the Bearded wrote:
> In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> PS: Historically (IIRC), in Vi, there was just the # command
>> (but not the * which I saw later in Vim).
>
> I do not believe you. For starters, nvi has a completely different
> function bound to #, and nvi tries to be backwards compatible with vi.

I don't think that the '#' command (with the current semantics) was in
the _original_ Vi (if that is how you interpreted "historically"). I
observed the command # with the current behavior when I regularly used
Vi starting around 1990 on AIX (and HP-UX). And I'm positive - since I
recall having looked for it - that in those days there was no
'*' (as a counterpart that matches in the opposite direction). - But
please correct me if I am wrong.

>
>> jump from a C function call backwards to find its declaration.
>> Application of Vi(m) broadened since then, and yet more useful
>> features and changes entered the Vim command base.
>
> It occurs to me that you may like the boundary free versions of * and #:
> prefix them with a g.
>
> :noremap * g*
> :noremap # g#

I didn't know of the 'g' variants, but 'g*' seems to behave equivalently
to '*' on my two-line test sample; i.e. when reaching the second line
it jumps from the punctuation character block to the letter a.

§%" §%" *+*+ §%" §%"
^ ^ ^ ^
§%" a §%" a *+*+ §%" a §%" a
^ ^ ^ ^
^ ^ ^ ^

So 'g*' doesn't address the issue; it is actually even worse since
without the \< and \> it then also matches any other 'a' appearing in
the text.

I want to provide two more examples to explain my desire for a "better"
behavior with non-alpha character blocks.[*]

1) Matching (non-alpha) shell keywords (or other non-alpha constructs
that are so typical in shells)

f() {
: ${1:?}
}
: ${1:?}
echo "a: b"

Positioning at the first colon I want to find other standalone ones.

2) Matching ASN.1 identifiers (or other not pure-alpha identifiers)

direct-reference OBJECT IDENTIFIER OPTIONAL,
indirect-reference INTEGER OPTIONAL,

Positioning it in one of the "reference" substrings I want to find
the whole identifier (e.g. "direct-reference"), but not any string
with the substring reference.

In other words, keywords and identifiers (beyond C and the like)
generally have a broader definition, and a quick-match for non-alpha
strings would be very convenient, as I regularly observe in various
editing contexts.

I am aware that we cannot cover all matching combinations - e.g. how
should "an-id: 'a value'" be parsed; it might get non-trivial - but
a quick-search for space-separated entities would already be very
convenient as I've often experienced in my editing contexts.

Vim already supports a lot of such settings (breakat, isfname, isident,
iskeyword, and more, even language-specific ones), so maybe there's a
not too complex way to achieve that.

Janis

[*] Note: Of course all searching can be done with regular search/regexp
but as I use * for quick match convenience I'd like to have it not only
for alpha sequences.
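
The "space-separated entities" search wished for above can be sketched in Python to make the intended matching rule concrete. This illustrates the desired behavior only; it is not an existing Vim feature, and token_at and standalone_hits are made-up names. The token under the cursor is taken to be the maximal run of non-blank characters, and a hit must be delimited by whitespace or a line end on both sides. The sample lines are the shell and ASN.1 fragments from the post.

```python
import re


def token_at(line, col):
    """Return the whitespace-delimited token covering column col, or None."""
    for m in re.finditer(r"\S+", line):
        if m.start() <= col < m.end():
            return m.group()
    return None


def standalone_hits(token, lines):
    """Return (line index, column) of every whitespace-delimited occurrence."""
    pattern = re.compile(r"(?<!\S)" + re.escape(token) + r"(?!\S)")
    return [(i, m.start())
            for i, line in enumerate(lines)
            for m in pattern.finditer(line)]


SHELL = ["f() {", ": ${1:?}", "}", ": ${1:?}", 'echo "a: b"']
ASN1 = ["direct-reference OBJECT IDENTIFIER OPTIONAL,",
        "indirect-reference INTEGER OPTIONAL,"]

if __name__ == "__main__":
    print(standalone_hits(token_at(SHELL[1], 0), SHELL))
    print(standalone_hits(token_at(ASN1[0], 9), ASN1))
```

With this rule the standalone colons are found while the colons inside ${1:?} and "a: are skipped, and direct-reference does not match inside indirect-reference.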

