devel / comp.lang.tcl / Argument handling of [regexp]

Argument handling of [regexp]

<nnd$253242c0$043915ce@359b9e61de13cce5>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20081&group=comp.lang.tcl#20081

Newsgroups: comp.lang.tcl
X-Mozilla-News-Host: news://news.xs4all.nl:119
From: look@the.footer.invalid (Erik Leunissen)
Subject: Argument handling of [regexp]
Date: Fri, 16 Sep 2022 21:06:44 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Message-ID: <nnd$253242c0$043915ce@359b9e61de13cce5>
Organization: KPN B.V.
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!feeder.usenetexpress.com!tr3.eu1.usenetexpress.com!94.232.112.244.MISMATCH!feed.abavia.com!abe004.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 40
Injection-Date: Fri, 16 Sep 2022 21:06:45 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: Erik Leunissen - Fri, 16 Sep 2022 19:06 UTC

Here are the results of six invocations of the "regexp" command.

For one of them I'm sure that the result is correct (1).

For some of them I'm unsure (2, 3). I wouldn't be surprised if the result can be explained to be
correct though.

However, for invocations 4, 5 and 6 I definitely can't imagine how the results can be correct:

% set str "a-z"
a-z
% regexp - $str; #1
bad option "-": must be -all, -about, -indices, -inline, -expanded, -line, -linestop, -lineanchor,
-nocase, -start, or --
% regexp \- $str; #2
1
% regexp {-} $str; #3
bad option "-": must be -all, -about, -indices, -inline, -expanded, -line, -linestop, -lineanchor,
-nocase, -start, or --
% regexp -\\ $str; #4
couldn't compile regular expression pattern: invalid escape \ sequence
% regexp \\- $str; #5
1
% set initstring "-"
-
% regexp $initstring $str; #6
1
%

(Note that I'm aware of the purpose of "--" in invocations of "regexp" and of several other
commands. However, that is not my point. My point is the correctness of argument handling in the
above examples).

I'd be grateful for any explanations and judgement.

Erik.
--
elns@ nl | Merge the left part of these two lines into one,
xs4all. | respecting a character's position in a line.

Re: Argument handling of [regexp]

<nnd$229e98b8$18d06dd0@6288a100734de3bb>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20082&group=comp.lang.tcl#20082

Subject: Re: Argument handling of [regexp]
Newsgroups: comp.lang.tcl
References: <nnd$253242c0$043915ce@359b9e61de13cce5>
From: look@the.footer.invalid (Erik Leunissen)
Date: Fri, 16 Sep 2022 21:19:37 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.2.1
MIME-Version: 1.0
In-Reply-To: <nnd$253242c0$043915ce@359b9e61de13cce5>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Message-ID: <nnd$229e98b8$18d06dd0@6288a100734de3bb>
Organization: KPN B.V.
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe006.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail
Lines: 20
Injection-Date: Fri, 16 Sep 2022 21:19:38 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 1640
 by: Erik Leunissen - Fri, 16 Sep 2022 19:19 UTC

On 16/09/2022 21:06, Erik Leunissen wrote:
> Here are the results of six invocations of the "regexp" command.
>
> For one of them I'm sure that the result is correct (1).
>
> For some of them I'm unsure (2, 3). I wouldn't be surprised if the result can be explained to be
> correct though.
>
> However, for invocations 4, 5 and 6 I definitely can't imagine how the results can be correct:
>

After more thinking, I can imagine that 5 and 6 can be explained as well.

But I can't wrap my mind around cases 4 and 3 (the latter in addition to my previous post).

Erik.
--
elns@ nl | Merge the left part of these two lines into one,
xs4all. | respecting a character's position in a line.

Re: Argument handling of [regexp]

<b58a884a-8b22-4787-a876-46b59c96191dn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20083&group=comp.lang.tcl#20083

X-Received: by 2002:a0c:aa19:0:b0:4ac:fb0:8a75 with SMTP id d25-20020a0caa19000000b004ac0fb08a75mr6664048qvb.36.1663376090065;
Fri, 16 Sep 2022 17:54:50 -0700 (PDT)
X-Received: by 2002:a05:6871:14a:b0:127:622a:1f06 with SMTP id
z10-20020a056871014a00b00127622a1f06mr4421802oab.113.1663376089622; Fri, 16
Sep 2022 17:54:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.tcl
Date: Fri, 16 Sep 2022 17:54:49 -0700 (PDT)
In-Reply-To: <nnd$229e98b8$18d06dd0@6288a100734de3bb>
Injection-Info: google-groups.googlegroups.com; posting-host=98.19.43.80; posting-account=f4QznQoAAAAjupLEpV87s_G-96g1Io1w
NNTP-Posting-Host: 98.19.43.80
References: <nnd$253242c0$043915ce@359b9e61de13cce5> <nnd$229e98b8$18d06dd0@6288a100734de3bb>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b58a884a-8b22-4787-a876-46b59c96191dn@googlegroups.com>
Subject: Re: Argument handling of [regexp]
From: bgriffinfortytwo@gmail.com (briang)
Injection-Date: Sat, 17 Sep 2022 00:54:50 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2579
 by: briang - Sat, 17 Sep 2022 00:54 UTC

On Friday, September 16, 2022 at 12:19:43 PM UTC-7, Erik Leunissen wrote:
> On 16/09/2022 21:06, Erik Leunissen wrote:
> > Here are the results of six invocations of the "regexp" command.
> >
> > For one of them I'm sure that the result is correct (1).
> >
> > For some of them I'm unsure (2, 3). I wouldn't be surprised if the result can be explained to be
> > correct though.
> >
> > However, for invocations 4, 5 and 6 I definitely can't imagine how the results can be correct:
> >
> After more thinking, I can imagine that 5 and 6 can be explained as well.
>
> But I can't wrap my mind around cases 4 and 3 (the latter in addition to my previous post).
> Erik.
> --
> elns@ nl | Merge the left part of these two lines into one,
> xs4all. | respecting a character's position in a line.

The arguments typed in the source code are not precisely what the command actually sees. This is explained by reading the rules of Tcl closely.
The best way to demonstrate this is with the following example:

# Print the words exactly as a command receives them, after Tcl's substitutions.
proc myregexp {args} {
    puts -nonewline "regexp "
    foreach arg $args {
        puts -nonewline "$arg "
    }
    puts ""
}

myregexp - $str; #1
myregexp \- $str; #2
myregexp {-} $str; #3
myregexp -\\ $str; #4
myregexp \\- $str; #5
set initstring "-"
myregexp $initstring $str; #6

The results:
regexp - a-z
regexp - a-z
regexp - a-z
regexp -\ a-z
regexp \- a-z
regexp - a-z
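
A minimal sketch of the same check done by comparison rather than by eye, assuming Tcl 8.5 or later. The helper name first_word is hypothetical (it is not part of the original example); it only returns the first word it receives after Tcl's substitutions, so the six spellings can be compared directly.

```tcl
# Hypothetical helper: return the first argument exactly as the command receives it.
proc first_word {args} {
    return [lindex $args 0]
}

set str "a-z"
set initstring "-"

# Collect the first word of each of the six invocations from the original post.
set words [list \
    [first_word - $str] \
    [first_word \- $str] \
    [first_word {-} $str] \
    [first_word -\\ $str] \
    [first_word \\- $str] \
    [first_word $initstring $str]]

# Print each word with its length; cases 1, 2, 3 and 6 all yield a single dash.
set case 0
foreach word $words {
    puts "case [incr case]: <$word> length [string length $word]"
}
```

Only cases 4 and 5 differ: they carry a backslash after or before the dash, which is what the regular expression engine then sees.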

-Brian

Re: Argument handling of [regexp]

<nnd$0caa8fd3$44c38567@4ab2bfb9621b3cac>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20084&group=comp.lang.tcl#20084

From: look@the.footer.invalid (Erik Leunissen)
Subject: Re: Argument handling of [regexp]
Newsgroups: comp.lang.tcl
References: <nnd$253242c0$043915ce@359b9e61de13cce5>
<nnd$229e98b8$18d06dd0@6288a100734de3bb>
<b58a884a-8b22-4787-a876-46b59c96191dn@googlegroups.com>
X-Mozilla-News-Host: news://news.xs4all.nl
Date: Sat, 17 Sep 2022 15:39:49 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.2.1
MIME-Version: 1.0
In-Reply-To: <b58a884a-8b22-4787-a876-46b59c96191dn@googlegroups.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Message-ID: <nnd$0caa8fd3$44c38567@4ab2bfb9621b3cac>
Organization: KPN B.V.
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe005.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 38
Injection-Date: Sat, 17 Sep 2022 15:39:49 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2076
 by: Erik Leunissen - Sat, 17 Sep 2022 13:39 UTC

On 17/09/2022 02:54, briang wrote:
>
> The arguments typed in the source code are not precisely what the command actually sees. This is explained by reading the rules of Tcl closely.

Thanks Brian, I will investigate what you indicate.
Nonetheless, these results ... :

>
> The results:
> regexp - a-z
> regexp - a-z
> regexp - a-z
> regexp -\ a-z
> regexp \- a-z
> regexp - a-z
>

... indicate that the regexp command sees identical arguments for cases 1, 2, 3 and 6.
However, the *results* of the command invocations for these cases are not the same.

This mismatch still puzzles me, and right now I can't imagine how any rule could account for it.
Nevertheless, I will have a look at "the rules of Tcl". Just to be sure: do you mean the
Dodekalogue, as in:

https://wiki.tcl-lang.org/page/Dodekalogue

Regards,
Erik
--

> -Brian
>

--
elns@ nl | Merge the left part of these two lines into one,
xs4all. | respecting a character's position in a line.

Re: Argument handling of [regexp]

<1a0cab08-d263-466c-bd8d-4f737ff60b15n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20085&group=comp.lang.tcl#20085

X-Received: by 2002:a05:622a:178c:b0:35b:b8cc:e711 with SMTP id s12-20020a05622a178c00b0035bb8cce711mr8488469qtk.111.1663426153758;
Sat, 17 Sep 2022 07:49:13 -0700 (PDT)
X-Received: by 2002:a05:6870:d285:b0:12b:cdce:63d8 with SMTP id
d5-20020a056870d28500b0012bcdce63d8mr11200669oae.140.1663426153533; Sat, 17
Sep 2022 07:49:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.tcl
Date: Sat, 17 Sep 2022 07:49:13 -0700 (PDT)
In-Reply-To: <nnd$0caa8fd3$44c38567@4ab2bfb9621b3cac>
Injection-Info: google-groups.googlegroups.com; posting-host=98.19.43.80; posting-account=f4QznQoAAAAjupLEpV87s_G-96g1Io1w
NNTP-Posting-Host: 98.19.43.80
References: <nnd$253242c0$043915ce@359b9e61de13cce5> <nnd$229e98b8$18d06dd0@6288a100734de3bb>
<b58a884a-8b22-4787-a876-46b59c96191dn@googlegroups.com> <nnd$0caa8fd3$44c38567@4ab2bfb9621b3cac>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1a0cab08-d263-466c-bd8d-4f737ff60b15n@googlegroups.com>
Subject: Re: Argument handling of [regexp]
From: bgriffinfortytwo@gmail.com (briang)
Injection-Date: Sat, 17 Sep 2022 14:49:13 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1971
 by: briang - Sat, 17 Sep 2022 14:49 UTC

On Saturday, September 17, 2022 at 6:39:54 AM UTC-7, Erik Leunissen wrote:
> On 17/09/2022 02:54, briang wrote:
> >
> > The arguments typed in the source code are not precisely what the command actually sees. This is explained by reading the rules of Tcl closely.
> Thanks Brian, I will investigate what you indicate.
> Nonetheless, these results ... :
> >
> > The results:
> > regexp - a-z
> > regexp - a-z
> > regexp - a-z
> > regexp -\ a-z
> > regexp \- a-z
> > regexp - a-z
> >
> ... indicate that the regexp command sees identical arguments for cases 1, 2, 3 and 6.
> However, the *results* of the command invocations for these cases are not the same.

I see what you mean. That is strange.

-Brian

Re: Argument handling of [regexp]

<nnd$0576d185$54e2ac18@9dade5d48646e821>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20086&group=comp.lang.tcl#20086

Date: Sat, 17 Sep 2022 17:09:05 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.0.2
Subject: Re: Argument handling of [regexp]
Content-Language: nl-NL, en-US
Newsgroups: comp.lang.tcl
References: <nnd$253242c0$043915ce@359b9e61de13cce5>
From: nospam@wanadoo.nl (Schelte)
In-Reply-To: <nnd$253242c0$043915ce@359b9e61de13cce5>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <nnd$0576d185$54e2ac18@9dade5d48646e821>
Organization: KPN B.V.
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe006.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail
Lines: 45
Injection-Date: Sat, 17 Sep 2022 17:09:05 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2169
 by: Schelte - Sat, 17 Sep 2022 15:09 UTC

On 16/09/2022 21:06, Erik Leunissen wrote:
> % regexp - $str; #1
> bad option "-": must be -all, -about, -indices, -inline, -expanded,
> -line, -linestop, -lineanchor, -nocase, -start, or --
> % regexp \- $str; #2
> 1
While these should be the exact same thing, they produce different byte
codes:

% ::tcl::unsupported::disassemble script {regexp - $str}
ByteCode 0x0x555be8c65680, refCt 1, epoch 17, interp 0x0x555be8bfa380
(epoch 17)
Source "regexp - $str"
Cmds 1, src 13, inst 10, litObjs 3, aux 0, stkDepth 3, code/src 0.00
Commands 1:
1: pc 0-8, src 0-12
Command 1: "regexp - $str"
(0) push1 0 # "regexp"
(2) push1 1 # "-"
(4) push1 2 # "str"
(6) loadStk
(7) invokeStk1 3
(9) done

% ::tcl::unsupported::disassemble script {regexp \- $str}
ByteCode 0x0x555be8c66180, refCt 1, epoch 17, interp 0x0x555be8bfa380
(epoch 17)
Source "regexp \- $str"
Cmds 1, src 14, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
Commands 1:
1: pc 0-6, src 0-13
Command 1: "regexp \- $str"
(0) push1 0 # "-"
(2) push1 1 # "str"
(4) loadStk
(5) regexp +3
(7) done

That seems to me like there's a bug lurking somewhere.
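
A minimal sketch for repeating this comparison without reading the listings by hand, assuming a Tcl version that provides ::tcl::unsupported::disassemble (as used above). The helper name uses_generic_invoke is hypothetical, and the check rests on the assumption that a generic command invocation shows up in the listing as an instruction whose name starts with "invokeStk", as in the first listing.

```tcl
# Hypothetical helper: report whether a script compiles to a generic command
# invocation instead of a dedicated instruction such as "regexp".
# Assumption: generic invocations appear as instructions named invokeStk*.
proc uses_generic_invoke {script} {
    set listing [::tcl::unsupported::disassemble script $script]
    return [expr {[string first invokeStk $listing] >= 0}]
}

# The two spellings that should be "the exact same thing".
puts [uses_generic_invoke {regexp - $str}]
puts [uses_generic_invoke {regexp \- $str}]
```

If the two lines print different values, the two spellings were compiled differently on the Tcl version at hand.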

Schelte.

Re: Argument handling of [regexp]

<b0703847-4ec1-4d5b-8dc1-88472ced479bn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20089&group=comp.lang.tcl#20089

X-Received: by 2002:a05:622a:138b:b0:35b:b619:b87d with SMTP id o11-20020a05622a138b00b0035bb619b87dmr10918982qtk.146.1663495654758;
Sun, 18 Sep 2022 03:07:34 -0700 (PDT)
X-Received: by 2002:a9d:2964:0:b0:655:8471:5189 with SMTP id
d91-20020a9d2964000000b0065584715189mr5831110otb.384.1663495654503; Sun, 18
Sep 2022 03:07:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.tcl
Date: Sun, 18 Sep 2022 03:07:34 -0700 (PDT)
In-Reply-To: <nnd$0576d185$54e2ac18@9dade5d48646e821>
Injection-Info: google-groups.googlegroups.com; posting-host=84.115.229.36; posting-account=Od2xOAoAAACEyRX3Iu5rYt4oevuoeYUG
NNTP-Posting-Host: 84.115.229.36
References: <nnd$253242c0$043915ce@359b9e61de13cce5> <nnd$0576d185$54e2ac18@9dade5d48646e821>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b0703847-4ec1-4d5b-8dc1-88472ced479bn@googlegroups.com>
Subject: Re: Argument handling of [regexp]
From: martin.heinrich@frequentis.com (heinrichmartin)
Injection-Date: Sun, 18 Sep 2022 10:07:34 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4586
 by: heinrichmartin - Sun, 18 Sep 2022 10:07 UTC

This message evolved in a non-linear way while thinking/trying. I hope I cleaned up enough to prevent confusion ...

On Saturday, September 17, 2022 at 5:09:10 PM UTC+2, Schelte wrote:
> That seems to me like there's a bug lurking somewhere.

Just guessing: the byte-code compiler treats regexp specially, and in this case it gets switch handling wrong, i.e. it does not obey the Dodekalogue for switches.

Not just guessing: if we override regexp, the issue is gone.

set str foo
eval {regexp \- $str} ;# 0
rename regexp tcl_regexp
proc regexp args {tailcall tcl_regexp {*}$args}
eval {regexp \- $str} ;# bad option "-"

Using -- fixes the issue, too.

eval {regexp -- \- $str} ;# 0
eval {regexp -- - $str} ;# 0
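
A minimal sketch of how the -- workaround could be packaged, assuming the pattern arrives in a variable and may begin with a dash. The wrapper name regexp_safe is hypothetical and not part of Tcl.

```tcl
# Hypothetical wrapper: match a pattern that may begin with a dash.
# The explicit "--" ends switch processing, so the pattern is never
# mistaken for an option, regardless of how the call was compiled.
proc regexp_safe {pattern string} {
    return [regexp -- $pattern $string]
}

set initstring "-"
puts [regexp_safe $initstring "a-z"]   ;# the pattern "-" occurs in "a-z"
puts [regexp_safe $initstring "foo"]   ;# there is no dash in "foo"
```

The wrapper covers only the two-argument form; match variables and other switches would have to be passed through separately.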

Back to the original command. With -- still in place, the byte code (obviously?) differs from Schelte's ...

% set tcl_patchLevel
8.6.4
% ::tcl::unsupported::disassemble script {regexp -- \- $str}
ByteCode 0x0x23c31d0, refCt 1, epoch 15, interp 0x0x22fc680 (epoch 15)
Source "regexp -- \- $str"
Cmds 1, src 17, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
Commands 1:
1: pc 0-6, src 0-16
Command 1: "regexp -- \- $str"
(0) push1 0 # "-"
(2) push1 1 # "str"
(4) loadStk
(5) regexp +3
(7) done

% ::tcl::unsupported::disassemble script {regexp -- - $str}
ByteCode 0x0x23c33d0, refCt 1, epoch 15, interp 0x0x22fc680 (epoch 15)
Source "regexp -- - $str"
Cmds 1, src 16, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
Commands 1:
1: pc 0-6, src 0-15
Command 1: "regexp -- - $str"
(0) push1 0 # "*-*"
(2) push1 1 # "str"
(4) loadStk
(5) strmatch +0
(7) done

Now, this is really interesting: Tcl optimizes regexp by replacing it with string match for the trivial pattern "-" (note the "*-*" literal and the strmatch instruction).
And now I can also understand the byte code better.

> On 16/09/2022 21:06, Erik Leunissen wrote:
> > % regexp - $str; #1
> > bad option "-": must be -all, -about, -indices, -inline, -expanded,
> > -line, -linestop, -lineanchor, -nocase, -start, or --
> > % regexp \- $str; #2
> > 1
> While these should be the exact same thing, they produce different byte
> codes:
>
> % ::tcl::unsupported::disassemble script {regexp - $str}
> ByteCode 0x0x555be8c65680, refCt 1, epoch 17, interp 0x0x555be8bfa380
> (epoch 17)
> Source "regexp - $str"
> Cmds 1, src 13, inst 10, litObjs 3, aux 0, stkDepth 3, code/src 0.00
> Commands 1:
> 1: pc 0-8, src 0-12
> Command 1: "regexp - $str"
> (0) push1 0 # "regexp"
> (2) push1 1 # "-"
> (4) push1 2 # "str"
> (6) loadStk
> (7) invokeStk1 3
> (9) done

The byte-code compiler cannot optimize regexp with the invalid "switch" "-"; therefore, it simply invokes the actual command (which will bail out).

> % ::tcl::unsupported::disassemble script {regexp \- $str}
> ByteCode 0x0x555be8c66180, refCt 1, epoch 17, interp 0x0x555be8bfa380
> (epoch 17)
> Source "regexp \- $str"
> Cmds 1, src 14, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
> Commands 1:
> 1: pc 0-6, src 0-13
> Command 1: "regexp \- $str"
> (0) push1 0 # "-"
> (2) push1 1 # "str"
> (4) loadStk
> (5) regexp +3
> (7) done
The byte-code compiler seems to not detect the erroneous first argument. It fails to apply backslash substitution before looking for switches ... therefore, it produces the "correct" invocation of regexp.

Re: Argument handling of [regexp]

<nnd$1554c763$1a8a023d@f29b91ccb580e48a>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20090&group=comp.lang.tcl#20090

Subject: Re: Argument handling of [regexp]
Newsgroups: comp.lang.tcl
References: <nnd$253242c0$043915ce@359b9e61de13cce5>
From: look@the.footer.invalid (Erik Leunissen)
Date: Sun, 18 Sep 2022 13:34:54 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.1
MIME-Version: 1.0
In-Reply-To: <nnd$253242c0$043915ce@359b9e61de13cce5>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Message-ID: <nnd$1554c763$1a8a023d@f29b91ccb580e48a>
Organization: KPN B.V.
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!feeder.usenetexpress.com!tr1.eu1.usenetexpress.com!94.232.112.245.MISMATCH!feed.abavia.com!abe005.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail
Lines: 9
Injection-Date: Sun, 18 Sep 2022 13:34:54 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: Erik Leunissen - Sun, 18 Sep 2022 11:34 UTC

A bug report, referring to this discussion thread, has been filed at:

https://core.tcl-lang.org/tcl/tktview?name=697b1bbfe3

Erik.
--
elns@ nl | Merge the left part of these two lines into one,
xs4all. | respecting a character's position in a line.

Re: Argument handling of [regexp]

<1d7f1bab-75b8-4ec2-898c-cae83b53e7ccn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=20091&group=comp.lang.tcl#20091

X-Received: by 2002:a05:622a:15d4:b0:35c:dda3:7bc5 with SMTP id d20-20020a05622a15d400b0035cdda37bc5mr5032252qty.676.1663511161297;
Sun, 18 Sep 2022 07:26:01 -0700 (PDT)
X-Received: by 2002:a05:6808:23d6:b0:350:7776:906a with SMTP id
bq22-20020a05680823d600b003507776906amr3589617oib.157.1663511161089; Sun, 18
Sep 2022 07:26:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.tcl
Date: Sun, 18 Sep 2022 07:26:00 -0700 (PDT)
In-Reply-To: <b0703847-4ec1-4d5b-8dc1-88472ced479bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.115.229.36; posting-account=Od2xOAoAAACEyRX3Iu5rYt4oevuoeYUG
NNTP-Posting-Host: 84.115.229.36
References: <nnd$253242c0$043915ce@359b9e61de13cce5> <nnd$0576d185$54e2ac18@9dade5d48646e821>
<b0703847-4ec1-4d5b-8dc1-88472ced479bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1d7f1bab-75b8-4ec2-898c-cae83b53e7ccn@googlegroups.com>
Subject: Re: Argument handling of [regexp]
From: martin.heinrich@frequentis.com (heinrichmartin)
Injection-Date: Sun, 18 Sep 2022 14:26:01 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4481
 by: heinrichmartin - Sun, 18 Sep 2022 14:26 UTC

"Why are other commands with options not affected? Are they?" came to my mind.
Well, lsearch is not.

lsearch has optional arguments only on one side of the required ones, but regexp has both options and optional trailing arguments, which makes interpretation ambiguous and therefore requires --.
But calling *regexp with exactly two arguments is not ambiguous* at all! However, the man page clearly states "If the initial arguments to regexp start with - then they are treated as switches.".

Having said that, my interpretation could have been wrong:

On Sunday, September 18, 2022 at 12:07:36 PM UTC+2, heinrichmartin wrote:
> > On 16/09/2022 21:06, Erik Leunissen wrote:
> > > % regexp - $str; #1
> > > bad option "-": must be -all, -about, -indices, -inline, -expanded,
> > > -line, -linestop, -lineanchor, -nocase, -start, or --
> > > % regexp \- $str; #2
> > > 1
> > While these should be the exact same thing, they produce different byte
> > codes:
> >
> > % ::tcl::unsupported::disassemble script {regexp - $str}
> > ByteCode 0x0x555be8c65680, refCt 1, epoch 17, interp 0x0x555be8bfa380
> > (epoch 17)
> > Source "regexp - $str"
> > Cmds 1, src 13, inst 10, litObjs 3, aux 0, stkDepth 3, code/src 0.00
> > Commands 1:
> > 1: pc 0-8, src 0-12
> > Command 1: "regexp - $str"
> > (0) push1 0 # "regexp"
> > (2) push1 1 # "-"
> > (4) push1 2 # "str"
> > (6) loadStk
> > (7) invokeStk1 3
> > (9) done
> The byte-code compiler cannot optimize regexp with the invalid "switch" "-"; therefore, it simply invokes the actual command (which will bail out).
> > % ::tcl::unsupported::disassemble script {regexp \- $str}
> > ByteCode 0x0x555be8c66180, refCt 1, epoch 17, interp 0x0x555be8bfa380
> > (epoch 17)
> > Source "regexp \- $str"
> > Cmds 1, src 14, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
> > Commands 1:
> > 1: pc 0-6, src 0-13
> > Command 1: "regexp \- $str"
> > (0) push1 0 # "-"
> > (2) push1 1 # "str"
> > (4) loadStk
> > (5) regexp +3
> > (7) done
> The byte-code compiler seems to not detect the erroneous first argument. It fails to apply backslash substitution before looking for switches ... therefore, it produces the "correct" invocation of regexp.

Or the byte-code compiler has a shortcut for exactly two arguments, which contradicts the documentation.

As stopping here is unsatisfying ... for those who are interested:

2062 /*
2063 * We are only interested in compiling simple regexp cases. Currently
2064 * supported compile cases are:
2065 * regexp ?-nocase? ?--? staticString $var
2066 * regexp ?-nocase? ?--? {^staticString$} $var
2067 */

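And a few lines later, we can confirm that the compiler does _not_ look at the arguments if there are only two of them. With two arguments, parsePtr->numWords is 3, so the condition of the loop below (i < 1, starting from i = 1) is false on entry and no word is checked for switches.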

2084 for (i = 1; i < parsePtr->numWords - 2; i++) {

Finally, let's cross-check by adding more args:

% set str foo
% eval {regexp \- $str match} ;# bails out
% eval {regexp \- $str} ;# 0

Bottom line: The byte-code compiler fails to implement "If the initial arguments to regexp start with - then they are treated as switches."; it should refuse to compile (i.e. leave the command to run time) if the first of two words starts with a dash.
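
A minimal sketch for checking a given Tcl installation, assuming Tcl 8.5 or later for the {*} syntax. The helper name try_script is hypothetical. The sketch runs the same two words three ways: written literally (eligible for the compiler's two-argument shortcut), supplied through argument expansion (so that the command receives its words at run time), and with an explicit --. The outcome of the first form depends on the Tcl version, so no result is claimed for it here.

```tcl
# Hypothetical helper: evaluate a script in the caller's scope and report
# either its result or its error message instead of raising the error.
proc try_script {script} {
    if {[catch {uplevel 1 $script} result]} {
        return "error: $result"
    }
    return "ok: $result"
}

set str foo

# Literal two-word form: candidate for the byte-code compiler's shortcut.
puts [try_script {regexp \- $str}]

# Same words passed via expansion, so switch parsing happens at run time.
puts [try_script {regexp {*}[list - $str]}]

# Explicit end-of-switches marker: unambiguous on either path.
puts [try_script {regexp -- - $str}]
```

If the first and second lines disagree, the installation shows the discrepancy discussed in this thread.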

