Re: gsub() & Escaped characters in strings (comp.lang.awk)

Subject (Author)
* Re: gsub() & Escaped characters in strings (Ed Morton)
`* Re: gsub() & Escaped characters in strings (Oğuz)
 `* Re: gsub() & Escaped characters in strings (Oğuz)
  `- Re: gsub() & Escaped characters in strings (Luuk)

Re: gsub() & Escaped characters in strings

https://www.rocksolidbbs.com/devel/article-flat.php?id=791&group=comp.lang.awk#791

From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.lang.awk
Subject: Re: gsub() & Escaped characters in strings
Date: Thu, 29 Apr 2021 10:36:04 -0500
Organization: A noiseless patient Spider
Message-ID: <s6ejp4$fn9$1@dont-email.me>
References: <04ced886-346d-48a4-8c61-9bc92bfe78b0n@googlegroups.com>
In-Reply-To: <04ced886-346d-48a4-8c61-9bc92bfe78b0n@googlegroups.com>
 by: Ed Morton - Thu, 29 Apr 2021 15:36 UTC

On 4/28/2021 8:37 PM, J Naman wrote:
> # I do not understand why gsub() seems to quadruple scan escaped characters inside
> # strings when the number of escaped characters is > 6. See below.
> BEGIN{ # quick test of regexpr
> # to match path = ...\foo\...
> # (path ~ "\\\\foo\\\\") is required
> # looking for "\\\\foo\\\\" string constant <==> regexp /\\foo\\/
> # in every case below, gsub() returns 2 = number of substitutions (as expected)
> x=";foo;"; n=gsub(/;/,"\\",x); printf("# returns %s\n",x)
> x=";foo;"; n=gsub(/;/,"\\\\",x); printf("# returns %s\n",x)
> x=";foo;"; n=gsub(/;/,"\\\\\\",x); printf("# returns %s\n",x)
> x=";foo;"; n=gsub(/;/,"\\\\\\\\",x); printf("# returns %s\n",x)
> x=";foo;"; n=gsub(/;/,"\\\\\\\\\\",x); printf("# returns %s\n",x)
> x=";foo;"; n=gsub(/;/,"\\\\\\\\\\\\",x); printf("# returns %s\n",x)
> }
> # 2 \s returns 1 escaped: \foo\ as expected
> # 4 \s returns 2 escaped: \\foo\\ as expected
> # 6 \s returns 3 escaped: \\\foo\\\ as expected
> # 8 \s returns 2 escaped: \\foo\\ NOT 4 \s
> # 10 \s returns 3 escaped: \\\foo\\\ NOT 6 \s
> # 12 \s returns 4 escaped: \\\\foo\\\\ ! finally get 4 \s!
> # Can anyone explain why 8+ \s are different?
> # Why gsub(/;/,"\\\\",x) == gsub(/;/,"\\\\\\\\",x)
> Thanks, john
>

I'm _guessing_ it's because the string gets interpreted twice, once when
the awk interpreter reads it and then again when it uses it, so for the
2 passes of interpretation, depending on how the "use" phase interprets
pairs of backslashes, we could get:

\\ -> read -> \ -> use -> \
\\\\ -> read -> \\ -> use -> \ or \\
\\\\\\ -> read -> \\\ -> use -> \\\
\\\\\\\\ -> read -> \\\\ -> use -> \\ or \\\\

Now WHY any given awk when using the string would interpret 4
backslashes as 2 but not 2 backslashes as 1, I can't guess.

Regards,

Ed.
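Ed's two-pass idea can be made concrete for the first ("read") pass. The Python below is a minimal, hypothetical model and is not code from any awk. The function read_awk_string is an illustrative name. It decodes only the \\ and \" escapes of a string constant and rejects any other escape, because awks disagree on how those are handled. It shows that each typed pair of backslashes becomes a single backslash before gsub() ever sees the string. J Naman's 2, 4, ..., 12 typed backslashes therefore reach gsub() as 1, 2, ..., 6.

```python
# Hypothetical model of the "read" pass: decode the escapes in the text
# between the double quotes of an awk string constant. Only \\ and \"
# are modeled; other escapes raise, because awks treat them differently.


def read_awk_string(body: str) -> str:
    """Return the value awk would give a string constant with this body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body):
                raise ValueError("string constant ends in a lone backslash")
            nxt = body[i + 1]
            if nxt not in ("\\", '"'):
                raise ValueError("unmodeled escape: \\" + nxt)
            out.append(nxt)  # backslash-backslash -> one backslash, backslash-quote -> quote
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # J Naman's replacement strings: 2, 4, ..., 12 backslashes as typed.
    for typed in range(2, 13, 2):
        seen = read_awk_string("\\" * typed)
        print(f"{typed} typed -> {len(seen)} seen by gsub()")
```

Running it prints one line per case, from "2 typed -> 1 seen by gsub()" to "12 typed -> 6 seen by gsub()". Implementations differ in the second ("use") pass, where gsub() interprets the backslashes it receives.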

Re: gsub() & Escaped characters in strings

https://www.rocksolidbbs.com/devel/article-flat.php?id=792&group=comp.lang.awk#792

Newsgroups: comp.lang.awk
Date: Thu, 29 Apr 2021 10:33:19 -0700 (PDT)
In-Reply-To: <s6ejp4$fn9$1@dont-email.me>
References: <04ced886-346d-48a4-8c61-9bc92bfe78b0n@googlegroups.com> <s6ejp4$fn9$1@dont-email.me>
Message-ID: <28b93171-68ae-4aab-a00b-2be4c8d7cfden@googlegroups.com>
Subject: Re: gsub() & Escaped characters in strings
From: oguzismailuysal@gmail.com (Oğuz)
 by: Oğuz - Thu, 29 Apr 2021 17:33 UTC

On Thursday, April 29, 2021 at 6:36:06 PM UTC+3, Ed Morton wrote:
> On 4/28/2021 8:37 PM, J Naman wrote:
> > [snip]
> I'm _guessing_ it's because the string gets interpreted twice, once when
> the awk interpreter reads it and then again when it uses it, so for the
> 2 passes of interpretation, depending on how the "use" phase interprets
> pairs of backslashes, we could get:
>
> \\ -> read -> \ -> use -> \
> \\\\ -> read -> \\ -> use -> \ or \\
> \\\\\\ -> read -> \\\ -> use -> \\\
> \\\\\\\\ -> read -> \\\\ -> use -> \\ or \\\\

According to POSIX, when used as the second argument to sub or gsub, "\\" and "\\\\" are the same (one backslash) unless followed by an ampersand: "\\&" gives a literal ampersand, while "\\\\&" gives a backslash followed by the matched text. The same goes for "\\\\\\" and "\\\\\\\\" (two backslashes). For example, given the following program,
BEGIN { x = "y"; sub(/y/, "\\y\\\\y\\\\\\y\\\\\\\\y", x); print x }
a POSIX-conforming awk should output:
\y\y\\y\\y

Now, I have busybox awk, gawk, mawk, nawk, and NetBSD awk installed on my computer, and none of them gives that output.

>
> Now WHY any given awk when using the string would interpret 4
> backslashes as 2 but not 2 backslashes as 1, I can't guess.
>
> Regards,
>
> Ed.
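The POSIX rule Oğuz cites can be sketched as a second pass that runs after the first. The Python below is a hypothetical model with these assumptions:

- read_awk_string handles only the backslash-backslash and backslash-quote escapes.
- expand_posix_repl applies the sub()/gsub() replacement rules from the POSIX awk specification: a bare "&" is the matched text, "\&" is a literal ampersand, "\\" is one backslash, and any other backslash is literal.
- Python's re module stands in for awk EREs. This is harmless for a literal pattern like /y/.

All three function names are illustrative. Given Oğuz's program text, the model produces the output he expects from a conforming awk.

```python
import re

# Hypothetical two-pass model of awk's sub(); not taken from any awk source.


def read_awk_string(body: str) -> str:
    """Pass 1: decode backslash-backslash and backslash-quote escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body) or body[i + 1] not in ("\\", '"'):
                raise ValueError("escape not modeled in this sketch")
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def expand_posix_repl(repl: str, matched: str) -> str:
    """Pass 2: apply the POSIX sub()/gsub() replacement rules."""
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl) and repl[i + 1] in "&\\":
            out.append(repl[i + 1])  # escaped & -> literal &, two backslashes -> one
            i += 2
        elif ch == "&":
            out.append(matched)  # bare & -> the text that matched
            i += 1
        else:
            out.append(ch)  # anything else, even a lone backslash, is literal
            i += 1
    return "".join(out)


def posix_sub(pattern: str, repl_body: str, target: str) -> str:
    """Replace the first match; Python's re stands in for awk's ERE."""
    repl = read_awk_string(repl_body)
    return re.sub(pattern, lambda m: expand_posix_repl(repl, m.group(0)), target, count=1)


if __name__ == "__main__":
    # Oğuz's replacement string, exactly as typed between the quotes.
    body = r"\\y\\\\y\\\\\\y\\\\\\\\y"
    print(posix_sub("y", body, "y"))
```

It prints \y\y\\y\\y. The repl string is expanded inside a Python function rather than passed to re.sub directly, so Python's own backslash handling in replacement strings cannot interfere.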

Re: gsub() & Escaped characters in strings

https://www.rocksolidbbs.com/devel/article-flat.php?id=793&group=comp.lang.awk#793

Newsgroups: comp.lang.awk
Date: Thu, 29 Apr 2021 10:35:28 -0700 (PDT)
In-Reply-To: <28b93171-68ae-4aab-a00b-2be4c8d7cfden@googlegroups.com>
References: <04ced886-346d-48a4-8c61-9bc92bfe78b0n@googlegroups.com> <s6ejp4$fn9$1@dont-email.me> <28b93171-68ae-4aab-a00b-2be4c8d7cfden@googlegroups.com>
Message-ID: <fca11069-864d-4427-84cc-5e19f7662c33n@googlegroups.com>
Subject: Re: gsub() & Escaped characters in strings
From: oguzismailuysal@gmail.com (Oğuz)
 by: Oğuz - Thu, 29 Apr 2021 17:35 UTC

On Thursday, April 29, 2021 at 8:33:21 PM UTC+3, Oğuz wrote:
> Now, I have busybox awk, gawk, mawk, nawk, and NetBSD awk installed on my computer, and none of them gives that output.
No, sorry, actually mawk does.
$ mawk 'BEGIN { x = "y"; sub(/y/, "\\y\\\\y\\\\\\y\\\\\\\\y", x); print x }'
\y\y\\y\\y

Re: gsub() & Escaped characters in strings

https://www.rocksolidbbs.com/devel/article-flat.php?id=794&group=comp.lang.awk#794

Subject: Re: gsub() & Escaped characters in strings
Newsgroups: comp.lang.awk
References: <04ced886-346d-48a4-8c61-9bc92bfe78b0n@googlegroups.com> <s6ejp4$fn9$1@dont-email.me> <28b93171-68ae-4aab-a00b-2be4c8d7cfden@googlegroups.com> <fca11069-864d-4427-84cc-5e19f7662c33n@googlegroups.com>
From: luuk@invalid.lan (Luuk)
Date: Thu, 29 Apr 2021 21:19:51 +0200
In-Reply-To: <fca11069-864d-4427-84cc-5e19f7662c33n@googlegroups.com>
Message-ID: <608b06d7$0$21169$e4fe514c@news.xs4all.nl>
 by: Luuk - Thu, 29 Apr 2021 19:19 UTC

On 29-4-2021 19:35, Oğuz wrote:
> On Thursday, April 29, 2021 at 8:33:21 PM UTC+3, Oğuz wrote:
>> Now, I have busybox awk, gawk, mawk, nawk, and NetBSD awk installed on my computer, and none of them gives that output.
> No, sorry, actually mawk does.
> $ mawk 'BEGIN { x = "y"; sub(/y/, "\\y\\\\y\\\\\\y\\\\\\\\y", x); print x }'
> \y\y\\y\\y
>

D:\TEMP>gawk "BEGIN { x = \"y\"; sub(/y/, \"\\y\\\\y\\\\\\y\\\\\\\\y\", x); print x }"
\y\\y\\\y\\y

D:\TEMP>gawk -P "BEGIN { x = \"y\"; sub(/y/, \"\\y\\\\y\\\\\\y\\\\\\\\y\", x); print x }"
\y\y\\y\\y

D:\TEMP>gawk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 3.1.5, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2020 Free Software Foundation.

This pro.......
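gawk -P matches the POSIX output, as mawk does. It may therefore help to see what the same POSIX rule predicts for J Naman's original six gsub() calls. The sketch below is the same hypothetical two-pass model, written in Python. It is not gawk's or mawk's implementation. The illustrative posix_gsub uses re.subn as a stand-in for gsub() and returns both the new string and the substitution count. Under these assumptions, each ";" becomes 1, 1, 2, 2, 3, 3 backslashes for 2, 4, ..., 12 typed. That differs from the 1, 2, 3, 2, 3, 4 J Naman reported, which suggests his awk was not applying the POSIX rule either.

```python
import re

# Hypothetical two-pass model of gsub() under the POSIX replacement rules;
# a sketch for J Naman's test cases, not any awk's actual implementation.


def read_awk_string(body: str) -> str:
    """Pass 1: decode backslash-backslash and backslash-quote escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body) or body[i + 1] not in ("\\", '"'):
                raise ValueError("escape not modeled in this sketch")
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def expand_posix_repl(repl: str, matched: str) -> str:
    """Pass 2: apply the POSIX sub()/gsub() replacement rules."""
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl) and repl[i + 1] in "&\\":
            out.append(repl[i + 1])  # escaped & -> literal &, two backslashes -> one
            i += 2
        elif ch == "&":
            out.append(matched)  # bare & -> the text that matched
            i += 1
        else:
            out.append(ch)  # anything else, even a lone backslash, is literal
            i += 1
    return "".join(out)


def posix_gsub(pattern: str, repl_body: str, target: str) -> tuple[str, int]:
    """Replace every match; return (new_string, count) like x and gsub()'s result."""
    repl = read_awk_string(repl_body)
    return re.subn(pattern, lambda m: expand_posix_repl(repl, m.group(0)), target)


if __name__ == "__main__":
    # J Naman's cases: 2, 4, ..., 12 backslashes typed in the replacement.
    for typed in range(2, 13, 2):
        x, n = posix_gsub(";", "\\" * typed, ";foo;")
        print(f"{typed:2d} typed: n={n}  x={x}")
```

Every line reports n=2, matching J Naman's substitution count. Only the number of backslashes on each side of foo differs from what his awk printed.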
