
Got the regular expression parser working again hurray!

<5dc1e95f-c48c-4bfd-bdfb-d83a44234be8n@googlegroups.com>


https://www.rocksolidbbs.com/devel/article-flat.php?id=88&group=comp.lang.postscript#88

X-Received: by 2002:ac8:5e0a:: with SMTP id h10mr56875944qtx.195.1636077884153;
Thu, 04 Nov 2021 19:04:44 -0700 (PDT)
X-Received: by 2002:a4a:5186:: with SMTP id s128mr3692165ooa.53.1636077883853;
Thu, 04 Nov 2021 19:04:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.postscript
Date: Thu, 4 Nov 2021 19:04:43 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=97.87.183.68; posting-account=G1KGwgkAAAAyw4z0LxHH0fja6wAbo7Cz
NNTP-Posting-Host: 97.87.183.68
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5dc1e95f-c48c-4bfd-bdfb-d83a44234be8n@googlegroups.com>
Subject: Got the regular expression parser working again hurray!
From: luser.droog@gmail.com (luser droog)
Injection-Date: Fri, 05 Nov 2021 02:04:44 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 115
 by: luser droog - Fri, 5 Nov 2021 02:04 UTC

For the umpteenth time I've rewritten my parser combinators.
With any luck, they finally have all the necessary bells and
whistles added to be useful for other stuff.

The code is in these 3 files. Trimmed of testing gibberish,
it's about 200 lines of code.
https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/pc11are.ps
https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/pc11a.ps
https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/struct2.ps

A parser is a function that takes an <input-stream> type and
yields a <result-structure> type. The <input-stream> is a lazy
list of [char [row col]] structures. The <result-structure> is a
two element array, the first element of which will be /OK /Fail or
/Error. For an /OK result, the second element will be
[result remainder]. For a /Fail or /Error result, the second element
will be [message remainder].
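
As a rough illustration of the result shape described above, here is a minimal sketch using hand-built results. The names ok?, payload, remainder, describe, sample-ok and sample-fail are hypothetical helpers invented for this example and are not taken from pc11a.ps. An empty array stands in for the lazy input list that a real remainder would be.

```postscript
% Hypothetical accessors for a [status [payload remainder]] result.
% None of these names come from pc11a.ps.
/ok?       { 0 get /OK eq } def    % result  ok?        bool
/payload   { 1 get 0 get } def     % result  payload    result-or-message
/remainder { 1 get 1 get } def     % result  remainder  input-stream

% Hand-built stand-ins; [] plays the role of an exhausted input stream.
/sample-ok   [ /OK   [ (a) [] ] ] def
/sample-fail [ /Fail [ (not satisfied) [] ] ] def

% result  describe  -    print the status and the payload
/describe {
    dup ok? { (matched: ) } { (no match: ) } ifelse print
    payload ==
} def

sample-ok   describe    % prints: matched: (a)
sample-fail describe    % prints: no match: (not satisfied)
```

Because /Fail and /Error share the [message remainder] layout, a caller written this way only needs to branch on the first element.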

So here's the regular expression parser with this new setup.
And some simple testing code and output, which should be
comprehensible using the above description of the input/output.
In effect, this is a syntax-directed compiler from regular expressions
to the syntax for constructing parsers using these same combinators.

errordict/typecheck{ps pe quit}put
%errordict/stackunderflow{pe quit}put
%errordict/stackunderflow{pq}put
%errordict/undefined{pq}put
%(../../debug.ps/db5.ps) run
(pc11a.ps)run {
fix { flatten clean }
clean { { dup zero eq { pop } if } map }
? { {maybe} compose }
+ { {some} compose }
* { {many} compose }
} pairs-begin

/Dot (.) char {pop {item} one} using def
/Meta (*+?) anyof def
/Character (*+?.|()) noneof {first {literal} curry one} using def
/Expression {-777 exec} def
/Atom //Dot
(\() char //Expression executeonly xthen (\)) char thenx alt
//Character alt def
/Factor //Atom /A
//Meta {/A load first exch first load exec one } using
maybe {dup first zero eq {pop /A load} if } using
into def
/Term //Factor //Factor many then
{ fix { {then} compose compose } reduce one } using def
//Expression 0 //Term (|) char //Term xthen many then
{ fix { {plus} compose compose } reduce one } using put

/regex { 0 0 3 2 roll string-input //Expression exec report } def

{
0 0 (ab) string-input //Dot maybe exec pc
0 0 (ab) string-input //Meta exec pc
0 0 (*) string-input //Meta exec pc
0 0 (ab) string-input //Character maybe exec pc
0 0 (ab) string-input //Atom maybe exec pc
0 0 (.) string-input //Atom maybe exec pc
%0 0 (a*) string-input //Atom //Meta then ==
0 0 (a*) string-input //Atom //Meta then exec pc
0 0 (ab) string-input Factor pc
0 0 (a*) string-input Factor pc
0 0 (ab) string-input Term pc
0 0 (ab|c) string-input Expression pc
(ab) regex
} exec
{ } pop
quit

$ gsnd -dNOSAFER pc11are.ps
GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
stack:
[/OK [[[] []] [[(a) [0 0]] {0 1 (b) string-input}]]]
:stack
stack:
[/Fail [[{(*+?) within} (not satisfied)] [[(a) [0 0]] {0 1 (b) string-input}]]]
:stack
stack:
[/OK [[(*) []] {0 1 () string-input}]]
:stack
stack:
[/OK [[{(a) literal} []] {0 1 (b) string-input}]]
:stack
stack:
[/OK [[{(a) literal} []] {0 1 (b) string-input}]]
:stack
stack:
[/OK [[{item} []] {0 1 () string-input}]]
:stack
stack:
[/OK [[{(a) literal} [(*) []]] {0 2 () string-input}]]
:stack
stack:
[/OK [[{(a) literal} []] [[(b) [0 1]] {0 2 () string-input}]]]
:stack
stack:
[/OK [[{(a) literal many} []] {0 2 () string-input}]]
:stack
stack:
[/OK [[{(a) literal (b) literal then} []] []]]
:stack
stack:
[/OK [[{(a) literal (b) literal then (c) literal plus} []] []]]
:stack
OK
[{(a) literal (b) literal then}]
remainder:[]

Re: Got the regular expression parser working again hurray!

<sm3hr9$qan$1@dont-email.me>


https://www.rocksolidbbs.com/devel/article-flat.php?id=89&group=comp.lang.postscript#89

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jeffrey@digitalsynergyinc.com (Jeffrey H. Coffield)
Newsgroups: comp.lang.postscript
Subject: Re: Got the regular expression parser working again hurray!
Date: Fri, 5 Nov 2021 08:15:51 -0700
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <sm3hr9$qan$1@dont-email.me>
References: <5dc1e95f-c48c-4bfd-bdfb-d83a44234be8n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 5 Nov 2021 15:15:53 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2108ef5f0a3d6f07f53bbdb2c49a4e5e";
logging-data="26967"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18PIId3qM9HH1oaxpeu51ut107T0LpCvjM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.6.0
Cancel-Lock: sha1:ASAPalvFTI0ha38k7NtLiyeZysw=
In-Reply-To: <5dc1e95f-c48c-4bfd-bdfb-d83a44234be8n@googlegroups.com>
 by: Jeffrey H. Coffield - Fri, 5 Nov 2021 15:15 UTC

On 11/04/2021 07:04 PM, luser droog wrote:
> For the umpteenth time I've rewritten my parser combinators.
> With any luck, they finally have all the necessary bells and
> whistles added to be useful for other stuff.

I spent a significant amount of time writing several versions of a
generalized parser and never got all the "bells and whistles" that I
wanted. Then I came across Antlr4, which is written in Java. While I admire
your intent to parse PostScript in PostScript, Antlr4 has tons of "bells
and whistles" and can be applied to practically any language, although I
don't see that anyone has posted a PostScript grammar yet. There are about
250 different languages currently available at:

https://github.com/antlr/grammars-v4

There is a grammar for something called RPN which sounds like it could
be a starting point.

https://github.com/antlr/grammars-v4/tree/master/rpn

Antlr4 takes some time to learn and I'm not sure how much Java you would
need to know (I program about 80% in Java now), but if it sounds like
something you are interested in, I would be willing to help. There is a
great book, "The Definitive ANTLR 4 Reference".

I am not associated directly with Antlr4 but have submitted some patches.

Jeff Coffield
www.digitalsynergyinc.com

Re: Got the regular expression parser working again hurray!

<950f8de4-d105-439d-b22e-d04e74cc3aacn@googlegroups.com>


https://www.rocksolidbbs.com/devel/article-flat.php?id=90&group=comp.lang.postscript#90

X-Received: by 2002:ae9:ef05:: with SMTP id d5mr11501382qkg.357.1636158071907;
Fri, 05 Nov 2021 17:21:11 -0700 (PDT)
X-Received: by 2002:a05:6830:1c6:: with SMTP id r6mr10310494ota.78.1636158071641;
Fri, 05 Nov 2021 17:21:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.postscript
Date: Fri, 5 Nov 2021 17:21:11 -0700 (PDT)
In-Reply-To: <sm3hr9$qan$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=97.87.183.68; posting-account=G1KGwgkAAAAyw4z0LxHH0fja6wAbo7Cz
NNTP-Posting-Host: 97.87.183.68
References: <5dc1e95f-c48c-4bfd-bdfb-d83a44234be8n@googlegroups.com> <sm3hr9$qan$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <950f8de4-d105-439d-b22e-d04e74cc3aacn@googlegroups.com>
Subject: Re: Got the regular expression parser working again hurray!
From: luser.droog@gmail.com (luser droog)
Injection-Date: Sat, 06 Nov 2021 00:21:11 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 66
 by: luser droog - Sat, 6 Nov 2021 00:21 UTC

On Friday, November 5, 2021 at 10:15:55 AM UTC-5, Jeffrey H. Coffield wrote:
> On 11/04/2021 07:04 PM, luser droog wrote:
> > For the umpteenth time I've rewritten my parser combinators.
> > With any luck, they finally have all the necessary bells and
> > whistles added to be useful for other stuff.
> I spent a significant amount of time writing several versions of a
> generalized parser and never got all the "bells and whistles" that I
> wanted. Then I came across Antlr4, which is written in Java. While I admire
> your intent to parse PostScript in PostScript, Antlr4 has tons of "bells
> and whistles" and can be applied to practically any language, although I
> don't see that anyone has posted a PostScript grammar yet. There are about
> 250 different languages currently available at:
>
> https://github.com/antlr/grammars-v4
>
> There is a grammar for something called RPN which sounds like it could
> be a starting point.
>
> https://github.com/antlr/grammars-v4/tree/master/rpn
>
> Antlr4 takes some time to learn and I'm not sure how much Java you would
> need to know (I program about 80% in Java now), but if it sounds like
> something you are interested in, I would be willing to help. There is a
> great book, "The Definitive ANTLR 4 Reference".
>
> I am not associated directly with Antlr4 but have submitted some patches.
>
> Jeff Coffield
> www.digitalsynergyinc.com

Thanks, I'll check those out. I've had Antlr suggested to me many times before
but I've shied away thus far, probably from squeamish feelings about Java.
(If only Sun NeWS had been successful, we might never have needed Java.)

But I had to use Java for several school projects last year while finally
completing my degree. It's not so scary anymore. I'm pretty happy with
the composability of my functions the way they work now, but there are
probably ways to optimize the execution or use a different model behind
the scenes and maintain a similar interface. I'm also going to need a
nice syntax for describing grammars so I can write a parser for /that/
and have it translate to the combinators. Then you can have any color
you want as long as it's black.

Which bells and whistles were you still missing? My latest version adds
error messages which were sorely lacking in previous ones. But I've also
gotten a handle on how to do lazy execution. And bizarrely enough, I've
been able to prototype in PostScript and then re-code in C, despite the
dissimilarity between those (the C one kinda pretends to be Lisp, tho).

For PostScript Level 1, the grammar is actually super simple.

<Object> :: <Single>
|| { <Object> * }

<Single> :: <Integer> | <Real> | <Name> | <String>

Some fiddly stuff further down the tree. But it's just the one production
for procedures that needs a CFG. The rest of the language is strictly
Regular. Level 2 adds a few more kinds of Single, like binary name
tokens and binary object encodings.

So probably no one has written a grammar for PS for popular parsing
libraries because it doesn't get you anywhere. You get just [Object
Object Object ...]. It would get you like 2% closer to being able to
write a translator or interpreter for PostScript. Whereas with almost
any other structured language the grammar gets you roughly 50% there.
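
The flat [Object Object Object ...] result can be seen with the interpreter's own scanner. This is a minimal sketch that assumes only the standard token operator applied to a string; the procedure name objects is made up for this example. token hands back a whole { ... } procedure as a single object, which lines up with the one recursive production in the grammar above.

```postscript
% string  objects  array
% Collect the top-level objects that the scanner finds in a string.
/objects {
    [ exch                    % mark string
    {
        token                 % mark obj... post obj true  |  mark obj... false
        not { exit } if       % stop once the string is exhausted
        exch                  % mark obj... obj post
    } loop
    ]
} def

(42 3.14 /name (text) { 1 { 2 add } exec }) objects
% Print each object's type followed by the object itself.
{ dup type 20 string cvs print ( ) print == } forall
```

The sample string yields five top-level objects, and nothing in the list says what any of them means when executed, which is the 2% problem in a nutshell.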
