C scanner

Newsgroups: comp.lang.postscript
Date: Mon, 13 Dec 2021 15:20:58 -0800 (PST)
Message-ID: <bc506bd2-6516-4a4e-8e78-caa37fee9c0fn@googlegroups.com>
Subject: C scanner
From: luser.droog@gmail.com (luser droog)

Sticking with PS version 12 of the parser combinators, I finished the
usual 3 examples (regex, PS scanner, JSON parser), and they came out
pretty good and concise. So I translated my C scanner over from
version 9 of the C implementation. It looks pretty good to me, especially
the helper function `tokendef`, which wraps a parser so that it tags its
return value with the token's name and defines the new parser under that name.
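The Python sketch below makes the `tokendef` idea concrete outside PostScript. It assumes a parser is a function from `(text, pos)` to `(value, new_pos)`, or `None` on failure. The names `lit`, `using`, `tokendef`, and `rules` are hypothetical stand-ins, not the pc12.ps API. As in the PostScript helper, the wrapper tags the result with the token name and also stores the tagged parser under that name.

```python
# Hypothetical Python analogues of the pc12.ps combinators; not its real API.
# A parser takes (text, pos) and returns (value, new_pos), or None on failure.

rules = {}  # plays the role of the dictionary that `def` stores into

def lit(s):
    """Parser that matches the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def using(p, f):
    """Run parser p and, on success, transform its value with f."""
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return parse

def tokendef(name, p):
    """Tag p's value with name, then register the tagged parser under name."""
    tagged = using(p, lambda value: (name, value))
    rules[name] = tagged
    return tagged

plusplus = tokendef("plusplus", lit("++"))
print(plusplus("++x", 0))  # (('plusplus', '++'), 2)
print(plusplus("+x", 0))   # None
```

The tag here is just a pair. The PostScript output shows the same idea as a list such as `[/plusplus [(+) (+)]]`.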

Wrapping a lazy-input function around another lazy-input function is
just weird. It seems to work when I step through it in my head, but the
way it's written still looks weird. It makes more sense once you look at
how `lazy-input` builds the function, but that part isn't new, so I won't
include it here.
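The layering may be easier to see in a language with explicit thunks. This Python sketch builds a lazy word stream on top of a lazy character stream. `Lazy`, `string_input`, and `word_input` are illustrative names, and words split on spaces stand in for C tokens. Each cell holds a head and a cached thunk for the rest, so the outer stream pulls characters from the inner one only when its own tail is forced.

```python
# Hypothetical sketch of one lazy stream layered on another;
# not pc12.ps's lazy-input.

class Lazy:
    """A cons cell whose tail is computed on first demand, then cached."""
    def __init__(self, head, rest_thunk):
        self.head = head
        self._thunk = rest_thunk
        self._rest = None

    def rest(self):
        if self._thunk is not None:  # force the tail at most once
            self._rest = self._thunk()
            self._thunk = None
        return self._rest

forced = []  # records every character the inner stream has produced

def string_input(s, i=0):
    """Lazy stream of the characters of s; None marks the end."""
    if i >= len(s):
        return None
    forced.append(s[i])
    return Lazy(s[i], lambda: string_input(s, i + 1))

def word_input(chars):
    """Lazy stream of space-separated words read from a lazy char stream."""
    while chars is not None and chars.head == " ":
        chars = chars.rest()
    if chars is None:
        return None
    word = []
    while chars is not None and chars.head != " ":
        word.append(chars.head)
        chars = chars.rest()
    rest_chars = chars  # where the next word starts; not scanned until asked
    return Lazy("".join(word), lambda: word_input(rest_chars))

words = word_input(string_input(" aname another"))
print(words.head, len(forced))         # aname 7
print(words.rest().head, len(forced))  # another 14
```

After the first word, only the seven characters of `" aname "` have been read. The rest of the string stays untouched until the second word is requested.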

The big idea is at the bottom. Calling `token-input` with a string-input
and two zeros gives you a lazy stream of tagged token structures.
`string-input` needs its own two zeros, so it takes four zeros to put
'em together: one row/column pair for each layer.
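Judging from the arguments `r c in` and the pairs in the transcript, each zero appears to be a starting row or column counter. The outer `token-input` positions (`[0 0]`, `[0 1]`, ...) also appear to advance once per token rather than once per character. Under that reading, the sketch below uses hypothetical Python generator versions of the two layers to show why stacking them takes two pairs of zeros. Whitespace-separated words stand in for real C tokens.

```python
# Hypothetical sketch: each stream layer keeps its own (row, col) counters.
# Words split on whitespace stand in for real C tokens.

def string_input(row, col, s):
    """Yield (char, (row, col)) pairs, moving to a new row after each newline."""
    for ch in s:
        yield ch, (row, col)
        if ch == "\n":
            row, col = row + 1, 0
        else:
            col += 1

def token_input(row, col, chars):
    """Yield (token, (row, col), start), where (row, col) counts tokens and
    start is the inner stream's position of the token's first character."""
    buf, start = [], None
    for ch, pos in chars:
        if ch.isspace():
            if buf:
                yield "".join(buf), (row, col), start
                col += 1
                buf = []
        else:
            if not buf:
                start = pos
            buf.append(ch)
    if buf:
        yield "".join(buf), (row, col), start

# Two layers, so two pairs of starting counters: four zeros in all.
for tok in token_input(0, 0, string_input(0, 0, " 37 x\ny")):
    print(tok)
# ('37', (0, 0), (0, 1))
# ('x', (0, 1), (0, 4))
# ('y', (0, 2), (1, 0))
```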

%errordict/typecheck{ps pe quit}put
(pc12.ps)run {
tokendef{ 1 index cvlit { exch cons one } curry using def }
cvsstr{ dup length string cvs }
strcat{ 2 copy length exch length add string % a b s
3 2 roll 2 copy 0 exch putinterval % b s a
length 3 2 roll 3 copy putinterval pop pop }
prefix{ exch strcat cvn }
} pairs-begin

/keywords {
int char
float double struct
auto extern
register static
goto return sizeof
break continue
if else
for do while
switch case default
} cvlit def
keywords { cvsstr dup (k_) prefix exch str tokendef } forall
/keyword-names keywords { cvsstr (k_) prefix } map def

/symbols {
star (*) plusplus (++) plus (+) dot (.)
arrow (->) minusminus (--) minus (-)
bangeq (!=) bang (!) tilde (~)
ampamp (&&) amp (&) eqeq (==) equal (=)
caret (^) pipepipe (||) pipe (|)
slant (/) percent (%)
ltlt (<<) lteq (<=) less (<)
gtgt (>>) gteq (>=) greater (>)
lparen (\() rparen (\))
comma (,) semi (;) colon (:) quest (?)
lbrace ({) rbrace (}) lbrack ([) rbrack (])
} cvlit def
symbols 2 { aload pop str tokendef } fortuple
/symbol-names [ symbols 2 { first } fortuple ] def

/assignops {
pluseq (+=) minuseq (-=)
stareq (*=) slanteq (/=) percenteq (%=)
gtgteq (>>=) ltlteq (<<=)
ampeq (&=) careteq (^=) pipeeq (|=)
} cvlit def
assignops 2 { aload pop str tokendef } fortuple

/comment (/*) str (*) noneof many (*) char then some then (/) then def
/space ( \t\n) anyof //comment alt many def

/alpha_ (a)(z)range (A)(Z)range alt (_)char alt def
/digit (0)(9)range def
/identifier //alpha_ //alpha_ //digit alt many then tokendef

/integer //digit some tokendef
/floating //digit some (.) char then //digit many then
(.) char //digit some then alt
(eE) anyof (+-) anyof maybe then //digit some then maybe then tokendef

/escape (\\) char
//digit //digit maybe then //digit maybe then
('"bnrt\\) anyof alt then def
/char_ //escape ('\n) noneof alt def
/schar_ //escape ("\n) noneof alt def
/character (') char //char_ then (') char then tokendef
/astring (") char //schar_ many then (") char then tokendef

/constant //floating //integer alt //character alt //astring alt tokendef

/symbolic [ keyword-names {load} forall
assignops 2{first load} fortuple % longer ops like += and <<= before + and <<
symbol-names {load} forall
counttomark 1 sub {alt} repeat exch pop def

/ctoken //space //constant //symbolic alt //identifier alt xthen def
/token-input{r c in}
{ in dup //ctoken exec +not-ok { true }{ exch pop second xs-x false } ifelse }
{ 4 3 roll } % xs [x[r c]] r' c' -> [x[r c]] r' c' xs
{ token-input } lazy-input def

0 0 ( aname another) string-input //ctoken exec report
0 0 ( ++ / * ) string-input //ctoken exec report
0 0 ( 37,x,y ) string-input //ctoken exec report
0 0 0 0 ( 37,x,y{12+q;} ) string-input token-input
dup first ==
next dup first ==
next dup first ==
next dup first ==
next dup first ==
next dup first ==
pc

quit

$ gsnd -q -dNOSAFER pc12ctok.ps
OK
[[/identifier [(a) (n) (a) (m) (e)]]]
remainder:[[( ) [0 6]] {0 7 (another) string-input}]
OK
[[/plusplus [(+) (+)]]]
remainder:{0 3 ( / * ) string-input}
OK
[[/constant [[/integer [(3) (7)]]]]]
remainder:[[(,) [0 3]] {0 4 (x,y ) string-input}]
[[[/constant [[/integer [(3) (7)]]]]] [0 0]]
[[[/comma (,)]] [0 1]]
[[[/identifier (x)]] [0 2]]
[[[/comma (,)]] [0 3]]
[[[/identifier (y)]] [0 4]]
[[[/lbrace ({)]] [0 5]]
stack:
[[[[/lbrace ({)]] [0 5]] {0 6 {0 8 (12+q;} ) string-input} token-input}]
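The transcript shows one result per parse, which suggests `alt` in pc12.ps is ordered choice (first success wins). Assuming that, the order of alternatives matters in two places:

- A shorter operator listed ahead of a longer one with the same prefix (such as `+` ahead of `+=`) will win.
- `ctoken` tries `symbolic`, which includes the keyword parsers, before `identifier`. A keyword such as `int` could therefore match the front of an identifier such as `integer`.

The `symbols` list already puts `++` ahead of `+` and `<<` ahead of `<`. The Python sketch below uses hypothetical names throughout and shows both effects. It also shows one common alternative for keywords: scan a whole identifier, then look it up in a keyword table.

```python
# Hypothetical sketch of ordered choice; the names lit, alt, and word are
# illustrative, not pc12.ps. A parser maps (text, pos) to (value, pos) or None.
import re

def lit(s):
    """Match the exact string s."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def alt(*parsers):
    """Ordered choice: return the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

short_first = alt(lit("+"), lit("+="))
long_first = alt(lit("+="), lit("+"))
print(short_first("+=1", 0))  # ('+', 1), leaving '=' behind
print(long_first("+=1", 0))   # ('+=', 2)

# Keywords: scan the whole identifier, then look it up.
KEYWORDS = {"int", "char", "double", "do", "if", "else"}  # small subset for illustration
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def word(text, pos):
    """Tag a maximal identifier as k_<name> if it is a keyword."""
    m = IDENT.match(text, pos)
    if m is None:
        return None
    name = m.group()
    tag = "k_" + name if name in KEYWORDS else "identifier"
    return (tag, name), m.end()

print(alt(lit("int"), word)("integer", 0))  # ('int', 3): keyword tried first wins
print(word("integer", 0))                   # (('identifier', 'integer'), 7)
print(word("int x", 0))                     # (('k_int', 'int'), 3)
```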
