bison parser : retrieving values from recursive pattern

<23-07-001@comp.compilers>


https://www.rocksolidbbs.com/devel/article-flat.php?id=830&group=comp.compilers#830

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: desharchana19@gmail.com (Archana Deshmukh)
Newsgroups: comp.compilers
Subject: bison parser : retrieving values from recursive pattern
Date: Thu, 06 Jul 2023 02:12:38 -0700
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-07-001@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="57725"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, yacc, comment
Posted-Date: 06 Jul 2023 16:08:55 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Archana Deshmukh - Thu, 6 Jul 2023 09:12 UTC

Hello,

I have the following rule:

num :
| integer comma num
| integer closeroundbkt
| integer closesquarebkt

I need to parse data like
efg @main(%data: r[(1, 2, 4, 4), float32], %param_1: or[(2, 1, 5, 5), float32], %param_2: or[(20), float32], %param_3: or[(5, 2, 5, 5), float32], %param_4: or[(50), float32], %param_5: or[(50, 80), float32], %param_6: Tensor[(50), float32], %param_7: or[(10, 50), float32], %param_8: or[(20), float32]

I also need to retrieve these values and store them in a list.

Retrieving and storing values for patterns like
| integer closeroundbkt
| integer closesquarebkt

is simple.

However, I am not able to find a way to retrieve and store recursive numbers from pattern

| integer comma num

Sometimes there are 2 numbers, as in (50, 80), and sometimes there are 4, as in (1, 2, 4, 4). How should I handle this?

Any suggestions are welcome.

Best Regards,
Archana Deshmukh
[For a list of numbers in parens I would do something like this:

parennumlist: '(' numlist ')' ;

numlist: integer
| numlist ',' integer ;

For the bracketed lists:

bracketlist: '[' parennumlist ',' datatype ']' ;

datatype: FLOAT32 | ... whatever other types there are ... ;

The usual way to do a variable length list is to write a recursive rule with one
alternative for a single item and another alternative that adds an item to the list.
Any book about compiler design should give advice on writing grammar rules, or my
"flex & bison" has example grammars that include lists. -John]
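
To connect the list rule above to the original question about storing the values, here is a minimal C sketch of helper functions that the semantic actions could call. The names struct numlist, numlist_new, numlist_append and numlist_free are made up for illustration and are not part of bison; the grammar fragment in the trailing comment assumes that integer carries an int value and that numlist carries a struct numlist pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of integers; not part of bison itself. */
struct numlist {
    int *vals;
    size_t len, cap;
};

/* Action for "numlist: integer": start a new one-element list. */
struct numlist *numlist_new(int first)
{
    struct numlist *l = malloc(sizeof *l);
    assert(l != NULL);
    l->cap = 4;
    l->len = 0;
    l->vals = malloc(l->cap * sizeof *l->vals);
    assert(l->vals != NULL);
    l->vals[l->len++] = first;
    return l;
}

/* Action for "numlist: numlist ',' integer": append, then pass the list up. */
struct numlist *numlist_append(struct numlist *l, int v)
{
    if (l->len == l->cap) {
        int *p;
        l->cap *= 2;
        p = realloc(l->vals, l->cap * sizeof *p);
        assert(p != NULL);
        l->vals = p;
    }
    l->vals[l->len++] = v;
    return l;
}

void numlist_free(struct numlist *l)
{
    free(l->vals);
    free(l);
}

/*
 * In the grammar file the actions would then read roughly:
 *
 *   numlist : integer              { $$ = numlist_new($1); }
 *           | numlist ',' integer  { $$ = numlist_append($1, $3); }
 *           ;
 *   parennumlist : '(' numlist ')' { $$ = $2; }
 *                ;
 *
 * assuming the value types are declared accordingly (%union and %type).
 */
```

Because the rule is left recursive, the first integer is reduced first, so the values end up in the list in source order, whether there are 2 of them or 4.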

Re: bison parser : retrieving values from recursive pattern

<23-07-002@comp.compilers>


https://www.rocksolidbbs.com/devel/article-flat.php?id=831&group=comp.compilers#831

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: bison parser : retrieving values from recursive pattern
Date: Fri, 07 Jul 2023 01:14:04 -0000
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-07-002@comp.compilers>
References: <23-07-001@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="50127"; mail-complaints-to="abuse@iecc.com"
Keywords: yacc, design
Posted-Date: 06 Jul 2023 21:17:02 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Fri, 7 Jul 2023 01:14 UTC

On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>

Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.

Consider a nice grammar like

list : '(' items ')'
     | '(' ')'
     | '[' items ']'
     | '[' ']'
     ;

items : items ',' item
      | item
      ;

item : list | num | type | decl ;

decl : keyword ':' oper list ;

keyword : KW_main | KW_data | KW_param_1 ;

type : TYPE_float32 | ... ;

oper : OPER_r | OPER_or ;

I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.

I.e. the over-generating grammar would allow nonsense like

@data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.

list : '(' items ')'
     | '(' ')'
     | '[' items ']'
     | '[' ']'
     ;

items : items ',' item
      | item
      ;

item : list | num | SYMBOL | decl ;

decl : SYMBOL ':' SYMBOL list ;

Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.

Make your SYMBOL objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:

SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */

The first time intern("foo") is called, it creates and returns
a symbol sym such that sym->name is "foo" (a strdup-ed copy).
The second time intern("foo") is called, it returns exactly
the same object!
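
For illustration, here is one minimal way such an intern function could be written in C. This is only a sketch: the linked list and the linear search are assumptions made to keep it short, and a real implementation would more likely find existing symbols through a hash table keyed on the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;            /* private copy of the lexeme */
    struct symbol *next;   /* chain of all symbols interned so far */
};

static struct symbol *symtab;   /* head of the chain; empty at start */

/* Return the unique symbol for name, creating it on first use. */
struct symbol *intern(const char *name)
{
    struct symbol *s;

    assert(name != NULL);

    for (s = symtab; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;                     /* seen before: same pointer */

    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);   /* the copy strdup would make */
    assert(s->name != NULL);
    strcpy(s->name, name);
    s->next = symtab;
    symtab = s;
    return s;
}
```

The string comparison happens once, inside intern, at the moment the lexeme is turned into a symbol. Everything after that point works with the pointer only.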

In your program you can have initialization like this:

struct symbol *float32_s;

void global_init(void)
{
    float32_s = intern("float32");

    ...
}

Then when the parser sees float32, it will produce
the same pointer.

The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator:

void foo(struct symbol *x)
{
    if (x == float32_s) {
        // we are looking at the float32 symbol

    }

}

Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.
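
As a rough sketch of what "a few bit operations" could look like, the function below turns a pointer into a bucket index. The name hash_ptr, the shift amounts and the power-of-two table size are all assumptions chosen for illustration, not something prescribed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a pointer to a bucket index; nbuckets must be a power of two. */
size_t hash_ptr(const void *p, size_t nbuckets)
{
    uintptr_t x = (uintptr_t)p;

    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);

    x >>= 4;        /* malloc-ed objects are aligned, so the low bits carry little */
    x ^= x >> 12;   /* fold some higher bits into the low ones */
    return (size_t)x & (nbuckets - 1);
}
```

The cost does not depend on the length of the symbol's name, which is the point being made above.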

Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.
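
A small C sketch of that check follows. The struct item representation of a parsed list element and the function name has_type_at are hypothetical, since the thread does not define how list elements are stored; only the is_type field comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct symbol {
    const char *name;
    bool is_type;          /* true for symbols such as float32 */
};

enum item_kind { ITEM_NUM, ITEM_SYMBOL, ITEM_LIST };

/* Hypothetical parsed list element. */
struct item {
    enum item_kind kind;
    struct symbol *sym;    /* valid only when kind == ITEM_SYMBOL */
};

/* True if position pos of the list holds a symbol that names a type. */
bool has_type_at(const struct item *items, size_t n, size_t pos)
{
    assert(items != NULL);

    if (pos >= n)
        return false;                     /* list is too short */
    if (items[pos].kind != ITEM_SYMBOL)
        return false;                     /* first check: is it a symbol? */
    return items[pos].sym->is_type;       /* then: is it a type symbol? */
}
```

For a shape like [(1, 2, 4, 4), float32], the validity pass would call has_type_at(items, 2, 1) on the bracketed list.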
