Rocksolid Light

Welcome to RetroBBS

mail  files  register  newsreader  groups  login

Message-ID:  

"I will make no bargains with terrorist hardware." -- Peter da Silva


devel / comp.compilers / Re: Deep expression chain performance

SubjectAuthor
* Deep expression chain performanceAndy
+- Re: Deep expression chain performanceAnton Ertl
+- Re: Deep expression chain performancegah4
`* Re: Deep expression chain performanceKaz Kylheku
 `- Re: Deep expression chain performanceAnton Ertl

1
Deep expression chain performance

<23-04-012@comp.compilers>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=785&group=comp.compilers#785

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: borucki.andrzej@gmail.com (Andy)
Newsgroups: comp.compilers
Subject: Deep expression chain performance
Date: Tue, 25 Apr 2023 06:51:55 -0700 (PDT)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-04-012@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="39482"; mail-complaints-to="abuse@iecc.com"
Keywords: C, parse, question, comment
Posted-Date: 25 Apr 2023 12:21:40 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Andy - Tue, 25 Apr 2023 13:51 UTC

I am writing C grammar.
Grammar speed may be down caused by deep expression chain.
For example, simple "n=0" has 20 levels:
assignmentExpression
conditionalExpression
logicalOrExpression
logicalAndExpression
inclusiveOrExpression
exclusiveOrExpression
andExpression
eqaulityExpression
relationalExpression
shiftExpression
additiveExpression
multiplicativeExpression
castExpression
unaryExpression
postfixExpression
postfixExpressionLeft
atom
literal
IntegerConstant
DeciamlConstant

In other hand, Rust grammar for Antlr4
https://github.com/antlr/grammars-v4/blob/master/rust/RustParser.g4
has precedence definition:
expression
: outerAttribute+ expression # AttributedExpression // technical, remove left recursive
| literalExpression # LiteralExpression_
| pathExpression # PathExpression_
| expression '.' pathExprSegment '(' callParams? ')' # MethodCallExpression // 8.2.10
| expression '.' identifier # FieldExpression // 8.2.11
| expression '.' tupleIndex # TupleIndexingExpression // 8.2.7
| expression '.' 'await' # AwaitExpression // 8.2.18
...............................
If I do similar in C grammar:
expr
: atom # atom_
| '__extension__'? '(' compoundStatement ')' # groupedExpression
| expr '(' argumentExpressionList? ')' # postfixExpression
| expr '.' Identifier # postfixExpression
| expr '->' Identifier # postfixExpression
| expr '[' expr ']' # postfixExpression

parsing time increased by orders of magnitude: short .c files was processed longer a long file I didn't wait for it to finish (probably increasing nonlinear)

My ask:
form expr: expr operator expr
is always ambiguous and waste time or I something bad have written?
How is possible to reduce depth of chain?
Parsing time with Antlr is much longer than parsing + semantic actions with GCC. Probably big part of this time is long depth of expressions.
[If your parser is taking a lot of time, something is wrong. In the
compilers I know, the pareer is always a tiny part of the work, which
is dominated by the lexer which has to look at each character in the
input, and optimizers that have to make multiple passes over the whole
intermediate representation. -John]

Re: Deep expression chain performance

<23-04-013@comp.compilers>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=786&group=comp.compilers#786

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Deep expression chain performance
Date: Wed, 26 Apr 2023 16:33:54 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-04-013@comp.compilers>
References: <23-04-012@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="8681"; mail-complaints-to="abuse@iecc.com"
Keywords: C, parse, performance
Posted-Date: 26 Apr 2023 12:50:51 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Wed, 26 Apr 2023 16:33 UTC

Andy <borucki.andrzej@gmail.com> writes:
[ANTLR]
>If I do similar in C grammar:
>expr
> : atom # atom_
> | '__extension__'? '(' compoundStatement ')' # groupedExpression
> | expr '(' argumentExpressionList? ')' # postfixExpression
> | expr '.' Identifier # postfixExpression
> | expr '->' Identifier # postfixExpression
> | expr '[' expr ']' # postfixExpression
>
>parsing time increased by orders of magnitude: short .c files was processed longer a long file I didn't wait for it to finish (probably increasing nonlinear)

I am not an expert on ANTLR, but it is based on LL-parsing, and
normally LL-based parsers endlessly recurse on left recursion.
However, given that your Rust grammar has left recursion, too, ANTLR
seems to be able to cope with that problem at least in some cases.

There are other features in ANTLR that cause superlinear parsing
times: syntactic predicates and semantic predicates. IIRC there can
also be slowdowns if a long lookahead is necessary.

>My ask:
>form expr: expr operator expr
>is always ambiguous

Yes.

>and waste time

Depends on how the ambiguity is dealt with. E.g., yacc resolves the
ambiguity in some way, and suppresses the alternative parse, and
therefore has linear performance. However, the suppressed alternative
may be the one you were interested in, so better define the grammar
unambiguously. I don't see such a rule in the example above, though.

By contrast, a GLR parser or Early parser keeps the alternatives in
some way, resulting in superexponential performance (n^3 for Early,
not sure for GLR).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Deep expression chain performance

<23-04-014@comp.compilers>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=787&group=comp.compilers#787

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah4@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Deep expression chain performance
Date: Wed, 26 Apr 2023 13:06:51 -0700 (PDT)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-04-014@comp.compilers>
References: <23-04-012@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="23409"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, performance
Posted-Date: 26 Apr 2023 20:41:09 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-04-012@comp.compilers>
 by: gah4 - Wed, 26 Apr 2023 20:06 UTC

On Tuesday, April 25, 2023 at 9:21:44 AM UTC-7, Andy wrote:
> I am writing C grammar.
> Grammar speed may be down caused by deep expression chain.
> For example, simple "n=0" has 20 levels:
> assignmentExpression
> conditionalExpression
> logicalOrExpression
> logicalAndExpression
> inclusiveOrExpression
> exclusiveOrExpression
> andExpression
> eqaulityExpression
> relationalExpression
> shiftExpression
> additiveExpression
> multiplicativeExpression
> castExpression
> unaryExpression
> postfixExpression
> postfixExpressionLeft
> atom
> literal
> IntegerConstant
> DeciamlConstant

A not unusual compiler design in the olden (small memory) days, is recursive
descent
for statements, and operator precedence for expressions. If you do
expressions with
recursive descent, you get, as you note, 20 levels on the stack. Operator
precedence
avoids all those levels, and works well for expressions for most languages.

Others have explained how parser generators do it.

Otherwise, the design of small-memory compilers is a lost art.

Re: Deep expression chain performance

<23-04-015@comp.compilers>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=788&group=comp.compilers#788

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Deep expression chain performance
Date: Thu, 27 Apr 2023 00:28:55 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-04-015@comp.compilers>
References: <23-04-012@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="23892"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, performance
Posted-Date: 26 Apr 2023 20:42:41 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Thu, 27 Apr 2023 00:28 UTC

On 2023-04-25, Andy <borucki.andrzej@gmail.com> wrote:
> I am writing C grammar.
> Grammar speed may be down caused by deep expression chain.
> For example, simple "n=0" has 20 levels:
> assignmentExpression
> conditionalExpression
> logicalOrExpression
> logicalAndExpression
> inclusiveOrExpression
> exclusiveOrExpression
> andExpression
> eqaulityExpression
> relationalExpression
> shiftExpression
> additiveExpression
> multiplicativeExpression
> castExpression
> unaryExpression
> postfixExpression
> postfixExpressionLeft
> atom
> literal
> IntegerConstant
> DeciamlConstant

ISO C specifies its grammar in an obtusely verbose way, similar to the
above. The reason is that it makes the grammar unambiguous, meaning
that it does not require any annotation or external remarks about
ambiguities having to be resolved by a certain associativity or
precedence.

If you have some grammar processor which literally follows all those
productions, doing all of their reductions, then it will likely
not perform well.

I suspect even a LALR(1) compiled grammar will have an issue
there; I'm not aware that Yacc-like parser generators do any
collapsing of cascaded reductions which do nothing but { $$ = $1; }.

Here is a concrete example. Attached below is over 200 lines of
y.output from a Yacc implementation (Bison). In that y.output
we can trace the actions of the state machine and see what happens
when a the simple token LIT is reduced to an expression.

In State 0, when the parser sees a LIT, it will shift the token
and go to State 1. There, a reduction takes place via Rule 9,
and so we have a parser_expression on the top of the stack.

Then, back to State 0, where now have a primary_expression,
and so we go to state 8.

State 8, similarly to State 1, performs a reduction:
primary expression to mult_expression.

Back to State 0, where we have to dispatch a similar
thing again for multi_expression, then additive_expression,
then expression and finally top.

Thanks for raising this issue; I suspect it's a real performance
problem. Not only is a factored-out grammar hard to read but the
performance is bad due to useless back and forth transitions between
superfluous states to propagate reductions.

Above-referenced material follows:

Grammar

0 $accept: top $end

1 top: expression ';'
2 | expression ';' expression

3 expression: additive_expression

4 additive_expression: mult_expression
5 | mult_expression '+' additive_expression

6 mult_expression: primary_expression
7 | mult_expression '*' primary_expression

8 primary_expression: '(' expression ')'
9 | LIT
10 | IDENT

Terminals, with rules where they appear

$end (0) 0
'(' (40) 8
')' (41) 8
'*' (42) 7
'+' (43) 5
';' (59) 1 2
error (256)
LIT (258) 9
IDENT (259) 10

Nonterminals, with rules where they appear

$accept (10)
on left: 0
top (11)
on left: 1 2, on right: 0
expression (12)
on left: 3, on right: 1 2 8
additive_expression (13)
on left: 4 5, on right: 3 5
mult_expression (14)
on left: 6 7, on right: 4 5 7
primary_expression (15)
on left: 8 9 10, on right: 6 7

state 0

0 $accept: . top $end

LIT shift, and go to state 1
IDENT shift, and go to state 2
'(' shift, and go to state 3

top go to state 4
expression go to state 5
additive_expression go to state 6
mult_expression go to state 7
primary_expression go to state 8

state 1

9 primary_expression: LIT .

$default reduce using rule 9 (primary_expression)

state 2

10 primary_expression: IDENT .

$default reduce using rule 10 (primary_expression)

state 3

8 primary_expression: '(' . expression ')'

LIT shift, and go to state 1
IDENT shift, and go to state 2
'(' shift, and go to state 3

expression go to state 9
additive_expression go to state 6
mult_expression go to state 7
primary_expression go to state 8

state 4

0 $accept: top . $end

$end shift, and go to state 10

state 5

1 top: expression . ';'
2 | expression . ';' expression

';' shift, and go to state 11

state 6

3 expression: additive_expression .

$default reduce using rule 3 (expression)

state 7

4 additive_expression: mult_expression .
5 | mult_expression . '+' additive_expression
7 mult_expression: mult_expression . '*' primary_expression

'+' shift, and go to state 12
'*' shift, and go to state 13

$default reduce using rule 4 (additive_expression)

state 8

6 mult_expression: primary_expression .

$default reduce using rule 6 (mult_expression)

state 9

8 primary_expression: '(' expression . ')'

')' shift, and go to state 14

state 10

0 $accept: top $end .

$default accept

state 11

1 top: expression ';' .
2 | expression ';' . expression

LIT shift, and go to state 1
IDENT shift, and go to state 2
'(' shift, and go to state 3

$default reduce using rule 1 (top)

expression go to state 15
additive_expression go to state 6
mult_expression go to state 7
primary_expression go to state 8

state 12

5 additive_expression: mult_expression '+' . additive_expression

LIT shift, and go to state 1
IDENT shift, and go to state 2
'(' shift, and go to state 3

additive_expression go to state 16
mult_expression go to state 7
primary_expression go to state 8

state 13

7 mult_expression: mult_expression '*' . primary_expression

LIT shift, and go to state 1
IDENT shift, and go to state 2
'(' shift, and go to state 3

primary_expression go to state 17

state 14

8 primary_expression: '(' expression ')' .

$default reduce using rule 8 (primary_expression)

state 15

2 top: expression ';' expression .

$default reduce using rule 2 (top)

state 16

5 additive_expression: mult_expression '+' additive_expression .

$default reduce using rule 5 (additive_expression)

state 17

7 mult_expression: mult_expression '*' primary_expression .

$default reduce using rule 7 (mult_expression)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Deep expression chain performance

<23-04-016@comp.compilers>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=789&group=comp.compilers#789

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Deep expression chain performance
Date: Thu, 27 Apr 2023 06:17:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-04-016@comp.compilers>
References: <23-04-012@comp.compilers> <23-04-015@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="11079"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, performance
Posted-Date: 27 Apr 2023 21:56:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Thu, 27 Apr 2023 06:17 UTC

Kaz Kylheku <864-117-4973@kylheku.com> writes:
>ISO C specifies its grammar in an obtusely verbose way, similar to the
>above. The reason is that it makes the grammar unambiguous, meaning
>that it does not require any annotation or external remarks about
>ambiguities having to be resolved by a certain associativity or
>precedence.

They were not that keen on eliminating ambiguity. The classical
ambiguous grammar rule is there:

|selection-statement:
| if ( expression ) statement
| if ( expression ) statement else statement
| switch ( expression ) statement

They resolve that ambiguity in prose:

|An else is associated with the lexically nearest preceding if that is
|allowed by the syntax.

C is not alone in this approach.

As for why they chose to deal with expressions through BNF, my guess
is that it allowed them to split the expression syntax into different
sections, each with their own piece of the grammar, rather than having
a section on the syntax that presents an ambuguous BNF and resolves
the ambiguity in prose (and by necessity has to cover the whole
expression syntax), and then having sections on the semantics of the
various expression sub-syntaxes.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

1
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor