Figuring out grammars from examples

<24-04-001@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: johnl@taugh.com (John R Levine)
Newsgroups: comp.compilers
Subject: Figuring out grammars from examples
Date: Fri, 12 Apr 2024 18:39:51 -0400
Organization: Compilers Central
Sender: johnl%iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <24-04-001@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="68172"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 12 Apr 2024 18:48:31 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: John R Levine - Fri, 12 Apr 2024 22:39 UTC

There's been a surprising amount of work on taking language strings and
feeding them to a box that infers a grammar for them. One of the harder
bits is figuring out nesting constructs, which in the past has often
required hints.

In this paper, Visibly Pushdown Grammars are a large but more tractable
subset of context-free grammars.
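
To make "visibly pushdown" concrete: the defining restriction is that the
input symbol alone decides the stack action. Call symbols always push,
return symbols always pop, and internal symbols leave the stack alone. The
sketch below is only an illustration of that idea, not code from the paper.
The bracket alphabet and the name is_well_nested are assumptions made for
the example, and it accepts only well-matched input, which is stricter than
the general definition of a visibly pushdown language.

```python
# Minimal sketch of a visibly pushdown recognizer.
# Assumption: the call/return partition below is chosen for illustration only.

CALLS = {"(": ")", "[": "]", "{": "}"}   # call symbol -> its matching return symbol
RETURNS = set(CALLS.values())            # return symbols


def is_well_nested(text: str) -> bool:
    """Return True if every call symbol is closed by its matching return symbol."""
    stack = []
    for ch in text:
        if ch in CALLS:
            # Call symbol: the stack action is always a push.
            stack.append(CALLS[ch])
        elif ch in RETURNS:
            # Return symbol: the stack action is always a pop.
            if not stack or stack.pop() != ch:
                return False
        # Anything else is an internal symbol: the stack is untouched.
    return not stack
```

Because the stack behavior is visible in the input itself, a learner only
has to work out which tokens act as calls and returns, which is the part
the abstract below describes as inferring nesting structure.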

V-Star: Learning Visibly Pushdown Grammars from Program Inputs
Xiaodong Jia, Gang Tan

Accurate description of program inputs remains a critical challenge in the
field of programming languages. Active learning, as a well-established
field, achieves exact learning for regular languages. We offer an
innovative grammar inference tool, V-Star, based on the active learning of
visibly pushdown automata. V-Star deduces nesting structures of program
input languages from sample inputs, employing a novel inference mechanism
based on nested patterns. This mechanism identifies token boundaries and
converts languages such as XML documents into VPLs. We then adapted
Angluin's L-Star, an exact learning algorithm, for VPA learning, which
improves the precision of our tool. Our evaluation demonstrates that
V-Star effectively and efficiently learns a variety of practical grammars,
including S-Expressions, JSON, and XML, and outperforms other
state-of-the-art tools.

https://arxiv.org/abs/2404.04201v1

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

Re: Figuring out grammars from examples

<24-04-002@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: derek@shape-of-code.com (Derek)
Newsgroups: comp.compilers
Subject: Re: Figuring out grammars from examples
Date: Sat, 13 Apr 2024 12:06:49 +0100
Organization: Compilers Central
Sender: johnl%iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <24-04-002@comp.compilers>
References: <24-04-001@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="16726"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, tools, comment
Posted-Date: 14 Apr 2024 12:51:28 EDT
In-Reply-To: <24-04-001@comp.compilers>
 by: Derek - Sat, 13 Apr 2024 11:06 UTC

John,

> including S-Expressions, JSON, and XML, and outperforms other
> state-of-the-art tools.

They did not compare their tool against LLMs, which these days
outperform other non-human approaches on language-related tasks.
[I would like to see some actual data. In my experience, LLMs are
impressive, confident, and frequently wrong. -John]

Re: Figuring out grammars from examples

<24-04-004@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: derek@shape-of-code.com (Derek)
Newsgroups: comp.compilers
Subject: Re: Figuring out grammars from examples
Date: Mon, 15 Apr 2024 02:17:04 +0100
Organization: Compilers Central
Sender: johnl%iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <24-04-004@comp.compilers>
References: <24-04-001@comp.compilers> <24-04-002@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="10024"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 14 Apr 2024 22:16:37 EDT
In-Reply-To: <24-04-002@comp.compilers>
 by: Derek - Mon, 15 Apr 2024 01:17 UTC

John,

> [I would like to see some actual data. In my experience, LLMs are
> impressive, confident, and frequently wrong. -John]

LLMs' performance on fact recall is poor.

They seem to be much better than other tools when dealing with
grammars. I could not find the example I was looking for, but here are
two others:
https://arxiv.org/abs/2305.19234
https://szopa.medium.com/teaching-chatgpt-to-speak-my-sons-invented-language-9d109c0a0f05

My own experience using local (i.e., very small) models:
https://shape-of-code.com/2024/02/25/extracting-named-entities-from-a-change-log-using-an-llm/
