DEV Community

Pratt Parsing

Jonathan Apodaca on August 11, 2017

Ever since I started programming, I have tried to come up with interesting projects for myself, even if they are purely for educational purposes. ...
Collapse
 
palle profile image
Palle • Edited

Awesome! Coincidently I have begun a similar personal project - a CYK parser - just five days ago. It can currently parse nondeterministic context free languages which includes most programming languages but I have encountered the exact same problem of operator precedence and operator associativity, which is one point I'm currently trying to resolve.

My plan is to solve these issues by performing rebalancing and sift down operations directly on the AST.

I have also made the project available on GitHub if you're interested.

Collapse
 
jrop profile image
Jonathan Apodaca

Awesome. CYK parsing is a term I have run across, but I am not familiar with how they work. You should write an article on it!

Collapse
 
palle profile image
Palle • Edited

As I have done some research on operator precedence parsing, I have found another way which is absolutely trivial to implement:

Just put a (( at the beginning of the equation, a )) at the end and replace every + with a ))+(( and every * with a )*(. By doing this your expression is correctly parenthesized without any complicated algorithms. (Found it on Wikipedia)

Collapse
 
palle profile image
Palle

I have exams at the moment but if I have some time, I will do that.

Collapse
 
kirito41dd profile image
kirito

What is the difference between pratt parsing and Antlr's method?
Are they the same ?

About Antlr's method can look this:
The Definitive ANTLR 4 Reference》-> ANTLR Reference -> Removing Direct Left-Recursion -> Left-Recursive Rule Transformations

I think they seem to be the same,I want to know your opinion. thanks!

Collapse
 
jrop profile image
Jonathan Apodaca

I forget which class of Parser ANTLR4 generates but I thought it was LALR or something similar. I am not familiar enough with how ANTLR works internally to make any equivalency comparisons with Pratt parsing.

One of the things I value is being able to maintain code directly (instead of relying on a code-generator, such as ANTLR, or YACC/BISON, etc.), so that is why I gravitated to Pratt parsing. Some folks in this field would disagree with me be countering that maintaining a grammar is easier than maintaining parsing code. However, when one does not rely on too many layers of abstraction, it comes with the benefit of flexibility.

I'm also the type of person who does not like too much "magic", and having a parser generator generate code for me that works without my understanding it does not settle well with me. Eventually I would like to study how parser generators work, but for now, Pratt parsing struck that balance of simplicity that I craved.

Collapse
 
andythomason profile image
Andy Thomson

Excellent article. I first came across this method in the '70s with Steven Pemberton and Martin Daniel's excellent pascal compiler. homepages.cwi.nl/~steven/pascal/

Collapse
 
kryptocrash profile image
KryptoCrash

How would I manage to get nested parentheses to work

Collapse
 
jrop profile image
Jonathan Apodaca • Edited

Good question! First, we define the BP of ")" to be a low number that will always be guaranteed to stop the current parse-run. For example:

bp(")") = 0

Next, we define the NUD of "(" to read the next expression, and then eat the closing ")":

nud("(") => {
  const e = expr(0);
  lexer.expect(")");
  return e;
}

For a fully working example, I have implemented this in my JavaScript calculator expression evaluator:

github.com/jrop/pratt-calculator/b...

Collapse
 
kryptocrash profile image
KryptoCrash

That's pretty nice! I actually got around with just setting a var pLayer to 0. Incrementing it for every open parenthese and decrementing it for every closing parenthese. Then just applied a bias of pLayer*3 to the operator precedence. Would this run in faster time?

Thread Thread
 
jrop profile image
Jonathan Apodaca

It's hard to say without running some performance benchmarks.

Collapse
 
kryptocrash profile image
KryptoCrash

How would I add keywords, postfix, prefix, functions and all that in this to create a fully functional programming language?

Collapse
 
jrop profile image
Jonathan Apodaca

Briefly:

  • keywords - recognition of keywords happens at the lexing stage
  • postfix - postfix operators are just operators in the "LED" context that consume no right expression
  • functions - you can break out of the Pratt parser at any moment to a traditional recursive-descent parser. For example, you could define NUD('func') = parseFunction(), where parseFunction(), reads a func keyword, and identifier, and then a parameter list. Later when you are parsing the function body, you invoke a statement parser, and the statement parser optionally calls back into the expression parser (Pratt in this case).
Collapse
 
kryptocrash profile image
KryptoCrash • Edited

I'm confused as how I would:
A. Go about separating the lexer from tokenizing ++ instead of + and then another +.

B. Add mixfix and prefix operators as well.

Thread Thread
 
jrop profile image
Jonathan Apodaca

A. In your lexer, you if you come across a "+", peek at the next char and check if it is a "+": if so, then consume it and emit "++" as a token, otherwise, just "+"
B. Implement both LED (infix) and NUD (prefix) for the same operator; your implementation will define the behavior of each to get the overall mixfix behavior!

Collapse
 
rishitkhandelwal profile image
Rishit Khandelwal • Edited

So cool!
I managed to make a parser without operation order but with the four basic operators. (Without googling :))