DEV Community

Pratt Parsing

Jonathan Apodaca on August 11, 2017

Ever since I started programming, I have tried to come up with interesting projects for myself, even if they are purely for educational purposes. ...

Read full post

Palle • Aug 12 '17 • Edited

Awesome! Coincidently I have begun a similar personal project - a CYK parser - just five days ago. It can currently parse nondeterministic context free languages which includes most programming languages but I have encountered the exact same problem of operator precedence and operator associativity, which is one point I'm currently trying to resolve.

My plan is to solve these issues by performing rebalancing and sift down operations directly on the AST.

I have also made the project available on GitHub if you're interested.

Jonathan Apodaca • Aug 12 '17

Awesome. CYK parsing is a term I have run across, but I am not familiar with how they work. You should write an article on it!

Palle • Aug 14 '17 • Edited

As I have done some research on operator precedence parsing, I have found another way which is absolutely trivial to implement:

Just put a (( at the beginning of the equation, a )) at the end and replace every + with a ))+(( and every * with a )*(. By doing this your expression is correctly parenthesized without any complicated algorithms. (Found it on Wikipedia)

Palle • Aug 12 '17

I have exams at the moment but if I have some time, I will do that.

kirito • Aug 9 '21

What is the difference between pratt parsing and Antlr's method?
Are they the same ？

About Antlr's method can look this:
《The Definitive ANTLR 4 Reference》-> ANTLR Reference -> Removing Direct Left-Recursion -> Left-Recursive Rule Transformations

I think they seem to be the same，I want to know your opinion. thanks!

Jonathan Apodaca • Aug 9 '21

I forget which class of Parser ANTLR4 generates but I thought it was LALR or something similar. I am not familiar enough with how ANTLR works internally to make any equivalency comparisons with Pratt parsing.

One of the things I value is being able to maintain code directly (instead of relying on a code-generator, such as ANTLR, or YACC/BISON, etc.), so that is why I gravitated to Pratt parsing. Some folks in this field would disagree with me be countering that maintaining a grammar is easier than maintaining parsing code. However, when one does not rely on too many layers of abstraction, it comes with the benefit of flexibility.

I'm also the type of person who does not like too much "magic", and having a parser generator generate code for me that works without my understanding it does not settle well with me. Eventually I would like to study how parser generators work, but for now, Pratt parsing struck that balance of simplicity that I craved.

Andy Thomson • Apr 1 '20

Excellent article. I first came across this method in the '70s with Steven Pemberton and Martin Daniel's excellent pascal compiler. homepages.cwi.nl/~steven/pascal/

KryptoCrash • Apr 8 '19

How would I manage to get nested parentheses to work

Jonathan Apodaca • Apr 13 '19 • Edited

Good question! First, we define the BP of ")" to be a low number that will always be guaranteed to stop the current parse-run. For example:

bp(")") = 0

Next, we define the NUD of "(" to read the next expression, and then eat the closing ")":

nud("(") => {
  const e = expr(0);
  lexer.expect(")");
  return e;
}

For a fully working example, I have implemented this in my JavaScript calculator expression evaluator:

github.com/jrop/pratt-calculator/b...

KryptoCrash • Apr 13 '19

That's pretty nice! I actually got around with just setting a var pLayer to 0. Incrementing it for every open parenthese and decrementing it for every closing parenthese. Then just applied a bias of pLayer*3 to the operator precedence. Would this run in faster time?

Jonathan Apodaca • Apr 15 '19

It's hard to say without running some performance benchmarks.

KryptoCrash • Apr 29 '19

How would I add keywords, postfix, prefix, functions and all that in this to create a fully functional programming language?

Jonathan Apodaca • Apr 29 '19

Briefly:

keywords - recognition of keywords happens at the lexing stage
postfix - postfix operators are just operators in the "LED" context that consume no right expression
functions - you can break out of the Pratt parser at any moment to a traditional recursive-descent parser. For example, you could define NUD('func') = parseFunction(), where parseFunction(), reads a func keyword, and identifier, and then a parameter list. Later when you are parsing the function body, you invoke a statement parser, and the statement parser optionally calls back into the expression parser (Pratt in this case).

KryptoCrash • Apr 29 '19 • Edited

I'm confused as how I would:
A. Go about separating the lexer from tokenizing ++ instead of + and then another +.

B. Add mixfix and prefix operators as well.

Jonathan Apodaca • May 1 '19

A. In your lexer, you if you come across a "+", peek at the next char and check if it is a "+": if so, then consume it and emit "++" as a token, otherwise, just "+"
B. Implement both LED (infix) and NUD (prefix) for the same operator; your implementation will define the behavior of each to get the overall mixfix behavior!

Rishit Khandelwal • Dec 24 '20 • Edited

So cool!
I managed to make a parser without operation order but with the four basic operators. (Without googling :))