Ever since I started programming, I have tried to come up with interesting projects for myself, even if they are purely for educational purposes. ...
For further actions, you may consider blocking this person and/or reporting abuse
Awesome! Coincidently I have begun a similar personal project - a CYK parser - just five days ago. It can currently parse nondeterministic context free languages which includes most programming languages but I have encountered the exact same problem of operator precedence and operator associativity, which is one point I'm currently trying to resolve.
My plan is to solve these issues by performing rebalancing and sift down operations directly on the AST.
I have also made the project available on GitHub if you're interested.
Awesome. CYK parsing is a term I have run across, but I am not familiar with how they work. You should write an article on it!
As I have done some research on operator precedence parsing, I have found another way which is absolutely trivial to implement:
Just put a (( at the beginning of the equation, a )) at the end and replace every + with a ))+(( and every * with a )*(. By doing this your expression is correctly parenthesized without any complicated algorithms. (Found it on Wikipedia)
I have exams at the moment but if I have some time, I will do that.
What is the difference between pratt parsing and Antlr's method?
Are they the same ?
About Antlr's method can look this:
《The Definitive ANTLR 4 Reference》-> ANTLR Reference -> Removing Direct Left-Recursion -> Left-Recursive Rule Transformations
I think they seem to be the same,I want to know your opinion. thanks!
I forget which class of Parser ANTLR4 generates but I thought it was LALR or something similar. I am not familiar enough with how ANTLR works internally to make any equivalency comparisons with Pratt parsing.
One of the things I value is being able to maintain code directly (instead of relying on a code-generator, such as ANTLR, or YACC/BISON, etc.), so that is why I gravitated to Pratt parsing. Some folks in this field would disagree with me be countering that maintaining a grammar is easier than maintaining parsing code. However, when one does not rely on too many layers of abstraction, it comes with the benefit of flexibility.
I'm also the type of person who does not like too much "magic", and having a parser generator generate code for me that works without my understanding it does not settle well with me. Eventually I would like to study how parser generators work, but for now, Pratt parsing struck that balance of simplicity that I craved.
Excellent article. I first came across this method in the '70s with Steven Pemberton and Martin Daniel's excellent pascal compiler. homepages.cwi.nl/~steven/pascal/
How would I manage to get nested parentheses to work
Good question! First, we define the BP of ")" to be a low number that will always be guaranteed to stop the current parse-run. For example:
Next, we define the NUD of "(" to read the next expression, and then eat the closing ")":
For a fully working example, I have implemented this in my JavaScript calculator expression evaluator:
github.com/jrop/pratt-calculator/b...
That's pretty nice! I actually got around with just setting a var pLayer to 0. Incrementing it for every open parenthese and decrementing it for every closing parenthese. Then just applied a bias of pLayer*3 to the operator precedence. Would this run in faster time?
It's hard to say without running some performance benchmarks.
How would I add keywords, postfix, prefix, functions and all that in this to create a fully functional programming language?
Briefly:
NUD('func') = parseFunction()
, whereparseFunction()
, reads afunc
keyword, and identifier, and then a parameter list. Later when you are parsing the function body, you invoke a statement parser, and the statement parser optionally calls back into the expression parser (Pratt in this case).I'm confused as how I would:
A. Go about separating the lexer from tokenizing ++ instead of + and then another +.
B. Add mixfix and prefix operators as well.
A. In your lexer, you if you come across a "+", peek at the next char and check if it is a "+": if so, then consume it and emit "++" as a token, otherwise, just "+"
B. Implement both LED (infix) and NUD (prefix) for the same operator; your implementation will define the behavior of each to get the overall mixfix behavior!
So cool!
I managed to make a parser without operation order but with the four basic operators. (Without googling :))