DEV Community

Adam Crockett πŸŒ€
Adam Crockett πŸŒ€

Posted on

Brace matching, it's harder than it sounds!

When writing a language with braces (I can see why whitespace sensitive languages are a thing now), one vital thing you will need to know is what thing is inside what brace. I am writing a sort of subset of JavaScript in the style of es5 (more like es 5.5 because some es6 features are just good!). I need to know how many braces are within a range of line no's, between L1 and L5 for example.

Given the sample:

const input = `
{
    {
     // Depth 2
    }
}
`;
Enter fullscreen mode Exit fullscreen mode

Initially I thought line by line we step through the input, when we encounter the token { L_BRACE we should increment a count by one, when we encounter a } R_BRACE we should decrement.
In the sample we should have the following count.

0
1
2
2
2
1
0
Enter fullscreen mode Exit fullscreen mode

That looks great doesn't it! Not only do we know block we are in we also know the depth, ideal, but an object literal is not a block scope, so how do we avoid that, what about destructuring? Okay so we could avoid this whole mess with a simple (yet incorrect assumption), what do blocked scoped things have in common?

function name(){
function (){
class Foo {
class {
for () {
try {
while {
do {
with {
if {
switch {
{

Enter fullscreen mode Exit fullscreen mode

Okay that's all I could remember this time In the morning β˜•. In answering my previous question Where should the brace sit? At the End Of Line (EOL), so I could check that the opening brace is preceded by a whitelist of words, or it is not an assignment AND the brace is at EOL. Great sorted yes? No, because some programmers like my brother care not for good formatting and could do this.

function anoying () { const imATroll = true;
    // I laugh at your feable algorithms
}
Enter fullscreen mode Exit fullscreen mode

My brother is such a pain, because now the brace is not at the end of the line, we can't use this algorithm. I'm stumped πŸ™„, I'm sure I will solve it and that's my goal for this week.

What other problems could I solve?
Well mismatched braces are easier to detect, nievely you could filter the lines containing the braces that fit the above unsolved criteria, then count the length, if the number is odd, we can say stop! This program is broken... But where?

Looping through line by line will not tell you where the program broke because we need the entire program to determine what looks a bit off.

This problem is so simple in theory, anyway wish me luck and add your suggestions in the comments!

Top comments (4)

Collapse
 
rgaiken profile image
rgaiken

Try using a stack for all bracket elements. So everything like ({[ and whatever else I'm forgetting. Store the bracket type along with the line number and any other useful debugging. If you get a closing bracket, check it against the top of the stack. If it completes the top of stack bracket, pop it. If it doesn't match, throw an error, using the debugging info from both the current line and the top of the stack. At end of file, the stack should be empty.

Collapse
 
adam_cyclones profile image
Adam Crockett πŸŒ€

Now this is the traditional approach, I admit I dismissed this as I need to scrape the content between braces and know the depth, as this is a scss + js like language, I will take a look at using a stake and additional scraping content. Thanks for the suggestion 😁

 
adam_cyclones profile image
Adam Crockett πŸŒ€

To give a little context to what I am doing and all the progress so far take a look at the 1st part of the series. This is a research project for me to learn about algorithms and language design, I am learning so much. I do normally take the same stance, use existing tools don't rewrite, don't overengineer etc, but like I say it's a bit of fun.

The language is a smashing together of CSS and JS and I plan to do the actual parsing in with something like PEG's and parser combinators. Architecturally speaking I have several challenges to face so I have gone down the Wasm route.

I have seen Yacc around and Marpa and of course PEG, I choose what I felt was making the most progress in the time I have. Anyway take a read you might have some more ideas, cheers Andy.

Collapse
 
adam_cyclones profile image
Adam Crockett πŸŒ€

I am writing a language as I say, so this you are correct but I am responsible for writing the tokenizer/lexer/parser. The joke about my brother is just me being silly.