Yutaka HARA

Posted on Feb 6, 2021

How CRuby decides an `if` is a modifier

#ruby #parser

Ruby has two ways to write if.

if foo then bar end
foo if bar

This reads naturally to humans, but not to machines. For example, can you tell whether this code is valid?

p if 1 then 2 else 3 end

The answer is:

$ ruby -e 'p if 1 then 2 else 3 end'
-e:1: syntax error, unexpected `then', expecting end-of-input

This is because the if here is recognized as a "modifier if", not a "keyword if". So how does Ruby decide which kind of if it is?
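You can also check this from Ruby itself with Ripper, the standard library's interface to CRuby's parser. The following is only a small sketch: the helper name top_node is made up for illustration, and it relies on Ripper.sexp returning nil when the source has a syntax error.

```ruby
require "ripper"

# Hypothetical helper: type of the first top-level node,
# or nil when the source does not parse.
def top_node(src)
  sexp = Ripper.sexp(src)
  sexp && sexp[1][0][0]
end

p top_node("if foo then bar end")       # => :if
p top_node("foo if bar")                # => :if_mod
p top_node("p if 1 then 2 else 3 end")  # => nil (syntax error)
```

The two spellings produce different node types (:if and :if_mod), and the third line is rejected just like on the command line.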

parse.y

The answer should be in parse.y, which defines Ruby's grammar.

In parse.y, you will see two separate tokens, keyword_if and modifier_if. Since the grammar receives them as different tokens, the kind of if is decided by the lexer, not the parser.

lex.c.blt

By grepping for modifier_if, you will find that lex.c.blt has a table of keywords in the function rb_reserved_word.

#line 31 "defs/keywords"
      {gperf_offsetof(stringpool, 33), {keyword_if, modifier_if}, EXPR_VALUE},

parse.y

The lexer starts from yylex. It calls parser_yylex, which handles symbols like +, -, etc. If the character is not a symbol, parse_ident is called.

parse_ident uses rb_reserved_word to check whether a keyword begins at the current position. The returned kw is an entry of the table we saw in lex.c.blt.

    /* See if it is a reserved word.  */
    kw = rb_reserved_word(tok(p), toklen(p));

Each entry's id has two values, to distinguish keywords from modifiers. In the case of the if keyword, kw->id[0] is keyword_if and kw->id[1] is modifier_if.

According to lex.c.blt, Ruby has five modifiers.

x if y
x unless y
x while y
x until y
x rescue y
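As a quick cross-check of this list, Ripper reports a dedicated node type for each modifier form. This is a sketch using the standard ripper library; the helper name modifier_node is made up for illustration.

```ruby
require "ripper"

# Hypothetical helper: parse "x <kw> y" and return the top-level node type.
def modifier_node(kw)
  Ripper.sexp("x #{kw} y")[1][0][0]
end

%w[if unless while until rescue].each do |kw|
  puts "x #{kw} y  ->  #{modifier_node(kw)}"
end
```

Each of the five lines should print a node type ending in _mod, for example if_mod for x if y.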

When an `if` is a modifier

This is the condition that distinguishes keyword_if from modifier_if. In short, an if is a keyword if the lexer state is EXPR_BEG (or EXPR_LABELED); otherwise, it is a modifier.

            if (IS_lex_state_for(state, (EXPR_BEG | EXPR_LABELED)))
                return kw->id[0];
            else {
                if (kw->id[0] != kw->id[1])
                    SET_LEX_STATE(EXPR_BEG | EXPR_LABEL);
                return kw->id[1];
            }
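To make the branch easier to follow, here is a toy model of it in Ruby. It is a sketch, not CRuby's code: the bit values, the two-entry KEYWORDS table and the classify method are all invented for illustration, and the state update in the else branch is left out.

```ruby
# Toy lexer states. The bit positions are arbitrary, not CRuby's real values.
EXPR_BEG     = 1 << 0
EXPR_END     = 1 << 1
EXPR_ARG     = 1 << 2
EXPR_LABELED = 1 << 3

# keyword => [id[0], id[1]], mirroring the shape of the entries in lex.c.blt.
KEYWORDS = {
  "if"  => [:keyword_if, :modifier_if],
  "def" => [:keyword_def, :keyword_def], # no modifier form: both ids are equal
}.freeze

# Mirrors the C branch: id[0] if any of the BEG/LABELED bits is set, else id[1].
def classify(word, state)
  id0, id1 = KEYWORDS.fetch(word)
  (state & (EXPR_BEG | EXPR_LABELED)).zero? ? id1 : id0
end

p classify("if", EXPR_BEG)   # => :keyword_if
p classify("if", EXPR_END)   # => :modifier_if
p classify("def", EXPR_END)  # => :keyword_def
```

For a word like def, id[0] and id[1] are the same token, so the state makes no difference to the result.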

The lexer state

Among the lexer states, EXPR_BEG, EXPR_END and EXPR_ARG are the most important. They decide whether an operator like + or - is unary or binary. For example:

1 - 2: This is binary minus because the state is EXPR_END after the 1.
foo(-1): This is unary minus because the state is EXPR_BEG after the (.

EXPR_ARG is a bit tricky: in this state, the meaning of - depends on whether a space follows it.

foo - 1: binary minus
foo -1: unary minus

What is interesting is that this rule is not so difficult for humans. The former "looks like" binary and the latter "looks like" unary. So you will actually never be bothered by this, unless you are implementing the parser.
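The difference between the two spellings can be observed by running them. This is a minimal sketch; foo, binary_minus and unary_minus are names chosen for illustration, and foo has to be a method (not a local variable) for the spacing to matter.

```ruby
# A method with an optional argument, so both readings are valid calls.
def foo(x = 10)
  x
end

def binary_minus
  foo - 1   # parsed as foo() - 1
end

def unary_minus
  foo -1    # parsed as foo(-1); ruby -w may warn that this is ambiguous
end

p binary_minus  # => 9
p unary_minus   # => -1
```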

keyword if and modifier if

Now you can tell whether an if is a keyword or a modifier by checking the lexer state.

foo() if ...: This is modifier_if because the state is EXPR_END after the ).
foo(if ...): This is keyword_if because the state is EXPR_BEG after the (.
foo if ...: This is modifier_if because the state is EXPR_ARG after foo, just before the if.
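If you want to look at these states yourself, Ripper.lex returns the lexer state along with each token. The sketch below prints the state recorded for the token just before the first if. The helper name state_before_if is made up for illustration, and it assumes the fourth element of each Ripper.lex entry is the state after that token. The names printed can differ between Ruby versions, and Ripper may report a more specific variant of a state than the names used above.

```ruby
require "ripper"

# Hypothetical helper: the lexer state Ripper recorded for the token
# immediately before the first `if`, as a string (nil if there is none).
def state_before_if(src)
  tokens = Ripper.lex(src)
  idx = tokens.index { |_pos, event, text, _state| event == :on_kw && text == "if" }
  return nil if idx.nil? || idx.zero?
  tokens[idx - 1][3].to_s
end

["foo() if x", "foo(if x then 1 end)", "foo if x"].each do |src|
  puts "#{src.ljust(22)} state before if: #{state_before_if(src)}"
end
```

Only the second case should show a state containing BEG, which matches the rule that an if is a keyword only at the beginning of an expression.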

Why this matters to me

I think most Rubyists do not care about corner cases like this. However, I needed to figure this out because I'm making my own programming language, Shiika, which has Ruby-like syntax.

As you've seen, parsing Ruby-like syntax is not easy, especially parsing method calls without parentheses. I'd be happy if this entry helps someone who wants to make a Rubyish language.
