Konstantin Grechishchev

Posted on Aug 13, 2022

Writing a new programming language. Part III: Type System

#rust #tutorial #programming #grammar

Hello everyone!

Welcome to the third part of the new LR-language tutorial series. Here is the link for Part II. Last time we've successfully implemented the first version of the LR-language interpreter. We were able to write programs with variables of the integer type and use them in the arithmetical expressions, assigning the results to other variables.

The state of the code by the end of this tutorial could be found in the part_3 branch of the GitHub repository. The changes made today could be seen in the e5e04bc commit

Goal for today

The goal for today is to extend the language to support String and float types as well! We will also improve runtime error reporting a bit. We would like to enhance LR-language to be able to run the following program:

{
    let hello: String = "Hello";
    let world: String = "World";
    let hello_word: String = hello + " " + world;

    let pi: Float = 3.14;
    let r: Int = 5;
    let square: Float = 2 * pi * r * r;
    let value: String = "The square of the circle with the r = " + r + " is " + square;
}

Representing values and types

Values

As a recall, here is our current representation of the expression in the AST:

pub enum Expr {
    Number(i32),
    Identifier(String),
    Op(Box<Expr>, Opcode, Box<Expr>),
}

The Number(i32) is the topic of today's discussion. It is produced every time we see a numeric literal in the LR-program code:

Num: i32 = {
    r"-?[0-9]+" => i32::from_str(<>).unwrap()
};
Term: Box<Expr> = {
    Num => Box::new(Expr::Number(<>)),
    Identifier => Box::new(Expr::Identifier(<>)),
    "(" <Expr> ")"
};

However, we would like to support more types now, so it would make sense to rename Number to Constant. The i32 type as a value would nigher work any longer, so we replace it with the following enum:

pub enum Value {
    Int(i32),
    Float(f32),
    String(String),
}

The updated section of the grammar looks like the following:

IntNum: i32 = {
    r"-?[0-9]+" => i32::from_str(<>).unwrap()
};

FloatNum: f32 = {
    r"-?[0-9]+\.[0-9]+" => f32::from_str(<>).unwrap()
};

StringLiteral: String = {
    r#""[^"]*""# => <>[1..<>.len() - 1].to_owned()
};

Term: Box<Expr> = {
    IntNum => Box::new(Expr::Constant(Value::Int(<>))),
    FloatNum => Box::new(Expr::Constant(Value::Float(<>))),
    StringLiteral => Box::new(Expr::Constant(Value::String(<>))),
    Identifier => Box::new(Expr::Identifier(<>)),
    "(" <Expr> ")"
};

We introduce 2 new types of literals: the number containing a dot (FloatNum) and any characters in double quotes (StringLiteral). The weird <>[1..<>.len() - 1] expression simply removes the first and last character of the literal as we would like to get the string literal inside double quotes, but without them.

Types

There are two ways to deal with variable types. The first option is to derive them based on the value assigned. The second is to explicitly specify the type when you declare a variable. The first one is often the variation of the second with the type specifier being optional, so to keep it simple, we will implement option 2 for now.

First, we need to define the Type enum:

pub enum Type {
    Int,
    Float,
    String,
}

and the grammar for it:

Type: Type = {
    "Int" => Type::Int,
    "String" => Type::String,
    "Float" => Type::Float,
}

Nothing special here.

As the next step, we make a slight modification to the Statement production rule

Statement: Statement = {
    "let" <Identifier> ":" <Type> "=" <Expr> ";" => Statement::new_definition(<>),
    <Identifier> "=" <Expr> ";" => Statement::new_assignment(<>)
};

While we keep the assignment rule the same, we make it mandatory to specify the type when declaring a variable. The <> operator will now pass 3 arguments to the new_definition method, so we have to modify it as well:

pub enum Statement {
    Assignment {
        identifier: String,
        expression: Box<Expr>,
    },
    Definition {
        identifier: String,
        expression: Box<Expr>,
        value_type: Type,
    },
}

impl Statement {
    pub fn new_assignment(identifier: String, expression: Box<Expr>) -> Self {
        Self::Assignment {
            identifier,
            expression,
        }
    }

    pub fn new_definition(identifier: String, value_type: Type, expression: Box<Expr>) -> Self {
        Self::Definition {
            identifier,
            value_type,
            expression,
        }
    }
}

The StatementBody struct cannot be reused any longer, so we just get rid of it.

Display traits

These are the only changes we had to make to add type system support to our grammar and syntax tree. Before we start making changes to the interpreter part, let's implement the Display for our AST structs. This would help to print them using {} in a human-readable form and allow us to create nice runtime error messages. Here is an example of the Display trait implementation for Expr enum:

impl Display for Expr {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            Expr::Constant(v) => write!(f, "{}", v),
            Expr::Identifier(id) => write!(f, "{}", id),
            Expr::Op(e1, op, e2) => write!(f, "({} {} {})", e1, op, e2),
        }
    }
}

The implementation of it for other types looks very similar, so I omit it here, but you can find them in the GitHub repository.

Update the interpreter

The grammar and AST are ready, however, our code does not yet compile, because we need to make changes to the Frame struct to store value type as well as implement the arithmetical expressions for the new enum.

Variable storage

We start with a trivial change to replace the type of local_variables field HashMap<String, i32> with local_variables: HashMap<String, Value>. All the methods of the Frame struct should now accept and return Value instead of i32 correspondingly.

We should also perform an extra check in the assign_value and define_variable methods: we don't want to allow assigning the value of, let's say, String type to the variable declared as Integer:

pub fn assign_value(&mut self, variable_name: &str, value: Value) -> Result<(), VariableError> {
    if let Some(variable) = self.local_variables.get_mut(variable_name) {
        if Type::from(variable.deref()) == Type::from(&value) {
            *variable = value;
            Ok(())
        } else {
            Err(VariableError::TypeMismatch(
                Type::from(&value),
                variable_name.to_owned(),
                Type::from(variable.deref()),
            ))
        }
    } else if let Some(parent) = self.parent.as_mut() {
        parent.assign_value(variable_name, value)
    } else {
        Err(VariableError::UndefinedVariable(variable_name.to_owned()))
    }
}

In order to represent this type of error, we introduce the new error enum VariableError and also move the UndefinedVariable error type to it:

#[derive(Error, Debug)]
pub enum VariableError {
    #[error("Variable {0} is not defined")]
    UndefinedVariable(String),
    #[error("Unable to assign the value of type {0} to variable '{1}' of type {2}")]
    TypeMismatch(Type, String, Type),
}

Please refer to the frame.rs to see the full source code of the updated Frame struct

Arithmetic operations

If we try to compile the code now, the rust compiler will return us an error explaining that it does not know how to apply the +, -, * and / between the instances of Value enum. Fortunately, all we have to do is to implement Add, Sub, Mul and Div traits for it!

The main problem to solve here is to figure out the result type of the operation. It is obvious that 5 + 5 is 10, but what should happen when you write 5 + "hello"? Shall it be an error or shall it return a string containing 5hello?

Let's establish the following rules:

The result of the Int division is Int, the % remainder operations in not supported for now.
For any operation involving float and int, the result type should be float and we would cast int to float before the operation.
Addition of anything to the string results in a new string. No other operations except + are supported for a string.

With this in mind, we can implement the Add trait for the Value:

impl Add for Value {
    type Output = Result<Value, OperationError>;

    fn add(self, other: Self) -> Self::Output {
        let value = match self {
            Value::Int(v1) => match other {
                Value::Int(v2) => Value::Int(v1 + v2),
                Value::Float(v2) => Value::Float(v1 as f32 + v2),
                Value::String(v2) => Value::String(v1.to_string() + v2.as_str()),
            },
            Value::Float(v1) => match other {
                Value::Int(v2) => Value::Float(v1 + v2 as f32),
                Value::Float(v2) => Value::Float(v1 + v2),
                Value::String(v2) => Value::String(v1.to_string() + v2.as_str()),
            },
            Value::String(v1) => match other {
                Value::Int(v2) => Value::String(v1 + v2.to_string().as_str()),
                Value::Float(v2) => Value::String(v1 + v2.to_string().as_str()),
                Value::String(v2) => Value::String(v1 + v2.as_str()),
            },
        };
        Ok(value)
    }
}

Note that despite add is never returning an error, the return type of it is still Result<Value, OperationError>. The reason for it is to keep it consistent with other traits implementations and avoid making changes to the method signature when we add new types in the future.

The error enum is having only one variant for now:

#[derive(Error, Debug)]
pub enum OperationError {
    #[error("Operation {0} {1} {2} is not defined")]
    IncompatibleTypes(Type, Opcode, Type),
}

We also define a helpful macro to produce the error in the operations which could fail:

macro_rules! error {
    ($type_1:ident, $op:ident, $type_2:ident) => {
        Err(OperationError::IncompatibleTypes(
            Type::$type_1,
            Opcode::$op,
            Type::$type_2,
        ))
    };
}

We can use it to implement e.g. a multiplication operation:

impl Mul for Value {
    type Output = Result<Value, OperationError>;

    fn mul(self, other: Self) -> Self::Output {
        match self {
            Value::Int(v1) => match other {
                Value::Int(v2) => Ok(Value::Int(v1 * v2)),
                Value::Float(v2) => Ok(Value::Float(v1 as f32 * v2)),
                Value::String(_) => error!(Int, Mul, String),
            },
            Value::Float(v1) => match other {
                Value::Int(v2) => Ok(Value::Float(v1 * v2 as f32)),
                Value::Float(v2) => Ok(Value::Float(v1 * v2)),
                Value::String(_) => error!(Float, Mul, String),
            },
            Value::String(_) => match other {
                Value::Int(_) => error!(String, Mul, Int),
                Value::Float(_) => error!(String, Mul, Float),
                Value::String(_) => error!(String, Mul, String),
            },
        }
    }
}

I omit implementations of Sub and Div traits as they are identical to the above. You can refer to the operations.rs file to have a look at them.

Putting everything together

The only part left is to make a set of "cosmetic" changes to the evalutate_expression method. Here is the updated code of it:

#[derive(Error, Debug)]
pub enum ExpressionError {
    #[error("Unable to evalutate expression {0}: {1}")]
    VariableError(String, VariableError),
    #[error("Unable to evalutate expression {0}: {1}")]
    OperationError(String, OperationError),
}

pub fn evalutate_expression(frame: &mut Frame, expr: &Expr) -> Result<Value, ExpressionError> {
    match expr {
        Expr::Constant(n) => Ok(n.clone()),
        Expr::Op(exp1, opcode, exp2) => {
            let result = match opcode {
                Opcode::Mul => {
                    evalutate_expression(frame, exp1)? * evalutate_expression(frame, exp2)?
                }
                Opcode::Div => {
                    evalutate_expression(frame, exp1)? / evalutate_expression(frame, exp2)?
                }
                Opcode::Add => {
                    evalutate_expression(frame, exp1)? + evalutate_expression(frame, exp2)?
                }
                Opcode::Sub => {
                    evalutate_expression(frame, exp1)? - evalutate_expression(frame, exp2)?
                }
            };
            result.map_err(|e| ExpressionError::OperationError(expr.to_string(), e))
        }
        Expr::Identifier(variable) => frame
            .variable_value(variable)
            .map_err(|e| ExpressionError::VariableError(expr.to_string(), e)),
    }
}

First, we introduce an ExpressionError enum to wrap new types of errors defined above. Note that we use the Display traits implemented for AST structs to form an error message. It allows us to return nice runtime errors to a user, for example:

Unable to evaluate expression ("hello" / 5): Operation String / Int is not defined"

Second, the variable_value method and arithmetic operation could now return an error, so we need to handle it and cast to ExpressionError.

Summary

That is it! We can now run the program mentioned above. Here is how the stack frame looks after the execution:

Frame {
    parent: None,
    local_variables: {
        "r": Int(
            5,
        ),
        "hello_word": String(
            "Hello World",
        ),
        "square": Float(
            157.0,
        ),
        "world": String(
            "World",
        ),
        "pi": Float(
            3.14,
        ),
        "value": String(
            "The square of the circle with the r = 5 is 157",
        ),
        "hello": String(
            "Hello",
        ),
    },
}

Amazing, isn't it?

The plan for part IV is to add support for the boolean type and to implement the if statements!

I'll publish it if this post gets 15 reactions!

Stay in touch!

DEV Community

Writing a new programming language. Part III: Type System

Goal for today

Representing values and types

Values

Types

Display traits

Update the interpreter

Variable storage

Arithmetic operations

Putting everything together

Summary

Top comments (0)

Read next

Top APIs Every Frontend Developer Should Use in 2025

Why Are So Many Programmers Drawn to Religion? A Curious Exploration

New AI Model Uses Document Screenshots to Revolutionize Search Across Text and Images

Understanding Color Gradients in Digital Design: A Comprehensive Guide