Hello everyone. In the last article, I introduced the Turtle Graphics Android app project, with implementation details and resources about the scripting language, the editor, generating documentation, etc. After I published the app, it got more than 2000 downloads along with good ratings and feedback, so I decided to add support for code formatting. In this article, I will talk in detail about how a simple code formatter works and how I implemented one in the Turtle Graphics app.
For us programmers, code formatters are an essential tool in our day-to-day jobs. They make code easier to read, but have you ever asked yourself how they work?
Before talking about code formatters, let's first talk about how compilers turn your code from text into a data structure that they can process, for example to do type checking.
Let's start our story with a file that contains a simple hello world example:
fun main() {
    print("Hello, World!")
}
The first step is to read this text file and convert it into a list of tokens. A token is an object that represents a keyword, number, bracket, string, etc., together with its position in the source code, for example:
data class Token(
    val kind: TokenKind,
    val literal: String,
    val line: Int,
)
We can also save the file name and the start and end columns, so that when we want to report an error we can provide useful information about its position, for example:
Error in File Main Line 10: Missing semicolon :D
This step is called the scanner, lexer, or tokenizer, and at the end of it we end up with a list of tokens, for example:
{ FUN_KEYWORD, "fun", 1 }
{ IDENTIFIER, "main", 1 }
{ LEFT_PAREN, "(", 1 }
{ RIGHT_PAREN, ")", 1 }
{ LEFT_BRACE, "{", 1 }
{ IDENTIFIER, "print", 2 }
{ LEFT_PAREN, "(", 2 }
{ STRING, "Hello, World!", 2 }
{ RIGHT_PAREN, ")", 2 }
{ RIGHT_BRACE, "}", 3 }
The result is a list of tokens:
val tokens : List<Token> = tokenizer(input)
Note that in this step we can already check for some errors, such as an unterminated string or char, unsupported symbols, etc.
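To make the tokenizer step more concrete, here is a minimal sketch of what such a function could look like for the hello world example. It is not the actual Turtle Graphics code: the TokenKind values, the LexerException class, and the rule that strings stay on one line without escapes are all assumptions made for this illustration.

```kotlin
// Hypothetical token kinds, covering only what the hello world example needs.
enum class TokenKind {
    FUN_KEYWORD, IDENTIFIER, STRING,
    LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE
}

data class Token(val kind: TokenKind, val literal: String, val line: Int)

// Hypothetical exception type for errors found while scanning.
class LexerException(message: String) : Exception(message)

// Symbols that always form a token on their own.
val singleCharKinds = mapOf(
    '(' to TokenKind.LEFT_PAREN,
    ')' to TokenKind.RIGHT_PAREN,
    '{' to TokenKind.LEFT_BRACE,
    '}' to TokenKind.RIGHT_BRACE
)

fun tokenizer(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var position = 0
    var line = 1
    while (position < input.length) {
        val current = input[position]
        when {
            // Track the line number so errors can point at a position.
            current == '\n' -> { line++; position++ }
            current.isWhitespace() -> position++
            current in singleCharKinds -> {
                tokens.add(Token(singleCharKinds.getValue(current), current.toString(), line))
                position++
            }
            // Assumption: strings do not span lines and have no escape sequences.
            current == '"' -> {
                val end = input.indexOf('"', position + 1)
                if (end == -1) throw LexerException("Line $line: unterminated string")
                tokens.add(Token(TokenKind.STRING, input.substring(position + 1, end), line))
                position = end + 1
            }
            current.isLetter() || current == '_' -> {
                val start = position
                while (position < input.length &&
                    (input[position].isLetterOrDigit() || input[position] == '_')
                ) {
                    position++
                }
                val word = input.substring(start, position)
                val kind = if (word == "fun") TokenKind.FUN_KEYWORD else TokenKind.IDENTIFIER
                tokens.add(Token(kind, word, line))
            }
            else -> throw LexerException("Line $line: unsupported symbol '$current'")
        }
    }
    return tokens
}
```

Calling tokenizer on the hello world file gives the same kinds of tokens as in the list above, and a string that is never closed is reported as an error instead of being silently accepted.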
After this step, you can forget about your text file and deal only with this list of tokens. Now we should convert the tokens into nodes according to our language grammar. For example, when we see FUN_KEYWORD, that means we will build a function declaration node, and we expect a name, parentheses, parameters, etc.
In this step, we need a data structure that represents the program in a way we can traverse and validate later. It is called an Abstract Syntax Tree (AST). Each node in the AST represents a statement such as if, while, function declaration, or variable declaration, or an expression such as an assignment or a unary expression. Each node stores the information required by the next steps, for example:
Function Declaration
data class Function(
    var name: String,
    var arguments: List<Argument>,
    var body: List<Statement>
)
Variable Declaration
data class Var(
    var name: String,
    var value: Expression
)
This step is called parsing, and we end up with an AST object that we can use later to traverse all nodes.
var astNode = parse(tokens)
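As a rough illustration of the parsing step, here is a small recursive descent parser sketch. It only understands a made-up mini grammar (a function without parameters whose body contains `var name = number` lines), and the names FunctionDeclaration, NumberLiteral, ParseException, and the extra token kinds are assumptions for this example, not the real Turtle Graphics grammar.

```kotlin
// Hypothetical token kinds for a tiny grammar: fun name() { var x = 1 ... }
enum class TokenKind {
    FUN_KEYWORD, VAR_KEYWORD, IDENTIFIER, NUMBER, EQUAL,
    LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE
}

data class Token(val kind: TokenKind, val literal: String, val line: Int)

// Simplified AST nodes; a real language has many more node types.
sealed interface Expression
data class NumberLiteral(val value: Int) : Expression

sealed interface Statement
data class Var(val name: String, val value: Expression) : Statement
data class FunctionDeclaration(val name: String, val body: List<Statement>) : Statement

// Hypothetical exception type for syntax errors.
class ParseException(message: String) : Exception(message)

class Parser(private val tokens: List<Token>) {
    private var position = 0

    // FUN_KEYWORD tells us to build a function node, then we expect
    // a name, parentheses, and a body between braces.
    fun parseFunction(): FunctionDeclaration {
        expect(TokenKind.FUN_KEYWORD)
        val name = expect(TokenKind.IDENTIFIER).literal
        expect(TokenKind.LEFT_PAREN)
        expect(TokenKind.RIGHT_PAREN)
        expect(TokenKind.LEFT_BRACE)
        val body = mutableListOf<Statement>()
        while (position < tokens.size && tokens[position].kind != TokenKind.RIGHT_BRACE) {
            body.add(parseVar())
        }
        expect(TokenKind.RIGHT_BRACE)
        return FunctionDeclaration(name, body)
    }

    private fun parseVar(): Var {
        expect(TokenKind.VAR_KEYWORD)
        val name = expect(TokenKind.IDENTIFIER).literal
        expect(TokenKind.EQUAL)
        val number = expect(TokenKind.NUMBER)
        val value = number.literal.toIntOrNull()
            ?: throw ParseException("Line ${number.line}: invalid number '${number.literal}'")
        return Var(name, NumberLiteral(value))
    }

    // Consume the next token if it has the expected kind, otherwise report an error.
    private fun expect(kind: TokenKind): Token {
        val token = tokens.getOrNull(position)
            ?: throw ParseException("Unexpected end of input, expected $kind")
        if (token.kind != kind) {
            throw ParseException("Line ${token.line}: expected $kind but found ${token.kind}")
        }
        position++
        return token
    }
}

fun parse(tokens: List<Token>): FunctionDeclaration = Parser(tokens).parseFunction()
```

The expect helper is where most syntax errors come from: whenever the next token is not what the grammar allows at that point, we can report the line stored in the token.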
If the language is statically typed, such as Java, C, or Go, we go on to the type checker step. The goal of this step is to check that the user uses types correctly. For example, if the user declares a variable with type int, it should store only integers, and an if condition must be a boolean (or an integer in a language like C).
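The two rules from this paragraph can be sketched as a tiny type checker. This is only an illustration under simplifying assumptions: values are plain literals, and the Type enum, the TypedVar and If nodes, and the checkTypes function are hypothetical names invented for this example.

```kotlin
// Hypothetical set of types for a tiny statically typed language.
enum class Type { INT, BOOLEAN, STRING }

// Assumption: expressions are only literals, so their type is known directly.
sealed interface Expression
data class IntLiteral(val value: Int) : Expression
data class BoolLiteral(val value: Boolean) : Expression
data class StringLiteral(val value: String) : Expression

sealed interface Statement
data class TypedVar(val name: String, val declaredType: Type, val value: Expression) : Statement
data class If(val condition: Expression, val body: List<Statement>) : Statement

fun typeOf(expression: Expression): Type = when (expression) {
    is IntLiteral -> Type.INT
    is BoolLiteral -> Type.BOOLEAN
    is StringLiteral -> Type.STRING
}

// Walk the statements and collect a message for every type error found.
fun checkTypes(statements: List<Statement>): List<String> {
    val errors = mutableListOf<String>()
    for (statement in statements) {
        when (statement) {
            is TypedVar -> {
                // Rule 1: the value must match the declared type.
                val actual = typeOf(statement.value)
                if (actual != statement.declaredType) {
                    errors.add(
                        "Variable ${statement.name} is declared as " +
                            "${statement.declaredType} but its value is $actual"
                    )
                }
            }
            is If -> {
                // Rule 2: the condition must be a boolean (no C-style integers here).
                val actual = typeOf(statement.condition)
                if (actual != Type.BOOLEAN) {
                    errors.add("If condition must be BOOLEAN but is $actual")
                }
                errors.addAll(checkTypes(statement.body))
            }
        }
    }
    return errors
}
```

An empty result means the program passed both rules. A real type checker also needs a symbol table to look up the types of variables and functions, which this sketch leaves out.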
After this step, we end up with the same AST, but now we know that it is valid, and we can compile it to any target or evaluate it. We can also use it for formatting, static analysis, optimization, code style checks, etc.
For example, suppose we want all developers to declare variables without using _ in the name. To check that, we traverse our AST to find all Var nodes and check each of them:
fun checkVarDeclaration(node: Var) {
    if (node.name.contains("_")) {
        reportError("Oops, your variable name ${node.name} contains _")
    }
}
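The check above handles a single node, so we still need the traversal that finds every Var node, including the ones nested inside function bodies. Here is a minimal sketch of that traversal, assuming a simplified AST where Var has only a name; FunctionDeclaration and checkVarNames are hypothetical names, and errors are collected in a list instead of calling reportError.

```kotlin
// Simplified AST for this example: only the parts the check needs.
sealed interface Statement
data class Var(val name: String) : Statement
data class FunctionDeclaration(val name: String, val body: List<Statement>) : Statement

// Visit every statement, going into function bodies recursively,
// and collect one message per variable name that contains '_'.
fun checkVarNames(statements: List<Statement>): List<String> {
    val errors = mutableListOf<String>()
    for (statement in statements) {
        when (statement) {
            is Var -> {
                if ('_' in statement.name) {
                    errors.add("Oops, your variable name ${statement.name} contains _")
                }
            }
            is FunctionDeclaration -> errors.addAll(checkVarNames(statement.body))
        }
    }
    return errors
}
```

Because Statement is a sealed interface, the compiler makes sure the when handles every node type, which helps when new statements are added to the language later.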
But now we need to format the code, so how do we do that? It's the same idea: we traverse our AST and, for each node, write it back to text, but formatted, for example:
// `indentation` and `formatValue` are assumed to be defined elsewhere in the formatter
fun formatVarDeclaration(node: Var): String {
    val builder = StringBuilder()
    builder.append(indentation)
    builder.append("var ")
    builder.append(node.name)
    builder.append(" = ")
    builder.append(formatValue(node.value))
    builder.append("\n")
    return builder.toString()
}
In this simple method, we write the node back to a string with the correct indentation and add a new line after it, so no two variables are declared on the same line. The value is also formatted, using another function. You can use the Visitor design pattern to make it easy to handle all node types.
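To show how the Visitor pattern can tie this together, here is a small self-contained sketch of a formatter for a simplified AST. It is an illustration, not the Turtle Graphics implementation: the Visitor interface, the node classes, and the four-space indentation are all assumptions made for this example.

```kotlin
// One visit method per node type; R is the result type of the traversal.
interface Visitor<R> {
    fun visitNumber(node: NumberLiteral): R
    fun visitVar(node: Var): R
    fun visitFunction(node: FunctionDeclaration): R
}

// Every node knows which visit method to call for itself.
sealed interface Node {
    fun <R> accept(visitor: Visitor<R>): R
}

data class NumberLiteral(val value: Int) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitNumber(this)
}

data class Var(val name: String, val value: Node) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitVar(this)
}

data class FunctionDeclaration(val name: String, val body: List<Node>) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitFunction(this)
}

// The formatter is a visitor that turns each node back into text.
class Formatter(private val indentUnit: String = "    ") : Visitor<String> {
    // How deeply nested the node being formatted is.
    private var depth = 0

    private fun indentation(): String = indentUnit.repeat(depth)

    override fun visitNumber(node: NumberLiteral): String = node.value.toString()

    override fun visitVar(node: Var): String =
        indentation() + "var " + node.name + " = " + node.value.accept(this) + "\n"

    override fun visitFunction(node: FunctionDeclaration): String {
        val builder = StringBuilder()
        builder.append(indentation()).append("fun ").append(node.name).append("() {\n")
        depth++ // statements inside the body are indented one level deeper
        for (statement in node.body) {
            builder.append(statement.accept(this))
        }
        depth--
        builder.append(indentation()).append("}\n")
        return builder.toString()
    }
}
```

Formatting a program is then a single call such as ast.accept(Formatter()). Notice that the output depends only on the AST, so however messy the spacing in the original file was, the result comes out the same.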
At the end of this step, we end up with a string that represents the same input file, but formatted, and we write it back to the file.
This is the basic implementation of a code formatter. A real production code formatter must handle more cases. For example, what if the code is not valid? Should I format only valid code? Should we read the whole program every time we want to format or compile the code?
Now back to Turtle Graphics. In this project I had already done all the required steps and had a ready AST, so I just wrote it back to text with code like you saw above ^_^ In my case, I read the code from the UI, format it, and write it back to the UI.
If you are interested and want to read more, I suggest:
- Read at least one compiler book, such as Crafting Interpreters
- Read about the Language Server Protocol (LSP)
- Watch the TypeScript compiler explained by its author, Anders Hejlsberg
- Think about what else you can do with your program once you have it as an AST
I hope you enjoyed my article and you can find me on
Enjoy Programming 😋.