Table of contents
- Introduction
- Building a basic Scanner to identify spaces
- Lexical analysis
- Token type
- Token Data class
- Code for the basic Scanner
- Moving through the scanner
- Adding Token to token list
- Scanning for tokens
- Starting and stopping
- Testing
- Full code
- Next post
- Resources
My app on the Google play store
Introduction
- I have an app (not to brag, but I have two active users now), which is a mobile moderation app for Twitch, and I want to add extra functionality to the chat section. That is a bit of a problem, because we have to quite literally rebuild the Twitch chat functionality from scratch. Initially this might not seem like a challenge (how foolish I was). But if you check out the Twitch chat feature, you can see how fancy it gets with all the commands and pop-ups, and you quickly realize that it is basically its own DSL (domain-specific language). After trying to recreate its functionality with an unholy amount of regex and if statements, I have decided to just say "F*** it, let's build our own language!!!" With the aid of the book Crafting Interpreters by Robert Nystrom, we shall attempt to build just that.
Building a basic Scanner to identify spaces
- If you want the actual details of what we are doing, check out the Crafting Interpreters scanner chapter. Long story short, we want our scanner to take a String, identify all the spaces, create a Token for each one, and add that Token to a list (this list will get passed to a parser in later blog posts).
- For the actual finished Scanner, we want it to identify things like @testUsername and /modCommand and have the UI act accordingly. For right now, let's just get it to identify empty spaces and create Tokens for them.
Lexical analysis
- So the first step is to do a little bit of lexical analysis, which the book describes like this: "scan through the list of characters and group them together into the smallest sequences that still represent something. Each of these blobs of characters is called a lexeme."
- We can then take those lexemes (which are just blank spaces for us) and combine them with extra data to create our Token, which is what we need to pass into a parser. A Token is going to consist of 3 things:
1) Token type (an enum we create)
2) Literal value (the empty space character)
3) Location information (the index where it is found)
Token type
- As stated previously, it is just an enum:
enum class TokenType {
    // just empty space characters
    EmptySpace
}
Token Data class
- Since the Token class is really just meant to hold data about the character we have identified, it really is a great choice to use a data class:
data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)
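To make the three parts concrete, here is a minimal sketch of the tokens we would hope to end up with for the string "It do " (two spaces). The enum and data class are repeated so the snippet stands alone. `expectedTokensForItDo` is a hypothetical helper name, and the indexes assume that `startIndex` means the position of the space itself in the source string.

```kotlin
// Repeated from above so the snippet is self-contained.
enum class TokenType {
    // just empty space characters
    EmptySpace
}

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)

// Hypothetical helper: the tokens we would hope to get for "It do ".
// Assumption: startIndex is the position of the space, so 2 and 5 here.
fun expectedTokensForItDo(): List<Token> = listOf(
    Token(TokenType.EmptySpace, " ", 2),
    Token(TokenType.EmptySpace, " ", 5)
)
```

Because Token is a data class, two tokens with the same type, lexeme and index compare as equal, which makes lists like this easy to assert against in tests.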
Code for the basic Scanner
- To start, our scanner is going to contain 4 variables:
class Scanner {
    private var source: String = ""
    private val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0
}
- source: the string to scan
- tokens: the list of tokens found
- start: where the lexeme being scanned starts
- current: where the scanner currently is
Moving through the scanner
- To allow our Scanner to move through the string, we are going to create a simple function called advance():
private fun advance(): Char {
    return this.source[current++]
}
- This function will return the Char our scanner is currently on and increase the current variable by one, which moves our scanner along as well.
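The detail that matters here is that `current++` hands back the old index first and only then increments it. A minimal sketch, using a hypothetical `Cursor` class that stands in for the Scanner and exposes `current` so we can look at it:

```kotlin
// Hypothetical stand-in for the Scanner, only to show how current++ behaves.
class Cursor(private val source: String) {
    var current = 0
        private set

    // Returns the character at the current index, then moves one step forward.
    fun advance(): Char = source[current++]
}
```

With a `Cursor("a b")`, the first call to `advance()` returns 'a' and leaves `current` at 1, and the second call returns the space and leaves `current` at 2.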
Adding Token to token list
- Now we need to create a function that will allow us to add a token to the token list:
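private fun addToken(type: TokenType) {
    // the lexeme is the slice of source from start (inclusive) to current (exclusive)
    val text = source.subSequence(start, current).toString()
    // start is where the lexeme begins; current already points one past it
    val token = Token(type, text, start)
    tokens.add(token)
}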
- Basically, we give it a TokenType, slice the lexeme (the space character) out of the source using start and current, create the Token and add it to the list.
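The slicing step is easy to get off by one, so here is a small sketch of it on its own. `lexemeBetween` is a hypothetical helper, not part of the Scanner. It uses `substring(start, current)`, which for a String should give the same result as `subSequence(start, current).toString()`.

```kotlin
// Hypothetical helper mirroring the slicing inside addToken():
// start is inclusive, current is exclusive.
fun lexemeBetween(source: String, start: Int, current: Int): String =
    source.substring(start, current)
```

For "It do ", `lexemeBetween("It do ", 2, 3)` gives back the single space at index 2, because the end index is not included in the slice.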
Scanning for tokens:
- Now we want to do some actual scanning and identify some tokens, which is done with:
private fun scanToken() {
    val char = advance()
    when (char) {
        ' ' -> { addToken(TokenType.EmptySpace) }
    }
}
- So val char = advance() will get the current character. Then it's just a simple when(){} statement to identify when an empty space character is found. Any other character has no matching branch, so for now it is skipped without creating a Token.
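Here is a small sketch of the same decision pulled out into a function, so it can be tried on its own. `tokenTypeFor` is a hypothetical name and not part of the Scanner, and the enum is repeated so the snippet stands alone. Returning null is just one assumed way of saying "no token for this character yet".

```kotlin
// Repeated so the snippet is self-contained.
enum class TokenType { EmptySpace }

// Hypothetical helper: maps a character to a TokenType, or null if we do not handle it yet.
fun tokenTypeFor(char: Char): TokenType? = when (char) {
    ' ' -> TokenType.EmptySpace
    else -> null // used as an expression, `when` needs an else branch
}
```

In scanToken() the `when` is used as a statement, which is why it can get away without an else branch there.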
Starting and stopping
- Now we need to be able to start scanning and tell the scanner when to stop. We will do this with a while loop:
private fun scanTokens() {
    while (!isAtEnd()) {
        start = current
        //start scanning tokens here
        scanToken()
    }
}
private fun isAtEnd(): Boolean {
    return this.current >= source.length
}
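- The >= check in isAtEnd() might seem a little strange, but remember that current is increased by 1 every time advance() is called. After the last character has been read, current is equal to source.length, which is what stops the loop.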
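To see how start and current move together, here is a rough tracing sketch. `traceScan` is a hypothetical helper that copies only the index bookkeeping of the loop (no tokens are created) and records the pair (start, current) after each step.

```kotlin
// Hypothetical tracing helper: records (start, current) after each scanToken()-style step.
fun traceScan(source: String): List<Pair<Int, Int>> {
    val steps = mutableListOf<Pair<Int, Int>>()
    var current = 0
    while (current < source.length) { // same condition as !isAtEnd()
        val start = current           // each lexeme starts where the last one ended
        current++                     // what advance() does to the index
        steps.add(start to current)
    }
    return steps
}
```

For the three-character string "a b", the recorded pairs are (0, 1), (1, 2) and (2, 3). After the last step current is 3, which equals the length of the string, so the loop ends. An empty string never enters the loop at all.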
Testing
- To prove to the doubters (myself) that we have successfully scanned the tokens, we can run this test:
//UNDER TEST
private val underTest = Scanner()
@Test
fun testing_clear_chat_parsing_clear_chat_command() {
    /* Given */
    val sourceStringWithSevenSpaces = "It do be like that sometimes another one"
    val sourceStringWithTwoSpaces = "It do "
    /* When */
    underTest.setSource(sourceStringWithTwoSpaces)
    val actualAmountOfTokens = underTest.getTokenList().size
    /* Then */
    Assert.assertEquals(2, actualAmountOfTokens)
}
Full code:
enum class TokenType {
    //a @username word
    MENTION,
    // everything that is NOT a @username word
    WORD,
    // just empty space characters
    EmptySpace
}
class Scanner {
    private var source: String = ""
    private val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0
    fun setSource(source: String) {
        this.source = source
        scanTokens()
    }
    private fun scanTokens() {
        while (!isAtEnd()) {
            start = current
            //start scanning tokens here
            scanToken()
        }
    }
    private fun isAtEnd(): Boolean {
        return this.current >= source.length
    }
    private fun advance(): Char {
        return this.source[current++]
    }
    private fun scanToken() {
        val char = advance()
        when (char) {
            ' ' -> { addToken(TokenType.EmptySpace) }
        }
    }
    private fun addToken(type: TokenType) {
        val text = source.subSequence(start, current).toString()
        // start is where the lexeme begins; current already points one past it
        val token = Token(type, text, start)
        tokens.add(token)
    }
    fun getTokenList(): List<Token> {
        return this.tokens
    }
}
data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)
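As a sanity check, the same space tokens can also be computed without a scanner at all, using the standard library's `mapIndexedNotNull`. This is not how the Scanner works and is not a replacement for it. `spaceTokens` is a hypothetical helper, the types are repeated so the snippet stands alone, and `startIndex` is assumed to be the position of the space itself.

```kotlin
// Repeated so the snippet is self-contained.
enum class TokenType { MENTION, WORD, EmptySpace }

data class Token(val type: TokenType, val lexeme: String, val startIndex: Int)

// Hypothetical cross-check: one EmptySpace token per space character.
// Assumption: startIndex is the index of the space in the source string.
fun spaceTokens(source: String): List<Token> =
    source.mapIndexedNotNull { index, char ->
        if (char == ' ') Token(TokenType.EmptySpace, " ", index) else null
    }
```

The number of tokens returned here should match getTokenList().size for the same string, which gives a second opinion when a test fails. Once the scanner has to handle @mentions and commands, a one-liner like this stops being enough, which is the whole reason for building the Scanner.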
Next post
- In the next post we will identify lexemes like @someUsername and /someCommand, which is really what we want.
Resources
- Crafting Interpreters scanner chapter by Robert Nystrom
Conclusion
- Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.