🇵🇹 Versão em português desse texto aqui!
So I've started to study the Elixir language and decided to solve some coding exercise with it in order to better understand some concepts.
Since I liked my solution, I decided to make a post explaining the thought process behind it. Suggestions to further improve the code or to better follow Elixir's conventions are more than appreciated! (:
The Problem
I want to make a function that receives a string as input, supposedly a mathematical expression or some code snippet, and tells me whether all of it's opening parenthesis/brackets/braces have a correct closing match.
Some expressions as examples:
1 * 2 (3 + [4 / 5])
# should return true
9 / [(8 + 7] - 6)
# should return false
Make a plan
Before coding, I think it is a good ideia do solve the problem theoretically, and then try to implement it.
For this exercise, I believe using the stack idea is a very good approach. Basically, a stack is a set of items that have two operations: push and pop.
- The push operation will put a new item in the set
- The pop operation will remove the last pushed item
So the stack follows the "Last In, First Out" dynamic. When I put an item in the stack, this item will be the first one to get out when I start removing them.
How can we use a stack here? Well, my theoretical solution would be to iterate over all the characters in the input string, and if:
- It is an opening char (parens/brackets/braces), put it in a stack
- It is a closing char, check the last item in the stack: If it matches, we pop it and continue the recursion. If it doesn't, the input string is not valid
- If it is something else, ignore
If the whole input string is iterated and I never found an error, this means that all closers had a matching opener, but it doesn't necessarily means the string is valid. This input is an example:
1 + (2 * [3]
All the closers had a matching opener, but not all the openers were closed. This means that if we survive the iteration, we also have to check if our final stack is empty. If it is, all openers were closed, and the input was valid. If it is not, the input was not valid.
The Implementation
Good, we have a plan, so now we have to translate it into Elixir.
For starters, I'm creating a module Parens
with a function check
that will receive as input a string.
This function check
will have to iterate over the string, so we can use the String
module's split/2
function. So we have our first piece of code like this:
defmodule Parens do
def check(expression) do
expression
|> String.split("")
end
end
I'm using the pipe operator |>
here, meaning that whatever comes before it, is going to be the first argument of whatever comes after it. In this case, expression
is going to be the first argument of String.split
.
The second argument is an empty string so we can split the expression at every character. The result of this is a list, like this:
"a + b * (c / d)" |> String.split("")
# ["", "a", " ", "+", " ", "b", " ", "*", " ", "(", "c", " ", "/", " ", "d", ")", ""]
On our theoretical solution, every character but openers/closers were to be ignored, so we can apply a filter in this resulting list. To do that, we can use Enum.filter/2
, which receives two arguments.
The first one is something that "is enumerable", which means that this something must implement the Enum protocol.
Luckly, lists do that, so we can pass our resulting list as the first argument to our filter.
The second argument is a function that receives an element of our list and then decide if it should or shouldn't be in the filtered result. More precisely, if this function returns a falsy value (false
or nil
), then the element will not be in the resulting filtered list. If it returns a truthy value (anything else), it is going to be in the filtered result.
In our case, this function will verify if each element of our list is a parens/brackets/braces and keep it if it is, removing it otherwise. One way to do that is like this:
fn x -> x in ["{", "[", "(", "}", "]", ")"] end
This anonymous function will receive something (x) and see if it belongs to that list, which happens to be a list with only our parens/brackets/braces.
We could also use the capture notation, like this:
&(&1 in ["{", "[", "(", "}", "]", ")"])
The capture notation is similar, but instead of declaring x, we just use &1, &2, &3... for all of our arguments.
Good, so our code right now looks like this:
defmodule Parens do
def check(expression) do
expression
|> String.split("")
|> Enum.filter(&(&1 in ["{", "[", "(", "}", "]", ")"]))
end
end
The next step is to iterate over our resulting list and use our "stack strategy". To do that, I decided to create a private function called iterate
, where I'll recursively walk over our elements, passing the stack around, until we checked all of our elements. Since I'll need the stack, this function will have an arity of 2, which means it will have 2 arguments.
The first thing I do when I'm thinking about recursion it to write the stop condition. In this case, it should stop once our list of characters is empty. I'll make use of the wonderful pattern matching functionality to do that:
defp iterate([], stack), do: stack
This means that this function will only execute when the first element is an empty list and the second element is whatever it arrives here.
In this case, this function will do nothing and just return the stack, so we can later verify if it is empty or not.
A very important thing to note here is that we also have to pass a stack
when we first call the iterate function. Our stack will start empty, so our pipeline will be like this:
def check(expression) do
expression
|> String.split("")
|> Enum.filter(&(&1 in ["{", "[", "(", "}", "]", ")"]))
|> iterate([])
end
Note: We could also use a default value, but I'll keep it like this.
Now the hard part. After writing a stop condition to our recursion, we have to write the recursive step (or steps).
Let's check our plan again. We have to look at each element, and if it is an opener, we put it in the stack and go on. Let's do this part first.
I love pattern matching, so I'll be using it here again. And to further help us, I'll also use guards to decide if the function will be executed or not:
defp iterate([head | tail], stack) when head in ["{", "[", "("] do
end
This function will only be executed if the variable head
belongs to that list of openers. But where is this variable coming from?
We are pattern matching the first argument with [head | tail]
, so this function will need for the first argument to be a list with a first element. This first element will be bound to head
. The rest of the list will be bound to tail
. If you are in doubt, open the interactive elixir shell in your terminal (write iex and press enter) and try this code:
[head | tail] = [1,2,3,4]
head
#=> 1
tail
#=> [2,3,4]
[head2 | tail2] = [0]
head2
#=> 0
tail2
#=> []
[head3 | tail3] = []
# Will raise a matching error!
The head and tail are important concepts. The head is the first element of a list, and the tail is the list itself, but without the first element!
Okay, back to our function. What will it do if the character is an opener? It should push it to the stack and continue iterating. We can do that simply by calling iterate
again with the right arguments.
Since we already checked the first element, we can pass the list without the first element. That is precisely what tail
is! Good, but how can we push the element to the stack?
Elixir offers a good way to put an item in a list, and it is simply like this:
[new_element | stack]
This is a list which its head is new_element
, and its tail is the stack
. You can always go to iex and make some experiments:
[1 | 9,8,7]
#=> [1,9,8,7]
So our function looks like this:
defp iterate([head | tail], stack) when head in ["{", "[", "("],
do: iterate(tail, [head | stack])
When it is an opener, we simply call iterate with tail
and pass the stack with a newly pushed head
item.
Good! Now that we have this part done, let's check the plan again.
We already know how to stop the recursion;
We already took care of the "ignore if not opener/closer" issue;
We already know how to push an opener to the stack and continue;
We now have to do something when the element is a "closer". When that happens, the idea was to look at the last added item of our stack, which happens to be the head of stack
.
This part of the code was heavily refactored, but I'll tell how I did it first time. For each possible closer, which are }
, ]
and )
, I made its single pattern matched function:
defp iterate(["}" | tail], stack)
defp iterate(["]" | tail], stack)
defp iterate([")" | tail], stack)
And for each one of those, I made a simple if/else statement:
defp iterate(["}" | tail], stack) do
if hd(stack) == "{" do
iterate(tail, tl(stack))
else
false
end
end
First of all, I used the functions hd
and tl
, which returns the head and the tail of a list, respectively. It would be the same as this:
defp iterate(["}" | tail], [stack_head | stack_tail]) do
if stack_head == "{" do
iterate(tail, stack_tail)
else
false
end
end
Then, since this function will only execute if the element is exactly }
, I'm verifying if the last added item in the stack (its head) is {
, which would be a match.
If it is, we have to pop it. The "popped" version of the stack is simply its tail, so that's why we don't pass the stack to the next iteration, but rather the stack_tail
.
If it doesn't match, we know that the input string is not valid, so I'm returning a false
here.
The else
part is not really needed, because in order to go inside it, the comparison should be falsy. When that happens, an if
statement alone would return nil
, but I'll keep it with this else
for now. We can refactor later.
This looks like a valid "alpha version" of our iterate
function. What is missing is verifying if the stack is empty or not after the stop condition. A way to do that is to add a comparison with []
in the stop condition or in the pipeline of our check
function. Enum.empty?
would also work, as it does exactly what its name suggests. I'll show the "comparison with []
" first because I can talk briefly about the Kernel
module.
The Kernel
module has all the functions we can use "natively", without calling a module. So if you want so sum two integers, you can do both:
1+2
#=> 3
Kernel.+(1,2)
#=> 3
The ==
comparison is also part of the Kernel
module, so we can do this:
def check(expression) do
expression
|> String.split("")
|> Enum.filter(&(&1 in ["{", "[", "(", "}", "]", ")"]))
|> iterate([])
|> Kernel.==([])
end
This will take whatever returns from the iterate
call and use it as the first argument of Kernel.==
. The second argument is []
, so we are going to return the comparison of the result of iterate
with []
.
If the input is invalid, the iterate
may return false
, and then the comparison with []
will also be false.
Another possibility is that the iterate
will return a not empty stack. Then, the comparison with []
will be false as well.
This actually solves it, but for me it is really ugly.
For starters, sometimes iterate
returns a boolean, and sometimes it returns a list, which could be empty or not.
Now that I wrote about Kernel
a little bit, I think we should not use it in this case (:
To avoid it, we can put this comparison in our stop condition, like this:
defp iterate([], stack), do: stack == []
Another way is to pattern match the second argument as well!
defp iterate([], []), do: true
defp iterate([], _stack), do: false
If the first argument is an empty list and the second argument is also an empty list, we return true
.
If the first argument is an empty list and the second argument is anything (the _ denotes that this variable will not be used), we return false.
Note: We could simply use _ instead of _stack, but I think it is nice to put some name to it in the name of readability, since it is not obvious what the second argument could be
Okay, so this apparently works, right?
If we try to run our code now, we are going to sometimes get an error!
To test it, we can use our module inside iex. To do that, save the code in a file, for example, parens.ex
. Now, run iex parens.ex
. You'll be able to use the Parens
module we just created inside iex!
If you try to check a string where a closer would be found before any opener, the code would try to get the head of an empty stack, which raises an error. You can verify it:
Parens.check("a}")
#> Raises error!
Parens.check("(a}")
#=> false
We can fix this by checking if the stack is empty before trying to get its head.
Or...
We could simply pattern match the second argument to empty list when we have a closer, like this:
defp iterate([head | _], []) when head in ["}", "]", ")"], do: false
The guard will ensure that head
is a closer, and when the stack is an empty list, we just return false
before trying to get the head of an empty list (which raises an error).
The first version of our solution could then be this:
defmodule Parens do
def check(expression) do
expression
|> String.split("")
|> Enum.filter(&(&1 in ["{", "[", "(", "}", "]", ")"]))
|> iterate([])
end
defp iterate([], []), do: true
defp iterate([], _stack), do: false
defp iterate([head | tail], stack) when head in ["{", "[", "("],
do: iterate(tail, [head | stack])
defp iterate([head | _], []) when head in ["}", "]", ")"],
do: false
defp iterate(["}" | tail], stack) do
if hd(stack) == "{" do
iterate(tail, tl(stack))
else
false
end
end
defp iterate(["]" | tail], stack) do
if hd(stack) == "[" do
iterate(tail, tl(stack))
else
false
end
end
defp iterate([")" | tail], stack) do
if hd(stack) == "(" do
iterate(tail, tl(stack))
else
false
end
end
end
Nice! So now our code actually works!
But it doesn't really give me good vibes...
Refactor!
First, we are using lists like ["{", "[", "("]
, ["{", "[", "("]
and ["{", "[", "(", "}", "]", ")"]
, which could be extracted to module attributes.
Module attributes are values that can be used by any method in a module. To define them, we can write @attribute_name
and then its value. We can do it like this:
defmodule Parens do
@opener ["{", "[", "("]
@closer ["}", "]", ")"]
end
This is a nice little addition, so now we can rewrite our guards and filter like this:
Enum.filter(&(&1 in ["{", "[", "(", "}", "]", ")"]))
Enum.filter(&(&1 in @opener || &1 in @closer))
defp iterate([head | tail], stack) when head in ["{", "[", "("]
defp iterate([head | tail], stack) when head in @opener,
defp iterate([head | _], []) when head in ["}", "]", ")"], do: false
defp iterate([head | _], []) when head in @closer, do: false
But we still have 3 big functions that are basically the same, which are the ones that decide what to do when the element we are checking is a closer character.
I'll write what I did and then explain it:
defmodule Parens do
@pairs %{"{" => "}", "[" => "]", "(" => ")"}
defp iterate([head | tail], [stack_head | stack_tail]) when head in @closer,
do: @pairs[stack_head] == head && iterate(tail, stack_tail)
end
So first, I created a @pairs
module attribute which is a map. Each key of the map is an opener character and it maps to a closer character (maps are like hashes and dictionaries, if you are coming from ruby or python).
Then, I made an iterate/2
function that has a guard. This guard will ensure that head
variable (the first element of our list) is a closer (so it is one of )
, ]
or }
).
I also used a &&
boolean operation here:
@pairs[stack_head] == head && iterate(tail, stack_tail)
If the left side is a falsy value, then this value is returned and the right side is not executed (which stops our recursion).
If the left side is truthy, then whatever is at the right side is returned. In this case, the right side is a function call, continuing the recursion.
Now I'll look at @pairs[stack_head]
. Remember that @pairs
is a map, so @pairs["{"]
, for example, returns "}"
.
If whatever is the head of our stack is an opener, then it maps to some closer (@pairs[stack_head]
, then, is some closer). If this closer equals to head
(the element we are checking itself), then the comparison returns true
, which will then return the right side of the &&
, continuing our recursion!
If not, then the comparison will return false
, and not execute the right side of the &&
.
So this is enough to check if the parens are matching and stop the recursion otherwise.
So our second version of the program is:
defmodule Parens do
@pairs %{"{" => "}", "[" => "]", "(" => ")"}
@opener ["{", "[", "("]
@closer ["}", "]", ")"]
def check(expression) do
expression
|> String.split("")
|> Enum.filter(&(&1 in @opener || &1 in @closer))
|> iterate([])
end
defp iterate([], []), do: true
defp iterate([], _stack), do: false
defp iterate([head | tail], stack) when head in @opener,
do: iterate(tail, [head | stack])
defp iterate([head | _], []) when head in @closer, do: false
defp iterate([head | tail], [stack_head | stack_tail]) when head in @closer,
do: @pairs[stack_head] == head && iterate(tail, stack_tail)
end
That's it for today
Thanks for reading this. I'm enjoying Elixir very much. Pattern matching is quite powerful and fun.
Please correct any mistakes I might have made (:
I hope you learned something today, and have a good day.
Top comments (9)
Sure, there are many ways to solve the same coding problem, but each language has its own nitty-gritty details where you/we/people can distinguish those benefits to use it.
Obviously,
pattern matching
is one of the most powerful tools in Elixir/Erlang/FP.You use it - great, but right before that, a couple of unnecessary data transformation happening:
String.split
andEnum.filter
What I am trying to say is that you can pattern match on a string without converting a string to a list and filtering stuff - it is basically a
scanning
(we can use Regex as well, but in most cases pattern matching is even faster).The idea is to use the accumulator (
matches
in our case ) with tail-call optimization: if it iseven
number, everything matches, otherwisefalse
.Very quick implementation with tests is down below:
(Much less code and much faster)
BTW, great job with explaining things, @lucassperez !
Very nice piece of code! I'm still fairly new to Elixir and Functional Programming, but I appreciate your response a lot. Using the pattern match to extract the parentesis is very cool, I didn't know we could pattern match with
<<x::binary-size(1)>>
, but the syntax is very neat. There is so much to learn.I didn't try to refactor this code using Tail Call Optmization, but it is very elegant. Thanks for sharing so much (:
The only problem I can see is not verifying if the parens we get are matching, we are only verifying if we get an even number of parens. I believe that passing something like
(a + [ b - c ) ]
will return true.Yeah, I see.
Spent a little bit time on it again. The code is a little bit ugly since I left a lot of comments and in some places unnecessary guards for the sake of readability.
BTW, you can run tests this way -
elixir match_parens.exs --test
Nice!
I tried to basically remake this post's implementation but using a string as a stack (and not a list), so I'm not using lists anywhere and just evaluating if the stack is empty at the end of everything.
Also, there is an early escape when parens don't match, so we don't have to read the whole input:
This is what I came up with:
I was testing the time execution by running this is IEx:
:timer.tc(fn -> Parens.is_matching?("(((((((((([[{[]}]]))))()([[[]]]))))))){({({({{{{({()})}{({})}}}})})})}[]") end)
I got to the conclusion that using a map to check if the parens are matching is faster that a list with the valid matches, but not by much.
There is a way to avoid
@valid_matches
or@pairs
(in your case) and make it even faster. Again, pattern matching is the Boss, right?!So we can avoid a map and a list by this (in my case):
But again there is a way to make it even faster :)
Great job with cleaning up your solution!
At first I didn't want to make so many repeated function definitions to match specifically
"}"
,"]"
or")"
, I thought it wasn't that readable and too repetitive, but I changed my mind over a few months, and now looking at the perspective of speed, it looks pretty good.Maybe a way to make it faster is to break the recursion everytime the accumulator was set to false and return the false itself? Or maybe the very first definition of
run_check/2
could be?
I'm not sure how to make it faster, does it involve some change in strategy? 🤔
Many thanks! 😁
If we have 2 inputs:
{({})}
and 2.{ ( { } ) }
In
1
we have to perform 6 iterations to walk through it and in2
it's 11, right?So is there a way we can walk through both inputs with the same amount of iterations? If yes - we can apply the same approach to our solution.
(Pattern matching is whispering here something for us again 🙂 )
Well, if we could group the whitespaces together during the match, that would be great. The program would just have to check non-whitespace characters.
Something like this would work for only a single space
And then repeat for brackets and parenthesis.
It would be perfect to match an unspecified amount of whitecharacters at once. Using a regex it would be a breeze, but I'm sure that is not what we want here.
In fact, we would like to match everything that is not one of
{ [ ( ) ] }
, with unspecified length, not just whitespaces. Then we could ignore everything that is irrelevant, essentially doing a filter inside the match.After achieving a way to do that, parsing
abc {}
would require the same amount of iterations as{}
.But how do I do it?
Matches like
"{" <> irrelevant <> "{"
fail because the size of the variableirrelevant
is unknown, and"{" <> <<irr::binary>> <> "{"
fails because a binary field without a size is only allowed as the last member of the match (basically the same problem, Elixir doesn't know up to what point to check).I could write like 200 headers of the function for one, two, three, ..., 200 character length for the
irrelevant
variable, but that would be tedious. I am 100% sure there is a way to make Elixir itself write some Elixir code for me, maybe use some range to dynamically create these functions, and leave a last header matching whatever didn't match before. It would speed up the parsing for most cases, and in the worst scenario it would be the same as it is now.I cleaned up/refactored my solution. Check it out down below.
There is a section
### making a recursive step wider ###
- that code explains my thoughts. I left some logs in that part, so if you run code fromiex
you will clearly see what I was trying to say and the output looks like this:BTW, hope you use
iex
- very powerful tool...