As a student at 42 one of the first projects you have to accomplish by yourself is the reimplementation of a shell. There are many things to do to have a working shell so today I will only focus on the parse tree.
A parse tree is what represents the syntactic structure of our commands and will allow us to easily perform those commands typed by the user.
So we want to be able to translate something like this:
ls | cat -e
Into a shell job that we will be able to execute as a whole command.
The first step is to transform the text input into tokens. That way we will have 4 tokens that represents our text input in a more easier way to process.
Each token has a value (the text input) and a type (if it is an operator or a command argument):
ls -> WORD
| -> PIPE
cat -> WORD
-e -> WORD
Once parsed, we expect to have something that could be rendered like this:
pipe_sequence
/ \
simple_command simple_command
| / \
cmd_name cmd_name cmd_suffix
| | |
'ls' 'cat' cmd_word
|
'-e'
So the second step is to come up with a grammar. A grammar is a set of rules to be followed by the parser to generate the parse tree.
This is a simple version of the grammar of sh
:
pipe_sequence : simple_command
| pipe_sequence '|' simple_command
;
cmd_suffix : cmd_word
| cmd_word cmd_word
;
simple_command : cmd_name
| cmd_name cmd_suffix
;
cmd_name : WORD
;
cmd_word : WORD
;
What this grammar tells us is that:
- A
pipe_sequence
is asimple_command
followed by 0 or moresimple_command
- A
cmd_suffix
is one or morecmd_word
- A
simple_command
is acmd_name
followed or not by acmd_suffix
- A
cmd_name
orcmd_word
is a WORD token
Each rule will be implemented as a function.
Let's start with cmd_word
/*
** struct s_parser is a data type containing informations about the current parser
** (like the tokens list)
*/
struct s_cmd_word *cmd_word(struct s_parser *p)
{
struct s_cmd_word *w = NULL;
/*
** p->tokens is a linked list containing all the tokens
** p->tokens->type is an enum corresponding to the token's type
*/
if (p->tokens->type == T_WORD)
{
w = malloc(sizeof(struct s_cmd_word));
w->data = strdup(t->data);
p->tokens = p->tokens->next
return (w);
}
else
return (NULL);
}
Really simple so far. If the current token is a WORD
then we return a struct s_cmd_word *
containing the actual data. But if the current token is not a word (it could be an operator) we return NULL
. And we should not forget to advance to the next token once we used it.
What about pipe_sequence
?
struct s_pipe_sequence *pipe_sequence(struct s_parser *p)
{
struct s_pipe_sequence *ps;
struct s_simple_command *sc;
ps = malloc(sizeof(struct s_pipe_sequence));
while (sc = simple_command(p))
{
/* This function pushes a new element into a linked list */
ft_lstpush(sc, &ps->simple_commands);
if (p->tokens->type == T_PIPE)
p->tokens = p->tokens->next;
}
if (ps->sc)
return (ps);
free(ps);
return (NULL);
}
The other functions should be as easy to write as the above one.
At the end you should have a parse tree based on the grammar above.
Of course this is a very simplified version of the actual shell grammar but really this is all it takes to be able to generate by hand a parse tree.
If you want to implement a shell you should first visit this website. It's THE place to read first to understand how a shell like sh
behaves.
Thanks for reading :)
Top comments (0)