DEV Community

Cover image for AST parser with PHP and Bison
Anton Sukhachev
Anton Sukhachev

Posted on • Edited on

AST parser with PHP and Bison

[original post]

Read this post if you don't know what Bison is.

I already have the Bison AST parser, but this time I will do it without PHP FFI.

First, we need to install the PHP skeleton package to build the parser and tree printer package to print AST.

composer require --dev mrsuh/php-bison-skeleton
composer require mrsuh/tree-printer
Enter fullscreen mode Exit fullscreen mode

It will more readable if we separate code from printer.php to individual files.

.
├── /ast-parser
    ├── /bin
    │   └── parse.php # entry point for parser
    ├── /lib
    │   └── parser.php # generated file
    ├── /src
    │   ├── Lexer.php
    │   └── Node.php # AST node
    └── grammar.y       
Enter fullscreen mode Exit fullscreen mode

To print AST with the tree printer package Node class must implement Mrsuh\Tree\NodeInterface.

src/Node.php

<?php

namespace App;

use Mrsuh\Tree\NodeInterface;

class Node implements NodeInterface
{
    private string $name;
    private string $value;
    /** @var Node[] */
    private array $children;

    public function __construct(string $name, string $value, array $children = [])
    {
        $this->name     = $name;
        $this->value    = $value;
        $this->children = $children;
    }

    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        $line = $this->name;
        if (!empty($this->value)) {
            $line .= sprintf(" '%s'", $this->value);
        }

        return $line;
    }
}
Enter fullscreen mode Exit fullscreen mode

Lexer is not modified from previous post, but this time we will put it in a separate file src/Lexer.php.

src/Lexer.php

<?php

namespace App;

class Lexer implements LexerInterface {

    private array $words;
    private int   $index = 0;
    private int   $value = 0;

    public function __construct($resource)
    {
        $this->words = explode(' ', trim(fgets($resource)));
    }

    public function yyerror(string $message): void
    {
        printf("%s\n", $message);
    }

    public function getLVal()
    {
        return $this->value;
    }

    public function yylex(): int
    {
        if ($this->index >= count($this->words)) {
            return LexerInterface::YYEOF;
        }

        $word = $this->words[$this->index++];
        if (is_numeric($word)) {
            $this->value = (int)$word;

            return LexerInterface::T_NUMBER;
        }

        return ord($word);
    }
}
Enter fullscreen mode Exit fullscreen mode

For example, Lexer will translate the expression 10 + 20 - 30 into this:

word token value
10 LexerInterface::T_NUMBER (258) 10
+ ASCII (43)
20 LexerInterface::T_NUMBER (258) 20
- ASCII (45)
30 LexerInterface::T_NUMBER (258) 30
LexerInterface::YYEOF (0)

It's time to create the grammar.y file and build lib/parser.php
You can define %code blocks, so Bison will render code as is in printer.php

grammar.y

%code imports { // code imports };
%code parser  { // code parser }; 
%code init    { // code init };           
Enter fullscreen mode Exit fullscreen mode

printer.php

<?php

// code imports

class Parser {
    // code parser
    public function __construct() {
        // code init
    }
}
Enter fullscreen mode Exit fullscreen mode

We will use block %code parser to define variables and methods to store AST into the Parser class.
Bison has reserved the symbol $ in grammar actions.
It's very sad for PHP developers, but we can call the function setAst() with self::setAst() instead of $this->setAst().

grammar.y

%define api.parser.class {Parser}
%define api.namespace {App}
%code parser {
    private Node $ast;
    public function setAst(Node $ast): void { $this->ast = $ast; }
    public function getAst(): Node { return $this->ast; }
}

%token T_NUMBER

%left '-' '+'

%%
start:
  expression                 {  self::setAst($1); }
;

expression:
  T_NUMBER                   { $$ = new Node('NUMBER', $1); }
| expression '+' expression  { $$ = new Node('OPERATION_PLUS', '', [$1, $3]);  }
| expression '-' expression  { $$ = new Node('OPERATION_MINUS', '', [$1, $3]);  }
;
Enter fullscreen mode Exit fullscreen mode
bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y
Enter fullscreen mode Exit fullscreen mode

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o parser.php - output parser file
  • grammar.y - our grammar file

And final PHP file is the entry point bin/parse.php.

bin/parse.php

<?php

require_once __DIR__ . '/../vendor/autoload.php';

use App\Lexer;
use App\Parser;
use Mrsuh\Tree\Printer;

$lexer  = new Lexer(STDIN);
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}

$printer = new Printer(STDOUT);
$printer->print($parser->getAst());
Enter fullscreen mode Exit fullscreen mode

We need to add a special autoload section to composer.json for generated lib/parser.php file.

composer.json

{
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        },
        "files": ["lib/parser.php"]
    },
    ...
}
Enter fullscreen mode Exit fullscreen mode

Ok. Our parser is ready and we can test it:

php bin/parse.php <<< "1 + 2 - 3"
.
├── OPERATION_MINUS
    ├── OPERATION_PLUS
    │   ├── NUMBER '1'
    │   └── NUMBER '2'
    └── NUMBER '3'
Enter fullscreen mode Exit fullscreen mode

Try to parse big expression:

php bin/parse.php <<< "1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + 10"
.
├── OPERATION_PLUS
    ├── OPERATION_MINUS
    │   ├── OPERATION_PLUS
    │   │   ├── OPERATION_MINUS
    │   │   │   ├── OPERATION_PLUS
    │   │   │   │   ├── OPERATION_MINUS
    │   │   │   │   │   ├── OPERATION_PLUS
    │   │   │   │   │   │   ├── OPERATION_MINUS
    │   │   │   │   │   │   │   ├── OPERATION_PLUS
    │   │   │   │   │   │   │   │   ├── NUMBER '1'
    │   │   │   │   │   │   │   │   └── NUMBER '2'
    │   │   │   │   │   │   │   └── NUMBER '3'
    │   │   │   │   │   │   └── NUMBER '4'
    │   │   │   │   │   └── NUMBER '5'
    │   │   │   │   └── NUMBER '6'
    │   │   │   └── NUMBER '7'
    │   │   └── NUMBER '8'
    │   └── NUMBER '9'
    └── NUMBER '10'
Enter fullscreen mode Exit fullscreen mode

Great!

You can get the parser source code here and test it by yourself.

Some useful links:

Top comments (0)