Sylver is a language-agnostic platform for building custom source code
analyzers (think eslint for every language).
This might be a lot to unpack, so let us explore this tool
by solving a real-world problem: our application's configuration is stored in
complex JSON documents, and we'd like to build a tool to automatically validate
these documents against our business rules.
In this series of tutorials, we'll go from having zero knowledge of Sylver or
static analysis to building a fully-fledged linter for our configuration files.
We will use JSON as an example, but the tools and techniques presented
apply to many data formats and even complete programming languages!
Also, note that while we will be building everything from scratch using
Sylver's domain-specific languages (DSLs), a catalog of built-in specifications for the most common languages will be included in future releases of the tool.
In part 1, we will discover Sylver's meta language, a DSL used to describe the shape
of programming languages and data formats. After completing this tutorial, we'll have
a complete spec for the JSON language, allowing us to turn JSON documents into
Sylver parse trees.In part 2, we will load the parse trees into Sylver's query engine,
and we will find nodes that violate our business rules using an SQL-like query
language.In part 3, we will learn how to turn our language spec and queries into a set of
linting rules so that we can run them conveniently and share them.
Installation
Sylver is distributed as a single static binary. Installing it is as simple as:
- Go to https://sylver.dev to download the binary for your platform
- Unpack the downloaded archive
- Move the `sylver` binary to a location in your `$PATH`
Prelude
Let us create a blank workspace for this tutorial!
We'll start by creating a new folder and a `json.syl` file to write
the Sylver specification for the JSON language.
mkdir sylver_getting_started
cd sylver_getting_started
touch json.syl
We will also store our test JSON file in `config.json`.
{
"variables": [
{
"name": "country",
"description": "Customer's country of residence",
"values": ["us", "fr", "it"]
},
{
"name": "age",
"description": "Customer's age",
"type": "number"
}
]
}
This file specifies the variables used to describe customer profiles in a fictional
customer database.
Variables are assigned a type or a list of potential values.
Type definitions
Sylver parses raw text into typed structures called parse trees. The first step in defining a language spec is to define a set of types for our tree nodes.
We'll add the following node type declarations to our `json.syl` spec:
node JsonNode { }
node Null: JsonNode { }
node Bool: JsonNode { }
node Number: JsonNode { }
node String: JsonNode { }
node Array: JsonNode {
elems: List<JsonNode>
}
node Object: JsonNode {
members: List<Member>
}
node Member: JsonNode {
key: String,
value: JsonNode
}
These declarations resemble object type declarations in many mainstream languages.
The `:` syntax denotes inheritance.
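Before moving on, it may help to see this hierarchy in more familiar terms. The following Python sketch (purely illustrative, not part of Sylver) models the same node types as dataclasses. The `text` field on the leaf nodes is our own addition, standing in for the token text that Sylver tracks implicitly:

```python
# Hypothetical Python model of the Sylver node hierarchy for JSON.
from dataclasses import dataclass
from typing import List

@dataclass
class JsonNode:
    pass

@dataclass
class Null(JsonNode):
    pass

@dataclass
class Bool(JsonNode):
    text: str  # not in the Sylver spec: holds the matched token text

@dataclass
class Number(JsonNode):
    text: str

@dataclass
class String(JsonNode):
    text: str

@dataclass
class Member(JsonNode):
    key: String      # `key: String` in the Sylver spec
    value: JsonNode  # `value: JsonNode` in the Sylver spec

@dataclass
class Array(JsonNode):
    elems: List[JsonNode]

@dataclass
class Object(JsonNode):
    members: List[Member]

# The parse tree for `{"type": "number"}`, built by hand:
tree = Object(members=[Member(key=String('"type"'), value=String('"number"'))])
```

Because `Member` inherits from `JsonNode`, any node can appear wherever a `JsonNode` is expected, which is exactly how the Sylver spec lets arrays and object values hold arbitrary JSON nodes.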
Now that we have a set of types to describe JSON documents, we need to specify how to build a
parse tree from a sequence of characters. This process is done in two steps:
- Lexical analysis: in this step, individual characters that form an indivisible entity (such as the digits of a number or the characters of a string) are grouped together into tokens. Some tokens are only one character wide (for example, the braces, brackets, commas, and colons in JSON).
- Syntactic analysis, in which tree nodes are built from the stream of tokens.
Lexical analysis
Tokens are described using declarations of the form `term NAME = <term_content>`,
where `<term_content>` is either a literal surrounded by single quotes (`'`) or a regex
between backticks (`` ` ``). The regexes use a syntax similar to Perl-style regular expressions.
Characters in the input string that match one of the terminal literals
or regexes will be grouped into a token of the given name.
term COMMA = ','
term COLON = ':'
term L_BRACE = '{'
term R_BRACE = '}'
term L_BRACKET = '['
term R_BRACKET = ']'
term NULL = 'null'
term BOOL_LIT = `true|false`
term NUMBER_LIT = `\-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?`
term STRING_LIT = `"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"`
ignore term WHITESPACE = `\s`
The term rules for numbers and strings are slightly involved, to account for some
of JSON's peculiarities.
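As a quick sanity check (using Python's `re` module, not Sylver itself), we can confirm that the `NUMBER_LIT` regex accepts the notations JSON allows and rejects those it forbids:

```python
import re

# The NUMBER_LIT regex from the spec, translated to Python syntax.
NUMBER_LIT = re.compile(r"-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?")

def is_number(s: str) -> bool:
    # fullmatch: the whole string must form a single number token
    return NUMBER_LIT.fullmatch(s) is not None

print(is_number("-12.5e+3"))  # True: sign, fraction, and exponent are allowed
print(is_number("01"))        # False: JSON forbids leading zeros
print(is_number("1."))        # False: a decimal point requires trailing digits
```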
Note that the `WHITESPACE` term declaration (matching a single whitespace character)
is prefixed with the `ignore` keyword. This means that `WHITESPACE` tokens do not affect
the structure of the document and can be ignored during syntactic analysis.
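To make the lexing step concrete, here is a minimal tokenizer sketch in Python. It is an illustration of the idea, not how Sylver is implemented: it tries each term pattern in turn at the current position and drops `WHITESPACE` tokens, mimicking `ignore term`:

```python
import re

# (name, pattern) pairs mirroring the term declarations above.
TERMS = [
    ("WHITESPACE", r"\s"),
    ("COMMA", r","), ("COLON", r":"),
    ("L_BRACE", r"\{"), ("R_BRACE", r"\}"),
    ("L_BRACKET", r"\["), ("R_BRACKET", r"\]"),
    ("NULL", r"null"),
    ("BOOL_LIT", r"true|false"),
    ("NUMBER_LIT", r"-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?"),
    ("STRING_LIT", r'"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"'),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in TERMS:
            m = re.match(pattern, text[pos:])
            if m:
                if name != "WHITESPACE":  # `ignore term`: match but drop
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
    return tokens

print(tokenize('{"age": 42}'))
```

Running this on `{"age": 42}` produces the token stream `L_BRACE`, `STRING_LIT`, `COLON`, `NUMBER_LIT`, `R_BRACE`; the whitespace between the colon and the number is matched but discarded.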
Syntactic analysis
In this last part of the language spec, we write rules describing how tree nodes are built by matching tokens from the input stream.
For example, a rule specifying: "if the current token is a STRING_LIT, build a
String node" can be written as follows:
rule string = String { STRING_LIT }
Rules can refer to other rules to construct nested nodes.
For example, here is a rule specifying that a `Member` node (corresponding to an object member
in JSON) can be built by parsing a node using the `string` rule, then matching a `COLON`
token followed by any valid JSON value:
rule member = Member { key@string COLON value@main }
Nested nodes are associated with a field using the `@` syntax.
The `main` rule is the entry point for the parser, so in our case, it designates any valid JSON value.
A valid JSON document can be made of a 'null' literal, a number, a boolean value,
a string, an array of JSON values, or a JSON object, which is reflected in the main rule:
rule main =
Null { NULL }
| Number { NUMBER_LIT }
| Bool { BOOL_LIT }
| string
| Array { L_BRACKET elems@sepBy(COMMA, main) R_BRACKET }
| Object { L_BRACE members@sepBy(COMMA, member) R_BRACE }
The `sepBy(TOKEN, rule_name)` syntax parses a list of nodes using the given rule,
while matching a `TOKEN` token between consecutive nodes.
Conclusion
We now have a complete language spec for the JSON language:
node JsonNode { }
node Null: JsonNode { }
node Bool: JsonNode { }
node Number: JsonNode { }
node String: JsonNode { }
node Array: JsonNode {
elems: List<JsonNode>
}
node Object: JsonNode {
members: List<Member>
}
node Member: JsonNode {
key: String,
value: JsonNode
}
term COMMA = ','
term COLON = ':'
term L_BRACE = '{'
term R_BRACE = '}'
term L_BRACKET = '['
term R_BRACKET = ']'
term NULL = 'null'
term BOOL_LIT = `true|false`
term NUMBER_LIT = `\-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?`
term STRING_LIT = `"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"`
ignore term WHITESPACE = `\s`
rule string = String { STRING_LIT }
rule member = Member { key@string COLON value@main }
rule main =
Null { NULL }
| Number { NUMBER_LIT }
| Bool { BOOL_LIT }
| string
| Array { L_BRACKET elems@sepBy(COMMA, main) R_BRACKET }
| Object { L_BRACE members@sepBy(COMMA, member) R_BRACE }
The last step is to test it on our test file!
This is done by invoking the following command:
sylver parse --spec=json.syl --file=config.json
This command yields the following parse tree:
Object {
. ● members: List<Member> {
. . Member {
. . . ● key: String { "variables" }
. . . ● value: Array {
. . . . ● elems: List<JsonNode> {
. . . . . Object {
. . . . . . ● members: List<Member> {
. . . . . . . Member {
. . . . . . . . ● key: String { "name" }
. . . . . . . . ● value: String { "country" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "description" }
. . . . . . . . ● value: String { "Customer's country of residence" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "values" }
. . . . . . . . ● value: Array {
. . . . . . . . . ● elems: List<JsonNode> {
. . . . . . . . . . String { "us" }
. . . . . . . . . . String { "fr" }
. . . . . . . . . . String { "it" }
. . . . . . . . . }
. . . . . . . . }
. . . . . . . }
. . . . . . }
. . . . . }
. . . . . Object {
. . . . . . ● members: List<Member> {
. . . . . . . Member {
. . . . . . . . ● key: String { "name" }
. . . . . . . . ● value: String { "age" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "description" }
. . . . . . . . ● value: String { "Customer's age" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "type" }
. . . . . . . . ● value: String { "number" }
. . . . . . . }
. . . . . . }
. . . . . }
. . . . }
. . . }
. . }
. }
}
In the next part, we'll define business rules to validate our JSON
configuration (for example, the possible values for each variable must be
of the same type), and we will use a query DSL to identify the tree nodes that violate these rules.