TL;DR
- Go has many static analysis tools, but to check your own rules you have to write a new tool each time.
- To enable more general checks with a single tool, I created a tool that analyzes the Go AST (Abstract Syntax Tree) with Rego, a general-purpose policy language.
https://github.com/m-mizutani/goast
The rule "always take context.Context as the first argument", checked by CI.
Motivation
Various static analysis tools are available for Go, and they can check general best practices. For example, gosec checks Go code for security problems, and I use it myself. However, coding rules in software development are not based only on best practices; they can also be specific to a piece of software or a team. For example:
- All functions in some package must take context.Context as an argument
- All functions in some package must call an audit logging function at least once
- The User structure must be initialized by the function NewUser()
- Some function must be called only in a specific package
These rules can be checked by humans during review, but it is risky to rely on humans alone because humans always make mistakes. Even if review works while there are only a few rules, the more rules there are, the more likely some will be missed. Checking rules by hand also makes it harder to focus on the essential parts of the review.
Go provides official tools and frameworks for static analysis, which makes it easy to create your own tools. However, creating a separate analysis tool for each rule is costly to implement and maintain. So I thought it would be better to create one tool that can check code statically in as general a way as possible.
Separate "rule" and "implementation" by Rego
This applies beyond static analysis: a key point in building a versatile checking tool is how users describe the rules. Designing an original description language is too costly, while rules expressed as structured data such as YAML or JSON have limited expressive power, which reduces versatility.
A useful tool for this purpose is the policy language Rego. Rego is a general-purpose language, evaluated by OPA (Open Policy Agent), for evaluating structured data. Popular uses include checking the state of resources in cloud environments, checking Infrastructure as Code definitions, and authorizing access to servers. Please see the Rego documentation for more details.
By using Rego, the checking implementation and the rules can be completely separated. The implementation is responsible for reading files, loading policies, passing data for evaluation, and outputting the results, while the rules are written only in Rego. This gives a separation of concerns between those who implement the tool and those who design the rules.
Implementation
So I implemented goast, a generic static analysis tool for Go.
https://github.com/m-mizutani/goast
The tool reads Go code and evaluates its AST (Abstract Syntax Tree), an abstract representation of the code, with a policy written in Rego. The go/parser package is used to obtain the AST of the Go source code, which is then evaluated by the Rego policy. The AST can be evaluated either once for the whole file, or node by node.
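To make the node-by-node mode more concrete, here is a minimal sketch that uses only the Go standard library. It parses a small program with go/parser and walks it with ast.Inspect, recording the kind of each node. The function name nodeKinds and the way the kind string is derived are assumptions for illustration; this is not goast's actual implementation.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a small sample program (the same one dumped later in this article).
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"

// nodeKinds (a hypothetical helper) parses Go source and returns the kind
// of every AST node in the order ast.Inspect visits them.
func nodeKinds(filename, source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, source, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	var kinds []string
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil {
			// ast.Inspect calls the function with nil after a node's children.
			return false
		}
		// Assumption: derive the kind from the Go type name, e.g. "*ast.ExprStmt" -> "ExprStmt".
		kinds = append(kinds, strings.TrimPrefix(fmt.Sprintf("%T", n), "*ast."))
		return true
	})
	return kinds, nil
}

func main() {
	kinds, err := nodeKinds("main.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(kinds, " "))
}
```

In a tool like this, each visited node would be handed to the policy engine as input instead of being collected in a slice.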
Let's look at the AST of Go code
I am still a beginner with the Go AST myself, and I cannot picture the AST just by looking at the code. So I added a dump feature to goast for inspecting the AST.
package main

import "fmt"

func main() {
	fmt.Println("hello")
}
goast can output an AST dump with the following command.
$ goast dump --line 6 examples/println/main.go | jq
{
  "Path": "examples/println/main.go",
  "Node": {
    "X": {
      "Fun": {
        "X": {
          "NamePos": 44,
          "Name": "fmt",
          "Obj": null
        },
        "Sel": {
          "NamePos": 48,
          "Name": "Println",
          "Obj": null
        }
      },
      "Lparen": 55,
      "Args": [
        {
          "ValuePos": 56,
          "Kind": 9,
          "Value": "\"hello\""
        }
      ],
      "Ellipsis": 0,
      "Rparen": 63
    }
  },
  "Kind": "ExprStmt"
}
AST data tends to be relatively large: even the 7 lines of code above become 1,408 characters of JSON. Therefore, for ease of reading, only the sixth line of the code (fmt.Println("hello")) is output. Path is the path of the file that was read, Node is a dump of the ast.Node passed by ast.Inspect, and Kind is the type information of Node.
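A dump in this shape can be approximated with the standard library alone. The following is a minimal sketch, not goast's code: the dumpedNode type and the dumpLine function are hypothetical names, and serializing the node with encoding/json is an assumption about how such a dump could be produced.

```go
package main

import (
	"encoding/json"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is the 7-line sample program from this article.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"

// dumpedNode mirrors the Path/Node/Kind layout of the dump.
// It is a hypothetical type for this sketch.
type dumpedNode struct {
	Path string
	Node ast.Node
	Kind string
}

// dumpLine returns indented JSON for the first statement that starts on the given line.
func dumpLine(path, source string, line int) (string, error) {
	fset := token.NewFileSet()
	// SkipObjectResolution leaves Ident.Obj nil; resolved objects point back
	// into the tree and can form cycles that encoding/json cannot serialize.
	file, err := parser.ParseFile(fset, path, source, parser.SkipObjectResolution)
	if err != nil {
		return "", err
	}

	var found ast.Node
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil || found != nil {
			return false
		}
		if _, ok := n.(ast.Stmt); ok && fset.Position(n.Pos()).Line == line {
			found = n
			return false
		}
		return true
	})
	if found == nil {
		return "", fmt.Errorf("no statement starts on line %d", line)
	}

	kind := strings.TrimPrefix(fmt.Sprintf("%T", found), "*ast.")
	out, err := json.MarshalIndent(dumpedNode{Path: path, Node: found, Kind: kind}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := dumpLine("examples/println/main.go", src, 6)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because token.Pos and token.Token are plain integers, fields such as NamePos and Kind are emitted as numbers, which is consistent with the "Kind": 9 seen for the string literal in the dump.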
As you might expect, .Node.X.Fun holds information about the function being called, and .Node.X.Args holds the arguments. You could use this to write a rule such as "prohibit calls to a particular function". You can also add conditions such as the following:
- Allow/Prohibit calls within a specific package
- Allow/Prohibit certain arguments
- Allow/Prohibit direct passing of literals
Describe a rule
Now let's write a Rego rule based on the AST output. This time, let's describe a simple rule that forbids calling fmt.Println.
package goast

fail[res] {
    input.Kind == "ExprStmt"
    input.Node.X.Fun.X.Name == "fmt"
    input.Node.X.Fun.Sel.Name == "Println"

    res := {
        "msg": "do not use fmt.Println",
        "pos": input.Node.X.Fun.X.NamePos,
        "sev": "ERROR",
    }
}
goast's rule schema is as follows.
- package must be goast
- Input: input has metadata such as Path and Kind, and Node as the actual AST.
- Output: put the following structured data into fail if a violation is detected
  - msg (string): Detailed message of the violation
  - pos (int): Number indicating the position in the source code file
  - sev (string): Severity, one of INFO, WARNING, and ERROR
First, the three lines at the beginning of the rule detect the fmt.Println call that was just dumped. Data in the same format as the dump is passed directly to Rego as input, so the rule inspects Kind, Node.X.Fun.X.Name, and Node.X.Fun.Sel.Name to determine that the node is an expression statement that calls fmt.Println.
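For comparison, here is roughly what the same check looks like when written directly against go/ast, which is the kind of per-rule code that the Rego rule replaces. This is a minimal sketch; isFmtPrintln and countViolations are hypothetical names and are not part of goast.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is the sample program from this article.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"

// isFmtPrintln reports whether n is an expression statement that calls
// fmt.Println, mirroring the three conditions of the Rego rule.
func isFmtPrintln(n ast.Node) bool {
	stmt, ok := n.(*ast.ExprStmt) // input.Kind == "ExprStmt"
	if !ok {
		return false
	}
	call, ok := stmt.X.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	pkg, ok := sel.X.(*ast.Ident)
	// input.Node.X.Fun.X.Name == "fmt" and input.Node.X.Fun.Sel.Name == "Println"
	return ok && pkg.Name == "fmt" && sel.Sel.Name == "Println"
}

// countViolations parses the source and counts statements that call fmt.Println.
func countViolations(source string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", source, parser.SkipObjectResolution)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if n != nil && isFmtPrintln(n) {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	n, err := countViolations(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("fmt.Println calls found:", n)
}
```

The Go version needs a type assertion at every level of the tree, while the Rego rule expresses the same path as three field comparisons on the input document.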
If you are not familiar with ASTs, it may be difficult to understand what pos means. Here it is a number that indicates the byte position from the beginning of the file, and it is stored in fields such as NamePos and ValuePos. When you put this number in the result, goast converts it to the line number in the file where the violation occurred, and the final output shows that line number.
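The conversion from a position to a line number can be sketched with go/token. This is a minimal illustration under the assumption that the file is parsed into a fresh token.FileSet; lineCol is a hypothetical helper, not goast's code. It takes the NamePos value 44 from the dump and resolves it against the sample program.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// src is the sample program from this article.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"

// lineCol converts a position value such as NamePos into a line and column.
// It assumes the source is the only file in a fresh FileSet.
func lineCol(source string, pos int) (int, int, error) {
	fset := token.NewFileSet()
	// Parsing registers the file and its line offsets in the FileSet.
	if _, err := parser.ParseFile(fset, "main.go", source, parser.SkipObjectResolution); err != nil {
		return 0, 0, err
	}
	p := fset.Position(token.Pos(pos))
	return p.Line, p.Column, nil
}

func main() {
	line, col, err := lineCol(src, 44) // 44 is the NamePos of "fmt" in the dump
	if err != nil {
		panic(err)
	}
	fmt.Printf("main.go:%d:%d\n", line, col) // prints "main.go:6:2"
}
```

Strictly speaking, a token.Pos is the byte offset plus the base of the file within its FileSet, so for the first file in a fresh FileSet it is the offset plus one.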
You can detect violations by saving the Go code as main.go and the rule as policy.rego, then running:
$ goast eval -p policy.rego main.go
[main.go:6] - do not use fmt.Println
Detected 1 violations
Also, goast supports JSON format output.
$ goast eval -f json -p policy.rego main.go
{
  "diagnostics": [
    {
      "message": "do not use fmt.Println",
      "location": {
        "path": "main.go",
        "range": {
          "start": {
            "line": 6,
            "column": 2
          }
        }
      }
    }
  ],
  "source": {
    "name": "goast",
    "url": "https://github.com/m-mizutani/goast"
  }
}
Static analysis in CI
Static analysis should be performed continuously by CI (Continuous Integration) so that code violating the rules is not merged unintentionally. The JSON output schema is compatible with reviewdog and can be passed to reviewdog as is.
goast-action is also available for GitHub Actions; the following workflow runs the static checks on Pull Requests.
name: goast
on:
  pull_request:
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - name: checkout
        uses: actions/checkout@v2
      - uses: reviewdog/action-setup@v1
      - name: goast
        uses: m-mizutani/goast-action@main
        with:
          policy: ./policy # Directory of rule files written in Rego
          format: json # Output format, "text" or "json"
          output: fail.json # File name for output
          source: ./pkg # Directory of Go source code to be checked
      - name: report
        env:
          REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: cat fail.json | reviewdog -reporter=github-pr-review -f rdjson
With this workflow, GitHub Actions posts the detected violations as review comments on the Pull Request.
Conclusion
I believe that combining the Go AST with Rego gives developers a more flexible way to do static analysis for Go, and that it can help in developing more secure software.
However, I do not think that an AST of the source code plus a general-purpose policy language can cover every kind of static check. For example, Rego is not good at expressing rules that track changes in state, so it is not well suited to use cases such as checking how a variable is referenced or modified.
I have only just started using it in practice myself, so I am still exploring the ways it can be used. I welcome feature suggestions and discussion, so please feel free to comment or create an issue in the repository.
Top comments (3)
I like the idea, but from the documentation it is unclear if one could write multiple rules in one Rego policy file. (There's only an example with one rule)
Definitely. Let me add more documentation to write rules
Hi,
I am using the ast.ParseModuleWithOpts() function to parse a string containing Rego. The function returns a *ast.Module, but I cannot find a way to save the module to a file. I would like to do the same as the opa command below.
opa parse -format json some.rego
Do you have any suggestions, or can you point me in the right direction?
Thanks