Last week Github released their first official version of Code Scanning. It allows users to quickly scan their source code to identify vulnerabilities. But I have a question.
How reliable is it?
In this post, I'll try to give an answer to that based on a sample C code.
I will use CodeQl to scan a deliberately bugged code and then we will analyze the results to check the performance of the tool.
CodeQl code scanning
Github code scanning uses CodeQl, a semantic code analysis engine. It runs queries on your code to identify potential threats and bad patterns. It supports a huge variety of languages (C/C++, Go, Java, JavaScript, Python, ecc.) and it has a cool action called autobuild
that you can use to build your code.
How to set it up
To enable CodeQl in your GitHub repository, you can follow the official documentation.
If you are a VS code user there's also an extension for you.
Reference Environment
For this evaluation I start from the standard CodeQl configuration, then I specify the language of the source code for the analysis: language: ['cpp']
plus an extended set of queries: queries: +security-extended, security-and-quality
.
name: "CodeQL"
on:
push:
branches: [main]
pull_request:
# The branches below must be a subset of the branches above
branches: [main]
schedule:
- cron: '0 13 * * 3'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
language: ['cpp']
steps:
- name: Checkout repository
uses: actions/checkout@v2
with:
fetch-depth: 2
- run: git checkout HEAD^2
if: ${{ github.event_name == 'pull_request' }}
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
queries: +security-extended, security-and-quality
- run: |
make
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v1
Reference code
I'm using a C code that contains, on purpose, some traits, and errors. The reference code is the fuzzgoat repository from fuzzstation and it contains 4 main vulnerabilities:
- Use of memory after a free:
/******************************************************************************
WARNING: Fuzzgoat Vulnerability
The line of code below frees the memory block referenced by *top if
the length of a JSON array is 0. The program attempts to use that memory
block later in the program.
Diff - Added: free(*top);
Payload - An empty JSON array: []
Input File - emptyArray
Triggers - Use after free in json_value_free()
******************************************************************************/
free(*top);
/****** END vulnerable code **************************************************/
- 2 invalid frees:
switch (value->type)
{
case json_object:
if (!value->u.object.length)
{
settings->mem_free (value->u.object.values, settings->user_data);
break;
}
/******************************************************************************
WARNING: Fuzzgoat Vulnerability
The line of code below incorrectly decrements the value of
value->u.object.length, causing an invalid read when attempting to free the
memory space in the if-statement above.
Diff - [--value->u.object.length] --> [value->u.object.length--]
Payload - Any valid JSON object : {"":0}
Input File - validObject
Triggers - Invalid free in the above if-statement
******************************************************************************/
value = value->u.object.values [value->u.object.length--].value;
/****** END vulnerable code **************************************************/
continue;
case json_string:
/******************************************************************************
WARNING: Fuzzgoat Vulnerability
The code below decrements the pointer to the JSON string if the string
is empty. After decrementing, the program tries to call mem_free on the
pointer, which no longer references the JSON string.
Diff - Added: if (!value->u.string.length) value->u.string.ptr--;
Payload - An empty JSON string : ""
Input File - emptyString
Triggers - Invalid free on decremented value->u.string.ptr
******************************************************************************/
if (!value->u.string.length){
value->u.string.ptr--;
}
/****** END vulnerable code **************************************************/
- Null pointer dereference:
/******************************************************************************
WARNING: Fuzzgoat Vulnerability
The code below creates and dereferences a NULL pointer if the string
is of length one.
Diff - Check for one byte string - create and dereference a NULL pointer
Payload - A JSON string of length one : "A"
Input File - oneByteString
Triggers - NULL pointer dereference
******************************************************************************/
if (value->u.string.length == 1) {
char *null_pointer = NULL;
printf ("%d", *null_pointer);
}
/****** END vulnerable code **************************************************/
Code analysis results
These are the results from CodeQl.
Known vulnerabilities
As you can see in the analysis above, none of the vulnerability that were introduced on purpose on fuzzgoat are discovered by CodeQl.
free
errors are difficult to spot
but this is not true for a NULL pointer dereference! It can easily be caught by other tools like clang:
$ scan-build-8 make all
scan-build: Using '/usr/lib/llvm-8/bin/clang' for static analysis
/usr/share/clang/scan-build-8/bin/../libexec/ccc-analyzer -o fuzzgoat -I. main.c fuzzgoat.c -lm
fuzzgoat.c:298:29: warning: Dereference of null pointer (loaded from variable 'null_pointer')
printf ("%d", *null_pointer);
^~~~~~~~~~~~~
New findings
On the other side, let's analyze the actual results.
CodeQl was able to locate a potential buffer overflow which is a crucial vulnerability.
That one is probably a false positive since the size of the buffer is fixed, but in general, it's a good practice to be cautious when performing a buffer write operation.
Apart from that, the other findings are:
- Missing enum case in the switch;
- Poorly documented large function;
- Time-of-check time-of-use filesystem race condition;
- Uncontrolled data used in path expression;
- Potentially uninitialized local variable.
This is it!
After this evaluation, I think that CodeQl is a good tool for the daily use to get insight on the code quality: it's easy to use and configure, it can be quickly integrated into your GitHub repository and it provides small actionable results.
The downside is that you can have a high number of false positives to deal with.
Reach me on Twitter @gasparevitta and let me know your thoughts!
You can find the code snippets on Github
This article was originally published on my blog. Head over there if you like this post and want to read others like it!
Top comments (2)
Hi Gaspare! I'm always impressed by your articles, you're very thorough with what you do.
I wanted to give this a try with my Python projects, but if it generates a lot of false positives it's easy to start discarding warning messages without even looking. It's like with unit test, if they're flaky and fail randomly you won't take them seriously. Better work without it.
We'll see how it does. Great work!
Hi Manuel, thanks for the good words!
At the moment the tool has a lot of false positive, I know. Probably there are better options out there.
Flaky tests are a pain but I don't think that no tests are all is better :)
I may write something about flakiness and how to deal with it in the future.