Kurt Feeley for AWS Community Builders

Posted on Aug 8, 2023 • Edited on Aug 13, 2023

Find Source Code Vulnerabilities with CodeQL Before You Commit

#python #security #tutorial #shiftleft

You have a plethora of Python code to commit for your new Django API. STOP! Before you commit and push, first scan your source code with the CodeQL CLI!

Photo by iMattSmart on Unsplash

The Solution

In this tutorial, we’ll go over scanning Python source code for vulnerabilities in a development environment using the CodeQL CLI.

Prerequisites

To complete this tutorial, you will need to install the CodeQL CLI.

Our Dev Environment

This tutorial was developed using Ubuntu 22.10, Python 3.10.6, CodeQL CLI 2.13.1 and Visual Studio Code 1.78.2. Some commands/constructs may vary across platforms.

What is CodeQL?

CodeQL is a static application security testing (SAST) tool: it extracts your source code into a database and then runs queries against that database to find vulnerabilities. A vulnerability is a weakness in an application that allows an attacker to cause harm to the application’s owner, its users, or organizations that rely on it. Common attacks include SQL injection, cross-site scripting and brute-force attacks. Running a tool like CodeQL early in the development process, before vulnerable code is committed, can save time and money and help protect a company’s reputation.
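To make “SQL injection” concrete, here is a minimal, self-contained sketch of the kind of pattern a SAST scanner looks for: untrusted input concatenated into a SQL string. It uses the standard-library sqlite3 module and a hypothetical users table; the function names are illustrative only. In a real Django app, CodeQL’s Python query for this (py/sql-injection) reports such code when the string flows from an untrusted source such as a request parameter, so a standalone function like this may not be flagged by itself.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: the input is concatenated into the SQL text, so a value
    # such as "x' OR '1'='1" rewrites the WHERE clause.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # SAFE: a parameterized query; the driver binds username as a value,
    # never as SQL syntax.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row comes back
    print(find_user_safe(conn, payload))    # nothing matches the literal string
```

The parameterized version is the fix a SAST finding like this usually points you toward.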

1) Set Up the CodeQL CLI

Download the CodeQL CLI

Point your browser to the CodeQL releases page on GitHub and download the archive that corresponds to the platform that you are using. For this tutorial, we are downloading and using the release specified with “linux64” in the filename.

CodeQL Releases: https://github.com/github/codeql-cli-binaries/releases

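Once the file has been downloaded, extract the files from the archive. The archive contains a top-level codeql directory, so on our Linux system we’ll use the unzip command to extract it into ~/bin/, which places the CLI in ~/bin/codeql.

$ unzip codeql-linux64.zip -d ~/bin/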

We are going to take one more step and add the directory that contains the codeql executable to our PATH variable so that we can call “codeql” from any location. On our Ubuntu system, we can do this by appending a line such as export PATH="$PATH:$HOME/bin/codeql" to the ~/.profile file. The change applies to new login sessions; to apply it to the current shell, run source ~/.profile.
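If you want to double-check the PATH change from Python, the sketch below is one way to do it. It assumes the install directory used in this tutorial (~/bin/codeql); the helper names are hypothetical. shutil.which performs the same kind of PATH lookup a shell does and returns None when nothing is found.

```python
import os
import shutil
from typing import Optional


def dir_on_path(directory: str, path_value: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Expand "~" in the directory we are looking for and normalize both sides,
    # so that a trailing slash does not cause a false mismatch.
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(entry) for entry in path_value.split(os.pathsep) if entry]
    return target in entries


def locate_codeql() -> Optional[str]:
    # shutil.which searches the directories listed in PATH and returns the
    # path of the first "codeql" executable it finds, or None.
    return shutil.which("codeql")


if __name__ == "__main__":
    install_dir = "~/bin/codeql"  # assumption: the install location used in this tutorial
    print("install directory on PATH:", dir_on_path(install_dir))
    print("codeql resolves to:", locate_codeql())
```

Note that a Python process only sees the PATH of the shell that launched it, so run this from a new login session (or after source ~/.profile).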

Test the CodeQL CLI

We can test the CodeQL CLI by checking the version at the command line.

$ codeql --version

If everything is set up correctly, we should see output like this:

CodeQL command-line toolchain release 2.13.1.
Copyright (C) 2019-2023 GitHub, Inc.
Unpacked in: /home/user/bin/codeql
Analysis results depend critically on separately distributed query and
extractor modules. To list modules that are visible to the toolchain,
use 'codeql resolve qlpacks' and 'codeql resolve languages'.
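If you script your setup, you may want to check the installed release programmatically. This is a minimal sketch that assumes the first line of the output keeps the format shown above (“CodeQL command-line toolchain release X.Y.Z.”); the function names are hypothetical, and the subprocess call only runs when codeql is found on PATH.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the first line printed by `codeql --version`, for example:
# "CodeQL command-line toolchain release 2.13.1."
_RELEASE_RE = re.compile(r"toolchain release (\d+)\.(\d+)\.(\d+)")


def parse_codeql_release(version_output: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from `codeql --version` output, or None."""
    match = _RELEASE_RE.search(version_output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_at_least(version_output: str, minimum: Tuple[int, int, int]) -> bool:
    release = parse_codeql_release(version_output)
    # Tuples compare element by element, so (2, 13, 1) >= (2, 13, 0).
    return release is not None and release >= minimum


if __name__ == "__main__":
    if shutil.which("codeql") is None:
        print("codeql was not found on PATH")
    else:
        result = subprocess.run(
            ["codeql", "--version"], capture_output=True, text=True, check=True
        )
        print("CodeQL release:", parse_codeql_release(result.stdout))
```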

You can test further by listing the languages that the CLI can analyze:

$ codeql resolve languages

Download the CodeQL Python Query Pack

To download precompiled queries for Python, use the following command:

$ codeql pack download codeql/python-queries

2) Create the CodeQL Database

Now that we have CodeQL downloaded and configured, we can create the CodeQL database: a representation of the source code that CodeQL’s queries run against.

Create a Directory for the CodeQL Database

The first thing we will need to do is create a directory to house the CodeQL database.

$ mkdir ~/codeql-dbs

Create the CodeQL Database

Now that we have a location for the database, let’s change to the root directory of the app we want to scan. By default, CodeQL treats the current working directory as the source root.

$ cd ~/source/python-app

Now we are set to create the CodeQL database with the following command:

Parameters:
~/codeql-dbs/python-app: The directory in which CodeQL will create the database.
--language: The language to scan. In this case, Python.

$ codeql database create ~/codeql-dbs/python-app \
--language=python

If everything goes to plan, the output of the database create command will end with something like this:

Successfully created database at /home/user/codeql-dbs/python-app.
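If you would rather drive this step from a Python script, a minimal sketch is to build the same command as an argument list. The helper name is hypothetical; --language comes from the command above, while --source-root and --overwrite are additional codeql database create options included here as optional assumptions (check codeql database create --help for your CLI version).

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_create_cmd(
    db_path: str,
    language: str = "python",
    source_root: Optional[str] = None,
    overwrite: bool = False,
) -> List[str]:
    """Build the argument list for `codeql database create`."""
    cmd = [
        "codeql", "database", "create",
        str(Path(db_path).expanduser()),  # expand "~" ourselves: no shell is involved
        f"--language={language}",
    ]
    if source_root is not None:
        # Without --source-root, CodeQL uses the current working directory,
        # which is why the tutorial runs `cd ~/source/python-app` first.
        cmd.append(f"--source-root={source_root}")
    if overwrite:
        # Replace an existing database at db_path instead of failing.
        cmd.append("--overwrite")
    return cmd


if __name__ == "__main__":
    cmd = build_create_cmd(
        "~/codeql-dbs/python-app",
        source_root=str(Path("~/source/python-app").expanduser()),
    )
    # Print the equivalent shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

To run it, pass the list to subprocess.run(cmd, check=True). Using a list rather than a shell string avoids quoting problems with paths, which is also why the sketch expands ~ itself.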

3) Scan the Source Code for Vulnerabilities

With the CodeQL database created, we can start to scan our source code.

Create a Directory for the CodeQL Output

CodeQL aggregates its findings in an output file. Let’s create a directory to house the output file.

$ mkdir ./codeql-output/

Code Analysis

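Running the following command instructs CodeQL to analyze the code using the database we built for “python-app.” Because no queries are listed, CodeQL runs the default Python queries from the query pack we downloaded earlier.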

Parameters:
~/codeql-dbs/python-app: The CodeQL database location.
--format: The output format, CSV in this case. CodeQL also supports SARIF (for example, sarif-latest) and graph formats.
--output: The path to the output file.

$ codeql database analyze ~/codeql-dbs/python-app \
--format="csv" \
--output="./codeql-output/scan.csv"

When CodeQL completes its analysis, the console should display a message like:

Shutting down query evaluator.
Interpreting results.
Analysis produced the following diagnostic data:

| Diagnostic                   | Summary   |
+------------------------------+-----------+
| Compilation message          | 3 results |
| Successfully extracted files | 6 results |

Analysis produced the following metric data:

| Metric                                     | Value  |
+--------------------------------------------+--------+
| Total lines of Python code in the database | 13,700 |
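The analyze step can be scripted the same way. This sketch (hypothetical helper names) mirrors the command above, including the mkdir step for the output directory. The set of formats it accepts is deliberately a small subset and an assumption on our part; consult codeql database analyze --help for the full list.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# Assumption: a small subset of the values `codeql database analyze` accepts
# for --format (see the CLI reference for the full list).
SUPPORTED_FORMATS = {"csv", "sarif-latest", "sarifv2.1.0"}


def build_analyze_cmd(db_path: str, output_file: str, fmt: str = "csv") -> List[str]:
    """Build the argument list for `codeql database analyze`."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Like the tutorial's command, no queries are listed, so CodeQL runs
    # the default queries for the database's language.
    return [
        "codeql", "database", "analyze",
        str(Path(db_path).expanduser()),
        f"--format={fmt}",
        f"--output={output_file}",
    ]


def run_analysis(db_path: str, output_file: str, fmt: str = "csv") -> None:
    # Equivalent of `mkdir ./codeql-output/`: create the output directory if needed.
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_analyze_cmd(db_path, output_file, fmt), check=True)


if __name__ == "__main__":
    print(shlex.join(build_analyze_cmd("~/codeql-dbs/python-app", "./codeql-output/scan.csv")))
```

Calling run_analysis("~/codeql-dbs/python-app", "./codeql-output/scan.csv") would then perform the same scan as the shell command; check=True makes the script raise if CodeQL exits with an error.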

View the CodeQL Analysis Output

$ nano ./codeql-output/scan.csv
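For more than a handful of findings, a short script is easier to scan than nano. The sketch below assumes the CSV layout described in CodeQL’s documentation for this format: no header row, and nine columns per finding (name, description, severity, message, path, start line, start column, end line, end column). The Finding type and helper names are hypothetical. Severity values are typically labels such as error, warning or recommendation.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List, NamedTuple


class Finding(NamedTuple):
    name: str
    description: str
    severity: str
    message: str
    path: str
    start_line: int
    start_column: int
    end_line: int
    end_column: int


def parse_findings(rows: Iterable[List[str]]) -> List[Finding]:
    """Turn CSV rows (no header row) into Finding records."""
    findings = []
    for row in rows:
        if len(row) != 9:
            continue  # skip blank or unexpected rows
        name, description, severity, message, path = row[:5]
        start_line, start_col, end_line, end_col = (int(v) for v in row[5:])
        findings.append(Finding(name, description, severity, message, path,
                                start_line, start_col, end_line, end_col))
    return findings


def severity_counts(findings: Iterable[Finding]) -> Dict[str, int]:
    # Count how many findings fall under each severity label.
    return dict(Counter(f.severity for f in findings))


if __name__ == "__main__":
    scan = Path("./codeql-output/scan.csv")
    if not scan.exists():
        print(f"{scan} not found; run `codeql database analyze` first")
    else:
        with scan.open(newline="") as fh:
            findings = parse_findings(csv.reader(fh))
        for f in findings:
            print(f"{f.severity:<14} {f.path}:{f.start_line}  {f.name}")
        print("Totals by severity:", severity_counts(findings))
```

An empty result from parse_findings means the default queries reported nothing for this codebase.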

Summary

In this tutorial, you learned how to scan Python source code for vulnerabilities in a development environment using the CodeQL CLI.

Now, before you commit the code for that Django API, scan it for vulnerabilities with CodeQL.

DEV Community