
Aavash Shrestha for Deta


Track custom github repository metrics with Github Webhooks, FastAPI and DETA

Tracking important metrics for a Github repository facilitates analysis and optimization of a team's development and delivery process. The metrics help in identifying bottlenecks, prioritizing resources and setting goals.

In this tutorial, we are going to implement and deploy a simple self-hosted application that keeps track of the code review turnaround, the time taken between a code review assignment and its completion, for a repository using Github Webhooks, FastAPI and Deta.

This tutorial showcases how, with Deta, you can direct most of your focus and effort on developing the tool and get it seamlessly deployed to production.

The application's users can either generate a graph or get a json response of average code review turnarounds for a specified duration averaged over a specified period. To generate the graph, we will use Bokeh, an interactive visualization library.

sample_plot

{
  "2020-09-18T09:00:22": 0, 
  "2020-09-19T09:00:22": 454.88,
  "2020-09-20T09:00:22": 1315.75,
  "2020-09-21T09:00:22": 87.13,
  "2020-09-22T09:00:22": 178.05,
  "2020-09-23T09:00:22": 95.83,
  "2020-09-24T09:00:22": 40.7,
  "2020-09-25T09:00:22": 0
}

I have named the application GRT (Github Code Review Turnaround). The complete source code of the application is available on github.

Application Design

app_design

GRT needs to achieve three main things in order to keep track of the code review turnarounds and allow users to easily retrieve them:

  • Know when a code review has been requested, when a review request has been removed, and when a code review has been submitted. We use Github Webhooks for this.
  • Store and update information about code reviews in a persistent storage. For this we use Deta Base.
  • Offer an api for users to see the average code review turnaround. The user should be able to specify the type of response, duration and the period to average the metrics over.

Implementation and Deployment

The following guide assumes that you have signed up for Deta and have the Deta CLI installed.

This guide is written for Unix environments. Some shell commands differ on Windows, so use the Windows equivalents where needed.

Create a deta micro

Firstly, we create a new FastAPI application on Deta.

  • Create a directory called grt and cd into it.
$ mkdir grt && cd grt
  • Create two files main.py and requirements.txt in the root of the directory.
$ touch main.py requirements.txt
  • Specify fastapi and bokeh as dependencies in the requirements.txt file. It should look like this:
fastapi
bokeh
  • Create the Deta Micro. With a main.py file and fastapi specified as a dependency in the requirements.txt file, all you need to do is type deta new to create a new FastAPI app. From the root of the directory, enter
$ deta new
Successfully created a new micro
{
    "name": "grt",
    "runtime": "python3.7",
    "endpoint": "https://{your_subdomain}.deta.dev",
    "visor": "enabled",
    "http_auth": "enabled"
}
Adding dependencies...
Collecting fastapi
...
Collecting bokeh
...
Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 PyYAML-5.3.1 bokeh-2.2.1 fastapi-0.61.1 numpy-1.19.2 packaging-20.4 pillow-7.2.0 pydantic-1.6.1 pyparsing-2.4.7 python-dateutil-2.8.1 six-1.15.0 starlette-0.13.6 tornado-6.0.4 typing-extensions-3.7.4.3 

The endpoint will be different for your micro.

  • You should see that http_auth is enabled by default. We will disable the auth for the github webhook and use a webhook secret.

Set up the webhook

  • The command deta details shows details about your deployed micro including your micro's http endpoint. Copy your micro's endpoint from the output of deta details.
$ deta details
{
    "name": "grt",
    "runtime": "python3.7",
    "endpoint": "https://{your_subdomain}.deta.dev",
    "visor": "enabled",
    "http_auth": "enabled"
}
  • Go to Webhooks under Settings for the repository you want to track your metrics on and click on Add Webhook.

  • In the Payload URL, use your micro's endpoint with the route /webhook_events as the webhook endpoint.

https://{your_subdomain}.deta.dev/webhook_events
  • Change the Content type to application/json.
  • Generate a long secure random string (there are services online that do this) and use that as the Webhook Secret. Keep hold of this secret as you will need it to set up the app's environment later.
  • Select Let me select individual events when selecting the events to trigger the webhook. Select the following events:
    • Pull requests : To know when a code review is requested or a review request is removed
    • Pull request reviews : To know when a code review has been submitted
  • Click on Add Webhook to add the webhook.
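
If you would rather not rely on an online service for the webhook secret, Python's standard library can generate one locally. The following is a minimal sketch; the function name generate_webhook_secret and the 32-byte default are choices made for this illustration, not something the GRT source prescribes.

```python
import secrets


def generate_webhook_secret(num_bytes: int = 32) -> str:
    """Return a hex string built from num_bytes of OS-provided randomness.

    The name and the default length are illustrative assumptions.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into the Secret field of the webhook form.
    print(generate_webhook_secret())
```

Each byte becomes two hex characters, so the default produces a 64-character string.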

Set up the environment

The webhook secret used in setting up the webhook is provided to the micro through an environment variable WEBHOOK_SECRET.

  • Create a .env file in the app's root directory and add your secret in the file. Make sure not to expose this file publicly.
$ echo WEBHOOK_SECRET=your_webhook_secret > .env
$ cat .env
WEBHOOK_SECRET=your_webhook_secret
  • Update the environment variables of your app.
$ deta update -e .env

You should see that the environment variables have been successfully updated.
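
On the Python side, the secret is read from the process environment. As a hedged sketch, the helper below fails fast with a clear message when the variable is missing instead of failing later while handling a request. The function name load_webhook_secret and its environ parameter are assumptions made for this example and do not appear in the GRT source.

```python
import os


def load_webhook_secret(environ=os.environ) -> bytes:
    """Read WEBHOOK_SECRET and return it as bytes, ready to use as an HMAC key.

    Hypothetical helper: raises early if the variable is unset or empty.
    """
    secret = environ.get("WEBHOOK_SECRET")
    if not secret:
        raise RuntimeError("WEBHOOK_SECRET is not set")
    return secret.encode("utf-8")
```

Passing the environment as a parameter makes the helper easy to exercise with a plain dict in tests.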

Implement the webhook endpoint

Time to code. Let's add a POST endpoint that receives webhook events from github.

  • Open main.py in your editor and add the following code.
from fastapi import FastAPI, Request

# FastAPI app
app = FastAPI()

@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # TODO: store review request
            return "ok"
        elif action == "review_request_removed":
            # TODO: delete review request
            return "ok"
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # TODO: update review request
        return "ok"

    # ignore other events
    return "ok"

Github sends different payloads for different events. The event_type is denoted by the header X-Github-Event and the action is denoted by the action field in the payload. The code above just identifies what action triggered the webhook.

As you can see, there are several TODOs in the code. For now, we just return ok to github without actually doing anything with the data. We will handle these events properly after we have implemented storing, retrieving, updating and deleting the review requests' information.
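
The mapping from event type and action to what the handler should do can also be written as a small pure function, which is convenient to test without running a server. This is only a sketch: classify_event and the labels it returns are hypothetical names introduced here, not part of the GRT source.

```python
from typing import Optional


def classify_event(event_type: Optional[str], payload: dict) -> str:
    """Map the X-Github-Event header value and the payload's action to a label.

    The labels ("store", "delete", "mark_complete", "ignore") are illustrative.
    """
    action = payload.get("action")
    if event_type == "pull_request":
        if action == "review_requested":
            return "store"
        if action == "review_request_removed":
            return "delete"
    elif event_type == "pull_request_review" and action == "submitted":
        return "mark_complete"
    # every other event or action is acknowledged but not processed
    return "ignore"
```

For example, classify_event("pull_request", {"action": "opened"}) falls through to "ignore", which matches the handler returning ok without touching any data.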

The next step is to verify the signature sent by github. The signature is used for integrity and authentication; it verifies that the payload came from github and that the payload has not been modified by anybody in between.

  • Create a file called utils.py and add the following code.
import os
import hmac

# get the webhook secret from the environment
WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")

# calculate hmac digest of payload with shared secret token
def calc_signature(payload):
    digest = hmac.new(
        key=WEBHOOK_SECRET.encode("utf-8"), msg=payload, digestmod="sha1"
    ).hexdigest()
    return f"sha1={digest}"
  • Open main.py and add the code to verify the signature.
from fastapi import FastAPI, Request, HTTPException

import utils

# FastAPI app
app = FastAPI()

@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature")
    if signature != utils.calc_signature(raw):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # TODO: store review request
            return "ok"
        elif action == "review_request_removed":
            # TODO: delete review request
            return "ok"
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # TODO: update review request
        return "ok"

    # ignore other events
    return "ok"

Now payloads that are not signed with your webhook secret are rejected, so only github can deliver events to your webhook endpoint. Let's deploy what we have so far with the deta deploy command. It should take only a few seconds.

$ deta deploy
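
One optional hardening of the signature check, sketched under assumptions: the handler compares the two signatures with !=, which is not a constant-time comparison, and a request without the header is compared against None. Python's hmac.compare_digest is intended for exactly this kind of comparison. The helper name is_valid_signature and passing the secret as an argument are choices made for this illustration and are not part of the GRT source. GitHub also sends a SHA-256 signature in the X-Hub-Signature-256 header; the sketch stays with SHA-1 to match the code above.

```python
import hashlib
import hmac
from typing import Optional


def calc_signature(secret: str, payload: bytes) -> str:
    """Compute the value expected in the X-Hub-Signature header."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha1).hexdigest()
    return f"sha1={digest}"


def is_valid_signature(secret: str, payload: bytes, header_value: Optional[str]) -> bool:
    """Return True only if the header matches the signature of the raw body."""
    if not header_value:
        # missing header: reject without computing anything
        return False
    expected = calc_signature(secret, payload)
    # compare as bytes so that non-ASCII header values cannot raise TypeError
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

In the handler, the check would then read: if not is_valid_signature(secret, raw, signature), raise the 401.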

Implement the review request store

We are using Deta Base to store information about review requests.

We use the Deta Python SDK, which is pre-installed on a Deta Micro, to talk to our database.

Each item in the database holds information about one review request and has the following schema:

{
    "key": str, // randomly_generated
    "reviewer": str, // reviewer
    "pull_request": int, // pull request number 
    "requested_at" : int, // posix timestamp of request
    "submitted_at" : int, // posix timestamp of review submission
    "submitted": bool, // if the review has been submitted
    "crt": int // code review turnaround in seconds
}
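
To make the schema concrete, here is a hedged sketch of the same item as a Python dataclass. The class ReviewRequest and its complete method are hypothetical and only illustrate how the fields relate; GRT itself stores plain dicts. The sketch treats submitted_at and crt as absent until the review is submitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewRequest:
    """Illustrative mirror of one item in the code_reviews base."""

    key: str
    reviewer: str
    pull_request: int
    requested_at: int  # POSIX timestamp of the request
    submitted: bool = False
    submitted_at: Optional[int] = None  # set once the review is submitted
    crt: Optional[int] = None  # code review turnaround in seconds

    def complete(self, submitted_at: int) -> None:
        """Mark the request as submitted and derive the turnaround."""
        self.submitted = True
        self.submitted_at = submitted_at
        self.crt = submitted_at - self.requested_at
```

A request made at timestamp 1600419622 and reviewed one hour later ends up with a crt of 3600 seconds.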
  • Create a file called reviews.py with the following code.
from dateutil.parser import isoparse
from datetime import datetime, timezone
from deta import Deta

# manages storing, fetching and updating review requests information
class ReviewRequestStore:
    def __init__(self):
        # creating a new base (or table) is only one line of code
        self.db = Deta().Base("code_reviews")

    # get review req from pull request number and reviewer
    def __get_review_req(self, pr_num: int, reviewer: str):
        # generator
        review_reqs_gen = next(
            self.db.fetch(
                {"submitted": False, "pull_request": pr_num, "reviewer": reviewer}
            )
        )

        review_reqs = []
        for r in review_reqs_gen:
            review_reqs.append(r)

        # there should be only one corresponding unsubmitted review request
        if len(review_reqs) == 0:
            raise Exception("No corresponding review request found")

        if len(review_reqs) > 1:
            raise Exception(
                "Found multiple incomplete reviews for same pull request and reviewer"
            )

        return review_reqs[0]

    # store review request
    def store(self, payload: dict):
        # POSIX timestamp
        current_time = int(datetime.now(timezone.utc).timestamp())
        item = {
            "reviewer": payload["requested_reviewer"]["login"],
            "pull_request": payload["pull_request"]["number"],
            "requested_at": current_time,
            "submitted": False,
        }

        self.db.put(item)

    # mark review request complete
    def mark_complete(self, payload: dict):
        submission_time = int(isoparse(payload["review"]["submitted_at"]).timestamp())

        pr_num = payload["pull_request"]["number"]
        reviewer = payload["review"]["user"]["login"]
        review_req = self.__get_review_req(pr_num, reviewer)

        # updates to the review request
        updates = {
            "submitted": True,
            "submitted_at": submission_time,
            "crt": submission_time - review_req["requested_at"],
        }

        self.db.update(updates, review_req["key"])
        return

    # delete review request
    def delete(self, payload: dict):
        pr_num = payload["pull_request"]["number"]
        reviewer = payload["requested_reviewer"]["login"]

        review_req = self.__get_review_req(pr_num, reviewer)
        self.db.delete(review_req["key"])

    # get review requests created since date
    def get(self, created_since: str):
        # posix timestamp
        since = int(isoparse(created_since).timestamp())

        # query submitted reviews created since 'since'
        review_reqs_since_gen = next(
            self.db.fetch({"requested_at?gte": since, "submitted": True})
        )

        review_reqs_since = []
        for req in review_reqs_since_gen:
            review_reqs_since.append(req)

        return review_reqs_since

# initializing a singleton, only one instance should be used
rev_req_store = ReviewRequestStore()

Creating or connecting to the database is only a single line of code if you use Deta Base as you can see in the constructor of the ReviewRequestStore class. It requires no pre-set up of a database.

self.db = Deta().Base("code_reviews")

The ReviewRequestStore class offers methods to store, mark as complete, delete and retrieve the review requests from the database. These methods do the necessary processing of the github payloads to store, update and retrieve only necessary information.

Also, an instance of the class is already instantiated here as it should be a singleton. We will import this instance directly in our main.py and later for the insights.
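
The two queries passed to fetch in reviews.py combine exact matches with one "greater than or equal" condition (the ?gte suffix on requested_at). The plain-Python sketch below spells out that meaning on a list of dicts. It illustrates the query semantics as used in this tutorial only; it is not the Deta SDK, and the function name matches is an assumption made for this example.

```python
def matches(item: dict, query: dict) -> bool:
    """Return True if item satisfies every condition in query.

    Supports plain equality ("field": value) and "field?gte": value.
    """
    for field, expected in query.items():
        name, _, op = field.partition("?")
        if name not in item:
            return False
        if op == "gte":
            if item[name] < expected:
                return False
        elif op == "":
            if item[name] != expected:
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


items = [
    {"requested_at": 100, "submitted": True},
    {"requested_at": 50, "submitted": True},
    {"requested_at": 200, "submitted": False},
]
# only the first item is both submitted and requested at or after 100
recent_submitted = [i for i in items if matches(i, {"requested_at?gte": 100, "submitted": True})]
```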

  • Now we update our main.py to handle the payloads from github. Open main.py and update the code to the following.
from fastapi import FastAPI, Request, HTTPException

import utils
from reviews import rev_req_store

# FastAPI app
app = FastAPI()

@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature")
    if signature != utils.calc_signature(raw):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # store the review request
            rev_req_store.store(payload)
        elif action == "review_request_removed":
            # delete the review request
            rev_req_store.delete(payload)
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # mark review request complete
        rev_req_store.mark_complete(payload)
        return "ok"

    # ignore other events
    return "ok"

Let's deploy the latest changes.

$ deta deploy

Generate the insights

Now we need to implement retrieving the data from the store and generating the insights with average review turnaround time. We use Bokeh for generating the HTML chart.

  • Create a file called insights.py with the following code
from datetime import datetime, timedelta
from dateutil.parser import isoparse
from statistics import mean
from math import isnan, nan

from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.embed import file_html
from bokeh.models import HoverTool

from reviews import rev_req_store

# manages generating the insights data
class Chart:
    def __init__(self):
        # maps durations to number of days
        self.__durations = {
            "week": 7,  # number of days
            "month": 30,  # number of days
        }

        # maps periods to number of seconds
        self.__periods = {
            "day": 60 * 60 * 24, # number of seconds
            "week": 60 * 60 * 24 * 7, #number of seconds
        }

    # get submitted reviews bucketed by periods based on duration
    def __get_insights(self, duration: str, period: str):
        # membership test, so an unknown value raises ValueError rather than KeyError
        if duration not in self.__durations or period not in self.__periods:
            raise ValueError("bad duration or period")

        since = self.__get_since(self.__durations[duration])
        submitted_reviews = rev_req_store.get(since)
        return self.__bucket_submissions(since, period, submitted_reviews)

    # convert duration into iso 8601 date format
    def __get_since(self, days: int):
        since = datetime.now() - timedelta(days=days)
        return since.isoformat()

    # bucket submitted reviews based on submission timestamp since date averaged by period
    def __bucket_submissions(self, since: str, period: str, submitted_reviews: list):
        now_posix = int(datetime.now().timestamp())
        since_posix = int(isoparse(since).timestamp())

        buckets = {}
        average_buckets = {}
        separators = []

        # separators are calculated based on period
        # for eg. if period is "day", separators are distanced by 86400 seconds
        start = since_posix + self.__periods[period]
        for start in range(since_posix, now_posix + 1, self.__periods[period]):
            buckets[start] = []
            separators.append(start)

        # fill the buckets
        for rev in submitted_reviews:
            for separator in separators:
                # the separators are sorted in increasing order
                # so a simple comparison suffices here
                if separator > rev["requested_at"]:
                    buckets[separator].append(rev["crt"])
                    break

        # compute average for each bucket
        for separator in buckets:
            date = datetime.fromtimestamp(separator)
            crts = buckets[separator]
            average_buckets[date] = nan  # nan here to denote missing data for the chart
            if len(crts) != 0:
                average_buckets[date] = round(mean(buckets[separator]) / 60, 2)

        return average_buckets

    # generate html chart with bokeh
    def __generate_chart(self, buckets: dict):
        p = figure(
            title="Average code review turnarounds",
            x_axis_type="datetime",
            x_axis_label="date",
            y_axis_label="average turnaround (mins)",
            plot_height=800,
            plot_width=800,
        )
        x = list(buckets.keys())
        y = list(buckets.values())
        p.scatter(x, y, color="red")
        p.line(x, y, color="red", legend_label="moving average code review turnaround")
        return file_html(p, CDN, "Average code review turnarounds")

    # get html chart
    def get_chart(self, duration: str, period: str):
        buckets = self.__get_insights(duration, period)
        return self.__generate_chart(buckets)

    # get json of average values
    def get_json(self, duration: str, period: str):
        buckets = self.__get_insights(duration, period)
        for date in buckets:
            if isnan(buckets[date]):
                buckets[date] = 0
        return buckets

Here we create a class Chart that manages the insights. Chart offers two main methods to get the insights, get_chart and get_json to either get an html chart or a json.

The insights are calculated based on the parameters duration and period.

The main algorithm here is to get the submitted reviews since a specific date, bucket the submitted reviews based on the period and return averages for each period.

# get submitted reviews bucketed by periods based on duration
def __get_insights(self, duration: str, period: str):
    # membership test, so an unknown value raises ValueError rather than KeyError
    if duration not in self.__durations or period not in self.__periods:
        raise ValueError("bad duration or period")

    since = self.__get_since(self.__durations[duration])
    submitted_reviews = rev_req_store.get(since)
    return self.__bucket_submissions(since, period, submitted_reviews)

# convert duration into iso 8601 date format
def __get_since(self, days: int):
    since = datetime.now() - timedelta(days=days)
    return since.isoformat()
# bucket submitted reviews based on submission timestamp since date averaged by period
def __bucket_submissions(self, since: str, period: str, submitted_reviews: list):
    now_posix = int(datetime.now().timestamp())
    since_posix = int(isoparse(since).timestamp())

    buckets = {}
    average_buckets = {}
    separators = []

    # separators are calculated based on period
    # for eg. if period is "day", separators are distanced by 86400 seconds
    start = since_posix + self.__periods[period]
    for start in range(since_posix, now_posix + 1, self.__periods[period]):
        buckets[start] = []
        separators.append(start)

    # fill the buckets
    for rev in submitted_reviews:
        for separator in separators:
            # the separators are sorted in increasing order
            # so a simple comparison suffices here
            if separator > rev["requested_at"]:
                buckets[separator].append(rev["crt"])
                break

    # compute average for each bucket
    for separator in buckets:
        date = datetime.fromtimestamp(separator)
        crts = buckets[separator]
        average_buckets[date] = nan  # nan here to denote missing data for the chart
        if len(crts) != 0:
            average_buckets[date] = round(mean(buckets[separator]) / 60, 2)

    return average_buckets

The main algorithm:

  • calculate the exact time from which we need to retrieve reviews. For example, if the duration is week, calculate the timestamp of exactly a week ago. This is the since.
  • get submitted reviews created since the since timestamp
  • divide up the time between since and now into equal intervals of period by using timestamps as separators. So, besides the first one, each separator will be period seconds higher than the previous separator.
  • bucket review submissions into the right intervals based on the requested_at timestamp, which is the field the code above compares against the separators
  • calculate average review turnarounds of each bucket
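
The steps above can be tried out on plain numbers, without Deta or Bokeh. The following is a simplified sketch of the same bucketing idea: the function name bucket_averages is hypothetical, timestamps are passed in as integers rather than computed from the clock, and empty buckets are reported as None instead of nan.

```python
from statistics import mean


def bucket_averages(since: int, now: int, period: int, reviews: list) -> dict:
    """Average turnaround in minutes per bucket, keyed by separator timestamp."""
    # separators are period seconds apart, starting at since
    separators = list(range(since, now + 1, period))
    buckets = {sep: [] for sep in separators}

    for rev in reviews:
        # a review goes into the first separator that lies after its request time
        for sep in separators:
            if sep > rev["requested_at"]:
                buckets[sep].append(rev["crt"])
                break

    # crt is stored in seconds, so divide by 60 to report minutes
    return {
        sep: round(mean(crts) / 60, 2) if crts else None
        for sep, crts in buckets.items()
    }


reviews = [
    {"requested_at": 10, "crt": 120},
    {"requested_at": 50, "crt": 240},
    {"requested_at": 250, "crt": 600},
]
averages = bucket_averages(since=0, now=300, period=100, reviews=reviews)
# averages == {0: None, 100: 3.0, 200: None, 300: 10.0}
```

The first separator equals since, so it never receives a review requested at or after since; it only marks the start of the time range.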

Add api for getting the insights

Now that we have our insights, the final step is to implement the api that the users can get the insights from. For this we offer a GET endpoint to get the insights.

  • Open main.py and update it to the following code
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse

import utils
from reviews import rev_req_store
from insights import Chart

# FastAPI app
app = FastAPI()

# chart
chart = Chart()

## cache generated charts
CACHE_MAX_AGE = 300

@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature")
    if signature != utils.calc_signature(raw):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            rev_req_store.store(payload)
        elif action == "review_request_removed":
            rev_req_store.delete(payload)
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        rev_req_store.mark_complete(payload)
        return "ok"

    # ignore other events
    return "ok"

# get average turnaround insights
# last: for last 'x', 'x' is only one of 'week' or 'month' currently
# period: period to calculate the average over, currently 'day' or 'week'
# plot: whether to generate a plot or not, returns json if plot is False
@app.get("/turnarounds/")
def get_turnarounds(last: str = "week", period: str = "day", plot: bool = True):
    try:
        if not plot:
            return chart.get_json(last, period)

        html_chart = chart.get_chart(last, period)
        return HTMLResponse(
            content=html_chart, headers={"Cache-Control": f"max-age={CACHE_MAX_AGE}"}
        )
    except ValueError:
        raise HTTPException(status_code=400, detail="Bad duration or period")

We added a route /turnarounds/ to get the insights with three query parameters.

  • last:str : the duration, counted back from the time of the request, to get average turnarounds for; only week or month are supported for now, defaults to week
  • period:str : the period to calculate the average over, only day or week supported for now, defaults to day
  • plot:bool : whether to view a plot or get a json response, defaults to true
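
As a usage illustration, the three query parameters can be assembled into a request URL with the standard library. This is a sketch: the helper turnarounds_url is a hypothetical name, and the base URL used below is a placeholder for your own micro's endpoint.

```python
from urllib.parse import urlencode


def turnarounds_url(base: str, last: str = "week", period: str = "day", plot: bool = True) -> str:
    """Build the URL of the /turnarounds/ route for the given parameters."""
    # booleans are sent as lowercase text, which FastAPI parses back into a bool
    query = urlencode({"last": last, "period": period, "plot": str(plot).lower()})
    return f"{base.rstrip('/')}/turnarounds/?{query}"


# placeholder endpoint; replace it with the endpoint shown by deta details
url = turnarounds_url("https://example.deta.dev", last="month", period="week", plot=False)
# url == "https://example.deta.dev/turnarounds/?last=month&period=week&plot=false"
```

Opening the same URL with plot=true in a browser returns the HTML chart instead of JSON.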

Finally, deploy the changes

$ deta deploy

And we are done. The application should now keep track of the review turnarounds and you can easily get the insights from the api.

If you don't see the application behaving as expected, you can see real-time logs of your application in Deta Visor. To open the visor page, navigate to your micro's visor page on Deta or open it from the cli directly:

$ deta visor open 

Deta Base also offers a UI which can be used to easily see what is stored in the database. Here's a screenshot of my base's data with completed submissions.

sample_data

The entire source code of the application can be viewed on github.

Conclusion

In a matter of a few hours we created a github insights tool ourselves (instead of subscribing to an expensive enterprise solution) and deployed it to production effortlessly.

GRT can be easily tweaked and extended to enable additional features:

  • get individual insights for a reviewer
  • add other durations and periods
  • extend the api to accept from and to dates
  • return min and max turnaround times along with the average for each period
  • configure the same instance of the app to be used for multiple repositories
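
As one possible starting point for the min and max extension, a bucket's turnarounds could be summarized as follows. This is only a sketch of the idea: summarize_bucket is a hypothetical helper, not part of GRT, and it assumes each bucket is a list of turnarounds in seconds, as in the bucketing code earlier.

```python
from statistics import mean
from typing import Optional


def summarize_bucket(crts: list) -> Optional[dict]:
    """Return min, max and average turnaround in minutes, or None if empty."""
    if not crts:
        return None
    return {
        "min": round(min(crts) / 60, 2),
        "max": round(max(crts) / 60, 2),
        "avg": round(mean(crts) / 60, 2),
    }
```

For turnarounds of 60, 120 and 180 seconds this gives a minimum of 1.0, a maximum of 3.0 and an average of 2.0 minutes.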

Deta enables developers to direct their focus primarily on developing and implementing ideas and tools like GRT, and to get them out as quickly as possible.
