How To Manage Open edX Environment Variables Using Doppler and Automate the Deployment

#python #django #devops #security

Secrets are a pain to manage, and if you use Ansible to maintain your application and its components, there is a better way to do it. In this article, we see how Doppler can help us manage environment variables simply and quickly for a complex system called Open edX.
By default, we use Ansible to install and maintain Open edX. For example, to change the SMTP credentials you need to modify an Ansible variables file, deploy it to your target machine, and manually restart the services on your server to apply the change.
Automating this process speeds up the tedious work and saves time. We do it by hosting our variables in Doppler, reading them via the API in our codebase, and automating the deployment using a Doppler webhook.

Introduction

What is Open edX?

Open edX is an open-source platform you can use to create and host online courses. It was originally developed in 2012 by MIT and Harvard University and has since been adopted by organizations of all shapes and sizes to power a wide range of online learning use cases.
It has been used by organizations and universities like Microsoft, IBM, MIT, and ASU.

How do we handle environment variables by default in Open edX?

We use Ansible to provision and maintain our platform. If you are not familiar with Ansible, it's an open-source DevOps tool that automates software provisioning and configuration. Its building blocks are:

Ansible Playbooks, each of which has one or more roles. Imagine a playbook as the full instructions on how to install your software and its components, and how to set them up to work properly.
Ansible Roles. Each component in your stack has its own role. For example, if you are using Django, Nginx, and MySQL in your application, each one of them has its own role.
Variables. Each role has its own variables. For example, you should provide variables for the MySQL role to define the root username and password.

What is the workflow to change an environment variable?

We keep our playbooks, roles, and their variables in a GitHub repo, and we use Ansible Vault to encrypt the variables so that our credentials are not exposed in GitHub.
After encrypting it with Vault, the variables file looks like the following:

$ANSIBLE_VAULT;1.1;AES256
61353035366436396262643237303063643839653630393261663234666461653566626130613562
.
.
.

After Open edX is provisioned by Ansible, all of its environment variables live as plain text in the file system on the Ubuntu server. The file is called lms.yml and it contains all the environment variables for our Learning Management System, for example:

EMAIL_BACKEND: django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST: smtp.elasticemail.com
EMAIL_HOST_PASSWORD: ********
EMAIL_HOST_USER: ******
EMAIL_PORT: 2525
EMAIL_USE_TLS: true

Deploy our change to the server

To change a variable, we need to decrypt the variables file in our repo, change the variable there, encrypt the file again, and deploy it using Ansible. After a successful deployment, the equivalent variable is changed in lms.yml on the server. One extra step is needed to apply the change: SSH to the server and restart all the services manually.
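
The manual workflow described above can be sketched as an ordered list of commands. This is only an illustration: the variables file, playbook, and inventory names passed in are hypothetical placeholders rather than the names used in our repository, and the editing step in the middle is still done by hand.

```python
import shlex


def manual_deploy_commands(vars_file, playbook, inventory):
    """Return the shell commands of the manual workflow, in order.

    All three arguments are hypothetical placeholders; substitute the
    paths used in your own repository.
    """
    return [
        # 1. Decrypt the variables file so it can be edited.
        ["ansible-vault", "decrypt", vars_file],
        # 2. After editing the variable by hand, encrypt the file again.
        ["ansible-vault", "encrypt", vars_file],
        # 3. Deploy the change to the server with Ansible.
        ["ansible-playbook", "-i", inventory, playbook],
        # 4. Restart the services (run on the server over SSH).
        ["sudo", "/edx/bin/supervisorctl", "restart", "all"],
    ]


if __name__ == "__main__":
    # Example file names only; none of them come from the real repository.
    for cmd in manual_deploy_commands("vars/secrets.yml", "openedx.yml", "hosts.ini"):
        print(shlex.join(cmd))
```

Writing the steps out this way makes it easy to see how many of them depend on a person remembering to run them in the right order.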

The Problem with env files

There are multiple issues with this implementation:

If an unauthorized person gets access to our server, they can see all the critical credentials of our platform.
The deployment process includes too many steps, especially decrypting and encrypting the variables file. A developer can easily forget to encrypt the file again before pushing it to GitHub, in which case all the secrets are exposed in the GitHub repository.
After a successful deployment to the server, we need to restart the services manually.
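
To make the second issue concrete, here is a minimal sketch of the kind of check a team would need to run before every push to catch a file that was left decrypted. It relies only on the `$ANSIBLE_VAULT;` header shown in the encrypted example earlier; the function names are hypothetical and this is not something our repository contains.

```python
def is_vault_encrypted(text):
    """Return True if the content starts with an Ansible Vault header."""
    stripped = text.strip()
    if not stripped:
        return False
    # Encrypted files begin with a line such as "$ANSIBLE_VAULT;1.1;AES256".
    return stripped.splitlines()[0].startswith("$ANSIBLE_VAULT;")


def unencrypted_files(paths):
    """Return the subset of paths whose content is not vault-encrypted."""
    exposed = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            if not is_vault_encrypted(handle.read()):
                exposed.append(path)
    return exposed
```

Even with such a guard in place, the secrets still end up as plain text on the server, so it only addresses one of the three issues.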

How can Doppler help?

Doppler can help us address all three of these issues.

We inject secrets into our codebase directly using Doppler, instead of creating a local plain-text variables file on the server.
There is no need to decrypt and encrypt the variables file for each deployment; all the secrets are hosted safely in Doppler.
We can use a Doppler webhook, so as soon as a variable changes there we trigger a CircleCI deployment to the server.

New secret management design and architecture

Before using Doppler, this is what the environment management looks like.

After integrating Doppler with our platform, this is what the secrets management looks like.

Create environment variables in Doppler

In this article we create one environment variable in Doppler, but the process is the same for every environment variable we want to manage from Doppler.

In my account I created a project called openedx with 3 environments.

I created one environment variable called EMAIL_HOST_PASSWORD to manage the SMTP password. The value starts with aohRj******. This is not my real SMTP password; it's only an example.

Integrating Doppler into the codebase

Our application is written in Python/Django, so I used the Doppler API for this integration.
I integrated our codebase with Doppler so that, instead of reading EMAIL_HOST_PASSWORD from the local environment variables file, it makes a call to the Doppler API and gets the value from there. I added the following to our codebase in the edx-platform/lms/envs/production.py file:

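################################### Getting Environment Variables From Doppler ###################################

import requests

url = "https://api.doppler.com/v3/configs/config/secrets"

querystring = {"project": "openedx", "config": "dev"}
# The Doppler token itself is still read from the local secrets (AUTH_TOKENS).
DOPPLER_TOKEN = AUTH_TOKENS.get('DOPPLER_TOKEN', '')
# HTTP Basic expects base64-encoded credentials, so this assumes the stored
# DOPPLER_TOKEN value is already encoded that way.
Authorization = "Basic {DOPPLER_TOKEN}".format(DOPPLER_TOKEN=DOPPLER_TOKEN)
headers = {
    "Accept": "application/json",
    "Authorization": Authorization
}
# A timeout keeps Django start-up from hanging if the Doppler API is unreachable.
doppler_response = requests.get(url, headers=headers, params=querystring, timeout=10)
# Fail loudly on an HTTP error instead of raising a confusing KeyError below.
doppler_response.raise_for_status()
EMAIL_HOST_PASSWORD = doppler_response.json()['secrets']['EMAIL_HOST_PASSWORD']['raw']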

I also removed EMAIL_HOST_PASSWORD = AUTH_TOKENS.get('EMAIL_HOST_PASSWORD', '') from our code to prevent reading the value from the local environment variables file.
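
Since the endpoint returns every secret in the config, the same response can populate several settings at once. The following is a minimal sketch of that idea, not the code we deployed: it uses only the standard library instead of requests, the helper names are hypothetical, and the response shape (`secrets` → name → `raw`) is taken from the snippet above.

```python
import json
import urllib.parse
import urllib.request

DOPPLER_SECRETS_URL = "https://api.doppler.com/v3/configs/config/secrets"


def extract_raw_secrets(payload):
    """Map each secret name to its 'raw' value from a Doppler response body."""
    return {name: entry["raw"] for name, entry in payload.get("secrets", {}).items()}


def fetch_doppler_secrets(authorization, project, config, timeout=10):
    """Fetch all secrets for one project/config in a single API call.

    `authorization` is the full Authorization header value, built the same
    way as in the settings file. Network and HTTP errors are not caught here.
    """
    query = urllib.parse.urlencode({"project": project, "config": config})
    request = urllib.request.Request(
        f"{DOPPLER_SECRETS_URL}?{query}",
        headers={"Accept": "application/json", "Authorization": authorization},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_raw_secrets(json.load(response))


def get_secret(name, secrets, default=""):
    """Look up one secret, mirroring the AUTH_TOKENS.get(name, '') pattern."""
    return secrets.get(name, default)
```

In a settings file this could be used as `secrets = fetch_doppler_secrets(Authorization, "openedx", "dev")` followed by `EMAIL_HOST_PASSWORD = get_secret("EMAIL_HOST_PASSWORD", secrets)`, repeating the second line for each variable that moves to Doppler.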

You can see the git diff here

Let's test it

>>> from django.conf import settings
>>> settings.EMAIL_HOST_PASSWORD
'aohRj******'

As you can see, the Django settings return the value we set in Doppler for EMAIL_HOST_PASSWORD.

Automate Deployment using Doppler Webhooks and CircleCI

After integrating our codebase with Doppler, we improved the security of our platform by hosting our variables in a safe way. But when we change a variable in Doppler, we still need to SSH to our server and manually run a command like sudo /edx/bin/supervisorctl restart all to restart all the services so they pick up the up-to-date values of our secrets from Doppler. To remove this manual step we can use Doppler webhooks: as soon as one of our variables changes, a call is made to CircleCI to restart all the services on our server.
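
The CircleCI side of that call can be sketched as follows. This is an illustration under assumptions, not our exact setup: it uses the CircleCI API v2 "trigger a new pipeline" endpoint, and the project slug, branch name, and token shown in the usage note are hypothetical. Whether the Doppler webhook calls CircleCI directly or goes through a small relay that runs code like this depends on how the webhook is configured.

```python
import json
import urllib.request

CIRCLECI_API = "https://circleci.com/api/v2"


def build_pipeline_trigger(project_slug, circle_token, branch="master"):
    """Describe the HTTP request that starts a new pipeline for a project."""
    return {
        "url": f"{CIRCLECI_API}/project/{project_slug}/pipeline",
        "headers": {
            # Personal API token, sent in the Circle-Token header.
            "Circle-Token": circle_token,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"branch": branch}).encode("utf-8"),
    }


def trigger_pipeline(project_slug, circle_token, branch="master", timeout=10):
    """Send the trigger request and return the decoded JSON response."""
    spec = build_pipeline_trigger(project_slug, circle_token, branch)
    request = urllib.request.Request(
        spec["url"], data=spec["body"], headers=spec["headers"], method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `trigger_pipeline("gh/example-org/doppler-openedx-deployer", token)` would start the deployment job for a repository hosted under a hypothetical `example-org` GitHub organization.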

CircleCI Configuration

In CircleCI, I created a project called doppler-openedx-deployer. This project has a config.yml file like the following:

version: 2
jobs:
  build: 
    working_directory: ~/doppler-openedx-deployer
    docker: 
      - image: circleci/python:3.6.4
    steps: 
      - checkout
      - run: 
          name: Deploy to Open edX
          command:  ~/doppler-openedx-deployer/scripts/deploy.sh
          no_output_timeout: 30m

Running deploy.sh downloads the latest version of our Ansible playbooks and runs a command on the Open edX server to update the configuration and restart all the services.