This post looks at deploying a single Vault instance on Google Cloud using Terraform. It's intended for those who are already using Terraform and Google Cloud, and are looking into adopting Vault.
Introduction to Vault
Vault is a secrets management service. There are many ways to use it and many benefits to doing so, but essentially it means you no longer have to hard-code things like database passwords into your code; instead, your apps request secrets at runtime when they need them. Think of it as a password manager that your apps/microservices can use.
The three main benefits of doing this are:
- Security: you don't leave your database credentials lying around in your code or environment.
- You can automatically generate and manage passwords, rather than having to copy/paste a new password by hand to the many places it may be used.
- You can add access rules to the passwords based on who's requesting them. Production apps can access production passwords; development apps can't.
Here's an example of setting up a password automatically when spinning up a Cloud SQL (relational database) instance in Google Cloud with Terraform:
resource "google_sql_database_instance" "prod" {
  ...
}

resource "random_password" "prod" {
  length = 16
}

resource "google_sql_database" "prod" {
  name     = "prod_db"
  instance = google_sql_database_instance.prod.name
}

resource "google_sql_user" "prod" {
  name     = "prod_user"
  instance = google_sql_database_instance.prod.name
  password = random_password.prod.result
}
The above Terraform code creates a new Cloud SQL instance in Google Cloud, creates a database called prod_db and a user called prod_user, and assigns the user a random 16-character password.
At this point, if you weren't using Vault, you'd have to peek at what that random 16-character password was, and then go configure a bunch of apps or environments to use it. However, with Vault, you could instead also have these lines in your Terraform code:
resource "vault_generic_secret" "prod_db" {
  path = "secret/production/db"
  data_json = jsonencode({
    username = google_sql_user.prod.name
    password = google_sql_user.prod.password
    db       = google_sql_database.prod.name
    host     = google_sql_database_instance.prod.ip_address.0.ip_address
  })
}
Now any app that is authorized to access secret/production/db can fetch not only the random password that you never saw and never have to handle, but also the username, database name, and even host IP of the database — all values that might change from time to time for security reasons or redeployment. You could repeat the process to set up a development database and store its secret at secret/development/db. Apps can then switch between the two databases simply by changing which secret they fetch.
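To make that switch concrete, here's a tiny hypothetical helper (not from the post's codebase; the function name is illustrative) that an app could use to pick its secret path from an environment name:

```python
# Hypothetical helper: derive the Vault secret path from the deployment
# environment. The environment names and path layout mirror the ones
# used in this post.
def secret_path(environment: str) -> str:
    allowed = {"production", "development"}
    if environment not in allowed:
        raise ValueError(f"unknown environment: {environment}")
    return f"secret/{environment}/db"

print(secret_path("production"))  # secret/production/db
```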
Here's a (highly simplified) example of Python client code for this:
import requests

role = "production"
vault_addr = "https://vault.example.com:8200"
secret_path = "production/db"

# fetch a JWT from the metadata server
req_jwt = requests.get(
    "http://metadata/computeMetadata/v1/instance/service-accounts/default/identity",
    headers={"Metadata-Flavor": "Google"},
    params={"audience": f"http://vault/{role}", "format": "full"},
)
jwt = req_jwt.text

# authenticate with Vault
req_token = requests.post(
    f"{vault_addr}/v1/auth/gcp/login",
    json={"role": role, "jwt": jwt},
)
token = req_token.json()["auth"]["client_token"]

# request the secret
req_secret = requests.get(
    f"{vault_addr}/v1/secret/data/{secret_path}",
    headers={"x-vault-token": token},
)
secret = req_secret.json()["data"]["data"]
The first request fetches a JWT from the VM's metadata server. This JWT is the app's proof that it's running on an authorized machine, and it's how you avoid having to hard-code credentials, since only apps running inside the VM are able to fetch it.
The second request authenticates against Vault using that JWT (Vault must have been set up beforehand to allow this authentication method) and receives a short-lived access token.
The third and final request uses this access token to fetch the secret the app needs. The app can cache the token or JWT for later re-use.
The above code is simplified and doesn't include any error checking. I have a vault.py script containing better-wrapped code that I include in all services as a library to give them access to Vault.
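For illustration, here's a rough sketch of what such a wrapper might look like, with token caching and basic error handling. This is not the actual vault.py from the post; it uses only the standard library instead of requests, and the class and method names are my own.

```python
# Sketch of a vault.py-style wrapper (illustrative, standard library only).
import json
import urllib.parse
import urllib.request

METADATA_URL = ("http://metadata/computeMetadata/v1/instance/"
                "service-accounts/default/identity")

class VaultClient:
    def __init__(self, vault_addr: str, role: str):
        self.vault_addr = vault_addr.rstrip("/")
        self.role = role
        self._token = None  # cached short-lived Vault token

    def secret_url(self, path: str) -> str:
        # KV version 2 read endpoint for a secret stored at `path`
        return f"{self.vault_addr}/v1/secret/data/{path}"

    def _fetch_jwt(self) -> str:
        # Ask the GCE metadata server for a signed identity token
        query = urllib.parse.urlencode(
            {"audience": f"http://vault/{self.role}", "format": "full"})
        req = urllib.request.Request(
            f"{METADATA_URL}?{query}", headers={"Metadata-Flavor": "Google"})
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.read().decode()

    def _login(self) -> None:
        # Exchange the JWT for a Vault client token
        body = json.dumps({"role": self.role, "jwt": self._fetch_jwt()}).encode()
        req = urllib.request.Request(
            f"{self.vault_addr}/v1/auth/gcp/login", data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=5) as resp:
            self._token = json.load(resp)["auth"]["client_token"]

    def get_secret(self, path: str) -> dict:
        # urlopen raises on non-2xx responses, which is our error check
        if self._token is None:
            self._login()
        req = urllib.request.Request(
            self.secret_url(path), headers={"x-vault-token": self._token})
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)["data"]["data"]
```

A service would then just call `VaultClient(vault_addr, role).get_secret("production/db")` and let exceptions propagate or retry as appropriate.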
Vault also has a CLI and UI, to help you work with it manually.
Deploying Vault
There are lots of ways of deploying Vault; it can also run as a highly available cluster. However, the easiest (and perhaps cheapest) way is to run it as a program on a single VM. This is good enough for light use and testing, and you can always expand it later.
I'm just going to go ahead and code-dump my Terraform code for this:
resource "google_compute_instance" "vault" {
  name         = "vault"
  machine_type = "e2-small"
  zone         = "us-central1-a"
  tags         = [google_compute_firewall.vault.name]

  metadata = {
    ssh-keys = "terraform:${tls_private_key.vault.public_key_openssh}"
  }

  boot_disk {
    initialize_params {
      image = "ubuntu-minimal-2004-lts"
    }
  }

  network_interface {
    network = google_compute_network.default.name
    access_config {
      nat_ip = google_compute_address.vault.address
    }
  }

  service_account {
    email  = google_service_account.vault.email
    scopes = ["cloud-platform"]
  }

  # Vault's own config file
  provisioner "file" {
    content     = <<EOF
storage "gcs" {
  bucket = "${google_storage_bucket.vault.name}"
  prefix = "vault/store"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_disable   = 0
  tls_cert_file = "/etc/ssl/fullchain.pem"
  tls_key_file  = "/etc/ssl/privkey.pem"
}

disable_mlock = true
api_addr      = "https://${local.vault_domain}:8200"
ui            = true
log_level     = "warn"
EOF
    destination = "/tmp/vault.hcl"

    connection {
      type        = "ssh"
      user        = "terraform"
      host        = self.network_interface.0.access_config.0.nat_ip
      private_key = tls_private_key.vault.private_key_pem
    }
  }

  # systemd service definition
  provisioner "file" {
    content     = <<EOF
[Unit]
Description="HashiCorp Vault - A tool for managing secrets"
Documentation=https://www.vaultproject.io/docs/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/vault.hcl
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
User=terraform
Group=terraform
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
Capabilities=CAP_IPC_LOCK+ep
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/local/bin/vault server -config=/etc/vault.d/vault.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=20
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
EOF
    destination = "/tmp/vault.service"

    connection {
      type        = "ssh"
      user        = "terraform"
      host        = self.network_interface.0.access_config.0.nat_ip
      private_key = tls_private_key.vault.private_key_pem
    }
  }

  # TLS private key
  provisioner "file" {
    content     = tls_private_key.cert_private_key.private_key_pem
    destination = "/tmp/privkey.pem"

    connection {
      type        = "ssh"
      user        = "terraform"
      host        = self.network_interface.0.access_config.0.nat_ip
      private_key = tls_private_key.vault.private_key_pem
    }
  }

  # TLS certificate chain
  provisioner "file" {
    content     = "${acme_certificate.certificate.certificate_pem}${acme_certificate.certificate.issuer_pem}"
    destination = "/tmp/fullchain.pem"

    connection {
      type        = "ssh"
      user        = "terraform"
      host        = self.network_interface.0.access_config.0.nat_ip
      private_key = tls_private_key.vault.private_key_pem
    }
  }

  # Install Vault and start the service
  provisioner "remote-exec" {
    inline = [
      "echo '*** Install vault ***'",
      "sudo apt update && sudo apt install -y unzip",
      "cd /tmp",
      "wget https://releases.hashicorp.com/vault/${local.vault_version}/vault_${local.vault_version}_linux_amd64.zip -O vault.zip && unzip vault.zip && rm vault.zip",
      "sudo mv vault /usr/local/bin/vault && chmod +x /usr/local/bin/vault",
      "echo '*** Move files to right place ***'",
      "sudo mkdir -p /etc/ssl",
      "sudo mv /tmp/*.pem /etc/ssl",
      "echo '*** Start vault service ***'",
      "sudo mkdir -p /etc/vault.d",
      "sudo mv /tmp/vault.hcl /etc/vault.d",
      "sudo mv /tmp/vault.service /etc/systemd/system/vault.service",
      "sudo systemctl enable vault",
      "sudo systemctl start vault"
    ]

    connection {
      type        = "ssh"
      user        = "terraform"
      host        = self.network_interface.0.access_config.0.nat_ip
      private_key = tls_private_key.vault.private_key_pem
    }
  }
}
It's a bit of a mouthful, and could be simplified and modularized.
The first two file provisioners should really load their templates from files rather than being defined inline, but I've condensed things to make the post easier to follow.
I'll try to explain what's going on here. Most of the top half is standard stuff for deploying Google VMs with Terraform: it creates an e2-small instance in us-central1-a using an Ubuntu 20.04 minimal image. We generate a temporary SSH key so that Terraform can go in and do some light deployment work (the actual lines for doing that are further below).
The first provisioner "file" block deploys Vault's config file. This config file specifies the storage bucket Vault needs to access, the TLS/SSL settings, and other configuration.
The second provisioner "file" deploys the systemd service definition for Vault; it's mostly boilerplate.
The third and fourth provisioners are for copying in the SSL certs for HTTPS. The way I have these set up are similar to my previous post:
Automating fetching of wildcard LetsEncrypt HTTPS certificates for your domain with Terraform
Finally, the provisioner "remote-exec" completes the setup: it runs an apt update and installs unzip; downloads Vault from HashiCorp's release servers and installs it; moves the rest of the files out of the /tmp folder where they were dumped (mostly due to file permission issues); and then enables and starts the service.
The above script does depend on a few other lines, which I've separated out to keep things readable:
locals {
  vault_version = "1.5.3"
  vault_domain  = "vault.example.com"
}
These local variables make it easy to pick the Vault version you want to install, and to set the domain where Vault will live once it's fully deployed.
resource "tls_private_key" "vault" {
  algorithm = "RSA"
}
This generates an SSH key for accessing the Vault VM; it's pretty standard boilerplate for Terraform VM deployment.
resource "google_compute_address" "vault" {
  name = "vault"
}
This reserves an IP address for the Vault VM instance. It isn't strictly needed; a VM without an IP address reservation will acquire an ephemeral one by default. However, if you ever have to tear down the Vault VM and create a new one, it would receive a new IP address, which would trigger a DNS update that takes a few minutes to propagate, leaving you with some downtime. It's a bit better to reserve the IP.
resource "google_compute_firewall" "vault" {
  name    = "vault"
  network = google_compute_network.default.name

  allow {
    protocol = "tcp"
    ports    = ["8200"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["vault"]
}
This is the firewall rule that allows access to Vault's listener port.
resource "google_service_account" "vault" {
  account_id   = "vault"
  display_name = "Vault Service Account"
  description  = "For vault state storage backend"
}

resource "google_service_account_key" "vault" {
  service_account_id = google_service_account.vault.name
}

resource "google_project_iam_member" "vault-iam" {
  project = data.google_project.project.project_id
  role    = "roles/iam.serviceAccountUser"
  member  = "serviceAccount:${google_service_account.vault.email}"
}
This is Vault's service account. Vault needs it for two reasons: accessing its storage bucket, and verifying GCP-generated JWTs. The service account key is needed later.
resource "google_storage_bucket" "vault" {
  project                     = google_project.project.project_id
  name                        = "<name of your vault storage bucket>"
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

resource "google_storage_bucket_iam_member" "vault" {
  bucket = google_storage_bucket.vault.name
  role   = "roles/storage.admin"
  member = "serviceAccount:${google_service_account.vault.email}"
}
This creates the storage bucket where Vault keeps its data. For more information about how Vault encrypts secrets, see their documentation.
If you were to run the above, and assuming you have all the rest of the resources set up, this will spin up and install a Vault instance ready to go. Please go through the HCL closely, you'll find references to data I haven't talked about in this post, including things like network configs, SSL certificates, etc.
Configuring Vault
Once deployed, Vault is a blank slate; it doesn't even have any keys generated. At this point you'll probably want to get to the instance and generate the root keys before someone else does. You can do this either through the CLI or through the web UI. The actual work of configuring and operating Vault is outside the scope of this post, which covers only making a single-instance deployment in Google Cloud with Terraform; for usage details, I suggest checking out the many tutorials on HashiCorp's site.
Once you've generated the keys, you need to "unseal" Vault before configuring it.
Configuring Vault can be done manually using the CLI or UI, but it's much easier to do it through Terraform. The slight complexity of configuring a Vault that was itself deployed by Terraform is that there's a manual step: you need to go set up Vault after the VM goes live, but before the rest of the Vault-related Terraform code can run. This can cause some confusing deadlocks in the Terraform code. There are a few ways around it, such as manual waits and depends_on, but I've yet to find one I really like; I usually either separate the configuration into its own Terraform workspace, or just put up with the errors while I go and manually unseal.
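For reference, the manual init/unseal step can itself be scripted against Vault's HTTP API (the /v1/sys/init and /v1/sys/unseal endpoints are part of Vault's documented API; the helpers below are my own illustrative sketch, and in practice you'd store the unseal keys and root token somewhere far safer than a script variable):

```python
# Sketch: initialize a fresh Vault and unseal it via the HTTP API.
# WARNING: a real setup must protect the unseal keys and root token.
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # illustrative address

def _api(method, path, payload=None):
    # Minimal JSON-over-HTTP helper for Vault's API
    req = urllib.request.Request(
        f"{VAULT_ADDR}{path}",
        method=method,
        data=json.dumps(payload).encode() if payload is not None else None,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def init_payload(shares: int = 5, threshold: int = 3) -> dict:
    # Vault splits the master key into `shares` pieces, any `threshold`
    # of which are needed to unseal.
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and shares")
    return {"secret_shares": shares, "secret_threshold": threshold}

def init_and_unseal() -> dict:
    result = _api("PUT", "/v1/sys/init", init_payload())
    # Submit enough unseal keys to cross the threshold
    for key in result["keys"][:3]:
        _api("PUT", "/v1/sys/unseal", {"key": key})
    return result  # contains "keys" and "root_token"
```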
Exactly how you configure Vault will come down to your requirements, but here are some examples of what I use:
provider "vault" {
  address = "https://${var.vault_domain}:8200"
}
You first need the provider set up. See the documentation for more details; your token needs to be provided either here or as part of your environment.
resource "vault_policy" "prod_read" {
  name   = "prod_read"
  policy = <<EOF
path "secrets" {
  capabilities = ["list"]
}

path "secret/*" {
  capabilities = ["read"]
}

path "secret/prod/*" {
  capabilities = ["read", "list"]
}
EOF
}
Here's a policy for roles that can read production secrets.
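As a rough mental model of how these path globs map requests to capabilities, here's an illustrative approximation in Python. Vault's real matcher applies the most specific matching path, which I've approximated with longest-pattern-wins; this is a mnemonic, not Vault's actual algorithm:

```python
# Approximate how the policy above maps request paths to capabilities.
# Vault picks the most specific matching path; the longest pattern is
# a reasonable stand-in for these examples.
from fnmatch import fnmatch

POLICY = {
    "secrets": ["list"],
    "secret/*": ["read"],
    "secret/prod/*": ["read", "list"],
}

def capabilities(path: str) -> set:
    matches = [p for p in POLICY if fnmatch(path, p)]
    if not matches:
        return set()
    most_specific = max(matches, key=len)
    return set(POLICY[most_specific])
```

So a request for secret/prod/db gets read and list, while secret/development/db only gets read.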
resource "vault_gcp_auth_backend" "vault_gcp" {
  credentials = base64decode(google_service_account_key.vault.private_key)
}

resource "vault_gcp_auth_backend_role" "production" {
  role                   = "production"
  type                   = "iam"
  bound_projects         = [data.google_project.project.project_id]
  bound_service_accounts = [google_service_account.production.email]
  token_policies         = [vault_policy.prod_read.name]
  max_jwt_exp            = 3600
}
This enables the GCP JWT authentication mentioned earlier. The private key is the service account key generated above. Note bound_projects and bound_service_accounts: these control which projects and service accounts JWTs will be accepted from, so you'll want to list the service accounts your apps run as.
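Vault checks claims inside the JWT (such as the service account identity) against those bindings. If you're curious what's in one of these tokens, you can decode its middle segment — a JWT is three base64url-encoded parts. This sketch is my own and deliberately skips signature verification, so it's for inspection only:

```python
# Decode (WITHOUT verifying!) a JWT's payload to inspect its claims.
import base64
import json

def jwt_claims(jwt: str) -> dict:
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Run against a real identity token from the metadata server, this shows the claims Vault evaluates, such as the service account email.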
There's a lot of additional configuration that falls outside the scope of this blog post, but hopefully this gives some guidance on how to automatically set up and operate a single-instance Vault for testing, evaluation, or small-scale deployment.
Cover Photo by Jason Dent on Unsplash