Abhishek Gupta for Microsoft Azure

Kubernetes Volumes: the definitive guide (Part 2)

Welcome to yet another part of the "Kubernetes in a Nutshell" blog series, which focuses on the "breadth" of Kubernetes and covers fundamental topics such as orchestrating stateless apps and configuring Kubernetes apps using ConfigMaps. I hope you have enjoyed it so far!

This is a continuation of the previous blog which covered the basics of Kubernetes Volumes. In this part, we will notch it up a bit and:

  • Learn about PersistentVolume, PersistentVolumeClaim objects and how they work in tandem
  • Dive into types of provisioning in Kubernetes - Static, Dynamic
  • Learn about Storage Classes and how they power dynamic provisioning
  • Explore relevant examples

Prerequisites:

To follow the example in this blog, you will need the following:

The code is available on GitHub

The previous episode....

... concluded with a discussion about "The need for persistent storage", given that the lifecycle of vanilla Kubernetes Volumes is tightly coupled with the Pod, and serious apps need stable, persistent storage which outlasts the Pod or even the Node on which the Pod is running.

Examples of long-term storage media are networked file systems (NFS, Ceph, GlusterFS, etc.) or cloud-based options, such as Azure Disk, Amazon EBS, GCE Persistent Disk, etc.

Here is a snippet that shows how you can mount an NFS (Network File System) into your Pod using the nfs volume type. You can point to an existing NFS instance using the server attribute.

spec:
  volumes:
  - name: app-data
    nfs:
      server: localhost # hostname or IP of the NFS server
      path: "/"
  containers:
  - image: myapp-docker-image
    name: myapp
    volumeMounts:
    - mountPath: /data
      name: app-data

Is this 👆 good enough?

In the above Pod manifest, the storage info (for NFS) is specified directly in the Pod (using the volumes section). This implies that the developer needs to know the details of the NFS server, including its location, etc. There is definitely scope for improvement here, and like most things in software, it can be done with another level of indirection or abstraction using the concepts of PersistentVolume and PersistentVolumeClaim.

The key idea revolves around "segregation of duties" and decoupling storage creation/management from its requirement/request. This is where PersistentVolumeClaim and PersistentVolume come into play:

  • A PersistentVolumeClaim allows a user to request persistent storage in a "declarative" fashion by specifying the requirements (e.g. amount of storage) as a part of the PersistentVolumeClaim spec.
  • A PersistentVolume complements the PersistentVolumeClaim and represents the storage medium in the Kubernetes cluster. The actual provisioning of the storage (e.g. creation of an Azure Disk using the Azure CLI, Azure portal, etc.) and the creation of the PersistentVolume in the cluster is typically done by an administrator or, in the case of Dynamic provisioning, by Kubernetes itself (to be covered later)

In addition to decoupling and segregation of duties, this also provides flexibility and portability. Say you have multiple environments like dev, test, production etc. With a PersistentVolumeClaim, you declare the storage requirements once (e.g. "my app needs 5 GB") and switch the actual storage medium per environment by backing the claim with a different PersistentVolume - this could be a local disk in the dev environment, a standard HDD in test and an SSD in production. The same goes for portability in a multi-cloud scenario, where you could keep the same claim spec but switch the PersistentVolume as per the cloud provider

The upcoming sections will cover examples to help reinforce these concepts.

Deep dive

PersistentVolumeClaim

A PersistentVolumeClaim is just another Kubernetes object (like Pod, Deployment, ConfigMap etc.). Here is an example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: volume-class

The most important section is the spec, which is a reference to a PersistentVolumeClaimSpec object - this is where you define the storage requirements. The important attributes to focus on are:

  • resources - minimum resources that the volume requires
  • accessModes - ways the volume can be mounted (valid values are ReadWriteOnce, ReadOnlyMany, ReadWriteMany)
  • storageClassName - name of the StorageClass required by the claim (StorageClass is covered in another section)

PersistentVolumeClaim has other attributes - apiVersion, kind, metadata, status. These are common to all Kubernetes objects.

PersistentVolume

This is what a typical PersistentVolume spec looks like:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: volume-class
  nfs:
    server: localhost # hostname or IP of the NFS server
    path: "/"

Just like PersistentVolumeClaim, spec (PersistentVolumeSpec object) is the most important part of a PersistentVolume - let's dissect it further:

  • provider/storage specific - like nfs, azureDisk, gcePersistentDisk, awsElasticBlockStore etc. which allow you to provide info specific to the storage medium (NFS, Azure Disk etc.)
  • accessModes - ways in which the volume can be mounted
  • capacity - info about persistent volume's resources and capacity.
  • storageClassName - name of StorageClass to which this persistent volume belongs (StorageClass will be covered soon)
  • persistentVolumeReclaimPolicy - what happens when corresponding PersistentVolumeClaim is deleted - options are Retain, Delete, and Recycle (deprecated)

As homework, please explore the attributes nodeAffinity, volumeMode, mountOptions to determine what role they play
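To get you started on that homework, here is a sketch of a local PersistentVolume that exercises volumeMode and nodeAffinity - the disk path and node name are hypothetical, and this assumes a disk is already attached and mounted on that node:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example           # hypothetical name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem           # the default; Block is the other option
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # pre-mounted disk on the node (assumption)
  nodeAffinity:                    # pins the volume to the node that has the disk
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - my-node-1              # hypothetical node name
```

Since a local volume only exists on one node, nodeAffinity is what tells the scheduler to place consuming Pods on that node.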

PersistentVolume has other attributes - apiVersion, kind, metadata, status. These are common to all Kubernetes objects.

How do these objects work together?

There are two ways in which you can use these constructs to get storage for your Kubernetes apps - Static and Dynamic.

In the "Static" mode, the user(s) need to take care of provisioning the actual storage (cloud, on-prem, etc.) and then referencing it in the Pod spec (your application)

In the "Dynamic" way, Kubernetes does the heavy lifting of the storage provisioning as well as the creation of the PersistentVolume. All you do is provide your storage requirements by creating and then referencing a PersistentVolumeClaim in the Pod spec

Dynamic provisioning should be enabled on the cluster - most providers enable this out of the box

Let's explore static provisioning

Static provisioning

There are two ways of using static provisioning:

One of them is to provision storage and use its info directly in the Pod spec

This has been mentioned already, but one last recommendation (in this context): try out the excellent tutorial on how to "Manually create and use a volume with Azure disks in Azure Kubernetes Service (AKS)". This is what it looks like (and as you've read before, this is convenient but has its limitations)

spec:
  containers:
  - image: nginx
    name: mypod
    volumeMounts:
      - name: azure
        mountPath: /mnt/azure
  volumes:
      - name: azure
        azureDisk:
          kind: Managed
          diskName: myAKSDisk
          diskURI: /subscriptions/<subscriptionID>/resourceGroups/MC_myAKSCluster_myAKSCluster_eastus/providers/Microsoft.Compute/disks/myAKSDisk

In the second approach, instead of creating the disk and providing its details (azureDisk in this case), you encapsulate that info in a PersistentVolume. Then you create a PersistentVolumeClaim and reference it from the Pod spec and leave it to Kubernetes to match the storage requirements with what's available

Here is a snippet to give you an idea

spec:
      volumes:
      - name: app-data
        persistentVolumeClaim:
          claimName: data-volume-claim

Think of it as refactoring a piece of logic into its own method - you take a bunch of storage request info and externalize it in the form of a PersistentVolume (analogous to a method).
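For instance, the azureDisk details from the earlier snippet could be externalized into a PersistentVolume - a sketch, reusing the disk name and URI from the example above (the PV name and capacity are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-disk-pv    # hypothetical name
spec:
  capacity:
    storage: 5Gi         # assumption - should match the size of the actual disk
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""   # empty string - bind via claims without dynamic provisioning
  azureDisk:
    kind: Managed
    diskName: myAKSDisk
    diskURI: /subscriptions/<subscriptionID>/resourceGroups/MC_myAKSCluster_myAKSCluster_eastus/providers/Microsoft.Compute/disks/myAKSDisk
```

A PersistentVolumeClaim with a compatible access mode and a storage request no larger than the PV's capacity could then bind to this volume.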

Dynamic provisioning

As mentioned earlier, with Dynamic Provisioning, you can offload all the heavy lifting to Kubernetes.

One of the key concepts associated with dynamic provisioning is StorageClass

Storage Class

Just like a PersistentVolume encapsulates storage details, a StorageClass provides a way to describe the "classes" of storage. In order to use a StorageClass, all you need to do is reference it from the PersistentVolumeClaim.
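For example, a PersistentVolumeClaim that wants premium storage on AKS could reference the managed-premium class by name (the claim name and size here are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: premium-disk-pvc              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium   # reference the StorageClass by name
  resources:
    requests:
      storage: 5Gi
```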

Let's understand this practically - here is an example of a StorageClass for an Azure Disk.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: default
parameters:
  cachingmode: ReadOnly
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: Immediate

The key parameters in a StorageClass spec are:

  • provisioner - volume plugin (details to follow) which provisions the actual storage
  • parameters - custom key-value pairs which can be used by the provisioner at runtime
  • reclaimPolicy - reclaim policy with which the PersistentVolume is created (with Delete, the PV gets deleted when the PVC is deleted; with Retain, the PV is kept)
  • volumeBindingMode - indicates how PersistentVolumeClaims should be provisioned and bound (valid values are Immediate and WaitForFirstConsumer)

The information in these parameters (and few others like allowVolumeExpansion, allowedTopologies, mountOptions) are used at runtime to dynamically provision the storage and create the corresponding PersistentVolume.

StorageClass has other attributes as well - apiVersion, kind, metadata. These are common to all Kubernetes objects.

What is a provisioner?

The provisioner is the heart of dynamic provisioning - it is a plugin that includes custom logic meant to create storage resources of a specific type. Kubernetes ships along with a bunch of provisioners, including cloud based ones such as Azure Disk (kubernetes.io/azure-disk), Azure File (kubernetes.io/azure-file), GCE Persistent Disk, AWS EBS etc.

In the above example, kubernetes.io/azure-disk is being used as the provisioner

The parameters section provides a means of passing information to the provisioner at runtime - this is obviously specific to a provisioner. In the above example, cachingmode, storageaccounttype and kind are passed as parameters to the kubernetes.io/azure-disk provisioner - this allows for a lot of flexibility.

If a parameter is not passed, a default value is used

A note on default Storage Class

A StorageClass can be marked as default such that it is used (for dynamic provisioning) when a storageClassName attribute is not provided in the PersistentVolumeClaim.
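The default is controlled by the storageclass.kubernetes.io/is-default-class annotation on the StorageClass - a sketch (the class name here is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-default-class   # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
```

You can also toggle this annotation on an existing class with kubectl patch instead of editing the manifest.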

Azure Kubernetes Service makes dynamic provisioning easy by including two pre-seeded storage classes. You can confirm this by running the kubectl get storageclass command

NAME                PROVISIONER                AGE
default (default)   kubernetes.io/azure-disk   6d10h
managed-premium     kubernetes.io/azure-disk   6d10h

Hands-on: Dynamic provisioning

It's time to try out Dynamic provisioning using Azure Kubernetes Service. You will create a PersistentVolumeClaim and a simple application (Deployment) which references that claim, and see how things work in practice.

If you don't have an Azure account already, now is the time to sign up for a free one and get cracking!

Kubernetes cluster setup

You need a single command to stand up a Kubernetes cluster on Azure. But, before that, we'll have to create a resource group

export AZURE_SUBSCRIPTION_ID=[to be filled]
export AZURE_RESOURCE_GROUP=[to be filled]
export AZURE_REGION=[to be filled] (e.g. southeastasia)

Switch to your subscription and invoke az group create

az account set -s $AZURE_SUBSCRIPTION_ID
az group create -l $AZURE_REGION -n $AZURE_RESOURCE_GROUP

You can now invoke az aks create to create the new cluster

To keep things simple, the below command creates a single node cluster. Feel free to change the specification as per your requirements

export AKS_CLUSTER_NAME=[to be filled]

az aks create --resource-group $AZURE_RESOURCE_GROUP --name $AKS_CLUSTER_NAME --node-count 1 --node-vm-size Standard_B2s --node-osdisk-size 30 --generate-ssh-keys

Get the AKS cluster credentials using az aks get-credentials - as a result, kubectl will now point to your new cluster. You can confirm the same

az aks get-credentials --resource-group $AZURE_RESOURCE_GROUP --name $AKS_CLUSTER_NAME
kubectl get nodes

If you are interested in learning Kubernetes and Containers using Azure, a good starting point is to use the quickstarts, tutorials and code samples in the documentation to familiarize yourself with the service. I also highly recommend checking out the 50 days Kubernetes Learning Path. Advanced users might want to refer to Kubernetes best practices or watch some of the videos for demos, top features and technical sessions.

Create PersistentVolumeClaim followed by app Deployment

Here is the PersistentVolumeClaim spec which we will use

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-disk-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Notice that the PersistentVolumeClaim does not specify a storageClassName - this ensures that the default storage class is used for dynamic provisioning.

Create the PersistentVolumeClaim

kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/volumes-2/azure-disk-pvc.yaml

If you check it, you will see something like this (STATUS = Pending)

kubectl get pvc

NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
azure-disk-pvc   Pending                                      default        11s

After some time, it should change to (STATUS = Bound) - this is because the Azure Disk and the PersistentVolume got created automatically

NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
azure-disk-pvc   Bound    pvc-7b0e2911-df74-11e9-93ab-025752f370d3   2Gi        RWO            default        36s

You can check the dynamically provisioned PersistentVolume as well - kubectl get pv

Confirm that the Azure Disk has been created

AKS_NODE_RESOURCE_GROUP=$(az aks show --resource-group $AZURE_RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query nodeResourceGroup -o tsv)

az disk list -g $AKS_NODE_RESOURCE_GROUP

The tags section will look something similar to

"tags": {
      "created-by": "kubernetes-azure-dd",
      "kubernetes.io-created-for-pv-name": "pvc-7b0e2911-df74-11e9-93ab-025752f370d3",
      "kubernetes.io-created-for-pvc-name": "azure-disk-pvc",
      "kubernetes.io-created-for-pvc-namespace": "default"
}

Create the app Deployment

kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/volumes-2/app.yaml

To test it out, we will use a simple Go app. All it does is push log statements to a file logz.out in /mnt/logs - this is the path which is mounted into the Pod
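For reference, the Deployment in app.yaml mounts the claim created earlier at /mnt/logs. Here is a sketch of roughly what such a manifest looks like - the container image name is an assumption, so check app.yaml on GitHub for the exact spec:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logz-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logz
  template:
    metadata:
      labels:
        app: logz
    spec:
      containers:
      - name: logz
        image: myapp-docker-image    # assumption - see app.yaml for the actual image
        volumeMounts:
        - name: logs-volume
          mountPath: /mnt/logs       # the Go app writes logz.out here
      volumes:
      - name: logs-volume
        persistentVolumeClaim:
          claimName: azure-disk-pvc  # the claim created in the previous step
```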

Wait for a while for the deployment to be in Running state

kubectl get pods -l=app=logz

NAME                               READY   STATUS    RESTARTS   AGE
logz-deployment-59b75bc786-wt98d   1/1     Running   0          15s

To confirm, check /mnt/logs/logz.out in the Pod

kubectl exec -it $(kubectl get pods -l=app=logz --output=jsonpath={.items..metadata.name}) -- tail -f /mnt/logs/logz.out

You will see the logs (just the timestamp) every 3 seconds

2019-09-25 09:17:11.960671937 +0000 UTC m=+84.002677518
2019-09-25 09:17:14.961347341 +0000 UTC m=+87.003352922
2019-09-25 09:17:17.960697766 +0000 UTC m=+90.002703347
2019-09-25 09:17:20.960666399 +0000 UTC m=+93.002671980

That brings us to the end of this two-part series on Kubernetes Volumes. How did you find this article? Did you learn something from it? Did it help solve a problem or resolve that lingering query you had? 😃😃 Or maybe it needs improvement 😡 Please provide your feedback - it's really valuable and I would highly appreciate it! You can reach out via Twitter or just drop a comment right below to start a discussion.

As I mentioned earlier, this was a sub-part of the larger series of blogs in "Kubernetes in a Nutshell" and there is more to come! Please don't forget to like and follow 😉
