Over the past month or so I noticed an issue with my Pac-Man Kubernetes application, which I use for demonstrations as a basic app front end that writes to a database back end.
- When I restored my instances using Kasten, my Pac-Man high scores were missing.
- The issue appeared after I changed my deployment files to configure authentication to MongoDB using environment variables.
This blog post is a detailed walk-through of the steps I took to troubleshoot the issue, and then rectify it!
Summary if you don’t want to read the post
If you are not looking to read through this blog post, here is the summary:
- Because I changed MongoDB images, I needed to configure a new mount point location to match the new image's MongoDB configuration
- The new MongoDB image runs as non-root, so I had to use an Init Container to set the permissions on the PV first
Overview of the application
The application is made up of the following components:
- Namespace
- Deployment
  - MongoDB Pod
    - DB authentication configured
    - Attached to a PVC
  - Pac-Man Pod
    - Node.js web front end that connects back to the MongoDB Pod by looking up the Pod's internal DNS address
- RBAC configuration for Pod Security and a Service Account
- Secret which holds the MongoDB usernames and passwords to be configured (sketched just after this list)
- Service
  - Type: LoadBalancer
  - Used to balance traffic to the Pac-Man Pods
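As an illustration of that Secret and environment-variable wiring, a minimal sketch is below. The Secret name and key names are assumptions for illustration (the credentials match the ones used later in this post), and the exact MONGODB_* variable names differ between Mongo images, so check the manifests in my GitHub repo for the real ones.

apiVersion: v1
kind: Secret
metadata:
  name: mongodb-users-secret    # hypothetical name for illustration
  namespace: pacman
type: Opaque
stringData:
  database-user: blinky         # credentials as used later in this post
  database-password: pinky

The MongoDB container then picks these values up as environment variables, for example:

  env:
  - name: MONGODB_USERNAME      # variable names vary between Mongo images, check the image docs
    valueFrom:
      secretKeyRef:
        name: mongodb-users-secret
        key: database-user
  - name: MONGODB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: mongodb-users-secret
        key: database-password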
Confirming the behaviour
The behaviour I was seeing when my application was deployed:
- Pac-Man web page – I could save a high score, and it would show in the high scores list.
  - This showed that connectivity to the database was working, as the app would hang if it could not write to the database.
- I would protect my application using Kasten. When I deleted the namespace and restored everything, my application would be running, but there were no high scores to show.
  - This was apparent when deploying the v0.5.0 and v0.5.1 branches from my GitHub repository.
  - Deploying the v0.2.0 branch did not produce the same behaviour.
    - That configuration did not have any database authentication set up, meaning MongoDB was open to anyone who could reach it, no username/password required.
Testing the Behaviour
First, I deployed my branch v0.2.0 code. I saved some high scores, backed up the namespace and artifacts. I then restored everything, and it worked.
I connected to the shell of my container to look at what was happening.
kubectl exec {podname} -n {namespace} -it -- {cmd}
From here, I could see my mount point listed correctly, and when browsing it, I could see the expected MongoDB files stored.
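A couple of quick checks from inside that shell are enough to confirm this. These are generic commands rather than my exact session, assuming the default /data/db mount path:

df -h /data/db     # confirms the Persistent Volume is mounted at the expected path
ls -al /data/db    # the MongoDB data files (WiredTiger files, journal, etc.) should be listed

For reference, the relevant volume configuration from the Deployment spec: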
spec:
  serviceAccount: pacman-sa
  containers:
  - image: mongo
    name: mongo
    ports:
    - name: mongo
      containerPort: 27017
    volumeMounts:
    - name: mongo-db
      mountPath: /data/db
  volumes:
  - name: mongo-db
    persistentVolumeClaim:
      claimName: mongo-storage
Next, I deleted this namespace, and redeployed using my branch v0.5.1 code. Ran a game of Pac-Man and saved the high score. Once again this looked to have committed fine. Backup data, kill namespace, and restore using Kasten.
I run a shell to the pod and browse the mount point again. There is no data.
Ok, so MongoDB is not writing the data to file, which means it’s storing the data in memory for some reason.
The next steps I took to confirm behaviour:
- Restore only the Persistent Volume and connect a test pod to the PV.
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
  namespace: pacman
spec:
  volumes:
  - name: mongo-storage
    persistentVolumeClaim:
      claimName: mongo-storage
  containers:
  - name: task-pv-container
    image: alpine:latest
    command:
    - /bin/sh
    - "-c"
    - "sleep 60m"
    volumeMounts:
    - mountPath: "/data"
      name: mongo-storage
- For the v0.2.0 deployment, this was as expected: the data was there.
- For the v0.5.1 deployment, there was no data.
I deployed both versions again and dropped the Kasten backup/restore steps.
- Deploy the version of code
- Play Pac-Man, save a high score
- Set the Mongo Deployment replicas to zero (one-liner below this list)
- Spin up a test pod and connect to the PVC/PV.
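Scaling the Mongo Deployment down is a one-liner. The Deployment name mongo here is an assumption based on the Pod names shown in the outputs below:

kubectl scale deployment mongo -n pacman --replicas=0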
Confirmed same behaviour.
A few other checks I ran to ensure the volumes were being mounted correctly:
kubectl get pod,pvc,pv -n pacman
NAME READY STATUS RESTARTS AGE
pod/mongo-bdbcc7c7f-hlz6r 1/1 Running 0 77m
pod/pacman-5dd85445bc-bvqv9 1/1 Running 1 2d3h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mongo-storage Bound pvc-36fac4ef-a09a-4cd2-b03f-eaf09c442768 1Gi RWO csi-sc-vmc 2d3h
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-36fac4ef-a09a-4cd2-b03f-eaf09c442768 1Gi RWO Delete Bound pacman-052/mongo-storage csi-sc-vmc 2d3h
The Pac-Man Node.js container also has some basic logging; here we could see a successful insert of a new high score into the database.
kubectl logs pacman-5dd85445bc-bvqv9 -n pacman
> pacman@0.0.1 start /usr/src/app
> node .
Listening on port 8080
Connected to database server successfully
Time: Thu Aug 26 2021 16:20:02 GMT+0000 (UTC)
[GET /highscores/list]
Time: Thu Aug 26 2021 16:20:02 GMT+0000 (UTC)
[GET /loc/metadata]
[getHost]
HOST: pacman-5dd85445bc-bvqv9
getCloudMetadata
getK8sCloudMetadata
Querying tkg-wld-01-md-0-54598b8d99-89498 for cloud data
Request Failed.
Status Code: 403
getAWSCloudMetadata
Time: Thu Aug 26 2021 16:20:02 GMT+0000 (UTC)
[GET /user/id]
Successfully inserted new user ID = 6127bf321c074a0011281673
Time: Thu Aug 26 2021 16:20:14 GMT+0000 (UTC)
[POST /highscores] body = { name: '052',
cloud: '',
zone: '',
host: '',
score: '100',
level: '1' } host = 192.168.200.51 user-agent = Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36 referer = http://192.168.200.51/
Successfully inserted highscore
problem with request: connect ETIMEDOUT 169.254.169.254:80
getAzureCloudMetadata
problem with request: connect ETIMEDOUT 169.254.169.254:80
getGCPCloudMetadata
problem with request: getaddrinfo ENOTFOUND metadata.google.internal metadata.google.internal:80
getOpenStackCloudMetadata
problem with request: connect ETIMEDOUT 169.254.169.254:80
CLOUD: unknown
ZONE: unknown
HOST: pacman-5dd85445bc-bvqv9
And then finally, I checked that the high score was stored in Mongo by getting a shell to the Mongo container (command above):
@mongo-bdbcc7c7f-hlz6r:/data/db$ mongo 127.0.0.1:27017/pacman -u blinky -p pinky
MongoDB shell version v4.4.8
connecting to: mongodb://127.0.0.1:27017/pacman?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("a839cb26-0d6e-41ef-a730-c82ccfd3897d") }
MongoDB server version: 4.4.8
> show dbs
pacman 0.000GB
> use pacman
switched to db pacman
> show collections
highscore
userstats
> coll = db.highscore
pacman.highscore
> coll.find()
{ "_id" : ObjectId("6127bf3e1c074a0011281674"), "name" : "052", "cloud" : "", "zone" : "", "host" : "", "score" : 100, "level" : 1, "date" : "Thu Aug 26 2021 16:20:14 GMT+0000 (UTC)", "referer" : "[http://192.168.200.51/](http://192.168.200.51/)", "user_agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36", "hostname" : "192.168.200.51", "ip_addr" : "::ffff:100.96.2.1" }
>
Attempting to fix the issue by changing container image
After discussing the issue with a few people in “virtual passing” (because there are no more corridor discussions when you work from home), I decided to mix things up and change the image. Everything else in the YAMLs looked correct; MongoDB just wasn’t writing to disk, so maybe it was a bug in the version in use. Plus it was MongoDB 3.6, so I should probably try a newer release if possible.
With that, I looked at the official Mongo container image, but its packaging is pretty pants in terms of initialising it for first use and the available options.
I decided to move the image to the Bitnami MongoDB image.
Moving to the Bitnami Image
I moved over to the Bitnami MongoDB image; the README file on GitHub is well produced.
I just swapped out the image in my YAML (roughly as sketched below) and expected it to work. It did not. Same behaviour.
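For illustration, the swap is just the image reference on the MongoDB container; the tag here is an assumption rather than the exact one I used:

  containers:
  - name: mongo
    image: bitnami/mongodb:4.4   # tag shown for illustration only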
I consulted another friend on the issue, and he asked one simple question, and everything fell into place:
- “Can you check the mongodb config file and make sure the data source is /data/db?”
So off I went to Google to find where the config file is located in the container image (rather than, you know, paying attention to the documentation), so that I could check the default location where it expects the database files to be stored.
# Default MongoDB Config file for Bitnami image
/opt/bitnami/mongodb/conf/
# If you are providing your own config file, use a mount point here
/bitnami/mongodb/conf
Anyhow, lo and behold, the default path for the database files in the Bitnami image is:
/bitnami/mongodb/data/db
I also verified the issue by looking at the logs on the container:
{"t":{"$date":"2021-08-26T20:41:33.593+00:00"},"s":"E", "c":"STORAGE", "id":20557, "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"IllegalOperation: Attempted to create a lock file on a read-only directory: /bitnami/mongodb/data/db"}}
Fixing the volume mount issue and nearly winning
So I changed my Deployment file to use the correct volume mount point and redeployed; the change is sketched below, followed by the next error I found when I went straight to the logs:
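A rough illustration of the change, reusing the mongo-db volume name from the earlier spec (the exact mountPath I settled on lives in branch v0.5.2, so treat this as a sketch rather than the final manifest):

    volumeMounts:
    - name: mongo-db
      mountPath: /bitnami/mongodb   # the image keeps its database files under data/db below this path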
# k logs mongo-9c9dcf58d-47rf6
mongodb 20:49:49.44
mongodb 20:49:49.44 Welcome to the Bitnami mongodb container
mongodb 20:49:49.45 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mongodb
mongodb 20:49:49.45 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mongodb/issues
mongodb 20:49:49.45
mongodb 20:49:49.45 INFO ==> **Starting MongoDB setup**
mongodb 20:49:49.47 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 20:49:49.48 INFO ==> Initializing MongoDB...
mongodb 20:49:49.50 INFO ==> Deploying MongoDB from scratch...
mkdir: cannot create directory '/bitnami/mongodb/data/db': Permission denied
OK this isn’t good! Another hurdle to jump through.
Fixing the Permission Issue
The Bitnami MongoDB container image is a non-root image, meaning it doesn’t have the rights to set its own permissions on the mounted file system. This makes for a more secure deployment. Helpfully, I found this covered in the Bitnami documentation, which also pointed me to the fix: an Init Container.
If you deploy the Bitnami MongoDB image using Helm, the deployment uses an Init Container to run the necessary root-level commands to prepare the environment (in this case my Persistent Volume) before the main container runs. An Init Container is short-lived, existing only for its prescribed task.
So I cheated ever so slightly: I ran a Helm deployment of the Bitnami image and looked at how they were achieving this with an Init Container, as well as anything else I might have missed (by this point my files were pretty complete, unless I wanted to add some liveness probes).
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install bitmongotest bitnami/mongodb --set volumePermissions.enabled=true
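One way to see exactly what the chart generates, without digging through the live objects, is to render it locally; this is a generic approach rather than the exact commands I ran:

helm template bitmongotest bitnami/mongodb --set volumePermissions.enabled=true > rendered.yaml
grep -n -A 15 "initContainers:" rendered.yaml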
I then cloned the Init Container details over to my deployment files, taking careful note to change things like the referenced Service Accounts and the PVC names.
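The result looks roughly like the sketch below. The helper image, its tag, and the 1001 user/group IDs are assumptions based on how the Bitnami chart handles volume permissions (my working manifest is in branch v0.5.2), and the volume name matches my earlier spec:

  # under the Pod template spec, alongside the main containers list
  initContainers:
  - name: volume-permissions
    image: bitnami/bitnami-shell:10        # assumed helper image; any small image with chown would do
    command:
    - /bin/sh
    - -c
    - chown -R 1001:1001 /bitnami/mongodb  # 1001 is the UID/GID the Bitnami MongoDB image runs as
    securityContext:
      runAsUser: 0                         # this container needs root to change ownership on the PV
    volumeMounts:
    - name: mongo-db
      mountPath: /bitnami/mongodb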
Wrap Up
After using the Init Container to set the permissions, I found all my testing successful once again.
During this process, by the time I hit the Bitnami mount point issue, I actually realised what my issue was with the original MongoDB-with-auth deployment (in branch v0.5.0): the same thing, the volume mount point. I was using a different Mongo image in that commit, as setting up auth was a lot easier with it, for the same reasons mentioned earlier in the post about the official MongoDB container.
- Example of correct Centos/mongodb-36-centos-7 mount point
volumeMounts:
- mountPath: /var/lib/mongodb/data
  name: mongodb-data
However, by this point I decided to continue down the Bitnami MongoDB image path, as I wanted to use a newer version of MongoDB. I put my issues down to experience, as I develop my skills and knowledge of Kubernetes and of the applications themselves. If I had taken a step back to think about things logically, I might have hit on the point earlier that maybe the DB configuration had the wrong location to store the data.
Hopefully this blog post is useful to anyone reading; I just wanted to document my troubleshooting steps and what I tested. Who knows, I might forget all this, encounter the same issue again, and find my own blog whilst googling (it’s happened before).
I’ve updated my GitHub Repo, and everything from this post is captured as the working output in Branch v0.5.2.
Regards