MongoDB Atlas can provide two types of logs:
- Process logs, also known as server logs. These are printed in JSON and can include entries about issues, connections, etc. Messages look similar to this:
{"t":{"$date":"2020-05-01T15:16:17.180+00:00"},"s":"I", "c":"NETWORK", "id":12345, "ctx":"listener", "msg":"Listening on","attr":{"address":"127.0.0.1"}}
Source: MongoDB Log Messages documentation
- Audit logs. These must be enabled and allow you to audit any user action inside the MongoDB Atlas cluster, such as the issued command, the source IP, etc. Information about enabling audit logs is available in the MongoDB documentation; a sample entry is shown below.
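As an illustrative sketch only (the exact fields vary by event type and server version), an audit log entry for an authentication attempt looks roughly like this; all names and addresses here are placeholders:
{"atype":"authenticate","ts":{"$date":"2020-05-01T15:16:17.180+00:00"},"local":{"ip":"192.168.1.10","port":27017},"remote":{"ip":"203.0.113.5","port":51234},"users":[{"user":"appUser","db":"admin"}],"roles":[{"role":"readWrite","db":"appdb"}],"param":{"user":"appUser","db":"admin","mechanism":"SCRAM-SHA-256"},"result":0}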
These two types of logs have several aspects to consider:
- Logs are only kept for 30 days, so if you need them for longer you need a process in place to download the log files. You can download them manually, but let's be honest, you will forget to do it from time to time.
- Logs are downloaded in gzip (.gz) format.
- Logs are generated for each node in the cluster: if you have a cluster with 3 nodes, you will have 3 log files. The biggest log file will be the one corresponding to the primary node.
- You need at least project read access to download log files, but to enable audit logs you need at least project owner.
- Cluster tiers M0 and M2/M5 do not provide downloadable logs, so you cannot replicate this blog entry on a sandbox or free cluster.
You can read more about manually downloading log files in the MongoDB Atlas documentation.
As we discussed before, downloading each log file on a daily basis is time consuming at the very least, and a really difficult activity to maintain.
The simplest thing you can do is automate this download with a script, and that is what we are going to do. We are going to assume the following:
- We have our cluster on MongoDB Atlas.
- We are using AWS as the cloud provider. Maybe in the future I will adapt this code for other cloud providers, but this time we are going to use some services from AWS.
- This is a simple script that will run on a daily basis inside a Linux EC2 instance.
- The user will assign an instance profile so our EC2 instance has access to the following AWS resources:
- AWS S3, to store the log files
- AWS Systems Manager Parameter Store, to store the key that allows us to connect to the MongoDB Atlas cluster.
- MongoDB Atlas is reachable from your AWS infrastructure (the script only needs outbound HTTPS access to the Atlas API).
But first, you need some prerequisites:
- Create the EC2 instance and install the AWS CLI on it (the script also relies on curl, which is usually preinstalled).
- Create an S3 bucket where the logs are going to be stored.
- Create a Parameter Store parameter to keep the MongoDB Atlas key.
- Define an instance profile with the following permissions (the script only uploads objects to S3 and reads one parameter):
s3:PutObject
ssm:GetParameters
Note: remember to use the specific ARNs of your resources when providing the resource element.
- Generate an API key to connect to your cluster, and assign it the appropriate project permissions (at least project read access, as noted above).
Store the generated public and private keys inside your Parameter Store parameter. Remember to store the values in a key:value fashion, as in the example below (a sketch of the corresponding AWS CLI command follows it):
{
cluster_id: "your cluster id",
public_key: "your public api key from atlas",
private_key: "your private api key from atlas"
}
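As a sketch, the value above can be created as a single encrypted parameter with the AWS CLI. The parameter name mongodb-atlas-key matches the one the script reads; storing strict JSON with quoted keys is my own choice, and the grep/cut parsing used by the script still works with that layout:
# Store the Atlas credentials as one SecureString parameter.
aws ssm put-parameter \
  --name "mongodb-atlas-key" \
  --type "SecureString" \
  --value '{
    "cluster_id": "your cluster id",
    "public_key": "your public api key from atlas",
    "private_key": "your private api key from atlas"
  }'
The instance profile from the prerequisites then only needs ssm:GetParameters on this parameter's ARN, plus write access (s3:PutObject) on the log bucket.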
Now, with all of this in place...
Let's move to our code:
#! /bin/bash -e
# Read the Atlas credentials from AWS Systems Manager Parameter Store.
# CLUSTERID holds the Atlas project (group) ID used in the API URL below.
CLUSTERID=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "cluster_id" | cut -f2 -d ":" | cut -d "\"" -f2)
PUBLICKEY=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "public_key" | cut -f2 -d ":" | cut -d "\"" -f2)
PRIVATEKEY=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "private_key" | cut -f2 -d ":" | cut -d "\"" -f2)
# Dates used to name the local files and the S3 prefix.
CURRENTDATE=`date +%Y%m%d`
NOW=`date '+%F_%H:%M:%S'`
# Hostnames of the nodes in the cluster.
declare -a StringArray=("node-00-00.snvtr.mongodb.net" "node-00-01.snvtr.mongodb.net" "node-00-02.snvtr.mongodb.net")
# Download the process (server) log from each node and move it to S3.
for hostname in ${StringArray[@]}; do
echo "Obtaining logs from ${hostname}"
curl --user ${PUBLICKEY}:${PRIVATEKEY} --digest \
--header 'Accept: application/gzip' \
--request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters/${hostname}/logs/mongodb.gz" \
--output "mongodb-${hostname}-${CURRENTDATE}.gz"
echo "Uploading logs from ${hostname}"
aws s3 mv "mongodb-${hostname}-${CURRENTDATE}.gz" s3://mongodb-logs/mongodblogs/${NOW}/mongodb-${hostname}-${CURRENTDATE}.gz
done
# Download the audit log from each node and move it to S3.
for hostname in ${StringArray[@]}; do
echo "Obtaining Audit logs from ${hostname}"
curl --user ${PUBLICKEY}:${PRIVATEKEY} --digest \
--header 'Accept: application/gzip' \
--request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters/${hostname}/logs/mongodb-audit-log.gz" \
--output "mongodb-audit-log-${hostname}-${CURRENTDATE}.gz"
echo "Uploading Audit logs from ${hostname}"
aws s3 mv "mongodb-audit-log-${hostname}-${CURRENTDATE}.gz" s3://mongodb-logs/mongodblogs/${NOW}/mongodb-audit-log-${hostname}-${CURRENTDATE}.gz
done
echo ""
echo "End of script execution..."
Let's see each part
#! /bin/bash -e
CLUSTERID=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "cluster_id" | cut -f2 -d ":" | cut -d "\"" -f2)
PUBLICKEY=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "public_key" | cut -f2 -d ":" | cut -d "\"" -f2)
PRIVATEKEY=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption --query 'Parameters[*].Value' --output text | grep "private_key" | cut -f2 -d ":" | cut -d "\"" -f2)
CURRENTDATE=`date +%Y%m%d`
NOW=`date '+%F_%H:%M:%S'`
declare -a StringArray=("node-00-00.snvtr.mongodb.net" "node-00-01.snvtr.mongodb.net" "node-00-02.snvtr.mongodb.net")
In this section we declare the variables:
- CLUSTERID: obtained directly from the Parameter Store; the value is decrypted and cut out of the parameter's content. This is the ID used in the groups/ part of the API URL, i.e. the Atlas project (group) ID.
- PUBLICKEY: obtained from the Parameter Store, decrypted and assigned to the variable.
- PRIVATEKEY: obtained from the Parameter Store using the AWS CLI command
aws ssm get-parameters
- CURRENTDATE and NOW are variables we use to name the log files and the prefix under which they are stored inside the S3 bucket.
- We also declare the array of cluster node hostnames. You could store these values in another parameter as well, but since they are not sensitive you can simply write them here.
As you may have noticed, this approach keeps sensitive information out of the code, and if that information changes you only need to update the Parameter Store; new values will be picked up on the next execution. The grep/cut parsing works but is a bit brittle; a slightly more robust alternative is sketched below.
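As a minimal sketch, assuming jq is installed on the instance and the parameter value is stored as strict JSON (quoted keys), the same variables can be filled with a single SSM call:
# Fetch the parameter once and parse the JSON value with jq.
PARAM_JSON=$(aws ssm get-parameters --names "mongodb-atlas-key" --with-decryption \
  --query 'Parameters[0].Value' --output text)
CLUSTERID=$(echo "${PARAM_JSON}" | jq -r '.cluster_id')
PUBLICKEY=$(echo "${PARAM_JSON}" | jq -r '.public_key')
PRIVATEKEY=$(echo "${PARAM_JSON}" | jq -r '.private_key')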
for hostname in ${StringArray[@]}; do
echo "Obtaining logs from ${hostname}"
curl --user ${PUBLICKEY}:${PRIVATEKEY} --digest \
--header 'Accept: application/gzip' \
--request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters/${hostname}/logs/mongodb.gz" \
--output "mongodb-${hostname}-${CURRENTDATE}.gz"
echo "Uploading logs from ${hostname}"
aws s3 mv "mongodb-${hostname}-${CURRENTDATE}.gz" s3://mongodb-logs/mongodblogs/${NOW}/mongodb-${hostname}-${CURRENTDATE}.gz
done
This for loop goes through the node hostnames and downloads the process logs.
It stores them locally on the instance and then moves them to the S3 bucket, using the date and the hostname in the key.
Basically, we are making an API call to MongoDB Atlas and authenticating ourselves with the generated key, using HTTP digest authentication.
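If you want to verify the key before running the whole script, a quick hedged check (assuming the same CLUSTERID, PUBLICKEY and PRIVATEKEY variables are already set) is to list the project's clusters with the same digest authentication; a JSON response means the key works:
# List the clusters in the project; an HTTP 200 with a JSON body confirms the API key.
curl --user "${PUBLICKEY}:${PRIVATEKEY}" --digest \
  --header 'Accept: application/json' \
  --request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters"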
for hostname in ${StringArray[@]}; do
echo "Obtaining Audit logs from ${hostname}"
curl --user ${PUBLICKEY}:${PRIVATEKEY} --digest \
--header 'Accept: application/gzip' \
--request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters/${hostname}/logs/mongodb-audit-log.gz" \
--output "mongodb-audit-log-${hostname}-${CURRENTDATE}.gz"
echo "Uploading Audit logs from ${hostname}"
aws s3 mv "mongodb-audit-log-${hostname}-${CURRENTDATE}.gz" s3://mongodb-logs/mongodblogs/${NOW}/mongodb-audit-log-${hostname}-${CURRENTDATE}.gz
done
This for loop goes through the node hostnames and downloads the audit logs.
It stores them locally on the instance and then moves them to the S3 bucket, using the date and the hostname in the key.
You can grab this script and put it inside a cron job so it is executed every day; a sketch of the crontab entry follows.
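Assuming the script is saved as /opt/scripts/atlas-logs.sh and marked executable (both the path and the schedule are placeholders, adjust them to your setup):
# Run the Atlas log download every day at 01:00 and append output to a local log file.
0 1 * * * /opt/scripts/atlas-logs.sh >> /var/log/atlas-logs-download.log 2>&1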
We are not defining a time frame for the logs; by default the API returns the last 24 hours, so by executing this script daily we download the logs from the previous day.
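If you ever need a different window, the logs endpoint also accepts startDate and endDate query parameters as UNIX epoch seconds. A hedged sketch, reusing the script's variables and assuming ${hostname} is set inside the same loop:
# Download the process log for an explicit window, here the previous 48 hours (GNU date syntax).
STARTDATE=$(date -d "48 hours ago" +%s)
ENDDATE=$(date +%s)
curl --user "${PUBLICKEY}:${PRIVATEKEY}" --digest \
  --header 'Accept: application/gzip' \
  --request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${CLUSTERID}/clusters/${hostname}/logs/mongodb.gz?startDate=${STARTDATE}&endDate=${ENDDATE}" \
  --output "mongodb-${hostname}-window.gz"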