A simple way to extend a CloudWatch alarm so that you receive Lambda error details by email. Available as a CDK construct or as a CloudFormation template.
AWS CloudWatch is not the best service for monitoring your system. It has a ton of features, but many of them are awkward to use. At the same time, it is baked into AWS, so you do not have to look for something else, and for a large number of solutions it is good enough.
One of the not-so-nice things is how it notifies you of Lambda errors. Typically, you attach an Alarm to the Lambda error metric. The Alarm sends a message to SNS, which forwards it to your email or SMS. That is a relatively common approach, but not the only one. One benefit is that you do not get a ton of messages: you get just one when the Alarm enters the error state.
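As a rough illustration of that typical setup, the sketch below builds the arguments for CloudWatch's `PutMetricAlarm` API. The helper name `lambda_error_alarm_params`, the alarm name, and the chosen period and threshold are assumptions made for this example, not values taken from the solution itself.

```python
def lambda_error_alarm_params(function_name: str, topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API.

    Hypothetical helper: the naming and thresholds are illustrative only.
    """
    return {
        "AlarmName": f"{function_name}-errors",
        # The built-in Lambda error metric.
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        # Assumed values: alarm on one or more errors within one minute.
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Minutes without invocations should not count as failures.
        "TreatMissingData": "notBreaching",
        # The SNS topic that forwards the alarm to email or SMS.
        "AlarmActions": [topic_arn],
    }


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# params = lambda_error_alarm_params("my-function", "arn:aws:sns:...")
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Whatever tool creates the alarm, the result is the same: a single SNS message whenever the Alarm changes state.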
But this notification does not contain any details about the error. You have to open the AWS console, find the appropriate log group, open CloudWatch Logs Insights, write a query, and after a few minutes you have your error message, which quite possibly is an unimportant one-time occurrence of some timeout. Of course, you can simplify this process, but wouldn't it be great to get a sample of errors by email? Most monitoring and logging systems, like Sentry, provide that out of the box.
I will present a simple solution for getting error messages by email. It is great for smaller projects where you do not want to configure another error-logging solution, or for older systems that you do not wish to alter just for that.
The solution comes in two forms:
- CDK construct: if you are building your system with CDK (or SST). Available for TypeScript, Java, C#, Python, and Go.
- CloudFormation: for existing solutions, so you do not have to modify them. You deploy it and point it to the existing SNS topic used for CloudWatch alarms.
How does it work?
- A Lambda function is subscribed to the SNS topic where you receive your alarms. A message body subscription filter limits it to alarms based on the Lambda error metric. You must change the filter if you defined your metric in some way other than the default.
- The Lambda analyzes the alarm message, finds the log group of the failed Lambda, and queries it for the period the metric was configured with, plus some additional safety time, so the error message is not missed. It extracts only as many errors as fit in one SNS message.
- The Lambda sends the errors to the same SNS topic that you use for alerts. So, apart from the Alarm message about the change to the error state, you receive an additional one with detailed error messages.
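The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual code of the construct: the filter policy, the helper names `parse_alarm` and `fit_errors`, the two-minute safety margin, and the default `/aws/lambda/<function name>` log group are all assumptions. The message fields follow the JSON layout that CloudWatch alarms publish to SNS.

```python
import json
from datetime import datetime, timedelta

# Assumed message body filter: only alarms on the default Lambda error metric.
FILTER_POLICY = {"Trigger": {"Namespace": ["AWS/Lambda"], "MetricName": ["Errors"]}}

SNS_MAX_BYTES = 256 * 1024            # SNS message size limit, used as the budget
SAFETY_MARGIN = timedelta(minutes=2)  # hypothetical extra time before the window


def parse_alarm(sns_message: str) -> dict:
    """Find the failed function's log group and the time range to query."""
    alarm = json.loads(sns_message)
    trigger = alarm["Trigger"]
    # Dimension keys in alarm notifications are lowercase "name" and "value".
    dims = {d["name"]: d["value"] for d in trigger["Dimensions"]}
    changed = datetime.strptime(alarm["StateChangeTime"], "%Y-%m-%dT%H:%M:%S.%f%z")
    window = timedelta(seconds=trigger["Period"] * trigger["EvaluationPeriods"])
    return {
        # Assumption: the function logs to its default log group.
        "log_group": f"/aws/lambda/{dims['FunctionName']}",
        "start": changed - window - SAFETY_MARGIN,
        "end": changed,
    }


def fit_errors(errors: list, limit: int = SNS_MAX_BYTES) -> str:
    """Join error messages in order until the next one would exceed the limit."""
    separator = "\n\n"
    kept, used = [], 0
    for error in errors:
        size = len(error.encode("utf-8")) + (len(separator) if kept else 0)
        if used + size > limit:
            break
        kept.append(error)
        used += size
    return separator.join(kept)
```

In a real handler, the `start` and `end` values would feed a CloudWatch Logs query, and the string returned by `fit_errors` would be published back to the same SNS topic.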