When working on a SaaS product, you’ll occasionally have to take administrative actions to assist a user or help them debug an issue. Maybe something is misconfigured and they need a hand, or maybe they just want another week in their trial. If you’re using something like Django, you get a nice built-in admin UI based on your model definitions, and that’s it: you have a great tool at your disposal by default. You still need to design the access flow carefully, though, as admin panels otherwise commonly become attack vectors.
If, however, you’re building a project in a custom way, akin to “let’s combine many libraries and build exactly what we need”, then all you’ll have is the raw query language interface of your database, and only if you’re lucky enough to have chosen a database that supports non-trivial querying.
In the long run, though, as most developers know, running SQL queries on prod isn’t a scalable or secure approach: it requires significant coordination, communication and supervision so that you don’t accidentally corrupt your data. You also have to limit access to a few select individuals and manage who can see sensitive data.
Moreover, you might have dependencies besides the SQL database. Administering all those third-party services means jumping between various admin UIs, which makes everything even more complicated and error-prone.
The Obvious™ Solution
There’s a whole class of tools tailored specifically to solving this use case – WYSIWYG back office tools. Using one of these tools you can easily create dialogs with SQL queries underneath and connect them directly to your database. Depending on the tool, it might even be usable by non-technical people.
Unfortunately, these tools also bring a whole class of problems with them, mainly around access management, both for people and for the tool itself. You can use a cloud-based, managed tool, but then you have to give it arbitrary query access to your internal production database. This is often acceptable, but it is a security consideration you have to take into account. You can also use self-hosted open source tools, but these come with their own set of trade-offs:
- You have to self-host them, costing you precious time.
- They need to have access to your production database, and people in your organization need to have access to them. So you either make them publicly accessible – and trust their security – or you make them internally accessible, and set up limited internal network access for everybody who needs to use them.
- You have to manage access within the tool itself.
Other than that, one common disadvantage is that they’re yet another tool you’ll have to use in your day-to-day work. They are also usually optimized for working with a SQL database, not additional third-party services.
After comparing various solutions on both sides of the cloud/open-source divide, we decided we liked neither set of trade-offs, which put the whole initiative on hold for a while.
Then we had an idea… What’s the most popular DevOps administration and management tool? Slack of course! I’m slightly joking, but the ChatOps trend’s significance cannot be denied. In this case, it did indeed look like it could solve all our problems.
Why Slack?
Slack is great for a few reasons:
- It’s a textual interface (almost a terminal, really), so you can transfer a lot of UX knowledge from your terminal experience. Developers tend to be comfortable with text interfaces.
- You already manage access to Slack channels, so you can piggyback on top of that and only make your tool available in select channels. This way you don’t have to build additional access management for your tooling.
- If you design the UX well, it’s very easy to use for non-technical users.
- You automatically get an audit log of all executed commands: the channel’s message history.
- You already use Slack.
It does have its disadvantages too:
- When Slack is down (which hasn’t been a rare occurrence recently), you can’t access your backoffice tool. That’s acceptable as long as you keep your most critical maintenance commands available through alternative access channels.
Now for some Slack technicalities. For a project like this you can use either slash commands or a slackbot. The gist of it is that slash commands are more structured, but also more limited. Slackbots, on the other hand, interact with your channels like a normal user would, so their flexibility is nearly limitless, but they require more work on your side. To provide an experience that’s as user-friendly and magical as possible, we went with the slackbot approach.
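Because a slackbot reads free-form messages, part of that extra work is pulling the actual command out of the message text. Below is a minimal sketch in Go, assuming the bot is addressed by mentioning it: Slack encodes a user mention in message text as <@USERID>. The helper name commandText and the botUserID value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// commandText extracts the command a user typed after mentioning the bot.
// Slack renders a user mention in message text as "<@USERID>"; botUserID is
// an assumption here (the bot's own Slack user ID).
// It returns false when the message does not start with a mention of the bot.
func commandText(text, botUserID string) (string, bool) {
	mention := "<@" + botUserID + ">"
	trimmed := strings.TrimSpace(text)
	if !strings.HasPrefix(trimmed, mention) {
		return "", false
	}
	// Drop the mention and any whitespace around the remaining command.
	return strings.TrimSpace(strings.TrimPrefix(trimmed, mention)), true
}

func main() {
	cmd, ok := commandText("<@U0BOT> set acme trial remaining to 7 days", "U0BOT")
	fmt.Println(cmd, ok)
}
```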
And thus the Backoffice Bot was born…
Building the Slackbot
Internally, we already had an event handling system in place for application events, e.g. GitHub push notifications. We even had a Slack integration already, so there was ample opportunity for copy-paste driven development.
In practice, though, it’s pretty simple. An AWS API Gateway endpoint receives Slack webhooks and puts the events on an SQS queue. A handler with access to all relevant production systems then processes each message and, where needed, responds to it.
Another advantage of building the slackbot on our existing event handling framework is that we have access not only to our SQL database, but also to any third-party products we use, like various AWS offerings.
The Router Structure
All commands are registered with a message pattern, which is then converted into a regular expression with capture groups for the arguments. Here is an example code block specifying a command with its handler:
{
	Command:     "set <subdomain> trial remaining to <number> days",
	Description: "Set account to be on an Enterprise trial for a specified number of days from now.",
	Channels:    []string{s.BackofficeChannelID},
	// params holds the values captured for <subdomain> and <number>.
	Handler: func(ctx *Context, event slackevents.EventsAPIInnerEvent, params map[string]string) error {
		ctx.sendSimpleResponse("Setting %s trial remaining to %s days.", params["subdomain"], params["number"])
		// ...
		return nil
	},
},
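The pattern-to-regex conversion could look roughly like the following sketch. It assumes each <placeholder> matches a single whitespace-free token and becomes a named capture group, with the literal text quoted so it matches verbatim; the function names compilePattern and matchParams are illustrative, not taken from our codebase.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches "<name>" tokens in a command pattern.
var placeholder = regexp.MustCompile(`<(\w+)>`)

// compilePattern turns a pattern like "set <subdomain> trial remaining to <number> days"
// into an anchored regular expression whose named capture groups are the
// placeholders. Literal text is quoted so characters like "." match literally.
func compilePattern(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	last := 0
	for _, loc := range placeholder.FindAllStringSubmatchIndex(pattern, -1) {
		// loc[0]:loc[1] is the whole "<name>", loc[2]:loc[3] is just "name".
		b.WriteString(regexp.QuoteMeta(pattern[last:loc[0]]))
		// Assumption: every argument is one whitespace-free token.
		b.WriteString("(?P<" + pattern[loc[2]:loc[3]] + `>\S+)`)
		last = loc[1]
	}
	b.WriteString(regexp.QuoteMeta(pattern[last:]))
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matchParams returns the named arguments captured from text, or nil if the
// text does not match the command.
func matchParams(re *regexp.Regexp, text string) map[string]string {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	re := compilePattern("set <subdomain> trial remaining to <number> days")
	fmt.Println(matchParams(re, "set acme trial remaining to 14 days"))
}
```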
Initially we planned to build a sophisticated trie-based router, but that’s far more complex than going over a list of regular expressions and trying to match each one. With our volume of traffic and number of commands, there was simply no point in optimizing further.
This way of adding commands is really simple, and there’s a bonus: an automatically generated help message (“@Backoffice Bot help”), which dynamically lists and describes all commands available in the current channel.
Approval Flow
With processes like these, you often want certain commands to require approval. We implement this using Slack reactions.
The Slackbot first describes what it is about to do (a dry run, so to speak) and then asks for approval. Approval is required from the caller, as well as from a predefined number of other people, which depends on how destructive the command is.
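The approval rule can be expressed as a small pure function. This sketch assumes the bot has already collected the IDs of users who reacted with the approval emoji (for example from the users list that Slack’s reactions.get method returns for each reaction) and filtered out its own ID; the destructiveness levels and required counts are hypothetical.

```go
package main

import "fmt"

// destructiveness levels; the values and the approvals they require are
// hypothetical examples.
type destructiveness int

const (
	low destructiveness = iota
	medium
	high
)

// extraApprovals is the number of people besides the caller who must approve.
var extraApprovals = map[destructiveness]int{low: 0, medium: 1, high: 2}

// approved reports whether the reactors include the caller plus enough
// distinct other people for the given level. reactors is assumed to already
// exclude the bot's own user ID.
func approved(reactors []string, caller string, level destructiveness) bool {
	seen := make(map[string]bool)
	callerApproved := false
	others := 0
	for _, u := range reactors {
		if seen[u] {
			continue // count each user once
		}
		seen[u] = true
		if u == caller {
			callerApproved = true
		} else {
			others++
		}
	}
	return callerApproved && others >= extraApprovals[level]
}

func main() {
	fmt.Println(approved([]string{"U_ALICE", "U_BOB"}, "U_ALICE", medium))
}
```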
To avoid executing a command twice, we mark handled messages with a Slack reaction, so we don’t need an external database for this. You can see this in the picture above: the Spacelift logo reaction on the root message.
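Checking for the marker reaction before executing might look like this. The emoji name "spacelift" and the bot user ID are placeholders; the detail worth keeping is that only the bot’s own marker counts, so a person adding the same emoji can’t suppress a command.

```go
package main

import "fmt"

// reaction mirrors the shape of a reaction on a Slack message: the emoji
// name and the IDs of the users who added it.
type reaction struct {
	Name  string
	Users []string
}

// alreadyHandled reports whether the bot has already marked the message with
// its marker emoji. markerEmoji and botUserID are assumptions for this sketch.
func alreadyHandled(reactions []reaction, markerEmoji, botUserID string) bool {
	for _, r := range reactions {
		if r.Name != markerEmoji {
			continue
		}
		for _, u := range r.Users {
			if u == botUserID {
				return true
			}
		}
	}
	return false
}

func main() {
	rs := []reaction{
		{Name: "white_check_mark", Users: []string{"U_ALICE"}},
		{Name: "spacelift", Users: []string{"U_BOT"}},
	}
	fmt.Println(alreadyHandled(rs, "spacelift", "U_BOT"))
}
```

Check-then-mark is not atomic: if the same event is delivered twice and processed concurrently (standard SQS queues deliver at least once), both workers can pass the check. One way to narrow that window is to add the marker first and treat Slack’s already_reacted error from reactions.add as a sign that another worker has claimed the message.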
Access Control
Another requirement was that different commands should be executable by different people. In our case, there are commands built for non-technical people, such as extending account trials, as well as commands built for technical people, such as displaying diagnostic information or cycling a worker pool’s workers.
We achieve this by limiting the available commands based on the channel. We have a #backoffice channel for non-technical people and a #backoffice-developers channel for technical people. Both are private, so we can control access levels by inviting select people to the relevant channels.
The help message takes this into account, and only shows the commands available to you in the current channel.
How does it work in practice?
Everybody using our new Backoffice Bot quickly fell in love with it. Adding new commands follows the same process as getting standard product changes into production, which every developer is already familiar with. There are no additional tools, workflows, or processes to maintain. We definitely recommend this approach if you ever face similar challenges.
(The original post was published at Spacelift)