It can be challenging to solve for complex operations and high traffic without writing software that becomes brittle and difficult to maintain. Event-Driven Architecture (EDA) can be a great solution for handling complexities, due to how this architecture naturally adapts to human behaviors and how it leads to designs that can scale in both complexity and traffic. Let's explore the principles, advantages, and some of the complex scenarios that EDA is naturally able to handle, so we can understand when and where to apply it.
So what are events?
Events are just data, for example a JSON body submitted via HTTP to an endpoint. They represent an asynchronous operation, meaning the results arrive later and are not part of the initial HTTP response. An event is a signal to a system that something has changed or that an action is to be performed. We can imagine the ringing of a doorbell as an event: the ringing announces someone's arrival, and someone has to listen for that signal and decide how to respond.
There are many ways to think about events, and for this article I'll focus on "Commands", a type of event that I find to be the easiest way to introduce the core concepts of EDA.
Commands are direct instructions for a system, such as "play a song", "like a post", or "cancel an order". An EDA-based software system supports a list of Commands, and that's already quite different from more classic RESTful systems, which focus on resource endpoints that can be synchronously modified.
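To make "a Command is just data" concrete, here is a minimal Python sketch. The command names, the payload field, and the parse_command helper are all made up for illustration and are not part of any particular framework.

```python
import json
from dataclasses import dataclass

# Hypothetical command names, loosely based on the examples in the text.
SUPPORTED_COMMANDS = {"play-song", "like-post", "cancel-order"}


@dataclass(frozen=True)
class Command:
    type: str      # which instruction this is
    payload: dict  # the data needed to carry it out


def parse_command(body: str) -> Command:
    """Turn a JSON body into a Command, rejecting unsupported types."""
    data = json.loads(body)
    if data.get("type") not in SUPPORTED_COMMANDS:
        raise ValueError(f"unsupported command: {data.get('type')!r}")
    return Command(type=data["type"], payload=data.get("payload", {}))


cmd = parse_command('{"type": "cancel-order", "payload": {"order_id": "o-1"}}')
```

The list of supported Commands is effectively the system's public API, in the same way a list of resources would be for a RESTful system.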
To explore the challenges of RESTful design, let's imagine keeping track of cars entering and exiting various garages, and we want to show available spaces on a display somewhere. The simplest design would be to PUT to a garages/<id> resource's available-space field, setting the proper value. That will genuinely work at first, but it scales poorly: if someone else updates any other field in that garage while your request is in flight, then your garage data will overwrite their changes.
To fix that we can imagine the server requires you to send an If-Unmodified-Since header so it can reject requests if the resource has been modified since you last saw it. That means their changes won't be lost, but your request gets rejected instead, so you'll have to fetch the latest version, reapply your change, and retry.
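The rejection behavior can be sketched without a real server. This is a simplified in-memory simulation: the Garage class is hypothetical, and a logical counter stands in for the timestamp that If-Unmodified-Since would actually carry. The 412 status is the one HTTP uses for a failed precondition.

```python
class Garage:
    """In-memory stand-in for the garages/<id> resource."""

    def __init__(self):
        self.fields = {"name": "Central", "available-space": 100}
        self.last_modified = 0  # logical clock standing in for a timestamp

    def put(self, new_fields: dict, if_unmodified_since: int) -> int:
        """Replace the whole resource and return an HTTP-like status code."""
        if self.last_modified > if_unmodified_since:
            return 412  # Precondition Failed: someone else changed it first
        self.fields = dict(new_fields)
        self.last_modified += 1
        return 200


garage = Garage()

# Two clients read the same version of the resource.
seen_by_a = garage.last_modified
seen_by_b = garage.last_modified
copy_a = {**garage.fields, "available-space": 99}
copy_b = {**garage.fields, "name": "Central Garage"}

status_a = garage.put(copy_a, if_unmodified_since=seen_by_a)  # accepted
status_b = garage.put(copy_b, if_unmodified_since=seen_by_b)  # rejected
```

Client B's rename is rejected even though it never touched available-space, which is exactly the retry burden described above.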
A common RESTful iteration is to then extract that field into its own resource, e.g. garages/<id>/occupancy. But now the problem becomes that if two systems both set available-space from 99 to 100, then one of them gets its request rejected and has to retry (assuming we kept the If-Unmodified-Since header feature).
And so the next common RESTful iteration is to make that field an entire collection resource, meaning we can POST to it. So now we maybe POST to /garages/<id>/entries, and the server then decrements its available-space field. We'll also need an /exits endpoint to increment it again. And do we model those resources as tables? And when we do PUT to /garages/<id>, how do we know available-space is now a read-only field? Should the server silently ignore changes to that field? Or perhaps reject the request?
To be clear, these are all solvable problems. There's nothing inherently wrong or bad about RESTful design patterns; it's just that in this case we end up with a lot of unnecessary complexity that EDA can handle more naturally:
The API documentation will show we can submit Commands such as vehicle-entered and vehicle-exited, and all we have to do is POST that to the command endpoint: POST {"type": "vehicle-entered"} to /command.
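Here is a rough sketch of what sits behind such a command endpoint, written as a plain Python function rather than a real web handler. The function name, the "garage" field, and the choice of status codes are my assumptions; the point is only that the handler validates, enqueues, and returns immediately.

```python
import json
import queue

# Commands wait here until a separate processor picks them up.
command_queue = queue.Queue()


def handle_post_command(body: str) -> int:
    """Hypothetical handler for POST /command: validate, enqueue, return."""
    try:
        command = json.loads(body)
    except json.JSONDecodeError:
        return 400  # not valid JSON
    if not isinstance(command, dict) or "type" not in command:
        return 400  # every Command needs a type
    command_queue.put(command)
    return 202  # Accepted: the actual processing happens later


status = handle_post_command('{"type": "vehicle-entered", "garage": "g-1"}')
```

Note that nothing in the handler reads or writes available-space, so there is nothing for two callers to conflict over.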
So now the interaction is like dropping a request into a mailbox and moving on, no more waiting around, no more rejected requests to retry. This is the asynchronous part of EDA, and it's a key part of how EDA systems can continuously prioritize and process tasks even under extreme load or if subsystems start failing.
Let's move on to explore those qualities in more detail.
The Mechanics of EDA
When an EDA system receives a Command it's typically placed directly onto a queue, and then a separate processing-subsystem deals with those items as fast as it can. Already there we see how EDA remains operational and responsive even under extreme load or partial failures, because at worst processing "just" falls behind. And queued items can often be processed in parallel (e.g. queues could be per garage or we can imagine priority queues backed by dedicated hardware, etc.) which means we can put that processing into elastic computing to scale with demand.
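As a minimal sketch of the per-garage queue idea, assuming in-memory queues and made-up garage ids (a real system would use a durable queue and run the processors in parallel):

```python
from collections import defaultdict, deque

# One queue per garage, so each garage can be processed independently.
queues = defaultdict(deque)
available_space = {"g-1": 100, "g-2": 50}


def ingest(command: dict) -> None:
    """Ingestion does as little as possible: it only enqueues."""
    queues[command["garage"]].append(command)


def process_garage(garage_id: str) -> None:
    """Drain one garage's queue, applying each Command in arrival order."""
    pending = queues[garage_id]
    while pending:
        command = pending.popleft()
        if command["type"] == "vehicle-entered":
            available_space[garage_id] -= 1
        elif command["type"] == "vehicle-exited":
            available_space[garage_id] += 1


for incoming in [
    {"type": "vehicle-entered", "garage": "g-1"},
    {"type": "vehicle-entered", "garage": "g-1"},
    {"type": "vehicle-entered", "garage": "g-2"},
    {"type": "vehicle-exited", "garage": "g-1"},
]:
    ingest(incoming)

for garage_id in list(queues):
    process_garage(garage_id)
```

Because only one processor ever touches a given garage's counter, the lost-update and rejected-request problems from the RESTful iterations don't arise.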
It also means the system is resilient to failure: only the Command ingestion system must be available at all times, and because that subsystem does very little (literally just puts a Command onto a queue) its surface area is minimal and thus simpler to harden. There are even managed cloud queueing services that do this for you, which leaves almost no ingestion code to maintain and gives you capacity well beyond what most systems will ever need.
EDA and Error Handling
The system that actually processes a Command is where our business logic lives, and I'd like to switch to a more complicated example of managing orders to illustrate how EDA supports robust error handling:
Let's imagine a "cancel order" Command, which our system processes by looking up that order in some external order registry and marking it as cancelled.
Subsystem Down
Let's now imagine that "cancel order" Command arrives while the order registry is temporarily down. The ingestion of the Command will work fine, but processing that Command isn't possible. In this case our system will simply retry the Command later, meaning the Command still gets done, just with a bit of a delay. When various systems call each other it is very common to have seconds or minutes where errors or downtime suddenly occur, and EDA naturally handles this by just trying again a little bit later.
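The retry behavior can be sketched like this. FlakyRegistry is a hypothetical stand-in for the external order registry, and the attempt limit with a "dead letter" list is one common way to avoid retrying forever, not something the scenario above prescribes.

```python
from collections import deque


class RegistryDown(Exception):
    """Raised when the (simulated) order registry is unavailable."""


class FlakyRegistry:
    """Hypothetical order registry that is down for the first few calls."""

    def __init__(self, failures: int):
        self.failures = failures
        self.cancelled = set()

    def cancel(self, order_id: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise RegistryDown()
        self.cancelled.add(order_id)


def process_all(pending: deque, registry: FlakyRegistry, max_attempts: int = 5) -> list:
    """Process Commands; requeue on failure, give up after max_attempts."""
    dead_letters = []
    while pending:
        command = pending.popleft()
        try:
            registry.cancel(command["order_id"])
        except RegistryDown:
            command["attempts"] = command.get("attempts", 0) + 1
            if command["attempts"] < max_attempts:
                pending.append(command)  # a real system would wait before retrying
            else:
                dead_letters.append(command)  # keep it for manual follow-up
    return dead_letters


registry = FlakyRegistry(failures=2)
pending = deque([{"type": "cancel-order", "order_id": "o-1"}])
dead = process_all(pending, registry)
```

Here the registry fails twice and the third attempt succeeds, so the order ends up cancelled without the caller ever seeing an error.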
But let's go even further, and consider what happens if the order registry is down for so long that once the "cancel-order" Command does get processed the customer is no longer eligible to cancel (e.g. the item has been physically shipped).
Because our EDA system can simply store all received Commands, we have a crystal-clear audit trail showing that the customer did actually attempt to cancel within their allowed time. So it's not a problem to still offer the customer their money back, and we take the hit.
Now that's a way to get satisfied customers.
A Terrible Mistake
Let's now imagine we made a terrible coding mistake: we accidentally allowed orders to be cancelled without properly checking that the order belonged to that customer, which has resulted in refunds getting queued up in the reimbursement system. That kind of mistake could happen in any architectural pattern, but with EDA we can fix the code problem and then wipe the refund database and replay all the Commands. That would rebuild the correct state of refunds, because those Commands form a perfect audit trail of all actions that have affected our system.
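A small sketch of that replay idea, assuming the Commands were stored as a simple list. The field names, the order_owner lookup, and both handlers are invented for illustration.

```python
# Stored Commands: the audit trail of everything the system was asked to do.
command_log = [
    {"type": "cancel-order", "order_id": "o-1", "customer": "alice"},
    {"type": "cancel-order", "order_id": "o-2", "customer": "mallory"},  # not her order
    {"type": "cancel-order", "order_id": "o-3", "customer": "bob"},
]
order_owner = {"o-1": "alice", "o-2": "bob", "o-3": "bob"}


def buggy_handler(command: dict, refunds: set) -> None:
    refunds.add(command["order_id"])  # bug: no ownership check


def fixed_handler(command: dict, refunds: set) -> None:
    if order_owner.get(command["order_id"]) == command["customer"]:
        refunds.add(command["order_id"])


def replay(log: list, handler) -> set:
    """Rebuild the refund state from scratch by re-running every Command."""
    refunds = set()
    for command in log:
        handler(command, refunds)
    return refunds


bad_state = replay(command_log, buggy_handler)   # includes the bogus o-2 refund
good_state = replay(command_log, fixed_handler)  # only legitimate refunds
```

The log itself never changes; only the code that interprets it does, which is what makes the rebuilt state trustworthy.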
This ability to correct an error and re-resolve Commands to arrive at a new correct state is an incredibly powerful aspect of EDA.
Out of Order
Let's lastly consider a quite advanced scenario, where a "cancel order" Command is received but the actual order was never created. That's an error, right? And then a few milliseconds later a "create order" Command does arrive.
That isn't a hypothetical problem, it's just how the Internet works: requests can take different routes or get delayed along the way, meaning requests that were sent later can arrive first. We can imagine a user clicks "Create" in their GUI and then immediately clicks "Cancel", and the two requests reach our system in the opposite order.
A RESTful system would probably sidestep this problem by modeling all this as order resources with synchronous requests, meaning the GUI locks while the order-create request gets resolved. That's not bad, it's just not an ideal user experience: the customer has to wait for the whole round trip, and will see errors if e.g. the underlying order registry is down. In EDA it could be problematic to simply reject the initial "cancel order" Command, because then the "create order" Command still gets processed and the customer ends up with an order they tried to cancel. But our system could react to an invalid "cancel order" by putting it back into its queue with a delay of a few seconds, and then see if it makes more sense the next time it gets processed, in case a "create order" Command has arrived in the meantime.
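Here is one way that delayed requeue could look, using a simulated clock so the example runs instantly. The delay, the attempt limit, and the function and field names are assumptions for the sketch.

```python
import heapq


def process(commands, retry_delay=2.0, max_attempts=3):
    """Process (arrival_time, command) pairs in time order on a simulated clock."""
    orders = {}    # order_id -> "active" or "cancelled"
    rejected = []  # cancels that never found a matching order
    # Heap entries: (due_time, sequence, attempts_so_far, command).
    # The unique sequence number breaks ties so dicts are never compared.
    heap = [(t, i, 0, c) for i, (t, c) in enumerate(commands)]
    heapq.heapify(heap)
    seq = len(heap)
    while heap:
        due, _, attempts, command = heapq.heappop(heap)
        order_id = command["order_id"]
        if command["type"] == "create-order":
            orders[order_id] = "active"
        elif command["type"] == "cancel-order":
            if order_id in orders:
                orders[order_id] = "cancelled"
            elif attempts + 1 < max_attempts:
                # Doesn't make sense yet: look at it again a bit later.
                heapq.heappush(heap, (due + retry_delay, seq, attempts + 1, command))
                seq += 1
            else:
                rejected.append(command)
    return orders, rejected


# The cancel arrives 5 ms before the create it refers to.
orders, rejected = process([
    (0.000, {"type": "cancel-order", "order_id": "o-1"}),
    (0.005, {"type": "create-order", "order_id": "o-1"}),
])
```

A cancel for an order that never shows up is still rejected eventually, so the attempt limit keeps genuinely invalid Commands from circulating forever.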
That's just like what a human would do if we got handed two events out of order.
And Finally
Just to close the loop on how one then gets results out of an EDA system: the system that processes a Command should emit its own events to indicate the outcome of that Command. I won't spend too much time on this aspect because this article is meant to be more focused on event ingestion, but in the mailbox analogy from earlier this is like sending a letter back to the sender with an answer. So we can imagine some messaging system (e.g. Kafka, Kinesis, RabbitMQ, etc.) that the sender can subscribe to for updates. Callers can also poll query endpoints (e.g. GET requests) to wait for results.
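Both ways of getting results back can be sketched in memory. In practice a message broker and a real query endpoint would play these roles; the command ids, function names, and the outcome event shape here are all hypothetical.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # command id -> callbacks waiting for the outcome
outcomes = {}                    # command id -> outcome event, kept for polling


def subscribe(command_id: str, callback) -> None:
    """Register interest in the outcome of one Command."""
    subscribers[command_id].append(callback)


def emit_outcome(command_id: str, outcome: dict) -> None:
    """Called by the processor once a Command has been resolved."""
    outcomes[command_id] = outcome
    for callback in subscribers[command_id]:
        callback(outcome)


def poll(command_id: str):
    """What a GET-style query endpoint could return; None means still pending."""
    return outcomes.get(command_id)


received = []
subscribe("c-1", received.append)
before = poll("c-1")  # nothing yet, the Command is still queued
emit_outcome("c-1", {"type": "order-cancelled", "order_id": "o-1"})
after = poll("c-1")
```

Subscribing pushes the answer to the caller as soon as it exists, while polling lets simple clients ask at their own pace.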
The Human Element in EDA
At its core, EDA mimics human interaction patterns and I think that's what makes it such a well-suited pattern for a wide variety of business problems: Clicking a button, asking for assistance, pausing and resuming, these are all human activities that EDA elegantly captures which means the user's intent gets directly stored in our system. It's just a great alignment to natural human behavior, which makes EDA not just a nice technical solution but also a very user-centric solution.
Conclusion
This exploration is just scratching the surface of EDA. There's much more to dive into, such as how EDA tends to deeply connect with domain-driven design and event storming, and how it allows us to design long-term maintainable systems by letting parts be decoupled and shifted around as necessary. But as a primer I hope this dive into Commands shows some of the great advantages of EDA, and how it's a powerful design pattern worth keeping in your toolbelt.