I have a confession to make. Before I joined The Agile Monkeys and the Booster project, I had no clue about event-sourcing...
OK, that statement is not entirely accurate. I had been exposed to the concept before (the idea of appending events to a log and using them to reconstitute state) but not in a way related to information systems. Where had I encountered such a concept? Every time I logged in to my bank's website and looked at my account.
You see, in banking, it is vital to keep a record of every event (the most obvious being withdrawals and deposits) that occurred in an account; you don't expect to look at your bank statement and have it simply tell you you have a total of $0 in your account. You want to know how and when you earned or spent money so you can see how you got to $0. Imagine you open an account and start with $0; days later, your first paycheck is deposited and you have $100. You then go to the ATM and withdraw $50 for groceries; a few days later, you withdraw another $50 to pay back a friend who loaned you money. Your total balance after those transactions? $0.
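To make that concrete, here's a minimal TypeScript sketch (the type and function names are mine, purely for illustration): the balance is never stored anywhere; it's derived by replaying the transaction log.

// An append-only log of account transactions, like the bank statement above
type Transaction = { kind: 'deposit' | 'withdrawal'; amount: number; date: string }

// Reconstitute the current balance by replaying every transaction in order
function balance(ledger: Transaction[]): number {
  return ledger.reduce((total, tx) => (tx.kind === 'deposit' ? total + tx.amount : total - tx.amount), 0)
}

const statement: Transaction[] = [
  { kind: 'deposit', amount: 100, date: '2021-01-15' }, // first paycheck
  { kind: 'withdrawal', amount: 50, date: '2021-01-20' }, // groceries
  { kind: 'withdrawal', amount: 50, date: '2021-01-25' }, // paying back a friend
]

console.log(balance(statement)) // 0, but the history of how we got there is intact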
This method of record-keeping and reconstituting state has been around for centuries; it just wasn't called event-sourcing until recently. The bank account ledger is just one example; there are many others in other fields, such as addenda to legal contracts.
Record-keeping and state in traditional software systems
Event-sourcing appears to be a natural way of modeling data in information systems, yet most of us who learned about storing data in software engineering school were only taught the CRUD (create, read, update, and delete) method (ALVYS co-founder and CTO Leo Gorodinski even went so far as to say that someone could've come up with event-sourcing by paying no attention to the software engineering literature).
The traditional CRUD style of persisting only the current state of the data has historical reasons: there was a time when computer storage was expensive, so storing every event was unaffordable. But today, in the age of cloud computing, this is not a problem anymore (Adam Dymitruk, creator of Event Modeling, has a great explanation of this on the Event Modeling blog).
An example of CRUD vs. event-sourcing
Let's compare the different approaches with a very simple example: an address book or a list of contacts.
Traditional CRUD approach
For most people, the most obvious way of storing these contacts would probably be a contacts table in a relational database that looks something like this:
CREATE TABLE Contacts (
  Id int,
  LastName varchar(255),
  FirstName varchar(255),
  Address varchar(255),
  PhoneNumber varchar(255)
);
Want to add a new contact? Just run an INSERT query. Has your contact changed their address or phone number? Execute an UPDATE query on those fields. Want to know where the person with a specific ID lives? Execute a SELECT Address query for that ID. This works pretty well most of the time, but what if you need to know the previous addresses where someone lived, not just their current one? Every UPDATE overwrote those fields in that ID's row, so the old data is lost forever; finding the previous addresses is simply not possible.
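For instance, the lifecycle of one contact might look like this (values borrowed from the example later in this article; the queries are a sketch of the approach, not a prescribed schema):

-- Create the contact
INSERT INTO Contacts (Id, LastName, FirstName, Address, PhoneNumber)
VALUES (90125, 'Doe', 'John', '22 Acacia Avenue, London, UK', '634-5789');

-- The UPDATE overwrites the old address in place; the previous value is gone
UPDATE Contacts
SET Address = '2120 South Michigan Avenue, Chicago, IL'
WHERE Id = 90125;

-- We can only ever read the current address
SELECT Address FROM Contacts WHERE Id = 90125;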
Event-sourcing approach
Instead of updating a row on a database and storing the latest state, with event-sourcing, you would record everything that happens as an event. Let's say that instead of a relational database, we store the events as JSON objects in a NoSQL database, like MongoDB. (However! I am not saying you need to use a NoSQL database for event-sourcing. You can create an event-sourced system using a traditional, relational database, but the schema-less nature of NoSQL databases makes it easier to explain how events are stored.) Let's look at a sample event stream for one of the contacts represented as an array of JSON objects:
[
  {
    "eventType": "ContactCreated",
    "registeredAt": "2021-01-01T00:00:00+00:00",
    "data": {
      "id": 90125,
      "lastName": "Doe",
      "firstName": "John",
      "address": "22 Acacia Avenue, London, UK",
      "phoneNumber": "634-5789"
    }
  },
  {
    "eventType": "PhoneNumberChanged",
    "registeredAt": "2021-03-21T00:00:00+00:00",
    "data": {
      "id": 90125,
      "phoneNumber": "867-5309"
    }
  },
  {
    "eventType": "AddressChanged",
    "registeredAt": "2022-01-01T00:00:00+00:00",
    "data": {
      "id": 90125,
      "address": "2120 South Michigan Avenue, Chicago, IL"
    }
  }
]
If we were to query this contact's phone and address, we would see that John Doe, with ID "90125", lives at 2120 South Michigan Avenue, Chicago, IL, and his phone number is 867-5309. But storing this stream of events also lets us easily find out that on March 21, 2021, John Doe changed his phone number from 634-5789 to 867-5309 and then changed his address from 22 Acacia Avenue to 2120 South Michigan Avenue on January 1, 2022. With the traditional CRUD approach that many developers (myself included) have been accustomed to, this information would've been lost.
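For instance, here's a minimal TypeScript sketch (the event shape mirrors the JSON documents above; the function name is mine) of recovering a contact's full address history from the stream:

// The shape of the JSON event documents shown above
interface ContactEvent {
  eventType: string
  registeredAt: string
  data: { id: number; address?: string; phoneNumber?: string; firstName?: string; lastName?: string }
}

// Every address this contact has ever had, oldest first: exactly the query
// that the UPDATE-in-place approach made impossible
function addressHistory(stream: ContactEvent[]): string[] {
  return stream
    .filter((event) => typeof event.data.address === 'string')
    .map((event) => event.data.address as string)
}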
For the example I just used, the business value of modeling the data around events and storing them is not evident; you pretty much only care about where your contact currently lives and what their current phone number is. But think of an e-commerce application: imagine that every time a person modifies their shopping cart, an event is registered with the items that were added to or removed from the cart, and a final event is registered when the person proceeds to checkout. All the events that happen between the customer adding the first item to their cart and proceeding to checkout are a gold mine for analytics. Do people start removing items once they cross a certain price threshold? Which items do people start removing? Do people save items for later? All of these insights can then be translated into better business decisions: do we offer a discount once people cross that price threshold so they don't start removing items from the cart? Do we lower the prices of all the items on our site?
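As a hypothetical sketch of one such analysis (the event names and shapes here are made up for illustration, not taken from any real system), you could scan each cart's event stream for removals that happen after the running total crosses a price threshold:

// Made-up cart events, for illustration only
interface CartEvent {
  eventType: 'ItemAdded' | 'ItemRemoved' | 'CheckoutStarted'
  price: number // price of the item added or removed; 0 for CheckoutStarted
}

// Did this shopper start removing items once the cart crossed the threshold?
function removesAfterThreshold(stream: CartEvent[], threshold: number): boolean {
  let total = 0
  for (const event of stream) {
    if (event.eventType === 'ItemAdded') total += event.price
    if (event.eventType === 'ItemRemoved') {
      if (total >= threshold) return true // a removal above the threshold
      total -= event.price
    }
  }
  return false
}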
From a business perspective, this is one of the most evident examples of the advantages of modeling your system around events. Another great benefit of this approach is that it provides an audit log right out of the box; this benefit is of particular importance in financial applications. There are also advantages from a technical perspective in terms of scalability, resilience, and decoupling since different applications could tap into a shared event store and derive their own models (I recommend reading Martin Fowler's article on event-sourcing). Of course, like all things in life, nothing is perfect, and there are things you need to watch out for when implementing an event-sourced system, such as eventual consistency and versioning (these topics merit their own articles).
Using Booster to implement an event-sourced application
Seeing as most developers are not accustomed to implementing event-sourced systems, building one can appear technically challenging, especially from the infrastructure and security perspectives, and not worth the effort for teams that want to ship quickly and need a short time-to-market. Luckily, the Booster team has come up with an opinionated framework for coding event-sourced applications in TypeScript, removing many of the barriers to getting such an application up and running quickly. It doesn't just give you a good starting point for structuring your code; it even generates the whole infrastructure in your cloud provider of choice, ready for production from minute one.
Booster's cornerstones are event-sourcing, CQRS, and elements of domain-driven design. I'm not going to do a deep dive on this (that's what Booster's documentation is for, and I strongly recommend reading it). Here's the abridged version of what a developer needs to code when building an app with Booster:
- Commands: User actions to interact with the application (e.g., AddContact, UpdatePhoneNumber, UpdateAddress)
- Events: Things that happened (e.g., ContactCreated, PhoneNumberChanged, AddressChanged)
- Entities: Representations of a domain entity's state (e.g., Contact)
- Read Models: Cached data from the different entities, optimized for read operations
Let's look at the code for our sample address book, starting with the commands. Here's what the code for an AddContact command looks like:
import { Command } from '@boostercloud/framework-core'
import { Register, UUID } from '@boostercloud/framework-types'
// Import path assumes Booster's default project layout
import { ContactCreated } from '../events/contact-created'

@Command({
  authorize: 'all',
})
export class AddContact {
  public constructor(
    readonly id: UUID,
    readonly firstName: string,
    readonly lastName: string,
    readonly address: string,
    readonly phoneNumber: string
  ) {}

  public static async handle(command: AddContact, register: Register): Promise<void> {
    register.events(
      new ContactCreated(command.id, command.firstName, command.lastName, command.address, command.phoneNumber)
    )
  }
}
The @Command class decorator is used to define a command. The class constructor must have the fields that will be part of the request (a GraphQL mutation, as we'll see later) used to submit the command. The authorize: 'all' property in the decorator is part of Booster's authorization mechanism; we're not going to talk about that in this tutorial, but you can take a look at the documentation to learn more about it.
The command's handle function can run all the necessary logic and validation before registering an event; in this case, it registers a new ContactCreated event.
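As a purely hypothetical illustration (this rule is not part of the sample app), handle could reject bad input before any event is stored; this is a drop-in variant of the handle function in the AddContact class above:

// Hypothetical variant of AddContact.handle; the validation rule is
// illustrative only and not part of the sample app.
public static async handle(command: AddContact, register: Register): Promise<void> {
  if (!command.phoneNumber.trim()) {
    // Rejecting here means no event is ever registered for the invalid input
    throw new Error('A contact must have a phone number')
  }
  register.events(
    new ContactCreated(command.id, command.firstName, command.lastName, command.address, command.phoneNumber)
  )
}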
Let's look at the code for that ContactCreated event:
import { Event } from '@boostercloud/framework-core'
import { UUID } from '@boostercloud/framework-types'

@Event
export class ContactCreated {
  public constructor(
    readonly id: UUID,
    readonly firstName: string,
    readonly lastName: string,
    readonly address: string,
    readonly phoneNumber: string
  ) {}

  public entityID(): UUID {
    return this.id
  }
}
To define an event, a class decorator is also used; in this case, it's the @Event decorator. The code for events is pretty straightforward: the event's structure is defined by the properties in its constructor, and in the entityID function, we define which of the properties is the unique identifier for the domain entity affected by the event.
The code for the UpdatePhoneNumber and UpdateAddress commands, as well as for the PhoneNumberChanged and AddressChanged events, follows a similar pattern, so I'm going to skip most of it here (a sketch of one of those commands follows below); you can take a look at all of it in the GitHub repo.
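Here's what UpdatePhoneNumber could look like; this is a sketch following the same pattern as AddContact, so check the repo for the actual version:

import { Command } from '@boostercloud/framework-core'
import { Register, UUID } from '@boostercloud/framework-types'
// Import path assumes Booster's default project layout
import { PhoneNumberChanged } from '../events/phone-number-changed'

@Command({
  authorize: 'all',
})
export class UpdatePhoneNumber {
  public constructor(readonly id: UUID, readonly phoneNumber: string) {}

  public static async handle(command: UpdatePhoneNumber, register: Register): Promise<void> {
    register.events(new PhoneNumberChanged(command.id, command.phoneNumber))
  }
}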
Now we need to define our domain entity. In the case of the address book, we're managing a bunch of contacts; these contacts all have details for people's first and last names, addresses, phone numbers, and unique IDs to distinguish contacts from each other. The events we've defined make up the source of truth for our system, and we'll use them to reconstitute the state for the different entities. Let's look at the code:
import { Entity, Reduces } from '@boostercloud/framework-core'
import { UUID } from '@boostercloud/framework-types'
// Import paths assume Booster's default project layout
import { ContactCreated } from '../events/contact-created'
import { AddressChanged } from '../events/address-changed'
import { PhoneNumberChanged } from '../events/phone-number-changed'

@Entity
export class Contact {
  public constructor(
    public id: UUID,
    public firstName: string,
    public lastName: string,
    public address: string,
    public phoneNumber: string
  ) {}

  @Reduces(ContactCreated)
  public static reduceContactCreated(event: ContactCreated, currentContact?: Contact): Contact {
    if (currentContact) {
      return currentContact // if the contact already exists, we'll ignore the event
    } else {
      return new Contact(event.id, event.firstName, event.lastName, event.address, event.phoneNumber)
    }
  }

  @Reduces(AddressChanged)
  public static reduceAddressChanged(event: AddressChanged, currentContact: Contact): Contact {
    return new Contact(
      currentContact.id,
      currentContact.firstName,
      currentContact.lastName,
      event.address, // only the address changes; everything else is carried over
      currentContact.phoneNumber
    )
  }

  @Reduces(PhoneNumberChanged)
  public static reducePhoneNumberChanged(event: PhoneNumberChanged, currentContact: Contact): Contact {
    return new Contact(
      currentContact.id,
      currentContact.firstName,
      currentContact.lastName,
      currentContact.address,
      event.phoneNumber // only the phone number changes
    )
  }
}
Just like commands and events, we annotate the class with a decorator (in this case, @Entity) to define an entity, and in the constructor, we define the properties we want our Contact entity to have. Then, we need to define a reducer function for each event. You see, the state is reconstituted by taking all the events in an event stream, going from oldest to newest, and applying a function to the state of the entity up to that point and the event itself. For example, take the reducePhoneNumberChanged function, which reduces the PhoneNumberChanged event; it returns the same first name, last name, and address as the currentContact (i.e., the state of the entity up to that event) and changes the phone number to the one registered in the event.
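Conceptually, reconstitution is just a left fold of the reducers over the ordered stream. Here's a sketch of the idea (this is not Booster's API; Booster invokes the @Reduces functions for you, and AddressChanged and PhoneNumberChanged are the event classes from the repo):

// Conceptual sketch only: under the hood, reconstituting state boils down
// to folding the reducers over the event stream, oldest to newest.
type ContactEvent = ContactCreated | AddressChanged | PhoneNumberChanged

function reconstituteContact(stream: ContactEvent[]): Contact | undefined {
  return stream.reduce<Contact | undefined>((state, event) => {
    if (event instanceof ContactCreated) return Contact.reduceContactCreated(event, state)
    if (!state) return state // a change event before creation has nothing to update
    if (event instanceof AddressChanged) return Contact.reduceAddressChanged(event, state)
    return Contact.reducePhoneNumberChanged(event, state)
  }, undefined)
}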
Now that we can successfully reconstitute the state of an entity, let's expose it for the world to view through read models! To do this, we define a read model that projects an entity. Let's look at the code for the ContactReadModel:
import { ReadModel, Projects } from '@boostercloud/framework-core'
import { UUID, ProjectionResult } from '@boostercloud/framework-types'
// Import path assumes Booster's default project layout
import { Contact } from '../entities/contact'

@ReadModel({
  authorize: 'all',
})
export class ContactReadModel {
  public constructor(
    public id: UUID,
    public firstName: string,
    public lastName: string,
    public address: string,
    public phoneNumber: string
  ) {}

  @Projects(Contact, 'id')
  public static projectContact(
    entity: Contact,
    currentContactReadModel?: ContactReadModel
  ): ProjectionResult<ContactReadModel> {
    return new ContactReadModel(entity.id, entity.firstName, entity.lastName, entity.address, entity.phoneNumber)
  }
}
Once again, we use decorators, and in the read model's class constructor, we define the structure (i.e., the properties) that we'll return to the user when they run a GraphQL query on this read model.
And that's it! The cool thing about Booster is that we can now deploy directly to our cloud provider of choice without having to think about setting up API endpoints, databases, etc., because in Booster, all that is inferred from the code. Once your app is deployed (more on that in the documentation), you can run GraphQL queries and mutations like the following:
- Add a new contact:
  mutation { AddContact(input: { id: "90125", firstName: "John", lastName: "Doe", address: "22 Acacia Avenue, London, UK", phoneNumber: "634-5789" }) }
- Change that contact's phone number:
  mutation { UpdatePhoneNumber(input: { id: "90125", phoneNumber: "867-5309" }) }
- Query that contact's current info:
  { ContactReadModel(id: "90125") { id firstName lastName address phoneNumber } }
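If everything went well, that last query would return a response along these lines (assuming the two mutations above ran first; the exact envelope depends on your GraphQL client):

{
  "data": {
    "ContactReadModel": {
      "id": "90125",
      "firstName": "John",
      "lastName": "Doe",
      "address": "2120 South Michigan Avenue, Chicago, IL",
      "phoneNumber": "867-5309"
    }
  }
}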
You can check out the complete code on GitHub and try it out yourself!
Conclusions
Event-sourcing gives us a way to model data in information systems that's different from the CRUD way of thinking many software developers are accustomed to. It offers benefits in terms of scalability, resilience, and decoupling, and it preserves information that would otherwise be lost in systems that only store the latest state.
Booster provides an easy way to harness the benefits of event-sourcing by way of an opinionated framework for developing applications with TypeScript and cloud infrastructure that's inferred from the code.
I encourage you all to try out Booster and model your systems around events. Learn more by visiting Booster's website and GitHub repo, or join the conversation on the Booster Discord server!
Top comments (3)
Interesting. How do you actually save the events in AWS? DynamoDB, Keyspaces, RDS? I can't find where it's described in the Booster docs. There's a lot of useful stuff, but it's still a black box to me, essentially because it hides the infrastructure. Most companies have compliance requirements around that, so the fact that it's integrated within Booster is a drawback to me, not an advantage.
Hi Benjamin! The framework core interacts with the different cloud providers via provider-specific packages. In AWS's case, there are two: framework-provider-aws and framework-provider-aws-infrastructure.
The first one contains a series of adapters that implement a generic interface for the framework components (events, read models, and so on) using AWS resources (e.g., DynamoDB for storing events).
The second package is used to provision and configure (using the AWS CDK) the cloud resources required by the framework-provider-aws package. For a more in-depth look, please feel free to look at these packages on our GitHub.
While it may seem like a black box at first, the idea is to provide a common interface so that it's quick to understand how each function is implemented in each cloud provider. By understanding this common interface, you could implement your own provider specific to your compliance needs.
Anyhow, thanks for pointing out that it's not clear from the documentation. We'll take that into account!
Yeah, the intention for Booster is far from becoming a black box that hides what's inside. The idea is to offer a set of standards and abstractions that make development easier 99% of the time, but not at any cost!
One of the project's goals is that a team can either pick a pre-built infrastructure package or build their own, so they can work on infrastructure only once and focus on the business logic the rest of the time.
The default AWS implementation uses DynamoDB and solves many challenges out of the box like scalability, message ordering, or eventual consistency. This is perfect for people learning about event sourcing, early-stage startups, or organizations already in AWS. Still, as Mario mentioned, it's straightforward to build your own implementation if you need it. If you want to work on existing infrastructure, you just need to implement the ProviderLibrary interface to tell the framework how to store an event, how to read it, and a few more basic operations. Optionally, you can also use some infrastructure-as-code solution to provision the environments with Booster too. The framework doesn't make any assumptions on what's behind, so you can use any stack you want.
Indeed, there's a nice multi-cloud demo on our YouTube channel that leverages this architecture to deploy a single codebase to AWS, Azure, and Kubernetes by just providing a separate package for each: youtube.com/watch?v=MHw_.9tcqjz0
We should definitely work on that part of the documentation; we've been focusing on the user-level docs first, but it's becoming more and more important to talk about extensibility. Thanks a lot for pointing it out.