Tom Cools

Posted on • Originally published at tomcools.be

Event Stores: Saving the right kind of byte[]

A colleague of mine recently started exploring Event Sourcing.
While his focus is on .NET and mine is on Java, I was more than glad to help him out.
I pointed him to Event Store as a great place to start.
A couple of hours later he shared a link to his GitHub, proclaiming he "had done it".

The warm feeling you get when helping a colleague discover something new was rising up inside me... until it was taken away again when he dropped the following line on me.

It's a shame I have to manually serialize my objects to byte[]...

I immediately opened the Event Store website, only to discover that serializing your objects into byte[] seems to be the default way of saving events when you use the C# library. While there are other ways to create a byte[], there seems to be no guidance whatsoever on what you should be putting in your event store.

So... what bytes are a right fit for an event store? To make a decision, let's take a step back to one of the fundamental truths in our sector.

Change as a constant vs Event Sourcing

Perhaps one of the most powerful lessons I learned in school was this: "Change is the only constant".
Writing code is an iterative process where we learn and discover better ways of expressing ourselves. We refactor, we get new insights and sometimes so does the business we are working for.

Event Sourcing on the other hand is about writing and never changing events.
Most of the time, the way we save our events is referred to as an "immutable sequence of events". This means that if we save something, we have to keep dragging it along until the end of time. It cannot change.
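
To make "immutable sequence of events" a bit more concrete, here is a minimal in-memory sketch of an append-only store. The class and method names (AppendOnlyEventStore, append, read) are made up for this illustration and are not the API of Event Store or any other product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory store: events can be appended and read, never updated.
public class AppendOnlyEventStore {

    private final List<byte[]> events = new ArrayList<>();

    // Appends a copy of the event and returns its position in the sequence.
    public long append(byte[] event) {
        events.add(event.clone());
        return events.size() - 1;
    }

    // Returns a copy, so callers cannot alter what has been stored.
    public byte[] read(long position) {
        return events.get((int) position).clone();
    }

    public int size() {
        return events.size();
    }
}
```

There is deliberately no update or delete method: whatever byte[] you append is what every future version of your code will have to read.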

Immutable Event Store VS The fundamental truth of change

The Event Store and the concept of Change may seem like complete opposites.
What we can do is try to bring the two closer together by choosing an event structure which is flexible enough to allow for some change, without truly breaking the immutability of the Event Store.

Let's look at some of the options for creating our byte[] events while keeping change in mind.

Serializing JAVA/C# Objects

My colleague chose to serialize his C# objects. There are some obvious pitfalls to this choice.

Serialization is a very fragile thing. Simple changes like adding or renaming fields can already break deserialization of older events.
Each stored event is locked to the class version that wrote it, so you have to keep those old versions around and use upcasters to bring them up to the latest version.

Depending on how much you change your events, this may lead to an explosion of versions and an overall increase in complexity.

A developer unable to retrieve a serialized object from the Event Store

That being said, there is a bigger issue here. By serializing Objects, you are binding your events to a certain programming language.
This might be a conscious choice but it is not one that should be taken lightly as it will haunt you forever.
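
A small sketch of what that binding looks like on the Java side, using only the JDK's built-in serialization. The UserAddedEvent class is a hypothetical example; the point is that the resulting byte[] carries the Java stream header and the Java class name, which a .NET reader has no use for.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SerializedEventDemo {

    // Hypothetical event class, used only for this illustration.
    public static class UserAddedEvent implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        final String name;

        public UserAddedEvent(String userId, String name) {
            this.userId = userId;
            this.name = name;
        }
    }

    // Serializes the event with standard Java serialization.
    public static byte[] serialize(Serializable event) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(event);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // ISO_8859_1 maps every byte to exactly one char, so the search is lossless.
    public static boolean containsClassName(byte[] bytes, String className) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new UserAddedEvent("123456789", "Tom"));
        // The Java class name is embedded in the stored bytes.
        System.out.println(containsClassName(bytes, UserAddedEvent.class.getName()));
    }
}
```

Rename or move that class and every event already in the store still refers to the old name.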

JSON

To remove the dependency on a specific programming language we could use JSON. JSON has been a big success as a communication format in polyglot environments. We are at a point where a public API is expected to return data in a JSON format!
But how does it fare as a format for events in our store?

While we can now be polyglot when reading our events, it is far from perfect.
For example, if we discover a more meaningful name for a field, we still cannot change it.
Have a look at the following JSON structure:

{
  "event": "UserAddedEvent",
  "data": {
    "userid": "123456789",
    "name": "Tom",
    "lastname": "Cools"
  }
}

Through evolving our code, we notice that we have different kinds of users in our system, which each mean something different.
You want to be able to make the distinction between "Customers", "Employees" and maybe an "Administrator" type of user.
As adding employees is new for the system, the currently saved users are all "Customers".
What you actually want to do, is change the event to the following:

{
  "event": "CustomerAddedEvent",   <- Change event name
  "data": {
    "customerid": "123456789",  <- change field name
    "name": "Tom",
    "lastname": "Cools"
  }
}

Using JSON however, this will not work.

  • You can't change the events in the event store, so now you have two kinds of events.
  • Without making two readers for the different types, the reading of the event will fail, either hard with an exception, or soft because the resulting field is empty or null.

You will be forced to add a new "type" field to the data and your reader must be smart enough to set the type = Customer as a default.
This might not be a big deal for now, but remember, every version of an event must be handled. This means adding more processors or more upcasters to handle the differences.
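
As a rough idea of what such an upcaster could look like, here is a sketch in plain Java. It assumes the JSON has already been parsed into nested Maps by whatever JSON library you use; the class name UserEventUpcaster and the method upcast are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical upcaster: rewrites the old event shape into the new one at read time.
public class UserEventUpcaster {

    public static Map<String, Object> upcast(Map<String, Object> event) {
        if (!"UserAddedEvent".equals(event.get("event"))) {
            return event; // not the old version, nothing to do
        }

        @SuppressWarnings("unchecked")
        Map<String, Object> oldData = (Map<String, Object>) event.get("data");

        Map<String, Object> newData = new LinkedHashMap<>();
        newData.put("customerid", oldData.get("userid")); // renamed field
        newData.put("name", oldData.get("name"));
        newData.put("lastname", oldData.get("lastname"));
        newData.put("type", "Customer"); // default: all existing users are customers

        Map<String, Object> upcasted = new LinkedHashMap<>();
        upcasted.put("event", "CustomerAddedEvent"); // renamed event
        upcasted.put("data", newData);
        return upcasted;
    }
}
```

The stored event itself never changes; the mapping runs every time an old event is read. Each further rename means another branch like this one.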

Another downside is that JSON is, at its core, a String. This means you will need to escape special characters.
Those of you who have lost some hours with escape sequences should have felt a shiver down your spines reading the last paragraphs...

Data Serialization Libraries

Data Serialization Libraries are often mentioned in the context of Inter Process Communication. Think about the things you put on Kafka, RabbitMQ or some other message broker. It is commonplace to use a library to transform messages into byte[] to put them on the wire. So why not use those libraries to serialize and save our events?

There are some out there, like Avro and Thrift, but my personal favorite is the Google Protocol Buffers Library.
Using Protocol Buffers you declare a schema which is used to generate (de)serialization code for a specific programming language.

The binary format is the same for every language, so you could serialize data using the Java library and deserialize that data again with the .NET library.

syntax = "proto2"; // the "required" label only exists in proto2

message CustomerAddedEvent {
  required string customerId = 1;
  required string name = 2;
  optional string lastname = 3;
}

Important to note is that a Protocol Buffer schema is index based: every field is identified by the number behind it (Protocol Buffers calls this the field number), not by its name. Because this is the case, there is no impact when changing a field name.
As long as the field index and type never change, you can change the field name as much as you want.
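
To show why the name does not matter, here is a hand-rolled sketch of how a single string field ends up on the wire, following the Protocol Buffers encoding rules for length-delimited fields. This is not the generated code or the official library, just a few lines of plain Java; WireFormatSketch and its methods are names I made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: in a real project the generated classes do this for you.
public class WireFormatSketch {

    // Wire type 2 is used for length-delimited values such as strings.
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    // Writes an int as a base-128 varint, least significant group first.
    public static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes one string field: key (field number + wire type), length, UTF-8 bytes.
    public static byte[] encodeStringField(int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

Calling encodeStringField(2, "Tom") produces the key byte 0x12, the length 3 and the three letters. The word "name" appears nowhere in the output, so renaming the field in the schema leaves every stored event readable.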

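Adding and removing fields can be done without problems as well, provided new fields are declared optional. A newly added required field is missing from every event already in the store, so parsing those events would fail.
When you add a new field that didn't exist before, you do need to program a sensible default yourself.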
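
A sketch of what "a sensible default" can mean in practice. The tiny decoder below is not the Protocol Buffers library; it only handles string fields, and it assumes field numbers below 16 and values shorter than 128 bytes so that key and length each fit in one byte. Field number 4 for a user type is a hypothetical addition to the schema.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a simplified reader for messages that contain just string fields.
public class FieldDefaultSketch {

    // Hypothetical field number for a "type" field added after the first events were stored.
    static final int TYPE_FIELD = 4;

    public static Map<Integer, String> decodeStringFields(byte[] bytes) {
        Map<Integer, String> fields = new HashMap<>();
        int pos = 0;
        while (pos < bytes.length) {
            int key = bytes[pos++] & 0xFF;    // single-byte key (assumption, see above)
            int fieldNumber = key >>> 3;      // the low 3 bits hold the wire type
            int length = bytes[pos++] & 0xFF; // single-byte length (assumption, see above)
            fields.put(fieldNumber, new String(bytes, pos, length, StandardCharsets.UTF_8));
            pos += length;
        }
        return fields;
    }

    // Old events never wrote the type field, so the reader supplies the default.
    public static String userType(byte[] event) {
        return decodeStringFields(event).getOrDefault(TYPE_FIELD, "Customer");
    }
}
```

With the generated proto2 classes you would typically check the has-method of the optional field instead, but the idea is the same: old events simply lack the field, and your code decides what that absence means.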

I have created an example project on GitHub (TomCools / protocol-buffers-example) where I use the Maven plugin to generate the Java classes out of the Protobuf schemas.

Be sure to have a look. For other programming languages, check out the official documentation.

So what do you choose?

Choosing the structure in which you save your events is probably the most significant decision you have to make when starting with Event Sourcing.
If you worry about an ever-changing world, you may want to combine the principles of an immutable store with a little bit of flexibility.

Here is a table listing the options I described above and which changes are possible when using each one without adding an extra mapping layer.

Example Serialized Object JSON Protocol Buffers
Polyglot
Remove field from event
Add field to event
Change field name
Human Readable

I have a tendency to forgo the human readability in favor of the flexibility and speed of Protocol Buffers.
If you are interested in performance metrics, the people at Auth0 have written a great blog post on this.

For now, I will be sticking with Protocol Buffers as the format for my events. Whatever you choose to use, be sure you know the consequences.
Choosing an event format is a day 1 decision, so whatever you do, keep in mind:

A bad choice now will haunt you for a long, long time.


Don't agree or have another idea? Please leave your remarks in the comments!
