DEV Community

Developer deprogramming: Getting started in Event Sourcing

Barry O Sullivan on November 20, 2017

There are two things I wish I knew when I started building Event Sourced Apps. Always talk to the domain expert before building or designing anyt...


Kasey Speakman • Nov 20 '17 • Edited

I really like your example. This resonates with my experience as well.

I have observed at least 2 main reasons for data that is traditionally CRUD.

1. For humans
   - codes/names for identification - Because humans more easily recognize words than arbitrary numbers (IDs).
   - descriptions/notes/status text for understanding context and history
2. For software
   - configuration affecting run-time operation - E.g. One course requires a grade of 70 to pass. Another course, 60.

These are really two orthogonal dimensions of a given concept, but are typically packaged together for convenience.

Side tangent: I find that almost invariably, the human side of the data requires some manner of set-based constraints... the kind typically found in relational databases. Example: Humans get really confused if there are 2 MATH101 in the course catalog. Or if PASCAL100 (a deactivated course) shows up in the catalog. So some parts of the data must be indexed for human consumption. While practicing event sourcing, I've often thought I could use a "database" product that allows me to directly update different indexes without requiring them to be attached to a table. I guess it'd be a key-value store, but one that allowed me to choose the key index algorithm (hash, B+ tree, reverse, etc.) on each collection. If you know of a database product that does this, let me know. For now, I use relational tables and attach indexes to them. It's less efficient (2 copies of key data, 2 places to update, relational engine overhead), but it's already a solved problem.
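To make the "relational tables with indexes attached" approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made up for illustration; they are not from the article or from any particular read-model library.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create a hypothetical read-model table for the course catalog."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE course_catalog (
               course_id TEXT PRIMARY KEY,
               code      TEXT NOT NULL,
               active    INTEGER NOT NULL DEFAULT 1
           )"""
    )
    # Set-based constraint: no two catalog rows may share a course code.
    conn.execute("CREATE UNIQUE INDEX ux_course_code ON course_catalog(code)")
    return conn


def add_course(conn, course_id, code):
    """Return True if the course was added, False if the code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO course_catalog (course_id, code) VALUES (?, ?)",
                (course_id, code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def deactivate_course(conn, code):
    """Mark a course inactive so it no longer shows up in the catalog."""
    with conn:
        conn.execute("UPDATE course_catalog SET active = 0 WHERE code = ?", (code,))


def visible_codes(conn):
    """List the course codes a human should see in the catalog."""
    rows = conn.execute(
        "SELECT code FROM course_catalog WHERE active = 1 ORDER BY code"
    )
    return [row[0] for row in rows]
```

In this sketch the unique index rejects a second MATH101, and the `active` flag keeps a deactivated PASCAL100 out of the listing. The events would still be the record of what happened; the table only exists to serve the human-facing lookups.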

K • Nov 20 '17

How do event stores handle binary data updates efficiently?

Kasey Speakman • Nov 20 '17 • Edited

A good and proper DDD / ES person would further ask the question: What is the scenario? (What problem are you trying to solve?) Because there might be a better alternative.

In an event store, you don't really update data in place if you can help it. You just keep adding new events. Depending on the size of the binary, I might do the traditional thing where you save it to file storage like S3 and then add an event when the file link changes.

Side note: The product EventStore stores the event data as binary. Meaning I actually have to serialize to string and convert to bytes before calling the method to save the event. I think internally it uses ProtoBuf to store events, which I read (but have not browsed the code to verify) handles binary data pretty efficiently.
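The "serialize to string and convert to bytes" step could look roughly like the sketch below. It uses only the standard json module; the envelope shape and the function names are assumptions for illustration, and the actual call that hands the bytes to the event store client is deliberately left out.

```python
import json


def to_event_bytes(event_type, data):
    """Serialize an event to a JSON string, then encode it as UTF-8 bytes."""
    payload = {"type": event_type, "data": data}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def from_event_bytes(raw):
    """Reverse of to_event_bytes: decode the bytes and parse the JSON."""
    payload = json.loads(raw.decode("utf-8"))
    return payload["type"], payload["data"]
```

The event name used with these helpers (for example a hypothetical "CourseAdded") is up to the domain; the point is only that the store sees opaque bytes.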

K • Nov 20 '17

The use-case was to allow versioning of files (pdf, avi, mpg, mp3, wav, doc, xml, etc.)

Barry O Sullivan • Nov 21 '17 • Edited

As Kasey suggested, best practice there would be to store the file remotely on S3, and then have an event that references the remote file. This allows you to version binary files, as you just upload a new file and create a new event.

E.g.

# File A, Version 1
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.file

# File A, Version 2
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.other-file

file_id is the reference in your system, whereas reference is the remote instance of that version.
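As a rough sketch of how the versions could be read back from those events: the FileUploaded fields match the example above, while the helper functions are hypothetical names, and the events are assumed to arrive in the order they were stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileUploaded:
    file_id: str    # stable identifier of the file inside your system
    reference: str  # remote location of this particular version


def versions_of(events, file_id):
    """Return the references for file_id in upload order (index 0 is version 1)."""
    return [e.reference for e in events if e.file_id == file_id]


def current_reference(events, file_id):
    """Return the latest uploaded reference, or None if the file is unknown."""
    refs = versions_of(events, file_id)
    return refs[-1] if refs else None
```

Nothing is overwritten here: the older reference stays in the log, so any earlier version can still be looked up.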

K • Nov 21 '17

I see.

Doesn't this work against the "one source of truth" philosophy?

Barry O Sullivan • Nov 21 '17

Yeah, technically. It's just a philosophy though, so there are times when it's not the appropriate philosophy.

In this case it really comes down to practicality. Is it useful to store the full binary file in the event log? Does that give any value? If the answer is no, then there's no point in saving the file in the log, just store a reference.

Kasey Speakman • Nov 21 '17 • Edited

You can have different sources of truth for each "domain" or specialty. The source of truth for files is the file system... if it is gone from there, it doesn't matter what the database says. :)

It may still be important to (event-sourced) areas of the business to record that something changed and perhaps trigger a further action. You can set this up in a number of ways. The client could issue an API call after successfully saving the file (this is request-driven, probably not the way I would go). Or you could set up event notifications on file operations (this is event-driven) -- S3 supports this, or just use a file watcher for local apps.

At this point, this is really integration between two systems and no longer event-sourcing. Instead it is Event-Driven Architecture. Event sourcing really only applies inside individual domains, not across different systems. This is probably why you already had an inkling that event-sourcing would not solve the file management problem. By itself, it won't.
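For the event-driven option, a handler on the receiving side might translate a storage notification into an event for the domain. The sketch below assumes the usual S3 notification layout (a "Records" list with "eventName" and "s3" entries, and URL-encoded object keys); the "FileStored" event name and the output shape are made up for illustration.

```python
from urllib.parse import unquote_plus


def file_events_from_s3_notification(notification):
    """Map object-created records to hypothetical FileStored events."""
    events = []
    for record in notification.get("Records", []):
        # Only react to newly written objects; ignore deletes and other events.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        events.append(
            {
                "type": "FileStored",
                "bucket": s3["bucket"]["name"],
                # Keys arrive URL-encoded in notifications, so decode them.
                "key": unquote_plus(s3["object"]["key"]),
            }
        )
    return events
```

What happens with the resulting events (appending to a stream, triggering a workflow) is the integration part and depends on the system receiving them.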

norpan • Nov 23 '18

We have an audit requirement that it should not be possible to change data without a trace, so storing the file separately was a problem for us. Then we realized that we just have to store the file's SHA-256 hash; that way we can check whether the file is the right one. So we get the best of both worlds.
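A minimal sketch of that check, using Python's standard hashlib. The function names and the idea of passing in the recorded hash as a hex string are assumptions for illustration, not a description of the actual system.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_sha256):
    """True if the file on disk still has the hash that was recorded for it."""
    return sha256_of(path) == recorded_sha256
```

If the stored file is altered after the hash was recorded, the comparison fails, which is what gives the audit trail its teeth.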

SMART2016 • Apr 6 '20

Hi Barry,

I am having a nightmare with a simple question. I have a CRUD application similar to your example, where we really just create, update, and delete a domain entity called "Ruleset".
We are simply trying to store each request (each CUD operation) in the DB; each CUD operation on an entity is stored as a separate record in the datastore.

As I see it, the Ruleset entity has two very simple states, "SUCCESS" or "FAILURE", for each of the CUD commands, so the datastore will have an entry with the state SUCCESS or FAILURE for each CUD operation on the entity.

Is that the correct way to design the application of CUD on the Ruleset entity model?
If this is not the way, how else can I design the CRUD as ES?

For example, in your use case of commands:

AddCourse --> What if AddCourse fails due to some business validations? Will you not store a state with the Course entity in the persistent store, saying SUCCESS / FAILED, with the command as AddCourse?

ChangeCourse --> What if ChangeCourse fails due to some business validations? Will you not store a state with the Course entity in the persistent store, saying SUCCESS / FAILED, with the command as ChangeCourse?

RemoveCourse --> What if RemoveCourse fails due to some business validations? Will you not store a state with the Course entity in the persistent store, saying SUCCESS / FAILED, with the command as RemoveCourse?

If we store things in the above manner, it will be helpful to retrieve and identify when commands failed and for what reason, won't it?

And then will the SUCCESS / FAILED state not be associated with the Course entity?

Regards
Dipanjan

Kevin Chaves • Jul 19 '18

Good article, man! I find gathering requirements really difficult, and you gave me a great example of how to do it! Keep up the good work (: