Since learning about CQRS, it’s something I’ve taken into almost every new data-based microservice I build. Separating how data is created from how it’s retrieved gives you a lot of power.
Take a relational database, where a table has a related table, which in turn has another two relationships.
public class TopLevelObject
{
    public int Id { get; private set; } // key referenced by MyChildObject.TopLevelObjectId
    public string Name { get; private set; }
    public MyChildObject ChildObject { get; private set; }
}
public class MyChildObject
{
    public int Id { get; set; }
    public string AccessCode { get; set; }
    public int TopLevelObjectId { get; set; }
    public virtual TopLevelObject TopLevelObject { get; set; }
    public RelatedObject RelatedObject { get; set; }
    public Category Category { get; set; }
}
public class RelatedObject
{
    public int Id { get; set; }
    public string ContentDescription { get; set; }
    public string ContentType { get; set; }
}
public class Category
{
    public int Id { get; set; }
    public string Description { get; set; }
}
Take the extremely trivial example above. If somebody wants to retrieve a specific top-level object and the category description, a direct Entity Framework query would give some rather messy JSON.
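To illustrate, here is a sketch of what that query can look like. It assumes an EF Core `DbContext` (here called `context`) exposing a `TopLevelObjects` set, with `name` being the value searched for:

```csharp
// Eager-loading the whole graph just to reach one string three levels down.
var result = await context.TopLevelObjects
    .Include(t => t.ChildObject)
        .ThenInclude(c => c.RelatedObject)
    .Include(t => t.ChildObject)
        .ThenInclude(c => c.Category)
    .FirstOrDefaultAsync(t => t.Name == name);

// Serialising `result` as-is produces JSON nested three levels deep
// (TopLevelObject -> ChildObject -> RelatedObject / Category), when the
// caller only wanted the name and the category description.
```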
Having a completely separate data access method for querying data avoids this.
public class TopLevelObjectDTO
{
    public string Name { get; private set; }
    public string AccessCode { get; private set; }
    public string ContentDescription { get; private set; }
    public string ContentType { get; private set; }
    public string CategoryDescription { get; private set; }
}
Using a direct SQL query through a micro-ORM like Dapper allows a much simpler response model, as detailed above.
Of course, this would be possible using EF. But once a complex object model is created, the LINQ queries can get messy and perform poorly.
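For contrast, the query side with Dapper might look something like this. It is a sketch only: the table and column names are assumptions (EF's default pluralised naming and conventional foreign-key columns), and `connection` is an open `IDbConnection`:

```csharp
// One SQL join, projected straight into the flat DTO.
// Dapper's QuerySingleOrDefault<T> maps result columns to properties by name.
const string sql = @"
    SELECT t.Name,
           c.AccessCode,
           r.ContentDescription,
           r.ContentType,
           cat.Description AS CategoryDescription
    FROM   TopLevelObjects t
    JOIN   MyChildObjects  c   ON c.TopLevelObjectId = t.Id
    JOIN   RelatedObjects  r   ON r.Id  = c.RelatedObjectId
    JOIN   Categories      cat ON cat.Id = c.CategoryId
    WHERE  t.Name = @Name;";

var dto = connection.QuerySingleOrDefault<TopLevelObjectDTO>(sql, new { Name = name });
```

The query reads exactly what the caller asked for and nothing more; there is no object graph to prune afterwards.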
C Microservice & Q Microservice
There’s an idea I’ve been toying with recently around taking CQRS to an extreme with Microservices.
I often find myself writing an API that would see huge improvements from running multiple instances. Whilst this is entirely possible with containers, things get kind of messy when it comes to data access.
I’ve always been nervous about multiple services having the ability to manipulate my databases. I’ve been stung by race conditions in the past.
However, having 10 workers that allow people to read the data? Knock yourselves out, guys. Query all the data you want.
Breaking the rules
For as long as I have been working with microservices, the hard-and-fast rule has been one DB per service. Multiple services should NOT share the same database.
But what about having two services that share a database, one being a Command service and one being a Query service?
That way, 100,000 instances of the query service could run (imagining a world in which a DB could handle that many connections), each with its own view model.
Next to that, a more controlled data manipulation service could run at the same time.
In Practice
This is purely at a conceptual level at the moment, so I apologise right now if I’ve missed something or if I’m re-inventing the wheel.
How I actually envisage this working, at a functional level, is that each of the 100,000 read services would hold its own data cache.
When a new request comes in, instance X first checks its own internal knowledge of the data and returns that if found.
If no query results are found, the database is then queried directly.
If a result is found in the DB, that is then stored in the local cache for next time and the response is returned.
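The read path described above is essentially the cache-aside pattern. A minimal, self-contained sketch follows; the in-memory dictionary and the loader delegate are stand-ins for the real local cache and database call:

```csharp
using System;
using System.Collections.Concurrent;

// Cache-aside: check the local cache first, fall back to the database,
// then store the result locally for next time.
public class QueryServiceCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _cache = new();
    private readonly Func<TKey, TValue?> _queryDatabase; // stand-in for the real DB call

    public QueryServiceCache(Func<TKey, TValue?> queryDatabase) =>
        _queryDatabase = queryDatabase;

    public TValue? Get(TKey key)
    {
        // 1. Instance X first checks its own internal knowledge of the data.
        if (_cache.TryGetValue(key, out var cached))
            return cached;

        // 2. If no cached result is found, the database is queried directly.
        var fromDb = _queryDatabase(key);

        // 3. If a result is found in the DB, store it locally for next time.
        if (fromDb is not null)
            _cache[key] = fromDb;

        return fromDb;
    }
}
```

Note that once a key is cached, this sketch never goes back to the database for it, which is exactly where the consistency questions start.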
At a command level, there is simply one service running that handles all data manipulation.
I’d love to hear your thoughts, including if that thought is ‘you’ve been building microservices wrong your whole life’.
Top comments (8)
Recently @kspeakman noted "The problem with learning about [CQRS] is that it has been conglomerated with a lot of other things over time"
I think this is a case of that. :) Considering CQRS on a conceptual level, as you noted, provides a lot of power. A nice separation of concerns in the conceptual model! There are, though, a lot of different logical models that support CQRS.
I think what you are describing is implementing the query side using an eventually consistent caching proxy. Importantly, tho, you describe how it supports CQRS. :)
Consider this about the consistency: What happens if the data underlying the resource changes? I find that if I considered CQRS starting from the conceptual this has a much easier answer than not.
Eventual consistency, that's exactly the term I was looking for! I knew there was a name for it; I just couldn't think of it.
When you say the underlying data changes, do you mean from a structure perspective? Fields changing, etc.?
See also: jepsen.io/consistency/models/read-...
Imagine the following sequence of events:
there is a query "latest address of business X".
This query is performed. Which means: the query service received the request; it did not have it cached; it requested the data from the database; it stored the response and provided it to the requester;
the command "set current address of business X" is performed. This updates the database.
This query is performed again. Which means: ???
There are a number of options for ???. Which depend heavily on the application requirements. Does the query service provide the stored response again? That would be inconsistent with the data in database but maybe fine for requirements. When is the stored value in the query service expired? After a certain amount of time? Never? When an event is received from the database?
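One of the options above, expiring the stored value after a certain amount of time, can be sketched as a timestamp check on each read. This is a simplification; the TTL length and the clock injection are illustrative choices:

```csharp
using System;
using System.Collections.Concurrent;

// Time-based expiry: a cached entry is served only while it is younger
// than the TTL; after that, the next read falls through to the database.
public class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime StoredAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now; // injectable clock, for testing

    public TtlCache(TimeSpan ttl, Func<DateTime>? now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_entries.TryGetValue(key, out var entry) && _now() - entry.StoredAt < _ttl)
        {
            value = entry.Value;
            return true;   // still fresh: serve the stored response
        }
        value = default;
        return false;      // expired or missing: the caller queries the database
    }

    public void Set(TKey key, TValue value) => _entries[key] = (value, _now());
}
```

Whether stale-but-fast reads are acceptable during that TTL window is exactly the application-requirements question raised above.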
I've found that when considering CQRS from the start the consistency requirements are easier to determine: There is no assumed consistency between query and command. Which then directs the design to have a nice separation of concerns while still ensuring the consistency requirements.
This is different from traditional non-CQRS designs: Those often assume a transaction encompassing both reads and write requests. Which ensures consistency but avoids having to consider what's actually required for the application... Until you're sitting there wondering why hundreds of rows are being locked to +1 a number XD
I think in this case you should give up on consistency for the sake of availability. Of course this does not work in every system, but maybe this system does not need to know the exact updated address of the user (it is not a service trying to send you a letter, it just want to see your general profile to sell you a car).
In those systems, the cached data could be invalidated after the Command event was done.
But again, it all depends on what system we're working with.
Yeah, that's a great point.
I have two separate implementations at the moment that fit both models. One is a postcode lookup service, so the address that is found absolutely needs to be returned without fail (highly available).
Another implementation has a dataset that is used in numerous different places, but can be up to 15-30 minutes old without causing any issues (eventual consistency, looser coupling).
One DB per service means one logical service. One logical service might run many copies (if stateless) or shards (if stateful/actors) of the same service for scale or availability. Likewise the DB means one logical DB which can be scaled with shards or replicas.
It is not exactly clear where CQRS fits... whether the Command and Query sides are just parts of the same logical service. Or whether they are different logical services. I think you could make a case and implement it either way. In our latest venture, we could probably consider them separate services with their own individual scaling knobs, since they do use separate databases, and a backend service translates from the command-side database (an event store in this case) to the query-side.
It really all depends on what you need for your use case.
Interesting point on the logical service Kasey. So one logical service could well be a combination of two separate read/write services?
In the use case I have in mind, it sounds the same as yours. Having two different scaling knobs for reading vs writing is extremely useful. The volume of data read is MUCH higher than the data written.
So a separate service handling that seems exactly right.
In most services I've written prior to this, I've still kept a clear distinction between C and Q. But rather than having completely different knobs, they run under one knob.
Interesting comments though, thanks for your input.
Sure, if we are talking about stateless services. The database is probably going to require scaling before the compute resources do. When you do need to scale compute due to read load, it probably doesn't cost that much memory to also carry command handling code along for the ride even if it is mostly dormant. And you still retain the option to split the command and query sides off into different services if the need arises.