Help me Devs! I work on a web app which deals with a lot of data that moves really fast. Our primary database (MySQL) simply cannot keep up, so we store a lot of “current state” data in Redis.
Today we (de)serialize the objects with JSON and store the whole object. That leads to several issues, so we’re going to switch to storing each field individually.
The code for managing all these fields is going to get icky fast, so I'm thinking about building a general-purpose ORM for Redis. Some of these already exist but don't have all the features I need (e.g. transactions with pipelines, or locks).
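To make the "store each field individually" idea concrete, here's roughly the shape of it in Python with redis-py (the key names and fields are made up; this is just a sketch, not our actual code):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store each field of an object individually in a hash instead of one JSON blob.
r.hset("vehicle:42", mapping={"lat": 40.7128, "lon": -74.0060, "speed": 31.5})

# Update a single field without rewriting the whole object.
r.hset("vehicle:42", "speed", 28.0)

# Group related writes atomically with a pipeline-based transaction.
with r.pipeline(transaction=True) as pipe:
    pipe.hset("vehicle:42", "speed", 0)
    pipe.hset("vehicle:42", "status", "stopped")
    pipe.execute()

# Guard a read-modify-write cycle with a lock.
with r.lock("lock:vehicle:42", timeout=5):
    speed = float(r.hget("vehicle:42", "speed") or 0)
    r.hset("vehicle:42", "speed", speed + 1)
```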
Is this a horrible idea? Am I just building a product that already exists elsewhere that I just don’t know about?
Thanks for your response!
EDIT: There have been a few responses suggesting that the problem has to do with JSON. To clarify, when we were storing this sort of data in MySQL it was normalized, not JSON. Using JSON is only a means to persist objects to Redis.
Top comments (31)
I feel like Redis is a delightfully flexible tool and it’s in its nature to support all sorts of use cases.
In spirit I feel like ORMs are a perfectly nice fit.
A quick search seems to indicate that others have tried and succeeded at this.
"...Our primary database (MySQL) simply cannot keep up..." MySQL runs some of the biggest data sets in the world. Find a good DBA. Clustering, federation, caching.
Using Redis as a caching layer is a great idea, but as a persistent storage medium, I would not recommend it. Sure, it can save to disk, has replication, complex object types, etc. But Redis is, by its nature, a cache, not persistent storage.
Right tool for the right job.
Thanks for the suggestion! I'll clarify the use case just a bit. The problem is not with read performance from MySQL, but with write performance to it. The data we're storing here is transient by its very nature, with each record changing several times per minute. It's also not data that we'd care much about losing (we persist as much as we need to in a down-sampled fashion to other storage mediums). So even if we could tune MySQL to accept writes fast enough (which would be expensive if we need to hire a new full time employee to figure it out!) we don't need most of the features it gives us.
You can save yourself the huge pain of maintaining data consistency between two different data stores. Take a look at the Apache Ignite data grid. Using it you can get several benefits:
It also provides additional services like queues, pub-sub, distributed computing, a built-in file system, and many other useful things.
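As a rough idea of what the client side looks like, assuming the pyignite thin client and a local node listening on the default thin-client port (the cache and key names are invented):

```python
from pyignite import Client

# Connect to a local Ignite node via the thin-client protocol (default port 10800).
client = Client()
client.connect("127.0.0.1", 10800)

# Caches behave like distributed key-value maps.
cache = client.get_or_create_cache("current_state")
cache.put("vehicle:42:speed", "28.0")
print(cache.get("vehicle:42:speed"))

client.close()
```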
Very cool, I've never looked at that before! I'll certainly consider it as an option.
There are three ways:
I recommend the third one.
Thank you! I clarified a bit in an edit but JSON was not the goal, just a means to store data in Redis. In MySQL we had normal tables with columns. I hadn't been considering a NoSQL database before, but I'm certainly considering it now!
Oooh, now I understand. Writes on normalized tables (with index rewrites and such) can be slow. I'm not 100% up to date on MySQL (I mainly use PostgreSQL), but I remember there's the possibility of having a JSON column for unstructured data. Maybe you can test that before moving to a separate NoSQL DB.
I would test a JSON column, then if it's still too slow I would consider a separate DB. Maintaining one DB is still better than multiple DBs :D
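Something along these lines (a rough sketch with mysql-connector-python; the table, columns, and credentials are invented, and it assumes MySQL 5.7+ for the JSON type):

```python
import mysql.connector

# Hypothetical connection and table, just to show the shape of a JSON-column test.
conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS vehicle_state (
        id INT PRIMARY KEY,
        state JSON NOT NULL
    )
""")

# Upsert the whole document...
cur.execute(
    "INSERT INTO vehicle_state (id, state) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE state = VALUES(state)",
    (42, '{"lat": 40.7128, "lon": -74.006, "speed": 31.5}'),
)

# ...or patch a single field in place with JSON_SET.
cur.execute(
    "UPDATE vehicle_state SET state = JSON_SET(state, '$.speed', %s) WHERE id = %s",
    (28.0, 42),
)

conn.commit()
cur.close()
conn.close()
```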
Let us know about your findings, I'm very curious!
It sounds a lot like you're trying to shoehorn non-relational data into a relational system.
You lose all of the performance support of a relational system when your primary data collection is JSON. It's difficult to index and more complex for MySQL to store.
This is more of an issue of using a hammer like you would a screwdriver: you're simply not using the tool correctly.
Either maintain your data as JSON documents and move to a document-storage NoSQL solution like Mongo (easier), or refactor your data management into a relational model to work with MySQL (harder).
Oops, I clearly made a mistake in describing the issue! I clarified a bit above. When we were trying to use MySQL for this, the data was not JSON. You do bring up a good point though in that we don't need the power of a relational database (e.g. joins) for this sort of data. I have been looking at DynamoDB to see if that will give us what we need. Though maybe Mongo would have better tooling.
Reading over some of your other comments implies that it's only a subset of your data that has this problem, not the entire dataset.
In that case, mixing storage engines is probably the preferred solution. If this "transient data" is short-lived between periods of long persistence, then Redis makes a lot of sense.
Did you introduce Redis because you were having a performance issue specifically with JSON blobs being stored in MySQL? If that's the case I would look back before looking forward: if your data can be expanded into relations and indexed you might not need a second database at all (and even in MySQL you can index generated columns sourced from JSON, should you need to).
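For the indexing point, a hedged sketch of what I mean (MySQL 5.7+; the table and column names are invented):

```python
import mysql.connector

# Sketch: expose a JSON attribute as a generated column and index it.
conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
cur = conn.cursor()

cur.execute("""
    ALTER TABLE vehicle_state
        ADD COLUMN speed DOUBLE
            GENERATED ALWAYS AS (JSON_EXTRACT(state, '$.speed')) STORED,
        ADD INDEX idx_speed (speed)
""")

# The index now serves queries that filter on the extracted value.
cur.execute("SELECT id FROM vehicle_state WHERE speed > %s", (30,))
print(cur.fetchall())

cur.close()
conn.close()
```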
Outside that, you build the tooling you need! If someone else has gotten 80 or 90% of the way there, maybe don't bother starting from scratch. And since Redis is a key-value store rather than a relational database it won't, strictly speaking, be an O/RM -- but that's a good thing.
I clarified a bit in an edit since I clearly implied exactly what you were saying: JSON in MySQL. That isn't what we were doing, but that would have definitely been a problem! You're certainly right in that if we go forward with better Redis tooling we'll look at forking / expanding an existing package before writing one from scratch. It just depends on how much rewriting we'd have to do.
I have worked on a project that used PostgreSQL. It was just amazing: fast, SQL syntax, and we could also query directly into JSON fields. Maybe it will help to have a look.
You raise a good point here; MySQL has also supported a native JSON type since 5.7.8, and if you're using MySQL 8.* the support for JSON is even more robust.
I'd certainly consider a separate data store (if you want to isolate this data from your primary stores) in either PostgreSQL or MySQL with native JSON support unless there is some particular business case that rules these out (for the OP).
Our primary database is pretty much stuck on MySQL 5.6 for the foreseeable future due to boring reasons. You're right though in that I don't need this data in the same database (clearly, because it's not) so I could stand up a new MySQL 8 or PostgreSQL instance. I've been wanting to play with both of those anyway.
Thanks for the suggestion!
That's also a good option. If the primary DB is frozen in time and this data is mostly written and rarely read, you could even stand up a separate MySQL instance on the newest version that contains only the table for this data and gets updated asynchronously (read "eventually consistent").
You'd reuse the same tooling and company expertise, and learn new things that might help you upgrade the main DB in the future if you ever choose to do so.
Adopting a completely new storage server is a big investment (in terms of money, devops, time and so on), so I'd prioritise finding a way to decrease that impact if it's technologically feasible.
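A very rough sketch of what that asynchronous, eventually consistent write path could look like (the host, table, and columns are all hypothetical):

```python
import queue
import threading
import mysql.connector

# Buffer fast-changing writes and flush them to a separate MySQL instance
# in the background, accepting eventual consistency.
writes = queue.Queue()

def writer():
    conn = mysql.connector.connect(host="state-db", user="app", password="secret", database="state")
    cur = conn.cursor()
    while True:
        vehicle_id, speed = writes.get()
        cur.execute(
            "INSERT INTO vehicle_speed (id, speed) VALUES (%s, %s) "
            "ON DUPLICATE KEY UPDATE speed = VALUES(speed)",
            (vehicle_id, speed),
        )
        conn.commit()
        writes.task_done()

threading.Thread(target=writer, daemon=True).start()

# The hot path just enqueues; the request never waits on the state DB.
writes.put((42, 28.0))

# Only for this sketch: wait until the background writer has flushed the item.
writes.join()
```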
If you decide to go with Redis, and the built-in data structures (sets, hashes, etc.) don't cover your requirements (and I do suggest you first try to make them fit; get into the "Redis mindset" rather than expecting Redis to fit your code as it is), you may want to look into RedisJSON: an official Redis module for storing and editing JSON natively, without (de)serialization. There are client libraries for several platforms.
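For reference, a small sketch of what that looks like from Python, assuming a redis-py 4+ client and a Redis server with the RedisJSON module loaded (the key and fields are invented):

```python
import redis

# Requires a Redis server with the RedisJSON module loaded (e.g. Redis Stack).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store a document natively as JSON, no manual (de)serialization.
r.json().set("vehicle:42", "$", {"lat": 40.7128, "lon": -74.006, "speed": 31.5})

# Update a single field in place on the server.
r.json().set("vehicle:42", "$.speed", 28.0)

# Read back just the field you need.
print(r.json().get("vehicle:42", "$.speed"))
```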
Boy would I love to. Unfortunately we're using ElastiCache and it isn't supported :(. Good suggestion though!
You can store your data in Redis, no problem. I don't know what language you're using (I guess Python), but in Ruby there are several Redis ORMs. Maybe you can take a look and get some ideas:
I don't think Redis is only intended for caching, it can be used to calculate distances between coordinates, do unions/intersections between groups, create scoreboards, etc. There are background job processors like Sidekiq that also use it to store and manage their jobs in a persistent way.
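A few of those use cases sketched with redis-py (assuming redis-py 4+; the keys and members are invented):

```python
import redis

r = redis.Redis(decode_responses=True)

# Geo: store coordinates and compute distances server-side.
r.geoadd("cities", [13.361389, 38.115556, "Palermo", 15.087269, 37.502669, "Catania"])
print(r.geodist("cities", "Palermo", "Catania", unit="km"))

# Sets: unions/intersections between groups.
r.sadd("online", "alice", "bob")
r.sadd("premium", "bob", "carol")
print(r.sinter("online", "premium"))

# Sorted sets: a scoreboard ordered by score.
r.zadd("scores", {"alice": 120, "bob": 95, "carol": 180})
print(r.zrevrange("scores", 0, 2, withscores=True))
```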
You might want to give Cassandra a try, since it sounds like you need a non-relational database instead of MySQL.
Thanks! I’ve never looked at that one before