DEV Community

Ali Alp

Why Running Databases on Kubernetes is a Recipe for Disaster: The Case for a New Platform Designed for Stateful Workloads

Kubernetes has become a powerhouse for managing containerized applications, especially stateless workloads, because of its scalability and automation. Running databases on it is a different matter: they are critical, stateful systems, and things get much more complicated. Kubernetes has improved, but significant challenges remain around storage, failover, and replication. These issues suggest that we may need a new platform built specifically for stateful workloads like databases, rather than trying to make Kubernetes do something it wasn’t originally designed for.

Let’s explore why running databases on Kubernetes remains risky and why we might need a platform tailored to handle databases' unique demands.

1. CSI Crashes and Storage Attachments: Still a Risk

The Issue: The Container Storage Interface (CSI) is the standard interface through which storage drivers provision, attach, and mount volumes for Kubernetes. CSI drivers have gotten better over time, but they can still fail, and these failures can cause data loss or corruption in databases.

Why It’s a Problem: Databases rely on constant, stable access to storage. If a CSI driver crashes while storage is being reattached (say, after a node failure or pod eviction), the database might lose data or get corrupted.
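To make the reattachment risk concrete, here is a minimal sketch of the single-writer rule a storage layer has to enforce when a volume moves between nodes. It is a toy model, not how any particular CSI driver works: `Volume`, `AttachError`, `attach`, `detach`, and `move` are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical model of a single-writer network volume."""
    name: str
    attached_to: Optional[str] = None  # node currently holding the volume


class AttachError(Exception):
    """Raised when an attach would create a second writer."""


def attach(volume: Volume, node: str) -> None:
    # Refuse to attach while another node still holds the volume.
    if volume.attached_to is not None and volume.attached_to != node:
        raise AttachError(
            f"{volume.name} is still attached to {volume.attached_to}"
        )
    volume.attached_to = node


def detach(volume: Volume, node: str) -> None:
    # Only the current holder can release the volume.
    if volume.attached_to == node:
        volume.attached_to = None


def move(volume: Volume, old_node: str, new_node: str) -> None:
    # The detach has to complete before the new attach is attempted.
    detach(volume, old_node)
    attach(volume, new_node)
```

If the process coordinating these steps dies between the detach and the attach, the database is left without storage; if it attaches before the detach is confirmed, two nodes may write to the same volume. That ordering is the fragile part.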

A Better Solution: A new platform specifically built for databases could provide better storage management, ensuring that databases always stay connected to their storage even during failures. This would reduce the risk of data loss or corruption significantly.

2. Immature Database Operators: Still Not Perfect

The Issue: Database operators in Kubernetes are responsible for automating tasks like setting up the database, handling backups, and managing failovers. While some operators are now quite robust, others are still maturing, and things can go wrong, especially in complex scenarios like failovers.
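For readers new to operators: at their core they run a reconcile loop that compares the desired state with what is actually running and then acts on the difference. The sketch below shows only that comparison step in Python. `ClusterState` and `reconcile` are hypothetical names for this illustration and do not belong to any real operator's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, heavily simplified view of a database cluster."""
    replicas: int
    version: str


def reconcile(desired: ClusterState, observed: ClusterState) -> List[str]:
    """Return the ordered actions needed to move observed toward desired."""
    actions: List[str] = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    if observed.replicas < desired.replicas:
        actions.append(f"add {desired.replicas - observed.replicas} replica(s)")
    elif observed.replicas > desired.replicas:
        actions.append(f"remove {observed.replicas - desired.replicas} replica(s)")
    return actions
```

The comparison is the easy part. The hard part, and where less mature operators tend to struggle, is carrying out each action safely while the cluster is mid-failover or mid-upgrade.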

Why It’s a Problem: Even with improved operators, there’s still the risk of errors, especially during critical moments like failovers or upgrades. This could lead to data inconsistencies or even corruption, which is unacceptable for production databases.

A Better Solution: A platform built specifically for databases could come with native tools that handle these tasks reliably, without the need for third-party operators. This would simplify database management and reduce the risks of running critical operations.

3. Data Loss and Corruption: Risks from Pod Evictions, Node Failures, and Network Issues

The Issue: Kubernetes isn’t great at handling stateful applications when things go wrong, such as when nodes fail, pods get evicted, or network issues occur. These events can disrupt databases and cause data loss or corruption if they aren’t carefully managed.

Why It’s a Problem: Without careful tuning, databases running on Kubernetes can face serious risks from these kinds of disruptions. For example, network partitions can cause a "split-brain" scenario, where two database replicas think they are the primary, leading to conflicting data.
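One common guard against split-brain is a majority quorum: a node may act as primary only while it can reach more than half of the cluster, itself included. Here is a minimal sketch of that rule, with `may_act_as_primary` as a hypothetical helper name:

```python
def may_act_as_primary(reachable_members: int, cluster_size: int) -> bool:
    """True when this side of a partition sees a strict majority.

    reachable_members counts the node itself plus every peer it can reach.
    Two disjoint partitions cannot both hold a strict majority, so at most
    one side can keep (or elect) a primary.
    """
    return reachable_members > cluster_size // 2
```

In a three-node cluster split 2|1, only the side with two nodes qualifies. In a four-node cluster split 2|2, neither side does, which is why such clusters usually run with an odd number of voting members.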

A Better Solution: A dedicated platform for databases would handle these situations better by providing built-in mechanisms to ensure data consistency and prevent issues like split-brain from occurring in the first place.

4. Replica Lag and Network Bottlenecks: A Constant Struggle

The Issue: In distributed databases, replication is key to keeping data in sync across multiple instances. On Kubernetes, network congestion and I/O bottlenecks can lead to replication delays (also known as replica lag), which can cause major problems.

Why It’s a Problem: If the network gets too congested, replication may fall behind. If the primary database then fails, the replicas may not have the latest data, which could result in data loss or inconsistencies during a failover.
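To show how replica lag turns into a failover decision, the sketch below measures lag as the gap between the primary's log position and the position each replica has applied, then picks the least-lagged replica within a tolerance. The integer log positions, the `max_lag` threshold, and the names `Replica`, `lag`, and `pick_failover_candidate` are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    replayed_position: int  # last log position applied on this replica


def lag(primary_position: int, replica: Replica) -> int:
    """Log entries the replica still has to apply (never negative)."""
    return max(0, primary_position - replica.replayed_position)


def pick_failover_candidate(
    primary_position: int, replicas: List[Replica], max_lag: int
) -> Optional[Replica]:
    """Return the least-lagged replica within max_lag, or None if none qualify."""
    eligible = [r for r in replicas if lag(primary_position, r) <= max_lag]
    if not eligible:
        return None
    return min(eligible, key=lambda r: lag(primary_position, r))
```

When every replica is past the threshold, the function returns `None`, and the operator has to choose between waiting (downtime) and promoting anyway (data loss).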

A Better Solution: A platform built specifically for databases would prioritize network and I/O resources for replication, ensuring that databases always stay in sync, even when other workloads are running on the same infrastructure.

5. Kubernetes Wasn’t Built for Databases

The Issue: Kubernetes was originally designed for stateless applications. While it now supports stateful workloads with features like StatefulSets and PersistentVolumes, these were added later and aren’t ideal for databases. Running a database well requires specialized handling of things like backups, disaster recovery, and failovers.

Why It’s a Problem: Without native support for these critical database tasks, organizations end up using a mix of third-party tools and custom scripts to manage things like backups and failovers. This adds complexity and increases the chance of errors, making it harder to ensure database reliability.

A Better Solution: A new platform could offer all of these features out of the box, specifically designed with databases in mind. That means built-in support for backups, disaster recovery, and seamless failover, reducing the need for custom solutions and making databases easier to manage.

6. DBAs Now Need to Be Kubernetes Experts

The Issue: Running databases on Kubernetes has blurred the line between traditional database administrators (DBAs) and Kubernetes administrators (often holders of the Certified Kubernetes Administrator certification, referred to here as CKAs). DBAs now need to understand Kubernetes deeply, or CKAs need to learn how to manage databases.

Why It’s a Problem: Managing both Kubernetes infrastructure and databases is a complex task. Expecting a DBA to also become a Kubernetes expert—or expecting a CKA to know the intricacies of databases—adds a lot of complexity. This skill gap can lead to operational issues and downtime if not handled properly.

A Better Solution: A new platform designed for stateful workloads could abstract away much of the complexity of Kubernetes, allowing DBAs to focus on managing databases, without needing to learn the ins and outs of Kubernetes infrastructure. This would simplify the skill requirements and reduce operational risks.

Points to Consider:

1. Kubernetes Ecosystem Maturity

Kubernetes has come a long way in supporting stateful workloads. Tools like StatefulSets and CSI drivers are maturing, and many database operators are becoming more reliable. However, the complexity and learning curve involved in running databases on Kubernetes remain high.

The Takeaway: Even though Kubernetes is evolving, it still wasn’t designed with databases in mind. For teams without deep Kubernetes and database expertise, a simpler, purpose-built platform could offer a better solution with less overhead.

2. Building a New Platform Adds Complexity

While building a new platform for stateful workloads might solve some of these issues, it could also create its own set of problems. A new platform means new learning curves, migration challenges, and the risk of fragmenting the ecosystem.

The Takeaway: While a dedicated platform for databases could be ideal, it’s important to consider the overhead of learning and migrating to a new system. The trade-offs between short-term complexity and long-term reliability need to be weighed carefully.

3. Can We Integrate These Features into Kubernetes?

Instead of building an entirely new platform, we could explore whether the features needed for stateful workloads, like better backup and failover handling, could be integrated into Kubernetes itself or provided as extensions.

The Takeaway: While Kubernetes is general-purpose by nature, it has a strong ecosystem of extensions. It’s worth exploring whether we can enhance Kubernetes to better handle stateful workloads rather than starting from scratch.

Conclusion: Time for a Stateful Revolution

Running databases on Kubernetes can work, but it still comes with significant risks and challenges. Kubernetes was not designed with stateful workloads like databases in mind, and while the ecosystem has improved, the complexity of managing databases on Kubernetes remains high. This has led to an ongoing debate: Should we continue to push Kubernetes to do something it wasn’t originally designed for, or is it time to build a new platform specifically for stateful workloads like databases?

While Kubernetes will continue to evolve, a dedicated platform designed for databases could offer a simpler, more reliable solution. Such a platform would be optimized for the needs of stateful workloads, reducing the complexity and risks of running critical databases in production.

Whether through a new platform or better integration within Kubernetes, the future of managing databases needs to focus on reducing operational complexity, ensuring data reliability, and allowing teams to focus on what matters—keeping their databases secure, scalable, and highly available.

Top comments (3)

Alan Scherger

Your Better Solutions sound nice, but how would you achieve any of them?

For example:

"ensuring that databases always stay connected to their storage even during failures."

In a world where most cloud providers offer non-ephemeral storage in the form of network attached storage, how are you going to ensure it stays connected?

Torsten Liermann

Here’s a similar post on the topic, but with a more optimistic outlook. kubernetespodcast.com/episode/225-...

Farmer Sneed

Good read. It'd be great to manage everything in Kubernetes, but it seems like a massive hassle to manage databases in there. DB Management has stable solutions that don't really need Kubernetes tbh.