ScaleGrid for ScaleGrid

Originally published at scalegrid.io

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. Released just four years ago in 2015, Scylla has averaged over 220% year-over-year growth in popularity according to DB-Engines. We’ve heard a lot about this rising database from the DBA community and our users, and decided to sponsor this year’s Scylla Summit to learn more about deployment trends from its users. In this ScyllaDB Trends Report, we break down ScyllaDB cloud vs. on-premises deployments, the most popular cloud providers, the SQL and NoSQL databases used with ScyllaDB, the most time-consuming management tasks, and why you should use ScyllaDB vs. Cassandra.

ScyllaDB vs. Cassandra - Which Is Better?

Wondering which wide-column store to use for your deployments? While Cassandra is still the most popular, ScyllaDB is gaining fast as the 7th most popular wide column store according to DB-Engines. So what are some of the reasons why users would pick ScyllaDB vs. Cassandra?

ScyllaDB offers significantly lower latency, which allows you to process a high volume of data with minimal delay. In fact, according to ScyllaDB’s performance benchmark report, its 99.9th percentile latency is up to 11X better than Cassandra’s on AWS EC2 bare metal. So this type of performance has to come at a cost, right? It does, but the report claims a 2.5X cost reduction compared to running Cassandra, as ScyllaDB can achieve this performance with only 10% of the nodes.
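As a rough sanity check, the arithmetic below works out what those two claims imply per node. It is only a sketch: the 10% node ratio and the 2.5X cost reduction are the figures quoted above, while the function name and the assumption that total cost is simply node count times per-node cost are ours.

```python
# Figures quoted from the benchmark claims above.
NODE_RATIO = 0.10       # ScyllaDB cluster uses 10% of the Cassandra node count
COST_REDUCTION = 2.5    # total cost is claimed to be 2.5X lower


def implied_per_node_cost_ratio(node_ratio: float, cost_reduction: float) -> float:
    """Return the cost of one ScyllaDB node relative to one Cassandra node.

    Assumes total cost = node count * per-node cost (our simplification).
    """
    total_cost_ratio = 1.0 / cost_reduction   # fraction of the Cassandra bill
    return total_cost_ratio / node_ratio


ratio = implied_per_node_cost_ratio(NODE_RATIO, COST_REDUCTION)
print(f"Each ScyllaDB node costs about {ratio:.1f}x a Cassandra node")
```

If both figures hold, the savings would come from running far fewer, larger nodes rather than from cheaper nodes.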

There are dozens of quality articles on ScyllaDB vs. Cassandra, so we’ll stop short here so we can get to the real purpose of this article, breaking down the ScyllaDB user data.

ScyllaDB Cloud vs. ScyllaDB On-Premises

ScyllaDB can be run both in the public cloud and on-premises. In fact, ScyllaDB is most popularly deployed in both public cloud and on-premises environments within a single organization. The 44% of ScyllaDB deployments leveraging both cloud and on-premises computing could be either hybrid cloud environments that use both for a specific application, or separate environments that each manage different applications.

ScyllaDB on-premises deployments and ScyllaDB cloud deployments were dead-even at 28% each. You can run both the free open source ScyllaDB and ScyllaDB Enterprise in the cloud or on-premises, and a ScyllaDB Enterprise license starts at $28.8k/year for a total of 48 cores.

ScyllaDB Cloud vs. ScyllaDB On-Premise Chart - Database Trends Report ScaleGrid

Most Popular Cloud Providers for ScyllaDB

With 28% of ScyllaDB clusters being deployed exclusively in the cloud, and 72% using the cloud in some capacity, we were interested to see which cloud providers are most popular for ScyllaDB workloads.

#1. AWS

We found that 39.1% of all ScyllaDB cloud deployments reported by our survey participants are running on AWS. While we expected AWS to be the #1 cloud provider for ScyllaDB, the percentage was considerably lower than the 55% AWS share reported across all cloud database types in this survey. That 55% figure is more in line with our recent 2019 Open Source Database Trends Report, where 56.9% of cloud deployments were reported running on AWS. The lower ScyllaDB share may be because AWS does not support ScyllaDB through its Relational Database Service (RDS), so we could hypothesize that as more organizations continue to migrate their data to ScyllaDB, AWS may experience a decline in its customer base.

#2. Google Cloud

Google Cloud Platform (GCP) was the second most popular cloud provider for ScyllaDB, coming in at 30.4% of all cloud deployments. Google Cloud does offer its own wide column store and big data database, Bigtable, which is ranked #111 on DB-Engines, one place below ScyllaDB at #110. ScyllaDB’s low cost and high performance capabilities make it an attractive option for GCP users, especially since it is open source, whereas Bigtable is only commercially available on GCP.

#3. Azure

Azure followed in third place representing 17.4% of all ScyllaDB deployments in the cloud from our survey respondents. Azure is an attractive cloud provider for organizations leveraging the Microsoft suite of services.

Most Popular Cloud Providers for ScyllaDB Chart: AWS, GCP, Azure - Database Trends Report ScaleGrid

The remaining 13.0% of ScyllaDB cloud deployments were found to be running on DigitalOcean, Alibaba, and Tencent cloud computing services.

ScyllaDB’s managed service, Scylla Cloud, is currently only available on AWS, and you must use the ScyllaDB Enterprise version to leverage this DBaaS. Scylla Cloud plans to add support for GCP and Azure in the future, but with only 39% reporting on AWS, we can assume over 60% of ScyllaDB deployments are being self-managed in the cloud.
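The "over 60%" estimate follows directly from the provider shares reported above. A minimal sketch of that arithmetic, assuming every deployment outside AWS is self-managed because Scylla Cloud is not offered there (the variable names are ours):

```python
# Share of ScyllaDB cloud deployments per provider, in percent, as reported above.
provider_share = {
    "AWS": 39.1,
    "Google Cloud": 30.4,
    "Azure": 17.4,
    "Other": 13.0,  # DigitalOcean, Alibaba and Tencent combined
}

# Everything outside AWS cannot be running on Scylla Cloud today.
non_aws = sum(share for name, share in provider_share.items() if name != "AWS")
total = sum(provider_share.values())

print(f"Non-AWS share: {non_aws:.1f}%")
print(f"Sum of all reported shares: {total:.1f}%")
```

The shares add up to 99.9% rather than 100% because of rounding. The non-AWS share is a lower bound on self-managed deployments, since AWS users may also be running ScyllaDB themselves.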

Databases Most Commonly Used with ScyllaDB

As we also found from the 2019 Open Source Database Report, organizations on average leverage 3.1 different database types. But, in this survey, organizations using ScyllaDB reported only using 2.3 different database types on average, a 26% reduction compared to our results from all open source database users. We also found that 39% of ScyllaDB deployments are only using ScyllaDB, and not leveraging any other database type in their applications.
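The 26% figure can be reproduced from the two averages quoted above. A short sketch (variable names are ours):

```python
# Average number of database types per organization, from the two surveys above.
avg_all_open_source_users = 3.1
avg_scylladb_users = 2.3

# Relative reduction, expressed as a percentage of the larger baseline.
reduction_pct = (
    (avg_all_open_source_users - avg_scylladb_users)
    / avg_all_open_source_users
    * 100
)

print(f"Reduction: {reduction_pct:.1f}%")
```

The result rounds to the 26% stated in the text.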

So which databases are most commonly used in conjunction with ScyllaDB? We found that ScyllaDB users are also using the SQL databases MySQL and PostgreSQL, each 20% of the time. The third most commonly used database with ScyllaDB was Cassandra, represented in 16% of the deployments, and we could assume this is by organizations testing ScyllaDB as an alternative to Cassandra in their applications, as both databases are wide column stores.

MongoDB was the fourth most popularly deployed database with ScyllaDB at 12%. Redis and Elasticsearch were tied in fifth place, both being leveraged 8% of the time with ScyllaDB deployments.

Databases Most Commonly Used with ScyllaDB Chart: MySQL, PostgreSQL, Cassandra, MongoDB - Database Trends Report ScaleGrid

We also found 20% of Scylla deployments are leveraging other database types, including Oracle, Aerospike, Kafka (which is now transforming into an event streaming database), DB2 and Tarantool.

Most Time-Consuming ScyllaDB Management Tasks

We know that ScyllaDB is wildly powerful, but how easy is it to use? We asked ScyllaDB users what their most time-consuming management task was, and heard from 28% that Scylla Repair was the longest management task. Scylla Repair is a synchronization process that runs in the background to ensure all replicas eventually hold the same data. Users must run the nodetool repair command on a regular basis, as there is no way to automate repairs in the ScyllaDB open source or ScyllaDB Enterprise versions, but you can set up a repair schedule through Scylla Manager.
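For teams that script the manual route, a thin wrapper around the nodetool repair command might look like the sketch below. It is an illustration only: the helper names and the keyspace name are hypothetical, it assumes nodetool is on the PATH of the node where it runs, and the demo at the end uses a dry-run runner so nothing is actually executed.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Optional


def build_repair_command(keyspace: Optional[str] = None) -> List[str]:
    """Build the argument list for `nodetool repair`, optionally for one keyspace."""
    cmd = ["nodetool", "repair"]
    if keyspace:
        cmd.append(keyspace)
    return cmd


def repair_keyspaces(
    keyspaces: Iterable[str],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Run one repair per keyspace, in order, and collect the exit codes."""
    results = {}
    for keyspace in keyspaces:
        results[keyspace] = runner(build_repair_command(keyspace))
    return results


def dry_run(cmd: List[str]) -> int:
    """Print the command instead of executing it (used for the demo below)."""
    print("would run:", " ".join(cmd))
    return 0


# "app_data" is a made-up keyspace name.
print(repair_keyspaces(["app_data"], runner=dry_run))
```

Dropping the `runner=dry_run` argument would execute the real command, and a scheduler such as cron could then call the script on each node at whatever interval suits the cluster.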

ScyllaDB slow query analysis tied ScyllaDB backups and recoveries for second place at 14% each for the most time-consuming management task. It does not look like ScyllaDB currently has a query analyzer available to identify queries that need optimizing, but users can use its Slow Query Logging to see which queries have the longest response time. ScyllaDB backups also cannot be automated through the open source and enterprise versions, but ScyllaDB states that recurrent backups will be available in future editions of Scylla Manager. There is also no automated way to restore a ScyllaDB backup, as restores must be performed manually in all versions.

10% of ScyllaDB users reported that adding, removing or replacing nodes was the most time-consuming task, coming in at fourth place. These are manual processes that can take quite a bit of time, especially if you are dealing with a large data size. Adding nodes is used to scale out a deployment while removing them scales your deployment down. Nodes must be replaced if they are down, or dead, though a cluster can still be available when more than one node is down.

Tied for fifth place at 7% were upgrades and troubleshooting. ScyllaDB Enterprise and open source both require extensive steps to upgrade a cluster. The recommended method is a rolling procedure so there is no downtime, but this is a manual process: the user must take one node down at a time, perform all of the upgrade steps, then restart and validate the node before repeating the same steps for the remaining nodes in the cluster. Time-consuming indeed, but fortunately not a daily task! Troubleshooting is of course a deep rabbit hole to dive into, but ScyllaDB Enterprise customers receive 24/7 mission critical support, and open source users have access to a plethora of resources, including documentation, mailing lists, Scylla University and a Slack channel for user discussions.
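The one-node-at-a-time pattern described above can be sketched as a small loop. This is not ScyllaDB's upgrade tooling: the function and the four callbacks (stop, upgrade, start, is_healthy) are hypothetical placeholders for whatever commands your own procedure uses, and the node names are made up.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    stop: Callable[[str], None],
    upgrade: Callable[[str], None],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes one at a time; halt at the first node that fails validation."""
    upgraded = []
    for node in nodes:
        stop(node)       # take only this node down
        upgrade(node)    # perform all upgrade steps on it
        start(node)      # bring it back up
        if not is_healthy(node):
            # Stop here so the rest of the cluster is left untouched.
            raise RuntimeError(f"{node} failed validation after upgrade")
        upgraded.append(node)
    return upgraded


# Demo with stand-in callbacks that only record what would happen.
log = []
finished = rolling_upgrade(
    ["node1", "node2", "node3"],
    stop=lambda n: log.append(("stop", n)),
    upgrade=lambda n: log.append(("upgrade", n)),
    start=lambda n: log.append(("start", n)),
    is_healthy=lambda n: True,
)
print(finished)
```

Validating each node before moving on is what keeps the procedure free of downtime: at most one node is out of the cluster at any moment, and a failed upgrade stops the loop instead of spreading.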

Most Time-Consuming ScyllaDB Management Tasks Chart - Database Trends Report ScaleGrid

The remaining 21% of time-consuming tasks reported by ScyllaDB users include monitoring, migrations, provisioning, balancing shards, compaction and patching.

So, how do these results compare to your ScyllaDB deployments? Are you looking for a way to automate these time-consuming management tasks? While we support MySQL, PostgreSQL, Redis™ and MongoDB® Database today, we're always looking for feedback on which database to add support for next through our DBaaS plans. Let us know in the comments or on Twitter at @scalegridio if you are looking for an easier way to manage your ScyllaDB clusters in the cloud or on-premises!
