When deployed on Kubernetes or OpenShift, CockroachDB uses persistent volumes (PVs) to store database data, metadata, state, user data, log files, and configuration files. These volumes are typically file-system mounts backed by disks or SSDs where the data is physically stored in a distributed fashion. When you operate CockroachDB and run queries, data must be read or written, and these operations translate into frequent or continuous disk reads and writes.
Managing the disk: IOPS & throughput
On cloud-managed orchestrators, reading or writing data to disk (PVs) consumes IOPS and some of the available I/O throughput. Both are limiting factors that can result in bandwidth saturation or, worse, throttling by the cloud provider under heavier workloads. This condition can be identified by the combination of low CPU usage and high disk latencies, visible in the hardware dashboard metrics and charts of the CockroachDB UI console.
Divide & conquer
To overcome these limitations, CockroachDB lets you take advantage of multiple, independent PVs to separate the destinations of the cockroach runtime data. Logging is a good candidate to move out of the critical path by giving it its own dedicated volume/storage. This also helps with performance tuning, since your SQL data and schemas then live on their own dedicated volume. In fact, splitting the data from the logs into separate PVs is the production-readiness recommendation.
Typical CockroachDB deployments
Most CockroachDB clusters implement a single PVC assigned to each node in a StatefulSet. Default configurations in both Helm- and Operator-managed environments create this 1:1 mapping as follows:
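As a sketch, the single-PVC mapping looks roughly like the fragment below. The volume name, storage class, and size are illustrative assumptions, not taken from a specific chart version:

```yaml
# Fragment of a default CockroachDB StatefulSet: one PVC per pod,
# mounted as the single data directory (names/sizes are illustrative).
volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3        # assumption: a cloud SSD storage class
      resources:
        requests:
          storage: 100Gi
```

Every pod in the StatefulSet gets its own copy of this claim, so data, logs, and everything else share the same disk's IOPS budget.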
Our planned deployment with multiple PVs
…to the implementation
We need to make additions to the StatefulSet template, along with custom log-configuration settings, to direct CockroachDB logs to the new destination PV.
The logging “secret” configuration
This resource is the one-stop shop for all your customized logging properties, including: log sinks (output logs to different locations, including over the network), the logging channels mapped to each sink, the format of the log messages, redaction flags, and the buffering and maximum sizes of log messages.
The following log configuration is the smallest and simplest configuration we will use as a starting point. Here we keep most defaults, only adjusting the file-defaults destination path for the actual files; this path will be mounted to a separate PV defined in the StatefulSet template.
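A minimal configuration along those lines might look like this sketch. The directory `/cockroach/cockroach-logs` is an assumed mount point for the logs PV; adjust it to wherever your StatefulSet mounts the volume:

```yaml
# Minimal CockroachDB log configuration (e.g. logs.yaml, stored in a Secret).
# Only the file-defaults directory is changed from the defaults.
file-defaults:
  dir: /cockroach/cockroach-logs   # assumption: mount point of the logs PV
sinks:
  file-groups:
    default:
      channels: all                # keep all channels in the default file group
```

Everything not specified here (formats, redaction, buffering) falls back to the built-in defaults, which keeps the starting point small.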
For a comprehensive explanation of these fragments, along with working examples and code, please refer to the CockroachDB log configuration documentation so you can tailor the logging to your needs.
The StatefulSet template configuration
This StatefulSet fragment highlights only the added template properties that define the PVC and the specific mount points for both the log-config secret and the new logs folder. A full, complete StatefulSet example follows this fragment to show an actual deployed solution in its entirety.
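A sketch of those additions is below. The volume, secret, and claim names are assumptions; the `--log-config-file` flag points `cockroach start` at the mounted logging configuration:

```yaml
# Additions to the CockroachDB StatefulSet (fragment; names are illustrative).
spec:
  template:
    spec:
      containers:
        - name: cockroachdb
          args:
            # Point cockroach at the mounted log configuration.
            - "start --join=... --log-config-file=/cockroach/log-config/logs.yaml"
          volumeMounts:
            - name: datadir
              mountPath: /cockroach/cockroach-data
            - name: logsdir              # new: dedicated PV for logs
              mountPath: /cockroach/cockroach-logs
            - name: log-config           # new: secret containing logs.yaml
              mountPath: /cockroach/log-config
              readOnly: true
      volumes:
        - name: log-config
          secret:
            secretName: cockroachdb-log-config   # assumption: secret name
  volumeClaimTemplates:
    - metadata:
        name: datadir
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
    - metadata:
        name: logsdir                    # new: second PVC per pod, for logs
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The second entry in `volumeClaimTemplates` is what gives each pod its own independent logs disk, with its own IOPS budget.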
Here is the complete StatefulSet with these changes, including tags/labels specific to my cluster, as a reference example that you can copy and edit to make your own (e.g. sizes, storage classes, IOPS, tags/labels, etc.):
Conclusion & References
This is a versatile addition to the standard StatefulSet because the IOPS can be managed between the PVs, and the plumbing is in place for log customization. DB admins can easily make changes to the logging channels in a running environment by editing a single log-config file that is saved as a Secret object.
Top comments (2)
Mark, I'm guessing you could take a similar approach to having multiple data store devices?
Yes indeed! Adding additional data-stores is an ideal solution to address several use-cases:
CRDB on high-vCPU worker nodes: per our production-readiness guidelines, we do not recommend workers with more than 32 vCPUs. If you're bound to servers with 32 or more vCPUs, an additional store will benefit from the extra compute by creating additional per-store processing. This includes splitting the GC workload, compactions, replica management, the WAL, monitoring, etc. In the end you will leverage the additional CPU and experience less waiting time on I/O operations.
You can create custom stores dedicated to specialized activities such as encryption at rest for a subset of your data. In most cases there is a performance cost to encrypting/decrypting data, and you may not want to pay it for all of your data, perhaps just a few tables holding PII. This is nicely written up with tangible examples in this blog: cockroachlabs.com/blog/selective-e...
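As a sketch of the multi-store approach, the `cockroach start` arguments in the StatefulSet can declare a second store on its own PV by repeating the `--store` flag. The paths and claim names here are assumptions; each path must be backed by its own volume mount and claim:

```yaml
# Fragment: two CockroachDB stores on separate PVs (illustrative names/paths).
containers:
  - name: cockroachdb
    args:
      - "start --join=...
         --store=path=/cockroach/cockroach-data
         --store=path=/cockroach/cockroach-data2"
    volumeMounts:
      - name: datadir
        mountPath: /cockroach/cockroach-data
      - name: datadir2       # second PVC, declared in volumeClaimTemplates
        mountPath: /cockroach/cockroach-data2
```

Each `--store` gets its own compaction, GC, and WAL activity, which is what spreads the work across the extra vCPUs and disks.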