DEV Community

Cover image for Optimum Sharding strategy in OpenSearch
Sandeep Kanabar for AWS Community Builders

Posted on

3

Optimum Sharding strategy in OpenSearch

This article explores a few tips on optimum sharding strategy in OpenSearch.

  1. Using time-based indices wherever possible. There are a number of advantages of using time-based indices as mentioned in this article.

  2. If unsure, begin with 1 shard. With time-based indices, it offers the flexibility of modifying the number of shards anytime.
    E.g.

if the event count per second is 100 and each event is 1KB, then per day
number of events = 100 per sec * 86400 secs in day = 86,40,000
approx size of each event = 1KB
size of all events per day = 1 KB * 86,40,000 = 86,40,000 KB = ~9 GB per day
Enter fullscreen mode Exit fullscreen mode

Each shard is good enough to hold around 30-50 GB data. In the above scenario, with a daily dataset size of 9 GB, a single shard should suffice in case of day-wise indices.

  1. Consider another scenario -
If the event count per second is 200 and each event is 2KB, then per day
number of events = 200 per sec * 86400 secs in day = 1,72,80,000
approx size of each event = 2KB
size of all events per day = 2 KB * 1,72,80,000 = 3,45,60,000 KB = ~34 GB per day

Enter fullscreen mode Exit fullscreen mode

Here also, a single shard might suffice but it would impact indexing making it slower. Opting for 3 primary shards would mean each shard would be ~12 GB.

  1. For scenario discussed in point 3, the shards of size ~12 GB might look too smaller but then past indices being read-only could be force-merged to 1 segment. Alternatively, the no of shards could be reduced for past indices by re-indexing them, e.g. say reindex day-wise indices to monthly indices and then force-merge them. This could lead to 30 day-wise indices with each index have 1 shard (thereby total 30 shards for 30 indices) become a single monthly index with say 9 or 12 shards depending on the size of shards.

  2. The best way is to experiment and find out what works best. Day-wise indices offer scope to experiment as the template could be easily modified to vary the no of shards for newly created indices.

  3. Keep shards EVEN-sized even for different types of indices. Eg. say twitter index has 5 shards each of 10 GB, then design posts index such that the shard size for posts index is also approx around 10-15 GB or 10-20 GB. The reason being, if twitter index shard is 10 GB and posts index shard is say 50 GB, then it might lead to un-even disk space.

Feel free to add your questions / thoughts in the comments below.

Sentry workshop image

Sick of your mobile apps crashing?

Let Simon Grimm show you how to fix them without the guesswork. Join the workshop and get to debugging.

Save your spot →

Top comments (0)

Best Practices for Running  Container WordPress on AWS (ECS, EFS, RDS, ELB) using CDK cover image

Best Practices for Running Container WordPress on AWS (ECS, EFS, RDS, ELB) using CDK

This post discusses the process of migrating a growing WordPress eShop business to AWS using AWS CDK for an easily scalable, high availability architecture. The detailed structure encompasses several pillars: Compute, Storage, Database, Cache, CDN, DNS, Security, and Backup.

Read full post

👋 Kindness is contagious

Discover a treasure trove of wisdom within this insightful piece, highly respected in the nurturing DEV Community enviroment. Developers, whether novice or expert, are encouraged to participate and add to our shared knowledge basin.

A simple "thank you" can illuminate someone's day. Express your appreciation in the comments section!

On DEV, sharing ideas smoothens our journey and strengthens our community ties. Learn something useful? Offering a quick thanks to the author is deeply appreciated.

Okay