This article explores a few tips on optimum sharding
strategy in OpenSearch.
Using time-based indices wherever possible. There are a number of advantages of using time-based indices as mentioned in this article.
If unsure, begin with
1
shard. With time-based indices, it offers the flexibility of modifying the number of shards anytime.
E.g.
if the event count per second is 100 and each event is 1KB, then per day
number of events = 100 per sec * 86400 secs in day = 86,40,000
approx size of each event = 1KB
size of all events per day = 1 KB * 86,40,000 = 86,40,000 KB = ~9 GB per day
Each shard is good enough to hold around 30-50
GB data. In the above scenario, with a daily dataset size of 9 GB
, a single shard
should suffice in case of day-wise indices.
- Consider another scenario -
If the event count per second is 200 and each event is 2KB, then per day
number of events = 200 per sec * 86400 secs in day = 1,72,80,000
approx size of each event = 2KB
size of all events per day = 2 KB * 1,72,80,000 = 3,45,60,000 KB = ~34 GB per day
Here also, a single shard might suffice but it would impact indexing making it slower. Opting for 3 primary shards would mean each shard would be ~12 GB.
For scenario discussed in point 3, the shards of size ~12 GB might look too smaller but then past indices being read-only could be force-merged to 1 segment. Alternatively, the no of shards could be reduced for past indices by re-indexing them, e.g. say reindex day-wise indices to monthly indices and then force-merge them. This could lead to 30 day-wise indices with each index have 1 shard (thereby total 30 shards for 30 indices) become a single monthly index with say 9 or 12 shards depending on the size of shards.
The best way is to experiment and find out what works best. Day-wise indices offer scope to experiment as the template could be easily modified to vary the no of shards for newly created indices.
Keep shards EVEN-sized even for different types of indices. Eg. say
twitter
index has5 shards
each of10 GB
, then designposts
index such that the shard size for posts index is also approx around10-15 GB
or10-20 GB
. The reason being, iftwitter
index shard is10 GB
andposts
index shard is say50 GB
, then it might lead to un-even disk space.
Feel free to add your questions / thoughts in the comments below.
Top comments (0)