DEV Community

Simer Plaha
Simer Plaha

Posted on • Edited on

Hash-Index @ SwayDB

Hash-index @ SwayDB

Hash-indexes improves random read performance by 55%!

Hash-index-performance-bar-chart

Hash-indexes like HashSet or HashMap are useful for serving random read requests. They increase performance for randomly accessed data, reduce total number of IOps and save CPU time. But they can also be costly for applications that rarely do random reads like

  • Time-series or event-source data might only require fast sequential reads.
  • Cold storage or archived data might not require random reads at all.

A configurable hash-index is required for SwayDB so various data storage requirements could be handled. The following were the requirements:

  • Optionally enable or disable hash-index.
  • Allow creation of perfect or nearly perfect hash-indexes.
  • Create partially indexed hash-indexes with fallback to alternative indexes like binary-search & linear-search.
  • Copy keys directly into the hash-index.
  • Controlled concurrency.
  • Compression.

Example configuration

The following creates a persistent Map where hash-index is disabled.

Map<Integer, String, Void> map =  
  MapConfig  
    .functionsOff(Paths.get("myMap"), intSerializer(), stringSerializer())  
    .setRandomKeyIndex(RandomKeyIndex.disable())  
    .get();

map.put(1, "one");  
map.get(1); //Optional[one]
Enter fullscreen mode Exit fullscreen mode

The following enables hash-index (RandomKeyIndex) with custom configuration.

Map<Integer, String, Void> map =  
  MapConfig  
    .functionsOff(Paths.get("myMap"), intSerializer(), stringSerializer())  
    .setRandomKeyIndex(  
      RandomKeyIndex  
        .builder()  
        .maxProbe(10) //re-hash 10 times to resolve hash collision
        .minimumNumberOfKeys(20) //minimum keys required
        .minimumNumberOfHits(10) //minimum indexed keys required 
        //use reference format. IndexFormat.copyKeys() can also be used here
        //to copy keys directly into the hash-index
        .indexFormat(IndexFormat.reference())
        .allocateSpace(  
          (RandomKeyIndex.RequiredSpace optimalSpace) ->
            //allocates 3 times more storage than the 
            //default optimal space for higher hit rate. 
            optimalSpace.requiredSpace() * 3   
         )  
         //allow async IO for all IO operations
        .ioStrategy((IOAction ioAction) -> new IOStrategy.AsyncIO(true))  
         //Use LZ4 and fallback to Snappy.
        .compression((UncompressedBlockInfo info) ->  
          Arrays.asList(  
             //try running LZ4 with minimum 20.0% compression    
            Compression.lz4Pair(  
              new Pair(LZ4Instance.fastestInstance(), new LZ4Compressor.Fast(20.0)),  
              new Pair(LZ4Instance.fastestInstance(), LZ4Decompressor.fastDecompressor())  
            ),  
            //if LZ4 fails try Snappy with 20.0% compression.    
            new Compression.Snappy(20.0)  
         )  
       )  
    )  
    .get();

map.put(1, "one");  
map.get(1); //Optional[one]
Enter fullscreen mode Exit fullscreen mode

For a detail documentation of the configurations refer to documentation website.

Useful links

Top comments (0)