We all know that Memory IO is 50-200 times faster than Disk IO!
Caching plays a significant role in boosting read and compaction performance, but some machines and smaller devices (eg: mobile phones) might not have enough memory for a cache, so a configurable cache that can be disabled, partial or full is required.
Note - A Segment (.seg) file in SwayDB is simply a byte array that stores other byte arrays such as keys, values and indexes (Array<Array<Byte>>). All of these bytes can be cached based on any configurable condition.
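Conceptually (an illustration only, not SwayDB's actual types), a Segment can be pictured as a byte array whose slices are themselves byte arrays:

//Conceptual illustration only - not SwayDB's actual types.
byte[] keys   = {1, 2, 3};          //serialized keys
byte[] values = {10, 20, 30};       //serialized values
byte[][] segment = {keys, values};  //a Segment stores byte arrays inside a byte array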
Configuring IO and Cache
When accessing any file with a custom format we generally:
- Open the file (OpenResource).
- Read the file header or info (ReadDataOverview) to understand the file's content, eg: its format.
- Finally read the content of the file (Compressed or Uncompressed data).
The following sample ioStrategy function does exactly that: it receives an IOAction describing what IO SwayDB is about to perform, and returns the IOStrategy defining how that IO should be performed and whether the read data/bytes should be cached.
.ioStrategy(
  (IOAction ioAction) -> {
    if (ioAction.isOpenResource()) {
      //Here we are just opening the file so do synchronised IO because
      //blocking when opening a file might be cheaper than thread
      //context switching. Also set cacheOnAccess to true so that other
      //concurrent threads accessing the same file channel do not
      //open multiple channels to the same file.
      return new IOStrategy.SynchronisedIO(true);
    } else if (ioAction.isReadDataOverview()) {
      //Data overview is always small and less than 128 bytes and can be
      //read synchronously to avoid switching threads. Also cache
      //this data (cacheOnAccess) for the benefit of other threads and to save IO.
      return new IOStrategy.SynchronisedIO(true);
    } else {
      //Here we are reading the actual content of the file which can be compressed
      //or uncompressed.
      IOAction.DataAction action = (IOAction.DataAction) ioAction;
      if (action.isCompressed()) {
        //If the data is compressed we do not want multiple threads to concurrently
        //decompress it so perform either Async or Sync IO for decompression
        //and then cache the compressed data. You can also read the compressed
        //and decompressed size with the following code
        //IOAction.ReadCompressedData dataAction = (IOAction.ReadCompressedData) action;
        //dataAction.compressedSize();
        //dataAction.decompressedSize();
        return new IOStrategy.AsyncIO(true);
      } else {
        //Else the data is not compressed so we allow concurrent access to it.
        //Here cacheOnAccess can also be set to true but that could allow multiple
        //threads to concurrently cache the same data. If cacheOnAccess is required
        //then use Async or Sync IO instead.
        return new IOStrategy.ConcurrentIO(false);
      }
    }
  }
)
You will find the above ioStrategy property in all data-blocks that form a Segment - SortedKeyIndex, RandomKeyIndex, BinarySearchIndex, MightContainIndex & ValuesConfig. A Segment itself is also a data-block and its ioStrategy can be configured via SegmentConfig.
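As a rough sketch (the builder method name for setting the strategy is an assumption and may differ from SwayDB's actual API), supplying a strategy via SegmentConfig could look like this:

//Sketch only: the ioStrategy setter on SegmentConfig's builder is assumed.
.setSegmentConfig(
  SegmentConfig
    .builder()
    //reuse the same IOAction -> IOStrategy function shown above
    .ioStrategy(ioAction -> new IOStrategy.SynchronisedIO(true))
)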
Cache control/limit with MemoryCache
Caching should be controlled so that it does not lead to memory overflow!
You can enable or disable caching for any or all of the following:
- Bytes within a Segment (ByteCacheOnly).
- Parsed key-values (KeyValueCacheOnly).
- Or all of the above (MemoryCache.All).
By default ByteCacheOnly is used because KeyValueCacheOnly uses an in-memory SkipList and inserts into a large SkipList are expensive, which does not suit the general use-case. But KeyValueCacheOnly can be useful for applications that repeatedly read the same data, especially if that data rarely changes.
An Actor configuration is also required here, which manages the cache in the background. The Actor can be configured as Basic, Timer or TimerLoop.
The following demonstrates how to configure each of the caches.
//Byte cache only
.setMemoryCache(
  MemoryCache
    .byteCacheOnlyBuilder()
    .minIOSeekSize(4096)
    .skipBlockCacheSeekSize(StorageUnits.mb(4))
    .cacheCapacity(StorageUnits.gb(2))
    .actorConfig(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC()))
)
//or key-value cache only
.setMemoryCache(
  MemoryCache
    .keyValueCacheOnlyBuilder()
    .cacheCapacity(StorageUnits.gb(3))
    .maxCachedKeyValueCountPerSegment(Optional.of(100))
    .actorConfig(new Some(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC())))
)
//or enable both the above.
.setMemoryCache(
  MemoryCache
    .allBuilder()
    .minIOSeekSize(4096)
    .skipBlockCacheSeekSize(StorageUnits.mb(4))
    .cacheCapacity(StorageUnits.gb(1))
    .maxCachedKeyValueCountPerSegment(Optional.of(100))
    .sweepCachedKeyValues(true)
    .actorConfig(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC()))
)
minIOSeekSize
The blockSize which sets the minimum number of bytes to read for each IO. For example, with the above configuration, if you ask for 6000 bytes then 4096 * 2 = 8192 bytes will be read.
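As a small illustration of that rounding (plain Java arithmetic, not SwayDB's internal code):

//Illustrative only: round a requested read up to whole minIOSeekSize blocks.
int minIOSeekSize = 4096;
int requested = 6000;
int blocks = (requested + minIOSeekSize - 1) / minIOSeekSize; //2 blocks
int bytesRead = blocks * minIOSeekSize;                       //8192 bytes read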
The value to set depends on your machine's block size. On Mac this can be read with the following command:
diskutil info / | grep "Block Size"
which returns
Device Block Size: 4096 Bytes
Allocation Block Size: 4096 Bytes
skipBlockCacheSeekSize
This skips the BlockCache and performs direct IO if the data size is greater than this value.
cacheCapacity
Sets the total memory capacity. On overflow, the oldest data in the cache is dropped by the Actor.
maxCachedKeyValueCountPerSegment
If set, each Segment is initialised with a dedicated LimitSkipList. This cache is managed by the Actor, or by the Segment itself if it gets deleted or when the max limit is reached.
sweepCachedKeyValues
Enables clearing cached key-values via the Actor. If false, key-values are kept in-memory indefinitely unless the Segment gets deleted. This configuration can be useful for smaller databases (eg: application configs) that frequently read the same data.
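For example (the values below are illustrative and reuse only the builder methods from the sample above), a small, rarely-changing configuration database could disable sweeping so that key-values stay cached:

//Illustrative values only - reuses the allBuilder methods shown earlier.
.setMemoryCache(
  MemoryCache
    .allBuilder()
    .minIOSeekSize(4096)
    .skipBlockCacheSeekSize(StorageUnits.mb(4))
    .cacheCapacity(StorageUnits.mb(100)) //small database, small cache
    .maxCachedKeyValueCountPerSegment(Optional.of(100))
    .sweepCachedKeyValues(false) //keep key-values cached until the Segment is deleted
    .actorConfig(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC()))
)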
Memory-mapping (MMAP)
MMAP can also be optionally enabled for all files.
Map<Integer, String, Void> map =
  MapConfig
    .functionsOff(Paths.get("myMap"), intSerializer(), stringSerializer())
    .setMmapAppendix(true) //enable MMAP for appendix files
    .setMmapMaps(true) //enable MMAP for LevelZero write-ahead log files
    .setSegmentConfig( //configuring MMAP for Segment files
      SegmentConfig
        .builder()
        ...
        //either disable memory-mapping Segments
        .mmap(MMAP.disabled())
        //or enable for writes and reads.
        .mmap(MMAP.writeAndRead())
        //or enable for reads only.
        .mmap(MMAP.readOnly())
        ...
    )
    .get();

map.put(1, "one");
map.get(1); //Optional[one]
Summary
You are in full control of Caching & IO and can configure it to suit your application's needs. If your IOStrategy configuration uses only AsyncIO and ConcurrentIO then you can truly build reactive applications that are non-blocking end-to-end, other than the file system IO performed by java.nio.* classes. Support for Libio to provide async file system IO can be implemented as a feature if requested.
Useful links
- SwayDB on GitHub.
- Java examples repo.
- Kotlin examples repo.
- Scala examples repo.
- Documentation.