Introduction
In this article, we will examine the purpose of Manticore Columnar storage, how it differs from the row-wise storage, and in which cases it makes sense to use it. We will also get acquainted with the basic structure of the storage format and the specifics of its integration into the query processing workflow of the search daemon.
Default Attribute storage (row-wise)
In Manticore, there are two distinct entities: full-text fields, which support only full-text queries, and attributes of various types, which can be used for grouping, sorting, and filtering. Default storage engine (engine='rowwise'
) stores all attributes of all documents in memory.
To load attributes into memory, mmap is used, which can be configured through the options access_plain_attrs
and access_blob_attrs
. The first option is responsible for loading .spa
files, which contain all fixed-length attributes (integer, bigint, float, etc.). The second option is for loading .spb
files, which contain variable-length attributes (string, mva, float_vector, etc.). Since mmap loads data into memory only when accessed, initial queries using attributes can be slower. To mitigate this, the 'access_plain_attrs' and 'access_blob_attrs' by default is set to the 'mmap_preread' mode in which it makes Manticore read .spa
and .spb
files a background thread on start. This usually means (but does not guarantee) that attribute files will be in memory, avoiding the need to read data from the disk during queries. However, if the system decides that there is insufficient memory, attributes might be partially or fully unloaded from memory, slowing down queries again. To avoid this, 'mlock' can be specified in access_plain_attrs
/ access_blob_attrs
. If there is enough memory, attributes will be guaranteed to stay in memory, and the system will not unload them. However, if memory is insufficient, there are no guarantees.
Why do we need Columnar storage?
In situations where the available memory is sufficient to store all attributes, the traditional 'row-wise' storage works efficiently. Problems arise when there isn't enough memory for all attributes.
Indeed, mmap can automatically unload and reload parts of attribute files as needed, allowing operations even with limited memory. However, in practice, this approach can significantly reduce performance to unacceptable levels. The issue partially lies in the row-wise storage architecture, which stores all attributes of one document sequentially, followed by the attributes of the next document, and so on. Here is an example of such a data table:
The .spa
file has the following structure:
The .spb
file looks like this:
When a query includes grouping by a specific attribute, only the value of that attribute for each document needs to be retrieved. However, mmap works in such a way that it cannot read individual bytes; instead, it loads one or more memory pages, each usually 4 KB in size. This means that when attempting to read the value of one attribute, the system often loads the values of all nearby attributes, even if they are not needed for query processing.
Another nuance of the rowwise
storage is the lack of data compression. Since it operates through direct memory access, the code assumes that ready-to-use data is immediately available. Additionally, the data of different attributes within one document is heterogeneous, making it difficult to effectively compress a single document. Moreover, there is no concept of document blocks that could be compressed within this storage format.
The columnar storage (engine='columnar'
) was developed precisely for situations where there is not enough memory to load all attributes. This storage format offers the following advantages:
- Data for each attribute is stored separately and can be read without affecting other attributes.
- Since the data within a single attribute is usually homogeneous, compression can be applied.
- Attribute values can be divided into blocks of several documents for compression.
- After unpacking blocks, optimizations for stream processing can be applied.
- A very small piece of metadata is stored in RAM, with everything else on the disk.
- For fast access to hot data, instead of loading pages into memory through mmap, the file system cache is used.
Schematically, the data structure in columnar storage can be represented as follows:
Columnar storage is delivered in a separate library called MCL (Manticore Columnar Library). This library is responsible for creating storage, packaging, and reading data.
Additionally, a significant amount of code was added to the search daemon itself to handle attributes, considering the specifics of the columnar storage.
Initially, the daemon implemented the rowwise
storage, which is aimed at random data access. One could simply replace the in-memory data access with the columnar storage, but doing so without considering that the columnar storage is designed for stream processing and stores data in compressed blocks would severely degrade performance.
Here are a few examples of what was added to the daemon to work with the columnar storage:
Columnar sorter
In Manticore's architecture, documents found via full-text search are immediately sent to what is called the Sorter. The Sorter can simply sort, or it can group documents, calculate aggregates, and much more. However, access to attributes in columnar storage is slow because values are retrieved one by one.
If a query is relatively light — meaning the full-text search is fast or not present at all — retrieving attributes from storage one at a time can significantly impact performance. Therefore, in some cases, it is beneficial to use an additional sorter on top of the standard Sorter. This additional sorter does not sort but accumulates documents and retrieves columnar attributes in batches before passing the documents to the main Sorter for final processing, which is significantly faster.Columnar grouper
The Grouper in Manticore is responsible for transforming incoming documents into one or more grouping keys for subsequent transfer to the Sorter. The main goal of the columnar grouper is to improve performance by reducing the number of virtual calls. For example, when a regular grouper operates, it requests data from an expression that retrieves data from storage. This expression is implemented on the daemon side and provides transparent access to different types of attribute storages. Internally, it uses a columnar iterator (discussed below), implemented on the MCL library side, which directly accesses the storage. The specialized columnar grouper, however, knows it will only work with the columnar storage and removes some abstractions, working directly with the columnar iterator.
When grouping by strings, a different mechanism is used. When strings are added to the columnar storage, a hash of each string (considering the current collation) is automatically calculated and stored as a separate integer attribute. The columnar grouper retrieves these hashes from the columnar storage instead of reading the string itself and calculating the hash to use as a grouping key.Similarly, by removing the columnar expression, which through the columnar iterator retrieves the attribute value for the current document, columnar filters and columnar aggregates are implemented in the daemon.
The query optimizer (Cost based optimizer, CBO) also needs to be aware of the columnar storage. The CBO determines the execution path of a query. For instance, it can replace one of the filters with a search through the corresponding secondary index, removing the filter from the query. Consequently, the secondary index will return document numbers (row IDs), on which the remaining filters will operate.
When columnar storage is used, the optimizer must compare the performance of the secondary index and the columnar analyzer (which performs a quick search on the columnar storage, more on this below). Depending on the data and the number of available threads, sometimes one method works faster than the other. For example, the columnar analyzer parallelizes better.
What is Included in MCL?
Among the components that are directly part of the MCL library, two main groups can be distinguished:
1.Builder
Responsible for receiving raw data from the daemon and constructing the columnar storage.
2.Accessors
Responsible for data access. There are two main types:
Iterators
A slower method of accessing the storage. An iterator can move to a specified document and extract attribute values. It works slower because it is not adapted for stream processing. However, in some cases, the daemon cannot process data in a stream, and this is when columnar iterators are used.Analyzers
A significantly faster method of accessing data. Analyzers take a set of filters as input and output the document numbers (row IDs) that satisfy these filters. They work quickly because they use all available metadata of the columnar storage during data processing and can immediately unpack and process many documents in a single call. For instance, they can entirely skip data blocks (without unpacking them) if the min-max values of the block indicate that the desired values are not present.
In the daemon, this type of access is automatically enabled based on CBO analysis but can also be manually enabled through index hints; this type of hint is called ColumnarScan.
Conclusion
In this article, we have broadly covered the principles of Manticore Columnar Storage and some details of its integration into the Manticore Search daemon.
Top comments (0)