I just watched this talk about storing JSON in ClickHouse. It's quite insightful. My takeaways:
- There are performance-trimmed json-value-extraction functions that avoid parsing the entire blob in favor of performance. JSON in String columns compresses pretty well.
- There're functions to convert simple JSON to maps and when you store the same data from the talk's example as a map instead of JSON, queries are 4-5x faster.
- You can insert data into a table with engine=null. Such a table doesn't store any data but it can be the source of a Materialized View.
- It's pretty common for users to store the JSON in one column and then extract specific values into columns next to that. That will already boost performance if you're smart about it. You can further enhance performance by defining Data-Skipping Indexes on those columns.
- All the approaches explored in the talk are documented here.
- There's a beta feature for the future of JSON data in ClickHouse with many under the hood optimizations (essentially, inserted data is schema-inferred and clickhouse will automatically create columns for all fields (even nested) and it will even create new columns as json object with new fields flow in).
Top comments (0)