BigQuery recently expanded its capabilities with new JSON helper functions, as seen on their release notes:
Combined with enhancements to log analytics (which utilizes JSON columns) and the power of search functions across JSON data:
It's an exciting time to use BigQuery to best leverage these data types. Then let's dive into when to use Struct vs JSON columns in BigQuery, considering their strengths and potential trade-offs.
STRUCT
A simple example:
CREATE TABLE customers (
customer_id INT64,
customer_name STRING,
address STRUCT<
street STRING,
city STRING,
state STRING,
zip_code STRING
>,
contact STRUCT<
email STRING,
phone STRING
>
);
Strengths:
Schema Enforcement: Enforces a clear structure, ensuring data consistency and integrity.
No need to run aJSON_KEYS
keys like function, remember? When your data environment starts to get past a few tables to hundreds, that certainly makes a big difference!Query Performance & Cost Savings: Optimized for querying specific nested attributes, leading to potentially faster performance and lower costs for well-structured data.
Illustrative example, referencing a BigQuery public dataset, we see how querying different Struct fields impacts the amount of bytes processed:
Querying the ci
field processes significantly fewer bytes compared to querying the system
field, demonstrating the potential cost savings when targeting specific Struct attributes.
Ease of Use: Simple syntax with dot notation for accessing nested fields, making queries more readable.
SELECT
customer_name,
address.city,
contact.email
FROM customers;
JSON
A simple example:
CREATE TABLE products (
product_id INT64,
product_name STRING,
details JSON
);
SELECT
product_name,
JSON_EXTRACT_SCALAR(details, '$.color') AS color,
JSON_VALUE(details, '$.price') AS price
FROM products;
Strengths:
- Data Exchange: A widely used format for seamless integration with external systems and APIs.
- Flexibility: Handles dynamic or evolving data structures without schema changes - perfect for unpredictable or unstructured data.
The challenge with JSON is handling varying key values. Upstream validation using frameworks like data contracts or other techniques, can help enforce consistency, but if that level of rigor is needed, Structs might be a better fit.
For genuine JSON needs, new functions like JSON_KEYS and JSONPath_mode provide powerful tools for querying and managing your data.
Choosing the Right Structure
The ideal choice between STRUCT and JSON hinges on your specific data characteristics and priorities:
- STRUCT: When you require strict schema enforcement, predictable query performance, and ease of use with nested data.
- JSON: When you need to accommodate flexible or evolving data structures and prioritize seamless data exchange.
Whichever path you choose, BigQuery has you covered! The latest enhancements provide greater control and flexibility in managing both structured and semi-structured data.
Top comments (0)