Marcelo Costa

Posted on Aug 25

BigQuery's New JSON Functions: Struct vs. JSON - Choosing the Right Structure

#bigquery #googlecloud #database

BigQuery recently expanded its capabilities with new JSON helper functions, as seen on their release notes:

Combined with enhancements to log analytics (which utilizes JSON columns) and the power of search functions across JSON data:

It's an exciting time to use BigQuery to best leverage these data types. Then let's dive into when to use Struct vs JSON columns in BigQuery, considering their strengths and potential trade-offs.

STRUCT

A simple example:



CREATE TABLE customers (
  customer_id INT64,
  customer_name STRING,
  address STRUCT<
    street STRING,
    city STRING,
    state STRING,
    zip_code STRING
  >,
  contact STRUCT<
    email STRING,
    phone STRING
  >
);

Strengths:

Schema Enforcement: Enforces a clear structure, ensuring data consistency and integrity.
No need to run a JSON_KEYS keys like function, remember? When your data environment starts to get past a few tables to hundreds, that certainly makes a big difference!
Query Performance & Cost Savings: Optimized for querying specific nested attributes, leading to potentially faster performance and lower costs for well-structured data.

Illustrative example, referencing a BigQuery public dataset, we see how querying different Struct fields impacts the amount of bytes processed:

Querying the ci field processes significantly fewer bytes compared to querying the system field, demonstrating the potential cost savings when targeting specific Struct attributes.

ci
system
Ease of Use: Simple syntax with dot notation for accessing nested fields, making queries more readable.



SELECT 
  customer_name, 
  address.city, 
  contact.email 
FROM customers;

JSON

A simple example:



CREATE TABLE products (
  product_id INT64,
  product_name STRING,
  details JSON
);

SELECT 
  product_name, 
  JSON_EXTRACT_SCALAR(details, '$.color') AS color,
  JSON_VALUE(details, '$.price') AS price
FROM products;

Strengths:

Data Exchange: A widely used format for seamless integration with external systems and APIs.
Flexibility: Handles dynamic or evolving data structures without schema changes - perfect for unpredictable or unstructured data.

"With great flexibility comes great responsibility"

The challenge with JSON is handling varying key values. Upstream validation using frameworks like data contracts or other techniques, can help enforce consistency, but if that level of rigor is needed, Structs might be a better fit.

For genuine JSON needs, new functions like JSON_KEYS and JSONPath_mode provide powerful tools for querying and managing your data.

Choosing the Right Structure

The ideal choice between STRUCT and JSON hinges on your specific data characteristics and priorities:

STRUCT: When you require strict schema enforcement, predictable query performance, and ease of use with nested data.
JSON: When you need to accommodate flexible or evolving data structures and prioritize seamless data exchange.

Whichever path you choose, BigQuery has you covered! The latest enhancements provide greater control and flexibility in managing both structured and semi-structured data.

DEV Community

BigQuery's New JSON Functions: Struct vs. JSON - Choosing the Right Structure

STRUCT

JSON

Choosing the Right Structure

Top comments (0)

Read next

Why Running Databases on Kubernetes is a Recipe for Disaster: The Case for a New Platform Designed for Stateful Workloads

Persistent Storage vs. Cache Storage: Understanding the Differences and Use Cases

Exploring InstantDB: A New Approach to Local-First Databases

Optimizing Laravel Performance: Addressing Slowdowns with Large Datasets