DEV Community

Cover image for BigQuery's New JSON Functions: Struct vs. JSON - Choosing the Right Structure
Marcelo Costa
Marcelo Costa

Posted on

BigQuery's New JSON Functions: Struct vs. JSON - Choosing the Right Structure

BigQuery recently expanded its capabilities with new JSON helper functions, as seen on their release notes:
BigQuery new JSON functions

Combined with enhancements to log analytics (which utilizes JSON columns) and the power of search functions across JSON data:
Image descriptionImage description

It's an exciting time to use BigQuery to best leverage these data types. Then let's dive into when to use Struct vs JSON columns in BigQuery, considering their strengths and potential trade-offs.

STRUCT

A simple example:



CREATE TABLE customers (
  customer_id INT64,
  customer_name STRING,
  address STRUCT<
    street STRING,
    city STRING,
    state STRING,
    zip_code STRING
  >,
  contact STRUCT<
    email STRING,
    phone STRING
  >
);


Enter fullscreen mode Exit fullscreen mode

Strengths:

  • Schema Enforcement: Enforces a clear structure, ensuring data consistency and integrity.
    No need to run a JSON_KEYS keys like function, remember? When your data environment starts to get past a few tables to hundreds, that certainly makes a big difference!

  • Query Performance & Cost Savings: Optimized for querying specific nested attributes, leading to potentially faster performance and lower costs for well-structured data.

Illustrative example, referencing a BigQuery public dataset, we see how querying different Struct fields impacts the amount of bytes processed:

Image description

Querying the ci field processes significantly fewer bytes compared to querying the system field, demonstrating the potential cost savings when targeting specific Struct attributes.

  • ci
    Image description

  • system
    Image description

  • Ease of Use: Simple syntax with dot notation for accessing nested fields, making queries more readable.



SELECT 
  customer_name, 
  address.city, 
  contact.email 
FROM customers;


Enter fullscreen mode Exit fullscreen mode

JSON

A simple example:



CREATE TABLE products (
  product_id INT64,
  product_name STRING,
  details JSON
);

SELECT 
  product_name, 
  JSON_EXTRACT_SCALAR(details, '$.color') AS color,
  JSON_VALUE(details, '$.price') AS price
FROM products;



Enter fullscreen mode Exit fullscreen mode

Strengths:

  • Data Exchange: A widely used format for seamless integration with external systems and APIs.
  • Flexibility: Handles dynamic or evolving data structures without schema changes - perfect for unpredictable or unstructured data.

"With great flexibility comes great responsibility"
Image description

The challenge with JSON is handling varying key values. Upstream validation using frameworks like data contracts or other techniques, can help enforce consistency, but if that level of rigor is needed, Structs might be a better fit.

For genuine JSON needs, new functions like JSON_KEYS and JSONPath_mode provide powerful tools for querying and managing your data.
Image description

Choosing the Right Structure

The ideal choice between STRUCT and JSON hinges on your specific data characteristics and priorities:

  • STRUCT: When you require strict schema enforcement, predictable query performance, and ease of use with nested data.
  • JSON: When you need to accommodate flexible or evolving data structures and prioritize seamless data exchange.

Whichever path you choose, BigQuery has you covered! The latest enhancements provide greater control and flexibility in managing both structured and semi-structured data.

Top comments (0)