Have you ever started a project and asked - "should I use a SQL or NoSQL database?"
It’s a big decision. There are multiple horror stories of developers choosing a NoSQL database and later regretting it.
But now you can get the best of both worlds with JSON in PostgreSQL.
In this article I cover the benefits of using JSON, anti-patterns to avoid, and an example of how to use JSON in Postgres.
Table of Contents
- When to use JSON in Postgres
- JSON Basics.
- Types of JSON in PostgreSQL.
- Creating a JSON table
- How to query JSON data
- JSONPath: advanced JSON syntax
- Arctype: An Easy Tool for Databases
Why Use a SQL Database for Non-Relational Data?
First we have to briefly cover the advantages of using SQL vs NoSQL.
The difference between SQL and NoSQL is the data model. SQL databases use a relational data model, and NoSQL databases usually use a document model. A key difference is how each data model handles data normalization.
Data normalization is the process of splitting data into “normal forms” to reduce data redundancy. The concept was first introduced in the 1970s as a way to reduce spending on expensive disk storage.
In the example above, we have a normalized entity relationship diagram for a school database. The StudentClass
table stores every class a student has taken. By normalizing the data, we only keep one row for each class in the Class
table, instead of duplicating class data for every student in the class.
But what if we also wanted to track every lunch order (entree, sides, drink, snacks, etc) to send each student a summary at the end of every week?
In this case it would make more sense to store the data in a single document instead of normalizing it. Students will always be shown their entire lunch order, so we can avoid expensive joins by keeping the lunch order data together.
Instead of maintaining a separate NoSQL database, we now store lunch orders as JSON objects inside an existing relational Postgres database.
What is JSON?
Name | Data type |
---|---|
student_id | Integer |
order_date | Date |
order_details | Object |
sides | Array |
JSON, or Javascript Object Notation, is a flexible format to pass data between applications, similar to a csv file. However, instead of rows and columns, JSON objects are collections of key/value pairs.
According to Stack Overflow, JSON is now the most popular data interchange format, beating csv, yaml, and xml.
The original creator of JSON, Douglas Crockford, attributes the success of JSON to its readability by both developers and machines, similar to why SQL has been dominant for almost 50 years.
The JSON format is easy to understand, but also flexible enough to handle both primitive and complex data types.
Evolution of JSON in PostgreSQL Plain JSON type
In 2012, PostgreSQL 9.2 introduced the first JSON
data type in Postgres. It had syntax validation but underneath it stored the incoming document directly as text with white spaces included. It wasn’t very useful for real world querying, index based searching and other functionalities you would normally do with a JSON document.
JSONB
In late 2014, PostgreSQL 9.4 introduced the JSONB
data type and most importantly improved the querying efficiency by adding indexing.
The JSONB
data type stores JSON as a binary type. This introduced overhead in processing since there was a conversion involved but it offered the ability to index the data using GIN/Full text based indexing and included additional operators for easy querying.
JSONPath
With JSON’s increasing popularity, the 2016 SQL Standard brought in a new standard/path language for navigating JSON data. It’s a powerful way of searching JSON data very similar to XPath for XML data. PostgreSQL 12 introduced support for the JSON Path standard.
We will see examples of JSON, JSONB, and JSONPath in the sections below. An important thing to note is that all JSON functionality is natively present in the database. There is no need for a contrib module or an external package to be installed.
JSON Example in Postgres
Lets create a Postgres table to store lunch orders with a JSON data type.
create table LunchOrders(student_id int, order json);
Now we can insert JSON formatted data into our table with an INSERT
statement.
insert into LunchOrders values(100, '{
"order_date": "2020-12-11",
"order_details": {
"cost": 4.25,
"entree": ["pizza"],
"sides": ["apple", "fries"],
"snacks": ["chips"]}
}'
);
insert into LunchOrders values(100, '{
"order_date": "2020-12-12",
"order_details": {
"cost": 4.89,
"entree": ["hamburger"],
"sides": ["apple", "salad"],
"snacks": ["cookie"]}
}'
);
If you do a Select *
from the table, you would see something like below.
Inserting data into a JSONB
column is exactly the same, except we change the data type to jsonb
.
create table LunchOrders(student_id int, orders jsonb);
How to Query JSON Data in Postgres
Querying data from JSON objects uses slightly different operators than the ones that we use for regular data types ( =
, <
, >
, etc).
Here are some of the most common JSON operators:
Operator | Description |
---|---|
-> | Select a key/value pair |
->> | Filter query results |
#> | Selecting a nested object |
#>> | Filter query results in a nested object |
@> | Check if an object contains a value |
The ->
and ->>
operators work with both JSON
and JSONB
type data. The rest of the operators are full text search operators and only work with the JSONB
data type.
Let's see some examples of how to use each operator to query data in our LunchOrders
table.
Getting values from a JSON object
We can use the ->
operation to find every day that a specific student bought a school lunch.
select orders -> 'order_date'
from lunchorders
where student_id = 100;
Filtering JSON data using a WHERE
clause
We can use the ->>
operator to filter for only lunch orders on a specific date.
select orders
from lunchorders
where orders ->> 'order_date' = '2020-12-11';
This query is similar to the =
operator that we would normally use, except we have to first add a ->>
operator to tell Postgres that the order_date
field is in the orders
column.
Getting data from an array in a JSON object
Let's say we wanted to find every side dish that a specific student has ordered.
The sides
field is nested inside the order_details
object, but we can access it by chaining two ->
operators together.
select
orders -> 'order_date',
orders -> 'order_details' -> 'sides'
from
lunchorders
where
student_id = 100;
Great now we have arrays of the sides that student 100 ordered each day! What if we only wanted the first side in the array? We can chain together a third ->
operator and give it the array index we're looking for.
Retrieving nested values from a JSON object
Instead of chaining together multiple ->
operators, we can also use the #>
operator to specify a path for retrieving a nested value.
select orders #> '{order_details, sides}'
from lunchorders;
?column?
--------------------
["apple", "fries"]
["apple", "salad"]
(2 rows)
Checking if a JSON object contains a value
Lets say we wanted to see every order a student made that had a side salad. We can't use the previous ->>
for filtering because sides
is an array of values.
To check if an array or object contains a specific value, we can use the @>
operator:
select
orders
from
lunchorders
where
orders -> 'order_details' -> 'sides' @> '["salad"]';
orders
----------
{"order_date": "2020-12-12", "order_details": {"cost": 4.89, "sides": ["apple", "salad"], "entree": ["hamburger"], "snacks": ["cookie"]}}
(1 row)
JSONPath: The Final Boss
JSON Path is a powerful tool for searching and manipulating a JSON object in SQL using JavaScript-like syntax:
- Dot (.) is used for member access.
- Square brackets ("[]") are used for array access.
- SQL/JSON arrays are 0-indexed, unlike regular SQL arrays that start from 1.
Built-in functions
JSONPath also includes powerful built-in functions like size()
to find the length of arrays.
Let's use the JSONPath size()
function to get every order that had >= 1 snack.
select *
from lunchorders
where orders @@ '$.order_details.snacks.size() > 0';
Comparison without type casting
JSONPath also enables comparisons without explicit type casting:
select *
from lunchorders
where orders @@ '$.order_details.cost > 4.50';
This is what the same query would look like with our regular JSON comparisons:
select *
from lunchorders
where (orders -> 'order_details' ->> 'cost')::numeric > 4.50;
JSON Summary
In this article we've covered:
- When to use SQL vs NoSQL
- A history of JSON in Postgres
- Examples of how to work with JSON data
- JSON query performance with indexing
- JSON anti-patterns
Working with JSON data can be complicated. Arctype is a free, modern SQL editor that makes working with databases easier. Try it out today and leave a comment below with your thoughts!
Sneak peek at part 2 of the JSON in Postgres series:
With our existing table, the database engine has to scan through the entire table to find a record. This is called a sequential scan.
Sequential scans become degrade in performance as the dataset grows.
Follow to get notified of Part 2 where I will show show how to index a dataset with 700k rows to improve query performance by 350X.
Top comments (0)