DEV Community

Cover image for Functions for manipulating data in PostgreSQL
Marriane Akeyo
Marriane Akeyo

Posted on • Edited on

Functions for manipulating data in PostgreSQL

PostgreSQL is an open source, object-relational database management system (DBMS) that provides robust features and scalability for handling large and complex datasets. It follows the SQL (Structured Query Language) standard and extends it with additional capabilities.Some common data types it supports includes:

  • Text eg CHAR, VARCHAR and TEXT
  • Numeric eg INT and DECIMAL
  • Date/time eg DATE, TIME, TIMESTAMP, INTERVAL
  • Arrays It also supports JSON, XML, hstore (key-value store), and other specialized data types to handle diverse data requirements. To determine dataypes from existing tables you can query the INFORMATION SCHEMA.
SELECT
     column_name,
     data_type,
FROM INFORMATION_SCHEMA.COLUMNS
WHERE column_name IN ('title', 'description',                                                     'special_features)
AND table_name = 'name_of_your_table';
Enter fullscreen mode Exit fullscreen mode

Array manipulation

It allows you to store and manipulate arrays of values.
Arrays can contain elements of any data type, including built-in types, user-defined types, or even arrays of arrays.

Array Declaration Syntax:
To declare a variable of array type in PL/pgSQL (PostgreSQL's procedural language), you use the same syntax. As shown,
array_name data_type [];

For example:

CREATE TABLE grades (
  email text[][],
  test_scores int[]
);
Enter fullscreen mode Exit fullscreen mode

Array Initialization:
Array literals can be created using the curly braces "{ }" notation. For example, initializing the arrays declared above:

INSERT INTO grades (email, test_scores)
VALUES (
     {{"work","work@gmail.com},{"other","other@gmail.com"}},
     {92,85,56,88}
);
Enter fullscreen mode Exit fullscreen mode

Accessing Array Elements:

SELECT 
     email [1][1] AS type,
     email [1][2] AS email,
     test_scores [1] AS scores
FROM grades;
Enter fullscreen mode Exit fullscreen mode

Note : Postgres arrays start with 1 and not 0

The ANY() function

It is an array operator in PostgreSQL that allows you to compare a value with an array and returns true if the value matches any element of the array. It is commonly used in combination with other comparison operators, such as =, <>, <, >, <=, >= and many more.

The syntax is as follows:
ANY(array_expression)

We can use the ANY() function to manipulate the data selected from the grades table as shown:

SELECT 
     email [1][1] AS type,
     email [1][2] AS email,
     test_scores [1] AS scores
FROM grades
WHERE 'other' = ANY(email);
Enter fullscreen mode Exit fullscreen mode

The array containment operator (@>)

It is used in PostgreSQL to check if an array contains all the elements of another array, verifying if an array is a subset of another array, or filtering rows based on array containment.It evaluates to true if the left-hand array contains all the values in the right-hand array.

The syntax is as shown:
array_expression @> array_to_check
It thus checks if the array represented by array_expression contains all the elements in the array_to_check.

Note: The @> operator checks for containment, meaning it requires the left-hand array to contain all the elements without any duplicates. If you need to check for equality (exact match), you can use the = operator instead.

We can thus obtain the same results above when we use the @> operator.

SELECT 
     email[1][1] AS type,
     email[1][2] AS email,
     test_scores[1] AS scores
FROM grades
WHERE email @> ARRAY['other'];
Enter fullscreen mode Exit fullscreen mode

Both queries will return the same result set, consisting of rows where the "email" array contains the value 'other'. The choice between using ANY() or the @> operator depends on your preference and the specific requirements of your application.

Date/Time manipulation

The date/time data types are used to store and manipulate temporal data.They include:

  • date: Stores a calendar date (year, month, day). Example:'2023-07-06'
  • time: Stores the time of day (hour, minute, second). Example: '14:30:45'
  • time with time zone: Similar to time, but includes time zone information.Example: '14:30:45+05:30'
  • timestamp: Stores both date and time (year, month, day, hour, minute, second). Example: '2023-07-06 14:30:45'
  • timestamp with time zone: Similar to timestamp, but includes time zone information.Example: '2023-07-06 14:30:45+05:30'
  • interval: Represents a duration of time. Example: '1 day 3 hours'

PostgreSQL also provides a rich set of functions and operators for manipulating and performing operations on date/time data. These include functions to extract specific components (e.g., year, month, day) from a date/time value, perform arithmetic operations on dates and intervals, and format dates/times as strings. As we shall see bellow.

We can start with simple calculations such as :

  • When you subtract two date time values you get an integer datatype.
  • Subtracting two timestamps gives us an interval.
  • Adding an integer to a date we get a date value equivalent to the sum of the two. SELECT date '2020-09-11' + integer '3';
  • You can also manipulate date time data using other arithmetic operations. SELECT timestamp '2023-07-06 14:30:45' + 21 * interval '1 day 3 hours'

AGE()

This function allows us to calculate the difference between two timestamps.

-- table creation
CREATE TABLE sample (
     start_date timestamp,
     end_date timestamp
     );

-- insert into table
INSERT INTO sample (start_date, end_date)
VALUES ('2023-07-01 12:00:00', '2023-07-06 14:30:00');

-- timestamp difference
SELECT AGE(start_date, end_date) AS diff
FROM sample;
Enter fullscreen mode Exit fullscreen mode

Retrieving the current timestamp

The CURRENT_TIMESTAMP and the NOW() functions are used to retrieve the current timestamp in the format 'YYYY-MM-DD HH:MI:SS+TZ'. Both functions provide the current date and time information. You can use these functions in SELECT queries, INSERT statements, or anywhere you need to retrieve the current timestamp in your SQL statements.

  1. CURRENT_TIMESTAMP Function
SELECT CURRENT_TIMESTAMP;
Enter fullscreen mode Exit fullscreen mode

Example Output: '2023-07-06 14:30:45+05:30'

  1. NOW() Function
SELECT NOW();
Enter fullscreen mode Exit fullscreen mode

Example Output: '2023-07-06 14:30:45+05:30'

Casting enables the conversion from one timestamp to another. We can thus get the now value without a timestamp as shown

SELECT NOW() :: timestamp;

You can also use the CAST() function to get the same results.

`SELECT CAST(NOW() as timestamp);

To make things much easier you can use CURRENT_DATE as shown:

SELECT CURRENT_DATE;

To select just the time without date value:

SELECT CURRENT_TIME;

Extracting and Transforming date and datetime data

The EXTRACT() function

Used to retrieve specific components or fields from date or datetime data.
The syntax is as follows:

EXTRACT(field FROM source)

Some examples include:

`
SELECT EXTRACT(MINUTE FROM INTERVAL '3 hours 30 minutes') AS extracted_minute;

-- example two
SELECT EXTRACT(YEAR FROM '2023-07-06'::DATE) AS extracted_year;

-- example three
SELECT EXTRACT(quarter FROM TIMESTAMP '2023-07-06 14:30:00') AS quarter;
`

The DATE_PART() function

It performs the same function as extract but their syntax differ a bit as shown.

DATE_PART(field,source)

Lets redefine the examples above

`
SELECT DATE_PART(MINUTE, INTERVAL '3 hours 30 minutes') AS extracted_minute;

-- example two
SELECT DATE_PART(YEAR, '2023-07-06'::DATE) AS extracted_year;

-- example three
SELECT DATE_PART(quarter, TIMESTAMP '2023-07-06 14:30:00') AS quarter;
`
DATE_ADD
Used to retrieve the next day's date.For example:
Given the table :
------------+-----------+-------------+
| product_id | new_price | date |
+------------+-----------+-------------+
| 1 | 20 | 2019-08-14 |
| 2 | 50 | 2019-08-12 |
| 3 | 35 | 2019-08-16 |
| 4 | 65 | 2019-08-17 |
| 5 | 20 | 2019-08-18 |
+------------+-----------+-------------+

`
SELECT
curr_date,
DATE_ADD(change_date, INTERVAL 1 DAY) AS day_after
FROM your_table_name;

`

The DATE_TRUNC function

It is used to round off timestamps or interval values to a specified precision. It allows you to remove the smaller units of time from a given date or timestamp while preserving the desired level of precision.
The syntax is as shown:

DATE_TRUNC(unit, source)

Lets look at some examples:

`
-- returns the year in the formart '2023-01-01'
SELECT DATE_TRUNC('year', DATE '2023-07-06') AS truncated_date;

-- returns '2023-07-06 14:00:00'
SELECT DATE_TRUNC('hour', TIMESTAMP '2023-07-06 14:30:45') AS truncated_timestamp;

-- returns '2005-05-01'
SELECT DATE_TRUNC( 'month', TIMESTAMP '2005-05-21 15:30:30');
`

Text manipulation

It is made up of VARCHAR, CHAR and Text data.
Text datatype is considered to be mostly used in any database system and can be manipulated in various ways.

String concatenation

This includes merging of two or more strings together or a non-string data with a string data.
It can be done using pipes(||) or the CONCAT() function.


SELECT first_name,
last_name
first_name || ' ' || last_name AS full_name
FROM customers

The full name column will now contain the last_name and first_name of a customer separated by a space.
The same results can be achieved using the CONCAT() function as shown


SELECT first_name,
last_name,
CONCAT(first_name ' ' last_name) AS full_name
FROM customers;

Concatenating multiple rows

GROUP_CONCAT() is used to concatenate values from multiple rows into a single string. It is often used in conjunction with the GROUP BY clause to aggregate data in a way that combines multiple rows into one, typically for reporting or display purposes.
Consider a simplified example with a table called orders:

sql

+---------+-----------+
| order_id| product |
+---------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Orange |
| 1 | Banana |
| 2 | Apple |
+---------+-----------+

If you want to concatenate the distinct product names for each order, sorted lexicographically and separated by a comma, you can use:

SELECT
order_id,
GROUP_CONCAT(DISTINCT product ORDER BY product ASC SEPARATOR ', ') AS products
FROM orders
GROUP BY order_id;

The result would be:

+---------+-----------------+
| order_id| products |
+---------+-----------------+
| 1 | Apple, Banana |
| 2 | Apple, Banana |
| 3 | Orange |
+---------+-----------------+

Changing the case of a string

Some commonly used functions to change the case of strings include:
-** UPPER()**: converts a string to uppercase. For example:

SELECT UPPER('Hello World') AS upper_case;

Output: 'HELLO WORLD'

  • LOWER(): converts a string to lowercase. For example:

SELECT LOWER('Hello World') AS lower_case;

Output: 'hello world'

  • INITCAP(): converts a string to camel case. For Example:

SELECT INITCAP('hello world') AS init_cap;

Output: 'Hello World'

Replacing a string

The REPLACE() function locates a string and replaces it with another string.
For example:


SELECT
REPLACE (currency, 'dollars', 'shillings) AS new_currency
FROM salary

The above query replaces all the dollar strings to shillings in the currency column and selects them as new_currency.

Parsing string and character data

The CHAR_LENGTH() function
This function is used to determine the number of characters in a string.
Syntax:

CHAR_LENGTH(string)

Example:
Assuming we have a table called film with film titles and descriptions, we can find the title length as shown:


SELECT title,
CHAR_LENGTH(title) AS len
FROM films;

The LENGTH() function can be used interchangeably with CHAR_LENGTH() function to produce similar results, and is mostly preferred.


SELECT title,
LENGTH(title) AS len
FROM films;

The POSITION() function
It is used to return the position of a string from left to right.
Syntax: POSITION('string' IN source)
Example:
Selecting from the table created in Array manipulation above


SELECT email
POSITION('@' IN email)
FROM grades;

The STRPOS() function is an alias of POSITION() with a slightly different syntax as shown:

STRPOS(source, 'string')

For example:


SELECT email
STRPOS(email, '@')
FROM grades;

The LEFT() function
Allows us to select a set of characters from the left of a string.
Syntax:

LEFT(column, number_limit)

Example:
Considering a film table with title and description columns


SELECT
LEFT(description, 50)
FROM film;

The RIGHT() function
Allows us to select a set of characters from the right side of a string.
Syntax:

RIGHT(column, number_limit)

Example:
Considering a film table with title and description columns


SELECT
RIGHT(description, 50)
FROM film;

The SUBSTRING() function
This function selects characters in between a string.
Syntax:

SUBSTRING(column, start, end)

Example:


SELECT
SUBSTRING(description, 10, 50)
FROM film;

The substring function can be combined with other functions to provide more context.For example selecting the name before @ in an email

`
-- before @
SELECT
SUBSTRING(email, 0, POSITION('@' IN email)) AS name
FROM grades;

-- after @
SELECT
SUBSTRING(email, POSITION('@' IN email)+1 FOR LENGTH (email) ) AS domain
FROM grades;
or
SELECT
SUBSTRING(email, FROM POSITION('@' IN email)+1) AS domain
FROM grades;
`

The SUBSTR() is another alternative of SUBSTRING but does not allow other functions in it or the FOR keyword.


SELECT
SUBSTR(description, 10, 50)
FROM film;

The TRIM() function
It is used to remove whitespaces from strings.
It follows the syntax

TRIM([leading|trailing|both] [characters] from string)

If the first parameter is not given, the default value is both.
If the second parameter is not given the default value is blank lines.
Most of the times only the string is passed.

SELECT TRIM(' john doe ')

Output: 'john doe'

Other trim like functions include:
LTRIM() - which removes whitespaces only from the beginning of the string.

SELECT LTRIM(' john doe ')

Output: 'john doe '

RTRIM() - which removes whitespaces at the end of the string

SELECT TRIM(' john doe ')

Output: ' john doe'

The LPAD and RPAD functions
They append a spesified number of characters to a string with LPAD() appending to the left of a string and RPAD() to the right of a string.
Syntax: LPAD/RPAD(string, no_of_times, string_to_be_added)
Example:

LPAD('food', 10, '*');

If no 'string_to_be_added' is provided then the string is padded by white spaces.

Text searching
The LIKE character as learnt in introduction to sql is used to match characters and strings.
We can use _ to match exactly one character or % to match more than one character as shown.


SELECT
title
FROM film
WHERE title LIKE 'Queen%';
or
WHERE title LIKE '%Charlotte';

Either of the where clauses will return the title 'Queen Charlotte'
However, LIKE is case sensitive and might not give us accurate results if we are performing a full text search and is also quite expensive in terms of performance.
We can therefore utilize the to_tsvector() and to_tsquery() in our where clause which perform natural language queries of text and data in the database, as shown:


SELECT
title
FROM film
WHERE to_tsvector(title)@@to_tsquery('Queen');

tsvector vectorises the title and tsquery searches for the subquery within the vectors.

Extending the Capabilities of PostgreSQL

The CREATE_TYPE command
It is used to create user defined datatypes with specific attributes and behaviors.
It registers the type in a postgres db and can be used where postgres expects a type name.

  1. Creating a Composite Type They are user-defined types that can contain multiple fields or attributes. Syntax:


CREATE TYPE type_name AS (
attribute1 datatype1,
attribute2 datatype2,
...
);

Example:


CREATE TYPE person_type AS (
name text,
age integer,
email text
);

Usage:

`
-- create a table using the composite type
CREATE TABLE persons (
id serial primary key,
person person_type
);

-- insert into the table
INSERT INTO persons (person)
VALUES (('John Doe', 30, 'john@example.com'));

-- query the table
SELECT person.name, person.age, person.email
FROM persons;
`

  1. Creating an Enumerated Type: It allows you to define a fixed set of values that a column or variable can take. Syntax:

CREATE TYPE type_name AS ENUM ('value1', 'value2', ...);

Example:

CREATE TYPE day_of_week AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');

Once created, these custom types can be used in table definitions, column definitions, function parameters, and more.

User defined functions
They allow you to create custom functions tailored to your specific requirements. You can define functions that encapsulate complex logic, perform calculations, manipulate data, or provide any other functionality you need.

Syntax:


CREATE FUNCTION function_name (parameter1 data_type1, parameter2 data_type2, ...)
RETURNS return_type
AS $$
-- Function body goes here
BEGIN
-- SQL statements or other logic
RETURN result_value;
END;
$$
LANGUAGE plpgsql; -- or other supported language like SQL, Python, etc.

Example:

`
CREATE FUNCTION square (i integer)
RETURNS integer
AS $$
BEGIN
RETURN i * i;
END;
$$
LANGUAGE plpgsql;

-- you then can use it in SQL queries just like any built-in function
SELECT square(5) AS square_result;
`

LEVEINSTEIN() - claculates the distance required to make 2 strings simmilar in values between 0 and 1 , 1 being most accurate.

SIMILARIRY()- claclulates how simmilar or different two strings are from each other

Postgres Extensions
PostgreSQL provides a command called CREATE EXTENSION to install an extension.
Syntax:

CREATE EXTENSION extension_name;

Example:

CREATE EXTENSION hstore;

Extensions can also be installed using package managers, such as apt, yum, or brew, depending on the operating system.
Some popular extensions include:

  • PostGIS: Adds support for geographic objects and spatial indexing, enabling the use of PostgreSQL as a geospatial database.
  • pgcrypto: Provides cryptographic functions for secure data storage and retrieval.
  • pg_trgm: Offers fuzzy string matching and indexing capabilities.
  • hstore: Allows storing key-value pairs within a single column, providing a basic NoSQL-like functionality.
  • uuid-ossp: Generates universally unique identifiers (UUIDs) using various algorithms.
  • pg_stat_statements: Collects statistics about SQL statements executed in a database, aiding in query optimization.
  • PostPic: allows for image processing within the db

To view extensions available for install query


SELECT name
FROM pg_available_extension;

To view already installed extensions


SELECT extname
FROM pg_extension;

You can read more about the usage of postgres extensions in the postgreSQL documentation.

Thank you for reading, I hope this helps you dive deeper into postgres. Stay curious!!! and Happy hacking.

Top comments (0)