Dmitry Romanoff

Posted on Oct 19

Populating a Vertica Database with Random Data

#database #devops #sql #productivity

In this article, we’ll explore how to populate a Vertica database table with random data, specifically focusing on a sales table. This process is beneficial for testing, simulating data scenarios, and validating query performance without relying on real data. Let's walk through the entire procedure, from table creation to data insertion.

Step 1: Creating the Sales Table

First, we need to define the structure of our sales table. This table will contain various fields such as transaction_id, customer_id, product_id, quantity, price, transaction_date, and location.

Here’s the SQL statement to create the table:

CREATE TABLE sales (
    transaction_id INT,
    customer_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    transaction_date DATE,
    location VARCHAR(50)
);

This table design captures essential attributes of sales transactions, making it suitable for various types of analyses.

Step 2: Inserting Random Data

Next, we’ll insert random data into the sales table. This is where we leverage SQL functions to generate a diverse range of values for each column.

The Insert Statement

Here’s the SQL statement used to populate the sales table:

INSERT INTO sales
SELECT
    ROW_NUMBER() OVER () AS transaction_id,
    (RANDOM() % 1000000) + 1 AS customer_id,
    (RANDOM() % 10000) + 1 AS product_id,
    (RANDOM() % 10) + 1 AS quantity,
    ROUND(RANDOM() * 100, 2) AS price,
    DATE '2023-01-01' + (RANDOM() % 365) AS transaction_date,
    (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[RANDOM() % 5 + 1] AS location 
FROM (
    SELECT 1 AS dummy FROM 
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS a,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS b,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS c,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS d,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS e,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS f,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS g
) AS numbers
LIMIT 10000000;

Breakdown of the Insert Statement

Generating Unique IDs: ROW_NUMBER() OVER () generates a unique transaction ID for each row.
Random Customer and Product IDs: (RANDOM() % 1000000) + 1 ensures we generate random customer IDs, and similar logic is applied for product IDs.
Quantity and Price: We randomly generate quantities between 1 and 10 and prices up to 100, rounded to two decimal places.
Transaction Dates: By adding a random number of days to a base date (January 1, 2023), we generate random transaction dates within a year.
Locations: We select a random location from a predefined array of cities.

The final statement inserts 10 million random records into the sales table, simulating a realistic sales dataset.

Step 3: Verifying Data Insertion

After executing the insertion, you can verify the number of rows in the sales table with:

SELECT COUNT(1) FROM sales;

You should see an output confirming that 10 million records were successfully inserted:

 count   
----------
 10000000

Sample Data

To view a few entries from the sales table, you can run:

SELECT * FROM sales LIMIT 10;

This will display the first 10 records, allowing you to inspect the structure and values.

VMart=> select * from sales limit 10;
 transaction_id | customer_id | product_id | quantity | price | transaction_date |  location   
----------------+-------------+------------+----------+-------+------------------+-------------
        1048577 |           2 |          2 |        2 | 15.67 | 2023-01-01       | Los Angeles
        1048578 |           1 |          2 |        1 | 60.93 | 2023-01-01       | Los Angeles
        1048579 |           2 |          2 |        1 | 75.34 | 2023-01-01       | Chicago
        1048580 |           1 |          1 |        2 | 26.36 | 2023-01-01       | Chicago
        1048581 |           2 |          2 |        2 | 95.09 | 2023-01-01       | Los Angeles
        1048582 |           2 |          2 |        2 | 90.85 | 2023-01-01       | Los Angeles
        1048583 |           2 |          1 |        1 |  2.88 | 2023-01-01       | Los Angeles
        1048584 |           1 |          1 |        2 |  8.58 | 2023-01-01       | Chicago
        1048585 |           2 |          1 |        2 | 61.93 | 2023-01-01       | Los Angeles
        1048586 |           2 |          2 |        1 | 99.82 | 2023-01-01       | Chicago
(10 rows)

VMart=>

Conclusion

Populating a Vertica database with random data is a straightforward process that can significantly enhance your testing and development efforts. By following the steps outlined in this article, you can create a rich dataset that mimics real-world sales transactions. This setup can be especially useful for performance testing, data analysis, and application development.

DEV Community

Populating a Vertica Database with Random Data

Step 1: Creating the Sales Table

Step 2: Inserting Random Data

Step 3: Verifying Data Insertion

Conclusion

Top comments (0)

Read next

Create a Windows 11 VM that is highly available

Understanding Kubernetes: part 53 – Kubernetes 1.32 Changelog

8 Must-Learn Backend Development Tools for 2025: Your Ultimate Guide

Move Data from Oracle to ClickHouse in Minutes