DEV Community

Cover image for Populating a Vertica Database with Random Data
Dmitry Romanoff
Dmitry Romanoff

Posted on

Populating a Vertica Database with Random Data

In this article, we’ll explore how to populate a Vertica database table with random data, specifically focusing on a sales table. This process is beneficial for testing, simulating data scenarios, and validating query performance without relying on real data. Let's walk through the entire procedure, from table creation to data insertion.

Step 1: Creating the Sales Table

First, we need to define the structure of our sales table. This table will contain various fields such as transaction_id, customer_id, product_id, quantity, price, transaction_date, and location.

Here’s the SQL statement to create the table:

CREATE TABLE sales (
    transaction_id INT,
    customer_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    transaction_date DATE,
    location VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

This table design captures essential attributes of sales transactions, making it suitable for various types of analyses.

Step 2: Inserting Random Data

Next, we’ll insert random data into the sales table. This is where we leverage SQL functions to generate a diverse range of values for each column.

The Insert Statement

Here’s the SQL statement used to populate the sales table:

INSERT INTO sales
SELECT
    ROW_NUMBER() OVER () AS transaction_id,
    (RANDOM() % 1000000) + 1 AS customer_id,
    (RANDOM() % 10000) + 1 AS product_id,
    (RANDOM() % 10) + 1 AS quantity,
    ROUND(RANDOM() * 100, 2) AS price,
    DATE '2023-01-01' + (RANDOM() % 365) AS transaction_date,
    (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[RANDOM() % 5 + 1] AS location 
FROM (
    SELECT 1 AS dummy FROM 
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS a,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS b,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS c,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS d,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS e,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS f,
        (SELECT 1 UNION ALL 
        SELECT 2 UNION ALL 
        SELECT 3 UNION ALL 
        SELECT 4 UNION ALL 
        SELECT 5 UNION ALL 
        SELECT 6 UNION ALL 
        SELECT 7 UNION ALL 
        SELECT 8 UNION ALL 
        SELECT 9 UNION ALL 
        SELECT 0) AS g
) AS numbers
LIMIT 10000000;
Enter fullscreen mode Exit fullscreen mode

Breakdown of the Insert Statement

  • Generating Unique IDs: ROW_NUMBER() OVER () generates a unique transaction ID for each row.
  • Random Customer and Product IDs: (RANDOM() % 1000000) + 1 ensures we generate random customer IDs, and similar logic is applied for product IDs.
  • Quantity and Price: We randomly generate quantities between 1 and 10 and prices up to 100, rounded to two decimal places.
  • Transaction Dates: By adding a random number of days to a base date (January 1, 2023), we generate random transaction dates within a year.
  • Locations: We select a random location from a predefined array of cities.

The final statement inserts 10 million random records into the sales table, simulating a realistic sales dataset.

Step 3: Verifying Data Insertion

After executing the insertion, you can verify the number of rows in the sales table with:

SELECT COUNT(1) FROM sales;
Enter fullscreen mode Exit fullscreen mode

You should see an output confirming that 10 million records were successfully inserted:

 count   
----------
 10000000
Enter fullscreen mode Exit fullscreen mode

Sample Data

To view a few entries from the sales table, you can run:

SELECT * FROM sales LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

This will display the first 10 records, allowing you to inspect the structure and values.

VMart=> select * from sales limit 10;
 transaction_id | customer_id | product_id | quantity | price | transaction_date |  location   
----------------+-------------+------------+----------+-------+------------------+-------------
        1048577 |           2 |          2 |        2 | 15.67 | 2023-01-01       | Los Angeles
        1048578 |           1 |          2 |        1 | 60.93 | 2023-01-01       | Los Angeles
        1048579 |           2 |          2 |        1 | 75.34 | 2023-01-01       | Chicago
        1048580 |           1 |          1 |        2 | 26.36 | 2023-01-01       | Chicago
        1048581 |           2 |          2 |        2 | 95.09 | 2023-01-01       | Los Angeles
        1048582 |           2 |          2 |        2 | 90.85 | 2023-01-01       | Los Angeles
        1048583 |           2 |          1 |        1 |  2.88 | 2023-01-01       | Los Angeles
        1048584 |           1 |          1 |        2 |  8.58 | 2023-01-01       | Chicago
        1048585 |           2 |          1 |        2 | 61.93 | 2023-01-01       | Los Angeles
        1048586 |           2 |          2 |        1 | 99.82 | 2023-01-01       | Chicago
(10 rows)

VMart=> 

Enter fullscreen mode Exit fullscreen mode

Conclusion

Populating a Vertica database with random data is a straightforward process that can significantly enhance your testing and development efforts. By following the steps outlined in this article, you can create a rich dataset that mimics real-world sales transactions. This setup can be especially useful for performance testing, data analysis, and application development.

Top comments (0)