In this article, we’ll explore how to populate a Vertica database table with random data, specifically focusing on a sales table. This process is beneficial for testing, simulating data scenarios, and validating query performance without relying on real data. Let's walk through the entire procedure, from table creation to data insertion.
Step 1: Creating the Sales Table
First, we need to define the structure of our sales table. This table will contain various fields such as transaction_id, customer_id, product_id, quantity, price, transaction_date, and location.
Here’s the SQL statement to create the table:
CREATE TABLE sales (
transaction_id INT,
customer_id INT,
product_id INT,
quantity INT,
price DECIMAL(10, 2),
transaction_date DATE,
location VARCHAR(50)
);
This table design captures essential attributes of sales transactions, making it suitable for various types of analyses.
Step 2: Inserting Random Data
Next, we’ll insert random data into the sales table. This is where we leverage SQL functions to generate a diverse range of values for each column.
The Insert Statement
Here’s the SQL statement used to populate the sales table:
INSERT INTO sales
SELECT
ROW_NUMBER() OVER () AS transaction_id,
(RANDOM() % 1000000) + 1 AS customer_id,
(RANDOM() % 10000) + 1 AS product_id,
(RANDOM() % 10) + 1 AS quantity,
ROUND(RANDOM() * 100, 2) AS price,
DATE '2023-01-01' + (RANDOM() % 365) AS transaction_date,
(ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[RANDOM() % 5 + 1] AS location
FROM (
SELECT 1 AS dummy FROM
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS a,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS b,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS c,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS d,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS e,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS f,
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 0) AS g
) AS numbers
LIMIT 10000000;
Breakdown of the Insert Statement
- Generating Unique IDs: ROW_NUMBER() OVER () generates a unique transaction ID for each row.
- Random Customer and Product IDs: (RANDOM() % 1000000) + 1 ensures we generate random customer IDs, and similar logic is applied for product IDs.
- Quantity and Price: We randomly generate quantities between 1 and 10 and prices up to 100, rounded to two decimal places.
- Transaction Dates: By adding a random number of days to a base date (January 1, 2023), we generate random transaction dates within a year.
- Locations: We select a random location from a predefined array of cities.
The final statement inserts 10 million random records into the sales table, simulating a realistic sales dataset.
Step 3: Verifying Data Insertion
After executing the insertion, you can verify the number of rows in the sales table with:
SELECT COUNT(1) FROM sales;
You should see an output confirming that 10 million records were successfully inserted:
count
----------
10000000
Sample Data
To view a few entries from the sales table, you can run:
SELECT * FROM sales LIMIT 10;
This will display the first 10 records, allowing you to inspect the structure and values.
VMart=> select * from sales limit 10;
transaction_id | customer_id | product_id | quantity | price | transaction_date | location
----------------+-------------+------------+----------+-------+------------------+-------------
1048577 | 2 | 2 | 2 | 15.67 | 2023-01-01 | Los Angeles
1048578 | 1 | 2 | 1 | 60.93 | 2023-01-01 | Los Angeles
1048579 | 2 | 2 | 1 | 75.34 | 2023-01-01 | Chicago
1048580 | 1 | 1 | 2 | 26.36 | 2023-01-01 | Chicago
1048581 | 2 | 2 | 2 | 95.09 | 2023-01-01 | Los Angeles
1048582 | 2 | 2 | 2 | 90.85 | 2023-01-01 | Los Angeles
1048583 | 2 | 1 | 1 | 2.88 | 2023-01-01 | Los Angeles
1048584 | 1 | 1 | 2 | 8.58 | 2023-01-01 | Chicago
1048585 | 2 | 1 | 2 | 61.93 | 2023-01-01 | Los Angeles
1048586 | 2 | 2 | 1 | 99.82 | 2023-01-01 | Chicago
(10 rows)
VMart=>
Conclusion
Populating a Vertica database with random data is a straightforward process that can significantly enhance your testing and development efforts. By following the steps outlined in this article, you can create a rich dataset that mimics real-world sales transactions. This setup can be especially useful for performance testing, data analysis, and application development.
Top comments (0)