DEV Community

Cover image for The Strength of Vertica DB: A Case Study in Sales Analytics
Dmitry Romanoff
Dmitry Romanoff

Posted on

The Strength of Vertica DB: A Case Study in Sales Analytics

In today’s data-driven world, organizations need robust database solutions that can handle vast amounts of information while delivering quick insights. Vertica DB stands out as an exemplary platform designed for high-performance analytics. This article showcases its capabilities through a case study involving a simulated sales dataset.

Creating the Sales Table

To illustrate the power of Vertica, we begin by creating a sales table that captures essential transaction details. The structure includes fields for transaction ID, customer ID, product ID, quantity, price, transaction date, and location:

CREATE TABLE sales (
    transaction_id INT,
    customer_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    transaction_date DATE,
    location VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

Generating a Massive Dataset

To fully leverage Vertica’s capabilities, we populate the sales table with a massive dataset of 50 million records. This is done using a series of SQL commands that utilize randomization to simulate realistic sales data:

INSERT INTO sales
SELECT
    ROW_NUMBER() OVER () AS transaction_id,
    (RANDOM() % 1000000) + 1 AS customer_id,
    (RANDOM() % 10000) + 1 AS product_id,
    (RANDOM() % 10) + 1 AS quantity,
    ROUND(RANDOM() * 100, 2) AS price,
    DATE '2023-01-01' + (RANDOM() % 365) AS transaction_date,
    (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[RANDOM() % 5 + 1] AS location 
FROM (
    SELECT 1 AS dummy FROM 
        (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0) AS a,
        -- (continued with multiple unions to create a large dataset)
) AS numbers
LIMIT 50000000;
Enter fullscreen mode Exit fullscreen mode

Once executed, we confirmed the insertion of 50 million records:

SELECT count(1) FROM sales;
Enter fullscreen mode Exit fullscreen mode

The result indicates the successful creation of our dataset.

Analyzing Sales Data

Total Sales by Product

Using Vertica’s advanced analytical capabilities, we can quickly aggregate sales data to identify the top-selling products. The query below summarizes total sales for each product, returning results in a matter of milliseconds:

SELECT product_id, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Result:

The top two products generate significant revenue, with total sales exceeding $1.87 billion for each.

Execution Time:

First fetch: 861.693 ms

Monthly Sales Trends

Understanding trends over time is crucial for businesses. The following query groups total sales by month, allowing us to see how sales fluctuate across the year:

SELECT DATE_TRUNC('month', transaction_date) AS month, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY month
ORDER BY month;
Enter fullscreen mode Exit fullscreen mode

Result:

The total sales for January 2023 amounted to over $3.74 billion.

Execution Time:

First fetch: 1150.676 ms

Running Total of Sales

For deeper insights, we can calculate a running total of sales over a specified range. This analysis helps businesses track performance trends over time:

SELECT transaction_date, 
       SUM(quantity * price) OVER (ORDER BY transaction_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS running_total
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Result:

This query returns a running total that enables the visualization of cumulative sales, which is invaluable for sales forecasting and performance analysis.

The Strength of Vertica DB

Conclusion

The results from this case study highlight the strengths of Vertica DB in handling large-scale data and performing complex analytics efficiently. With its ability to execute queries on vast datasets in seconds, Vertica emerges as a powerful tool for businesses seeking to harness the potential of their data.

Whether for tracking sales performance, identifying trends, or calculating cumulative metrics, Vertica’s performance is exemplary, making it a top choice for organizations focused on data analytics and business intelligence.

Top comments (0)