
UTIBE BASSEY


HNG STAGE ZERO: ANALYZING RETAIL SALES DATA AT FIRST GLANCE

Introduction

The Kaggle dataset “Sample Sales Data” by Kyanyoga is a sample dataset for practicing sales analysis. It includes order details, product information, customer details, and geographical data, which support analyses such as sales trends, product performance, and revenue. It is useful for practicing data analysis, visualization, and predictive modeling.

  • Purpose: This report aims to analyze the provided sales data to extract meaningful insights that can help understand sales trends, product performance, and revenue generation.

  • Scope: This report covers data exploration, cleaning, analysis, and visualization of the sales data.

Observations

1. Dataset Overview:

  • Source: Kaggle — Sample Sales Data by Kyanyoga

  • Description: The dataset contains 2823 records and 25 columns. The columns cover order numbers, quantities, prices, sales amounts, dates, statuses, product lines, customer details, and geographical data.

2. Data Exploration
Data Structure:

  • Number of records: 2823
  • Number of features: 25 columns

Features include product line (PRODUCTLINE), order quantity (QUANTITYORDERED), sales value (SALES), and order date (ORDERDATE), among others.

Initial Observations: Summary statistics (mean, median, and standard deviation) were computed for the key numerical features.
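As a rough illustration of the summary statistics mentioned above, here is a minimal pandas sketch. It assumes the data is already loaded into a DataFrame; the helper name `summarize` and the four sample rows are made up for the example and are not taken from the dataset.

```python
import pandas as pd


def summarize(df, columns):
    """Return mean, median, and standard deviation for the given numeric columns."""
    return df[columns].agg(["mean", "median", "std"])


# Tiny made-up sample that reuses the dataset's column names (values are illustrative only).
sample = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 41, 45],
    "SALES": [2800.0, 2700.0, 3900.0, 3700.0],
})

summary = summarize(sample, ["QUANTITYORDERED", "SALES"])
print(summary)
```

On the real data, `df.shape` would be a quick way to confirm the 2823 records and 25 columns reported above.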

3. Data Cleaning
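Missing Values: Four columns are incomplete. Out of 2823 records:

  • ADDRESSLINE2: 302 non-null entries (2521 missing).
  • STATE: 1337 non-null entries (1486 missing).
  • POSTALCODE: 2747 non-null entries (76 missing).
  • TERRITORY: 1749 non-null entries (1074 missing).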
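The non-null counts listed above can be turned into a small missing-value report. The sketch below is one possible way to do it in pandas; `missing_report` is a hypothetical helper, and the sample rows are invented purely to make the example runnable.

```python
import pandas as pd


def missing_report(df):
    """List non-null count, missing count, and missing percentage per incomplete column."""
    report = pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
    })
    report["missing_pct"] = (report["missing"] / len(df) * 100).round(1)
    # Keep only the columns that actually have gaps.
    return report[report["missing"] > 0]


# Made-up rows: CITY is complete, STATE and TERRITORY have gaps.
sample = pd.DataFrame({
    "CITY": ["NYC", "Paris", "Madrid", "Lyon"],
    "STATE": ["NY", None, None, None],
    "TERRITORY": [None, "EMEA", "EMEA", "EMEA"],
})

report = missing_report(sample)
print(report)
```

Whether to drop, fill, or simply ignore these columns depends on the analysis; none of the charts in this post rely on them.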

Data Types:

  • Numerical: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES, QTR_ID, MONTH_ID, YEAR_ID, MSRP.
  • Categorical: STATUS, PRODUCTLINE, PRODUCTCODE, CUSTOMERNAME, PHONE, ADDRESSLINE1, ADDRESSLINE2, CITY, STATE, POSTALCODE, COUNTRY, TERRITORY, CONTACTLASTNAME, CONTACTFIRSTNAME, DEALSIZE.
  • Date/Time: ORDERDATE.
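The grouping above can be approximated programmatically. This is a sketch under the assumption that ORDERDATE has already been parsed with `pd.to_datetime`; `split_by_dtype` is a hypothetical helper and the sample values are invented. Identifier-like numbers such as ORDERNUMBER still land in the numerical group, exactly as in the list above.

```python
import pandas as pd


def split_by_dtype(df):
    """Group column names into numerical, date/time, and categorical (everything else)."""
    return {
        "numerical": df.select_dtypes(include="number").columns.tolist(),
        "datetime": df.select_dtypes(include="datetime").columns.tolist(),
        "categorical": df.select_dtypes(exclude=["number", "datetime"]).columns.tolist(),
    }


# Made-up rows using a few of the dataset's column names.
sample = pd.DataFrame({
    "ORDERNUMBER": [10100, 10101],
    "SALES": [2800.0, 2700.0],
    "STATUS": ["Shipped", "Shipped"],
    "ORDERDATE": ["2003-01-06", "2003-01-09"],
})
# ORDERDATE is read as text, so it is converted before classifying.
sample["ORDERDATE"] = pd.to_datetime(sample["ORDERDATE"])

groups = split_by_dtype(sample)
print(groups)
```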

Visualization

I visualized the sales data from three angles:

  • Sales Over Time
  • Top Products by Sales
  • Sales by Region

Sales Over Time

I used a line chart to visualize sales over time, from January 2003 to April 2005, based on the Sample Sales Data. The ORDERDATE column was first converted to the datetime data type. The chart is shown below.

[Image: line chart of total monthly sales, 2003 to 2005]

The line chart above shows total sales over time (2003 - 2005). The x-axis represents the date (by month), and the y-axis represents the total sales for each month. This visualization helps identify trends and patterns in sales over a given period.
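The monthly totals behind a chart like this can be computed as sketched below. This is not necessarily how the original chart was produced: `monthly_sales` is a hypothetical helper, the sample rows are invented, and the dates are written in ISO format for simplicity (the real file may need a `format=` argument in `pd.to_datetime`).

```python
import pandas as pd


def monthly_sales(df):
    """Sum SALES per calendar month of ORDERDATE."""
    dates = pd.to_datetime(df["ORDERDATE"])
    # to_period("M") maps each date to its month, e.g. 2003-01-15 -> 2003-01.
    return df.groupby(dates.dt.to_period("M"))["SALES"].sum()


# Made-up orders: two in January 2003, one in February 2003.
sample = pd.DataFrame({
    "ORDERDATE": ["2003-01-06", "2003-01-29", "2003-02-11"],
    "SALES": [1000.0, 500.0, 750.0],
})

monthly = monthly_sales(sample)
print(monthly)
```

With matplotlib installed, `monthly.plot(kind="line")` should draw the corresponding line chart. Note that months without any orders do not appear in the grouped result.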

Top Products by Sales

I used a bar chart to visualize the top products by sales. First, I aggregated the SALES column by PRODUCTLINE. The chart is shown below.

[Image: bar chart of sales by product line]

Each product line's share of total sales is as follows:

  • Motorcycles: 11.7%
  • Classic Cars: 34.3%
  • Trucks and Buses: 10.7%
  • Vintage Cars: 21.5%
  • Planes: 10.8%
  • Ships: 8.3%
  • Trains: 2.7%

From these figures, the top three product lines by sales are Classic Cars (34.3%), Vintage Cars (21.5%), and Motorcycles (11.7%).
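Percentage shares like the ones above come from dividing each group's total by the grand total. Here is a minimal sketch of that calculation; `sales_share` is a hypothetical helper, and the sample amounts are invented, so the printed shares will not match the figures reported in this post.

```python
import pandas as pd


def sales_share(df, by):
    """Return each group's share of total SALES in percent, largest first."""
    totals = df.groupby(by)["SALES"].sum()
    share = (totals / totals.sum() * 100).round(1)
    return share.sort_values(ascending=False)


# Made-up rows reusing three of the product line names.
sample = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Vintage Cars", "Classic Cars", "Trains"],
    "SALES": [400.0, 300.0, 200.0, 100.0],
})

share = sales_share(sample, "PRODUCTLINE")
print(share)
```

Calling `share.head(3)` gives the top three groups, and `share.plot(kind="bar")` should produce a bar chart when matplotlib is available.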

Sales by Region
To visualize sales by region, I aggregated sales by the COUNTRY column and plotted the result as a pie chart. The chart is shown below.

[Image: pie chart of sales by country]

The visualization shows the top three countries by sales: the US leads with 35.6%, followed by Spain with 12.1% and France with 11.1%.
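A pie chart with many countries can get crowded, so one option is to keep the top few and fold the rest into a single slice. The sketch below shows that idea; it is an assumption on my part rather than the exact method behind the chart above, `top_n_with_other` is a hypothetical helper, and the sample amounts are invented.

```python
import pandas as pd


def top_n_with_other(df, by, n=3):
    """Percentage share of SALES for the top n groups, with the remainder as 'Other'."""
    totals = df.groupby(by)["SALES"].sum().sort_values(ascending=False)
    top = totals.head(n)
    rest = totals.iloc[n:].sum()
    if rest > 0:
        # Fold every group outside the top n into a single slice.
        top = pd.concat([top, pd.Series({"Other": rest})])
    return (top / top.sum() * 100).round(1)


# Made-up totals for five countries.
sample = pd.DataFrame({
    "COUNTRY": ["USA", "Spain", "France", "Norway", "Japan"],
    "SALES": [500.0, 200.0, 150.0, 100.0, 50.0],
})

by_country = top_n_with_other(sample, "COUNTRY", n=3)
print(by_country)
```

With matplotlib installed, `by_country.plot(kind="pie")` should draw the slices.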

Conclusion

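This analysis is a first-glance overview of the sales transactions in the dataset. Classic Cars is the leading product line, and the US is the largest market by sales.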

Learn more about HNG by clicking on any of the links below:

https://hng.tech/internship
https://hng.tech/premium
