In this article, we discuss what window functions are and how they help you do your job as a data analyst or specialist.
Outline:
- What Are Window Functions?
- Types Of Window Functions
- Case Study Example
- How Are Window Functions Useful and Who Should Use Them?
What Are Window Functions?
Window functions are SQL functions that perform calculations across a set of rows related to the current row. Unlike a GROUP BY aggregation, they do not collapse those rows into one: every input row stays in the result and gains a computed value. The term window describes the set of rows on which the function operates. This window is defined using the OVER clause and can be based on criteria such as partitioning and ordering.
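Here’s what the window function syntax looks like, in general form: function_name(arguments) OVER (PARTITION BY partition_columns ORDER BY sort_columns)
Image credit: Database Star
- The OVER keyword marks the function call as a window function.
- The PARTITION BY clause splits the rows into groups (partitions), and the function is computed separately for each one. If it is omitted, the whole result set is treated as a single partition.
- The ORDER BY clause defines the order in which the rows of each partition are processed.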
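To see the three parts working together, here is a minimal sketch that runs a window query through Python's built-in sqlite3 module. The sales table, its columns, and its rows are invented for illustration, and the sketch assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 1, 100), ("North", 2, 150), ("South", 1, 80), ("South", 2, 120)],
)

query = """
SELECT region,
       month,
       amount,
       SUM(amount) OVER (          -- OVER marks SUM as a window function
           PARTITION BY region     -- one window per region
           ORDER BY month          -- accumulate in month order
       ) AS running_total
FROM sales
ORDER BY region, month
"""

running_totals = conn.execute(query).fetchall()
for row in running_totals:
    print(row)
# ('North', 1, 100, 100)
# ('North', 2, 150, 250)
# ('South', 1, 80, 80)
# ('South', 2, 120, 200)
```

All four input rows are still present in the output, and the running total restarts when the region changes, which is the effect of PARTITION BY.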
Types of Window Functions
Window functions fall into four main groups:
- Ranking Functions – Assign a rank or position to each row within its window, or split the rows into (close to) equal groups. E.g. ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), PERCENT_RANK().
- Analytic Functions – Give you access to the values of preceding or following rows (relative to the current row), or return the first or last value in the window. E.g. LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE().
- Aggregate Functions – Regular aggregate functions can be used as window functions. Most commonly, you count the values in the window, sum them, or find their average, minimum, or maximum. Because they run as window functions, you can partition by one or several columns, and the result is attached to every row instead of collapsing the rows into one per group. E.g. MAX(), MIN(), AVG(), SUM(), COUNT().
- Distribution Functions – Calculate the relative position of a value within the window, such as its cumulative distribution. E.g. CUME_DIST().
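As a rough illustration of the four groups, the sketch below applies one function from each to the same small table, again through Python's sqlite3 module. The scores table and its rows are hypothetical, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 10), ("Ben", 20), ("Cid", 30), ("Dee", 40)],
)

query = """
SELECT player,
       points,
       ROW_NUMBER() OVER (ORDER BY points) AS row_num,       -- ranking
       LAG(points)  OVER (ORDER BY points) AS prev_points,   -- analytic
       SUM(points)  OVER ()                AS total_points,  -- aggregate
       CUME_DIST()  OVER (ORDER BY points) AS cume_dist      -- distribution
FROM scores
ORDER BY points
"""

type_examples = conn.execute(query).fetchall()
for row in type_examples:
    print(row)
# ('Ann', 10, 1, None, 100, 0.25)
# ('Ben', 20, 2, 10, 100, 0.5)
# ('Cid', 30, 3, 20, 100, 0.75)
# ('Dee', 40, 4, 30, 100, 1.0)
```

LAG() returns NULL (None in Python) for the first row because there is no preceding row, and the empty OVER () makes the whole table one window, so every row carries the same total.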
The Data for the Examples
Let’s say we want to calculate the ranking of employees based on their salaries within different departments.
Consider the following sample database table named "Employee":
- employee_id – The ID of the employee and the primary key (PK) of the table.
- Employee_Name – The name of the employee
- Department – The department of the employees in the organization.
- Salary – The amount the employees earn in a month.
Now, let's calculate the rank of employees within each department based on their salaries. We'll use the RANK() function for this example:
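A minimal sketch of a query matching this description is shown here, run through Python's sqlite3 module so that it is self-contained. The table follows the columns listed above. The names Alice, Bob, Cathy, and John are taken from the discussion of the result, while the remaining rows, every salary figure, and the salary_rank alias are assumptions made for illustration. SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        employee_id   INTEGER PRIMARY KEY,
        Employee_Name TEXT,
        Department    TEXT,
        Salary        INTEGER
    )
""")
# Illustrative rows only: the salary values are assumptions.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1, "Alice", "IT", 6000),
    (2, "Bob",   "IT", 6000),
    (3, "Eve",   "IT", 5000),
    (4, "Cathy", "HR", 4500),
    (5, "John",  "HR", 4500),
    (6, "Mary",  "HR", 4000),
])

query = """
SELECT Employee_Name,
       Department,
       Salary,
       RANK() OVER (
           PARTITION BY Department   -- ranking restarts per department
           ORDER BY Salary DESC      -- highest salary gets rank 1
       ) AS salary_rank
FROM Employee
ORDER BY Department, salary_rank, Employee_Name
"""

ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
# ('Cathy', 'HR', 4500, 1)
# ('John', 'HR', 4500, 1)
# ('Mary', 'HR', 4000, 3)
# ('Alice', 'IT', 6000, 1)
# ('Bob', 'IT', 6000, 1)
# ('Eve', 'IT', 5000, 3)
```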
In this SQL query:
- The SELECT list returns each employee's name, department, and salary, plus the employee's rank within the department based on salary.
- The PARTITION BY clause divides the result set into partitions by the "Department" column. This means that ranking will restart for each department.
- The ORDER BY clause sorts the employees within each department by salary in descending order.
- The RANK() function assigns a rank to each employee based on their salary within their respective department.
In the result, each row shows an employee's name, department, salary, and rank. Within each department, employees are ranked by salary, with the highest salary receiving rank 1. The RANK() function handles ties by assigning the same rank to employees with equal salaries (as with Alice and Bob in the IT department, or Cathy and John in the HR department) and then skipping the next rank, so two employees tied at rank 1 are followed by rank 3.
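Tie handling is where the ranking functions differ most, so a side-by-side comparison may help. The sketch below uses Python's sqlite3 module with three hypothetical IT employees. The salary figures and column aliases are assumptions, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Employee_Name TEXT, Department TEXT, Salary INTEGER)"
)
# Illustrative rows only: Alice and Bob are tied on salary.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Alice", "IT", 6000),
    ("Bob",   "IT", 6000),
    ("Eve",   "IT", 5000),
])

query = """
SELECT Employee_Name,
       Salary,
       RANK()       OVER (ORDER BY Salary DESC) AS rnk,        -- ties share a rank, next rank is skipped
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS dense_rnk,  -- ties share a rank, no gap afterwards
       ROW_NUMBER() OVER (ORDER BY Salary DESC, Employee_Name) AS row_num  -- always unique
FROM Employee
ORDER BY row_num
"""

tie_comparison = conn.execute(query).fetchall()
for row in tie_comparison:
    print(row)
# ('Alice', 6000, 1, 1, 1)
# ('Bob', 6000, 1, 1, 2)
# ('Eve', 5000, 3, 2, 3)
```

Employee_Name is added to the ORDER BY of ROW_NUMBER() only to make the numbering of tied rows deterministic; without a tiebreaker, the order among equal salaries is not guaranteed.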
How Are SQL Window Functions Helpful?
Knowing SQL window functions makes writing complex reports easier. In business cases, they can help you rank data, analyze time series data, and make time period comparisons (e.g. year-to-year, quarter-to-quarter, month-to-month). They are useful for any data analyst who works with SQL. If you want to become an SQL expert and take your reporting to a new level, then window functions are for you.