Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
Window functions in SQL allow you to perform calculations across a set of table rows related to the current row while maintaining all the rows of the original dataset.
1. ROW_NUMBER() Function
Purpose: Assigns a unique sequential integer to rows within a partition of a result set, starting from 1.
Use Case: When you want to give each row a unique identifier within a group, regardless of ties.
Example: Assume you have a table employees with columns employee_id, department_id, and salary.
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM employees;
Explanation: This query assigns a unique row number to each employee within each department, sorted by descending salary. If two employees in the same department have the same salary, they will have different row numbers.
2. RANK() Function
Purpose: Assigns a rank to each row within a partition of a result set, with rows having equal values receiving the same rank. The rank increments by the number of tied rows.
Use Case: Useful when you need to rank data with ties and want to skip ranks after ties.
Example:
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
Explanation: Employees within the same department are ranked by salary in descending order. If two employees have the same salary, they receive the same rank, and the next rank is incremented accordingly. For example, if two employees are ranked 1, the next rank is 3.
3. DENSE_RANK() Function
Purpose: Similar to RANK(), but does not leave gaps in the ranking sequence.
Use Case: Useful when you need consecutive ranking without gaps.
Example:
SELECT
employee_id,
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM employees;
Explanation: Employees with the same salary receive the same rank, but the next rank is not skipped. For instance, if two employees are ranked 1, the next rank is 2.
4. NTILE() Function
Purpose: Divides the result set into a specified number of roughly equal groups and assigns a number to each row indicating which group it belongs to.
Use Case: Useful for distributing data into quantiles or groups.
Example:
SELECT
employee_id,
department_id,
salary,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile
FROM employees;
Explanation: The NTILE(4) function divides the employees into four groups within each department, sorted by salary in descending order. Each row is assigned a number from 1 to 4, representing the group it belongs to.
5. LEAD()
Provides access to a subsequent row (following row) in the result set without the need for self-joins.
Syntax:
LEAD(column_name, offset, default_value) OVER (PARTITION BY column_name ORDER BY column_name)
offset: The number of rows forward from the current row. The default is 1.
default_value: A default value to return if the lead row is beyond the bounds of the window.
Use Case: Compare a row with its following row or access future values in a time series.
Example:
SELECT
EmployeeID,
Name,
Salary,
LEAD(Salary, 1) OVER (ORDER BY Salary) AS NextSalary
FROM Employees;
6. LAG()
Similar to LEAD(), but provides access to a previous row in the result set.
Syntax:
LAG(column_name, offset, default_value) OVER (PARTITION BY column_name ORDER BY column_name)
Use Case: Compare a row with its preceding row or access historical values in a time series.
Example:
SELECT
EmployeeID,
Name,
Salary,
LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary
FROM Employees;
Aggregate Functions over Windows: SUM(), AVG(), MIN(), MAX()
Aggregate functions can also be applied over a window of rows using the OVER() clause to provide cumulative or moving calculations.
Examples:
SUM():
Computes the running total or cumulative sum over a window of rows.
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY salary DESC) AS cumulative_salary
FROM employees;
Explanation: Calculates a running total of salaries in descending order. Each row shows the total salary up to that point.
AVG():
Computes the average value over a window of rows.
SELECT
employee_id,
department_id,
salary,
AVG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS avg_salary
FROM employees;
Explanation: Calculates the average salary for each employee within their department, ordered by salary. The average is recomputed for each row in the result set.
MIN() and MAX():
Determine the minimum or maximum value over a window of rows.
SELECT
employee_id,
department_id,
salary,
MIN(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS min_salary
Top comments (0)