DEV Community

senkae.ll

Mastering MySQL Aggregate Functions: Simplifying Data Analysis

MySQL’s aggregate functions compute a single summary value, such as a count, a total, or an average, from a set of rows, either across a whole result set or per group when combined with GROUP BY. This article walks through 15 of them, with a note on how each one handles NULL values.

Tools:
- Database: MySQL Community 8.1
- GUI: SQLynx Pro 3.5.0

Sample Data:

CREATE TABLE student_score (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  subject VARCHAR(50),
  score INT
);

INSERT INTO
  student_score (id, name, subject, score)
VALUES
  (1, 'Tom', 'Math', 80),
  (2, 'Tom', 'English', 90),
  (3, 'Tim', 'English', 98),
  (4, 'Alice', 'Math', 85),
  (5, 'Alice', 'English', 87),
  (6, 'Bob', 'Math', 78),
  (7, 'Bob', 'Science', null),
  (8, 'Charlie', 'History', 92),
  (9, 'Charlie', 'Math', 81),
  (10, 'Diana', 'English', 93);

1. COUNT()

  • Purpose: Returns the number of rows in the result set or in each group, after any WHERE condition has been applied.
  • Note: COUNT(*) counts all rows, including those with NULL values. COUNT(column) counts non-NULL values in the specified column.
  • Example:
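A minimal sketch against the student_score sample table above (the column aliases are only illustrative). Bob's Science row has a NULL score, so the first two counts differ:

```sql
SELECT
  COUNT(*)             AS total_rows,     -- counts every row: 10
  COUNT(score)         AS scored_rows,    -- skips the NULL score: 9
  COUNT(DISTINCT name) AS student_count   -- distinct student names: 6
FROM student_score;
```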

2. SUM()

  • Purpose: Returns the sum of values in a numeric column.
  • Note: Only non-NULL values are included in the sum. If all values are NULL, or no rows match, it returns NULL rather than 0.
  • Example:
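One way to try this on the student_score sample table, first across all rows and then per subject (the aliases are illustrative):

```sql
-- Total of all non-NULL scores: 784 for the sample data
SELECT SUM(score) AS total_score
FROM student_score;

-- Per-subject totals; Science has only a NULL score, so its sum is NULL
SELECT subject, SUM(score) AS subject_total
FROM student_score
GROUP BY subject;
```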

3. AVG()

  • Purpose: Calculates the average value of a numeric column.
  • Note: Only non-NULL values are considered. AVG() returns NULL if there are no non-NULL values.
  • Example:
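A short sketch on the student_score sample table. The second query is an optional variation, assuming you want a missing score to count as zero instead of being skipped:

```sql
-- The NULL score is left out of both the sum and the row count,
-- so this is 784 / 9 (about 87.11), not 784 / 10
SELECT AVG(score) AS avg_score
FROM student_score;

-- Treat a missing score as 0: 784 / 10 = 78.4
SELECT AVG(COALESCE(score, 0)) AS avg_with_zero
FROM student_score;
```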

4. MAX()

  • Purpose: Returns the maximum value from a column.
  • Note: Works with numeric, date, and string types. Ignores NULL values.
  • Example:
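A sketch using the student_score sample table, with one numeric and one string column. The string result depends on the column's collation; the value in the comment assumes a default, case-insensitive collation:

```sql
-- Highest score overall: 98
SELECT MAX(score) AS highest_score
FROM student_score;

-- Highest score per subject (NULL for Science, which has no scored rows)
SELECT subject, MAX(score) AS highest_score
FROM student_score
GROUP BY subject;

-- On strings, MAX() returns the value that sorts last: 'Tom'
SELECT MAX(name) AS last_name_alphabetically
FROM student_score;
```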

5. MIN()

  • Purpose: Returns the minimum value from a column.
  • Note: Like MAX(), it works with numeric, date, and string types, and ignores NULL values.
  • Example:
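The same idea for MIN(), sketched on the student_score sample table (aliases are illustrative):

```sql
-- Lowest score overall: 78 (the NULL score is ignored, not treated as 0)
SELECT MIN(score) AS lowest_score
FROM student_score;

-- Lowest score per student
SELECT name, MIN(score) AS lowest_score
FROM student_score
GROUP BY name;
```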

6. GROUP_CONCAT()

  • Purpose: Concatenates values from a column into a single string, with an optional separator.
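  • Note: Useful for aggregating strings from different rows into one. You can specify a separator with SEPARATOR (the default is a comma) and control the order with ORDER BY inside the function. Only non-NULL values are concatenated. The result is truncated to the length set by the group_concat_max_len system variable (1024 by default).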
  • Example:
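A possible query on the student_score sample table that lists each student's subjects in one string. The ORDER BY and SEPARATOR clauses are optional:

```sql
SELECT
  name,
  GROUP_CONCAT(subject ORDER BY subject SEPARATOR ', ') AS subjects
FROM student_score
GROUP BY name;
-- e.g. Alice -> 'English, Math'
```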

7. JSON_ARRAYAGG()

  • Purpose: Aggregates values from multiple rows into a JSON array.
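  • Note: It converts the result set of a column or expression into a JSON array. The order of the elements is not guaranteed. Unlike most aggregate functions, it does not skip NULL values: a SQL NULL appears in the array as a JSON null.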
  • Example:
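A small sketch on the student_score sample table that collects each student's subjects into a JSON array (the alias is illustrative):

```sql
SELECT name, JSON_ARRAYAGG(subject) AS subjects
FROM student_score
GROUP BY name;
-- e.g. Tom -> ["Math", "English"] (element order is not guaranteed)
```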

8. JSON_OBJECTAGG()

  • Purpose: Aggregates key-value pairs from multiple rows into a JSON object.
  • Note: The first argument provides the keys, and the second provides the values for the resulting JSON object. A NULL key raises an error, while a NULL value is stored as a JSON null. If the same key occurs more than once, only one of its values is kept.
  • Example:
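A sketch on the student_score sample table that builds one subject-to-score object per student. The WHERE clause is an optional choice here, keeping the example to rows that have a score:

```sql
SELECT name, JSON_OBJECTAGG(subject, score) AS scores
FROM student_score
WHERE score IS NOT NULL
GROUP BY name;
-- e.g. Tom -> {"Math": 80, "English": 90}
```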

9. STD()

  • Purpose: Computes the standard deviation of a numeric column, reflecting the amount of variation or dispersion in the dataset.
  • Note: Both STD() and STDDEV() are aliases for STDDEV_POP(), which calculates the population standard deviation. Only non-NULL values are considered. If you need to compute the sample standard deviation, use STDDEV_SAMP().
  • Example:
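A quick way to confirm the aliases on the student_score sample table (the column aliases are illustrative):

```sql
SELECT
  STD(score)        AS std_score,
  STDDEV(score)     AS stddev_score,
  STDDEV_POP(score) AS stddev_pop_score
FROM student_score;
-- All three columns return the same value, about 6.33 for the sample data
```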

10. STDDEV_SAMP()

  • Purpose: Calculates the sample standard deviation of a numeric column, providing a measure of how spread out the values are in a sample dataset.
  • Note: Only non-NULL values are considered. Unlike STD() and STDDEV(), which calculate the population standard deviation, STDDEV_SAMP() divides by n-1 instead of n to correct for the bias of estimating from a sample. MySQL has no function named STD_SAMP().
  • Example:
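In MySQL the sample standard deviation function is spelled STDDEV_SAMP(). A sketch on the student_score sample table, shown next to the population version for comparison:

```sql
SELECT
  STDDEV_SAMP(score) AS sample_std,      -- divides by n - 1: about 6.72
  STDDEV_POP(score)  AS population_std   -- divides by n: about 6.33
FROM student_score;
```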

11. VAR_POP()

  • Purpose: Calculates the population variance of a numeric column, measuring how data points in the entire population are spread out.
  • Note: Only non-NULL values are considered. VAR_POP() is used when the data represents the entire population, dividing by n (the total number of data points).
  • Example:
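A sketch on the student_score sample table. The second column is only there to show that the variance is the square of the population standard deviation:

```sql
SELECT
  VAR_POP(score)            AS population_variance,  -- about 40.10
  POW(STDDEV_POP(score), 2) AS squared_std           -- same value, up to rounding
FROM student_score;
```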

12. VAR_SAMP()

  • Purpose: Calculates the sample variance of a numeric column, measuring how data points in a sample are spread out.
  • Note: Only non-NULL values are considered. VAR_SAMP() is used when the data represents a sample of the population, dividing by n-1 to adjust for sample size and avoid bias.
  • Example:
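A sketch on the student_score sample table, with VAR_POP() alongside to show the effect of the different divisor (9 scored rows, so 8 versus 9):

```sql
SELECT
  VAR_SAMP(score) AS sample_variance,      -- divides by n - 1: about 45.11
  VAR_POP(score)  AS population_variance   -- divides by n: about 40.10
FROM student_score;
```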

13. BIT_AND()

  • Purpose: Returns the bitwise AND of all values in a column.
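  • Note: Works on integer values and ignores NULL entries. If no rows match, it returns a value with all bits set (18446744073709551615 for 64-bit integer arguments) rather than NULL.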
  • Example:
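Scores are not a natural fit for bitwise operations, but they are enough to show the mechanics on the student_score sample table. BIN() is used here only to display the bits:

```sql
SELECT
  BIT_AND(score)      AS and_result,   -- 64: the only bit set in every score
  BIN(BIT_AND(score)) AS and_bits      -- '1000000'
FROM student_score;
```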

14. BIT_OR()

  • Purpose: Returns the bitwise OR of all values in a column.
  • Note: Like BIT_AND(), it operates on integers and ignores NULL entries. If no rows match, it returns 0.
  • Example:
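A sketch on the student_score sample table; BIN() is used only to display the bits:

```sql
SELECT
  BIT_OR(score)      AS or_result,   -- 127: every bit that is set in at least one score
  BIN(BIT_OR(score)) AS or_bits      -- '1111111'
FROM student_score;
```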

15. BIT_XOR()

  • Purpose: Returns the bitwise XOR of all values in a column.
  • Note: Bitwise XOR can be useful for parity checks or similar tasks.
  • Example:
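A sketch on the student_score sample table. A bit ends up set in the result when it is set in an odd number of the scores:

```sql
SELECT
  BIT_XOR(score)      AS xor_result,   -- 116 for the sample data
  BIN(BIT_XOR(score)) AS xor_bits      -- '1110100'
FROM student_score;
```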


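These aggregate functions cover most summarizing and reporting needs, from simple counts and totals to dispersion statistics, string and JSON aggregation, and bitwise operations. When using them, check how each one handles NULL values, and keep version and SQL mode requirements in mind: JSON_ARRAYAGG() and JSON_OBJECTAGG() require MySQL 5.7.22 or later, and with ONLY_FULL_GROUP_BY enabled (the default in MySQL 8) every non-aggregated column in the SELECT list must appear in GROUP BY or be functionally dependent on it.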
