Apache AGE (A Graph Extension) is a powerful graph extension for the PostgreSQL Database. It provides a rich set of statistical and mathematical functions to analyze and manipulate data efficiently.
In the last blog post, we explored the features and syntax of the percentileCont() function in Apache AGE.
Today, we will delve into another essential function offered by Apache AGE: percentileDisc(). This function is also used to calculate percentiles in a dataset. Let's explore it in detail and understand how and when to use it.
What are Percentiles?
Before we dive into the specifics of the function, let us briefly understand the concept of percentiles. A percentile is a statistical measure that indicates the relative position of a particular value within a dataset. It represents the value below which a given percentage of observations in the dataset falls. For example, the 75th percentile (also known as the third quartile) indicates that 75% of the values in the dataset are below that specific value.
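As a quick illustration of that definition, the Python sketch below counts what share of a small dataset falls below a given value. The scores and the helper name fraction_below are made up for this example and are not part of Apache AGE.

```python
def fraction_below(values, threshold):
    """Return the fraction of observations strictly below `threshold`."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical exam scores, used only to illustrate the definition.
scores = [55, 60, 65, 70, 75, 80, 85, 90]

# Six of the eight scores are below 85, so 85 sits at the 75th percentile.
print(fraction_below(scores, 85))  # 0.75
```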
The percentileDisc Function
The percentileDisc() function in Apache AGE uses a rounding method and calculates the nearest value to the percentile. It returns the value from the dataset that represents the nearest data point to the requested percentile. Unlike percentileCont(), it does not perform any interpolation. The function computes the percentile of the given expression over a group, with the percentile expressed as a number from 0.0 to 1.0.
Like percentileCont(), it takes two arguments (the expression and the percentile) and returns a float.
Query Syntax
Given this dataset:
demo=# SELECT * FROM cypher('percentile', $$
demo$# CREATE (:Person {name: 'Paul', age: 20}), (:Person {name: 'Mark', age: 22}),
demo$# (:Person {name: 'Peter', age: 42}), (:Person {name: 'Bob', age: 12}),
demo$# (:Person {name: 'Robin', age: 30}), (:Person {name: 'Grace', age: 24}),
demo$# (:Person {name: 'Martha', age: 32}), (:Person {name: 'Keith', age: 28})
demo$# $$) AS (a agtype);
a
---
(0 rows)
demo=# SELECT * FROM cypher('percentile', $$
demo$# MATCH (n:Person)
demo$# RETURN percentileDisc(n.age, 0.5)
demo$# $$) as (percentile_disc_age agtype);
percentile_disc_age
---------------------
24.0
(1 row)
In this case, the 50th percentile of the values in the age property is returned.
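To see where the 24.0 comes from, here is a minimal Python sketch of a discrete percentile. It assumes percentileDisc() follows the same rule as PostgreSQL's percentile_disc aggregate, which returns the first value in sorted order whose position equals or exceeds the requested fraction. That equivalence is an assumption rather than something verified here against the AGE source, and the name percentile_disc_sketch is made up for illustration.

```python
import math


def percentile_disc_sketch(values, fraction):
    """Return the first sorted value whose position reaches `fraction`.

    Assumption: this mirrors the rule PostgreSQL documents for
    percentile_disc; it is not taken from the Apache AGE source code.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    ordered = sorted(values)
    if not ordered:
        return None
    # 1-based rank of the first element with rank / n >= fraction.
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# The ages from the dataset created above.
ages = [20, 22, 42, 12, 30, 24, 32, 28]

# Sorted: 12, 20, 22, 24, 28, 30, 32, 42. Rank ceil(0.5 * 8) = 4 is 24.
print(percentile_disc_sketch(ages, 0.5))  # 24
```

The result is always one of the input values, which matches the 24.0 returned by the query (AGE reports it as a float).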
Things to Note:
- The percentile argument is a numeric value between 0 and 1, representing the desired percentile. For example, 0.5 represents the 50th percentile (median).
- If the percentile falls between two data points, this function returns the value from the dataset that is closest to the requested percentile.
- The percentileDisc() function returns a discrete value that exists in the dataset, not an interpolated estimate.
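For contrast, the sketch below shows what an interpolated median of the same ages looks like. It assumes the common linear-interpolation rule (position = fraction * (n - 1), as PostgreSQL documents for percentile_cont); the helper name interpolated_percentile is hypothetical, and this is not the AGE implementation.

```python
import math


def interpolated_percentile(values, fraction):
    """Linearly interpolate between the two closest ranks.

    Assumption: position = fraction * (n - 1), the rule PostgreSQL
    documents for percentile_cont.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# The ages from the dataset used in this post.
ages = [20, 22, 42, 12, 30, 24, 32, 28]

estimate = interpolated_percentile(ages, 0.5)
print(estimate)          # 26.0, halfway between 24 and 28
print(estimate in ages)  # False: nobody in the dataset is 26
print(24 in ages)        # True: the discrete result is a real data point
```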
Use Cases
- The percentileDisc() function is appropriate when you need to find the closest actual value in the dataset for a given percentile.
- The percentileDisc() function is commonly used when dealing with discrete or categorical data, where linear interpolation does not make sense.
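The second use case can be sketched in Python with ordered labels, where there is nothing to interpolate between. The size scale, the order data, and the helper name median_label are all hypothetical, and the sketch makes no claim about which property types AGE's percentileDisc() accepts. It only illustrates why a discrete pick suits ordinal data.

```python
import math

# Hypothetical ordinal scale: sizes have an order but no numeric distance.
SIZE_ORDER = ["S", "M", "L", "XL"]


def median_label(labels, order):
    """Pick the label at the 50th percentile position of the sorted labels."""
    rank_of = {label: index for index, label in enumerate(order)}
    ordered = sorted(labels, key=lambda label: rank_of[label])
    if not ordered:
        return None
    rank = max(1, math.ceil(0.5 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical T-shirt orders.
orders = ["M", "S", "L", "M", "XL", "M", "L", "S"]

# Sorted: S, S, M, M, M, L, L, XL. The 4th of 8 labels is "M".
print(median_label(orders, SIZE_ORDER))  # M
```

An interpolated answer such as "halfway between M and L" would not correspond to any real size, which is why a discrete percentile is the natural choice here.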
Conclusion
Understanding the difference between the percentileCont() and percentileDisc() functions in Apache AGE is crucial for accurate percentile calculations. While percentileCont() provides a continuous estimate through linear interpolation, percentileDisc() returns the closest actual value from the dataset. Choosing the appropriate function depends on the nature of your data and the level of precision required in your analysis. By considering the characteristics of your dataset and the desired outcome, you can leverage these functions effectively to derive meaningful insights from your data.
References
- The Aggregation functions in Apache AGE
- Mastering the percentileCont Function in Apache AGE
- Visit Apache AGE Website: https://age.apache.org/
- Visit Apache AGE GitHub: https://github.com/apache/age
- Visit Apache AGE Viewer GitHub: https://github.com/apache/age-viewer