DEV Community

es404020

Identifying Outliers in a data set

What are Outliers

An outlier is an extremely high or extremely low value in our data. A value can be identified as an outlier if it is greater than Q3 + 1.5(IQR) or lower than Q1 - 1.5(IQR).

IQR = Q3 - Q1

Note:

  • IQR means Interquartile Range

  • Q1 means first quartile

  • Q3 means third quartile
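The rule above is plain arithmetic, so it can be sketched in a few lines of Python. The helper names `iqr_fences` and `is_outlier`, and the quartile values 10 and 20, are hypothetical and chosen only to keep the numbers easy to follow.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True if value lies below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    lower, upper = iqr_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10
lower, upper = iqr_fences(q1=10, q3=20)
print(lower, upper)            # -5.0 35.0
print(is_outlier(40, 10, 20))  # True
print(is_outlier(12, 10, 20))  # False
```

With these quartiles, any value below -5 or above 35 would be treated as an outlier.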

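`import numpy as np

# Splitting into halves only works on sorted data with an even number of values
data = sorted([32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
               101, 105, 112, 116])

half = len(data) // 2

# Q1 is the median of the lower half, Q3 is the median of the upper half
Q1 = np.median(data[:half])
Q3 = np.median(data[half:])

IQR = Q3 - Q1

print(IQR)  # 34.0
`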
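Once the IQR is known, the outlier limits follow directly. Here is a minimal sketch that applies the 1.5(IQR) rule to the same data, using the same median-of-halves quartiles. The extra value 200 is hypothetical and is only there to show what a flagged value looks like.

```python
import numpy as np

data = sorted([32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
               101, 105, 112, 116])

half = len(data) // 2
q1 = np.median(data[:half])   # 62.5
q3 = np.median(data[half:])   # 96.5
iqr = q3 - q1                 # 34.0

# Limits from the rule: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(lower, upper)  # 11.5 147.5

# Every value in the data lies inside the limits, so nothing is flagged
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # []

# A hypothetical extra value of 200, checked against the same limits
flagged = [x for x in data + [200] if x < lower or x > upper]
print(flagged)  # [200]
```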

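Another example, this time using np.percentile on a pandas DataFrame column. Note that np.percentile interpolates linearly between data points by default, so its quartiles can differ slightly from the median-of-halves method above.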

import numpy as np
import pandas as pd
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})


# 75th and 25th percentiles (Q3 and Q1) of the 'points' column
q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25


iqr

5.75
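To go from the IQR to the actual outliers in pandas, one option is a boolean mask. This is a sketch rather than the only way to do it. It uses `Series.quantile`, which interpolates linearly by default like `np.percentile`. The original `points` values contain no outliers under this rule, so a hypothetical extreme value of 60 is appended for illustration.

```python
import pandas as pd

# 'points' values from the example above, plus one hypothetical extreme value (60)
points = pd.Series([25, 20, 14, 16, 27, 20, 12, 15, 14, 19, 60], name='points')

q25 = points.quantile(0.25)
q75 = points.quantile(0.75)
iqr = q75 - q25

# Limits from the rule: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
lower = q25 - 1.5 * iqr
upper = q75 + 1.5 * iqr

# Keep only the values that fall outside the limits
mask = (points < lower) | (points > upper)
outliers = points[mask]

print(outliers.tolist())  # [60]
```

The same mask can be inverted with `~mask` to keep only the non-outlier values.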