DEV Community

pacdev
pacdev

Posted on • Updated on

Beans !

Utilisation de pandas.groupby et de pandas.cut
Imaginons qu'on ait un Dataframe représentant une time serie qui représente l'élévation du niveau d'eau et la direction de la houle à chaque pas de temps.

#+BEGIN_SRC
import pandas as pd
import numpy as np
import random as rdn

# build fake dataframe
df = pd.DataFrame()
df['dir'] = [rdn.randint(0, 360) for _ in range(100)]
df['elevation'] = [rdn.uniform(-3, 3) for _ in range(100)]
print("df:\n", df)
#+END_SRC

#+RESULTS:
df:
     dir  elevation
0    48   2.437835
1     4   1.489722
2   130  -0.576090
3   169   2.721867
4   315  -1.120620
..  ...        ...
95  247  -1.256091
96  301   2.014663
97  175   0.901548
98  289   2.903849
99   98  -0.712153

[100 rows x 2 columns]
Enter fullscreen mode Exit fullscreen mode

Cela peut être fastidieux à lire. On peut vouloir lire les informations par secteur. On effectue alors un binning.

#+BEGIN_SRC python
# create the bins
bins = pd.cut(df['dir'], np.arange(0, 390, 30))

df_grp = df.groupby(bins).describe()['elevation']
print("df_grp:\n", df_grp)
#+END_SRC

#+RESULTS:
df_grp:
             count      mean       std  ...       50%       75%       max
dir                                    ...                              
(0, 30]       5.0  0.249184  1.670480  ...  0.605261  1.489722  1.771682
(30, 60]      6.0  0.530561  2.120228  ...  1.271229  2.242043  2.437835
(60, 90]     14.0  0.280099  1.937723  ...  1.548581  1.775393  2.223731
(90, 120]     9.0 -0.894609  1.444726  ... -0.712153 -0.239352  1.684734
(120, 150]    5.0 -0.542193  0.655393  ... -0.576090 -0.259147  0.374463
(150, 180]   11.0 -0.000794  1.906180  ... -0.093552  1.132952  2.759483
(180, 210]    5.0 -0.104412  1.175992  ... -0.471943  1.037076  1.221931
(210, 240]   11.0  0.447931  1.605676  ...  0.832420  1.552965  2.335645
(240, 270]    4.0 -0.556329  1.708697  ... -1.300140 -0.443048  1.996078
(270, 300]   12.0 -0.406191  1.708407  ... -0.800965 -0.041385  2.903849
(300, 330]   11.0  0.168334  1.585117  ...  0.135546  1.177059  2.553054
(330, 360]    7.0  0.411437  1.725589  ...  0.466246  1.667764  2.716199

[12 rows x 8 columns]

Enter fullscreen mode Exit fullscreen mode

Pour centrer autour de 0

condition = (df["dir"] > 360 - bin_size/2) | (df["dir"] < bin_size/2)
df1 = df[condition]
df2 = df[~condition]
Enter fullscreen mode Exit fullscreen mode

Top comments (0)