Utilisation de pandas.groupby et de pandas.cut
Imaginons qu'on ait un Dataframe représentant une time serie qui représente l'élévation du niveau d'eau et la direction de la houle à chaque pas de temps.
#+BEGIN_SRC
import pandas as pd
import numpy as np
import random as rdn
# build fake dataframe
df = pd.DataFrame()
df['dir'] = [rdn.randint(0, 360) for _ in range(100)]
df['elevation'] = [rdn.uniform(-3, 3) for _ in range(100)]
print("df:\n", df)
#+END_SRC
#+RESULTS:
df:
dir elevation
0 48 2.437835
1 4 1.489722
2 130 -0.576090
3 169 2.721867
4 315 -1.120620
.. ... ...
95 247 -1.256091
96 301 2.014663
97 175 0.901548
98 289 2.903849
99 98 -0.712153
[100 rows x 2 columns]
Cela peut être fastidieux à lire. On peut vouloir lire les informations par secteur. On effectue alors un binning.
#+BEGIN_SRC python
# create the bins
bins = pd.cut(df['dir'], np.arange(0, 390, 30))
df_grp = df.groupby(bins).describe()['elevation']
print("df_grp:\n", df_grp)
#+END_SRC
#+RESULTS:
df_grp:
count mean std ... 50% 75% max
dir ...
(0, 30] 5.0 0.249184 1.670480 ... 0.605261 1.489722 1.771682
(30, 60] 6.0 0.530561 2.120228 ... 1.271229 2.242043 2.437835
(60, 90] 14.0 0.280099 1.937723 ... 1.548581 1.775393 2.223731
(90, 120] 9.0 -0.894609 1.444726 ... -0.712153 -0.239352 1.684734
(120, 150] 5.0 -0.542193 0.655393 ... -0.576090 -0.259147 0.374463
(150, 180] 11.0 -0.000794 1.906180 ... -0.093552 1.132952 2.759483
(180, 210] 5.0 -0.104412 1.175992 ... -0.471943 1.037076 1.221931
(210, 240] 11.0 0.447931 1.605676 ... 0.832420 1.552965 2.335645
(240, 270] 4.0 -0.556329 1.708697 ... -1.300140 -0.443048 1.996078
(270, 300] 12.0 -0.406191 1.708407 ... -0.800965 -0.041385 2.903849
(300, 330] 11.0 0.168334 1.585117 ... 0.135546 1.177059 2.553054
(330, 360] 7.0 0.411437 1.725589 ... 0.466246 1.667764 2.716199
[12 rows x 8 columns]
Pour centrer autour de 0
condition = (df["dir"] > 360 - bin_size/2) | (df["dir"] < bin_size/2)
df1 = df[condition]
df2 = df[~condition]
Top comments (0)