If you’re a data scientist, you likely spend a lot of time cleaning and manipulating data for use in your applications. One of the core libraries for preparing data is the Pandas library for Python.
In a previous post, we explored the background of Pandas and the basic usage of a Pandas DataFrame, the core data structure in Pandas. Check out that post if you want to get up to speed with the basics of Pandas.
In this post, we’ll explore a few of the core methods on Pandas DataFrames. These methods help you segment and review your DataFrames during your analysis.
We’ll cover:
- Using Pandas groupby to segment your DataFrame into groups.
- Exploring your Pandas DataFrame with count and value_counts.
Let’s get started.
Pandas Groupby
Pandas is typically used for exploring and organizing large volumes of tabular data, like a super-powered Excel spreadsheet. Often, you’ll want to organize a Pandas DataFrame into subgroups for further analysis.
For example, perhaps you have stock ticker data in a DataFrame, as we explored in the last post. Your Pandas DataFrame might look as follows:
>>> df
          date  symbol     open     high      low    close    volume
0   2019-03-01    AMZN  1655.13  1674.26  1651.00  1671.73   4974877
1   2019-03-04    AMZN  1685.00  1709.43  1674.36  1696.17   6167358
2   2019-03-05    AMZN  1702.95  1707.80  1689.01  1692.43   3681522
3   2019-03-06    AMZN  1695.97  1697.75  1668.28  1668.95   3996001
4   2019-03-07    AMZN  1667.37  1669.75  1620.51  1625.95   4957017
5   2019-03-01    AAPL   174.28   175.15   172.89   174.97  25886167
6   2019-03-04    AAPL   175.69   177.75   173.97   175.85  27436203
7   2019-03-05    AAPL   175.94   176.00   174.54   175.53  19737419
8   2019-03-06    AAPL   174.67   175.49   173.94   174.52  20810384
9   2019-03-07    AAPL   173.87   174.44   172.02   172.50  24796374
10  2019-03-01    GOOG  1124.90  1142.97  1124.75  1140.99   1450316
11  2019-03-04    GOOG  1146.99  1158.28  1130.69  1147.80   1446047
12  2019-03-05    GOOG  1150.06  1169.61  1146.19  1162.03   1443174
13  2019-03-06    GOOG  1162.49  1167.57  1155.49  1157.86   1099289
14  2019-03-07    GOOG  1155.72  1156.76  1134.91  1143.30   1166559
Perhaps we want to analyze this stock information on a symbol-by-symbol basis rather than mixing Amazon (“AMZN”) data with that of Google (“GOOG”) or Apple (“AAPL”).
This is where the Pandas groupby method is useful. You can use groupby to chunk up your data into subsets for further analysis.
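As a rough illustration of the idea, here is a minimal sketch that groups by the symbol column. It rebuilds only a small subset of the table above (two dates per symbol, and just the date, symbol, close, and volume columns). The variable names and the choice of mean and sum as aggregations are assumptions made for this example, not something the post prescribes.

```python
import pandas as pd

# A hand-built subset of the stock data shown above (assumption: two
# dates per symbol are enough to illustrate grouping).
df = pd.DataFrame(
    {
        "date": ["2019-03-01", "2019-03-04"] * 3,
        "symbol": ["AMZN", "AMZN", "AAPL", "AAPL", "GOOG", "GOOG"],
        "close": [1671.73, 1696.17, 174.97, 175.85, 1140.99, 1147.80],
        "volume": [4974877, 6167358, 25886167, 27436203, 1450316, 1446047],
    }
)

# groupby returns a DataFrameGroupBy object; the actual work happens
# when you iterate over it or call an aggregation.
grouped = df.groupby("symbol")

# Iterating yields (group key, sub-DataFrame) pairs, one per symbol.
for symbol, group in grouped:
    print(symbol, len(group))

# Aggregations run once per group and return one value per symbol.
mean_close = grouped["close"].mean()
total_volume = grouped["volume"].sum()

print(mean_close)
print(total_volume)
```

Each aggregation here returns a Series indexed by symbol, so the per-symbol results can be looked up with, for example, `mean_close["AMZN"]`.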
Read more on the Kite blog!