Pandas - Brief

#panda #beginners #datascience #tutorial

What is pandas?
Pandas is a Python package built on top of NumPy, and its plotting features use Matplotlib.
It has about 14 million users.

DataFrame: a two-dimensional, mutable, tabular data structure whose columns can hold different data types (heterogeneous).

.info() Method: Prints a summary of the DataFrame: column names, non-null counts, dtypes, and memory usage.
.head() Method: Returns the first few rows (the “head” of the DataFrame), five by default.
.describe() Method: Computes summary statistics such as count, mean, standard deviation, min, max, and percentiles for numeric columns.
.values Attribute: Returns a NumPy representation of the DataFrame. The newer to_numpy() method is recommended over .values.
.columns Attribute: Holds the column labels of the DataFrame. For the data type of each column, use .dtypes.
.index Attribute: Holds the row labels of the DataFrame. By default these are the row numbers, starting at 0.
.shape Attribute: Returns a tuple of (number of rows, number of columns).
.size Attribute: Returns the total number of elements in the DataFrame (rows times columns).
.ndim Attribute: Returns the number of dimensions, which is 2 for a DataFrame.
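Here is a small sketch of the inspection tools listed above. The toy DataFrame (`df` with `name`, `age`, `city`) is made up purely for illustration and is not from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, 25, 40],
    "city": ["Pune", None, "Delhi"],
})

df.info()               # prints column names, non-null counts, dtypes, memory usage
print(df.head(2))       # first two rows
print(df.describe())    # summary statistics for the numeric column(s)
print(df.columns)       # Index of column labels
print(df.index)         # row labels (a RangeIndex here)
print(df.shape)         # (3, 3): rows, columns
print(df.size)          # 9: rows * columns
print(df.ndim)          # 2

array = df.to_numpy()   # NumPy array, preferred over df.values
```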
DataFrame column selecting
Select a single column with square brackets or with attribute (dot) access. To select multiple columns, use double square brackets: the outer pair is the DataFrame selection syntax and the inner pair is a Python list of column names.

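column1 = dataFrame['columnName']             # single column, returned as a Series
column1 = dataFrame.columnName                # same column via attribute access
columns = dataFrame[['columnName', 'col2']]   # several columns, returned as a DataFrame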
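To see the difference between single and double brackets, here is a minimal sketch on a made-up DataFrame (the column names are assumptions for illustration). Single brackets give back a Series, double brackets give back a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [31, 25],
    "city": ["Pune", "Delhi"],
})

ages = df["age"]               # one column -> Series
subset = df[["name", "age"]]   # list of columns -> DataFrame

print(type(ages).__name__)     # Series
print(type(subset).__name__)   # DataFrame
```

Attribute access such as `df.age` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the bracket form is the safer default.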

DataFrame row selecting with logical testing
And/or operators in row selection: combine conditions with & (and) and | (or), wrapping each condition in parentheses.
Specific value row selection: this selects the rows where the given column equals 'Value'. Other comparison operators (!=, <, >, <=, >=) work the same way.

row1 = dataFrame[dataFrame.column == 'Value']
row1 = dataFrame[dataFrame['column'] == 'Value']
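The and/or operators mentioned above can be sketched like this. The data and column names are invented for illustration. Note that pandas uses `&` and `|` rather than Python's `and` / `or`, and each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [31, 25, 40, 19],
    "city": ["Pune", "Delhi", "Pune", "Goa"],
})

# AND: both conditions must hold for a row to be kept
older_in_pune = df[(df["age"] > 30) & (df["city"] == "Pune")]

# OR: a row is kept if at least one condition holds
young_or_goa = df[(df["age"] < 20) | (df["city"] == "Goa")]

print(older_in_pune)
print(young_or_goa)
```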

Sorting DataFrame:

sortedDataFrame = dataFrame.sort_values('column_to_sort')
sortedDataFrame = dataFrame.sort_values(by = ['column_to_sort1', 'column_to_sort2'])

Sorting can be performed on numbers, dates, and strings.
Extra parameters:
ascending = True / False (a list sets the direction per column),
na_position = 'first' / 'last' (where to put NaN values).
Example:

homelessness_reg_fam = homelessness.sort_values(['region','family_members'],ascending=[True,False])
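The `na_position` parameter is easier to see on a tiny example. This is only a sketch on made-up data (the `region` and `members` columns are assumptions), with one missing value placed first in a descending sort.

```python
import pandas as pd

# Hypothetical toy data with one missing value, invented for this example
df = pd.DataFrame({
    "region": ["West", "East", "West", "East"],
    "members": [5, None, 9, 2],
})

# Largest values first, with the NaN row moved to the top
by_members = df.sort_values("members", ascending=False, na_position="first")

print(by_members)
```

Without `na_position="first"`, the NaN row would land at the bottom, since `"last"` is the default.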

isin() Method: isin() filters a DataFrame to the rows whose value in a given column matches any value in a list.

# The Mojave Desert states
canu = ["California", "Arizona", "Nevada", "Utah"]

# Filter for rows in the Mojave Desert states
mojave_homelessness = homelessness[homelessness.state.isin(canu)]

Adding New Column to DataFrame: Adding new columns is also called mutating or transforming a DataFrame, or feature engineering.

dataFrame['new_column'] = dataFrame['old_column'] * 2  # any transformation of existing columns
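As a concrete sketch of adding columns, the example below derives two new columns from existing ones. The data and the column names (`height_cm`, `weight_kg`) are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "height_cm": [150, 180],
    "weight_kg": [50.0, 81.0],
})

# New column from a single existing column
df["height_m"] = df["height_cm"] / 100

# New column combining two existing columns
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(df)
```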

Summary Statistics
Summary statistics summarise your data and help you learn more about it. Common methods are mean(), median(), mode(), min(), max(), var(), std(), sum(), and quantile().
The agg() method computes custom summary statistics. It accepts a single function or a list of functions. An example of a custom percentile is as follows.

def percentile30(column):
    # 30th percentile of the column
    return column.quantile(0.3)

dataFrame['columnName'].agg(percentile30)
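Since agg() can also take a list of functions, here is a minimal sketch applying two custom functions to two columns at once. The data, the column names, and the second helper `percentile50` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "qty": [1, 2, 3, 4],
})

def percentile30(column):
    # 30th percentile of the column
    return column.quantile(0.3)

def percentile50(column):
    # 50th percentile (the median) of the column
    return column.quantile(0.5)

# One function on one column -> a single number
single = df["price"].agg(percentile30)

# A list of functions on several columns -> a DataFrame,
# with one row per function and one column per input column
multi = df[["price", "qty"]].agg([percentile30, percentile50])

print(single)
print(multi)
```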

Functions like min() and max() also work on date columns.
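A quick sketch of min() and max() on dates, using an invented `sold_on` column. The strings are converted with `pd.to_datetime` first so that pandas treats them as real dates.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "sold_on": pd.to_datetime(["2021-03-01", "2020-12-25", "2021-01-15"]),
})

earliest = df["sold_on"].min()   # oldest date in the column
latest = df["sold_on"].max()     # newest date in the column
span = latest - earliest         # difference between them, as a Timedelta

print(earliest, latest, span)
```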

Calculating Cumulative Statistics
cumsum(), cummax(), cummin(), and cumprod() return the running sum, maximum, minimum, and product of a column, one value per row.
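The cumulative methods above can be sketched on a small made-up `sales` column. Each one returns a Series of the same length as the input, so the results fit neatly as new columns.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({"sales": [3, 1, 4, 2]})

df["cum_sum"] = df["sales"].cumsum()     # 3, 4, 8, 10
df["cum_max"] = df["sales"].cummax()     # 3, 3, 4, 4
df["cum_min"] = df["sales"].cummin()     # 3, 1, 1, 1
df["cum_prod"] = df["sales"].cumprod()   # 3, 3, 12, 24

print(df)
```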

To be Continued...
