Python is a widely used programming language known for its simplicity and adaptability.
Pandas is a versatile, user-friendly Python package used mostly for working with data sets.
It is particularly well suited to cleaning, exploring, manipulating, and analyzing data.
In this tutorial I will be sharing some basic operations using pandas.
Let's dive right in!
Installing Pandas
Installing pandas is quite easy: open the Terminal if you are using a Mac, or the Command Prompt if you are on a PC.
Enter the following command: pip install pandas
Next, we want to import pandas.
Use this:
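A minimal sketch of the import, assuming pandas is already installed. The alias pd is the common convention and is the name used in the rest of this tutorial.

```python
# Import pandas under its conventional alias, pd
import pandas as pd

# Print the installed version to confirm the import worked
print(pd.__version__)
```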
Some quick background:
The Series and the DataFrame are the two core data structures in pandas. A Series is essentially a single column, whereas a DataFrame is a two-dimensional table made up of a collection of Series.
Let's look at how to create a DataFrame.
There are several ways to build a DataFrame from scratch, and using a simple dictionary is one of the best options.
Let's imagine we operate a fruit stand with a focus on selling apples and oranges. Our goal is to have a row for each customer's purchase and a column for each fruit.
To organize this data, we first put it in a Python dictionary, with one key per fruit.
Then we pass the dictionary to the pandas DataFrame constructor to create the DataFrame:
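Here is a small sketch of the fruit stand example. The purchase quantities are made-up values for illustration, and the variable names data and purchases are my own.

```python
import pandas as pd

# Illustrative purchase data (made-up values): one list per fruit,
# one entry per customer purchase
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Each dictionary key becomes a column; each list position becomes a row
purchases = pd.DataFrame(data)

print(purchases)
```

Each row gets a default integer index starting at 0, so the first customer's purchase is row 0.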
How to read data in pandas
Use pd.read_csv when you are working with a CSV file; an Excel file is read with pd.read_excel.
Loading the bikes dataset:
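The sketch below shows the read_csv call. Since the actual bikes file is not included here, it reads from an in-memory string instead; the column names, the rows, and the file name bikes.csv in the comment are all assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (made-up rows for illustration)
csv_text = """date,season,rentals
2011-01-01,winter,985
2011-01-02,winter,801
2011-01-03,winter,1349
"""

# pd.read_csv accepts a file path or a file-like object.
# With a real file you would write something like:
#   bikes = pd.read_csv("bikes.csv")  # hypothetical file name
bikes = pd.read_csv(io.StringIO(csv_text))

print(bikes)
```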
Let's take a look at the dataset.
.head() returns the first five rows of your DataFrame by default; you can also pass the number of rows you want.
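A short sketch of both forms of head(), using a small made-up DataFrame as a stand-in for the bikes dataset:

```python
import pandas as pd

# Small made-up DataFrame standing in for the bikes dataset
bikes = pd.DataFrame({
    "day": range(1, 11),
    "rentals": [985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 1321],
})

first_five = bikes.head()    # default: the first 5 rows
first_three = bikes.head(3)  # pass a number to change how many rows you get

print(first_five)
print(first_three)
```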
To get information about your data, run this command:
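The command in question is most likely info(), so here is a sketch under that assumption, again with a made-up stand-in DataFrame:

```python
import pandas as pd

# Made-up stand-in data with one missing value in the "season" column
bikes = pd.DataFrame({
    "season": ["winter", "spring", None],
    "rentals": [985, 801, 1349],
})

# info() prints the column names, non-null counts, dtypes and memory usage.
# It prints its report directly and returns None.
result = bikes.info()
```

The non-null counts in the report are a quick way to spot columns with missing values.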
To see the shape of your dataset:
shape is an attribute that lets you quickly see the number of rows and columns in your dataset.
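A minimal sketch with made-up data. Note that shape is an attribute, not a method, so it takes no parentheses:

```python
import pandas as pd

# Made-up stand-in data: 3 rows, 2 columns
bikes = pd.DataFrame({
    "season": ["winter", "winter", "spring"],
    "rentals": [985, 801, 1349],
})

# shape is a (rows, columns) tuple
rows, columns = bikes.shape

print(rows, columns)
```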
Dropping Duplicates
drop_duplicates() is a method used to remove duplicate rows.
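A small sketch with a made-up DataFrame that contains one repeated row:

```python
import pandas as pd

# Made-up stand-in data; the second and third rows are identical
bikes = pd.DataFrame({
    "day": [1, 2, 2, 3],
    "rentals": [985, 801, 801, 1349],
})

# drop_duplicates() returns a new DataFrame and leaves the original unchanged
deduped = bikes.drop_duplicates()

print(bikes.shape)
print(deduped.shape)
```

Because the result is a new DataFrame, assign it back to a variable if you want to keep the change.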
Selecting Columns:
Selecting a column by name makes it simple to clean it up as needed; some datasets have column names containing symbols, mixed upper- and lowercase letters, spaces, and typos.
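The text does not say exactly which call it has in mind. One common approach, assumed here, is to inspect the columns attribute, tidy up the names, and then select a column by its cleaned name. The messy column names below are made up for illustration.

```python
import pandas as pd

# Made-up stand-in data with untidy column names
bikes = pd.DataFrame({
    "Rental Count ": [985, 801],
    "SEASON": ["winter", "winter"],
})

# List the current column names
print(bikes.columns)

# Normalize the names: strip whitespace, lowercase, spaces to underscores
bikes.columns = bikes.columns.str.strip().str.lower().str.replace(" ", "_")

# Select a single column by name (this returns a Series)
rentals = bikes["rental_count"]

print(rentals)
```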
Checking Missing Values:
You'll probably come across missing or null values when analyzing data; these are simply placeholders for values that don't exist.
isnull() returns a DataFrame of the same shape in which each cell is True if the original value is null and False otherwise.
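A short sketch with made-up data. Chaining sum() onto isnull() is a common way to count the nulls in each column, since True counts as 1:

```python
import pandas as pd

# Made-up stand-in data with one missing value in each column
bikes = pd.DataFrame({
    "season": ["winter", None, "spring"],
    "rentals": [985.0, 801.0, None],
})

# Same shape as bikes, with True wherever a value is missing
null_mask = bikes.isnull()

# Number of missing values per column
null_counts = bikes.isnull().sum()

print(null_mask)
print(null_counts)
```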
Removing Missing Values:
Use dropna() to drop rows with missing values; you can also drop columns with null values by setting axis=1:
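A minimal sketch of both variants of dropna(), using made-up data with a single missing value:

```python
import pandas as pd

# Made-up stand-in data; only the "season" column has a missing value
bikes = pd.DataFrame({
    "season": ["winter", None, "spring"],
    "rentals": [985, 801, 1349],
})

# Default (axis=0): drop every row that contains at least one null
rows_dropped = bikes.dropna()

# axis=1: drop every column that contains at least one null
cols_dropped = bikes.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

Dropping columns can discard a lot of data for the sake of a few missing values, so it is worth checking the null counts first.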
This tutorial is only an introduction to basic analysis in pandas, and many methods and functions are not covered here.
Methods such as nunique, describe, merge, pivot, unique, and many others will be covered in my next post.
Wrapping up
Data cleaning is one of the most important parts of analyzing data; as an analyst, it will occupy roughly 80% of your time.
The best way to improve is to work on more projects, and you can read more in the pandas documentation: https://pandas.pydata.org/docs/reference/general_functions.html.
Keep working!