Python Pandas for Beginners - A Complete Guide (Part 1)
This series, Python Pandas for Beginners, is a starting point for anyone new to the pandas library. You will learn some of its most important features, such as exploring, cleaning, transforming, and visualizing data.
Pandas is an open-source library for Python and one of the most popular Python libraries for data analysis today. Combined with machine learning and visualization tools, it gives you a high-performance way to analyze large data sets.
In this post, we will go over the essentials of pandas, from installation to creating your first data structures. Make yourself a cup of coffee, grab your favorite biscuit, and read this article at your own pace. Feel free to stop and resume later; don't overwhelm yourself with too much information at once. Just follow along step by step, and pandas will come to you.
What is Pandas?
Pandas is a library for analytics, data processing, and data science. It's a huge open-source project with 1,500+ contributors, and its source code is hosted on GitHub.
Installation
The easiest way to install Pandas is with the Anaconda distribution. If you haven't installed Anaconda yet, read our Anaconda installation guide.
If you don't want to install Anaconda, you can install Pandas via pip.
pip install pandas
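To confirm that the installation worked, one quick check (a minimal sketch; the version printed depends on your environment) is to import the library and print its version:

```python
import pandas as pd

# Read the installed pandas version; the exact value depends on your environment.
version = pd.__version__
print(version)
```

If the import fails with a ModuleNotFoundError, pandas was most likely installed into a different Python environment than the one you are running.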
Data Structures of Pandas
The two primary data structures of Pandas are Series and DataFrame. A Series is simply a single column; when we join multiple Series (columns) together, we get a DataFrame.
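The relationship between the two structures can be sketched as follows. The names apples, oranges, and fruit are made up for illustration, and pd.concat is just one of several ways to join Series into a DataFrame:

```python
import pandas as pd

# Two Series sharing the same default index (0, 1, 2); the names are illustrative.
apples = pd.Series([3, 2, 0], name='apples')
oranges = pd.Series([0, 3, 7], name='oranges')

# Joining the Series side by side (axis=1) gives a DataFrame with one column per Series.
fruit = pd.concat([apples, oranges], axis=1)
print(fruit)

# Selecting a single column of the DataFrame gives back a Series.
print(type(fruit['apples']))
```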
Creating Your Series And DataFrame
Getting Started With Series
First, create a Series by passing a list of values. By default, Pandas numbers the index starting from 0. The np.nan entry marks a missing value, which is why the resulting dtype is float64.
import numpy as np
import pandas as pd
data_series = pd.Series([1, 9, 3, np.nan, 8])
print(data_series)
""" Output:
0 1.0
1 9.0
2 3.0
3 NaN
4 8.0
dtype: float64
"""
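The default numeric index is not the only option. As a small sketch (the labels 'a', 'b', 'c' and the name scores are arbitrary choices for illustration), you can pass your own labels through the index parameter and then look values up by label or by position:

```python
import pandas as pd

# Hypothetical labels; any list with the same length as the values works.
scores = pd.Series([1, 9, 3], index=['a', 'b', 'c'])

by_label = scores['b']         # look up by label
by_position = scores.iloc[0]   # look up by position
print(by_label, by_position)
```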
Create DataFrame In The Easiest Way
There are many ways to create a DataFrame in Python. The easiest is to create a dict and pass it to the DataFrame constructor: each key becomes a column name, and each list becomes that column's values.
import pandas as pd
data = {
    'Paris': [3, 2, 0, 1],
    'Berlin': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data)
print(purchases)
""" Output:
Paris Berlin
0 3 0
1 2 3
2 0 7
3 1 2
"""
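The same constructor also accepts an index argument if you want meaningful row labels instead of 0, 1, 2, 3. In this sketch the customer names are invented for illustration:

```python
import pandas as pd

data = {
    'Paris': [3, 2, 0, 1],
    'Berlin': [0, 3, 7, 2]
}

# The customer names are hypothetical row labels chosen for illustration.
purchases = pd.DataFrame(data, index=['Anna', 'Ben', 'Chloe', 'David'])

# .loc selects a row by its label and returns it as a Series.
ben = purchases.loc['Ben']
print(ben)
```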
Create DataFrame With NumPy
Pass a NumPy array to the DataFrame constructor, using datetime data as the index and a list of letters as the column labels:
import numpy as np
import pandas as pd
dates = pd.date_range('20191001', periods=6)
dataframe = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(dataframe)
""" Output:
A B C D
2019-10-01 0.304466 -0.699206 -2.090317 1.564566
2019-10-02 -0.876682 0.876720 1.275542 -0.757827
2019-10-03 0.029740 -1.282535 -0.420332 -1.176261
2019-10-04 -0.153740 -0.087788 1.314169 -1.835564
2019-10-05 0.301839 0.036301 0.138372 1.755769
2019-10-06 1.546020 -0.148291 0.781045 -1.789371
"""
In this example, the index parameter sets the row labels, while the columns parameter sets the column labels.
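To see how these labels are used afterwards, here is a small sketch that rebuilds a similar DataFrame and reads one value by its row and column label. The fixed random seed is an assumption added here so that the numbers are repeatable; it is not part of the original example:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20191001', periods=6)

# A seeded generator (an addition for this sketch) keeps the random values repeatable.
rng = np.random.default_rng(0)
dataframe = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(dataframe.index)    # row labels: six consecutive dates
print(dataframe.columns)  # column labels: A, B, C, D

# Select a single value by row label and column label.
value = dataframe.loc[pd.Timestamp('2019-10-02'), 'B']
print(value)
```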
References
We used the documents below as references while writing this post. If you enjoy working with Pandas, they are worth reading.
Summary of Part 1
In this first part of the series, Python Pandas for Beginners, you learned what pandas is and how to install it via pip or Anaconda. You also saw how to create your own Series and DataFrame.
In part 2, you will learn how to read data from a JSON file into pandas, along with some important pandas operations.
See you in the next article. If you like this series, please share it with other Python Geeks, and leave a comment to help us improve the next post.
The post Python Pandas for Beginners – A Complete Guide (Part 1) appeared first on Python Geeks.