Hey reader 👋 Hope you are doing well!
In the last post we learnt about DataFrames and libraries in Python. In this post we are going to learn about Exploratory Data Analysis.
So let's get started🔥.
Introduction
While working with data, it is very important to know your data deeply. This helps you address the problem in a more effective and creative way, and it also helps in building more accurate models.
To understand our data better, we need to explore it deeply and analyze it thoroughly. We need to look at the relationships between the different variables in the data, their contribution to making predictions, the types of variables used in the dataset, and so on. For this purpose we perform something called Exploratory Data Analysis (EDA) on our dataset.
What is Exploratory Data Analysis?
EDA stands for Exploratory Data Analysis. It refers to the process of studying and exploring datasets to understand their main characteristics, discover patterns, spot outliers, and identify relationships between variables.
Imagine you’re about to embark on a road trip. Before hitting the road, what do you do? You check the map, right? Well, EDA is pretty much like that map — it helps us navigate through our data landscape, understand its terrain, and uncover hidden gems along the way.
It is the very first step that we should perform before building any model.
Now let's look at what EDA actually tells us.
Foremost Goals of EDA
1. Understanding the Data’s Structure and Composition -:
EDA helps us grasp the basic layout of our dataset — its dimensions (size of our dataset), variables, and overall structure. By familiarizing ourselves with the data’s anatomy, we lay the foundation for deeper analysis and exploration.
Here we check whether our dataset contains categorical values, numerical values, or both.
We find out whether our dataset contains any duplicate values (if present, we remove them).
We check whether any missing values are present in our dataset (if present, we treat them accordingly).
We look at the size of our dataset.
If there are any unnecessary columns in our dataset, we remove them. We can also manipulate columns according to our needs.
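Just to give a quick taste of these checks, here is a minimal sketch using pandas. The dataset, its column names, and its values are all made up for illustration, and dropping the "notes" column is only an assumed example of an unnecessary column.

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 3, 4],
    "city": ["Pune", "Delhi", "Mumbai", "Mumbai", None],
    "age": [25.0, None, 41.0, 41.0, 38.0],
    "notes": ["a", "b", "c", "c", "d"],
})

# Size of the dataset: (rows, columns).
n_rows, n_cols = df.shape

# Type of each column (numeric columns vs. text/categorical columns).
column_types = df.dtypes

# Number of rows that are exact copies of an earlier row.
n_duplicates = int(df.duplicated().sum())

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Remove duplicate rows, then drop a column we assume we don't need.
cleaned = df.drop_duplicates().drop(columns=["notes"])

print(n_rows, n_cols)
print(column_types)
print(n_duplicates)
print(missing_per_column)
print(cleaned)
```

How you treat the missing values (dropping rows, filling with a mean or a mode, and so on) depends on the dataset, so that step is left out here.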
2. Identifying Anomalies and Outliers -:
One of the key goals of EDA is to spot any irregularities or outliers hiding within the data. These outliers can skew our analysis and lead to erroneous conclusions. By identifying and addressing them early on, we ensure the integrity and reliability of our insights.
Outliers are data points that are significantly different from the majority of the data in a dataset.
We have different techniques to detect these values and to handle them appropriately so that they don't distort our analysis.
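One commonly used detection technique is the interquartile range (IQR) rule, which flags values lying more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below assumes pandas and a made-up series of scores; it is only one possible approach, and whether you remove, cap, or keep the flagged values depends on your problem.

```python
import pandas as pd

# Hypothetical values; 120 is deliberately far away from the rest.
values = pd.Series([12, 14, 15, 15, 16, 18, 19, 120], name="score")

# First and third quartiles, and the spread between them.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Anything outside these fences is treated as an outlier by the IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
without_outliers = values[(values >= lower) & (values <= upper)]

print(outliers)
print(without_outliers)
```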
3. Uncovering Patterns and Relationships -:
EDA is all about connecting the dots — uncovering hidden patterns, trends, and relationships within the data. Whether it’s a correlation between variables or a seasonal trend in sales figures, EDA helps us make sense of the data’s underlying structure and dynamics.
4. Communicating Insights Effectively -:
Last but not least, EDA is about communicating our findings effectively. Whether it’s through visualizations, reports, or presentations, EDA empowers us to convey complex insights in a clear and compelling manner. By telling the story behind the data, we inspire action and drive meaningful change.
Now that we know a bit about EDA, let's move forward.
Univariate Analysis
Univariate analysis involves examining the distribution and characteristics of a single variable.
In this type of analysis we are concerned with just a single column and examine everything about it, such as its central value, its spread, and how often each value occurs.
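As a small sketch of what examining a single column can look like, assuming pandas and a made-up dataset (the "age" and "city" columns are hypothetical):

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "age": [22, 25, 25, 30, 38],
    "city": ["Pune", "Delhi", "Pune", "Pune", "Delhi"],
})

# Numerical column: count, mean, std, min, quartiles, and max in one call.
age_summary = df["age"].describe()
age_median = df["age"].median()

# Categorical column: how many times each category appears.
city_counts = df["city"].value_counts()

print(age_summary)
print(age_median)
print(city_counts)
```

For a numerical column a histogram or box plot is the usual visual companion to these numbers, and for a categorical column a bar chart of the counts works well.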
Bivariate Analysis
Bivariate analysis involves analyzing the relationship between two variables. It aims to understand how the value of one variable changes with respect to the value of another variable.
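Here is a minimal sketch of two simple bivariate checks, assuming pandas and a made-up dataset (all column names and values are hypothetical): the correlation between two numerical columns, and the average of a numerical column within each category of another column.

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 55, 61, 70, 74],
    "group": ["A", "A", "B", "B", "B"],
})

# Numerical vs. numerical: Pearson correlation (the pandas default).
# Values near +1 or -1 suggest a strong linear relationship.
correlation = df["hours_studied"].corr(df["exam_score"])

# Categorical vs. numerical: average score within each group.
mean_score_by_group = df.groupby("group")["exam_score"].mean()

print(correlation)
print(mean_score_by_group)
```

Keep in mind that a high correlation only tells us the two columns move together; it does not tell us that one causes the other.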
That's it for this post. In the next post we will see how EDA is performed on a dataset.
I hope you have understood it well. If you have any queries, please do comment. I'll try my best to answer them.
Please follow me and if you like my post please like it 💙