What is Visualization?
Visualization is the graphical representation of data so that insights can be drawn from a dataset. Such insights include the correlation between features and the count of distinct values for each feature. For time series data (data that changes with time, such as daily temperatures), visualization also helps in seeing the general direction of the data. In Python, various libraries enable data visualization; two of the most commonly used are Matplotlib and Seaborn.
For the purpose of demonstration, I will be using the House Rent Prediction Dataset from Kaggle. If you're using an online notebook such as Google Colab, both libraries should already be installed. Otherwise, if you already have Python installed, you can install them from your local terminal with the following command:
pip install matplotlib seaborn
Matplotlib
To import the Matplotlib library, the code below is used:
import matplotlib.pyplot as plt
In the code above, the import keyword brings the library into the program. The next part is interesting: we generally use the pyplot component of matplotlib, which essentially makes matplotlib behave like MATLAB. Finally, plt is the name we give the library so that we can access it more easily. Instead of writing matplotlib.pyplot.etc, we can just write plt.etc.
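As a small illustration of the alias, the sketch below checks that plt and matplotlib.pyplot are the same module, so either name can be used to call the same functions. The variable names (same_module, fig_short, fig_long) are only for this example.

```python
import matplotlib
import matplotlib.pyplot as plt

# The alias and the full dotted name refer to the same module object.
same_module = plt is matplotlib.pyplot
print(same_module)

# So these two calls are interchangeable; each one creates a new, empty figure.
fig_short = plt.figure()
fig_long = matplotlib.pyplot.figure()
```

The alias does not change behaviour in any way; it only saves typing.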
The various graphs that are available in matplotlib are:
- Line Charts
- Bar Graphs
- Pie Charts
- Scatter Plots
We can also display images, which will be covered at the end. Matplotlib also supports 3D plots, which won't be covered in this post.
After importing the House Rent Dataset, we can see that its features are: Posted On, BHK, Rent, Size, Floor, Area Type, Area Locality, City, Furnishing Status, Tenant Preferred, Bathroom, and Point of Contact.
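The post does not show how the dataset is loaded, so here is a minimal sketch of one way to get a DataFrame named df using pandas. The CSV file name in the comment is an assumption, and the four rows below are made-up stand-in values (not real rows from the dataset) so that the snippet runs without the file. Only a few of the columns are included.

```python
import pandas as pd

# With the real Kaggle file, loading would look roughly like this
# (the file name is an assumption; adjust it to your download):
# df = pd.read_csv("House_Rent_Dataset.csv")

# Made-up stand-in rows so the sketch runs without the file.
df = pd.DataFrame({
    "BHK": [2, 1, 3, 2],
    "Rent": [10000, 7500, 32000, 18000],
    "Size": [1100, 600, 1500, 900],
    "City": ["Kolkata", "Kolkata", "Mumbai", "Chennai"],
})

print(df.columns.tolist())  # the feature names
print(df.head())            # the first few rows
```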
Line Charts
Line graphs, in very simple terms, show how data moves over time or the relationship between two features. To plot a line graph, you can call the function plt.plot(). Let us plot a line graph where the x-axis is the Size of the apartment and the y-axis is the Rent, as shown below:
plt.plot(df['Size'], df['Rent'])
However, you can see that the line is all over the place, because plt.plot() connects the points in the order they appear in the dataset. To fix this, we sort both features, using one as the reference. In this case, we will sort both with respect to Size.
zl = zip(df['Size'], df['Rent'])  # pairs each Size with its Rent
sl = sorted(zl, key=lambda pair: pair[0])  # sorts the pairs by Size (the first value)
x, y = zip(*sl)  # unpacks the sorted pairs back into x and y
plt.plot(x, y)  # plots the sorted values
Now we get a better graph.
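The same sorting can also be done with pandas itself. The sketch below is an alternative to the zip-based approach, not a replacement for it. It uses a tiny made-up DataFrame in place of the real dataset, and the axis labels are an optional extra.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in values; with the real data, df would be the loaded dataset.
df = pd.DataFrame({
    "Size": [1100, 600, 1500, 900],
    "Rent": [10000, 7500, 32000, 18000],
})

# Sort the rows by Size so the line moves steadily from left to right.
df_sorted = df.sort_values("Size")

fig, ax = plt.subplots()
(line,) = ax.plot(df_sorted["Size"], df_sorted["Rent"])
ax.set_xlabel("Size")
ax.set_ylabel("Rent")
# In a script, call plt.show() to open the window; notebooks render the figure automatically.
```

Sorting the whole DataFrame keeps each Rent attached to its Size, which is the same guarantee the zip-and-sort version gives.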
Bar Graphs
Bar graphs show a numeric value for each category, i.e., the count for each category, a value associated with each category, etc. We will create a bar plot where the categories are the values of City and the height of each bar is based on the Rent. The code to create a bar graph is given below:
plt.bar(df['City'],df['Rent'])
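Note that passing the raw columns draws one bar per row, and rows from the same city overlap at the same position, so in effect only the tallest one is visible. If one bar per city is what you want, a common approach is to aggregate first. The sketch below uses the mean rent per city; choosing the mean is my assumption, and the data is a made-up stand-in for the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in values; with the real data, df would be the loaded dataset.
df = pd.DataFrame({
    "City": ["Kolkata", "Kolkata", "Mumbai", "Chennai"],
    "Rent": [10000, 7500, 32000, 18000],
})

# One value per city: the average rent (the median or the max would work the same way).
mean_rent = df.groupby("City")["Rent"].mean()

fig, ax = plt.subplots()
bars = ax.bar(mean_rent.index, mean_rent.values)
ax.set_xlabel("City")
ax.set_ylabel("Mean Rent")
# In a script, call plt.show() to open the window; notebooks render the figure automatically.
```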
Pie Charts
A pie chart is a circular representation of how a total is distributed across categories, i.e., the share of each category's count. In this example, we will create a pie chart for the cities:
city_counts = df['City'].value_counts() # to get unique values and their counts
plt.pie(city_counts, labels=city_counts.index) #we use the unique values (cities) as the label
plt.show()
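If you also want the share of each city printed on its slice, plt.pie() accepts an autopct format string. The sketch below shows this with a made-up stand-in DataFrame; the "%1.1f%%" format (one decimal place) is just one possible choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in values; with the real data, df would be the loaded dataset.
df = pd.DataFrame({"City": ["Kolkata", "Kolkata", "Mumbai", "Chennai"]})

city_counts = df["City"].value_counts()  # unique cities and how often each occurs

fig, ax = plt.subplots()
# autopct writes each slice's percentage of the total onto the slice.
ax.pie(city_counts, labels=city_counts.index, autopct="%1.1f%%")
# In a script, call plt.show() to open the window; notebooks render the figure automatically.
```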
Scatter Plots
A scatter plot helps in visualizing the relationship between two features. In this example, we will create a scatter plot in which the x-axis is the Size and the y-axis is the Rent:
# A scatter plot draws each point independently, so no sorting is needed
plt.scatter(df['Size'], df['Rent'])
Finally, you can display images using the imshow() function, as shown below. To read an image, we use the cv2 library, which will be covered in another post.
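import cv2 as cv
image = cv.imread("/kaggle/input/transformers transformers_cars-HD-2149224348.jpg")  # reads the image (OpenCV returns BGR channel order)
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)  # converts to RGB, which imshow expects
plt.imshow(image)  # displays the image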
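One detail worth knowing: OpenCV's imread returns pixels in blue-green-red (BGR) order, while imshow interprets a three-channel array as red-green-blue (RGB), so colours can look swapped. The sketch below illustrates the idea without needing cv2 or an image file. It uses a tiny hand-made NumPy array as a stand-in image and reverses the channel axis.

```python
import numpy as np
import matplotlib.pyplot as plt

# A 1x2 stand-in "image" stored in BGR order: a red pixel, then a blue pixel.
bgr = np.array([[[0, 0, 255], [255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis turns BGR into RGB.
rgb = bgr[..., ::-1]

fig, ax = plt.subplots()
ax.imshow(rgb)  # now the first pixel is shown as red and the second as blue
ax.axis("off")  # hide the axis ticks, which are rarely useful for images
# In a script, call plt.show() to open the window; notebooks render the figure automatically.
```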
If you would like to go more in depth about matplotlib, here are a few resources:
If you have any questions, corrections, or feedback in general, feel free to let me know!