DEV Community

Aaron Johnson


Introduction to Matplotlib

What is Visualization?

Visualization is the practice of representing data graphically so that insights can be derived from a dataset. Such insights include the correlation between features and the count of distinct values for each feature. For time series data (data that changes over time, such as daily temperatures), visualization also helps reveal the general trend. In Python, several libraries support data visualization. Two of the most commonly used are Matplotlib and Seaborn.

For demonstration, I will be using the House Rent Prediction Dataset from Kaggle. If you're using an online notebook such as Google Colab, both libraries should already be installed. Otherwise, if you already have Python installed, you can install them from your local terminal with the following command:



pip install matplotlib seaborn



Matplotlib

To import the Matplotlib library, the code below is used:



import matplotlib.pyplot as plt



The import statement loads the library. We will mainly use pyplot, the module of Matplotlib that provides a MATLAB-like plotting interface. Finally, plt is an alias for that module, so instead of writing matplotlib.pyplot.plot() we can simply write plt.plot().

The various graphs that are available in matplotlib are:

  • Line Charts
  • Bar Graphs
  • Pie Charts
  • Scatter Plots
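As a quick preview, here is a minimal sketch that draws one of each chart type side by side using plt.subplots. The values are hypothetical toy numbers, not taken from the dataset, and the Agg backend line is only there so the sketch runs without a display (you can drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical toy values for illustration only
sizes = [500, 800, 1100, 1500]
rents = [8000, 12000, 15000, 25000]
cities = ["Kolkata", "Mumbai", "Delhi"]
counts = [3, 5, 2]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # a 2x2 grid of Axes
axes[0, 0].plot(sizes, rents)          # line chart
axes[0, 0].set_title("Line")
axes[0, 1].bar(cities, counts)         # bar graph
axes[0, 1].set_title("Bar")
axes[1, 0].pie(counts, labels=cities)  # pie chart
axes[1, 0].set_title("Pie")
axes[1, 1].scatter(sizes, rents)       # scatter plot
axes[1, 1].set_title("Scatter")
fig.tight_layout()  # avoid overlapping titles and tick labels
plt.show()
```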

We can also display images, which will be covered at the end. Matplotlib also supports 3D plots, but those won't be covered in this post.

After importing the House Rent Dataset, we can see that its features are: Posted On, BHK, Rent, Size, Floor, Area Type, Area Locality, City, Furnishing Status, Tenant Preferred, Bathroom, and Point of Contact.
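The post does not show how `df` is created. A common approach is pandas' `read_csv`, and the sketch below assumes that. To keep it self-contained, it reads a small in-memory CSV with the same column names and one made-up row instead of the real Kaggle file, whose path on your machine may differ.

```python
import io
import pandas as pd

# Stand-in for the Kaggle CSV: same header, one hypothetical row.
# With the real file you would call pd.read_csv("path/to/your_file.csv") instead.
csv_text = """Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
2022-05-18,2,10000,1100,Ground out of 2,Super Area,Example Locality,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # the feature names
print(df.dtypes)            # numeric columns such as Rent and Size are parsed as integers
```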

Line Charts

Line graphs, in very simple terms, show how data changes over time or the relationship between two features. To plot a line graph, call plt.plot(). Let us plot a line graph where the x-axis is the size of the apartment and the y-axis is the rent, as shown below:



plt.plot(df['Size'], df['Rent'])



However, the line is all over the place: plt.plot() connects the points in the order they appear in the DataFrame, and the rows are not ordered by size.

Output

To fix this, we sort the (size, rent) pairs using one feature as the reference, in this case Size.



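zl = zip(df['Size'], df['Rent'])  # pairs each Size with its Rent as (size, rent) tuples
sl = sorted(zl, key=lambda x: x[0])  # sorts the pairs by size (the first element)
x, y = zip(*sl)  # unzips the sorted pairs into a tuple of sizes and a tuple of rents
plt.plot(x, y)  # plots rent against size in increasing order of size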


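The asterisk in `zip(*sl)` is what makes the split work: it unpacks the sorted list so that `zip` receives each (size, rent) pair as a separate argument and regroups them into one tuple of sizes and one tuple of rents. Here is the same idiom on three hypothetical pairs:

```python
sizes = [1100, 500, 800]      # hypothetical toy values
rents = [15000, 8000, 12000]

pairs = sorted(zip(sizes, rents), key=lambda p: p[0])  # sort pairs by size
print(pairs)  # [(500, 8000), (800, 12000), (1100, 15000)]

x, y = zip(*pairs)  # unzip: one tuple of sizes, one tuple of rents
print(x)  # (500, 800, 1100)
print(y)  # (8000, 12000, 15000)
```

Without the asterisk, `zip(pairs)` yields a one-element tuple per pair, so `x, y = zip(pairs)` raises a `ValueError` unless the list happens to contain exactly two pairs.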

Now we get a better graph.

Output
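Since `df` is a pandas DataFrame, an alternative to the zip/sort/unzip steps is to sort the rows with `sort_values` and plot the sorted columns directly. A minimal sketch, with a hypothetical three-row DataFrame standing in for the dataset, that also adds axis labels and a title:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the House Rent data
df = pd.DataFrame({"Size": [1100, 500, 800], "Rent": [15000, 8000, 12000]})

sorted_df = df.sort_values("Size")  # reorder rows by Size; each Rent stays with its Size
fig, ax = plt.subplots()
ax.plot(sorted_df["Size"], sorted_df["Rent"])
ax.set_xlabel("Size")
ax.set_ylabel("Rent")
ax.set_title("Rent vs. Size")
plt.show()
```

Sorting whole rows keeps every feature aligned, which is convenient if you later want to plot other columns in the same order.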

Bar Graphs

Bar graphs show a numeric value for each category, e.g., the count of each category or a value associated with it. We will create a bar graph where the categories are the cities and the height of each bar is based on the rent. The code to create a bar graph is given below:



plt.bar(df['City'],df['Rent'])



Output
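Because each city appears in many rows, `plt.bar(df['City'], df['Rent'])` draws one bar per row, and the bars for the same city overlap at the same x position, so each visible bar reaches the highest rent recorded for that city. If an average per city is what you want, one option is to aggregate first with pandas `groupby`. A sketch on hypothetical toy rows, not taken from the dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows: several listings per city
df = pd.DataFrame({
    "City": ["Kolkata", "Kolkata", "Mumbai", "Mumbai", "Delhi"],
    "Rent": [10000, 20000, 50000, 70000, 30000],
})

mean_rent = df.groupby("City")["Rent"].mean()  # one value per city, keys sorted alphabetically
fig, ax = plt.subplots()
ax.bar(mean_rent.index, mean_rent.values)  # exactly one bar per city
ax.set_xlabel("City")
ax.set_ylabel("Mean Rent")
plt.show()
```

Swapping `.mean()` for `.median()` or `.count()` gives other per-city summaries with the same plotting code.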

Pie Charts

A pie chart is a circular representation of how a total is divided among categories, e.g., the share of listings in each city. In this example, we will create a pie chart for the cities:



city_counts = df['City'].value_counts() # to get unique values and their counts
plt.pie(city_counts, labels=city_counts.index) #we use the unique values (cities) as the label
plt.show()



Output
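`plt.pie` can also print each slice's share of the total through its `autopct` parameter, which accepts a format string. A sketch with hypothetical counts standing in for `city_counts`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listing counts per city, in place of df['City'].value_counts()
city_counts = pd.Series({"Mumbai": 5, "Kolkata": 3, "Delhi": 2})

fig, ax = plt.subplots()
ax.pie(
    city_counts,
    labels=city_counts.index,
    autopct="%1.1f%%",  # label each slice with its percentage, one decimal place
)
ax.set_title("Share of listings by city")
plt.show()
```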

Scatter Plots

A Scatter Plot essentially helps in visualizing the relationship between two features. In this example, we will create a scatter plot in which the x-axis is the Size and y-axis is the Rent:



plt.scatter(df['Size'], df['Rent'])  # no sorting needed: scatter points are not connected by a line



Output
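With a large dataset, many points can land on top of each other. Smaller (`s`), semi-transparent (`alpha`) markers make dense regions easier to see, and axis labels make the plot self-explanatory. A sketch on hypothetical toy values; the specific `s` and `alpha` values are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values standing in for df['Size'] and df['Rent']
sizes = [500, 520, 800, 1100, 1500]
rents = [8000, 9000, 12000, 15000, 25000]

fig, ax = plt.subplots()
ax.scatter(sizes, rents, s=10, alpha=0.4)  # small, semi-transparent markers
ax.set_xlabel("Size")
ax.set_ylabel("Rent")
plt.show()
```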

Finally, you can display images using the imshow() function, as shown below. To read the image we use the OpenCV (cv2) library, which will be covered in another post.



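import cv2 as cv

image = cv.imread("/kaggle/input/transformers transformers_cars-HD-2149224348.jpg")  # reads the image in BGR channel order
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)  # converts to RGB, the order imshow expects
plt.imshow(image)  # displays the image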



Output
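`plt.imshow` is not tied to OpenCV: it displays any array of shape (height, width) or (height, width, 3), treating the last axis as R, G, B. The sketch below builds a tiny synthetic image with NumPy so it runs without an image file. It also shows reversing the channel axis with `[..., ::-1]`, which is one way to swap between RGB and the BGR order OpenCV uses.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 2x2 RGB image (uint8): red, green on top; blue, white below
rgb = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

bgr = rgb[..., ::-1]   # reverse the channel axis: RGB -> BGR
back = bgr[..., ::-1]  # reversing again restores RGB

fig, ax = plt.subplots()
ax.imshow(back)  # shown with the intended colors
ax.axis("off")   # hide ticks, which are rarely useful for images
plt.show()
```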

If you would like to go more in depth with Matplotlib, here are a few resources:

If you have any inquiries, or want to point out any corrections or have any feedback in general, feel free to let me know!
