Introduction
Photo by Brett Jordan on Unsplash
Welcome to the first article of my 52-week blog challenge. I will be covering technical and descriptive articles in the field of data science and artificial intelligence.
Let's jump right into the definitions first.
Temperature is a physical quantity that expresses how hot or cold something is. It is measured on a scale, such as degrees Celsius.
Variation is the extent to which something differs from another.
So....
- Temperature variation is the measure of the difference in temperature in a specific area over a particular period of time.
Goals
The goal of this project is to create an animated spiral of Kenya's variation in temperature from 1991 to 2016.
By the end of this blog post you will have learned:
Exploratory data analysis - ETL (Extraction, Transformation and Loading of data)
Data Visualization
Generation of a GIF
Reporting and presenting the data's story after transforming it from data to information and insights.
Why?
Descriptive analysis - It will describe the current situation on the ground.
Informed decision making - The insights will help with making informed decisions in climate policy-making.
Disaster preparedness - The visualization can help show early signs of unusual temperature spikes, so that people can prepare better for them.
Background
Ed Hawkins, a climate scientist, unveiled an animated visualization in 2016 that captivated the world. This visualization showed the deviations in the global average temperature from 1850 to 2016. It was re-shared millions of times over Twitter and Facebook, and a version of it was even shown at the opening ceremony of the Rio Olympics.
This animation was created with the help of the tutorial at https://www.dataquest.io/blog/climate-temperature-spirals-python/ written by Srini Kadamati.
Historical weather data was retrieved from Africa Open Data: https://africaopendata.org/dataset/kenya-climate-data-1991-2016
The data was collected for the World Bank's Climate Change Knowledge Portal.
Building the spiral visualization.
1. ETL (Extraction, Transformation and Loading of data)
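#importing libraries we'll use
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import matplotlib.animation as animation
#reading the temperature file into a pandas dataframe
#the file is split on whitespace, so each of the two columns kept
#still holds two comma-separated values
#(sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas)
temp_data = pd.read_csv(
    "temp data.csv",
    sep=r"\s+",
    usecols=[0, 1],
    header=None)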
Let's take a quick look at the data frame and some properties of the data.
temp_data
Result:
0 1
0 Year,Month Average,Temperature
1 1991,Jan Average,25.1631
2 1991,Feb Average,26.0839
3 1991,Mar Average,26.2236
4 1991,Apr Average,25.5812
... ... ...
308 2016,Aug Average,24.0942
309 2016,Sep Average,24.437
310 2016,Oct Average,26.0317
311 2016,Nov Average,25.5692
312 2016,Dec Average,25.7401
temp_data.describe()
Result:
0 1
count 313 313
unique 313 313
top Year,Month Average,Temperature
freq 1 1
The results show that each column packs two values into a single comma-separated string, so the data needs to be made more readable.
In this particular case, you need to separate the year, month, and average temperature.
# column 0 holds "Year,Month" and column 1 holds "Average,Temperature"
temp_data[['Year', 'Month']] = temp_data[0].str.split(',', expand=True)
temp_data[['Average', 'Temparature']] = temp_data[1].str.split(',', expand=True)
temp_data.head()
Result:
0 1 Year Month Average Temparature
0 Year,Month Average,Temperature Year Month Average Temperature
1 1991,Jan Average,25.1631 1991 Jan Average 25.1631
2 1991,Feb Average,26.0839 1991 Feb Average 26.0839
3 1991,Mar Average,26.2236 1991 Mar Average 26.2236
4 1991,Apr Average,25.5812 1991 Apr Average 25.5812
It is best practice to drop the columns that are repetitive.
# keep only the Year, Month and Temparature columns
temp_data_1 = temp_data.drop(columns=[0, 1, 'Average'])
temp_data_1
Result:
Year Month Temparature
0 Year Month Temperature
1 1991 Jan 25.1631
2 1991 Feb 26.0839
3 1991 Mar 26.2236
4 1991 Apr 25.5812
... ... ... ...
308 2016 Aug 24.0942
309 2016 Sep 24.437
310 2016 Oct 26.0317
311 2016 Nov 25.5692
312 2016 Dec 25.7401
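As a possible shortcut, the splitting could also be done while the file is being read. The sketch below is only an illustration. It assumes every line looks like the rows printed above (for example 1991,Jan Average,25.1631), and it uses a small in-memory sample (the hypothetical sample_text) in place of the real file.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the layout printed above
sample_text = (
    "Year,Month Average,Temperature\n"
    "1991,Jan Average,25.1631\n"
    "1991,Feb Average,26.0839\n"
)

# Split on commas and whitespace in one pass; a regex separator needs
# the Python parsing engine
sample = pd.read_csv(io.StringIO(sample_text), sep=r"[,\s]+", engine="python")

# The first line is used as the header, so no header row is left in the data
sample = sample.drop(columns="Average")
print(sample)
print(sample.dtypes)
```

With this approach pandas infers the numeric types itself, so the later type conversions may not be needed. Whether it works on the full file depends on every line following the same layout.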
Now let's get to know the data types in the data.
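#drop row 0, which holds the original header text rather than data
temp_data_2 = temp_data_1.drop(index=0).copy()
#getting to know what data types my data frame has
temp_data_2.dtypes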
Result:
Year object
Month object
Temparature object
dtype: object
All the data is in object form.
You need to convert the temperature column data type from object to float. This is because it is the only way you can perform mathematical operations on it and visualize it on a scale.
temp_data_2['Temparature'] = temp_data_2['Temparature'].astype(float)
#view data types of each column
temp_data_2.dtypes
Result:
Year object
Month object
Temparature float64
dtype: object
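If a column might still contain stray text, such as a leftover header value, astype(float) raises an error. A more forgiving option is pd.to_numeric with errors="coerce". The sketch below uses a made-up Series (raw) rather than the real column.

```python
import pandas as pd

# Hypothetical column with a leftover header value mixed into the numbers
raw = pd.Series(["Temperature", "25.1631", "26.0839"])

# errors="coerce" turns anything that cannot be parsed into NaN
converted = pd.to_numeric(raw, errors="coerce")
print(converted)

# Count how many values failed to convert
n_failed = int(converted.isna().sum())
print(n_failed)
```

Any NaN values produced this way would then show up in the missing-value check later on.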
Now you will write a function that converts month names to numbers, using Python's datetime library. The Year column is also converted from strings to integers so that you can filter on it later.
# Define a function to convert month names to numbers
def month_string_to_number(string):
    dt = datetime.strptime(string, "%b")
    return dt.month
# Apply the function to the month column to convert to numbers
temp_data_2['month_number'] = temp_data_2['Month'].apply(month_string_to_number)
# Convert the year strings to integers
temp_data_2['Year'] = temp_data_2['Year'].astype(int)
temp_data_2.head(20)
Result:
Year Month Temparature month_number
1 1991 Jan 25.1631 1
2 1991 Feb 26.0839 2
3 1991 Mar 26.2236 3
4 1991 Apr 25.5812 4
5 1991 May 24.6618 5
6 1991 Jun 23.9439 6
7 1991 Jul 22.9982 7
8 1991 Aug 23.0391 8
9 1991 Sep 23.9423 9
10 1991 Oct 25.5236 10
11 1991 Nov 24.5875 11
12 1991 Dec 24.7398 12
13 1992 Jan 24.4359 1
14 1992 Feb 26.2892 2
15 1992 Mar 26.5409 3
16 1992 Apr 26.0819 4
17 1992 May 24.7852 5
18 1992 Jun 24.0563 6
19 1992 Jul 22.8377 7
20 1992 Aug 22.7902 8
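An alternative to parsing each month name with strptime is a plain lookup table. The sketch below builds one from the standard library's calendar module. It assumes English month abbreviations (the default locale), and month_lookup is a name made up for this example.

```python
import calendar

# calendar.month_abbr is ['', 'Jan', 'Feb', ...], so the empty first entry is skipped
month_lookup = {name: number for number, name in enumerate(calendar.month_abbr) if name}

month_names = ["Jan", "Feb", "Dec"]
month_numbers = [month_lookup[name] for name in month_names]
print(month_numbers)
```

On a DataFrame, the same dictionary could be passed to the Month column's map() method instead of calling apply() with a function.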
It is best practice to drop the unnecessary month name column.
temp_data_2 = temp_data_2.drop('Month', axis=1)
Checking for null or missing values is very important in the ETL process.
temp_data_2.isnull().sum()
Result:
Year 0
Temparature 0
month_number 0
dtype: int64
There are no missing values in this data.
Now you find the mean of the temperature column and subtract it from each individual value in the column. This gives the temperature variation of every month relative to the mean temperature of the whole 1991-2016 period. It is a form of centering the data.
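The post does not show the code for this step. A minimal sketch of what it might look like follows, run on a small made-up frame (demo) with the same column names. It assumes the mean is taken over the whole column rather than per year.

```python
import pandas as pd

# Hypothetical mini data set with the same column names as temp_data_2
demo = pd.DataFrame({
    "Year": [1991, 1991, 1992, 1992],
    "Temparature": [25.0, 26.0, 24.0, 27.0],
})

# Mean over all rows, not per year
overall_mean = demo["Temparature"].mean()

# Deviation of every month from that mean
demo["Temparature"] = demo["Temparature"] - overall_mean
print(demo)
```

After this step the column holds deviations that sum to zero, which is why negative values appear in the data from here on.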
2. Visualizing the data.
Cartesian versus polar coordinate system
There are a few key phases to recreating Ed's GIF:
- learning how to plot on a polar coordinate system
- transforming the data for polar visualization
- customizing the aesthetics of the plot
- stepping through the visualization year-by-year and turning the plot into a GIF
Preparing data for polar plotting
You need to subset the data by year and use the following coordinates:
r: the temperature value for a given month, adjusted to contain no negative values.
Matplotlib supports plotting negative values, but not in the way you might expect. You want -0.1 to be closer to the center than 0.1, which isn't the default matplotlib behavior.
You also want to leave some space around the origin of the plot for displaying the year as text.
theta: 12 equally spaced angle values that span from 0 to 2*pi, one per month.
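How the 12 angles are generated affects where each month lands on the circle. The sketch below compares two NumPy options. Which one suits the plot is a design choice, and the variable names are made up for this example.

```python
import numpy as np

# 12 points from 0 to 2*pi with both endpoints included:
# the last angle is a full turn, so it coincides with the first
theta_closed = np.linspace(0, 2 * np.pi, 12)

# 12 points with the endpoint left out: one point every 30 degrees
theta_open = np.linspace(0, 2 * np.pi, 12, endpoint=False)

step_closed = np.degrees(theta_closed[1] - theta_closed[0])
step_open = np.degrees(theta_open[1] - theta_open[0])
print(round(step_closed, 2), round(step_open, 2))
```

With the endpoint included, December is drawn at the same angle as January, which closes each year's line into a loop. Leaving the endpoint out gives each month its own 30-degree slot, which is easier to line up with month labels.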
You'll start with how to plot just the data for the year 1991 in matplotlib, then scale up to all years.
To generate a matplotlib Axes object that uses the polar system, you need to set the projection parameter to "polar" when creating it.
fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')
To adjust the data to contain no negative temperature values, you need to first calculate the minimum temperature value:
temp_data_2['Temparature'].min()
Result:
-2.3378881410256405
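You'll add 2 to all temperature values. This lifts almost every value above zero while still reserving some space around the origin for displaying text. With a minimum of about -2.34, the lowest values stay slightly below zero, so use a larger offset if you need every value to be positive.
Note: adjust the offset according to your data's minimum temperature.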
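One way to pick the offset from the data is sketched below. The rule used here (round the size of the minimum up, then add a margin for the center text) is an assumption made for illustration, not something the original tutorial prescribes.

```python
import math

# Minimum deviation reported above
min_value = -2.3378881410256405

# Adding 2 leaves the lowest value just below zero
shifted_min = min_value + 2

# Hypothetical rule: round the size of the minimum up, then add room for text
text_margin = 1
offset = math.ceil(abs(min_value)) + text_margin
print(shifted_min, offset)
```

If you change the offset, any rings or labels placed at fixed radii need to be shifted by the same amount.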
You'll also generate 12 evenly spaced values from 0 to 2*pi and use them as the theta values:
# select only the rows where the Year column is equal to 1991
hc_1991 = temp_data_2[temp_data_2['Year'] == 1991]
# create a new 8 by 8 inch figure with a single polar Axes
fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')
r = hc_1991['Temparature'] + 2
theta = np.linspace(0, 2*np.pi, 12)
# Plot the data on the polar axes
ax1.plot(theta, r)
# hide all of the tick labels for both axes
ax1.axes.get_yaxis().set_ticklabels([])
ax1.axes.get_xaxis().set_ticklabels([])
# make the area surrounding the polar plot gray with fig.set_facecolor()
# and the background within the polar plot black with Axes.set_facecolor()
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')
#add the title and labels
ax1.set_ylabel('Temperature')
ax1.set_title("Kenya's Temperature Change (1991-2016)", color='white', fontdict={'fontsize': 30})
# Display the plot
plt.show()
Plotting the remaining years
To plot the spirals for the remaining years, you need to repeat what you just did but for all of the years in the dataset. The one tweak you should make here is to manually set the axis limit for r (or y in matplotlib). This is because matplotlib scales the size of the plot automatically based on the data that's used. This is why, in the last step, the data for just 1991 was displayed at the edge of the plotting area. You'll calculate the maximum temperature value in the entire dataset and add a generous amount of padding (to match what Ed did).
Now, you can use a for loop to plot the rest of the years. You'll leave out the code that generates the center text for now (otherwise each year will generate text at the same point and it'll be very messy).
You will use the color (or c) parameter when calling the Axes.plot() method and draw the colors from the plt.cm.viridis colormap.
fig = plt.figure(figsize=(14,14))
ax1 = plt.subplot(111, projection='polar')
# hide all of the tick labels for both axes
ax1.axes.get_yaxis().set_ticklabels([])
ax1.axes.get_xaxis().set_ticklabels([])
# gray figure background, black background within the polar plot
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')
# fix the radial limits so every year shares one scale; the lower bound
# allows for values that stay below zero after adding 2, and 0.5 is an
# arbitrary amount of padding
r_all = temp_data_2['Temparature'] + 2
ax1.set_ylim(min(0, r_all.min()), r_all.max() + 0.5)
theta = np.linspace(0, 2*np.pi, 12)
ax1.set_title("Kenya's Temperature Change (1991-2016)", color='white', fontdict={'fontsize': 30})
years = temp_data_2['Year'].unique()
for index, Year in enumerate(years):
    r = temp_data_2.loc[temp_data_2["Year"] == Year, "Temparature"] + 2
    # spread the years evenly across the viridis colormap
    ax1.plot(theta, r, c=plt.cm.viridis(index / (len(years) - 1)))
plt.show()
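A colormap such as plt.cm.viridis can be called with either an integer or a float, and the two behave differently. This matters when picking one color per year. The sketch below illustrates the difference. The value n_years is taken from the 1991-2016 range, and the other names are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt

n_years = 26  # 1991 to 2016 inclusive

# Integers index the 256-entry color table directly; anything above 255
# is clipped to the last color
clipped = plt.cm.viridis(25 * 30)
last = plt.cm.viridis(255)

# Floats between 0 and 1 span the whole colormap
year_colors = [plt.cm.viridis(i / (n_years - 1)) for i in range(n_years)]
print(clipped == last, year_colors[0] != year_colors[-1])
```

With a large integer multiplier, later years all end up with the same color, while a fraction between 0 and 1 gives every year its own shade.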
Adding Temperature Rings
At this stage, the viewer can't actually understand the underlying data at all. There is no indication of temperature values in the visualization.
Next, you will add temperature rings at 0.0, 0.5, 1.0, 1.5 and 2.0 degrees Celsius. The rings are drawn in the full code block further down.
Generating the GIF animation
Now you're ready to generate a GIF animation from the plot. An animation is a series of images that are displayed in rapid succession. You'll use the matplotlib.animation.FuncAnimation function to help with this. To take advantage of this function, you need to write code that:
- defines the base plot appearance and properties
- updates the plot between frames with new data
You'll use the following parameters when calling FuncAnimation():
- fig: the matplotlib Figure object
- func: the update function that's called for each frame
- frames: the number of frames (you want one for each year)
- interval: the delay between frames in milliseconds (there are 1000 milliseconds in a second)
This call returns a matplotlib.animation.FuncAnimation object, which has a save() method you can use to write the animation to a GIF file.
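Before the full version, here is a stripped-down sketch of the same pattern: a base plot, an update function, and a call to save(). The data (demo_radii) and the output file name are made up for this example, and the Agg backend line is only there so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import animation

# Base plot: a small polar Axes with fixed radial limits
fig = plt.figure(figsize=(3, 3))
ax = plt.subplot(111, projection="polar")
ax.set_ylim(0, 4)

theta = np.linspace(0, 2 * np.pi, 12)
# Hypothetical data: three "years" drawn as circles of growing radius
demo_radii = [1.0, 2.0, 3.0]

def update(i):
    # Add one more line per frame; earlier lines stay on the plot
    ax.plot(theta, np.full(12, demo_radii[i]))

anim = animation.FuncAnimation(fig, update, frames=len(demo_radii), interval=200)

# Write the frames to a GIF with the Pillow writer
out_path = os.path.join(tempfile.mkdtemp(), "demo_spiral.gif")
anim.save(out_path, writer="pillow", fps=5)
plt.close(fig)
```

When saving, the fps argument controls the playback speed of the file, while interval mainly affects how the animation plays on screen.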
The code block below combines all of the steps above to produce the GIF.
months=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
fig=plt.figure(figsize=(15,15))
ax1=plt.subplot(111,projection="polar")
# temperature rings: the radii are not defined in the original post, so they
# are assumed here to sit at deviation + 2, the same shift that is applied to r
offset = 2
full_circle_thetas = np.linspace(0, 2*np.pi, 1000)
for level in [0.0, 0.5, 1.0, 1.5, 2.0]:
    ring_color = 'blue' if level == 0.0 else 'red'
    ax1.plot(full_circle_thetas, [level + offset]*1000, c=ring_color)
    ax1.text(np.pi/2, level + offset, f"{level} C", color=ring_color, ha='center', fontdict={'fontsize': 20})
# gray figure background, black background within the polar plot
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')
# remove the ticks (and with them the tick labels) on both axes
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title("Kenya's Temperature Change Spiral (1991-2016)", color='white', fontdict={'fontsize': 30})
years = temp_data_2['Year'].unique()
fig.text(0.78,0,"Kenya Temperature data",color="white",fontsize=20)
fig.text(0.05,0.02,"Everlynn Muthoni; Data Stories",color="white",fontsize=20)
fig.text(0.05,0,"Inspired by Ed Hawkins's 2016 Visualization",color="white",fontsize=15)
#add months ring, running clockwise from the top of the plot
months_angles= np.linspace((np.pi/2)+(2*np.pi),np.pi/2,13)
for i,month in enumerate(months):
    ax1.text(months_angles[i],5.0,month,color="white",fontsize=15,ha="center")
# use the same angles for the data so each point lines up with its month label
theta = months_angles[:12]
def update(i):
    # Remove the last year text at the center
    for txt in ax1.texts:
        if(txt.get_position()==(0,0)):
            txt.set_visible(False)
    # Specify how we want the plot to change in each frame.
    # We need to unravel the for loop we had earlier.
    Year = years[i]
    r = temp_data_2[temp_data_2['Year'] == Year]['Temparature'] + 2
    # spread the years evenly across the viridis colormap
    ax1.plot(theta, r, c=plt.cm.viridis(i / (len(years) - 1)))
    ax1.text(0,0,str(Year),fontsize=20,color="white",ha="center")
    return ax1
anim = animation.FuncAnimation(fig, update, frames=len(years), interval=10)
# the pillow writer creates the GIF, so no ffmpeg writer is needed
anim.save("Spiral.gif", writer='pillow', fps=5, dpi=100)
Final result:
3. The story our data visualization tells.
So....from the analysis and visualization, the following insights are deduced:
- Since 1991 the temperature variation has been gradually increasing between February and June, with the highest variation occurring mostly between June and July.
- High temperature variation mostly occurs during the first half of the year.
And that's it. Congrats, you have successfully visualized temperature data using a climate spiral!
Click here if you'd like to check out the source code.
4. Recommendations
For a better 3D visualization, explore the project using MATLAB.
For even better real-time descriptive analysis, try to find data with the latest dates.
Like, subscribe and share your thoughts with me. Bye! and Happy coding.