This article analyzes and visualizes earthquake data for Nepal from 1994–2019. The data was extracted from the National Seismological Centre (NSC) database. The code used to fetch and clean the data can be found here.
Where did the data require cleaning?
Once the data was scraped from the webpage, it required a bit of cleaning. Most of the cleaning steps are described with comments in the code, but a few things I had to do were:
i. Changing the date to a proper format:
The dates in the database were not in a standard DateTime format: the Nepali and English dates were concatenated together in a single field. So the first task was to convert the dates to a proper format.
ii. Changing the time to a proper format:
The time column in the database was not in a standard format. Therefore, this too required changing.
Once these two tasks were completed, my database was more or less ready for quick analysis and visualization.
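The two conversions above can be sketched with pandas string methods. The column names and raw formats here are hypothetical stand-ins for the scraped data (the actual format is handled in the linked code):

```python
import pandas as pd

# Hypothetical sample mimicking the scraped format: the Nepali (B.S.) and
# English (A.D.) dates concatenated in one field, and a non-standard time
# string with a trailing label.
raw = pd.DataFrame({
    "Date": ["2072-01-12 2015-04-25", "2072-01-13 2015-04-26"],
    "Time": ["11:56 Local", "12:54 Local"],
})

# Keep only the English (A.D.) part of the concatenated date field
raw["Date"] = pd.to_datetime(raw["Date"].str.split().str[-1])

# Strip the trailing label and parse the time into a proper time object
raw["Time"] = pd.to_datetime(
    raw["Time"].str.replace(" Local", "", regex=False),
    format="%H:%M",
).dt.time
```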
iii. Dropping rows:
One of my preconditions for this analysis was that the epicenter had to be inside Nepal. Therefore, I dropped every row whose epicenter lay in another country.
Finally, a few rows had errors in their latitude and longitude coordinates. Once those were accounted for and cleaned, the data was ready to use.
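Both filters can be sketched in a few lines of pandas. The column names and sample rows here are hypothetical; the bounding box is a rough approximation of Nepal's extent:

```python
import pandas as pd

# Hypothetical sample; the real data names the epicentre location,
# with some entries outside Nepal.
df = pd.DataFrame({
    "Epicentre": ["Gorkha", "India", "Dolakha", "Tibet"],
    "Latitude": [28.23, 27.10, 27.92, 95.00],   # 95.00 is a bad latitude
    "Longitude": [84.73, 88.15, 86.08, 29.00],
})

# Drop rows whose epicentre is outside Nepal
foreign = ["India", "Tibet"]
df = df[~df["Epicentre"].isin(foreign)]

# Keep only coordinates that fall within Nepal's rough bounding box
df = df[df["Latitude"].between(26.3, 30.5) & df["Longitude"].between(80.0, 88.3)]
```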
Data analysis
The first step of the analysis was to import the required libraries.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-pastel')
With the libraries ready, next up was importing the data file.
df = pd.read_csv('earthquake.csv')
I imported the CSV file using pandas' `read_csv` function, then quickly viewed the contents of the data frame, now conveniently called `df`.
df.head()
df.describe()
This gave me a quick overview of what the data looked like, along with summary statistics for the numerical columns.
A quick glance at the summary statistics showed that there have been 933 recorded earthquakes in Nepal since 1994. It has been stated that "more than 30,000 tremors" were felt in 2015 alone after the massive earthquake; however, it must be noted that the NSC only processes data for significant earthquakes and aftershocks.
Earthquake Magnitude Plot
plt.figure(figsize=(10,10))
sns.scatterplot(x=df.index, y="Magnitude", data=df)
plt.annotate('Gorkha,2015', xy=(389, 7.5), xytext=(385, 7.25),arrowprops=dict(facecolor='black', shrink=0.01),)
plt.annotate('Gorkha,2015', xy=(390, 6.5), xytext=(350, 6.0),arrowprops=dict(facecolor='black', shrink=0.01),)
plt.annotate('Dolkha, 2015', xy=(443, 6.84), xytext=(445, 6.3),arrowprops=dict(facecolor='black', shrink=0.01),)
plt.annotate('Dolkha, 2015', xy=(540, 6.74), xytext=(541, 6.5),arrowprops=dict(facecolor='black', shrink=0.01),)
plt.annotate('Taplejung-Sikkim border, 2011', xy=(282, 6.74), xytext=(100, 6.6),arrowprops=dict(facecolor='black', shrink=0.01),)
plt.savefig('overall_earthquakes.png')
We can observe from the scatter plot that there have been some massive earthquakes, four of them in 2015 alone.
Yearly Earthquakes from 1994–2019
The graph shows a clear surge in earthquake frequency in 2015. Since then, the number of earthquakes has gone down and is returning to the pattern seen before 2015.
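The yearly counts behind this graph can be reproduced with a simple groupby on the year. The dataframe here is a small hypothetical stand-in; the real `df` would come from `earthquake.csv` with the cleaned `Date` column already parsed:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical stand-in for the cleaned dataframe
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-05-01", "2015-04-25", "2015-04-26",
                            "2015-05-12", "2016-02-05"])
})

# Count earthquakes per year and plot them as a bar chart
yearly = df["Date"].dt.year.value_counts().sort_index()
yearly.plot(kind="bar", figsize=(12, 6))
plt.xlabel("Year")
plt.ylabel("Number of earthquakes")
plt.savefig("yearly_earthquakes.png")
```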
Earthquake Magnitude Classes
Earthquakes are also classified in categories ranging from minor to great, depending on their magnitude. (Source)
- Great: 8 or more
- Major: 7–7.9
- Strong: 6–6.9
- Moderate: 5–5.9
- Light: 4–4.9
- Minor: 3–3.9
# list for storing the size_class
size_class = []
for magnitude in df.Magnitude:
    if 3.0 <= magnitude <= 3.9:
        size_class.append("Minor")
    elif 4.0 <= magnitude <= 4.9:
        size_class.append("Light")
    elif 5.0 <= magnitude <= 5.9:
        size_class.append("Moderate")
    elif 6.0 <= magnitude <= 6.9:
        size_class.append("Strong")
    elif 7.0 <= magnitude <= 7.9:
        size_class.append("Major")
    else:
        size_class.append("Great")

# Creating a column in the dataframe called size_class
df['size_class'] = size_class
df_size_class = pd.DataFrame(df.size_class.groupby(df.size_class).count())
With this done, I had a column in the data showing the magnitude class of each earthquake, ready for plotting.
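The class counts can then be shown as a bar chart. The counts below are illustrative stand-ins, not the real figures from the dataset:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Illustrative class counts standing in for df_size_class above
df_size_class = pd.DataFrame(
    {"size_class": [600, 250, 60, 20, 2, 1]},
    index=["Light", "Minor", "Moderate", "Strong", "Major", "Great"],
)

# Bar chart of the number of earthquakes in each magnitude class
df_size_class["size_class"].plot(kind="bar", figsize=(10, 6))
plt.xlabel("Magnitude class")
plt.ylabel("Number of earthquakes")
plt.savefig("size_class.png")
```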
Mapping the Earthquake on a geographical map
A new column was created in the dataframe based on each earthquake's magnitude and its estimated effect [Source]. Based on this damage scale, the earthquakes were plotted on a map of Nepal.
scale = []
for magnitude in df.Magnitude:
    if 3.0 <= magnitude <= 3.9:
        scale.append("Limited Damage")
    elif 4.0 <= magnitude <= 4.9:
        scale.append("Minor Damage")
    elif 5.0 <= magnitude <= 5.9:
        scale.append("Slight Damage")
    elif 6.0 <= magnitude <= 6.9:
        scale.append("Severe Damage")
    elif 7.0 <= magnitude <= 7.9:
        scale.append("Serious Damage")
    else:
        scale.append("Great Damage")

df['scale'] = scale
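Without a dedicated mapping library, the plotting step can be approximated by scattering the epicentres by coordinate, coloured by damage scale. This is a minimal sketch with hypothetical sample rows; overlaying an actual outline of Nepal would need a basemap library such as geopandas or folium:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical sample; the real df carries Latitude/Longitude columns
# plus the scale column built above.
df = pd.DataFrame({
    "Latitude": [28.23, 27.92, 27.63],
    "Longitude": [84.73, 86.08, 87.40],
    "Magnitude": [7.6, 6.8, 4.2],
    "scale": ["Serious Damage", "Severe Damage", "Minor Damage"],
})

# Scatter the epicentres, coloured by damage scale and sized by magnitude
plt.figure(figsize=(10, 6))
sns.scatterplot(x="Longitude", y="Latitude", hue="scale",
                size="Magnitude", data=df)
plt.savefig("earthquake_map.png")
```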
Finally, the Jupyter notebook for the entire code can be found here.