
Gichuki Edwin

Comprehensive Weather Data Analysis Using Python: Temperature, Rainfall Trends, and Visualizations

Weather Data Analysis and Forecasting for Different Cities in Kenya


Introduction

In this article, I’ll walk you through analyzing weather patterns using Python. From identifying temperature trends to visualizing rainfall, this step-by-step guide is perfect for anyone interested in using data science techniques for weather analysis. I’ll explore code, data manipulation, and visualizations for practical insights.

In Kenya, weather plays a critical role in many sectors, particularly agriculture, tourism, and outdoor activities. Farmers, businesses, and event planners need accurate weather information to make decisions. However, weather patterns can vary significantly across regions, and current forecasting systems may not always provide localised insights.

The objective of this project is to collect real-time weather data from the OpenWeatherMap API and Weather API for different regions across Kenya. This data will be stored in a database and analysed using Python to uncover insights into:

  • Temperature trends
  • Rainfall patterns
  • Humidity and wind conditions

In this project, I analyze a dataset containing weather information for various cities in Kenya. The dataset includes over 3,000 rows of weather observations covering temperature, humidity, pressure, wind speed, visibility, and rainfall, among other factors. Using these insights, we aim to provide accurate, region-specific weather forecasts that can aid decision-making in weather-sensitive sectors like agriculture, tourism, and event management.
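As a rough sketch of the collection step, the function below flattens one OpenWeatherMap-style current-weather response into a single row. The function name `flatten_observation`, the row keys other than those used later in this post, and the sample values are assumptions for illustration; the actual collection script is not shown here.

```python
from datetime import datetime, timezone


def flatten_observation(payload: dict) -> dict:
    """Flatten one OpenWeatherMap-style current-weather payload into a flat row."""
    # "weather" is a list of condition objects; use the first one if present
    weather = (payload.get("weather") or [{}])[0]
    return {
        "datetime": datetime.fromtimestamp(payload["dt"], tz=timezone.utc).isoformat(),
        "city": payload.get("name"),
        "country": payload.get("sys", {}).get("country"),
        "latitude": payload["coord"]["lat"],
        "longitude": payload["coord"]["lon"],
        # Celsius only if the request was made with units=metric
        "temperature_celsius": payload["main"]["temp"],
        "humidity_pct": payload["main"]["humidity"],
        "pressure_hpa": payload["main"]["pressure"],
        "wind_speed_ms": payload.get("wind", {}).get("speed"),
        # The "rain" object is absent when no rain was reported, so default to 0.0
        "rain": payload.get("rain", {}).get("1h", 0.0),
        "clouds": payload.get("clouds", {}).get("all"),
        "weather_condition": weather.get("main"),
        "weather_description": weather.get("description"),
    }


# Illustrative payload with made-up values (not taken from the dataset)
sample = {
    "dt": 1700000000,
    "name": "Nairobi",
    "sys": {"country": "KE"},
    "coord": {"lat": -1.2833, "lon": 36.8167},
    "main": {"temp": 21.4, "humidity": 68, "pressure": 1018},
    "wind": {"speed": 3.6},
    "clouds": {"all": 40},
    "weather": [{"main": "Clouds", "description": "scattered clouds"}],
}

row = flatten_observation(sample)
print(row["city"], row["temperature_celsius"], row["rain"])
```

Defaulting missing rainfall to 0.0 is a design choice in this sketch; storing it as null instead would keep "no rain" distinct from "not reported".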

Dataset overview

The dataset was structured using several columns:

  • Datetime - Timestamp indicating when the weather was recorded.
  • City and Country - Location of the weather observation.
  • Latitude and Longitude - Geographical coordinates of the location.
  • Temperature (Celsius) - The temperature recorded.
  • Humidity (%) - The percentage of humidity in the air.
  • Pressure (hPa) - The atmospheric pressure in hectopascals.
  • Wind Speed (m/s) - The speed of the wind at the time.
  • Rain (mm) - The amount of rainfall measured in millimeters.
  • Clouds (%) - The percentage of cloud coverage.
  • Weather Condition and Weather Description - General and detailed descriptions of the weather (e.g., 'Clouds', 'Scattered Clouds').

This is how the data is structured in the database.
Database structure
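The post does not say which database engine was used, so the following is only a sketch using Python's built-in `sqlite3` with an in-memory database. The table name `weather_observations` and any column names that do not appear in the analysis code are assumptions, and the inserted row holds made-up values.

```python
import sqlite3

# In-memory database for illustration; a real project would use a file or server
conn = sqlite3.connect(":memory:")

conn.execute(
    """
    CREATE TABLE weather_observations (
        datetime            TEXT NOT NULL,
        city                TEXT NOT NULL,
        country             TEXT,
        latitude            REAL,
        longitude           REAL,
        temperature_celsius REAL,
        humidity_pct        REAL,
        pressure_hpa        REAL,
        wind_speed_ms       REAL,
        rain                REAL DEFAULT 0,
        clouds              REAL,
        weather_condition   TEXT,
        weather_description TEXT
    )
    """
)

# Parameterised insert of one made-up observation
conn.execute(
    "INSERT INTO weather_observations "
    "(datetime, city, country, temperature_celsius, rain) VALUES (?, ?, ?, ?, ?)",
    ("2024-01-01T12:00:00", "Nairobi", "KE", 22.5, 0.0),
)
conn.commit()

# List the column names the table ended up with
columns = [info[1] for info in conn.execute("PRAGMA table_info(weather_observations)")]
row_count = conn.execute("SELECT COUNT(*) FROM weather_observations").fetchone()[0]
print(columns)
print(row_count)
```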


Exploratory Data Analysis

The first step in the analysis involved basic exploration of the data.
  • Data dimensions - The dataset contains 3,000 rows and 14 columns.
  • Null values - Minimal missing data, ensuring that the dataset was reliable for further analysis.
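A minimal sketch of how these two checks can be done in pandas is shown here. The small frame is made-up stand-in data; in the notebook the same calls would run on `df1`.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real df1
df1 = pd.DataFrame(
    {
        "city": ["Nairobi", "Mombasa", "Nyeri", "Kisumu"],
        "temperature_celsius": [21.0, 29.5, np.nan, 24.1],
        "rain": [0.0, 1.2, 0.4, np.nan],
    }
)

# Data dimensions: (rows, columns)
n_rows, n_cols = df1.shape

# Null values: count and percentage of missing entries per column
null_counts = df1.isnull().sum()
null_share = df1.isnull().mean() * 100

print(f"{n_rows} rows x {n_cols} columns")
print(null_counts)
print(null_share.round(1))
```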

print(df1[['temperature_celsius', 'humidity_pct', 'pressure_hpa', 'wind_speed_ms', 'rain', 'clouds']].describe())

Using the code above, we computed summary statistics for the numerical columns, which provided insights into the range, mean, and spread of temperature, humidity, pressure, wind speed, rainfall, and cloud cover.

Visualising Key Weather Features

To gain a clearer understanding of the weather features, we plotted various distributions:

Temperature Distribution

sns.displot(df1['temperature_celsius'], bins=50, kde=True)
plt.title('Temperature Distribution')
plt.xlabel('Temperature (Celsius)')

This distribution reveals the general spread of temperatures across the cities. The KDE line gives a smooth estimate of the probability distribution of temperature.
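To make the idea of a KDE concrete, here is a small NumPy sketch of a Gaussian kernel density estimate: one bell curve is placed on every observation and the curves are averaged. The helper `gaussian_kde_1d`, the fixed bandwidth of 1.5, and the temperature values are assumptions for illustration; seaborn chooses its own bandwidth automatically.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Mean over samples, rescaled so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth


# Made-up temperatures in Celsius
temps = np.array([18.5, 20.1, 21.3, 22.0, 24.7, 27.9, 29.4])
grid = np.linspace(5, 45, 801)
density = gaussian_kde_1d(temps, grid, bandwidth=1.5)

# A proper density has (approximately) unit area under the curve
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A smaller bandwidth makes the curve follow individual observations more closely, while a larger one smooths them out.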

Rainfall Distribution

sns.displot(df1['rain'], bins=50, kde=True)
plt.title('Rainfall Distribution')
plt.xlabel('Rainfall (mm/h)')

This plot shows how rainfall readings are distributed across Kenyan cities.

Humidity, Pressure and Wind Speed

Similar distribution plots were produced for Humidity (%), Pressure (hPa), and Wind Speed (m/s), each providing useful insights into the variation of these parameters across the dataset.
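The code for these three plots is not shown in the post. One possible way to draw them side by side is sketched here with plain matplotlib histograms rather than `sns.displot`, on randomly generated stand-in data; the `features` mapping and the figure layout are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data, not the real observations
rng = np.random.default_rng(0)
df1 = pd.DataFrame(
    {
        "humidity_pct": rng.uniform(40, 100, 200),
        "pressure_hpa": rng.normal(1015, 4, 200),
        "wind_speed_ms": rng.gamma(2.0, 1.5, 200),
    }
)

# Column name -> axis label
features = {
    "humidity_pct": "Humidity (%)",
    "pressure_hpa": "Pressure (hPa)",
    "wind_speed_ms": "Wind Speed (m/s)",
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (column, label) in zip(axes, features.items()):
    ax.hist(df1[column].dropna(), bins=50)
    ax.set_title(f"{label} Distribution")
    ax.set_xlabel(label)
    ax.set_ylabel("Count")
fig.tight_layout()
```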

Weather Condition Analysis

Weather conditions (e.g., 'Clouds', 'Rain') were counted and visualized using a pie chart to show their proportional distribution:

condition_counts = df1['weather_condition'].value_counts()

plt.figure(figsize=(8,8))
plt.pie(condition_counts, labels=condition_counts.index, autopct='%1.1f%%', pctdistance=1.1, labeldistance=0.6, startangle=140)
plt.title('Distribution of Weather Conditions')
plt.axis('equal')
plt.show()

Distribution of Weather Conditions

City-wise Rainfall

One of the key analyses was the total rainfall by city:

rainfall_by_city = df1.groupby('city')['rain'].sum().sort_values()

plt.figure(figsize=(12,12))
rainfall_by_city.plot(kind='barh', color='skyblue')
plt.title('Total Rainfall by City')
plt.xlabel('Total Rainfall (mm)')
plt.ylabel('City')
plt.tight_layout()
plt.show()

This bar plot highlighted which cities received the most rain over the observed period, with a few outliers showing significant rainfall compared to others.

Total rainfall by city

Average Monthly Temperature

avg_temp_by_month.plot(kind='line')
plt.title('Average Monthly Temperature')

The line chart revealed temperature fluctuations across different months, showing seasonal changes.

Average monthly Temperature
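The snippet above relies on `avg_temp_by_month`, which is defined in the notebook but not shown here. One plausible way to build it, assuming a `datetime` column that can be parsed by pandas, is sketched below with made-up readings.

```python
import pandas as pd

# Made-up readings standing in for df1
df1 = pd.DataFrame(
    {
        "datetime": [
            "2024-01-05 09:00",
            "2024-01-20 15:00",
            "2024-02-03 09:00",
            "2024-02-18 15:00",
            "2024-03-10 12:00",
        ],
        "temperature_celsius": [24.0, 26.0, 25.0, 27.0, 23.5],
    }
)

# Parse the timestamps, then average temperature per calendar month
df1["datetime"] = pd.to_datetime(df1["datetime"])
avg_temp_by_month = df1.groupby(df1["datetime"].dt.month)["temperature_celsius"].mean()
avg_temp_by_month.index.name = "month"

print(avg_temp_by_month)
```

Grouping by `dt.month` pools the same month from different years together; if the data spans several years, grouping by year and month keeps them apart.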

Average Monthly Rainfall

monthly_rain.plot(kind='line')
plt.title('Average Monthly Rainfall')

Similarly, rainfall was analyzed to observe how it varied month-to-month.

Average monthly rainfall
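With rainfall it matters whether a month is summarised by the mean reading or by the total. The sketch below computes both from made-up data; treating `monthly_rain` as the per-month mean is an assumption based on the chart title, since its definition is not shown in the post.

```python
import pandas as pd

# Made-up rainfall readings in mm
df1 = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2024-03-01 06:00",
                "2024-03-01 18:00",
                "2024-04-02 06:00",
                "2024-04-02 18:00",
                "2024-04-03 06:00",
            ]
        ),
        "rain": [0.0, 2.0, 1.0, 4.0, 1.0],
    }
)

# Group by calendar month and compute both summaries at once
month = df1["datetime"].dt.month.rename("month")
rain_summary = df1.groupby(month)["rain"].agg(["mean", "sum"])

# Assumed definition: average reading per month
monthly_rain = rain_summary["mean"]

print(rain_summary)
```

The mean is sensitive to how often readings were taken, so months with uneven sampling are easier to compare on the mean than on the total.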

We also visualized the data using heatmaps for a more intuitive view of average monthly temperature and rainfall:

Average monthly temperature

Average monthly rainfall
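A heatmap needs the data arranged as a grid first. The sketch below assumes a city-by-month layout, which may differ from the one used in the notebook, and builds it with `pivot_table` on made-up readings.

```python
import pandas as pd

# Made-up readings for two cities over two months
df1 = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2024-01-05",
                "2024-01-20",
                "2024-02-10",
                "2024-01-08",
                "2024-02-12",
                "2024-02-25",
            ]
        ),
        "city": ["Mombasa", "Mombasa", "Mombasa", "Nyeri", "Nyeri", "Nyeri"],
        "temperature_celsius": [28.0, 30.0, 31.0, 18.0, 19.0, 21.0],
    }
)

# Rows are cities, columns are months, cells are mean temperatures
df1["month"] = df1["datetime"].dt.month
temp_grid = df1.pivot_table(
    index="city", columns="month", values="temperature_celsius", aggfunc="mean"
)

print(temp_grid)
# The grid can then be drawn with, for example:
# sns.heatmap(temp_grid, annot=True, cmap='coolwarm')
```

The same pattern works for rainfall by swapping `values` to the `rain` column.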

Correlation Between Weather Variables

Next, I calculated the correlation matrix between key weather variables:

correlation_matrix = df1[['temperature_celsius', 'humidity_pct', 'pressure_hpa', 'wind_speed_ms', 'rain', 'clouds']].corr()
correlation_matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Between Weather Variables')

This correlation heatmap allowed us to identify relationships between variables. For example, we observed a negative correlation between temperature and humidity, as expected.
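Alongside the heatmap, it can help to list the variable pairs ranked by the strength of their correlation. This is a sketch on made-up values, not the project's data; it keeps only the upper triangle of the matrix so each pair appears once.

```python
import numpy as np
import pandas as pd

# Made-up values chosen so temperature and humidity move in opposite directions
df1 = pd.DataFrame(
    {
        "temperature_celsius": [18.0, 21.0, 24.0, 27.0, 30.0],
        "humidity_pct": [90.0, 82.0, 71.0, 65.0, 52.0],
        "wind_speed_ms": [3.1, 1.2, 4.0, 2.2, 3.5],
    }
)

# Pearson correlation matrix (the pandas default)
corr = df1.corr()

# Keep the upper triangle only, so each pair is listed once and the
# diagonal (every variable with itself) is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank by absolute strength, strongest first
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```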

Case Study: City Specific Trends

I focused on individual cities, such as Mombasa and Nyeri, to explore their unique weather patterns:

Mombasa Temperature Trends

plt.plot(monthly_avg_temp_msa)
plt.title('Temperature Trends in Mombasa Over Time')

This city showed significant variation in temperature across the year.
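The series `monthly_avg_temp_msa` is built in the notebook and not shown here. A minimal sketch of one way to derive it, assuming `city` and `datetime` columns and using made-up readings, is:

```python
import pandas as pd

# Made-up readings; the Nyeri row is there to show the filter working
df1 = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-01-10", "2024-01-25", "2024-02-10", "2024-02-25", "2024-01-15"]
        ),
        "city": ["Mombasa", "Mombasa", "Mombasa", "Mombasa", "Nyeri"],
        "temperature_celsius": [30.0, 32.0, 31.0, 33.0, 18.0],
    }
)

# Keep only Mombasa, then average per year-month
mombasa = df1[df1["city"] == "Mombasa"]
monthly_avg_temp_msa = mombasa.groupby(mombasa["datetime"].dt.to_period("M"))[
    "temperature_celsius"
].mean()

# Convert the period index to timestamps so plt.plot can use it directly
monthly_avg_temp_msa.index = monthly_avg_temp_msa.index.to_timestamp()

print(monthly_avg_temp_msa)
```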

Nyeri Rainfall Trends

plt.plot(monthly_avg_rain_nyr)
plt.title('Rainfall Trends in Nyeri Over Time')

The rainfall data for Nyeri displayed a clear seasonal pattern, with rainfall peaking during certain months.
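Peak months can also be read off programmatically instead of from the chart. The values below are made up purely to illustrate the calls and say nothing about Nyeri's actual rainfall; calling a month "wet" when it exceeds the overall mean is an assumed rule of thumb.

```python
import pandas as pd

# Made-up monthly averages in mm, indexed by year-month
monthly_avg_rain_nyr = pd.Series(
    [0.2, 0.1, 0.6, 1.8, 1.1, 0.3],
    index=pd.period_range("2024-01", periods=6, freq="M"),
    name="rain",
)

# Month with the highest average rainfall
peak_month = monthly_avg_rain_nyr.idxmax()

# Months wetter than the overall mean (an assumed threshold)
wet_months = monthly_avg_rain_nyr[monthly_avg_rain_nyr > monthly_avg_rain_nyr.mean()]

print(peak_month)
print(wet_months)
```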

Conclusion

This analysis provides a comprehensive overview of the weather conditions in major cities, highlighting the temperature, rainfall, and other key weather variables. By using visualizations like histograms, line charts, pie charts, and heatmaps, we were able to extract meaningful insights into the data. Further analysis could involve comparing these trends with historical weather patterns or exploring predictive modeling to forecast future weather trends.

You can find the Jupyter Notebook with the full code for this analysis in my GitHub repository.

