I am currently reading a very interesting book by James Gate, entitled Storytelling with Data: The New Visualization Data Guide to Reaching Your Business Aim in The Fastest Way. One chapter of the book describes the importance of context when extracting insights from data.
I have elaborated on the author's thoughts, and here I describe what I have learned about the importance of data context. I use a practical example to illustrate the concepts.
Context Analysis is the study of everything that surrounds a dataset, and this can cover many different aspects. For example, if you are measuring the temperature of the sea surface over time, context may include the weather conditions, the presence of ships nearby, and so on.
Three elements contribute to Context Analysis:
- Events
- Environment
- Time
I consider the three aspects separately, and I use a practical example to explain them.
1 Setup of the Scenario
As an example, I consider the number of passengers carried by Italian air transport from 1970 to 2020. The dataset is released by the World Bank under the CC BY 4.0 license.
The objective of this section is to convert the raw dataset into a time series that contains the chosen indicator for Italy.
First, I load the dataset as a pandas DataFrame:
import pandas as pd
df = pd.read_csv('API_ITA_DS2_en_csv_v2_3472313.csv')
The original dataset contains several indicators, so I select only the one I need:
indicator = 'Air transport, passengers carried'
# .copy() avoids a SettingWithCopyWarning when df_ind is modified later
df_ind = df[df['Indicator Name'] == indicator].copy()
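The selection step can be sketched on a small, made-up frame that mimics the layout of the World Bank file. The frame `toy` and all of its values are invented for illustration and are not taken from the dataset. Taking an explicit copy keeps the selection independent of the original frame, which matters when it is modified in place afterwards.

```python
import pandas as pd

# Toy frame mimicking the World Bank layout; all numbers are invented.
toy = pd.DataFrame({
    'Country Name': ['Italy', 'Italy'],
    'Indicator Name': ['Air transport, passengers carried',
                       'Population, total'],
    '1970': [1.0, 2.0],
    '1971': [1.5, 2.5],
})

indicator = 'Air transport, passengers carried'

# Boolean mask: True only for rows whose indicator name matches.
mask = toy['Indicator Name'] == indicator

# .copy() makes the selection independent of `toy`, so later
# in-place edits neither touch the original nor trigger warnings.
toy_ind = toy[mask].copy()

# A single-country file is expected to hold one row per indicator.
assert len(toy_ind) == 1
print(toy_ind)
```

Checking that exactly one row matches is a cheap guard against a misspelled indicator name, which would otherwise produce an empty frame without raising an error.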
Then, I drop the unused columns:
df_ind = df_ind.drop(columns=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', 'Unnamed: 65'])
I rename the index of the single remaining row (511 in this file) to value:
df_ind = df_ind.rename(index={df_ind.index[0]: 'value'})
Now, I build the time series:
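One possible way to go from a single-row, wide frame to a time series is to transpose it and parse the year labels as dates. The sketch below is an assumption about the approach, not necessarily the code used for this article: the frame `wide` is hypothetical and its values are invented placeholders, not real passenger counts.

```python
import pandas as pd

# Hypothetical single-row frame: index 'value', one column per year.
# The numbers are invented; NaN stands for a year with no observation.
wide = pd.DataFrame(
    [[100.0, 120.0, float('nan')]],
    index=['value'],
    columns=['1970', '1971', '1972'],
)

# Transpose so that each year becomes a row and 'value' a column.
ts = wide.transpose()

# Parse the year labels as dates (January 1st of each year).
ts.index = pd.to_datetime(ts.index, format='%Y')
ts.index.name = 'date'

# Drop years with no observation.
ts = ts.dropna()

print(ts)
```

A datetime index is convenient here because the later analysis of events and time periods can then slice the series by date.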
Continue reading on Towards Data Science