Intro:
The claim that "there's no documentation in Agile" is a myth. Agile prioritizes working software, but it still recognizes the value of documentation: lightweight, flexible documentation that supports collaboration and communication within the team and with stakeholders. Documentation is integrated throughout the Agile process, focuses on the information that matters, and evolves with the product. It is a shared responsibility of the Agile team: created as needed, updated iteratively, and kept relevant.
This experiment checked how ChatGPT, with some prompt engineering, could help developers quickly produce a base or draft version of the documentation for code they have written.
Prompts for Software documentation
| Use case | GPT Prompt |
|---|---|
| Introduction to program | explain the below code in simple words for beginner |
| Software Spec Documentation | Create a software spec documentation to explain the use of above code |
| Flow Diagram | Please assist to create a flowchart in mermaid html for the above code |
| Program Documentation | HTML markdown program documentation for the above code |
| Write Test Case | write test case for the above code in markdown table |
| Write unit test scripts | write test scripts for the above code |
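The prompts in the table can be kept in one place and combined with the code to be documented. The following is a minimal sketch, not part of the original experiment: the names `PROMPTS` and `build_prompt` are hypothetical, and the only assumption about ordering is that a prompt mentioning "the below code" goes before the code while the others go after it.

```python
# Prompt templates copied from the table, keyed by use case.
PROMPTS = {
    "Introduction to program": "explain the below code in simple words for beginner",
    "Software Spec Documentation": "Create a software spec documentation to explain the use of above code",
    "Flow Diagram": "Please assist to create a flowchart in mermaid html for the above code",
    "Program Documentation": "HTML markdown program documentation for the above code",
    "Write Test Case": "write test case for the above code in markdown table",
    "Write unit test scripts": "write test scripts for the above code",
}


def build_prompt(use_case: str, code: str) -> str:
    """Combine a prompt template with the source code it refers to.

    Hypothetical helper: it only builds the text to paste into the chat,
    it does not call any API.
    """
    instruction = PROMPTS[use_case]
    if "below code" in instruction:
        # The instruction refers to code that follows it.
        return f"{instruction}\n\n{code}"
    # The instruction refers to code that precedes it.
    return f"{code}\n\n{instruction}"


if __name__ == "__main__":
    print(build_prompt("Write Test Case", "print('hello')"))
```

Keeping the templates in a dictionary makes it easy to reuse the same wording for every file that needs a draft document.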
Software Specification Documentation:
You can always prompt for the output as HTML markup:
<!DOCTYPE html>
<html>
<head>
<title>Software Specification Document</title>
</head>
<body>
<h1>Software Specification Document</h1>
<h2>1. Introduction</h2>
<p>This document outlines the software specification for a Python program that performs time series analysis and forecasting using the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The program takes a dataset of monthly airline passenger numbers as input and provides various functionalities for data preprocessing, model fitting, forecasting, and evaluation.</p>
<h2>2. Purpose</h2>
<p>The purpose of this software is to assist in analyzing historical airline passenger data and generating forecasts for future periods. It automates the process of selecting the best SARIMA model based on the Akaike Information Criterion (AIC) and provides visualizations of the data, model diagnostics, and forecasted results.</p>
<h2>3. Functionality</h2>
<h3>3.1. Data Loading and Preprocessing</h3>
<p>The software offers functionality to load the input dataset from a CSV file and perform preprocessing steps to enhance data readability and usability. This includes converting the "Month" column to a datetime format and setting it as the index.</p>
<h3>3.2. Data Visualization</h3>
<p>The software provides visualization capabilities to plot the loaded data as a time series graph, visualizing the monthly airline passenger numbers over time. Users can configure plot settings, such as figure size, font size, and plot style, for optimal visualization.</p>
<h3>3.3. Model Selection</h3>
<p>The software generates all possible combinations of SARIMA model parameters based on predefined ranges for the order and seasonal_order parameters. It fits each model to the training data and calculates the AIC. Finally, it identifies the SARIMA model with the lowest AIC as the best-fit model.</p>
<h3>3.4. Model Diagnostics</h3>
<p>The software provides visual diagnostics of the best-fit SARIMA model, including plots for residuals, histogram, and normal Q-Q plot. These diagnostics help users assess the goodness of fit for the selected model.</p>
<h3>3.5. Forecasting</h3>
<p>The software enables users to generate one-step-ahead and dynamic forecasts using the best-fit SARIMA model. It also provides confidence intervals for the forecasts and projects future passenger numbers using the get_forecast method.</p>
<h3>3.6. Evaluation</h3>
<p>The software calculates the Mean Absolute Percentage Error (MAPE) to evaluate the accuracy of the forecasted results. It compares the forecasted values with the ground truth values from the test dataset.</p>
<h2>4. Input and Output</h2>
<p><strong>Input:</strong> CSV file containing monthly airline passenger data.</p>
<p><strong>Output:</strong> Graphical visualizations of the data, model diagnostics, and forecasted results. Additionally, the program prints the best-fit model parameters and the MAPE for the forecasted year.</p>
<h2>5. Dependencies</h2>
<p>The software requires the following dependencies:</p>
<ul>
<li>Python 3.x</li>
<li>pandas library for data manipulation</li>
<li>numpy library for numerical operations</li>
<li>matplotlib library for data visualization</li>
<li>statsmodels library for time series analysis</li>
</ul>
<h2>6. Usage</h2>
<p>To use the software, follow these steps:</p>
<ol>
<li>Ensure that Python and the required dependencies are installed on the system.</li>
<li>Place the input dataset file ("international-airline-passengers.csv") in the same directory as the program.</li>
<li>Run the program, which will load the data, perform preprocessing, and display the time series graph.</li>
<li>Examine the model diagnostics plots to assess the goodness of fit for the selected SARIMA model.</li>
<li>Review the forecasted results and their confidence intervals.</li>
<li>Note the best-fit SARIMA model parameters and the MAPE for the forecasted year.</li>
</ol>
<h2>7. Conclusion</h2>
<p>This software provides a convenient and automated way to analyze historical airline passenger data, select the best SARIMA model, generate forecasts, and evaluate their accuracy. It aims to assist users in making informed decisions based on time series analysis and forecasting results.</p>
</body>
</html>
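Section 3.6 of the generated specification mentions the Mean Absolute Percentage Error (MAPE) without showing how it is computed. As a rough sketch, assuming plain Python lists of equal length and non-zero actual values, it could look like the following. The function name `mape` and the sample numbers are made up for illustration and are not taken from the airline dataset.

```python
def mape(actual, forecast):
    """Return the Mean Absolute Percentage Error, expressed in percent.

    Assumes both sequences have the same length and that no actual
    value is zero (the percentage error is undefined in that case).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    # Average of |actual - forecast| / |actual| over all observations.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Illustrative numbers only: errors of 10%, 5% and 0% average to about 5%.
    print(mape([100, 200, 400], [110, 190, 400]))
```

A lower MAPE means the forecast tracks the held-out test year more closely.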
Flow Diagram:
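The flow diagram prompt asks for a Mermaid flowchart. To give an idea of what such a diagram definition looks like, here is a small sketch that builds one from the stages listed in the "Functionality" section of the specification. It is not ChatGPT's output: the step labels are paraphrased from that section, and `STEPS` and `to_mermaid` are hypothetical names.

```python
# Pipeline stages paraphrased from the specification's "Functionality" section.
STEPS = [
    "Load CSV data",
    "Preprocess: parse Month, set as index",
    "Plot the time series",
    "Grid-search SARIMA parameters by AIC",
    "Fit best model and plot diagnostics",
    "Forecast with confidence intervals",
    "Evaluate forecast with MAPE",
]


def to_mermaid(steps):
    """Render a linear list of steps as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    # One node per step; labels are quoted so punctuation is kept literally.
    for i, label in enumerate(steps):
        lines.append(f'    S{i}["{label}"]')
    # Connect each step to the next one.
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(STEPS))
```

The printed text can be pasted into any Mermaid renderer to get the diagram.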
Program Documentation:
<!DOCTYPE html>
<html>
<head>
<title>Python Code Documentation</title>
</head>
<body>
<h1>Python Code Documentation</h1>
<h2>Import Libraries</h2>
<pre><code># Import libraries
import warnings
import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm</code></pre>
<h2>Defaults and Configuration</h2>
<pre><code># Defaults
plt.rcParams['figure.figsize'] = (20.0, 10.0)
plt.rcParams.update({'font.size': 12})
plt.style.use('ggplot')</code></pre>
<h2>Loading Data</h2>
<pre><code># Load the data
data = pd.read_csv('international-airline-passengers.csv', engine='python', skipfooter=3)</code></pre>
<h2>Data Preprocessing</h2>
<pre><code># A bit of pre-processing to make it nicer
data['Month']=pd.to_datetime(data['Month'], format='%Y-%m-%d')
data.set_index(['Month'], inplace=True)</code></pre>
<h2>Plotting the Data</h2>
<pre><code># Plot the data
data.plot()
plt.ylabel('Monthly airline passengers (x1000)')
plt.xlabel('Date')
plt.show()</code></pre>
<h2>Parameter Combinations</h2>
<pre><code># Define the d and q parameters to take any value between 0 and 1
q = d = range(0, 2)
# Define the p parameters to take any value between 0 and 3
p = range(0, 4)
# Generate all different combinations of p, d, and q triplets
pdq = list(itertools.product(p, d, q))
# Generate all different combinations of seasonal p, d, and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]</code></pre>
<h2>Fitting and Selecting Model</h2>
<pre><code># Fitting and selecting the best model based on AIC
train_data = data['1949-01-01':'1959-12-01']
test_data = data['1960-01-01':'1960-12-01']
warnings.filterwarnings("ignore") # specify to ignore warning messages
AIC = []
SARIMAX_model = []
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(train_data,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            print('SARIMAX{}x{} - AIC:{}'.format(param, param_seasonal, results.aic), end='\r')
            AIC.append(results.aic)
            SARIMAX_model.append([param, param_seasonal])
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
print('The smallest AIC is {} for model SARIMAX{}x{}'.format(min(AIC), SARIMAX_model[AIC.index(min(AIC))][0],SARIMAX_model[AIC.index(min(AIC))][1]))
# Fitting the best model
mod = sm.tsa.statespace.SARIMAX(train_data,
order=SARIMAX_model
Test case Documentation:
Unit Test scripts:
import unittest
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Note: preprocess_data, fit_model, generate_forecast and evaluate_forecast
# are not defined in this generated script. They are assumed to be provided
# by the module under test and must be imported before the tests can run.


class TestTimeSeriesAnalysis(unittest.TestCase):
    def setUp(self):
        # Set up test data
        self.data = pd.DataFrame({'Month': ['1949-01-01', '1949-02-01', '1949-03-01'],
                                  'Passengers': [112, 118, 132]})
        self.data['Month'] = pd.to_datetime(self.data['Month'], format='%Y-%m-%d')
        self.data.set_index(['Month'], inplace=True)

    def test_data_preprocessing(self):
        # Test data preprocessing
        processed_data = preprocess_data(self.data)
        self.assertEqual(len(processed_data), len(self.data))
        self.assertEqual(processed_data.index[0], self.data.index[0])
        self.assertEqual(processed_data.index[-1], self.data.index[-1])

    def test_model_fitting(self):
        # Test model fitting
        model = fit_model(self.data)
        self.assertIsInstance(model, sm.tsa.statespace.SARIMAX)

    def test_forecasting(self):
        # Test forecasting
        forecast = generate_forecast(self.data, steps=3)
        self.assertEqual(len(forecast), 3)
        # Assuming an out-of-sample forecast, three monthly steps end
        # three months after the last observation
        self.assertEqual(forecast.index[-1], self.data.index[-1] + pd.DateOffset(months=3))

    def test_evaluation(self):
        # Test evaluation
        forecast = generate_forecast(self.data, steps=3)
        evaluation = evaluate_forecast(forecast, self.data)
        self.assertIsInstance(evaluation, float)
        self.assertGreaterEqual(evaluation, 0)


if __name__ == '__main__':
    unittest.main()
Agile values living documentation that adds value to the project and adapts to change.