Hello, I found a dataset on Kaggle about the time users spend on a website, so I want to find the relationship between the number of pages visited and the total time spent on the website.
You can find the dataset and the code on my GitHub: https://github.com/victordalet/Kaggle_analysis/tree/feat/website_traffic
I - Installation
To do this, I use sqlalchemy in Python to convert my CSV into a database and plotly to display my results.
pip install pandas
pip install plotly
pip install sqlalchemy
II - Code
I create a Main class whose constructor reads my CSV and loads it into a database. The get_data method then queries that database.
The result is a list of tuples, so I create the transform_data method to obtain a list of lists.
Finally, I display a simple scatter plot of the number of pages viewed against the total time.
import pandas as pd
import plotly.express as px
from sqlalchemy import create_engine, text


class Main:
    def __init__(self):
        self.result = None
        self.engine = create_engine("sqlite:///my_database.db", echo=False)
        # Load the CSV into the website_data table.
        # "replace" avoids duplicating the rows each time the script is re-run.
        self.df = pd.read_csv("website_wata.csv")
        self.df.to_sql("website_data", self.engine, index=False, if_exists="replace")
        self.get_data()
        self.transform_data()
        self.display_graph()

    def get_data(self):
        # Keep the connection open only for the duration of the query.
        with self.engine.connect() as connection:
            query = text("SELECT Page_Views, Time_on_Page FROM website_data")
            self.result = connection.execute(query).fetchall()

    def transform_data(self):
        # Convert each row (a tuple-like object) into a plain list.
        self.result = [list(row) for row in self.result]

    def display_graph(self):
        # Column 0 holds the page views, column 1 the time on page.
        fig = px.scatter(
            self.result, x=0, y=1, title=""
        )
        fig.show()


Main()
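To see what get_data and transform_data produce without needing the CSV, here is a minimal sketch of the same query-then-convert step. It is not the code above: it swaps SQLAlchemy for the standard-library sqlite3 module, uses an in-memory database, and the helper name fetch_as_lists and the three rows are made up for illustration.

```python
import sqlite3


def fetch_as_lists(connection):
    """Return (Page_Views, Time_on_Page) rows as a list of two-item lists."""
    cursor = connection.execute(
        "SELECT Page_Views, Time_on_Page FROM website_data"
    )
    # fetchall() returns a list of tuples; convert each one to a list.
    return [list(row) for row in cursor.fetchall()]


# Illustrative values only, not taken from the Kaggle dataset.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE website_data (Page_Views INTEGER, Time_on_Page REAL)"
)
connection.executemany(
    "INSERT INTO website_data VALUES (?, ?)",
    [(5, 11.0), (4, 3.9), (12, 2.5)],
)
rows = fetch_as_lists(connection)
connection.close()
print(rows)
```

Each inner list has the page views at index 0 and the time on page at index 1, which is the layout the scatter plot reads with x=0 and y=1.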
III - Result
The x-axis indicates the number of pages visited by the user, while the y-axis shows the time spent on the website in minutes.
We can see that the users who stay the longest visit between 4 and 6 pages, and that every user who visits between 11 and 15 pages stays at least a few minutes.
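Observations like these can also be checked numerically rather than by eye. Below is a rough sketch, assuming the data is in the same list-of-lists shape that transform_data produces. The function name summarize_by_page_views is hypothetical and the sample rows are invented, so the printed numbers say nothing about the real dataset.

```python
from collections import defaultdict


def summarize_by_page_views(rows):
    """Map each page-view count to (min_time, max_time, n_sessions)."""
    times = defaultdict(list)
    for page_views, time_on_page in rows:
        times[page_views].append(time_on_page)
    # Sort by page-view count so the summary reads left to right like the x-axis.
    return {
        pages: (min(values), max(values), len(values))
        for pages, values in sorted(times.items())
    }


# Illustrative rows only, in [Page_Views, Time_on_Page] order.
sample = [[4, 1.0], [4, 9.5], [5, 3.0], [12, 2.5], [12, 4.0]]
summary = summarize_by_page_views(sample)
for pages, (low, high, count) in summary.items():
    print(f"{pages} pages: min={low}, max={high}, sessions={count}")
```

Running it on the real result list would show, for each number of pages, the shortest and longest time recorded, which is what the two remarks above describe.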