
Phylis Jepchumba, MSc


Implementing Machine Learning steps using Regression Model.

In our previous article we looked at the steps of a machine learning workflow. Let's now look at how to implement a machine learning model using Python.

The dataset used was collected from Kaggle.

The goal is to predict the insurance charges for a person.

  • We start by importing necessary modules as shown:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# r2_score is a regression metric; accuracy_score only applies to classification
from sklearn.metrics import r2_score
  • Then import the data.
data=pd.read_csv('insurance.csv')
data

Screenshot (37)

  • Clean the data by transforming the categorical columns (sex, smoker, region) into numerical values to make them easier to work with. Each encoder is fitted on the unique values of its column.
label=LabelEncoder()
label.fit(data.sex.drop_duplicates())
data.sex=label.transform(data.sex)

label.fit(data.smoker.drop_duplicates())
data.smoker=label.transform(data.smoker)

label.fit(data.region.drop_duplicates())
data.region=label.transform(data.region)
data


The encoded dataset is shown below:
Screenshot (38)
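To make the encoding step easier to follow, here is a minimal sketch of what LabelEncoder does to each column. The sample rows are hypothetical stand-ins for insurance.csv, and the `mappings` dictionary is only an illustrative helper, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; the real values come from insurance.csv.
sample = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

mappings = {}
for column in ["sex", "smoker", "region"]:
    encoder = LabelEncoder()
    # fit_transform learns the sorted unique labels and replaces each with its index
    sample[column] = encoder.fit_transform(sample[column])
    # classes_ holds the original labels; the label at position i is encoded as i
    mappings[column] = {str(label): code for code, label in enumerate(encoder.classes_)}

print(mappings)
print(sample)
```

Because the labels are sorted before being numbered, the integer codes carry no real ordering. For a column such as region, one-hot encoding may be a safer choice for a linear model.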

  • Using the cleaned dataset, now split it into training and test sets.
X=data.drop(['charges'], axis=1)
y=data[['charges']]
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=42)
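As a quick sanity check on the split, the sketch below runs the same call on a small synthetic table. The 100 random rows and the reduced column set are assumptions made only so the example runs without insurance.csv.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the encoded dataset: 100 rows of random values.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(18, 65, size=100),
    "bmi": rng.normal(30, 6, size=100),
    "smoker": rng.integers(0, 2, size=100),
    "charges": rng.normal(13000, 4000, size=100),
})

X = data.drop(["charges"], axis=1)   # features: every column except the target
y = data[["charges"]]                # target: the value we want to predict
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Roughly two thirds of the rows go to training, the rest to testing
print(X_train.shape, X_test.shape)
# The two sets share no rows
print(set(X_train.index) & set(X_test.index))
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same test rows.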

  • After splitting the data, choose a suitable algorithm. In this case we will use Linear Regression, since we need to predict a numerical value from a set of input features.
model=LinearRegression()
model.fit(X_train,y_train)
  • Now predict on the test set and check how accurate the predictions are.

Screenshot (39)
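The prediction code itself only survives as a screenshot, so the following is a sketch of how this step might look rather than the original code. The data is synthetic, and the linear relationship used to generate charges is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded insurance data (all values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 6, size=n),
    "smoker": rng.integers(0, 2, size=n),
})
# Invented relationship plus noise, only so the model has something to learn
data["charges"] = (250 * data["age"] + 300 * data["bmi"]
                   + 20000 * data["smoker"] + rng.normal(0, 2000, size=n))

X = data.drop(["charges"], axis=1)
y = data["charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict charges for rows the model has not seen during training
y_pred = model.predict(X_test)

# Put actual and predicted values side by side for a quick visual check
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison.head())
```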

  • The accuracy score (for a regression model this is typically the R² score) is computed as follows:

Screenshot (40)
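Since the scoring code is also only visible in a screenshot, here is one hedged way to score a regression model. It assumes the score in question is R², which is what `LinearRegression.score` returns; `accuracy_score` is meant for classification and does not accept continuous targets. The data is again synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded insurance data (all values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 6, size=n),
    "smoker": rng.integers(0, 2, size=n),
})
data["charges"] = (250 * data["age"] + 300 * data["bmi"]
                   + 20000 * data["smoker"] + rng.normal(0, 2000, size=n))

X = data.drop(["charges"], axis=1)
y = data["charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R²: share of the variance in charges explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)
# Mean absolute error: average size of the prediction error, in the units of charges
mae = mean_absolute_error(y_test, y_pred)

print(f"R2:  {r2:.3f}")
print(f"MAE: {mae:.2f}")
# model.score computes the same R² directly from the test features and targets
print(f"model.score: {model.score(X_test, y_test):.3f}")
```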

  • Parameter tuning: Let's find the parameters that show how the various variables in the dataset affect the prediction.

Screenshot (41)
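The screenshot for this step is not recoverable, so the sketch below is only a guess at its intent. Plain linear regression has very few hyperparameters to tune, so it assumes the aim is to inspect the fitted coefficients, which show how each variable moves the prediction. The data and the relationship behind it are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the encoded insurance data (all values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 6, size=n),
    "smoker": rng.integers(0, 2, size=n),
})
data["charges"] = (250 * data["age"] + 300 * data["bmi"]
                   + 20000 * data["smoker"] + rng.normal(0, 2000, size=n))

X = data.drop(["charges"], axis=1)
y = data["charges"]

model = LinearRegression().fit(X, y)

# Each coefficient is the change in predicted charges for a one-unit
# increase in that feature, holding the other features fixed
coefs = pd.Series(model.coef_, index=X.columns).sort_values(ascending=False)
print(coefs)
print("intercept:", model.intercept_)

# get_params lists the estimator's actual hyperparameters (e.g. fit_intercept)
print(model.get_params())
```

Coefficients are only directly comparable when the features share a scale; standardizing the inputs first gives a fairer ranking of their influence.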
