DEV Community

Phylis Jepchumba, MSc

Posted on Aug 18, 2021

Implementing Machine Learning Steps Using a Regression Model

#machinelearning #python

In our previous article we looked at the machine learning steps. Let's now have a look at how to implement a machine learning model using Python.

The dataset used comes from Kaggle.

The goal is to predict the insurance charges for a person.

We start by importing the necessary modules:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score  # regression metric; accuracy_score only applies to classification

Then import the data.

data=pd.read_csv('insurance.csv')
data

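Clean the data by transforming the categorical columns (sex, smoker and region) into numerical values to make them easier to work with. The encoder is fitted on each column's unique values, which is what drop_duplicates() supplies; no rows are removed from the dataset itself.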

label=LabelEncoder()
label.fit(data.sex.drop_duplicates())
data.sex=label.transform(data.sex)

label.fit(data.smoker.drop_duplicates())
data.smoker=label.transform(data.smoker)

label.fit(data.region.drop_duplicates())
data.region=label.transform(data.region)
data

The final dataset, displayed by the last line above, now holds sex, smoker and region as numbers.
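To see what the encoding step does to each column, here is a minimal sketch on a hypothetical three-row table (the rows are made up for illustration; only the column names come from the code above). It uses one encoder per column and records which number each category received.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the column names mirror those used above.
toy = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
})

mappings = {}
for column in ["sex", "smoker"]:
    encoder = LabelEncoder()
    toy[column] = encoder.fit_transform(toy[column])
    # classes_ is sorted, so the category at position i is encoded as i.
    mappings[column] = {str(c): i for i, c in enumerate(encoder.classes_)}

print(mappings)
# {'sex': {'female': 0, 'male': 1}, 'smoker': {'no': 0, 'yes': 1}}
```

Keeping the mapping around makes it possible to translate the numbers back into categories later.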

Using the cleaned dataset, split it into training and test sets, holding out 33% of the rows for testing.

X=data.drop(['charges'], axis=1)
y=data[['charges']]
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=42)
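As a quick check of what the split produces, the sketch below runs the same call on a hypothetical 100-row table with made-up values (the real insurance.csv is not needed for this). With test_size=0.33, 33 rows go to the test set and the remaining 67 to the training set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the insurance data: 100 rows of made-up values.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": rng.integers(18, 65, size=100),
    "bmi": rng.uniform(18, 40, size=100),
    "charges": rng.uniform(1000, 50000, size=100),
})

X = toy.drop(columns=["charges"])  # features
y = toy["charges"]                 # target as a one-dimensional Series

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(X_train.shape, X_test.shape)
# (67, 2) (33, 2)
```

Fixing random_state makes the split reproducible, so the same rows end up in the test set on every run.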

After splitting the data, choose a suitable algorithm. In this case we will use Linear Regression, since we need to predict a numerical value from the other columns.

model=LinearRegression()
model.fit(X_train,y_train)
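To get a feel for what fitting does, here is a small sketch on made-up data that follows y = 2x + 1 exactly (an assumption for illustration, not the insurance data). After fitting, the learned slope and intercept can be read from the model's coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 2*x + 1.
X = np.arange(10, dtype=float).reshape(-1, 1)  # one feature, ten rows
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # learned weight of the single feature
intercept = model.intercept_  # learned constant term
print(round(slope, 6), round(intercept, 6))
```

On the insurance data the same attributes would hold one coefficient per feature column.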

Now predict on the test set and measure how good the predictions are.

For a regression model the usual score is the coefficient of determination (R²) rather than classification accuracy.
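The original code for this step is not shown, so the following is only a sketch of one way to do it, using synthetic data in place of the insurance dataset. It predicts on the held-out rows and scores them with r2_score, which is the same value that model.score returns for a LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3*x + 5 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Predict on rows the model has not seen, then compare with the true values.
predictions = model.predict(X_test)
r2 = r2_score(y_test, predictions)
print(f"R^2 on the test set: {r2:.3f}")
```

An R² close to 1 means the predictions explain most of the variation in the target; a value near 0 means the model does little better than always predicting the mean.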

Parameter tuning: let's find the hyperparameter values that give the best predictions on this dataset.
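The article does not show how the tuning is done, so this is a hedged sketch of one common approach: a cross-validated grid search with GridSearchCV. LinearRegression has very few hyperparameters; the example searches over fit_intercept on synthetic data whose true intercept is far from zero, which is an assumption chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative only): y = 3*x + 50 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 50 + rng.normal(0, 1, size=200)

# Try each candidate setting with 5-fold cross-validation (scored by R^2).
search = GridSearchCV(
    LinearRegression(),
    param_grid={"fit_intercept": [True, False]},
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
# {'fit_intercept': True}
```

The same pattern applies to models with more hyperparameters: list the candidate values in param_grid and let the search pick the combination with the best cross-validated score.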
