DEV Community

Code_Jedi


Demystifying Machine Learning for Beginners (Pt.2)

If you're a confused beginner like I was when I was just starting out with machine learning in Python, then stick around, because today I'll be doing my best to demystify and simplify machine learning for you!


In the last "Demystifying Machine Learning for Beginners" blog post, I explained and demonstrated how to plot data and how to classify new pieces of data. In this blog post, I'll demonstrate how to predict one value based on another value using linear regression!


Let's get started!

First, import the required libraries (you can install them with pip install scikit-learn pandas numpy, or with pip3 install if that's what your setup uses):

from sklearn.linear_model import LinearRegression
import pandas
import numpy as np

For this tutorial, we're going to use this dataset of medical insurance charges, which includes columns such as age, sex, bmi and more:

df = pandas.read_csv('insurance.csv')

Next, we're going to use the "age" column as our input (X) and the "charges" column as our target (y), the set of values we're going to try to predict:

X = np.array(df["age"]).reshape((-1, 1))
y = np.array(df["charges"])
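  • The "age" and "charges" columns are turned into NumPy arrays using the "np.array" function.
  • The "reshape((-1, 1))" call then turns the "age" array into a 2D array with a single column, because scikit-learn expects X to have one row per sample and one column per feature.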
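To see what the reshape step actually changes, here is a small sketch that uses a handful of made-up ages in place of df["age"] (the values are hypothetical, only the shapes matter):

```python
import numpy as np

# Hypothetical ages standing in for df["age"]
ages = np.array([19, 28, 33, 46, 62])
print(ages.shape)   # (5,)   a flat 1D array

# -1 lets NumPy work out the number of rows; 1 means exactly one column
X = ages.reshape((-1, 1))
print(X.shape)      # (5, 1) five samples, one feature each
```

The -1 is convenient because the same line works no matter how many rows the CSV has.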

Before predicting new values, add these lines of code:

model = LinearRegression()
model.fit(X, y)
  • This defines our linear regression model and fits it to the "X" and "y" values defined earlier, which means it finds the straight line that best matches age to charges (in the least-squares sense).
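If you're curious what "fitting" stores, the sketch below fits LinearRegression to made-up points that lie exactly on the line charges = 250 * age + 3000 (an assumption chosen so the answer is known in advance). The learned slope and intercept end up in the coef_ and intercept_ attributes, and predict() simply evaluates that line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on: charges = 250 * age + 3000
ages = np.array([20, 30, 40, 50, 60]).reshape((-1, 1))
charges = 250 * ages.ravel() + 3000

model = LinearRegression()
model.fit(ages, charges)

print(model.coef_)       # slope, one value per input column (about [250.])
print(model.intercept_)  # value of the line at age 0 (about 3000.0)

# predict() is just intercept + slope * age
manual = model.intercept_ + model.coef_[0] * 35
print(model.predict([[35]])[0], manual)  # both about 11750
```

On the real insurance data the points don't lie on a line, so the fitted slope and intercept are a compromise across all rows.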

Now we can finally predict new values!

Put these lines of code at the end of your script:

X_predict = [[35]]
y_predict = model.predict(X_predict)
print(y_predict)
  • This will try to predict the insurance charges based on age; in this case, the age is 35 years old. The double brackets are there because predict, like fit, expects a 2D input with one row per person.

If you run your code, your output should look like this:

>[12186.1766594]
  • As you can see, our Python script predicted that someone aged 35 would have medical charges of about $12,186.

Of course, this prediction isn't very accurate, because a person's medical expenses aren't determined by their age alone. That's why we're going to add some more values to our model.
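One way to put a number on "isn't very accurate" (not part of the original script) is to hold back some rows with scikit-learn's train_test_split and score the model only on rows it never trained on. This sketch uses synthetic data, not insurance.csv: charges depend on age but also on a hidden "smoker" flag that an age-only model can't see. All the numbers are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in data (NOT the real dataset)
age = rng.integers(18, 65, size=n)
smoker = rng.random(n) < 0.2  # hidden factor the model never sees
charges = 250 * age + 20000 * smoker + rng.normal(0, 3000, size=n)

X = age.reshape((-1, 1))
# Keep 20% of the rows aside for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, charges, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
# score() returns R^2 on unseen rows: 1.0 is perfect, 0.0 is no better than the mean
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.2f}")
```

Because most of the variation here comes from the hidden flag, R² stays low, which is the same reason an age-only model struggles.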


First of all, to add another value to our model, replace X = np.array(df["age"]).reshape((-1, 1)) with:

X = list(zip(df["age"], df["bmi"]))
  • This pairs each age with the bmi from the same row, producing a list of (age, bmi) pairs that scikit-learn treats as a 2D array with two columns. Keep the model.fit(X, y) lines after it so the model is trained on both columns.
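Here is a quick check of what that zip line produces, using a tiny made-up DataFrame in place of insurance.csv. The df[["age", "bmi"]].to_numpy() line is an alternative pandas idiom, not what the tutorial uses; it gives the same numbers:

```python
import numpy as np
import pandas

# Tiny made-up DataFrame standing in for insurance.csv
df = pandas.DataFrame({"age": [19, 33, 46], "bmi": [27.9, 22.7, 33.4]})

# zip pairs each age with the bmi from the same row
X = list(zip(df["age"], df["bmi"]))
print(X)                  # a list of (age, bmi) pairs
print(np.array(X).shape)  # (3, 2): 3 people, 2 features

# Equivalent pandas-style selection of both columns
X_alt = df[["age", "bmi"]].to_numpy()
print(X_alt.shape)        # (3, 2)
```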

Now we'll be able to predict a person's medical charges based not only on their age but also on their bmi. Replace the prediction lines at the end of your script with:

X_predict = [[35, 45]]
y_predict = model.predict(X_predict)
print(y_predict)
  • Here our Python script predicts someone's medical charges based on their age and their bmi, where "35" is their age and "45" is their bmi. The values in each row must follow the same column order as X (age first, then bmi).
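With two inputs, the model learns one weight per column plus an intercept, and a prediction is intercept + weight_age * age + weight_bmi * bmi. The sketch below checks that on made-up data generated from a known rule (charges = 200 * age + 300 * bmi + 1000, an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up rows: [age, bmi], with charges generated from a known rule
X = np.array([[20, 22.0], [35, 30.0], [50, 25.0], [62, 35.0], [41, 28.5]])
y = 200 * X[:, 0] + 300 * X[:, 1] + 1000

model = LinearRegression().fit(X, y)

# coef_ holds one weight per column: [age weight, bmi weight]
print(model.coef_, model.intercept_)

# predict() computes intercept + coef[0]*age + coef[1]*bmi for each row
x_new = np.array([[35, 45]])
by_hand = model.intercept_ + x_new @ model.coef_
print(model.predict(x_new), by_hand)  # both about 21500
```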

If you run your script, the output should look something like this:

>[17026.20170095]

As you can see, this is different from our previous result because we've given the model a new piece of information: bmi.


Nice! You now have your first working linear regression model that can predict values.



You can now experiment with this code, add more values, predict different sets of values and more!
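If you want to add a text column such as "sex", it has to be turned into numbers first. One common option (not covered in the original post) is pandas.get_dummies, which creates 0/1 columns. The rows below are made up for illustration, not taken from insurance.csv:

```python
import pandas
from sklearn.linear_model import LinearRegression

# Made-up rows standing in for insurance.csv
df = pandas.DataFrame({
    "age": [19, 33, 46, 28, 60, 52],
    "bmi": [27.9, 22.7, 33.4, 30.1, 25.8, 31.2],
    "sex": ["female", "male", "female", "male", "male", "female"],
    "charges": [5000, 6000, 9000, 4500, 14000, 11000],
})

# drop_first=True keeps a single 0/1 column, "sex_male"
features = pandas.get_dummies(
    df[["age", "bmi", "sex"]], columns=["sex"], drop_first=True, dtype=int
)
print(list(features.columns))  # ['age', 'bmi', 'sex_male']

model = LinearRegression().fit(features.to_numpy(), df["charges"].to_numpy())
# New rows must use the same column order: age, bmi, sex_male
print(model.predict([[35, 45, 1]]))
```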


Byeeeee 👋

Top comments (1)

SalarC123

Why didn’t you use train_test_split before training your data?