amananandrai

Posted on Jun 18, 2020 • Edited on Oct 31, 2023

Basic Machine Learning Cheatsheet using Python [10 Classification & Regression Methods]

#machinelearning #datascience #beginners #python

Machine Learning is one of the fastest-growing technologies in today's world. It is a subset of Artificial Intelligence. Some of its uses in our daily life are face recognition, which we use to unlock our smartphones, home assistants like Google Home and Amazon Alexa, and self-driving cars.

Machine Learning means teaching a computer to perform certain tasks without being explicitly programmed. This gives the system a certain degree of decision-making capability. Machine Learning can be divided into three major categories:

Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning

Supervised Learning is so called because the model learns under the supervision of a teacher. The training data contains both the inputs and the expected outputs. Because the correct output is known during training, the model can be trained to reduce the error in its predictions. The two major types of supervised learning are Classification and Regression.

Unsupervised Learning

Unsupervised Learning means that there is no supervisor for the process of learning. The model is trained on inputs only, with no labelled outputs, and has to discover structure in those inputs by itself. The major types of unsupervised learning are clustering, in which we group similar things together, and finding patterns in unlabelled datasets.
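Clustering can be sketched in a few lines. The example below is only an illustration and makes some assumptions: the data is synthetic (two made-up blobs of points), I picked KMeans from scikit-learn as the clustering method, and I chose `n_clusters=2` because I already know how the toy data was generated.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points. No labels are given to the model.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# n_clusters=2 is an assumption that fits this toy data only.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each point gets a cluster id (0 or 1) that was learned from the inputs alone.
print(cluster_ids[:5], cluster_ids[-5:])
print(kmeans.cluster_centers_)
```

The cluster ids are arbitrary numbers, so they only tell you which points belong together, not what the groups mean.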

Reinforcement Learning

Reinforcement Learning is the type of learning in which the model learns to make decisions based on rewards or punishments. The learner makes a decision and receives feedback for it in the form of a reward or punishment, and it tries to maximize its rewards. It is used in game-playing algorithms and in robotics, where robots learn by performing tasks and getting feedback in the form of rewards or punishments.
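The reward loop can be illustrated with a very small toy. This is a sketch of an epsilon-greedy "two-armed bandit", one of the simplest reinforcement learning setups, and not a full algorithm for games or robots. The environment, the reward probabilities and the value of `epsilon` are all assumptions I made up for the example.

```python
import random

random.seed(0)

# Hypothetical environment: two actions, each pays a reward of 1 with a different chance.
reward_probability = [0.2, 0.8]

value_estimate = [0.0, 0.0]  # the learner's running estimate of each action's reward
pull_count = [0, 0]          # how many times each action was tried
epsilon = 0.1                # fraction of steps spent exploring at random
n_steps = 2000

for _ in range(n_steps):
    if random.random() < epsilon:
        action = random.randrange(2)                        # explore
    else:
        action = value_estimate.index(max(value_estimate))  # exploit the best guess so far
    reward = 1 if random.random() < reward_probability[action] else 0
    pull_count[action] += 1
    # Incremental mean: move the estimate toward the reward just observed.
    value_estimate[action] += (reward - value_estimate[action]) / pull_count[action]

best_action = value_estimate.index(max(value_estimate))
print(best_action, value_estimate, pull_count)
```

The learner is never told which action is better. It finds out only from the rewards it receives.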

In this post I am going to explain the two major methods of Supervised Learning:

Classification - In Classification, the output is discrete. In simpler words, we categorize data based on certain features. A basic example is differentiating between apples and oranges based on their shape, color, texture, etc. In this example shape, color and texture are the features, and the outputs "Apple" and "Orange" are the classes. Because the outputs are classes, the method is called Classification.
Regression - In Regression, the output is continuous. In this method we predict a trend from the features, and the result does not belong to a certain category or class; it is a numeric output, a real number. A basic example is predicting house prices based on features like the size of the house, its location and the number of floors. Other examples of regression are predicting the sales of a certain good or the stock price of a certain company.
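The difference between the two outputs is easy to see on toy data. In the sketch below all numbers are made up: the fruit features (weight and surface roughness) and the house features (size and number of floors) are hypothetical stand-ins for the features mentioned above, and the two models are just convenient choices from scikit-learn.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Classification: hypothetical fruit features [weight in grams, surface roughness 0-1].
fruit_features = [[150, 0.1], [160, 0.2], [155, 0.15], [140, 0.8], [130, 0.9], [135, 0.85]]
fruit_labels = ["Apple", "Apple", "Apple", "Orange", "Orange", "Orange"]

classifier = DecisionTreeClassifier(random_state=0).fit(fruit_features, fruit_labels)
predicted_class = classifier.predict([[152, 0.12]])[0]
print(predicted_class)   # one of the known classes

# Regression: hypothetical houses [size in square metres, number of floors] and prices.
house_features = [[50, 1], [80, 1], [120, 2], [200, 2]]
house_prices = [100_000.0, 160_000.0, 240_000.0, 400_000.0]

regressor = LinearRegression().fit(house_features, house_prices)
predicted_price = regressor.predict([[100, 1]])[0]
print(predicted_price)   # a real number, not a category
```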

Python provides a lot of tools for performing Classification and Regression. One of the most widely used libraries is scikit-learn, which provides many models for Machine Learning.

The basic steps of supervised machine learning are:

Loading the necessary libraries
Loading the dataset
Splitting the dataset into training and test set
Training the model
Evaluating the model
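The five steps above can be sketched end to end in one short script. To keep it runnable anywhere, this sketch assumes the breast cancer dataset that ships with scikit-learn instead of a CSV file. The dataset choice and the `random_state` values are my assumptions for the example.

```python
# Step 1: load the necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 2: load the dataset (a binary classification set bundled with scikit-learn)
X, y = load_breast_cancer(return_X_y=True)

# Step 3: split the dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)

# Step 4: train the model
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Step 5: evaluate the model on data it has not seen
pred = model.predict(X_test)
accuracy = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
print(accuracy, f1)
```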

Loading the Libraries



#Numpy deals with large arrays and linear algebra
import numpy as np
# Library for data manipulation and analysis
import pandas as pd 

# Metrics for Evaluation of model Accuracy and F1-score
from sklearn.metrics  import f1_score,accuracy_score

#Importing the Decision Tree from scikit-learn library
from sklearn.tree import DecisionTreeClassifier

# For splitting of data into train and test set
from sklearn.model_selection import train_test_split

Loading the Dataset



train=pd.read_csv("/input/hcirs-ctf/train.csv")
# read_csv function of pandas reads the data in CSV format
# from path given and stores in the variable named train
# the data type of train is DataFrame
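After loading, it usually helps to take a quick look at the DataFrame before going further. The sketch below uses a tiny in-memory CSV so that it runs without any file. The column names `feature_a` and `feature_b` and all values are hypothetical, and only the `Class` column name is taken from this post.

```python
import io
import pandas as pd

# Hypothetical CSV contents; in practice you would pass a file path to read_csv.
csv_text = """feature_a,feature_b,Class
1.0,10,0
2.5,,1
3.0,30,0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.head())   # the first few rows
print(df.dtypes)   # the data type of every column

# Count the missing values in each column; many models cannot handle NaN directly.
missing_per_column = df.isna().sum()
print(missing_per_column)
```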

Splitting into Train & Test set



# First we split our data into input and output
# y is the output and is stored in the "Class" column of the dataframe
# X contains the other columns, which are the features or input
y = train["Class"]
# drop() returns a new DataFrame, so train itself is left unchanged
X = train.drop(columns=["Class"])

# Now we split the dataset in train and test part
# here the train set is 75% and test set is 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)
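When the classes are imbalanced, a plain random split can leave the test set with very few samples of the rare class. One option, sketched here on made-up data, is the `stratify` parameter of `train_test_split`, which keeps the class proportions the same in both parts. The arrays below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 samples of class 0 and 20 samples of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 class ratio in both the train and the test part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2, stratify=y
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```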

Training the model



# Training the model is as simple as this
# Create the classifier imported above and call fit() on it
DT = DecisionTreeClassifier()
DT.fit(X_train, y_train)
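With default settings a decision tree keeps growing until its leaves are pure (or too small to split), which can overfit. Hyperparameters can be passed when the model is created. The sketch below assumes the iris dataset bundled with scikit-learn, and the values `max_depth=2` and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Default tree: grows as deep as it needs to fit the training data.
default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Limited tree: max_depth caps the number of levels; random_state makes runs repeatable.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(default_tree.get_depth(), shallow_tree.get_depth())
# score() on the training data itself, only to show how depth changes the fit
print(default_tree.score(X, y), shallow_tree.score(X, y))
```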

Evaluating the model



# We use the predict() on the model to predict the output
pred=DT.predict(X_test)

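# for classification we use accuracy and F1 score
print(accuracy_score(y_test,pred))
# f1_score assumes two classes by default; for more than two classes
# pass an averaging method, e.g. f1_score(y_test, pred, average="macro")
print(f1_score(y_test,pred))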

# for regression we use R2 score and MAE (mean absolute error)
# all other steps will be the same as for classification shown above,
# but the model must be a regressor (e.g. DecisionTreeRegressor)
# so that y_test and pred hold continuous values
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
print(mean_absolute_error(y_test,pred))
print(r2_score(y_test,pred))
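To see what these four metrics measure, here is a small worked sketch on hand-made labels and values. All the numbers are hypothetical, chosen so that the results are easy to check by hand.

```python
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error, r2_score

# Classification with two classes: 3 of the 4 predictions are correct.
y_true_cls = [0, 1, 1, 0]
y_pred_cls = [0, 1, 0, 0]
accuracy = accuracy_score(y_true_cls, y_pred_cls)
# For class 1: precision is 1.0 and recall is 0.5; F1 is their harmonic mean.
f1 = f1_score(y_true_cls, y_pred_cls)

# With more than two classes an averaging method has to be chosen explicitly.
y_true_multi = [0, 1, 2, 2]
y_pred_multi = [0, 1, 2, 1]
macro_f1 = f1_score(y_true_multi, y_pred_multi, average="macro")

# Regression: MAE is the average absolute difference between truth and prediction.
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 4.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)
# R2 compares the model's squared error with that of always predicting the mean.
r2 = r2_score(y_true_reg, y_pred_reg)

print(accuracy, f1, macro_f1, mae, r2)
```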

Now that I have shown the basic steps of Classification and Regression, it's time to learn about some popular Classification and Regression methods. I have compiled a collection of 10 Classification and 10 Regression methods. Import any of them, use it in place of DecisionTreeClassifier(), and enjoy Machine Learning.

10 popular Classification Methods

Logistic Regression


from sklearn.linear_model import LogisticRegression

Support Vector Machine


from sklearn.svm import SVC

Naive Bayes (Gaussian, Multinomial)


from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

Stochastic Gradient Descent Classifier


from sklearn.linear_model import SGDClassifier

KNN (k-nearest neighbour)


from sklearn.neighbors import KNeighborsClassifier

Decision Tree


from sklearn.tree import DecisionTreeClassifier

Random Forest


from sklearn.ensemble import RandomForestClassifier

Gradient Boosting Classifier


from sklearn.ensemble import GradientBoostingClassifier

LGBM Classifier


from lightgbm import LGBMClassifier

XGBoost Classifier


from xgboost.sklearn import XGBClassifier
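Because the scikit-learn classifiers share the same fit() and predict() interface, swapping one for another does not change the rest of the code. The sketch below tries a few of the classifiers listed above in a loop. It assumes the breast cancer dataset bundled with scikit-learn and default hyperparameters, so the scores say nothing about which method is better in general.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)

# Any classifier from the list can be added here; the loop stays the same.
classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
}

scores = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(name, scores[name])
```

LGBMClassifier and XGBClassifier follow the same pattern, but they come from separate packages (lightgbm and xgboost) that have to be installed first.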

10 popular Regression Methods

Linear Regression


from sklearn.linear_model import LinearRegression

LGBM Regressor


from lightgbm import LGBMRegressor

XGBoost Regressor


from xgboost.sklearn import XGBRegressor

CatBoost Regressor


from catboost import CatBoostRegressor

Stochastic Gradient Descent Regressor


from sklearn.linear_model import SGDRegressor

Kernel Ridge Regression


from sklearn.kernel_ridge import KernelRidge

Elastic Net Regression


from sklearn.linear_model import ElasticNet

Bayesian Ridge Regression


from sklearn.linear_model import BayesianRidge

Gradient Boosting Regression


from sklearn.ensemble import GradientBoostingRegressor

Support Vector Machine (SVR)


from sklearn.svm import SVR
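The regressors above are used in the same way as the classifiers, only the metrics change. The sketch below fits a few of them on synthetic data generated with make_regression. The data, the `noise` level and the default hyperparameters are assumptions for the example, so the numbers are not a benchmark.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear relationship between 5 features and the target, plus noise.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)

# Any regressor from the list can be added here; the loop stays the same.
regressors = {
    "Linear Regression": LinearRegression(),
    "Elastic Net": ElasticNet(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in regressors.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (mean_absolute_error(y_test, pred), r2_score(y_test, pred))
    print(name, results[name])
```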

I hope this was helpful for Machine Learning newbies and budding data scientists. Please upvote and share with your friends if you liked the post.

DEV Community
