Talking about Machine Learning (I): Setup

#machinelearning #tutorials #python #beginners

The next couple of posts in this series will be a tutorial about machine learning, one of the most popular branches of AI.

Environment

I will work with the following libraries: NumPy, SciPy, scikit-learn and matplotlib. I built a tiny install script.

mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib

Now your talkingaboutml directory looks like this:

talkingaboutml/
├── talkingaboutml (here we store our examples)
└── venv
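Before going on, it may help to check that the four libraries are visible from the virtualenv. This is a minimal sketch, not part of the install script: the file name check_env.py and the helper check_packages are my own hypothetical names, and note that scikit-learn is imported as sklearn.

```python
import importlib.util

# Import names of the libraries installed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn", "matplotlib"]


def check_packages(names):
    """Return {module_name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Saved as talkingaboutml/talkingaboutml/check_env.py, it could be run with talkingaboutml/venv/bin/python talkingaboutml/talkingaboutml/check_env.py.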

First example

In our first example I will use the scikit-learn datasets (available in sklearn.datasets). There are many example datasets; I chose iris, which is a multi-class classification dataset.

As a first example, I will train a simple classifier and run a prediction.

We need some imports: the datasets, an accuracy metric and an SVC with a linear kernel:

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC  # sklearn.svm.classes was removed in newer scikit-learn releases

Load the dataset. These datasets are already divided into data and target.

iris = datasets.load_iris()
X = iris.data # one row per sample: an iris flower with its features
y = iris.target # class label for each sample

feature_number = X.shape[1]
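To get a feeling for what was just loaded, here is a small sketch that prints the shape of the data and the names stored in the dataset object. It only uses attributes that load_iris() returns (feature_names and target_names).

```python
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(y.shape)             # (150,): one class label per flower
print(iris.feature_names)  # names of the 4 measurements
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(X[0], y[0])          # first sample and its label
```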

Create the classifier, train it and predict.


clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0) # 'Linear SVC'

clf.fit(X, y) # Train

y_pred = clf.predict(X)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
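Since the classifier is created with probability=True, it can also return class probabilities with predict_proba. The following is a small sketch for a single flower. Picking the first row of X as the sample is just my choice for illustration.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)
clf.fit(X, y)

sample = X[:1]  # first flower, kept 2-D with shape (1, 4) as predict expects

predicted_class = clf.predict(sample)[0]      # integer class label
probabilities = clf.predict_proba(sample)[0]  # one probability per class

print(iris.target_names[predicted_class])
for name, p in zip(iris.target_names, probabilities):
    print(f"{name}: {p:.3f}")
```

The probabilities are estimated in an extra calibration step, so on borderline samples they may not agree exactly with the class returned by predict.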

So... let's do this. In this example I train a single classifier with different C (penalty) values. This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC  # sklearn.svm.classes was removed in newer scikit-learn releases

iris = datasets.load_iris()

X = iris.data # one row per sample: an iris flower with its features
y = iris.target # class label for each sample


feature_number = X.shape[1]

penalties = list(np.arange(0.5,10.0, 0.1))

accs = []

for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0) # 'Linear SVC'

    clf.fit(X, y) # Train

    y_pred = clf.predict(X)
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)


# plot the data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
plt.show()

As we can see, the penalty factor affects the accuracy measured on the training data. If it is too large, the model fits individual training points too closely and may overfit.
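The plot above measures accuracy on the same samples used for training, so it cannot show overfitting by itself. Here is a minimal sketch of one way to look closer, under my own assumptions: a 70/30 train/test split (not used in the example above) and only four C values. For each C it records the number of support vectors (n_support_) and the accuracy on both parts.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Hold out 30% of the flowers; the split ratio is an assumption for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

results = []
for C in [0.5, 1.0, 5.0, 10.0]:
    clf = SVC(kernel='linear', C=C, random_state=0)
    clf.fit(X_train, y_train)  # train only on the training part
    results.append({
        "C": C,
        "support_vectors": int(clf.n_support_.sum()),  # total over all classes
        "train_acc": accuracy_score(y_train, clf.predict(X_train)),
        "test_acc": accuracy_score(y_test, clf.predict(X_test)),
    })

for row in results:
    print(row)
```

A training accuracy that keeps growing with C while the test accuracy stalls or drops would be a hint of overfitting. On a dataset as small and clean as iris the differences may be tiny.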
