The next couple of posts in this series will be a tutorial on machine learning, one of the most popular branches of AI.
Environment
I will work with the following libraries: NumPy, SciPy, scikit-learn and matplotlib. I built a tiny install script:
mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib
Now your talkingaboutml directory looks like this:
talkingaboutml/
├── talkingaboutml (here we store our examples)
└── venv
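A quick way to confirm that the environment works is to import each library and print its version. This is only a sketch: the helper name installed_versions is made up for illustration, and it assumes each package exposes a __version__ attribute (it falls back to "unknown" if not).

```python
import importlib

# Import names of the packages installed above (scikit-learn is imported as "sklearn").
PACKAGES = ["numpy", "scipy", "sklearn", "matplotlib"]


def installed_versions(names=PACKAGES):
    """Map each import name to its version string, or a marker if it is missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```

Run it with talkingaboutml/venv/bin/python so that the check uses the virtual environment rather than the system interpreter.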
First example
For our first example I will use the scikit-learn datasets (available in sklearn.datasets); there are many example datasets. I chose iris, a multi-class classification dataset.
As a first example, I will train a simple classifier and run a prediction.
We need some imports: the datasets, the accuracy metric and an SVC, which we will use with a linear kernel:
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
Load the dataset. These datasets are already split into data and target.
iris = datasets.load_iris()
X = iris.data  # one row per sample: an iris flower described by its features
y = iris.target  # class label for each sample
feature_number = X.shape[1]
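To get a feel for what load_iris returns, the arrays can be inspected directly. A minimal sketch, using only attributes of the object returned by load_iris:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (number of samples, number of features)
print(iris.feature_names)  # what each column of X measures
print(iris.target_names)   # the class name behind each integer label in y
print(np.bincount(y))      # how many samples belong to each class
```

The dataset has 150 samples with 4 features each, split evenly across three species (setosa, versicolor, virginica).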
Create the classifier, train it and predict.
clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)  # SVC with a linear kernel
clf.fit(X, y)  # train
y_pred = clf.predict(X)  # predict on the same samples used for training
accuracy = accuracy_score(y, y_pred)  # so this is the training accuracy
print(accuracy)
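The accuracy printed above is computed on the same samples the classifier was trained on, so it tends to be optimistic. A minimal sketch of a held-out evaluation follows; the 70/30 stratified split is an assumption of this example, not something the post prescribes.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Keep 30% of the samples aside for testing (assumed ratio); stratify keeps
# the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = SVC(kernel='linear', C=1.0, random_state=0)
clf.fit(X_train, y_train)  # train only on the training part

train_accuracy = accuracy_score(y_train, clf.predict(X_train))
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(train_accuracy, test_accuracy)
```

If the test accuracy is clearly lower than the training accuracy, the model is probably fitting the training samples too closely.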
So, let's do this: in this example I train the same classifier with different values of C (the penalty). This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.
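For reference, the role of C can be read off the standard soft-margin SVM objective. This is the textbook binary formulation with labels y_i in {-1, +1}, given here as background; w, b and the slack variables \xi_i are the usual symbols, and for a multi-class dataset like iris SVC combines several such binary problems.

```latex
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

A larger C makes every unit of slack more expensive, so the optimizer accepts a narrower margin to reduce violations; a smaller C tolerates more violations in exchange for a wider margin.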
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
iris = datasets.load_iris()
X = iris.data  # one row per sample: an iris flower described by its features
y = iris.target  # class label for each sample
feature_number = X.shape[1]
penalties = list(np.arange(0.5, 10.0, 0.1))
accs = []
for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0)  # SVC with a linear kernel
    clf.fit(X, y)  # train
    y_pred = clf.predict(X)
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)
# plot training accuracy against the penalty value
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
ax.set_xlabel('C (penalty)')
ax.set_ylabel('training accuracy')
plt.show()
As we can see, the penalty factor influences the accuracy. If it is too large, the classifier tries hard to classify every training example correctly, the margin becomes narrow, and it may overfit.
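One way to check how C relates to the number of support vectors and to accuracy on unseen data is to combine cross-validation with the n_support_ attribute of a fitted SVC. This is a sketch: the list of C values and the 5 folds are arbitrary choices made for this example.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

results = []
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:  # assumed grid, spanning several orders of magnitude
    clf = SVC(kernel='linear', C=C, random_state=0)
    # Mean accuracy over 5 folds, each scored on samples not used for fitting.
    cv_accuracy = cross_val_score(clf, X, y, cv=5).mean()
    # Fit on all samples to count the support vectors (n_support_ holds one count per class).
    clf.fit(X, y)
    n_sv = int(clf.n_support_.sum())
    results.append((C, cv_accuracy, n_sv))
    print(f"C={C:<6} cv accuracy={cv_accuracy:.3f} support vectors={n_sv}")
```

Comparing the printed rows shows whether a larger C actually helps on held-out folds, which training accuracy alone cannot tell us.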