PCA is a dimensionality reduction technique that tries to reduce a set of variables down to a smaller set of components that represent most of the information in the variables. For a collection of data points, this can be thought of as applying lossy compression: storing the points in a way that requires less memory by trading away some precision. At a conceptual level, PCA works by identifying sets of variables that share variance and creating a component to represent that shared variance.
Earlier, when we needed a transpose or a matrix inverse, we relied on TensorFlow's built-in functions, but for PCA there is no such function, except for one in TensorFlow Transform (tft), which ships as part of TensorFlow Extended (TFX).
There are multiple ways you can implement PCA in TensorFlow, but since this algorithm is such an important one in the machine learning world, we will take the long route.
The reason for placing PCA under Linear Algebra is to show that it can be implemented using the theorems we studied in this chapter.
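Before taking the long route, here is what one of the shorter routes could look like: a minimal sketch of PCA via tf.linalg.svd, under the assumption that you only want the projected data (pca_via_svd is an illustrative name, not something used later in this section).

import tensorflow as tf

def pca_via_svd(data, n_components):
    # Center the data so the singular vectors describe variance around the mean
    centered = data - tf.reduce_mean(data, axis=0)
    # The columns of v are the principal directions (right singular vectors)
    s, u, v = tf.linalg.svd(centered)
    # Project the centered data onto the leading principal directions
    return tf.matmul(centered, v[:, :n_components])

For the data set we create below, pca_via_svd(X, 1) would project it onto its single dominant direction.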
# To start working with PCA, let's create a 2D data set
import tensorflow as tf
import matplotlib.pyplot as plt

x_data = tf.multiply(5, tf.random.uniform([100], minval=0, maxval=100, dtype=tf.float32, seed=0))
y_data = tf.multiply(2, x_data) + 1 + tf.random.uniform([100], minval=0, maxval=100, dtype=tf.float32, seed=0)
X = tf.stack([x_data, y_data], axis=1)

# Plot the raw data with a styled axes context
with plt.rc_context({'axes.edgecolor': 'orange', 'xtick.color': 'red', 'ytick.color': 'red'}):
    plt.plot(X[:, 0], X[:, 1], '+', color='b')
    plt.grid()
We start by centering the data. Even though the variables we created are on similar scales, it's always good practice to normalize your data first, because most of the time the data you will be working with will be on different scales.
def normalize(data):
    # Create a copy of the data
    X = tf.identity(data)
    # Subtract the column-wise mean so each variable is centered at zero
    X -= tf.reduce_mean(data, axis=0)
    return X
normalized_data = normalize(X)
plt.plot(normalized_data[:,0], normalized_data[:,1], '+', color='b')
plt.grid()
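Note that normalize above only centers the data. If your features were on genuinely different scales, you would typically also divide by the standard deviation; here is a minimal sketch of that fuller standardization (standardize is an illustrative helper, not used elsewhere in this section).

def standardize(data):
    # Center each column, then scale it to unit standard deviation
    centered = data - tf.reduce_mean(data, axis=0)
    return centered / tf.math.reduce_std(data, axis=0)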
Recall that PCA can be thought of as applying lossy compression to a collection of data points x. The way we minimize the loss of precision is by finding, for each point x, a corresponding code vector c, together with an encoding function f(x) = c and a decoding function g(c) ≈ x.
PCA is defined by our choice of this decoding function. Specifically, to keep the decoder very simple, we choose to use matrix multiplication to map the code back into the original space and define g(c) = Dc. Our goal is to minimize the distance between the input point x and its reconstruction g(c), and to measure that distance we use the L^2 norm. Solving this minimization gives us the encoding function c = D^T x.
Finally, to perform the PCA reconstruction we use the same matrix D to decode all of the points, and to solve the resulting optimization problem we use eigendecomposition.
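To make these mappings concrete, here is a minimal sketch of the encoder, decoder, and reconstruction as TensorFlow functions. It assumes D is already known (finding D is exactly what the rest of this section works out), and the function names are illustrative rather than part of the original notebook.

# c = f(x) = D^T x : project x onto the columns of D
def encode(D, x):
    return tf.linalg.matvec(D, x, transpose_a=True)

# g(c) = D c : map the code back to the original space
def decode(D, c):
    return tf.linalg.matvec(D, c)

# r(x) = g(f(x)) = D D^T x : the (lossy) reconstruction of x
def reconstruct(D, x):
    return decode(D, encode(D, x))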
Please note that the following equation is the final result of a series of matrix transformations. I don't provide the derivation because the goal is to focus on the mathematical implementation rather than the derivation. For the curious, you can read about the derivation in Chapter 2 Section 11.
d^* = argmax_d Tr(d^T X^T X d) subject to d^T d = 1
To find d, we can calculate the eigenvectors of X^T X; the optimal d is the eigenvector corresponding to the largest eigenvalue.
# Finding the eigenvalues and eigenvectors of X^T X
eigen_values, eigen_vectors = tf.linalg.eigh(tf.tensordot(tf.transpose(normalized_data), normalized_data, axes=1))
print("Eigen Vectors: \n{} \nEigen Values: \n{}".format(eigen_vectors, eigen_values))
Eigen Vectors:
[[-0.8908606 -0.45427683]
[ 0.45427683 -0.8908606 ]]
Eigen Values:
[ 16500.715 11025234. ]
The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude, that is, how much of the data's variance lies along each of those directions.
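One practical use of the eigenvalues: the fraction of total variance captured by each component can be read straight off them. A minimal sketch using the eigen_values tensor from above (tf.linalg.eigh returns eigenvalues in ascending order, so the last entry corresponds to the dominant component):

# Fraction of the total variance explained by each principal component
explained_variance_ratio = eigen_values / tf.reduce_sum(eigen_values)
print(explained_variance_ratio)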
Now, let's use these eigenvectors to rotate our data. The goal of the rotation is to end up with a new coordinate system in which the data is uncorrelated, so that the variance is concentrated along the basis axes. Axes that carry little variance can then be dropped, which is what reduces the dimensionality.
Recall our encoding function c = D^T x, where D is the matrix containing the eigenvectors that we have calculated before.
X_new = tf.tensordot(tf.transpose(eigen_vectors), tf.transpose(normalized_data), axes=1)
plt.plot(X_new[0, :], X_new[1, :], '+', color='b')
plt.xlim(-500, 500)
plt.ylim(-700, 700)
plt.grid()
That is the transformed data.
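To close the loop on the lossy-compression framing from the start of the section, here is a sketch of keeping only the dominant component and reconstructing the data from it. It reuses eigen_vectors and normalized_data from above and again relies on tf.linalg.eigh returning eigenvalues in ascending order, so the last column of eigen_vectors is the dominant direction.

# Keep only the dominant principal component
d = eigen_vectors[:, -1:]                                            # shape (2, 1)
# Encode: c = D^T x for every point
codes = tf.matmul(tf.transpose(d), tf.transpose(normalized_data))    # shape (1, 100)
# Decode: x_hat = D c
reconstruction = tf.matmul(d, codes)                                 # shape (2, 100)
# Mean squared reconstruction error: the precision traded away by compression
print(tf.reduce_mean(tf.square(tf.transpose(reconstruction) - normalized_data)))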
This is section twelve of the chapter on Linear Algebra with TensorFlow 2.0 from the book Deep Learning with TensorFlow 2.0. You can read this section along with the following topics:
02.01 — Scalars, Vectors, Matrices, and Tensors
02.02 — Multiplying Matrices and Vectors
02.03 — Identity and Inverse Matrices
02.04 — Linear Dependence and Span
02.05 — Norms
02.06 — Special Kinds of Matrices and Vectors
02.07 — Eigendecomposition
02.08 — Singular Value Decomposition
02.09 — The Moore-Penrose Pseudoinverse
02.10 — The Trace Operator
02.11 — The Determinant
02.12 — Example: Principal Components Analysis
at Deep Learning With TF 2.0: 02.00- Linear Algebra. You can get the code for this article and the rest of the chapter here. Links to the notebook in Google Colab and Jupyter Binder are at the end of the notebook.