Relationship Between SVD And PCA
Imagine you have a dataset—maybe a table with different measurements like height and weight of people. Some of these measurements might be highly related (e.g., height and weight are often correlated). Instead of storing all this information separately, wouldn’t it be nice to find a smaller set of "essential features" that still capture most of the original information?
- PCA (Principal Component Analysis) is a way to find the most important patterns (directions of variance) in the data and represent it in a simpler form.
- SVD (Singular Value Decomposition) is a more general mathematical tool that breaks down any dataset into fundamental building blocks.
Interestingly, PCA is computed using SVD! That means SVD is the mathematical engine behind PCA.
How They Relate
SVD Breaks Down Any Matrix
- SVD is a universal technique that can decompose any matrix (dataset) into three parts: \(A = U \Sigma V^T\)
- It helps us understand the essential structure of data.
PCA Uses SVD to Find Principal Components
- PCA is specifically used for datasets where we want to find the most important directions (principal components) that explain the most variation in the data.
- Instead of working with the original data matrix A directly, PCA looks at the covariance matrix, which for centered data is proportional to \(A^T A\) and describes how the different features vary together.
- The eigenvectors of this covariance matrix are the principal components, and the SVD of A delivers them without ever forming \(A^T A\) explicitly.
The Mathematical Connection
- Given a centered data matrix \(X\) (where the mean of each column is zero), PCA finds the eigenvectors of the covariance matrix, which is proportional to \(X^T X\).
- However, instead of computing those eigenvectors directly, we can apply SVD to X itself: \(X = U \Sigma V^T\)
- The columns of V are the principal components (the directions of highest variance).
- The singular values in \(\Sigma\) tell us how much variance each principal component explains: the variance along the i-th component is \(\sigma_i^2 / (n-1)\), where n is the number of data points.
Thus, PCA is just SVD applied to the dataset after centering it.
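As a quick numerical sanity check, the two routes can be compared directly with NumPy. This is a minimal sketch on a small random matrix, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy data: 100 samples, 3 features
X = X - X.mean(axis=0)               # center each column

# Route 1: eigen-decomposition of X^T X (proportional to the covariance matrix)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]    # eigh returns ascending order; sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(eigvals, S**2))                  # eigenvalues equal squared singular values
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))  # same directions, up to a sign flip per column
```

Both print statements come out True: the eigenvectors of \(X^T X\) and the right singular vectors of X describe the same directions.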
Summary
- SVD is the broader mathematical technique—it can be used for compression, solving equations, and more.
- PCA is a specific application of SVD, used to find the most important features (dimensions) in data.
- Instead of explicitly forming the covariance matrix and computing its eigenvalues/eigenvectors, modern PCA implementations compute the result via the SVD of the data matrix, which is more numerically stable.
Example: PCA and SVD in Action
Let’s take a simple 2D dataset (height and weight of 3 people) and see how PCA and SVD work together.
Step 1: Sample Dataset (Height & Weight)
\(X = \begin{bmatrix} 180 & 75 \\ 170 & 68 \\ 160 & 60 \end{bmatrix}\)
Each row is a person, and each column is a feature (height in cm, weight in kg).
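In code, this dataset is just a small NumPy array; a sketch with the same numbers:

```python
import numpy as np

# Rows are people, columns are features: height (cm), weight (kg)
X = np.array([[180.0, 75.0],
              [170.0, 68.0],
              [160.0, 60.0]])
```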
Step 2: Center the Data (PCA Step)
PCA works best when the data is centered, meaning we subtract the mean of each column.
Compute the Mean of Each Column
- Height Mean = 170
- Weight Mean = 67.67
Subtract the Mean from Each Value
\(X_{\text{centered}} = \begin{bmatrix} 180 - 170 & 75 - 67.67 \\ 170 - 170 & 68 - 67.67 \\ 160 - 170 & 60 - 67.67 \end{bmatrix} = \begin{bmatrix} 10 & 7.33 \\ 0 & 0.33 \\ -10 & -7.67 \end{bmatrix}\)
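In NumPy the whole centering step is one subtraction; a minimal sketch, repeating the data matrix so it runs on its own:

```python
import numpy as np

X = np.array([[180.0, 75.0],
              [170.0, 68.0],
              [160.0, 60.0]])

col_means = X.mean(axis=0)     # approximately [170.0, 67.67]
X_centered = X - col_means     # broadcasting subtracts each column's mean from every row
print(X_centered)
```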
Step 3: Compute SVD of Centered Data
Perform Singular Value Decomposition (SVD):
\(X_{\text{centered}} = U \Sigma V^T\)
Computing the decomposition with a tool like NumPy gives, rounded to two decimals:
\(U = \begin{bmatrix} 0.70 & -0.43 \\ 0.01 & 0.81 \\ -0.71 & -0.41 \end{bmatrix}\)
\(\Sigma = \begin{bmatrix} 17.68 & 0 \\ 0 & 0.33 \end{bmatrix}\)
\(V^T = \begin{bmatrix} 0.80 & 0.60 \\ -0.60 & 0.80 \end{bmatrix}\)
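These matrices are what np.linalg.svd returns for the centered data (possibly with the signs of a column of U and the matching row of \(V^T\) flipped, since the SVD is only unique up to such flips). A minimal, self-contained sketch:

```python
import numpy as np

X_centered = np.array([[ 10.0,  7.33],
                       [  0.0,  0.33],
                       [-10.0, -7.67]])

# Economy-size SVD: X_centered == U @ np.diag(S) @ Vt (up to rounding)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
print(S)     # roughly [17.68, 0.33]
print(Vt)    # rows are the principal directions, roughly [[0.80, 0.60], [-0.60, 0.80]]
```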
Step 4: Interpret the Results
- First Principal Component: V is the matrix of eigenvectors (each column is an eigenvector). The first row of \(V^T\) is (0.80, 0.60), meaning the dominant pattern in the data is a mix of roughly 0.80 × height and 0.60 × weight: in this dataset, taller people also tend to be heavier.
- Second Principal Component: The second row of \(V^T\) is (-0.60, 0.80), which captures the small amount of remaining variance.
- Singular Values: The first singular value (17.68) is far larger than the second (0.33); since the variance along each direction grows with the square of the singular value, the first principal component explains about 99.97% of the variance.
The eigenvectors are referred to as principal axes or principal directions of the data. When the data is projected onto these principal axes, the resulting values are called principal components, also known as PC scores—these represent new, transformed variables.
The j-th principal component corresponds to the j-th column of XV. Similarly, the coordinates of the i-th data point in the new principal component space are found in the i-th row of XV.
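A short sketch showing both facts numerically: the PC scores are \(XV\) (equivalently \(U\Sigma\)), and the share of variance per component follows from the squared singular values:

```python
import numpy as np

X_centered = np.array([[ 10.0,  7.33],
                       [  0.0,  0.33],
                       [-10.0, -7.67]])
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

scores = X_centered @ Vt.T         # PC scores: row i holds the new coordinates of data point i
print(np.allclose(scores, U * S))  # True: X @ V is the same as U @ np.diag(S)
print(S**2 / np.sum(S**2))         # variance share per component, roughly [0.9997, 0.0003]
```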
Step 5: Reduce to 1D (Dimensionality Reduction)
We can now project the data onto the first principal component (the most important one):
\(X' = X_{\text{centered}} \cdot V_{\text{first column}} = \begin{bmatrix} 10 & 7.33 \\ 0 & 0.33 \\ -10 & -7.67 \end{bmatrix} \begin{bmatrix} 0.80 \\ 0.60 \end{bmatrix} = \begin{bmatrix} 12.40 \\ 0.20 \\ -12.60 \end{bmatrix}\)
Now, instead of storing 2D data (height & weight), we can approximate each person with just one number, capturing most of the information!
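The same projection in NumPy, as a sketch (Vt[0] is the first principal direction; the sign of the result may be flipped depending on the sign convention the SVD routine picks):

```python
import numpy as np

X_centered = np.array([[ 10.0,  7.33],
                       [  0.0,  0.33],
                       [-10.0, -7.67]])
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

X_1d = X_centered @ Vt[0]      # one number per person instead of two
print(X_1d)                    # roughly [12.4, 0.2, -12.6]
```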