10.2 SVD Applications
1 Reading
Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.
2 Learning Objectives
By the end of this page, you should know:
- what is the condition number of a matrix
- how to compute bases of the fundamental subspaces
- what is the pseudoinverse of a matrix and how to compute it
3 Introduction
The next few lectures will focus on engineering and AI applications of the SVD. For now, we highlight some more technical linear algebra applications: these are all immensely important from a practical perspective and form subroutines for most real-world applications of linear algebra.
3.1 The Condition Number
Most numerical calculations that require solving a linear equation $Ax = b$ are as reliable as possible when the SVD of $A$ is used. Since the matrices $U$ and $V$ have orthonormal columns, they do not affect lengths or angles between vectors. For example, for $U \in \mathbb{R}^{m \times m}$ with orthonormal columns, we have:

$\|U x\|^2 = (U x)^\top (U x) = x^\top U^\top U x = x^\top x = \|x\|^2$

for any $x \in \mathbb{R}^m$. Therefore, any numerical issues that arise will be due to the diagonal entries of $\Sigma$, i.e., due to the singular values of $A$.
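As a quick numerical sanity check (a small sketch, not part of the text above; the random matrix is used only for illustration), we can verify that multiplying by the orthogonal factors of an SVD leaves vector norms unchanged:

import numpy as np

# Illustrative sketch: the orthogonal factors returned by np.linalg.svd preserve vector norms.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
U, S, Vh = np.linalg.svd(A)        # full SVD: U is 4x4 orthogonal, Vh is 3x3 orthogonal

x = rng.standard_normal(4)
y = rng.standard_normal(3)
print(np.linalg.norm(U @ x), np.linalg.norm(x))    # equal up to rounding
print(np.linalg.norm(Vh @ y), np.linalg.norm(y))   # equal up to rounding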
In particular, if some singular values are much larger than others, this means certain directions are stretched out much more than others, which can lead to rounding errors on a computer. A natural way to quantify this notion is the condition number $\kappa(A) = \sigma_{\max} / \sigma_{\min}$, the ratio of the largest to the smallest singular value of $A$.
Python Break!
In this section, we’ll demonstrate the significance of the condition number of the coefficient matrix $A$ when solving a system of linear equations $Ax = b$ (where in this example, we assume $A$ is square and invertible). We’ll give a few illustrative examples of what might go wrong when $A$ has a high condition number $\kappa(A)$.
Sensitivity to the constant vector $b$: Let’s consider what happens when the constant vector $b$ is replaced by the vector $b + \varepsilon$, where $\varepsilon$ is some vector representing a “measurement error” in $b$. Let’s rewrite the linear system as the following:

$A(x + \delta x) = b + \varepsilon \quad \Longrightarrow \quad \delta x = A^{-1}\varepsilon = V \Sigma^{-1} U^\top \varepsilon.$

Here, we see that the magnitude of the error in our estimate of $b$ is $\|\varepsilon\|$, and the magnitude of the error in our solution is $\|\delta x\| = \|V \Sigma^{-1} U^\top \varepsilon\| = \|\Sigma^{-1} U^\top \varepsilon\|$. Here, we have used the singular value decomposition $A = U \Sigma V^\top$, together with the fact that orthogonal matrices preserve norms, to isolate the dependence of the error in $x$ on $\Sigma^{-1}$. Then, the quantity $\|\varepsilon\| / \|b\|$ denotes the relative error in the constant vector, and the quantity $\|\delta x\| / \|x\|$ denotes the relative error in the solution vector.
In many applications, we are interested in when small relative changes in the constant vector lead to large changes in the solution vector (this is bad). The ratio of their relative errors is $\frac{\|\delta x\| / \|x\|}{\|\varepsilon\| / \|b\|}$. In the very worst case, where $b$ and $\varepsilon$ are chosen so that this quantity is maximized, we see that

$\frac{\|\delta x\| / \|x\|}{\|\varepsilon\| / \|b\|} = \frac{\sigma_{\max}}{\sigma_{\min}} = \kappa(A).$

Furthermore, this tells us exactly which vectors to pick to achieve this worst-case error: we choose $\varepsilon$ to be in the direction of the left singular vector of $A$ corresponding to its smallest singular value, and we choose $b$ to be in the direction of the left singular vector of $A$ corresponding to its largest singular value (equivalently, these are right singular vectors of $A^{-1}$, which is how the code below selects them).
Let’s see this in practice. In the following Python code, we define an ill-conditioned coefficient matrix A and demonstrate how small relative errors in the constant vector b can lead to large relative errors in the solution x. Below, we introduce the np.linalg.svd function, which returns a tuple (U, S, Vh) such that A = U @ np.diag(S) @ Vh, where S is the vector of singular values in decreasing order (here @ denotes matrix multiplication).
import numpy as np
A = np.array([
[ 0. , 0., 1., ],
[ 0., 1., 1. ],
[ 10**-6, 1., 1. ]
])
print('A (note the bottom left entry):\n', A, '\n')
singular_values = np.linalg.svd(A, compute_uv=False)  # returns only the singular values, largest first
print('Singular values of A:\n', singular_values, '\n')
condition_number = max(singular_values) / min(singular_values)
print('Condition number of A:\n', condition_number, '\n')
# Choose b and eps to be in the worst possible directions
# The right singular vectors of A^{-1} (rows of Vh below) are the left singular
# vectors of A, ordered from the smallest to the largest singular value of A.
U, S, Vh = np.linalg.svd(np.linalg.inv(A))
b = Vh[2, :]            # direction for the largest singular value of A
eps = 0.001 * Vh[0, :]  # small perturbation along the direction for the smallest singular value of A
print('b:\n', b, '\n')
print('eps:\n', eps, '\n')
b_rel_error = np.linalg.norm(eps) / np.linalg.norm(b)
print('Relative error in b:\n', b_rel_error, '\n')
# Solve A x = b, then re-solve with the perturbed right-hand side b + eps
x = np.linalg.solve(A, b)
x_rel_error = np.linalg.norm(np.linalg.solve(A, b + eps) - x) / np.linalg.norm(x)
print('Relative error in x:\n', x_rel_error, '\n')
ratio = x_rel_error / b_rel_error
print('Ratio:\n', ratio, '\n')
A (note the bottom left entry):
[[0.e+00 0.e+00 1.e+00]
[0.e+00 1.e+00 1.e+00]
[1.e-06 1.e+00 1.e+00]]
Singular values of A:
[2.13577921e+00 6.62153447e-01 7.07106781e-07]
Condition number of A:
3020447.91804474
b:
[0.36904818 0.6571923 0.6571923 ]
eps:
[-3.53580218e-16 7.07106781e-04 -7.07106781e-04]
Relative error in b:
0.0009999999999999998
Relative error in x:
3020.447918044611
Ratio:
3020447.918044612
As we can see, for certain choices of b and eps, we can make the ratio of the relative errors blow up all the way to the condition number! This can even be taken to the extreme: if $\kappa(A)$ is high enough, then even without explicitly adding noise, the computer might not find a good solution to $Ax = b$ at all! This is because the minuscule amount of noise introduced by floating point precision can get blown up by the ill-conditioning.
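To see this extreme case concretely, here is a small sketch using the classic Hilbert matrix (an assumption of this example; it is not the matrix A from above). No noise is added explicitly, yet floating point rounding alone ruins the solution:

import numpy as np

# Illustrative sketch: the 13x13 Hilbert matrix has a condition number near 1/machine-epsilon,
# so np.linalg.solve cannot accurately recover the true solution even from an "exact" right-hand side.
n = 13
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = H @ x_true                    # right-hand side computed directly from x_true
x_hat = np.linalg.solve(H, b)     # no explicit noise added

print('Condition number of H:', np.linalg.cond(H))
print('Relative error in x:', np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))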
4 Computing Bases of Fundamental Subspaces
Given an SVD $A = U \Sigma V^\top$ for an $m \times n$ matrix $A$ with $\operatorname{rank}(A) = r$, let $u_1, \ldots, u_r$ be the left singular vectors, $v_1, \ldots, v_r$ the right singular vectors, and $\sigma_1 \geq \cdots \geq \sigma_r > 0$ the singular values.
Recall that we showed that $u_1, \ldots, u_r$ forms a basis for $\operatorname{col}(A)$. Let $u_{r+1}, \ldots, u_m$ be an orthonormal basis for $\operatorname{col}(A)^\perp$ so that $u_1, \ldots, u_m$ form a basis for $\mathbb{R}^m$, computed for example using the Gram-Schmidt Process. Then, by the Fundamental Theorem of Linear Algebra (FTLA), we have that $\operatorname{null}(A^\top) = \operatorname{col}(A)^\perp = \operatorname{span}\{u_{r+1}, \ldots, u_m\}$, i.e., these vectors form an orthonormal basis for $\operatorname{null}(A^\top)$.
Next, recall that $v_1, \ldots, v_n$, the eigenvectors of $A^\top A$, form an orthonormal basis of $\mathbb{R}^n$. Since $A v_i = 0$ for $i = r+1, \ldots, n$, we have that $v_{r+1}, \ldots, v_n$ span a subspace of $\operatorname{null}(A)$ of dimension $n - r$. But, by the FTLA, $\dim \operatorname{null}(A) = n - r$.
Therefore, $v_{r+1}, \ldots, v_n$ are an orthonormal basis for $\operatorname{null}(A)$.
Finally, $\operatorname{row}(A) = \operatorname{null}(A)^\perp$. But since the $v_i$ are an orthonormal basis for $\mathbb{R}^n$, $\operatorname{span}\{v_1, \ldots, v_r\} = \operatorname{span}\{v_{r+1}, \ldots, v_n\}^\perp = \operatorname{null}(A)^\perp$, and thus $v_1, \ldots, v_r$ are an orthonormal basis for $\operatorname{row}(A)$.
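As a concrete illustration (a sketch whose example matrix is chosen here for demonstration, not taken from the text), the full SVD returned by np.linalg.svd hands us orthonormal bases for all four fundamental subspaces once we count the singular values above a small tolerance:

import numpy as np

# Sketch: read off orthonormal bases of the four fundamental subspaces from a full SVD.
# The rank-2 example matrix below is chosen only for illustration (row 3 = row 1 + row 2).
A = np.array([[1., 2., 0., 1.],
              [2., 4., 1., 3.],
              [3., 6., 1., 4.]])          # 3x4, rank 2
U, S, Vh = np.linalg.svd(A)               # full SVD: U is 3x3, Vh is 4x4
r = np.sum(S > 1e-10)                     # numerical rank

col_A    = U[:, :r]                       # orthonormal basis for col(A)
null_A_T = U[:, r:]                       # orthonormal basis for null(A^T)
row_A    = Vh[:r, :].T                    # orthonormal basis for row(A)
null_A   = Vh[r:, :].T                    # orthonormal basis for null(A)

print('rank:', r)
print('A @ (null(A) basis), should be ~0:\n', A @ null_A)
print('A^T @ (null(A^T) basis), should be ~0:\n', A.T @ null_A_T)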
Specializing these observations to square matrices, we have the following theorem characterizing invertible matrices: an $n \times n$ matrix $A$ is invertible if and only if all $n$ of its singular values are nonzero, in which case $u_1, \ldots, u_n$ is an orthonormal basis for $\operatorname{col}(A) = \mathbb{R}^n$ and $v_1, \ldots, v_n$ is an orthonormal basis for $\operatorname{row}(A) = \mathbb{R}^n$.
5 The Pseudoinverse of $A$
Recall the least squares problem of finding a vector $\hat{x}$ that minimizes the objective $\|Ax - b\|^2$. We saw that the least squares solution is given by the solution to the normal equations:

$A^\top A \hat{x} = A^\top b. \qquad \text{(NE)}$
Let’s rewrite (NE) using the SVD $A = U \Sigma V^\top$:

$(U \Sigma V^\top)^\top (U \Sigma V^\top)\hat{x} = (U \Sigma V^\top)^\top b \quad \text{(a)}, \qquad V \Sigma^\top \Sigma V^\top \hat{x} = V \Sigma^\top U^\top b \quad \text{(b)},$

where (b) follows from (a) by expanding the transposes and using $U^\top U = I$. Let’s left multiply (b) by $V^\top$ to take advantage of $V^\top V = I$:

$\Sigma^\top \Sigma V^\top \hat{x} = \Sigma^\top U^\top b.$

Now, let’s isolate $V^\top \hat{x}$ by multiplying both sides by $(\Sigma^\top \Sigma)^{+}$, the diagonal matrix with entries $1/\sigma_i^2$ for $i \leq r$ and zeros elsewhere:

$V^\top \hat{x} = \Sigma^{+} U^\top b, \qquad (7)$

where $\Sigma^{+} \in \mathbb{R}^{n \times m}$ is diagonal with entries $1/\sigma_i$ for $i \leq r$ and zeros elsewhere. (When $r < n$, equation (7) picks out one particular solution; the last $n - r$ entries of $V^\top \hat{x}$ are unconstrained by the normal equations.)
Finally, note that $\hat{x} = V \Sigma^{+} U^\top b$ satisfies (7), since $V^\top \hat{x} = V^\top V \Sigma^{+} U^\top b = \Sigma^{+} U^\top b$ (again since $V^\top V = I$) for any $b$. The special solution $\hat{x} = V \Sigma^{+} U^\top b$ can be shown to be the minimum norm least squares solution when several $\hat{x}$ exist such that $A^\top A \hat{x} = A^\top b$. The matrix

$A^{+} = V \Sigma^{+} U^\top$

is called the pseudoinverse of $A$, and is also known as the Moore-Penrose inverse of $A$.
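Here is a short sketch (with an example matrix chosen only for illustration) that assembles $A^{+} = V \Sigma^{+} U^\top$ directly from the SVD and checks it against NumPy’s built-in np.linalg.pinv and least squares solver:

import numpy as np

# Sketch: build A^+ = V Sigma^+ U^T from the SVD. Sigma^+ is n x m, with 1/sigma_i on
# the diagonal for the nonzero singular values and zeros everywhere else.
A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])                          # 3x2 example chosen for illustration
U, S, Vh = np.linalg.svd(A)
r = np.sum(S > 1e-10)

Sigma_plus = np.zeros((A.shape[1], A.shape[0]))   # n x m
Sigma_plus[:r, :r] = np.diag(1.0 / S[:r])
A_plus = Vh.T @ Sigma_plus @ U.T

print(np.allclose(A_plus, np.linalg.pinv(A)))                     # True
b = np.array([1., 0., 2.])
x_hat = A_plus @ b                                                # least squares solution
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # True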
If we look at $A\hat{x} = A A^{+} b$, we observe that:

$A A^{+} b = U \Sigma V^\top V \Sigma^{+} U^\top b = U \Sigma \Sigma^{+} U^\top b = \sum_{i=1}^{r} u_i u_i^\top b,$

i.e., $A A^{+} b$ is the orthogonal projection of $b$ onto $\operatorname{col}(A)$.
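Continuing the sketch above (same illustrative matrix, not from the text), we can verify numerically that $A A^{+}$ is the orthogonal projector onto $\operatorname{col}(A)$:

import numpy as np

# Sketch: A @ A^+ equals the orthogonal projector onto col(A), which can also be
# written as U_r U_r^T using the first r left singular vectors.
A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])
b = np.array([1., 0., 2.])
U, S, Vh = np.linalg.svd(A)
r = np.sum(S > 1e-10)

P = A @ np.linalg.pinv(A)              # projection via the pseudoinverse
P_svd = U[:, :r] @ U[:, :r].T          # projection via the left singular vectors

print(np.allclose(P, P_svd))                           # True
print(np.allclose(P @ P, P), np.allclose(P.T, P))      # idempotent and symmetric
print('Projection of b onto col(A):', P @ b)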