
10.2 SVD Applications

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.

2 Learning Objectives

By the end of this page, you should know:

  • what is the condition number of a matrix
  • how to compute bases of the fundamental subspaces
  • what is the pseudoinverse of a matrix and how to compute it

3 Introduction

The next few lectures will focus on engineering and AI applications of the SVD. For now, we highlight some more technical linear algebra applications: these are all immensely important from a practical perspective and form subroutines for most real-world applications of linear algebra.

3.1 The Condition Number

Most numerical calculations that require solving a linear equation $A\vv x = \vv b$ are as reliable as possible when the SVD of $A$ is used. Since the matrices $U$ and $V$ have orthonormal columns, they do not affect lengths or angles between vectors. For example, for $U\in\mathbb{R}^{m\times r}$, we have:

$$\langle U \vv x, U \vv y \rangle = \vv x^T U^T U \vv y = \vv x^T \vv y = \langle \vv x, \vv y \rangle$$

for any $\vv x, \vv y\in\mathbb{R}^{r}$. Therefore, any numerical issues that arise will be due to the diagonal entries of $\Sigma$, i.e., due to the singular values of $A$.
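For instance, here is a quick numerical check of this fact, a minimal sketch in NumPy (the random matrix and vectors are purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# The thin SVD gives U with orthonormal columns, so U.T @ U = I
U, _, _ = np.linalg.svd(A, full_matrices=False)

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Inner products (and hence lengths and angles) are preserved by U
print(np.allclose((U @ x) @ (U @ y), x @ y))                   # True
print(np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # True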

In particular, if some of the singular values are much larger than others, then certain directions are stretched out much more than others, which can lead to rounding errors on a computer. A natural way to quantify this notion is the condition number of $A$, defined as the ratio of its largest to smallest singular value, $\kappa(A) = \sigma_{\max}(A)/\sigma_{\min}(A)$.

Python Break!

In this section, we’ll demonstrate the significance of the condition number of the coefficient matrix $A$ when solving a system of linear equations $A\vv x = \vv b$ (where in this example, we assume $A$ is square and invertible). We’ll give a few illustrative examples of what might go wrong when $A$ has a high condition number $\kappa(A)$.

Sensitivity to the constant vector $\vv b$: Let’s consider what happens when the constant vector $\vv b$ is replaced by the vector $\vv b + \vv \epsilon$, where $\vv \epsilon$ is some vector representing a “measurement error” in $\vv b$. Let’s rewrite the linear system $A\vv x = \vv b + \vv \epsilon$ as the following:

$$\begin{align*} &A\vv x = \vv b + \vv \epsilon \\ &\iff \vv x = A^{-1} \vv b + A^{-1} \vv \epsilon \end{align*}$$

Here, we see that the magnitude of the error in $\vv b$ is $\|\vv \epsilon\|$, and the magnitude of the resulting error in our solution $\vv x$ is $\|A^{-1}\vv{\epsilon}\| = \| V\Sigma^{-1} U^T \vv \epsilon\| = \|\Sigma^{-1} U^T \vv \epsilon\|$. Here, we have used the singular value decomposition of $A$, together with the fact that orthogonal matrices preserve norms, to isolate the dependence of the error in $\vv x$ on $\vv\epsilon$. The quantity $\frac{\|\vv \epsilon\|}{\|\vv b\|}$ is then the relative error in the constant vector, and the quantity $\frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\Sigma^{-1}U^T\vv{b}\|}$ is the relative error in the solution vector.

In many applications, we are interested in when small relative changes in the constant vector lead to large relative changes in the solution vector (this is bad). The ratio of these relative errors is

$$\frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\Sigma^{-1}U^T\vv{b}\|} \div \frac{\|\vv \epsilon\|}{\|\vv b\|} = \frac{\|\vv b\|}{\|\Sigma^{-1}U^T\vv{b}\|} \cdot \frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\vv \epsilon\|}.$$

In the very worst case, where $\vv b$ and $\vv \epsilon$ are chosen so that this quantity is maximized, we see that

$$\max_{\vv b, \vv \epsilon} \left(\frac{\|\vv b\|}{\|\Sigma^{-1}U^T\vv{b}\|} \cdot \frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\vv \epsilon\|}\right) = \frac{\sigma_{\max}(A^{-1})}{\sigma_{\min}(A^{-1})} = \kappa(A^{-1}) = \kappa (A).$$

Furthermore, this tells us exactly which vectors to pick to achieve this worst case error: we choose $\vv b$ to be in the direction of the right singular vector corresponding to the smallest singular value of $A^{-1}$, and we choose $\vv \epsilon$ to be in the direction of the right singular vector corresponding to the largest singular value of $A^{-1}$.

Let’s see this in practice. In the following Python code, we define an ill-conditioned coefficient matrix A and demonstrate how small relative errors in the constant vector b can lead to large relative errors in the solution x. Below, we introduce the np.linalg.svd function, which returns a tuple (U, S, Vh) such that A = U @ np.diag(S) @ Vh, where S is a 1-D array of the singular values and @ denotes matrix multiplication.

import numpy as np

A = np.array([
    [   0. ,    0.,     1.,     ],
    [   0.,     1.,     1.      ],
    [   10**-6, 1.,     1.      ]
])
print('A (note the bottom left entry):\n', A, '\n')

singular_values = np.linalg.svd(A, compute_uv=False)  # compute_uv=False returns only the singular values
print('Singular values of A:\n', singular_values, '\n')

condition_number = max(singular_values) / min(singular_values)
print('Condition number of A:\n', condition_number, '\n')

# Choose b and eps to be in the worst possible directions
U, S, Vh = np.linalg.svd(np.linalg.inv(A))
b = Vh[2, :]
eps = 0.001 * Vh[0, :]

print('b:\n', b, '\n')
print('eps:\n', eps, '\n')

b_rel_error = np.linalg.norm(eps) / np.linalg.norm(b)
print('Relative error in b:\n', b_rel_error, '\n')

x = np.linalg.solve(A, b)
x_rel_error = np.linalg.norm(np.linalg.solve(A, b + eps) - x) / np.linalg.norm(x)
print('Relative error in x:\n', x_rel_error, '\n')

ratio = x_rel_error / b_rel_error
print('Ratio:\n', ratio, '\n')
A (note the bottom left entry):
 [[0.e+00 0.e+00 1.e+00]
 [0.e+00 1.e+00 1.e+00]
 [1.e-06 1.e+00 1.e+00]] 

Singular values of A:
 [2.13577921e+00 6.62153447e-01 7.07106781e-07] 

Condition number of A:
 3020447.91804474 

b:
 [0.36904818 0.6571923  0.6571923 ] 

eps:
 [-3.53580218e-16  7.07106781e-04 -7.07106781e-04] 

Relative error in b:
 0.0009999999999999998 

Relative error in x:
 3020.447918044611 

Ratio:
 3020447.918044612 

As we can see, for certain choices of b and eps, we can make the ratio of the relative errors blow up all the way to the condition number! This can even be taken to the extreme: if $\kappa(A)$ is high enough, then even without explicitly adding noise, the computer might not find a good solution to $A\vv x = \vv b$ at all! This is because the minuscule amount of noise introduced by floating point precision gets blown up by the ill-conditioning.
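To illustrate this last point, here is a small sketch (assuming NumPy; the 12×12 Hilbert matrix is a standard example of an ill-conditioned matrix, chosen here purely for demonstration). No noise is added to b, yet the computed solution is far less accurate than machine precision would suggest:

import numpy as np

# Hilbert matrix: H[i, j] = 1 / (i + j + 1), notoriously ill-conditioned
n = 12
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

x_true = np.ones(n)   # known solution
b = H @ x_true        # exact right-hand side (up to rounding)

x_computed = np.linalg.solve(H, b)

print('Condition number of H:', np.linalg.cond(H))
print('Relative error in x:  ',
      np.linalg.norm(x_computed - x_true) / np.linalg.norm(x_true))

The exact numbers depend on the machine, but the relative error in x is typically many orders of magnitude larger than the roughly 1e-16 rounding error present in b.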

4 Computing Bases of Fundamental Subspaces

Given an SVD for an $m\times n$ matrix $A\in\mathbb{R}^{m\times n}$ with $\text{rank}(A)=r$, let $\vv u_1,\ldots,\vv u_r$ be the left singular vectors, $\vv v_1,\ldots,\vv v_r$ the right singular vectors, and $\sigma_1,\ldots,\sigma_r$ the singular values.

Recall that we showed that $\vv u_1,\ldots,\vv u_r$ forms a basis for $\text{Col}(A)$. Let $\vv u_{r+1},\ldots,\vv u_m$ be an orthonormal basis for $\text{Col}(A)^\perp$, computed for example using the Gram-Schmidt process, so that $\vv u_1,\ldots,\vv u_m$ form a basis for $\mathbb{R}^m$. Then, by the Fundamental Theorem of Linear Algebra (FTLA), we have that $\text{Col}(A)^\perp = \text{Null}(A^T) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$, i.e., these vectors form an orthonormal basis for $\text{Null}(A^T)=\text{LNull}(A)$, the left null space of $A$.

Next, recall that $\vv v_1,\ldots,\vv v_r,\vv v_{r+1},\ldots,\vv v_n$, the eigenvectors of $A^TA$, form an orthonormal basis of $\mathbb{R}^n$. Since $A\vv v_i = \vv 0$ for $i = r+1,\ldots,n$, we have that $\vv v_{r+1},\ldots,\vv v_n$ span a subspace of $\text{Null}(A)$ of dimension $n-r$. But, by the FTLA, $\dim \text{Null}(A) = n - \text{rank}(A) = n-r$.

Therefore, $\vv v_{r+1},\ldots,\vv v_n$ are an orthonormal basis for $\text{Null}(A)$.

Finally, $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A)$. But $\text{Null}(A)^\perp = \text{span}\{\vv v_1,\ldots,\vv v_r\}$ since the $\vv v_i$ are an orthonormal basis for $\mathbb{R}^n$, and thus $\vv v_1,\ldots,\vv v_r$ are an orthonormal basis for $\text{Row}(A)$.
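The full SVD hands us orthonormal bases for all four fundamental subspaces at once. Here is a minimal NumPy sketch (the rank-1 example matrix and the rank tolerance below are our own choices for illustration):

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])    # rank-1 example with m = 2, n = 3

U, s, Vh = np.linalg.svd(A)     # full SVD: U is m x m, Vh is n x n
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))        # numerical rank

col_A   = U[:, :r]              # orthonormal basis for Col(A)
lnull_A = U[:, r:]              # orthonormal basis for Null(A^T) = LNull(A)
row_A   = Vh[:r, :].T           # orthonormal basis for Row(A) = Col(A^T)
null_A  = Vh[r:, :].T           # orthonormal basis for Null(A)

print('rank(A) =', r)
print('A @ null_A is (numerically) zero:', np.allclose(A @ null_A, 0))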

Specializing these observations to square matrices gives a characterization of invertible matrices: an $n\times n$ matrix $A$ is invertible if and only if $\text{rank}(A) = n$, i.e., if and only if all $n$ of its singular values are nonzero.
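A quick numerical check of this criterion (a sketch; the example matrix and the tolerance 1e-12 are our own choices):

import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])        # singular: the second row is twice the first
s = np.linalg.svd(A, compute_uv=False)
print('Singular values:', s)
print('Invertible?', bool(np.all(s > 1e-12)))   # False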

5 The Pseudoinverse of $A$

Recall the least squares problem of finding a vector $\vv x$ that minimizes the objective $\|A \vv x-\vv b\|^2$. We saw that the least squares solution is given by the solution to the normal equations:

$$A^TA\hat{\vv x} = A^T\vv b \quad \text{(NE)}$$

Let’s rewrite (NE) using the SVD $A = U\Sigma V^T$, so that $A^T = V\Sigma^T U^T = V\Sigma U^T$ (since $\Sigma = \Sigma^T$):

$$A^TA\hat{\vv x} = V\Sigma \underbrace{U^T U}_{I}\Sigma V^T\hat{\vv x} = \underbrace{V\Sigma^2 V^T\hat{\vv x}}_{(a)} = \underbrace{V\Sigma U^T}_{(b)}\vv b$$

Let’s start by left multiplying (a) and (b) by $V^T$ to take advantage of $V^TV = I_r$:

$$V^T(V\Sigma^2V^T\hat{\vv x}) = V^T(V\Sigma U^T \vv b) \;\Rightarrow\; \Sigma^2V^T\hat{\vv x} = \Sigma U^T \vv b.$$

Now, let’s isolate $V^T\hat{\vv x}$ by multiplying both sides by $\Sigma^{-2}$:

$$V^T\hat{\vv x} = \Sigma^{-1}U^T\vv b.$$

Finally, note that $\hat{\vv x}$ satisfies this equation if $\hat{\vv x} = V\Sigma^{-1}U^T\vv b + \vv n$ (again since $V^TV=I$) for any $\vv n \in \text{Null}(V^T) = \text{Col}(V)^\perp$. The special solution $\hat{\vv x}^* = V\Sigma^{-1}U^T\vv b$ can be shown to be the minimum norm least squares solution when the normal equations admit more than one solution $\hat{\vv x}$. The matrix

$$A^+ = V\Sigma^{-1}U^T$$

is called the pseudoinverse of $A$, and is also known as the Moore-Penrose inverse of $A$.

If we look at $A\hat{\vv x}^* = AA^+ \vv b$, we observe that:

$$A\hat{\vv x}^* = U\Sigma \underbrace{V^TV}_{I}\Sigma^{-1}U^T\vv b = UU^T\vv b,$$

i.e., $A\hat{\vv x}^*$ is the orthogonal projection $\hat{\vv b}$ of $\vv b$ onto $\text{Col}(A)$.
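Here is a short NumPy sketch of these formulas (the small rank-deficient matrix and the rank tolerance are our own choices; np.linalg.pinv computes the same Moore-Penrose pseudoinverse, so we use it as a check):

import numpy as np

A = np.array([[1., 0.],
              [0., 0.],
              [0., 0.]])        # rank 1, so A has no ordinary inverse
b = np.array([1., 2., 3.])

# Compact SVD: keep only the r nonzero singular values
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))      # crude rank tolerance (our assumption)
U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r, :].T

A_pinv = V_r @ np.diag(1.0 / s_r) @ U_r.T       # A^+ = V Sigma^{-1} U^T
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True

x_star = A_pinv @ b             # minimum norm least squares solution
print(np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print(np.allclose(A @ x_star, U_r @ U_r.T @ b)) # A x* = projection of b onto Col(A)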
