
10.2 SVD Applications

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.

2 Learning Objectives

By the end of this page, you should know:

  • what is the condition number of a matrix
  • how to compute bases of the fundamental subspaces
  • what is the pseudoinverse of a matrix and how to compute it

3 Introduction

The next few lectures will focus on engineering and AI applications of the SVD. For now, we highlight some more technical linear algebra applications: these are all immensely important from a practical perspective and form subroutines for most real-world applications of linear algebra.

3.1 The Condition Number

Most numerical calculations that require solving a linear equation $A\vv x = \vv b$ are as reliable as possible when the SVD of $A$ is used. Since the matrices $U$ and $V$ have orthonormal columns, they do not affect lengths or angles between vectors. For example, for $U\in\mathbb{R}^{m\times r}$, we have:

$$\langle U \vv x, U \vv y \rangle = \vv x^T U^T U \vv y = \vv x^T \vv y = \langle \vv x, \vv y \rangle$$

for any $\vv x, \vv y\in\mathbb{R}^{r}$. Therefore, any numerical issues that arise will be due to the diagonal entries of $\Sigma$, i.e., due to the singular values of $A$.
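For instance, here is a quick numerical check of this fact, a minimal sketch in NumPy (the random matrix and vectors are purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# The thin SVD gives U with orthonormal columns, so U.T @ U = I
U, _, _ = np.linalg.svd(A, full_matrices=False)

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Inner products (and hence lengths and angles) are preserved by U
print(np.allclose((U @ x) @ (U @ y), x @ y))                   # True
print(np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # True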

In particular, if some of the singular values are much larger than others, then certain directions are stretched out much more than others, which can lead to rounding errors on a computer. A natural way to quantify this notion is the condition number of $A$, defined as the ratio of its largest to smallest singular value, $\kappa(A) = \sigma_{\max}(A)/\sigma_{\min}(A)$.

Python Break!

In this section, we’ll demonstrate the significance of the condition number of the coefficient matrix $A$ when solving a system of linear equations $A\vv x = \vv b$ (where in this example, we assume $A$ is square and invertible). We’ll give a few illustrative examples of what might go wrong when $A$ has a high condition number $\kappa(A)$.

Sensitivity to the constant vector $\vv b$: Let’s consider what happens when the constant vector $\vv b$ is replaced by the vector $\vv b + \vv \epsilon$, where $\vv \epsilon$ is some vector representing a “measurement error” in $\vv b$. Let’s rewrite the linear system $A\vv x = \vv b + \vv \epsilon$ as the following:

$$\begin{align*} &A\vv x = \vv b + \vv \epsilon \\ &\iff \vv x = A^{-1} \vv b + A^{-1} \vv \epsilon \end{align*}$$

Here, we see that the magnitude of the error in $\vv b$ is $\|\vv \epsilon\|$, and the magnitude of the resulting error in our solution $\vv x$ is $\|A^{-1}\vv{\epsilon}\| = \| V\Sigma^{-1} U^T \vv \epsilon\| = \|\Sigma^{-1} U^T \vv \epsilon\|$. Here, we have used the singular value decomposition of $A$, together with the fact that orthogonal matrices preserve norms, to isolate the dependence of the error in $\vv x$ on $\vv\epsilon$. The quantity $\frac{\|\vv \epsilon\|}{\|\vv b\|}$ is then the relative error in the constant vector, and the quantity $\frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\Sigma^{-1}U^T\vv{b}\|}$ is the relative error in the solution vector.

In many applications, we are interested in when small relative changes in the constant vector lead to large relative changes in the solution vector (this is bad). The ratio of these relative errors is

$$\frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\Sigma^{-1}U^T\vv{b}\|} \div \frac{\|\vv \epsilon\|}{\|\vv b\|} = \frac{\|\vv b\|}{\|\Sigma^{-1}U^T\vv{b}\|} \cdot \frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\vv \epsilon\|}.$$

In the very worst case, where $\vv b$ and $\vv \epsilon$ are chosen so that this quantity is maximized, we see that

$$\max_{\vv b, \vv \epsilon} \left(\frac{\|\vv b\|}{\|\Sigma^{-1}U^T\vv{b}\|} \cdot \frac{\|\Sigma^{-1}U^T\vv{\epsilon}\|}{\|\vv \epsilon\|}\right) = \frac{\sigma_{\max}(A^{-1})}{\sigma_{\min}(A^{-1})} = \kappa(A^{-1}) = \kappa (A).$$

Furthermore, this tells us exactly which vectors to pick to achieve this worst case error: we choose $\vv b$ to be in the direction of the right singular vector corresponding to the smallest singular value of $A^{-1}$, and we choose $\vv \epsilon$ to be in the direction of the right singular vector corresponding to the largest singular value of $A^{-1}$.

Let’s see this in practice. In the following Python code, we define an ill-conditioned coefficient matrix A and demonstrate how small relative errors in the constant vector b can lead to large relative errors in the solution x. Below, we introduce the np.linalg.svd function, which returns a tuple (U, S, Vh) such that A = U @ np.diag(S) @ Vh, where S is a 1-D array of the singular values and @ denotes matrix multiplication.

import numpy as np

A = np.array([
    [   0. ,    0.,     1.,     ],
    [   0.,     1.,     1.      ],
    [   10**-6, 1.,     1.      ]
])
print('A (note the bottom left entry):\n', A, '\n')

singular_values = np.linalg.svd(A, compute_uv=False)  # compute_uv=False returns only the singular values
print('Singular values of A:\n', singular_values, '\n')

condition_number = max(singular_values) / min(singular_values)
print('Condition number of A:\n', condition_number, '\n')

# Choose b and eps to be in the worst possible directions
U, S, Vh = np.linalg.svd(np.linalg.inv(A))
b = Vh[2, :]
eps = 0.001 * Vh[0, :]

print('b:\n', b, '\n')
print('eps:\n', eps, '\n')

b_rel_error = np.linalg.norm(eps) / np.linalg.norm(b)
print('Relative error in b:\n', b_rel_error, '\n')

x = np.linalg.solve(A, b)
x_rel_error = np.linalg.norm(np.linalg.solve(A, b + eps) - x) / np.linalg.norm(x)
print('Relative error in x:\n', x_rel_error, '\n')

ratio = x_rel_error / b_rel_error
print('Ratio:\n', ratio, '\n')
A (note the bottom left entry):
 [[0.e+00 0.e+00 1.e+00]
 [0.e+00 1.e+00 1.e+00]
 [1.e-06 1.e+00 1.e+00]] 

Singular values of A:
 [2.13577921e+00 6.62153447e-01 7.07106781e-07] 

Condition number of A:
 3020447.91804474 

b:
 [0.36904818 0.6571923  0.6571923 ] 

eps:
 [-3.53580218e-16  7.07106781e-04 -7.07106781e-04] 

Relative error in b:
 0.0009999999999999998 

Relative error in x:
 3020.447918044611 

Ratio:
 3020447.918044612 

As we can see, for certain choices of b and eps, we can make the ratio of the relative errors blow up all the way to the condition number! This can even be taken to the extreme: if $\kappa(A)$ is high enough, then even without explicitly adding noise, the computer might not find a good solution to $A\vv x = \vv b$ at all! This is because the minuscule amount of noise introduced by floating point precision gets blown up by the ill-conditioning.
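To illustrate this last point, here is a small sketch (assuming NumPy; the 12×12 Hilbert matrix is a standard example of an ill-conditioned matrix, chosen here purely for demonstration). No noise is added to b, yet the computed solution is far less accurate than machine precision would suggest:

import numpy as np

# Hilbert matrix: H[i, j] = 1 / (i + j + 1), notoriously ill-conditioned
n = 12
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

x_true = np.ones(n)   # known solution
b = H @ x_true        # exact right-hand side (up to rounding)

x_computed = np.linalg.solve(H, b)

print('Condition number of H:', np.linalg.cond(H))
print('Relative error in x:  ',
      np.linalg.norm(x_computed - x_true) / np.linalg.norm(x_true))

The exact numbers depend on the machine, but the relative error in x is typically many orders of magnitude larger than the roughly 1e-16 rounding error present in b.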

4 Computing Bases of Fundamental Subspaces

Given an SVD for an $m\times n$ matrix $A\in\mathbb{R}^{m\times n}$ with $\text{rank}(A)=r$, let $\vv u_1,\ldots,\vv u_r$ be the left singular vectors, $\vv v_1,\ldots,\vv v_r$ the right singular vectors, and $\sigma_1,\ldots,\sigma_r$ the singular values.

Recall that we showed that $\vv u_1,\ldots,\vv u_r$ forms a basis for $\text{Col}(A)$. Let $\vv u_{r+1},\ldots,\vv u_m$ be an orthonormal basis for $\text{Col}(A)^\perp$, computed for example using the Gram-Schmidt process, so that $\vv u_1,\ldots,\vv u_m$ form a basis for $\mathbb{R}^m$. Then, by the Fundamental Theorem of Linear Algebra (FTLA), we have that $\text{Col}(A)^\perp = \text{Null}(A^T) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$, i.e., these vectors form an orthonormal basis for $\text{Null}(A^T)=\text{LNull}(A)$, the left null space of $A$.

Next, recall that $\vv v_1,\ldots,\vv v_r,\vv v_{r+1},\ldots,\vv v_n$, the eigenvectors of $A^TA$, form an orthonormal basis of $\mathbb{R}^n$. Since $A\vv v_i = \vv 0$ for $i = r+1,\ldots,n$, we have that $\vv v_{r+1},\ldots,\vv v_n$ span a subspace of $\text{Null}(A)$ of dimension $n-r$. But, by the FTLA, $\dim \text{Null}(A) = n - \text{rank}(A) = n-r$.

Therefore, $\vv v_{r+1},\ldots,\vv v_n$ are an orthonormal basis for $\text{Null}(A)$.

Finally, $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A)$. But $\text{Null}(A)^\perp = \text{span}\{\vv v_1,\ldots,\vv v_r\}$ since the $\vv v_i$ are an orthonormal basis for $\mathbb{R}^n$, and thus $\vv v_1,\ldots,\vv v_r$ are an orthonormal basis for $\text{Row}(A)$.
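The full SVD hands us orthonormal bases for all four fundamental subspaces at once. Here is a minimal NumPy sketch (the rank-1 example matrix and the rank tolerance below are our own choices for illustration):

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])    # rank-1 example with m = 2, n = 3

U, s, Vh = np.linalg.svd(A)     # full SVD: U is m x m, Vh is n x n
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))        # numerical rank

col_A   = U[:, :r]              # orthonormal basis for Col(A)
lnull_A = U[:, r:]              # orthonormal basis for Null(A^T) = LNull(A)
row_A   = Vh[:r, :].T           # orthonormal basis for Row(A) = Col(A^T)
null_A  = Vh[r:, :].T           # orthonormal basis for Null(A)

print('rank(A) =', r)
print('A @ null_A is (numerically) zero:', np.allclose(A @ null_A, 0))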

Specializing these observations to square matrices gives a characterization of invertible matrices: an $n\times n$ matrix $A$ is invertible if and only if $\text{rank}(A) = n$, i.e., if and only if all $n$ of its singular values are nonzero.
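A quick numerical check of this criterion (a sketch; the example matrix and the tolerance 1e-12 are our own choices):

import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])        # singular: the second row is twice the first
s = np.linalg.svd(A, compute_uv=False)
print('Singular values:', s)
print('Invertible?', bool(np.all(s > 1e-12)))   # False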

5 The Pseudoinverse of $A$

Recall the least squares problem of finding a vector $\vv x$ that minimizes the objective $\|A \vv x-\vv b\|^2$. We saw that the least squares solution is given by the solution to the normal equations:

$$A^TA\hat{\vv x} = A^T\vv b \quad \text{(NE)}$$

Let’s rewrite (NE) using the SVD $A = U\Sigma V^T$, so that $A^T = V\Sigma^T U^T = V\Sigma U^T$ (since $\Sigma = \Sigma^T$):

$$A^TA\hat{\vv x} = V\Sigma \underbrace{U^T U}_{I}\Sigma V^T\hat{\vv x} = \underbrace{V\Sigma^2 V^T\hat{\vv x}}_{(a)} = \underbrace{V\Sigma U^T}_{(b)}\vv b$$

Let’s start by left multiplying (a) and (b) by $V^T$ to take advantage of $V^TV = I_r$:

$$V^T(V\Sigma^2V^T\hat{\vv x}) = V^T(V\Sigma U^T \vv b) \;\Rightarrow\; \Sigma^2V^T\hat{\vv x} = \Sigma U^T \vv b.$$

Now, let’s isolate $V^T\hat{\vv x}$ by multiplying both sides by $\Sigma^{-2}$:

$$V^T\hat{\vv x} = \Sigma^{-1}U^T\vv b.$$

Finally, note that $\hat{\vv x}$ satisfies this equation if $\hat{\vv x} = V\Sigma^{-1}U^T\vv b + \vv n$ (again since $V^TV=I$) for any $\vv n \in \text{Null}(V^T) = \text{Col}(V)^\perp$. The special solution $\hat{\vv x}^* = V\Sigma^{-1}U^T\vv b$ can be shown to be the minimum norm least squares solution when the normal equations admit more than one solution $\hat{\vv x}$. The matrix

$$A^+ = V\Sigma^{-1}U^T$$

is called the pseudoinverse of $A$, and is also known as the Moore-Penrose inverse of $A$.

If we look at $A\hat{\vv x}^* = AA^+ \vv b$, we observe that:

$$A\hat{\vv x}^* = U\Sigma \underbrace{V^TV}_{I}\Sigma^{-1}U^T\vv b = UU^T\vv b,$$

i.e., $A\hat{\vv x}^*$ is the orthogonal projection $\hat{\vv b}$ of $\vv b$ onto $\text{Col}(A)$.
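Here is a short NumPy sketch of these formulas (the small rank-deficient matrix and the rank tolerance are our own choices; np.linalg.pinv computes the same Moore-Penrose pseudoinverse, so we use it as a check):

import numpy as np

A = np.array([[1., 0.],
              [0., 0.],
              [0., 0.]])        # rank 1, so A has no ordinary inverse
b = np.array([1., 2., 3.])

# Compact SVD: keep only the r nonzero singular values
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))      # crude rank tolerance (our assumption)
U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r, :].T

A_pinv = V_r @ np.diag(1.0 / s_r) @ U_r.T       # A^+ = V Sigma^{-1} U^T
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True

x_star = A_pinv @ b             # minimum norm least squares solution
print(np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print(np.allclose(A @ x_star, U_r @ U_r.T @ b)) # A x* = projection of b onto Col(A)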
