
10.1 The Singular Value Decomposition

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.

2 Learning Objectives

By the end of this page, you should know:

  • how to extend the optimization principle we saw for characterizing eigenvalues to general matrices, such as nonsquare matrices
  • what the singular values of a matrix are and how they relate to its rank
  • what the singular value decomposition of a matrix is and how to find such a decomposition

3 Warmup

The diagonalization theorems we’ve seen for complete and symmetric matrices have played a role in many interesting applications. Unfortunately, not all matrices can be factored as $A = PDP^{-1}$ for a diagonal matrix $D$; for example, such a factorization makes no sense if $A$ is not square! Fortunately, a factorization $A = P \Delta Q^T$ is possible for any $m \times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful and widely applicable matrix factorizations in linear algebra.

The singular value decomposition is based on the following key property of matrix diagonalization, which we’ll show can be captured in general rectangular matrices: if $A$ is a symmetric matrix with eigenvalues ordered so that $|\lambda_1| \geq \cdots \geq |\lambda_n|$, then the maximum of $\|A\vv x\|$ over all unit vectors $\vv x$ is $|\lambda_1|$, and it is attained at a corresponding unit eigenvector $\vv v_1$.

The above description is reminiscent of the optimization principle we saw for characterizing eigenvalues of symmetric matrices, albeit with a focus on maximizing the length $\|A\vv x\|$ rather than the quadratic form $\vv x^T A \vv x$. What we’ll see next is that this description of $\vv v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
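
To make the warmup concrete, here is a small numerical sketch. The symmetric matrix below is our own example, chosen only for illustration; we estimate $\max_{\|\vv x\|=1}\|A\vv x\|$ by sampling random unit vectors and compare it with $|\lambda_1|$ and $\|A\vv v_1\|$:

import numpy as np

# An example symmetric matrix, chosen only for illustration
A = np.array([[3., 1.],
              [1., 2.]])

# np.linalg.eigh returns eigenvalues in ascending order, eigenvectors as columns
eigvals, eigvecs = np.linalg.eigh(A)
v1 = eigvecs[:, np.argmax(np.abs(eigvals))]   # unit eigenvector for the largest |eigenvalue|

# Estimate max ||Ax|| over unit vectors x by sampling many random directions
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 10_000))
X /= np.linalg.norm(X, axis=0)                # normalize each column to unit length
max_sampled = np.max(np.linalg.norm(A @ X, axis=0))

print("max |lambda_i|     :", np.max(np.abs(eigvals)))
print("sampled max ||Ax|| :", max_sampled)
print("||A v1||           :", np.linalg.norm(A @ v1))

The sampled maximum should agree with $|\lambda_1|$ and $\|A\vv v_1\|$ up to the resolution of the random sampling.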

4 The Singular Values of an $m \times n$ Matrix

Consider an $m \times n$ real matrix $A \in \mathbb{R}^{m \times n}$. Then, $A^\top A$ is an $n \times n$ symmetric matrix, and can be orthogonally diagonalized via the spectral factorization, i.e., we can write $A^\top A = Q \Lambda Q^\top$, where $Q$ is an orthogonal matrix composed of orthonormal eigenvectors of $A^\top A$, and $\Lambda$ is a diagonal matrix containing the eigenvalues of $A^\top A$.

We can write $Q = \bm \vv v_1 & \cdots & \vv v_n\em$, where $\vv v_1, \dots, \vv v_n$ are unit eigenvectors of $A^{\top} A$, and $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ is the eigenvalue corresponding to $\vv v_i$. Then, for $i = 1, \dots, n$:

\begin{align*} \|A \vv v_i\|^2 &= (A \vv v_i)^\top(A \vv v_i) = \vv v_i^\top(A^\top A \vv v_i) \\ &= \vv v_i^\top(\lambda_i \vv v_i) \\ &= \lambda_i \vv v_i^\top \vv v_i = \lambda_i \|\vv v_i\|^2 \\ &= \lambda_i \quad\text{(since $\vv{v_i}$ has unit norm)}. \end{align*}

This tells us that each eigenvalue $\lambda_i$ of $A^\top A$ can be written as the squared norm of $A \vv{v_i}$. In particular, this tells us that $A^\top A$ has nonnegative eigenvalues, i.e., is positive semidefinite. We will use these eigenvalues to define the singular values of $A$:

The singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$, the positive square roots of the nonzero eigenvalues of $A^\top A$, conventionally ordered so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Whereas eigenvalues are only defined for square matrices, singular values are defined for any matrix. Soon, we will see how the singular values of a matrix can be used to compute a useful factorization of any matrix, known as the singular value decomposition.

This definition of singular values allows us to give an alternative description of $\text{rank}(A)$: the rank of $A$ is the number of its singular values.
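
As a quick numerical sketch of these definitions (the example matrix and the tolerance tol below are our own choices for illustration), we can compute the singular values of $A$ from the eigenvalues of $A^\top A$ and count the nonzero ones to recover $\text{rank}(A)$:

import numpy as np

# A hypothetical example matrix; its rank is 2 (the third row is 2*(row 2) - (row 1))
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Eigenvalues/eigenvectors of the symmetric matrix A^T A (eigh returns them in ascending order)
eigvals, V = np.linalg.eigh(A.T @ A)
eigvals = np.clip(eigvals, 0.0, None)    # clip tiny negative round-off before taking square roots

# Sanity check of the identity ||A v_i||^2 = lambda_i derived above
print(np.allclose(np.linalg.norm(A @ V, axis=0)**2, eigvals))    # True

# Singular values: square roots of the eigenvalues, sorted largest to smallest
singular_values = np.sqrt(eigvals)[::-1]

# The rank is the number of (numerically) nonzero singular values
tol = 1e-10
print("singular values:", singular_values)
print("rank of A      :", np.sum(singular_values > tol))          # 2
print("np.linalg.matrix_rank(A):", np.linalg.matrix_rank(A))      # 2, for comparison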

5 A Variational Definition of the Singular Values

The above definition was more of a “direct” definition of singular values. In the next example, we will see an analogue of the variational characterization of eigenvalues.

In the above example, the fact that $A\vv v_1$ and $A\vv v_2$ are orthogonal was no accident. In the next section, we’ll explore this in more detail.
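
Here is a short numerical sketch of this fact (the random matrix below is our own choice for illustration): if the $\vv v_i$ are orthonormal eigenvectors of $A^\top A$, the Gram matrix of the vectors $A\vv v_i$ comes out diagonal, i.e., the $A\vv v_i$ are mutually orthogonal.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))               # a generic 4 x 3 matrix, chosen at random

# Orthonormal eigenvectors v_i of A^T A, stored as the columns of V
_, V = np.linalg.eigh(A.T @ A)

# (A v_i)^T (A v_j) = v_i^T (A^T A v_j) = lambda_j v_i^T v_j = 0 whenever i != j,
# so the Gram matrix of the vectors A v_i should be diagonal
G = (A @ V).T @ (A @ V)
print(np.round(G, 10))                        # off-diagonal entries are (numerically) zero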

6 Computing the SVD

The decomposition of $A$, stated precisely in Theorem 2 below, takes the form $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ have orthonormal columns, and $\Sigma$ is an $r \times r$ diagonal matrix of the form

\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r).

We note that because $r = \text{dim Col}(A) = \text{dim Row}(A)$ by the Fundamental Theorem of Linear Algebra, we must have that $r \leq \min\{m,n\}$ if $A \in \mathbb{R}^{m \times n}$.

Such a factorization of $A$ is called its singular value decomposition, and the columns of $U$ are called the left singular vectors of $A$, while the columns of $V$ are called the right singular vectors of $A$.

Proof 2 (Proof of Theorem 2)

Let $\lambda_i$ and $\vv v_i$ be the eigenvalues and eigenvectors of $A^TA$ as described previously, so that $A\vv v_1, \ldots, A\vv v_r$ is an orthogonal basis for $\text{col}(A)$. Normalize each $A\vv v_i$ to form an orthonormal basis for $\text{col}(A)$:

\vv u_i := \frac{1}{\|A\vv v_i\|} A\vv v_i = \frac{1}{\sigma_i} A\vv v_i

and hence $A\vv v_i = \sigma_i \vv u_i$ for $i=1,\ldots,r$. Define the matrices

U = \bm \vv u_1 & \cdots & \vv u_r\em \in \mathbb{R}^{m \times r} \quad \text{and} \quad V = \bm \vv v_1 & \cdots & \vv v_r\em \in \mathbb{R}^{n \times r}

By construction, the columns of $U$ are orthonormal: $U^TU = I_r$, and similarly for the columns of $V$: $V^TV = I_r$.

Let’s define the following “full” matrices:

\hat{U} = \bm U & U^\perp \em \in \mathbb{R}^{m \times m} \quad \text{and} \quad \hat{V} = \bm V & V^\perp\em \in \mathbb{R}^{n \times n}

Here, $V^\perp = \bm \vv v_{r+1} & \cdots & \vv v_n\em$ has orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv v_1, \ldots, \vv v_r\}$, so that the columns of $\hat{V}$ form an orthonormal basis of $\mathbb{R}^n$.

Similarly, let $U^\perp$ have orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv u_1, \ldots, \vv u_r\}$, so the columns of $\hat{U}$ form an orthonormal basis for $\mathbb{R}^m$.

Finally, define

\hat{\Sigma} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n},

where the top-left block is the $r \times r$ matrix $\Sigma$ and the zero blocks fill out the remaining $m - r$ rows and $n - r$ columns. We first show that

A = \hat{U} \hat{\Sigma} \hat{V}^T, \quad \text{or equivalently (since $\hat{V}$ is orthogonal),} \quad A\hat{V} = \hat{U}\hat{\Sigma}.

First,

A\hat{V} = \bm A\vv v_1 & \cdots & A\vv v_r & A\vv v_{r+1} & \cdots & A\vv v_n\em = \bm \sigma_1 \vv u_1 & \cdots & \sigma_r\vv u_r & \vv 0 & \cdots & \vv 0\em,

where the last $n-r$ columns are zero because $\|A\vv v_i\|^2 = \lambda_i = 0$ for $i > r$.

Then, notice:

\hat{U} \hat{\Sigma} = \bm \vv u_1 & \cdots & \vv u_r & \vv u_{r+1} & \cdots & \vv u_m\em \begin{bmatrix} \sigma_1 & & 0 & 0 & \cdots & 0 \\ & \ddots & & & \vdots & \\ 0 & & \sigma_r & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix} = \bm \sigma_1 \vv u_1 & \cdots & \sigma_r \vv u_r & \vv 0 & \cdots & \vv 0\em.

This shows that $A\hat{V} = \hat{U}\hat{\Sigma}$, or equivalently, $A = \hat{U}\hat{\Sigma}\hat{V}^T$. But now, notice:

A = \hat{U} \hat{\Sigma} \hat{V}^T = \bm U & U^\perp\em \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ (V^\perp)^T \end{bmatrix} = U \Sigma V^T,

proving our result.
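
The proof above is constructive, so we can turn it directly into code. The sketch below (the function name compact_svd and the tolerance are our own choices, not part of any library) builds $V$, $\Sigma$, and $U$ from the eigendecomposition of $A^\top A$ exactly as in the proof, and then checks that $U^TU = I_r$, $V^TV = I_r$, and $A = U\Sigma V^T$:

import numpy as np

def compact_svd(A, tol=1e-10):
    """Build a compact SVD A = U @ diag(sigma) @ V.T by following the construction in the proof."""
    # Orthonormal eigenvectors of A^T A, with eigenvalues sorted in decreasing order
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the r nonzero eigenvalues; the singular values are their square roots
    r = int(np.sum(eigvals > tol))
    sigma = np.sqrt(eigvals[:r])
    V = eigvecs[:, :r]

    # u_i = (1 / sigma_i) A v_i gives an orthonormal basis of col(A)
    U = (A @ V) / sigma
    return U, sigma, V

# A small example matrix, chosen only for illustration
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
U, sigma, V = compact_svd(A)

print("U^T U = I_r     :", np.allclose(U.T @ U, np.eye(len(sigma))))
print("V^T V = I_r     :", np.allclose(V.T @ V, np.eye(len(sigma))))
print("A = U Sigma V^T :", np.allclose(U @ np.diag(sigma) @ V.T, A))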

6.1 Python Break!

In Python, we can find the SVD of a matrix using the numpy.linalg.svd function:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

U, S, Vt = np.linalg.svd(A)

# U: Left singular vectors
print("U matrix:")
print(U)

# S: Singular values (returned as a 1D array)
print("\nSingular values:")
print(S)

# Vt: Right singular vectors (transposed)
print("\nV^T matrix:")
print(Vt)
U matrix:
[[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]

Singular values:
[1.68481034e+01 1.06836951e+00 3.33475287e-16]

V^T matrix:
[[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]
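
A few remarks on the output above: np.linalg.svd returns the full matrices $\hat{U}$ and $\hat{V}^T$ by default, with the singular values as a 1D array sorted from largest to smallest (note that the third value is numerically zero, consistent with $A$ having rank 2). If we only want the thin factors, we can pass full_matrices=False; the short sketch below also shows how to reconstruct $A$ from its SVD factors:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# full_matrices=False returns the "thin" SVD: U is m x k and Vt is k x n, with k = min(m, n)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from its SVD factors
print(np.allclose(U @ np.diag(S) @ Vt, A))      # True

# Keep only the singular values above a small tolerance to get the rank-r part
r = np.sum(S > 1e-10)
A_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
print(np.allclose(A_r, A))                      # also True, since the dropped singular value is ~0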
