1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.
2 Learning Objectives

By the end of this page, you should know:
- how to extend the optimization principle we saw for characterizing eigenvalues to general matrices, such as nonsquare matrices
- what the singular values of a matrix are and how they relate to its rank
- what the singular value decomposition of a matrix is and how to find such a decomposition

3 Warmup

The diagonalization theorems we've seen for complete and symmetric matrices have played a role in many interesting applications. Unfortunately, not all matrices can be factored as $A = PDP^{-1}$ for a diagonal matrix $D$; for example, such a factorization makes no sense if $A$ is not square! Fortunately, a factorization $A = P \Delta Q^T$ is possible for any $m \times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful and widely applicable matrix factorizations in linear algebra.
The singular value decomposition is based on the following key property of matrix diagonalization, which we'll show can be generalized to rectangular matrices:
The absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts by which $A$ stretches or shrinks certain vectors (the eigenvectors). If $A \vv x = \lambda \vv x$ and $\|\vv x\| = 1$, then
$$\|A \vv x\| = \|\lambda \vv x\| = |\lambda| \|\vv x\| = |\lambda|.$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, i.e., if $|\lambda_1| \geq |\lambda_i|$ for $i=1,\ldots,n$, then a corresponding unit eigenvector $\vv v_1$ identifies the direction in which stretching is greatest. That is, the length of $A \vv x$ is maximized when $\vv x = \vv v_1$, and $\|A\vv v_1\| = |\lambda_1|$.
The above description is reminiscent of the optimization principle we saw for characterizing eigenvalues of symmetric matrices, albeit with a focus on maximizing the length $\|A\vv x\|$ rather than the quadratic form $\vv x^T A \vv x$. What we'll see next is that this description of $\vv v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
Example 1 (Finding the maximum “stretch” of a linear map)
The matrix $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ defines a linear map $\vv x \mapsto A \vv x$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. If we consider the effect of this map on the unit sphere $\{\vv x \in \mathbb{R}^3 \mid \|\vv x\| = 1\}$, we observe that multiplication by $A$ transforms this sphere in $\mathbb{R}^3$ into an ellipse in $\mathbb{R}^2$:
Our task is to find a unit vector $\vv x$ at which the length $\|A\vv x\|$ is maximized, and compute this maximum length. That is, we want to solve the optimization problem:
$$\begin{align*}
&\text{Maximize $\|A \vv x\|$} \\
&\text{Subject to $\|\vv x\| = 1$}
\end{align*}$$

Our first observation is that the quantity $\|A\vv x\|^2$ is maximized by the same $\vv x$ that maximizes $\|A\vv x\|$, but that $\|A\vv x\|^2$ is easier to work with (since it's a quadratic form instead of the square root of a quadratic form). Specifically, note that
$$\|A\vv x\|^2 = \langle A \vv x, A\vv x \rangle = (A\vv x)^T(A\vv x) = \vv x^TA^TA\vv x = \vv x^T(A^TA)\vv x.$$

So our task is now to find a unit vector $\|\vv x\|=1$ that maximizes the quadratic form $\vv x^T(A^TA)\vv x$ defined by the symmetric (positive semidefinite) matrix $A^TA$: we know how to do this. By our theorem characterizing eigenvalues of symmetric matrices from an optimization perspective, we know the maximum value is the largest eigenvalue $\lambda_1$ of the matrix $A^TA$, and is attained at the unit eigenvector $\vv v_1$ of $A^TA$ corresponding to $\lambda_1$.
For the matrix in this example:
$$A^TA = \begin{bmatrix}
4 & 8 \\
11 & 7 \\
14 & -2
\end{bmatrix}
\begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix} =
\begin{bmatrix}
80 & 100 & 40 \\
100 & 170 & 140 \\
40 & 140 & 200
\end{bmatrix}$$

and the eigenvalue/eigenvector pairs are:
$$\begin{align*}
\lambda_1 = 360, \quad \vv v_1 &= \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_2 = 90, \quad \vv v_2 &= \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_3 = 0, \quad \vv v_3 &= \begin{bmatrix} \frac{2}{3} \\ -\frac{2}{3} \\ \frac{1}{3} \end{bmatrix}.
\end{align*}$$

The maximum value of $\vv x^T(A^TA)\vv x = \|A\vv x\|^2$ is thus $\lambda_1 = 360$, and is attained when $\vv x = \vv v_1$. The vector $A\vv v_1$ is a point on the ellipse in Figure 1 farthest from the origin, namely
$$A\vv v_1 = \begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix}
\begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} =
\begin{bmatrix}
18 \\
6
\end{bmatrix}.$$

For $\|\vv x\| = 1$, the maximum value of $\|A\vv x\|$ is $\|A\vv v_1\| = \sqrt{360} = 6\sqrt{10}$.
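As a quick sanity check, we can verify these computations numerically with NumPy. The snippet below is a minimal sketch (the variable names are our own): it forms $A^TA$, computes its eigendecomposition, and confirms that the unit eigenvector for the largest eigenvalue achieves the maximum stretch $\sqrt{360} = 6\sqrt{10}$.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])

# Eigendecomposition of the symmetric matrix A^T A
# (eigh returns eigenvalues in increasing order).
lams, V = np.linalg.eigh(A.T @ A)
lam1 = lams[-1]   # largest eigenvalue
v1 = V[:, -1]     # corresponding unit eigenvector

print(lam1)                       # ~360
print(A @ v1)                     # ~[18, 6] (possibly negated: eigenvectors are only unique up to sign)
print(np.linalg.norm(A @ v1))     # ~18.97 = 6*sqrt(10)
```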
This example suggests that the effect of a matrix $A$ on the unit sphere in $\mathbb{R}^3$ is related to the quadratic form $\vv x^T A^TA \vv x$. What we'll see next is that the entire geometric behavior of the map $\vv x \mapsto A \vv x$ is captured by this quadratic form.
4 The Singular Values of an $m \times n$ Matrix

Consider an $m \times n$ real matrix $A \in \mathbb{R}^{m \times n}$. Then $A^TA$ is an $n \times n$ symmetric matrix, and can be orthogonally diagonalized via the spectral factorization, i.e., we can write $A^\top A = Q \Lambda Q^\top$, where $Q$ is an orthogonal matrix composed of orthonormal eigenvectors of $A^\top A$, and $\Lambda$ is a diagonal matrix containing the eigenvalues of $A^\top A$.
We can write $Q = \bm \vv v_1 & \cdots & \vv v_n \em$, where the $\vv v_i$ are unit eigenvectors of $A^{\top} A$, and $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ is the eigenvalue corresponding to $\vv v_i$. Then, for $i = 1, \dots, n$:
$$\begin{align*}
\|A \vv v_i\|^2 &= (A \vv v_i)^\top(A \vv v_i) = \vv v_i^\top(A^\top A \vv v_i) \\
&= \vv v_i^\top(\lambda_i \vv v_i) \\
&= \lambda_i \vv v_i^\top \vv v_i = \lambda_i \|\vv v_i\|^2 \\
&= \lambda_i \quad\text{(since $\vv v_i$ has unit norm)}.
\end{align*}$$

This tells us that each eigenvalue $\lambda_i$ of $A^\top A$ can be written as the squared norm of $A \vv v_i$. In particular, this tells us that $A^\top A$ has nonnegative eigenvalues, i.e., is positive semidefinite. We will use these eigenvalues to define the singular values of $A$:
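The identity $\lambda_i = \|A \vv v_i\|^2$ is easy to check numerically. The sketch below is our own illustration, using a randomly generated matrix: it confirms that every eigenvalue of $A^\top A$ is nonnegative and equals the squared norm of $A$ applied to the corresponding unit eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))      # any m x n matrix works here

lams, V = np.linalg.eigh(A.T @ A)    # eigenpairs of the symmetric matrix A^T A
for lam, v in zip(lams, V.T):        # columns of V are unit eigenvectors
    # the two printed numbers agree, and lam >= 0 (up to roundoff)
    print(lam, np.linalg.norm(A @ v)**2)
```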
Definition 1 (The Singular Values of a Matrix)
Given a matrix $A \in \mathbb{R}^{m\times n}$, let $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ be the eigenvalues of $A^\top A$ in nonincreasing order, counting multiplicities.
The singular values of $A$ are the positive square roots of the nonzero eigenvalues $\lambda_i > 0$ of $A^\top A$, and are denoted $\sigma_i$.
In other words, if $A^\top A$ has rank $r$ (or equivalently, $A$ has rank $r$), then $A^\top A$ has $r$ nonzero eigenvalues $\lambda_1 \geq \dots \geq \lambda_r > 0$, counting multiplicities. Then, $A$ has $r$ singular values, defined as $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \dots, r$.
Some texts also include the square roots of the zero eigenvalues $\lambda_{r+1}, \ldots, \lambda_n$ of $A^TA$ as (zero) singular values of $A$. This is simply a different convention and is mathematically equivalent. However, we find our definition to be more natural for our purposes.
Whereas eigenvalues are only defined for square matrices, singular values are defined for any matrix. Soon, we will see how the singular values of a matrix can be used to compute a useful factorization of any matrix, known as the singular value decomposition.
This definition of singular values allows us to give an alternative description of $\text{rank}(A)$: the rank of $A$ is the number of its singular values.
In certain cases, the rank of $A$ may be very sensitive to small changes in the entries of $A$. The obvious approach of counting the number of pivot columns in $A$ does not work well if $A$ is row reduced by a computer, as roundoff errors often create a row echelon form with full rank. In practice, the most reliable way of computing the rank of a large matrix $A$ is to count the number of singular values larger than a small threshold $\varepsilon$ (typically on the order of $10^{-12}$, but this can vary depending on the application). In this case, singular values smaller than $\varepsilon$ are treated as zeros for all practical purposes, and the effective rank of $A$ is computed by counting the remaining nonzero singular values.
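Here is a short sketch of this idea in NumPy (the threshold value and the function name `effective_rank` are our own choices, not a library standard): we count the singular values of a nearly rank-deficient matrix that exceed a small tolerance.

```python
import numpy as np

def effective_rank(A, tol=1e-12):
    """Count the singular values of A larger than tol."""
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
    return int(np.sum(sigma > tol))

# A 3x3 matrix whose third row is (almost) the sum of the first two,
# so its "true" rank is 2 even though roundoff may suggest otherwise.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0 + 1e-15]])

print(effective_rank(A))    # 2
```

NumPy's built-in `np.linalg.matrix_rank` implements the same idea, with a default tolerance scaled by the largest singular value and the matrix dimensions.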
5 A Variational Definition of the Singular Values

The above definition was more of a "direct" definition of singular values. In the next example, we will see an analog to the variational characterization of eigenvalues.
Example 2 (An analog to the variational characterization of eigenvalues)
Using the same $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ as in the previous example,
we have $\sigma_1 = \sqrt{360} = 6\sqrt{10}$ and $\sigma_2 = \sqrt{90} = 3\sqrt{10}$. In this case, $A$ only has two singular values as $\lambda_3 = 0$. For this example, $r = 2$, and $\lambda_1 = 360 > \lambda_2 = 90 > \lambda_3 = 0$.
From the previous example, the first singular value of $A$ is the maximum of $\|A\vv x\|$ over all $\|\vv x\| = 1$, attained at $\vv v_1$. Our optimization based characterization of eigenvalues of symmetric matrices tells us that the second singular value of $A$ is the maximum of $\|A\vv x\|$ over all unit vectors orthogonal to $\vv v_1$: this is attained by $\vv v_2$, the second eigenvector of $A^TA$. For $\vv v_2$ from the previous example:
$$A \vv v_2 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \end{bmatrix}$$

This point is on the minor axis of the ellipse in Figure 1, just as $A \vv v_1$ is on the major axis (see Figure 2 below). The two singular values of $A$ are the lengths of the major and minor semiaxes of the ellipse.
In the above example, the fact that $A\vv v_1$ and $A\vv v_2$ are orthogonal was no accident. In the next section, we'll explore this in more detail.
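Before doing so, here is a quick numerical check (a minimal sketch of our own) that $A\vv v_1$ and $A\vv v_2$ are indeed orthogonal, and that their lengths are exactly the two singular values.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])
v1 = np.array([1, 2, 2]) / 3
v2 = np.array([-2, -1, 2]) / 3

print(np.dot(A @ v1, A @ v2))    # 0: the images are orthogonal
print(np.linalg.norm(A @ v1))    # 6*sqrt(10) ≈ 18.97
print(np.linalg.norm(A @ v2))    # 3*sqrt(10) ≈ 9.49
```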
6 Computing the SVD

The decomposition of $A$ involves an $r \times r$ diagonal matrix $\Sigma$ of the form
$$\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r).$$

We note that because $r = \text{dim Col}(A) = \text{dim Row}(A)$ by the Fundamental Theorem of Linear Algebra, we must have that $r \leq \min\{m,n\}$ if $A \in \mathbb{R}^{m \times n}$.
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix of rank $r > 0$. Then $A$ can be factored as
$$A = U \Sigma V^T,$$

where $U \in \mathbb{R}^{m \times r}$ has orthonormal columns, so $U^TU = I_r$, $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)$ is a diagonal matrix with the singular values $\sigma_i$ of $A$ along the diagonal, and $V \in \mathbb{R}^{n \times r}$ has orthonormal columns, so $V^TV = I_r$.
Such a factorization of $A$ is called its singular value decomposition, and the columns of $U$ are called the left singular vectors of $A$, while the columns of $V$ are called the right singular vectors of $A$.
Let $\lambda_i$ and $\vv v_i$ be the eigenvalues/vectors of $A^TA$ as described previously. For $i \neq j$ we have $(A\vv v_i)^T(A\vv v_j) = \vv v_i^T A^TA \vv v_j = \lambda_j \vv v_i^T \vv v_j = 0$, so $A\vv v_1, \ldots, A\vv v_r$ is an orthogonal basis for $\text{col}(A)$. Normalize each $A\vv v_i$ to form an orthonormal basis for $\text{col}(A)$:
u i : = 1 ∥ A v i ∥ A v i = 1 σ i A v i \vv u_i := \frac{1}{\|A\vv v_i\|} A\vv v_i = \frac{1}{\sigma_i} A\vv v_i u i := ∥ A v i ∥ 1 A v i = σ i 1 A v i and hence A v i = σ i u i A\vv v_i = \sigma_i \vv u_i A v i = σ i u i for i = 1 , … , r i=1,\ldots,r i = 1 , … , r . Define the matrices
$$U = \bm \vv u_1 & \cdots & \vv u_r \em \in \mathbb{R}^{m \times r} \quad \text{and} \quad V = \bm \vv v_1 & \cdots & \vv v_r \em \in \mathbb{R}^{n \times r}$$

By construction, the columns of $U$ are orthonormal: $U^TU = I_r$, and similarly for the columns of $V$: $V^TV = I_r$.
Let’s define the following “full” matrices:
$$\hat{U} = \bm U & U^\perp \em \in \mathbb{R}^{m \times m} \quad \text{and} \quad \hat{V} = \bm V & V^\perp \em \in \mathbb{R}^{n \times n}$$

Here, $V^\perp = \bm \vv v_{r+1} & \cdots & \vv v_n \em$ has orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv v_1, \ldots, \vv v_r\}$, so that the columns of $\hat{V}$ form an orthonormal basis of $\mathbb{R}^n$.
Similarly, let $U^\perp$ have orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv u_1, \ldots, \vv u_r\}$, so the columns of $\hat{U}$ form an orthonormal basis for $\mathbb{R}^m$.
Finally, define
$$\hat{\Sigma} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n},$$

where the zero blocks pad $\Sigma$ out to size $m \times n$: the top blocks have $r$ rows, the bottom blocks have $m-r$ rows, the left blocks have $r$ columns, and the right blocks have $n-r$ columns.
We first show that
$$A = \hat{U} \hat{\Sigma} \hat{V}^T, \quad \text{or equivalently (since $\hat{V}$ is orthogonal),} \quad A\hat{V} = \hat{U}\hat{\Sigma}.$$

First,
$$A\hat{V} = \bm A\vv v_1 & \cdots & A\vv v_r & A\vv v_{r+1} & \cdots & A\vv v_n \em = \bm \sigma_1 \vv u_1 & \cdots & \sigma_r\vv u_r & \vv 0 & \cdots & \vv 0 \em,$$

where the last $n-r$ columns are zero since $\|A\vv v_i\|^2 = \lambda_i = 0$ for $i > r$. Then, notice:
$$\hat{U} \hat{\Sigma} = \bm \vv u_1 & \cdots & \vv u_r & \vv u_{r+1} & \cdots & \vv u_m \em
\begin{bmatrix}
\sigma_1 & & 0 & 0 \cdots 0 \\
& \ddots & & \vdots \\
0 & & \sigma_r & 0 \cdots 0 \\
0 & \cdots & 0 & 0 \cdots 0 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0 \cdots 0
\end{bmatrix}
= \bm \sigma_1 \vv u_1 & \cdots & \sigma_r \vv u_r & \vv 0 & \cdots & \vv 0 \em.$$

So that $A\hat{V} = \hat{U}\hat{\Sigma}$, or equivalently, $A = \hat{U}\hat{\Sigma}\hat{V}^T$. But now, notice:
$$A = \hat{U} \hat{\Sigma} \hat{V}^T = \bm U & U^\perp \em \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ (V^\perp)^T \end{bmatrix} = U \Sigma V^T,$$

proving our result.
Some textbooks define the singular value decomposition of $A$ as $A = \hat{U} \hat{\Sigma} \hat{V}^T$; this is necessary when allowing for singular values equal to zero. When only considering nonzero singular values, as we do, $A = U\Sigma V^T$ is the appropriate definition. This is sometimes called the compact SVD of $A$, but we will just call it the SVD.
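The proof above is constructive, and we can follow the same recipe numerically. The sketch below is our own illustration (it estimates the rank by counting eigenvalues above a small tolerance): it builds the compact SVD of the example matrix from the eigendecomposition of $A^TA$ and checks that $A = U\Sigma V^T$.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])

# Step 1: eigendecomposition of A^T A, sorted so eigenvalues are nonincreasing.
lams, V_full = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]
lams, V_full = lams[order], V_full[:, order]

# Step 2: keep only the eigenpairs with (numerically) nonzero eigenvalues.
r = int(np.sum(lams > 1e-12))
sigma = np.sqrt(lams[:r])      # singular values sigma_i = sqrt(lambda_i)
V = V_full[:, :r]              # right singular vectors v_1, ..., v_r

# Step 3: left singular vectors u_i = (1/sigma_i) A v_i.
U = (A @ V) / sigma

# Check the factorization A = U Sigma V^T.
print(sigma)                                     # [18.97..., 9.48...]
print(np.allclose(A, U @ np.diag(sigma) @ V.T))  # True
```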
6.1 Python Break!

In Python, we can find the SVD of a matrix using the numpy.linalg.svd function:
import numpy as np
A = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
U, S, Vt = np.linalg.svd(A)
# U: Left singular vectors
print("U matrix:")
print(U)
# S: Singular values (returned as a 1D array)
print("\nSingular values:")
print(S)
# Vt: Right singular vectors (transposed)
print("\nV^T matrix:")
print(Vt)
U matrix:
[[-0.21483724 0.88723069 0.40824829]
[-0.52058739 0.24964395 -0.81649658]
[-0.82633754 -0.38794278 0.40824829]]
Singular values:
[1.68481034e+01 1.06836951e+00 3.33475287e-16]
V^T matrix:
[[-0.47967118 -0.57236779 -0.66506441]
[-0.77669099 -0.07568647 0.62531805]
[-0.40824829 0.81649658 -0.40824829]]
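Note that np.linalg.svd returns the "full" factorization $\hat{U}, \hat{\Sigma}, \hat{V}^T$ (with the singular values as a 1D array that includes values which are numerically zero, like the $3.3 \times 10^{-16}$ above). To obtain the compact SVD as we have defined it, we can drop the singular values below a small tolerance and keep only the corresponding columns of $\hat{U}$ and rows of $\hat{V}^T$; a minimal sketch (the tolerance choice is our own):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

U_hat, S, Vt_hat = np.linalg.svd(A)

r = int(np.sum(S > 1e-12))     # effective rank: 2 for this matrix
U = U_hat[:, :r]               # left singular vectors (m x r)
Sigma = np.diag(S[:r])         # r x r diagonal matrix of singular values
Vt = Vt_hat[:r, :]             # right singular vectors, transposed (r x n)

print(np.allclose(A, U @ Sigma @ Vt))   # True: the compact SVD reconstructs A
```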