1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.
2 Learning Objectives

By the end of this page, you should know:
- how to extend the optimization principle we saw for characterizing eigenvalues to general matrices, such as nonsquare matrices
- what the singular values of a matrix are and how they relate to its rank
- what the singular value decomposition of a matrix is and how to find such a decomposition

3 Warmup

The diagonalization theorems we've seen for complete and symmetric matrices have played a role in many interesting applications. Unfortunately, not all matrices can be factored as $A = PDP^{-1}$ for a diagonal matrix $D$; for example, such a factorization makes no sense if $A$ is not square! Fortunately, a factorization $A = P \Delta Q^T$ is possible for any $m \times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful and widely applicable matrix factorizations in linear algebra.
The singular value decomposition is based on the following key property of matrix diagonalization, which we'll show can be generalized to rectangular matrices:
The absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts by which $A$ stretches or shrinks certain vectors (the eigenvectors). If $A \vv x = \lambda \vv x$ and $\|\vv x\| = 1$, then
$$\|A \vv x\| = \|\lambda \vv x\| = |\lambda| \|\vv x\| = |\lambda|.$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, i.e., if $|\lambda_1| \geq |\lambda_i|$ for $i=1,\ldots,n$, then a corresponding unit eigenvector $\vv v_1$ identifies the direction in which stretching is greatest. That is, the length of $A \vv x$ is maximized when $\vv x = \vv v_1$, and $\|A\vv v_1\| = |\lambda_1|$.
The above description is reminiscent of the optimization principle we saw for characterizing eigenvalues of symmetric matrices, albeit with a focus on maximizing the length $\|A\vv x\|$ rather than the quadratic form $\vv x^T A \vv x$. What we'll see next is that this description of $\vv v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
Example 1 (Finding the maximum “stretch” of a linear map)
The matrix $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ defines a linear map $\vv x \mapsto A \vv x$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. If we consider the effect of this map on the unit sphere $\{\vv x \in \mathbb{R}^3 \mid \|\vv x\| = 1\}$, we observe that multiplication by $A$ transforms this sphere in $\mathbb{R}^3$ into an ellipse in $\mathbb{R}^2$:
Our task is to find a unit vector $\vv x$ at which the length $\|A\vv x\|$ is maximized, and compute this maximum length. That is, we want to solve the optimization problem:
$$\begin{align*}
&\text{Maximize $\|A \vv x\|$} \\
&\text{Subject to $\|\vv x\| = 1$}
\end{align*}$$

Our first observation is that the quantity $\|A\vv x\|^2$ is maximized by the same $\vv x$ that maximizes $\|A\vv x\|$, but that $\|A\vv x\|^2$ is easier to work with (since it's a quadratic form instead of the square root of a quadratic form). Specifically, note that
$$\|A\vv x\|^2 = \langle A \vv x, A\vv x \rangle = (A\vv x)^T(A\vv x) = \vv x^TA^TA\vv x = \vv x^T(A^TA)\vv x.$$

So our task is now to find a unit vector $\|\vv x\|=1$ that maximizes the quadratic form $\vv x^T(A^TA)\vv x$ defined by the symmetric (positive semidefinite) matrix $A^TA$: we know how to do this. By our theorem characterizing eigenvalues of symmetric matrices from an optimization perspective, we know the maximum value is the largest eigenvalue $\lambda_1$ of the matrix $A^TA$, and is attained at the unit eigenvector $\vv v_1$ of $A^TA$ corresponding to $\lambda_1$.
For the matrix in this example:
$$A^TA = \begin{bmatrix}
4 & 8 \\
11 & 7 \\
14 & -2
\end{bmatrix}
\begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix} =
\begin{bmatrix}
80 & 100 & 40 \\
100 & 170 & 140 \\
40 & 140 & 200
\end{bmatrix}$$

and the eigenvalue/eigenvector pairs are:
$$\begin{align*}
\lambda_1 = 360, \quad \vv v_1 &= \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_2 = 90, \quad \vv v_2 &= \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_3 = 0, \quad \vv v_3 &= \begin{bmatrix} \frac{2}{3} \\ -\frac{2}{3} \\ \frac{1}{3} \end{bmatrix}.
\end{align*}$$

The maximum value of $\vv x^T(A^TA)\vv x = \|A\vv x\|^2$ is thus $\lambda_1 = 360$, and is attained when $\vv x = \vv v_1$. The vector $A\vv v_1$ is a point on the ellipse in Figure 1 farthest from the origin, namely
$$A\vv v_1 = \begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix}
\begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} =
\begin{bmatrix}
18 \\
6
\end{bmatrix}.$$

For $\|\vv x\| = 1$, the maximum value of $\|A\vv x\|$ is $\|A\vv v_1\| = \sqrt{360} = 6\sqrt{10}$.
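As a quick sanity check, we can verify these computations numerically with NumPy. The snippet below is a minimal sketch (the variable names are our own): it forms $A^TA$, computes its eigendecomposition, and confirms that the unit eigenvector for the largest eigenvalue achieves the maximum stretch $\sqrt{360} = 6\sqrt{10}$.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])

# Eigendecomposition of the symmetric matrix A^T A
# (eigh returns eigenvalues in increasing order).
lams, V = np.linalg.eigh(A.T @ A)
lam1 = lams[-1]   # largest eigenvalue
v1 = V[:, -1]     # corresponding unit eigenvector

print(lam1)                       # ~360
print(A @ v1)                     # ~[18, 6] (possibly negated: eigenvectors are only unique up to sign)
print(np.linalg.norm(A @ v1))     # ~18.97 = 6*sqrt(10)
```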
This example suggests that the effect of a matrix $A$ on the unit sphere in $\mathbb{R}^3$ is related to the quadratic form $\vv x^T A^TA \vv x$. What we'll see next is that the entire geometric behavior of the map $\vv x \mapsto A \vv x$ is captured by this quadratic form.
4 The Singular Values of an $m \times n$ Matrix

Consider an $m \times n$ real matrix $A \in \mathbb{R}^{m \times n}$. Then $A^TA$ is an $n \times n$ symmetric matrix, and can be orthogonally diagonalized via the spectral factorization, i.e., we can write $A^\top A = Q \Lambda Q^\top$, where $Q$ is an orthogonal matrix composed of orthonormal eigenvectors of $A^\top A$, and $\Lambda$ is a diagonal matrix containing the eigenvalues of $A^\top A$.
We can write $Q = \bm \vv v_1 & \cdots & \vv v_n \em$, where the $\vv v_i$ are unit eigenvectors of $A^{\top} A$, and $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ is the eigenvalue corresponding to $\vv v_i$. Then, for $i = 1, \dots, n$:
$$\begin{align*}
\|A \vv v_i\|^2 &= (A \vv v_i)^\top(A \vv v_i) = \vv v_i^\top(A^\top A \vv v_i) \\
&= \vv v_i^\top(\lambda_i \vv v_i) \\
&= \lambda_i \vv v_i^\top \vv v_i = \lambda_i \|\vv v_i\|^2 \\
&= \lambda_i \quad\text{(since $\vv v_i$ has unit norm)}.
\end{align*}$$

This tells us that each eigenvalue $\lambda_i$ of $A^\top A$ can be written as the squared norm of $A \vv v_i$. In particular, this tells us that $A^\top A$ has nonnegative eigenvalues, i.e., is positive semidefinite. We will use these eigenvalues to define the singular values of $A$:
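The identity $\lambda_i = \|A \vv v_i\|^2$ is easy to check numerically. The sketch below is our own illustration, using a randomly generated matrix: it confirms that every eigenvalue of $A^\top A$ is nonnegative and equals the squared norm of $A$ applied to the corresponding unit eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))      # any m x n matrix works here

lams, V = np.linalg.eigh(A.T @ A)    # eigenpairs of the symmetric matrix A^T A
for lam, v in zip(lams, V.T):        # columns of V are unit eigenvectors
    # the two printed numbers agree, and lam >= 0 (up to roundoff)
    print(lam, np.linalg.norm(A @ v)**2)
```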
Definition 1 (The Singular Values of a Matrix)
Given a matrix $A \in \mathbb{R}^{m\times n}$, let $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ be the eigenvalues of $A^\top A$ in nonincreasing order, counting multiplicities.
The singular values of $A$ are the positive square roots of the nonzero eigenvalues $\lambda_i > 0$ of $A^\top A$, and are denoted $\sigma_i$.
In other words, if $A^\top A$ has rank $r$ (or equivalently, $A$ has rank $r$), then $A^\top A$ has $r$ nonzero eigenvalues $\lambda_1 \geq \dots \geq \lambda_r > 0$, counting multiplicities. Then, $A$ has $r$ singular values, defined as $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \dots, r$.
Some texts also include the square roots of the zero eigenvalues $\lambda_{r+1}, \ldots, \lambda_n$ of $A^TA$ as (zero) singular values of $A$. This is simply a different convention and is mathematically equivalent. However, we find our definition to be more natural for our purposes.
Whereas eigenvalues are only defined for square matrices, singular values are defined for any matrix. Soon, we will see how the singular values of a matrix can be used to compute a useful factorization of any matrix, known as the singular value decomposition.
This definition of singular values allows us to give an alternative description of $\text{rank}(A)$: the rank of $A$ is the number of its singular values.
In certain cases, the rank of $A$ may be very sensitive to small changes in the entries of $A$. The obvious approach of counting the number of pivot columns in $A$ does not work well if $A$ is row reduced by a computer, as roundoff errors often create a row echelon form with full rank. In practice, the most reliable way of computing the rank of a large matrix $A$ is to count the number of singular values larger than a small threshold $\varepsilon$ (typically on the order of $10^{-12}$, but this can vary depending on the application). In this case, singular values smaller than $\varepsilon$ are treated as zeros for all practical purposes, and the effective rank of $A$ is computed by counting the remaining nonzero singular values.
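Here is a short sketch of this idea in NumPy (the threshold value and the function name `effective_rank` are our own choices, not a library standard): we count the singular values of a nearly rank-deficient matrix that exceed a small tolerance.

```python
import numpy as np

def effective_rank(A, tol=1e-12):
    """Count the singular values of A larger than tol."""
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
    return int(np.sum(sigma > tol))

# A 3x3 matrix whose third row is (almost) the sum of the first two,
# so its "true" rank is 2 even though roundoff may suggest otherwise.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0 + 1e-15]])

print(effective_rank(A))    # 2
```

NumPy's built-in `np.linalg.matrix_rank` implements the same idea, with a default tolerance scaled by the largest singular value and the matrix dimensions.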
5 A Variational Definition of the Singular Values

The above definition was more of a "direct" definition of singular values. In the next example, we will see an analog to the variational characterization of eigenvalues.
Example 2 (An analog to the variational characterization of eigenvalues)
Using the same $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ as in the previous example,
we have $\sigma_1 = \sqrt{360} = 6\sqrt{10}$ and $\sigma_2 = \sqrt{90} = 3\sqrt{10}$. In this case, $A$ only has two singular values as $\lambda_3 = 0$. For this example, $r = 2$, and $\lambda_1 = 360 > \lambda_2 = 90 > \lambda_3 = 0$.
From the previous example, the first singular value of $A$ is the maximum of $\|A\vv x\|$ over all $\|\vv x\| = 1$, attained at $\vv v_1$. Our optimization based characterization of eigenvalues of symmetric matrices tells us that the second singular value of $A$ is the maximum of $\|A\vv x\|$ over all unit vectors orthogonal to $\vv v_1$: this is attained by $\vv v_2$, the second eigenvector of $A^TA$. For $\vv v_2$ from the previous example:
$$A \vv v_2 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \end{bmatrix}$$

This point is on the minor axis of the ellipse in Figure 1, just as $A \vv v_1$ is on the major axis (see Figure 2 below). The two singular values of $A$ are the lengths of the major and minor semiaxes of the ellipse.
In the above example, the fact that $A\vv v_1$ and $A\vv v_2$ are orthogonal was no accident. In the next section, we'll explore this in more detail.
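Before doing so, here is a quick numerical check (a minimal sketch of our own) that $A\vv v_1$ and $A\vv v_2$ are indeed orthogonal, and that their lengths are exactly the two singular values.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])
v1 = np.array([1, 2, 2]) / 3
v2 = np.array([-2, -1, 2]) / 3

print(np.dot(A @ v1, A @ v2))    # 0: the images are orthogonal
print(np.linalg.norm(A @ v1))    # 6*sqrt(10) ≈ 18.97
print(np.linalg.norm(A @ v2))    # 3*sqrt(10) ≈ 9.49
```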
6 Computing the SVD

The decomposition of $A$ involves an $r \times r$ diagonal matrix $\Sigma$ of the form
$$\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r).$$

We note that because $r = \text{dim Col}(A) = \text{dim Row}(A)$ by the Fundamental Theorem of Linear Algebra, we must have that $r \leq \min\{m,n\}$ if $A \in \mathbb{R}^{m \times n}$.
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix of rank $r > 0$. Then $A$ can be factored as
$$A = U \Sigma V^T,$$

where $U \in \mathbb{R}^{m \times r}$ has orthonormal columns, so $U^TU = I_r$, $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)$ is a diagonal matrix with the singular values $\sigma_i$ of $A$ along the diagonal, and $V \in \mathbb{R}^{n \times r}$ has orthonormal columns, so $V^TV = I_r$.
Such a factorization of $A$ is called its singular value decomposition, and the columns of $U$ are called the left singular vectors of $A$, while the columns of $V$ are called the right singular vectors of $A$.
Let $\lambda_i$ and $\vv v_i$ be the eigenvalues/vectors of $A^TA$ as described previously. For $i \neq j$ we have $(A\vv v_i)^T(A\vv v_j) = \vv v_i^T A^TA \vv v_j = \lambda_j \vv v_i^T \vv v_j = 0$, so $A\vv v_1, \ldots, A\vv v_r$ is an orthogonal basis for $\text{col}(A)$. Normalize each $A\vv v_i$ to form an orthonormal basis for $\text{col}(A)$:
u i : = 1 ∥ A v i ∥ A v i = 1 σ i A v i \vv u_i := \frac{1}{\|A\vv v_i\|} A\vv v_i = \frac{1}{\sigma_i} A\vv v_i u i := ∥ A v i ∥ 1 A v i = σ i 1 A v i and hence A v i = σ i u i A\vv v_i = \sigma_i \vv u_i A v i = σ i u i for i = 1 , … , r i=1,\ldots,r i = 1 , … , r . Define the matrices
$$U = \bm \vv u_1 & \cdots & \vv u_r \em \in \mathbb{R}^{m \times r} \quad \text{and} \quad V = \bm \vv v_1 & \cdots & \vv v_r \em \in \mathbb{R}^{n \times r}$$

By construction, the columns of $U$ are orthonormal: $U^TU = I_r$, and similarly for the columns of $V$: $V^TV = I_r$.
Let’s define the following “full” matrices:
$$\hat{U} = \bm U & U^\perp \em \in \mathbb{R}^{m \times m} \quad \text{and} \quad \hat{V} = \bm V & V^\perp \em \in \mathbb{R}^{n \times n}$$

Here, $V^\perp = \bm \vv v_{r+1} & \cdots & \vv v_n \em$ has orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv v_1, \ldots, \vv v_r\}$, so that the columns of $\hat{V}$ form an orthonormal basis of $\mathbb{R}^n$.
Similarly, let $U^\perp$ have orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv u_1, \ldots, \vv u_r\}$, so the columns of $\hat{U}$ form an orthonormal basis for $\mathbb{R}^m$.
Finally, define
$$\hat{\Sigma} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n},$$

where the zero blocks pad $\Sigma$ out to size $m \times n$: the top blocks have $r$ rows, the bottom blocks have $m-r$ rows, the left blocks have $r$ columns, and the right blocks have $n-r$ columns.
We first show that
$$A = \hat{U} \hat{\Sigma} \hat{V}^T, \quad \text{or equivalently (since $\hat{V}$ is orthogonal),} \quad A\hat{V} = \hat{U}\hat{\Sigma}.$$

First,
$$A\hat{V} = \bm A\vv v_1 & \cdots & A\vv v_r & A\vv v_{r+1} & \cdots & A\vv v_n \em = \bm \sigma_1 \vv u_1 & \cdots & \sigma_r\vv u_r & \vv 0 & \cdots & \vv 0 \em,$$

where the last $n-r$ columns are zero since $\|A\vv v_i\|^2 = \lambda_i = 0$ for $i > r$. Then, notice:
$$\hat{U} \hat{\Sigma} = \bm \vv u_1 & \cdots & \vv u_r & \vv u_{r+1} & \cdots & \vv u_m \em
\begin{bmatrix}
\sigma_1 & & 0 & 0 \cdots 0 \\
& \ddots & & \vdots \\
0 & & \sigma_r & 0 \cdots 0 \\
0 & \cdots & 0 & 0 \cdots 0 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0 \cdots 0
\end{bmatrix}
= \bm \sigma_1 \vv u_1 & \cdots & \sigma_r \vv u_r & \vv 0 & \cdots & \vv 0 \em.$$

So that $A\hat{V} = \hat{U}\hat{\Sigma}$, or equivalently, $A = \hat{U}\hat{\Sigma}\hat{V}^T$. But now, notice:
$$A = \hat{U} \hat{\Sigma} \hat{V}^T = \bm U & U^\perp \em \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ (V^\perp)^T \end{bmatrix} = U \Sigma V^T,$$

proving our result.
Some textbooks define the singular value decomposition of $A$ as $A = \hat{U} \hat{\Sigma} \hat{V}^T$; this is necessary when allowing for singular values equal to zero. When only considering nonzero singular values, as we do, $A = U\Sigma V^T$ is the appropriate definition. This is sometimes called the compact SVD of $A$, but we will just call it the SVD.
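The proof above is constructive, and we can follow the same recipe numerically. The sketch below is our own illustration (it estimates the rank by counting eigenvalues above a small tolerance): it builds the compact SVD of the example matrix from the eigendecomposition of $A^TA$ and checks that $A = U\Sigma V^T$.

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])

# Step 1: eigendecomposition of A^T A, sorted so eigenvalues are nonincreasing.
lams, V_full = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]
lams, V_full = lams[order], V_full[:, order]

# Step 2: keep only the eigenpairs with (numerically) nonzero eigenvalues.
r = int(np.sum(lams > 1e-12))
sigma = np.sqrt(lams[:r])      # singular values sigma_i = sqrt(lambda_i)
V = V_full[:, :r]              # right singular vectors v_1, ..., v_r

# Step 3: left singular vectors u_i = (1/sigma_i) A v_i.
U = (A @ V) / sigma

# Check the factorization A = U Sigma V^T.
print(sigma)                                     # [18.97..., 9.48...]
print(np.allclose(A, U @ np.diag(sigma) @ V.T))  # True
```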
6.1 Python Break!

In Python, we can find the SVD of a matrix using the numpy.linalg.svd function:
import numpy as np
A = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
U, S, Vt = np.linalg.svd(A)
# U: Left singular vectors
print("U matrix:")
print(U)
# S: Singular values (returned as a 1D array)
print("\nSingular values:")
print(S)
# Vt: Right singular vectors (transposed)
print("\nV^T matrix:")
print(Vt)
U matrix:
[[-0.21483724 0.88723069 0.40824829]
[-0.52058739 0.24964395 -0.81649658]
[-0.82633754 -0.38794278 0.40824829]]
Singular values:
[1.68481034e+01 1.06836951e+00 3.33475287e-16]
V^T matrix:
[[-0.47967118 -0.57236779 -0.66506441]
[-0.77669099 -0.07568647 0.62531805]
[-0.40824829 0.81649658 -0.40824829]]
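Note that np.linalg.svd returns the "full" factorization $\hat{U}, \hat{\Sigma}, \hat{V}^T$ (with the singular values as a 1D array that includes values which are numerically zero, like the $3.3 \times 10^{-16}$ above). To obtain the compact SVD as we have defined it, we can drop the singular values below a small tolerance and keep only the corresponding columns of $\hat{U}$ and rows of $\hat{V}^T$; a minimal sketch (the tolerance choice is our own):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

U_hat, S, Vt_hat = np.linalg.svd(A)

r = int(np.sum(S > 1e-12))     # effective rank: 2 for this matrix
U = U_hat[:, :r]               # left singular vectors (m x r)
Sigma = np.diag(S[:r])         # r x r diagonal matrix of singular values
Vt = Vt_hat[:r, :]             # right singular vectors, transposed (r x n)

print(np.allclose(A, U @ Sigma @ Vt))   # True: the compact SVD reconstructs A
```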