
9.3 Optimization Principles for Eigenvalues of Symmetric Matrices

The extreme eigenvalues

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.5.

2 Learning Objectives

By the end of this page, you should know:

  • the effect of the extreme eigenvalues on a vector
  • the maximal and minimal directions in which a vector gets stretched
  • the maximum and minimum values of a quadratic form and where they are achieved
  • how to compute the (general) eigenvalues and eigenvectors via the optimization of quadratic forms

3 Introduction

For symmetric matrices, we’ve seen that we can interpret eigenvalues as stretching of a vector in the directions specified by the eigenvectors. This is most clearly visualized in terms of a unit ball being mapped to an ellipsoid, as we illustrated earlier.

We can use this observation to answer questions such as: what direction is stretched the most by a matrix? Or the least? Understanding these questions is essential in areas such as machine learning (which directions are most sensitive to measurement noise or estimation error), control theory (which directions are easiest/hardest to move my system in), and in dimensionality reduction (which directions “explain” most of my data).

These questions all have a flavor of optimization to them: we are looking for directions with the “most” or “least” effect. This motivates a study of eigenvalues of symmetric matrices from an optimization perspective.

4 Diagonal Matrix

We’ll start with the simple case of a real diagonal matrix $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$. We assume that the diagonal entries, which are the eigenvalues of $\Lambda$ (why?), appear in decreasing order:

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n,$$

so that $\lambda_1$ is the largest and $\lambda_n$ is the smallest.

The effect of $\Lambda$ on a vector $\vv y$ is to multiply its entries by the corresponding diagonal elements: $\Lambda \vv y = \bm \lambda_1 y_1 \\ \vdots \\ \lambda_n y_n \em$. Clearly, the maximal stretch occurs in the $\vv e_1$ direction, while the minimal (or least positive) stretch occurs in the $\vv e_n$ direction.
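As a quick numerical check of this stretching picture, here is a minimal NumPy sketch; the eigenvalues and the test vector below are arbitrary choices for illustration.

```python
import numpy as np

# Diagonal matrix with eigenvalues listed in decreasing order (arbitrary choice)
lam = np.array([3.0, 1.0, 0.5])
Lambda = np.diag(lam)

# Multiplying by Lambda scales the i-th entry of y by lambda_i
y = np.array([1.0, -2.0, 4.0])
print(Lambda @ y)   # [ 3. -2.  2.] -- the same as lam * y, computed entrywise

# The standard basis vectors are stretched by exactly their eigenvalues
e1 = np.array([1.0, 0.0, 0.0])   # direction of maximal stretch
e3 = np.array([0.0, 0.0, 1.0])   # direction of minimal stretch
print(np.linalg.norm(Lambda @ e1), np.linalg.norm(Lambda @ e3))   # 3.0 0.5
```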

The key idea of the optimization principle for extremal (smallest or biggest) eigenvalues is the following geometric observation. Let’s look at the associated quadratic form

$$q(\vv y) = \vv y^T \Lambda \vv y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$$

Suppose that we are asked to pick a vector $\vv y$ on the unit sphere, i.e., a $\vv y$ satisfying $\|\vv y\|^2 = 1$, that makes $q(\vv y)$ as big or small as possible. We can then measure how much/little $\vv y$ has been stretched by looking at the ratio $\frac{q(\vv y)}{\|\vv y\|^2} = q(\vv y) = \|\Lambda^{\frac{1}{2}} \vv y\|^2$.

So let’s first look at the maximal direction: this means we are looking for $\|\vv y\| = 1$ that maximizes $q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$. Since $\lambda_1 \geq \lambda_i$, we have that

$$q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1 (y_1^2 + \cdots + y_n^2) = \lambda_1,$$

where the last equality uses $y_1^2 + \cdots + y_n^2 = \|\vv y\|^2 = 1$. Moreover, this upper bound is attained, since $q(\vv e_1) = \lambda_1$. This means that

$$\lambda_1 = \max \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Max})$$

We can use the same reasoning to find that

$$\lambda_n = \min \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Min})$$
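We can sanity-check (Max) and (Min) numerically by sampling many random unit vectors and evaluating the quadratic form. This is only a sketch, again with arbitrarily chosen eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([3.0, 1.0, 0.5])            # lambda_1 >= lambda_2 >= lambda_3 (arbitrary)

# Sample random points on the unit sphere in R^3
Y = rng.standard_normal((100_000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# q(y) = lambda_1 y_1^2 + ... + lambda_n y_n^2, evaluated row by row
q_vals = (Y**2) @ lam

# The sampled values stay inside [lambda_n, lambda_1] and approach the endpoints,
# which are attained exactly at e_1 and e_n
print(q_vals.min(), q_vals.max())          # close to 0.5 and 3.0
```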

5 Generic Symmetric Matrix

Now, can we make a similar statement for a generic symmetric matrix $K = K^T \in \mathbb{R}^{n \times n}$? Perhaps not surprisingly, the spectral factorization provides an affirmative answer.

In particular, let $K = Q \Lambda Q^T$ be the spectral factorization of $K$. Then

$$q(\vv x) = \vv x^T K \vv x = \vv x^T Q \Lambda Q^T \vv x = \vv y^T \Lambda \vv y, \quad \text{where } \vv y = Q^T \vv x.$$

According to our previous discussion, the maximum of $\vv y^T \Lambda \vv y$ over all unit vectors $\|\vv y\| = 1$ is $\lambda_1$, which is the same as the largest eigenvalue of $K$. Moreover, since $Q$ is an orthogonal matrix, it does not change the length of a vector when it acts on it:

$$1 = \|\vv y\|^2 = \vv y^T \vv y = \vv x^T Q Q^T \vv x = \vv x^T \vv x = \|\vv x\|^2,$$

so that the maximum of $q(\vv x)$ over all $\|\vv x\| = 1$ is again $\lambda_1$! Further, the vector $\vv x$ achieving the maximum is $Q \vv e_1 = \vv u_1$, the corresponding (normalized) eigenvector of $K$. This is consistent with our prior geometric discussion: the direction of the maximal stretch is the vector aligned with the largest semi-axis of the ellipsoid defined by $q(\vv x) = \vv x^T K \vv x = c$, as in this ellipse.
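The same check works for a generic symmetric $K$; here is a short sketch using `np.linalg.eigh` for the spectral factorization (the random $K$ below is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an arbitrary symmetric matrix K = K^T
A = rng.standard_normal((4, 4))
K = (A + A.T) / 2

# eigh returns eigenvalues in *increasing* order and orthonormal eigenvectors (columns of Q)
evals, Q = np.linalg.eigh(K)
lam_max, lam_min = evals[-1], evals[0]
u1 = Q[:, -1]                               # unit eigenvector for the largest eigenvalue

# Evaluate q(x) = x^T K x over many random unit vectors x
X = rng.standard_normal((100_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q_vals = np.einsum('ij,jk,ik->i', X, K, X)

print(q_vals.max() <= lam_max + 1e-9)       # True: no unit vector beats lambda_1
print(np.isclose(u1 @ K @ u1, lam_max))     # True: the maximum is attained at u_1
```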

We can apply the same reasoning to compute $\lambda_n$. We summarize our discussion in the following theorem:

**Theorem** (Extreme eigenvalues of a symmetric matrix). Let $K = K^T \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ with corresponding orthonormal eigenvectors $\vv u_1, \ldots, \vv u_n$. Then

$$\lambda_1 = \max \{ \vv x^T K \vv x \mid \|\vv x\| = 1 \} \quad \text{and} \quad \lambda_n = \min \{ \vv x^T K \vv x \mid \|\vv x\| = 1 \},$$

with the maximum attained at $\vv x = \vv u_1$ and the minimum attained at $\vv x = \vv u_n$.

Finally, we note that the above theorem can be generalized to compute the remaining eigenvalues by first eliminating the directions associated with the larger/smaller eigenvalues. For example, we can compute the second largest eigenvalue of $K$ by solving

$$\lambda_2 = \max\{\vv x^T K \vv x \mid \|\vv x\| = 1, \ \vv x^T \vv u_1 = 0\}.$$

The key constraint is $\vv x^T \vv u_1 = 0$, which says we can only look for vectors that are orthogonal to $\vv u_1$, the eigenvector associated with $\lambda_1$.

We can extend this logic (where we found the second largest eigenvalue by only optimizing over unit vectors orthogonal to $\vv u_1$) to obtain a characterization of the $i^{\text{th}}$ largest eigenvalue of a symmetric matrix $K$:

$$\lambda_i = \max\{\vv x^T K \vv x \mid \|\vv x\| = 1, \ \vv x^T \vv u_1 = \cdots = \vv x^T \vv u_{i-1} = 0\}.$$
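Numerically, one way to respect the orthogonality constraints is to project the candidate vectors onto the orthogonal complement of the eigenvectors already found. Here is a sketch for $\lambda_2$, again with an arbitrary random $K$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary symmetric matrix
A = rng.standard_normal((4, 4))
K = (A + A.T) / 2

evals, Q = np.linalg.eigh(K)                # eigenvalues in increasing order
lam2 = evals[-2]                            # second largest eigenvalue
u1 = Q[:, -1]                               # eigenvector for the largest eigenvalue

# Project random vectors onto the subspace orthogonal to u_1, then normalize,
# so every candidate x satisfies ||x|| = 1 and x^T u_1 = 0
X = rng.standard_normal((100_000, 4))
X -= np.outer(X @ u1, u1)
X /= np.linalg.norm(X, axis=1, keepdims=True)

q_vals = np.einsum('ij,jk,ik->i', X, K, X)
print(q_vals.max(), lam2)                   # the constrained maximum approaches lambda_2
```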

Let’s see this theorem at work with an example.
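For concreteness, here is a small worked instance; the $2 \times 2$ matrix below is an arbitrary choice for illustration. Take

$$K = \bm 3 & 1 \\ 1 & 3 \em, \qquad q(\vv x) = \vv x^T K \vv x = 3x_1^2 + 2x_1 x_2 + 3x_2^2.$$

The eigenvalues of $K$ are $\lambda_1 = 4$ and $\lambda_2 = 2$, with unit eigenvectors $\vv u_1 = \frac{1}{\sqrt{2}} \bm 1 \\ 1 \em$ and $\vv u_2 = \frac{1}{\sqrt{2}} \bm 1 \\ -1 \em$. Parameterizing the unit circle by $\vv x = (\cos\theta, \sin\theta)$ gives

$$q(\vv x) = 3(\cos^2\theta + \sin^2\theta) + 2\cos\theta\sin\theta = 3 + \sin 2\theta,$$

which is maximized at $\theta = \pi/4$, where $\vv x = \vv u_1$ and $q(\vv u_1) = 4 = \lambda_1$, and minimized at $\theta = -\pi/4$, where $\vv x = \vv u_2$ and $q(\vv u_2) = 2 = \lambda_2$, exactly as the theorem predicts.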
