
9.3 Optimization Principles for Eigenvalues of Symmetric Matrices

The extreme eigenvalues

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.5.

2 Learning Objectives

By the end of this page, you should know:

  • the effect of the extreme eigenvalues on a vector
  • the maximal and minimal directions in which a vector gets stretched
  • the maximum and minimum values of a quadratic form and where they are achieved
  • how to compute the (general) eigenvalues and eigenvectors via the optimization of quadratic forms

3 Introduction

For symmetric matrices, we’ve seen that we can interpret eigenvalues as stretching of a vector in the directions specified by the eigenvectors. This is most clearly visualized in terms of a unit ball being mapped to an ellipsoid, as we illustrated earlier.

We can use this observation to answer questions such as: what direction is stretched the most by a matrix? Or the least? Understanding these questions is essential in areas such as machine learning (which directions are most sensitive to measurement noise or estimation error), control theory (which directions are easiest/hardest to move my system in), and in dimensionality reduction (which directions “explain” most of my data).

These questions all have a flavor of optimization to them: we are looking for directions with the “most” or “least” effect. This motivates a study of eigenvalues of symmetric matrices from an optimization perspective.

4 Diagonal Matrix

We’ll start with the simple case of a real diagonal matrix $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$. We assume that the diagonal entries, which are the eigenvalues of $\Lambda$ (why?), appear in decreasing order:

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n,$$

so that $\lambda_1$ is the largest and $\lambda_n$ is the smallest.

The effect of $\Lambda$ on a vector $\vv y$ is to multiply its entries by the corresponding diagonal elements: $\Lambda \vv y = \bm \lambda_1 y_1 \\ \vdots \\ \lambda_n y_n \em$. Clearly, the maximal stretch occurs in the $\vv e_1$ direction, while the minimal (or least positive) stretch occurs in the $\vv e_n$ direction.
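As a quick numerical check of this stretching picture, here is a minimal NumPy sketch; the eigenvalues and the test vector below are arbitrary choices for illustration.

```python
import numpy as np

# Diagonal matrix with eigenvalues listed in decreasing order (arbitrary choice)
lam = np.array([3.0, 1.0, 0.5])
Lambda = np.diag(lam)

# Multiplying by Lambda scales the i-th entry of y by lambda_i
y = np.array([1.0, -2.0, 4.0])
print(Lambda @ y)   # [ 3. -2.  2.] -- the same as lam * y, computed entrywise

# The standard basis vectors are stretched by exactly their eigenvalues
e1 = np.array([1.0, 0.0, 0.0])   # direction of maximal stretch
e3 = np.array([0.0, 0.0, 1.0])   # direction of minimal stretch
print(np.linalg.norm(Lambda @ e1), np.linalg.norm(Lambda @ e3))   # 3.0 0.5
```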

The key idea of the optimization principle for extremal (smallest or biggest) eigenvalues is the following geometric observation. Let’s look at the associated quadratic form

$$q(\vv y) = \vv y^T \Lambda \vv y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$$

Suppose that we are asked to pick a vector $\vv y$ on the unit sphere, i.e., a $\vv y$ satisfying $\|\vv y\|^2 = 1$, that makes $q(\vv y)$ as big or small as possible. We can then measure how much/little $\vv y$ has been stretched by looking at the ratio $\frac{q(\vv y)}{\|\vv y\|^2} = q(\vv y) = \|\Lambda^{\frac{1}{2}} \vv y\|^2$.

So let’s first look at the maximal direction: this means we are looking for $\|\vv y\| = 1$ that maximizes $q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$. Since $\lambda_1 \geq \lambda_i$, we have that

$$q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1 (y_1^2 + \cdots + y_n^2) = \lambda_1,$$

where the last equality uses $y_1^2 + \cdots + y_n^2 = \|\vv y\|^2 = 1$. Moreover, this upper bound is attained, since $q(\vv e_1) = \lambda_1$. This means that

$$\lambda_1 = \max \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Max})$$

We can use the same reasoning to find that

$$\lambda_n = \min \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Min})$$
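We can sanity-check (Max) and (Min) numerically by sampling many random unit vectors and evaluating the quadratic form. This is only a sketch, again with arbitrarily chosen eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([3.0, 1.0, 0.5])            # lambda_1 >= lambda_2 >= lambda_3 (arbitrary)

# Sample random points on the unit sphere in R^3
Y = rng.standard_normal((100_000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# q(y) = lambda_1 y_1^2 + ... + lambda_n y_n^2, evaluated row by row
q_vals = (Y**2) @ lam

# The sampled values stay inside [lambda_n, lambda_1] and approach the endpoints,
# which are attained exactly at e_1 and e_n
print(q_vals.min(), q_vals.max())          # close to 0.5 and 3.0
```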

5 Generic Symmetric Matrix

Now, can we make a similar statement for a generic symmetric matrix $K = K^T \in \mathbb{R}^{n \times n}$? Perhaps not surprisingly, the spectral factorization provides an affirmative answer.

In particular, let $K = Q \Lambda Q^T$ be the spectral factorization of $K$. Then

$$q(\vv x) = \vv x^T K \vv x = \vv x^T Q \Lambda Q^T \vv x = \vv y^T \Lambda \vv y, \quad \text{where } \vv y = Q^T \vv x.$$

According to our previous discussion, the maximum of $\vv y^T \Lambda \vv y$ over all unit vectors $\|\vv y\| = 1$ is $\lambda_1$, which is the same as the largest eigenvalue of $K$. Moreover, since $Q$ is an orthogonal matrix, it does not change the length of a vector when it acts on it:

$$1 = \|\vv y\|^2 = \vv y^T \vv y = \vv x^T Q Q^T \vv x = \vv x^T \vv x = \|\vv x\|^2,$$

so that the maximum of $q(\vv x)$ over all $\|\vv x\| = 1$ is again $\lambda_1$! Further, the vector $\vv x$ achieving the maximum is $Q \vv e_1 = \vv u_1$, the corresponding (normalized) eigenvector of $K$. This is consistent with our prior geometric discussion: the direction of the maximal stretch is the vector aligned with the largest semi-axis of the ellipsoid defined by $q(\vv x) = \vv x^T K \vv x = c$, as in this ellipse.
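The same check works for a generic symmetric $K$; here is a short sketch using `np.linalg.eigh` for the spectral factorization (the random $K$ below is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an arbitrary symmetric matrix K = K^T
A = rng.standard_normal((4, 4))
K = (A + A.T) / 2

# eigh returns eigenvalues in *increasing* order and orthonormal eigenvectors (columns of Q)
evals, Q = np.linalg.eigh(K)
lam_max, lam_min = evals[-1], evals[0]
u1 = Q[:, -1]                               # unit eigenvector for the largest eigenvalue

# Evaluate q(x) = x^T K x over many random unit vectors x
X = rng.standard_normal((100_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q_vals = np.einsum('ij,jk,ik->i', X, K, X)

print(q_vals.max() <= lam_max + 1e-9)       # True: no unit vector beats lambda_1
print(np.isclose(u1 @ K @ u1, lam_max))     # True: the maximum is attained at u_1
```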

We can apply the same reasoning to compute $\lambda_n$. We summarize our discussion in the following theorem:

**Theorem** (Extreme eigenvalues of a symmetric matrix). Let $K = K^T \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ with corresponding orthonormal eigenvectors $\vv u_1, \ldots, \vv u_n$. Then

$$\lambda_1 = \max \{ \vv x^T K \vv x \mid \|\vv x\| = 1 \} \quad \text{and} \quad \lambda_n = \min \{ \vv x^T K \vv x \mid \|\vv x\| = 1 \},$$

with the maximum attained at $\vv x = \vv u_1$ and the minimum attained at $\vv x = \vv u_n$.

Finally, we note that the above theorem can be generalized to compute the remaining eigenvalues by first eliminating the directions associated with the larger/smaller eigenvalues. For example, we can compute the second largest eigenvalue of $K$ by solving

$$\lambda_2 = \max\{\vv x^T K \vv x \mid \|\vv x\| = 1, \ \vv x^T \vv u_1 = 0\}.$$

The key constraint is $\vv x^T \vv u_1 = 0$, which says we can only look for vectors that are orthogonal to $\vv u_1$, the eigenvector associated with $\lambda_1$.

We can extend this logic (where we found the second largest eigenvalue by only optimizing over unit vectors orthogonal to $\vv u_1$) to obtain a characterization of the $i^{\text{th}}$ largest eigenvalue of a symmetric matrix $K$:

$$\lambda_i = \max\{\vv x^T K \vv x \mid \|\vv x\| = 1, \ \vv x^T \vv u_1 = \cdots = \vv x^T \vv u_{i-1} = 0\}.$$
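Numerically, one way to respect the orthogonality constraints is to project the candidate vectors onto the orthogonal complement of the eigenvectors already found. Here is a sketch for $\lambda_2$, again with an arbitrary random $K$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary symmetric matrix
A = rng.standard_normal((4, 4))
K = (A + A.T) / 2

evals, Q = np.linalg.eigh(K)                # eigenvalues in increasing order
lam2 = evals[-2]                            # second largest eigenvalue
u1 = Q[:, -1]                               # eigenvector for the largest eigenvalue

# Project random vectors onto the subspace orthogonal to u_1, then normalize,
# so every candidate x satisfies ||x|| = 1 and x^T u_1 = 0
X = rng.standard_normal((100_000, 4))
X -= np.outer(X @ u1, u1)
X /= np.linalg.norm(X, axis=1, keepdims=True)

q_vals = np.einsum('ij,jk,ik->i', X, K, X)
print(q_vals.max(), lam2)                   # the constrained maximum approaches lambda_2
```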

Let’s see this theorem at work with an example.
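For concreteness, here is a small worked instance; the $2 \times 2$ matrix below is an arbitrary choice for illustration. Take

$$K = \bm 3 & 1 \\ 1 & 3 \em, \qquad q(\vv x) = \vv x^T K \vv x = 3x_1^2 + 2x_1 x_2 + 3x_2^2.$$

The eigenvalues of $K$ are $\lambda_1 = 4$ and $\lambda_2 = 2$, with unit eigenvectors $\vv u_1 = \frac{1}{\sqrt{2}} \bm 1 \\ 1 \em$ and $\vv u_2 = \frac{1}{\sqrt{2}} \bm 1 \\ -1 \em$. Parameterizing the unit circle by $\vv x = (\cos\theta, \sin\theta)$ gives

$$q(\vv x) = 3(\cos^2\theta + \sin^2\theta) + 2\cos\theta\sin\theta = 3 + \sin 2\theta,$$

which is maximized at $\theta = \pi/4$, where $\vv x = \vv u_1$ and $q(\vv u_1) = 4 = \lambda_1$, and minimized at $\theta = -\pi/4$, where $\vv x = \vv u_2$ and $q(\vv u_2) = 2 = \lambda_2$, exactly as the theorem predicts.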
