9.3 Optimization Principles for Eigenvalues of Symmetric Matrices
The extreme eigenvalues

1 Reading
Material related to this page, as well as additional exercises, can be found in ALA 8.5.
2 Learning Objectives
By the end of this page, you should know:
- the effect of the extreme eigenvalues on a vector
- the maximal and minimal directions in which a vector gets stretched
- the maximum and minimum values of a quadratic form and where they are achieved
- how to compute (general) eigenvalues and eigenvectors via the optimization of quadratic forms
3 Introduction
For symmetric matrices, we’ve seen that we can interpret eigenvalues as the stretching of a vector in the directions specified by the eigenvectors. This is most clearly visualized in terms of a unit ball being mapped to an ellipsoid, as we illustrated earlier.
We can use this observation to answer questions such as: what direction is stretched the most by a matrix? Or the least? Understanding these questions is essential in areas such as machine learning (which directions are most sensitive to measurement noise or estimation error), control theory (which directions are easiest/hardest to move my system in), and in dimensionality reduction (which directions “explain” most of my data).
These questions all have a flavor of optimization to them: we are looking for directions with the “most” or “least” effect. This motivates a study of eigenvalues of symmetric matrices from an optimization perspective.
4 Diagonal Matrix
We’ll start with the simple case of a real diagonal matrix $\Lambda = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. We assume that the diagonal entries, which are the eigenvalues of $\Lambda$ (why?), appear in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n,
$$

so that $\lambda_1$ is the largest eigenvalue and $\lambda_n$ is the smallest.
The effect of $\Lambda$ on a vector $x \in \mathbb{R}^n$ is to multiply its entries by the corresponding diagonal elements: $\Lambda x = (\lambda_1 x_1, \lambda_2 x_2, \ldots, \lambda_n x_n)^\top$. Clearly, the maximal stretch occurs in the $e_1$ direction, while the minimal (or least positive) stretch occurs in the $e_n$ direction.
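To make this concrete, here is a tiny NumPy sketch (the eigenvalues 3, 1, 0.5 are made up for illustration) showing the entrywise scaling and the extreme stretch directions:

```python
import numpy as np

# Illustrative diagonal matrix with eigenvalues in decreasing order (made-up values).
lam = np.array([3.0, 1.0, 0.5])
Lam = np.diag(lam)

x = np.array([2.0, -1.0, 4.0])
print(Lam @ x)                       # entrywise scaling: [ 6. -1.  2.]

e1 = np.array([1.0, 0.0, 0.0])       # direction of maximal stretch
e3 = np.array([0.0, 0.0, 1.0])       # direction of minimal stretch
print(np.linalg.norm(Lam @ e1))      # 3.0
print(np.linalg.norm(Lam @ e3))      # 0.5
```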
The key idea of the optimization principle for extremal (smallest or biggest) eigenvalues is the following geometric observation. Let’s look at the associated quadratic form

$$
q(x) = x^\top \Lambda x = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_n x_n^2.
$$
Suppose that we are asked to pick a vector on the unit sphere, i.e., an $x$ satisfying $\|x\|^2 = x_1^2 + \cdots + x_n^2 = 1$, that makes $q(x)$ as big or as small as possible. We can then measure how much/little $x$ has been stretched by looking at the ratio $\frac{q(x)}{\|x\|^2}$, which is simply $q(x)$ when $\|x\| = 1$.
So let’s first look at the maximal direction: this means we are looking for a unit vector $x$ that maximizes $q(x)$. Since $\lambda_1 \geq \lambda_i$ for every $i$, we have that

$$
q(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2 \leq \lambda_1 (x_1^2 + \cdots + x_n^2) = \lambda_1 \|x\|^2 = \lambda_1,
$$

and $q(e_1) = \lambda_1$. This means that

$$
\max \{ q(x) : \|x\| = 1 \} = \lambda_1, \quad \text{achieved at } x = e_1.
$$
We can use the same reasoning to find that

$$
\min \{ q(x) : \|x\| = 1 \} = \lambda_n, \quad \text{achieved at } x = e_n.
$$
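As a quick numerical sanity check (again with arbitrarily chosen eigenvalues), sampling many random unit vectors shows the quadratic form staying between $\lambda_n$ and $\lambda_1$, with the bounds attained exactly at $e_n$ and $e_1$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([3.0, 1.0, 0.5])      # lambda_1 >= lambda_2 >= lambda_3 (illustrative)
Lam = np.diag(lam)

# Draw many random unit vectors and evaluate the quadratic form q(x) = x^T Lam x.
X = rng.normal(size=(100_000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q_vals = np.einsum('ij,jk,ik->i', X, Lam, X)

print(q_vals.max(), q_vals.min())    # close to, but never beyond, 3.0 and 0.5
print(np.array([1.0, 0, 0]) @ Lam @ np.array([1.0, 0, 0]))  # exactly lambda_1 = 3.0
print(np.array([0, 0, 1.0]) @ Lam @ np.array([0, 0, 1.0]))  # exactly lambda_3 = 0.5
```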
5 Generic Symmetric Matrix
Now, can we make a similar statement for a generic symmetric matrix $A = A^\top \in \mathbb{R}^{n \times n}$? Perhaps not surprisingly, using the spectral factorization provides an affirmative answer.
In particular, let $A = Q \Lambda Q^\top$ be the spectral factorization of $A$. Then

$$
q(x) = x^\top A x = x^\top Q \Lambda Q^\top x = (Q^\top x)^\top \Lambda (Q^\top x) = y^\top \Lambda y, \quad \text{where } y = Q^\top x.
$$
According to our previous discussion, the maximum of $y^\top \Lambda y$ over all unit vectors $y$ is $\lambda_1$, which is the same as the largest eigenvalue of $A$. Moreover, since $Q^\top$ is an orthogonal matrix, it does not change the length of a vector when it acts on it:

$$
\|y\| = \|Q^\top x\| = \|x\|.
$$
So the maximum of $q(x) = x^\top A x$ over all $\|x\| = 1$ is again $\lambda_1$! Further, the vector achieving the maximum is $x = Q e_1 = q_1$, the corresponding (normalized) eigenvector of $A$. This is consistent with our prior geometric discussion: the direction of maximal stretch is the vector aligned with the largest semi-axis of the ellipsoid defined by $A$, as in this ellipse.
We can apply the same reasoning to compute $\min \{ x^\top A x : \|x\| = 1 \} = \lambda_n$, achieved at the eigenvector $q_n$. We summarize our discussion in the following theorem:

Theorem (optimization principle for extreme eigenvalues). Let $A = A^\top \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ with corresponding orthonormal eigenvectors $q_1, \ldots, q_n$. Then

$$
\lambda_1 = \max \{ x^\top A x : \|x\| = 1 \}, \qquad \lambda_n = \min \{ x^\top A x : \|x\| = 1 \},
$$

with the maximum and minimum achieved at $x = q_1$ and $x = q_n$, respectively.
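Here is a minimal sketch of the theorem in action for a randomly generated symmetric matrix (not one from the notes), using NumPy’s `eigh`, which returns eigenvalues in increasing order:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                    # a random symmetric matrix (for illustration only)

# np.linalg.eigh returns eigenvalues in *increasing* order, eigenvectors as columns.
evals, Q = np.linalg.eigh(A)
lam_min, lam_max = evals[0], evals[-1]
q_min, q_max = Q[:, 0], Q[:, -1]

print(q_max @ A @ q_max, lam_max)    # the maximum of x^T A x over the unit sphere
print(q_min @ A @ q_min, lam_min)    # the minimum of x^T A x over the unit sphere

# No random unit vector beats the extreme eigenvalues.
X = rng.normal(size=(50_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', X, A, X)
print(lam_min - 1e-12 <= vals.min(), vals.max() <= lam_max + 1e-12)   # True True
```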
Finally, we note that the above theorem can be generalized to compute general eigenvalues by first eliminating the directions of the larger/smaller eigenvalues. For example, we can compute the second largest eigenvalue of $A$ by solving

$$
\lambda_2 = \max \{ x^\top A x : \|x\| = 1, \; x^\top q_1 = 0 \}.
$$
The key constraint is $x^\top q_1 = 0$, which says we can only look for vectors $x$ that are orthogonal to $q_1$, the eigenvector associated with $\lambda_1$.
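A hedged numerical check of this claim, again with a random symmetric matrix: among unit vectors orthogonal to $q_1$, the quadratic form never exceeds $\lambda_2$, and $q_2$ attains it.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                     # random symmetric test matrix (illustrative)

evals, Q = np.linalg.eigh(A)          # increasing order
lam = evals[::-1]                     # decreasing: lam[0] >= lam[1] >= ...
q1, q2 = Q[:, -1], Q[:, -2]           # eigenvectors for the two largest eigenvalues

# Random unit vectors constrained to be orthogonal to q1.
X = rng.normal(size=(50_000, 4))
X -= np.outer(X @ q1, q1)             # project out the q1 component
X /= np.linalg.norm(X, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', X, A, X)

print(vals.max(), lam[1])             # constrained maximum is (approximately) lambda_2
print(q2 @ A @ q2, lam[1])            # ... and it is attained exactly at q2
```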
We can extend this logic (where we found the second largest eigenvalue by only optimizing over unit vectors which were orthogonal to $q_1$) to obtain a characterization of the $j$-th largest eigenvalue of a symmetric matrix $A$:

$$
\lambda_j = \max \{ x^\top A x : \|x\| = 1, \; x^\top q_i = 0 \text{ for } i = 1, \ldots, j-1 \}.
$$
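As a sketch of this characterization in action, the loop below recovers all eigenvalues of a random symmetric matrix by repeatedly solving the constrained maximization numerically. The helper names `project_and_normalize` and `neg_q`, the use of SciPy’s BFGS, and the random test matrix are all illustrative choices, not part of the notes:

```python
import numpy as np
from scipy.optimize import minimize   # assumes SciPy is available

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                     # a random symmetric matrix (illustrative)

def project_and_normalize(z, basis):
    """Remove the components of z along the vectors in `basis`, then normalize."""
    for v in basis:
        z = z - (v @ z) * v
    return z / np.linalg.norm(z)

found = []                            # eigenvectors recovered so far: q_1, q_2, ...
for j in range(4):
    # Maximize x^T A x over unit x orthogonal to q_1, ..., q_{j-1}: we minimize the
    # negative objective, enforcing the constraints by projecting and normalizing
    # inside the objective itself.
    def neg_q(z):
        x = project_and_normalize(z, found)
        return -(x @ A @ x)

    res = minimize(neg_q, rng.normal(size=4), method="BFGS")
    x = project_and_normalize(res.x, found)
    found.append(x)
    print(f"lambda_{j+1} approx {x @ A @ x:.6f}")

print("check (np.linalg.eigh, decreasing):", np.linalg.eigh(A)[0][::-1])
```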
Let’s see this theorem at work with an example.