
6.7 Similarity, Eigenbases, and Diagonalization

Dept. of Electrical and Systems Engineering
University of Pennsylvania


1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.3.

2 Learning Objectives

By the end of this page, you should know:

  • how to define eigenbases,
  • how to define similar matrices,
  • how to define the geometric and algebraic multiplicity of an eigenvalue,
  • when and how a square matrix $A$ can be diagonalized as $A = PDP^{-1}$, where $D$ is diagonal.

3 Eigenbases

Most of the vector space bases that are useful in applications are assembled from the eigenvectors of a particular matrix. In this section, we focus on matrices with a “complete” set of eigenvectors and show how these form a basis for $\mathbb{R}^n$ (or, in the complex case, $\mathbb{C}^n$); in these cases, such sets of eigenvectors are known as eigenbases:

Such eigenbases allow us to rewrite the linear transformation determined by a matrix in a simple diagonal form; matrices that allow us to do this are called diagonalizable, a definition which we will formalize shortly. We focus on matrices with real eigenvalues and eigenvectors to start, and will return to matrices with complex eigenvalues/eigenvectors in a few pages.

Our starting point is the following theorem, which we will state as a fact. It generalizes the pattern we saw in an earlier example, where eigenvectors corresponding to distinct eigenvalues were linearly independent:

However, we also saw an example where a $3\times 3$ matrix only had two distinct eigenvalues, but still had three linearly independent eigenvectors:

4 Algebraic and Geometric Multiplicities

Notice that in this last example $\dim V_{\lambda_1} = 2$ (why?) for the double eigenvalue $\lambda_1 = 2$ (i.e., the eigenspace corresponding to $\lambda_1$ had dimension 2), and similarly, $\dim V_{\lambda_2} = 1$ for the simple eigenvalue $\lambda_2 = 4$, so that there is a “real” eigenvector for each time an eigenvalue appears as a root of the characteristic polynomial.

These notions can be captured in the idea of algebraic and geometric multiplicity:

Our observation is that if the algebraic and geometric multiplicities match for each eigenvalue, then we can form a basis for $\mathbb{R}^n$.
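
As a quick numerical sanity check of this observation, here is a short numpy sketch (it uses the same $3\times 3$ matrix that appears in the Python break at the end of this page, which has a double eigenvalue 2 and a simple eigenvalue 4); the geometric multiplicity of each eigenvalue is computed as $n - \text{rank}(A - \lambda I)$:

import numpy as np

# Same matrix as in the Python break below: eigenvalues 2 (double) and 4 (simple)
A = np.array([
    [2, -1, -1],
    [0, 3, 1],
    [0, 1, 3]
])
n = A.shape[0]

evals = np.linalg.eigvals(A)
for lam in np.unique(np.round(evals, 8)):
    alg = int(np.sum(np.isclose(evals, lam)))             # algebraic multiplicity
    geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))   # geometric multiplicity = dim V_lambda
    print(f"lambda = {lam:g}: algebraic = {alg}, geometric = {geo}")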

For the next little bit, we will assume that our matrix AA satisfies the above theorem. What does this buy us? To answer this question, we need to introduce the idea of similarity transformations.

5 Similar Matrices

Given a vector $\vv x \in \mathbb{R}^n$ with coordinates $x_i$ with respect to the standard basis, i.e., $\vv x = x_1 \vv{e_1} + x_2 \vv{e_2} + \dots + x_n \vv{e_n}$, we can find the coordinates $y_1, \dots, y_n$ of $\vv x$ with respect to a new basis $\vv{b_1}, \dots, \vv{b_n}$ by solving the following linear system:

\begin{align*}
y_1 \vv{b_1} + y_2 \vv{b_2} + \dots + y_n \vv{b_n} = \vv x \iff B \vv y = \vv x
\end{align*}

where $B = \bm \vv{b_1} & \vv{b_2} & \dots & \vv{b_n} \em$. Since the $\vv{b_i}$ form a basis of $\mathbb{R}^n$, they are linearly independent, which means that $B$ is nonsingular.
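
For instance, here is a minimal numpy sketch of this change of coordinates (the basis vectors and the vector $\vv x$ below are arbitrary illustrative choices): the $B$-coordinates $\vv y$ are obtained by solving the linear system $B\vv y = \vv x$.

import numpy as np

# Columns of B form an (arbitrarily chosen) basis of R^2
B = np.array([
    [1., 1.],
    [0., 1.]
])
x = np.array([3., 2.])      # coordinates of x in the standard basis

y = np.linalg.solve(B, x)   # B-coordinates of x, i.e., the solution of By = x
print(y)                    # [1. 2.]
print(B @ y)                # reconstructs x in standard coordinates: [3. 2.]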

Now, suppose I have a matrix $A \in \mathbb{R}^{n\times n}$, which I use to define the linear transformation $f : \mathbb{R}^n \to \mathbb{R}^n$ given by $f(\vv x) = A \vv x$. Here, $f$’s inputs $\vv x \in \mathbb{R}^n$ and outputs $f(\vv x) \in \mathbb{R}^n$ are both expressed in the standard basis $\vv{e_1}, \dots, \vv{e_n}$, and its matrix representative is $A$.

What if we would like to implement this linear transformation with respect to the basis $B$, that is, define a function $g : \mathbb{R}^n \to \mathbb{R}^n$ with inputs $\vv y \in \mathbb{R}^n$ in $B$-coordinates, and outputs $g(\vv y) \in \mathbb{R}^n$ in $B$-coordinates? To accomplish this, we need to convert both the input $\vv x$ and the output $f(\vv x)$ to $B$-coordinates.

  • Relating inputs $\vv x$ to $B$-coordinate inputs $\vv y$ is easy: $\vv x = B\vv y$.

  • Relating outputs $f(\vv x)$ to $B$-coordinate outputs $g(\vv y)$ is easy too: $f(\vv x) = B g(\vv y)$.

Putting these together, we see that

\begin{align*}
f(\vv x) = A\vv x \iff Bg(\vv y) = AB \vv y
\end{align*}

which lets us solve for $g(\vv y) = B^{-1} A B \vv y$.

We conclude that if $A$ is the matrix representation of a linear transformation in the standard basis, then $B^{-1} A B$ is its matrix representation in the basis $B$.
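
Here is a small numpy sketch of this conclusion (the matrices $A$ and $B$ and the vector $\vv x$ are arbitrary illustrative choices): computing $f(\vv x) = A\vv x$ in standard coordinates and then converting to $B$-coordinates gives the same result as converting $\vv x$ first and then applying $B^{-1}AB$.

import numpy as np

A = np.array([
    [2., 1.],
    [1., 2.]
])                                  # matrix of f in the standard basis (illustrative choice)
B = np.array([
    [1., 1.],
    [0., 1.]
])                                  # columns form the new basis (illustrative choice)

x = np.array([3., 2.])              # a test vector in standard coordinates
y = np.linalg.solve(B, x)           # the same vector in B-coordinates

A_B = np.linalg.inv(B) @ A @ B      # matrix representation of f in the basis B

print(np.linalg.solve(B, A @ x))    # f(x) converted to B-coordinates: [1. 7.]
print(A_B @ y)                      # same answer computed entirely in B-coordinates: [1. 7.]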


6 Diagonalization

In the above example, our change of basis didn’t really help us understand what the linear transformation $f(\vv x)$ is doing any better than our starting point. However, we’ll see that if we use the basis defined by the eigenvectors of a matrix, some magic happens! We’ll start with an example, and then extract a general conclusion.

The above example illustrates a very important property of an eigenbasis: it diagonalizes the original matrix representative! Working with diagonal matrices is very convenient, and thus diagonalization is very useful when we can do it.

Although we only saw a $2\times 2$ example, the idea applies to general $n\times n$ matrices, leading to the notion of diagonalizable matrices.

Let’s try to understand condition (D) a little bit more by writing it as

\begin{align*}
AV = VD
\end{align*}

Now, for $V = \bm \vv{v_1} & \vv{v_2} & \dots & \vv{v_n} \em$, this becomes:

\begin{align*}
\bm A\vv{v_1} & A\vv{v_2} & \dots & A\vv{v_n} \em = \bm \lambda_1 \vv{v_1} & \lambda_2 \vv{v_2} & \dots & \lambda_n \vv{v_n} \em
\end{align*}

Focusing on the $k^{th}$ column of this $n\times n$ matrix equation, we see something familiar:

\begin{align*}
A\vv{v_k} = \lambda_k \vv{v_k},
\end{align*}

that is, the columns of $V$ must be eigenvectors, and the diagonal elements $\lambda_i$ must be the corresponding eigenvalues! Therefore, we immediately get the following characterization of when a matrix is diagonalizable:

Next, let’s look at some examples of diagonalizable and nondiagonalizable matrices:

6.1 Python Break!

Here, we’ll show how to use numpy.linalg (or scipy.linalg) to diagonalize a matrix.

import numpy as np

# Given a square matrix A, returns a tuple of matrices (P, D) such that A = PDP^{-1},
# where the columns of P are eigenvectors of A and D is diagonal with the eigenvalues.
# (This assumes A is diagonalizable, i.e., that P is invertible.)
def diagonalize(A):
    evals, evecs = np.linalg.eig(A)   # eigenvalues and eigenvectors of A
    return evecs, np.diag(evals)      # P = eigenvector matrix, D = diag(eigenvalues)

A = np.array([
    [2, -1, -1],
    [0, 3, 1],
    [0, 1, 3]
])

P, D = diagonalize(A)

print('P:')
print(P, '\n')
print('D:')
print(D, '\n')
print('PDP^{-1}:')
print(P @ D @ np.linalg.inv(P))
P:
[[ 1.         -0.57735027  0.        ]
 [ 0.          0.57735027 -0.70710678]
 [ 0.          0.57735027  0.70710678]] 

D:
[[2. 0. 0.]
 [0. 4. 0.]
 [0. 0. 2.]] 

PDP^{-1}:
[[ 2. -1. -1.]
 [ 0.  3.  1.]
 [ 0.  1.  3.]]

As you can see, finding a diagonalization in Python is really easy! The numpy.linalg.eig function returns the eigenvalues and a matrix whose columns are the corresponding eigenvectors (conveniently, the eigenvalues are returned in the same order as their eigenvector columns).
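
One caveat worth knowing (a hedged note, since np.linalg.eig does not check this for you): eig always returns $n$ eigenvector columns, even when the matrix is not diagonalizable. In that case the returned columns are linearly dependent, so the $P$ produced by our diagonalize helper is singular and $A = PDP^{-1}$ fails. A quick numerical check, continuing with the code above, is to look at the rank of $P$:

# A defective (non-diagonalizable) example: a 2x2 Jordan block
J = np.array([
    [2., 1.],
    [0., 2.]
])

P, D = diagonalize(J)
print(np.linalg.matrix_rank(P))   # expected: 1 < 2, so P is (numerically) singular
                                  # and J = PDP^{-1} does not hold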
