
6.7 Similarity, Eigenbases, and Diagonalization

Dept. of Electrical and Systems Engineering
University of Pennsylvania


1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.3.

2 Learning Objectives

By the end of this page, you should know:

  • how to define eigenbases,
  • how to define similar matrices,
  • how to define the geometric and algebraic multiplicity of an eigenvalue,
  • when and how a square matrix $A$ can be diagonalized as $A = PDP^{-1}$, where $D$ is diagonal.

3 Eigenbases

Most of the vector space bases that are useful in applications are assembled from the eigenvectors of a particular matrix. In this section, we focus on matrices with a “complete” set of eigenvectors and show how these form a basis for $\mathbb{R}^n$ (or, in the complex case, $\mathbb{C}^n$); in these cases, such sets of eigenvectors are known as eigenbases:

Such eigenbases allow us to rewrite the linear transformation determined by a matrix in a simple diagonal form; matrices that allow us to do this are called diagonalizable, a definition which we will formalize shortly. We focus on matrices with real eigenvalues and eigenvectors to start, and will return to matrices with complex eigenvalues/eigenvectors in a few pages.

Our starting point is the following theorem, which we will state as a fact. It generalizes the pattern we saw in an earlier example, where eigenvectors corresponding to distinct eigenvalues were linearly independent:

However, we also saw an example where a $3\times 3$ matrix only had two distinct eigenvalues, but still had three linearly independent eigenvectors:

4 Algebraic and Geometric Multiplicities

Notice that in this last example $\dim V_{\lambda_1} = 2$ (why?) for the double eigenvalue $\lambda_1 = 2$ (i.e., the eigenspace corresponding to $\lambda_1$ had dimension 2), and similarly, $\dim V_{\lambda_2} = 1$ for the simple eigenvalue $\lambda_2 = 4$, so that there is a “real” eigenvector for each time an eigenvalue appears as a root of the characteristic polynomial.

These notions can be captured in the idea of algebraic and geometric multiplicity:

Our observation is that if the algebraic and geometric multiplicities match for each eigenvalue, then we can form a basis for $\mathbb{R}^n$.
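
As a quick numerical sanity check of this observation, here is a short numpy sketch (it uses the same $3\times 3$ matrix that appears in the Python break at the end of this page, which has a double eigenvalue 2 and a simple eigenvalue 4); the geometric multiplicity of each eigenvalue is computed as $n - \text{rank}(A - \lambda I)$:

import numpy as np

# Same matrix as in the Python break below: eigenvalues 2 (double) and 4 (simple)
A = np.array([
    [2, -1, -1],
    [0, 3, 1],
    [0, 1, 3]
])
n = A.shape[0]

evals = np.linalg.eigvals(A)
for lam in np.unique(np.round(evals, 8)):
    alg = int(np.sum(np.isclose(evals, lam)))             # algebraic multiplicity
    geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))   # geometric multiplicity = dim V_lambda
    print(f"lambda = {lam:g}: algebraic = {alg}, geometric = {geo}")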

For the next little bit, we will assume that our matrix AA satisfies the above theorem. What does this buy us? To answer this question, we need to introduce the idea of similarity transformations.

5 Similar Matrices

Given a vector $\vv x \in \mathbb{R}^n$ with coordinates $x_i$ with respect to the standard basis, i.e., $\vv x = x_1 \vv{e_1} + x_2 \vv{e_2} + \dots + x_n \vv{e_n}$, we can find the coordinates $y_1, \dots, y_n$ of $\vv x$ with respect to a new basis $\vv{b_1}, \dots, \vv{b_n}$ by solving the following linear system:

\begin{align*}
y_1 \vv{b_1} + y_2 \vv{b_2} + \dots + y_n \vv{b_n} = \vv x \iff B \vv y = \vv x
\end{align*}

where $B = \bm \vv{b_1} & \vv{b_2} & \dots & \vv{b_n} \em$. Since the $\vv{b_i}$ form a basis of $\mathbb{R}^n$, they are linearly independent, which means that $B$ is nonsingular.
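
For instance, here is a minimal numpy sketch of this change of coordinates (the basis vectors and the vector $\vv x$ below are arbitrary illustrative choices): the $B$-coordinates $\vv y$ are obtained by solving the linear system $B\vv y = \vv x$.

import numpy as np

# Columns of B form an (arbitrarily chosen) basis of R^2
B = np.array([
    [1., 1.],
    [0., 1.]
])
x = np.array([3., 2.])      # coordinates of x in the standard basis

y = np.linalg.solve(B, x)   # B-coordinates of x, i.e., the solution of By = x
print(y)                    # [1. 2.]
print(B @ y)                # reconstructs x in standard coordinates: [3. 2.]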

Now, suppose I have a matrix $A \in \mathbb{R}^{n\times n}$, which I use to define the linear transformation $f : \mathbb{R}^n \to \mathbb{R}^n$ given by $f(\vv x) = A \vv x$. Here, $f$’s inputs $\vv x \in \mathbb{R}^n$ and outputs $f(\vv x) \in \mathbb{R}^n$ are both expressed in the standard basis $\vv{e_1}, \dots, \vv{e_n}$, and its matrix representative is $A$.

What if we would like to implement this linear transformation with respect to the basis $B$, that is, define a function $g : \mathbb{R}^n \to \mathbb{R}^n$ with inputs $\vv y \in \mathbb{R}^n$ in $B$-coordinates, and outputs $g(\vv y) \in \mathbb{R}^n$ in $B$-coordinates? To accomplish this, we need to convert both the input $\vv x$ and the output $f(\vv x)$ to $B$-coordinates.

  • Relating inputs $\vv x$ to $B$-coordinate inputs $\vv y$ is easy: $\vv x = B\vv y$.

  • Relating outputs $f(\vv x)$ to $B$-coordinate outputs $g(\vv y)$ is easy too: $f(\vv x) = B g(\vv y)$.

Putting these together, we see that

\begin{align*}
f(\vv x) = A\vv x \iff Bg(\vv y) = AB \vv y
\end{align*}

which lets us solve for $g(\vv y) = B^{-1} A B \vv y$.

We conclude that if $A$ is the matrix representation of a linear transformation in the standard basis, then $B^{-1} A B$ is its matrix representation in the basis $B$.
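
Here is a small numpy sketch of this conclusion (the matrices $A$ and $B$ and the vector $\vv x$ are arbitrary illustrative choices): computing $f(\vv x) = A\vv x$ in standard coordinates and then converting to $B$-coordinates gives the same result as converting $\vv x$ first and then applying $B^{-1}AB$.

import numpy as np

A = np.array([
    [2., 1.],
    [1., 2.]
])                                  # matrix of f in the standard basis (illustrative choice)
B = np.array([
    [1., 1.],
    [0., 1.]
])                                  # columns form the new basis (illustrative choice)

x = np.array([3., 2.])              # a test vector in standard coordinates
y = np.linalg.solve(B, x)           # the same vector in B-coordinates

A_B = np.linalg.inv(B) @ A @ B      # matrix representation of f in the basis B

print(np.linalg.solve(B, A @ x))    # f(x) converted to B-coordinates: [1. 7.]
print(A_B @ y)                      # same answer computed entirely in B-coordinates: [1. 7.]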


6 Diagonalization

In the above example, our change of basis didn’t really help us understand what the linear transformation $f(\vv x)$ is doing any better than our starting point. However, we’ll see that if we use the basis defined by the eigenvectors of a matrix, some magic happens! We’ll start with an example, and then extract a general conclusion.

The above example illustrates a very important property of an eigenbasis: it diagonalizes the original matrix representative! Working with diagonal matrices is very convenient, and thus diagonalization is very useful when we can do it.

Although we only saw a $2\times 2$ example, the idea applies to general $n\times n$ matrices, leading to the notion of diagonalizable matrices.

Let’s try to understand condition (D) a little bit more by writing it as

\begin{align*}
AV = VD
\end{align*}

Now, for $V = \bm \vv{v_1} & \vv{v_2} & \dots & \vv{v_n} \em$, this becomes:

\begin{align*}
\bm A\vv{v_1} & A\vv{v_2} & \dots & A\vv{v_n} \em = \bm \lambda_1 \vv{v_1} & \lambda_2 \vv{v_2} & \dots & \lambda_n \vv{v_n} \em
\end{align*}

Focusing on the $k^{th}$ column of this $n\times n$ matrix equation, we see something familiar:

\begin{align*}
A\vv{v_k} = \lambda_k \vv{v_k},
\end{align*}

that is, the columns of $V$ must be eigenvectors, and the diagonal elements $\lambda_i$ must be the corresponding eigenvalues! Therefore, we immediately get the following characterization of when a matrix is diagonalizable:

Next, let’s look at some examples of diagonalizable and nondiagonalizable matrices:

6.1 Python Break!

Here, we’ll show how to use numpy.linalg (or scipy.linalg) to diagonalize a matrix.

import numpy as np

# Given a square matrix A, returns a tuple of matrices (P, D) such that A = PDP^{-1},
# where the columns of P are eigenvectors of A and D is diagonal with the eigenvalues.
# (This assumes A is diagonalizable, i.e., that P is invertible.)
def diagonalize(A):
    evals, evecs = np.linalg.eig(A)   # eigenvalues and eigenvectors of A
    return evecs, np.diag(evals)      # P = eigenvector matrix, D = diag(eigenvalues)

A = np.array([
    [2, -1, -1],
    [0, 3, 1],
    [0, 1, 3]
])

P, D = diagonalize(A)

print('P:')
print(P, '\n')
print('D:')
print(D, '\n')
print('PDP^{-1}:')
print(P @ D @ np.linalg.inv(P))
P:
[[ 1.         -0.57735027  0.        ]
 [ 0.          0.57735027 -0.70710678]
 [ 0.          0.57735027  0.70710678]] 

D:
[[2. 0. 0.]
 [0. 4. 0.]
 [0. 0. 2.]] 

PDP^{-1}:
[[ 2. -1. -1.]
 [ 0.  3.  1.]
 [ 0.  1.  3.]]

As you can see, finding a diagonalization in Python is really easy! The numpy.linalg.eig function returns the eigenvalues and a matrix whose columns are the corresponding eigenvectors (conveniently, the eigenvalues are returned in the same order as their eigenvector columns).
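
One caveat worth knowing (a hedged note, since np.linalg.eig does not check this for you): eig always returns $n$ eigenvector columns, even when the matrix is not diagonalizable. In that case the returned columns are linearly dependent, so the $P$ produced by our diagonalize helper is singular and $A = PDP^{-1}$ fails. A quick numerical check, continuing with the code above, is to look at the rank of $P$:

# A defective (non-diagonalizable) example: a 2x2 Jordan block
J = np.array([
    [2., 1.],
    [0., 2.]
])

P, D = diagonalize(J)
print(np.linalg.matrix_rank(P))   # expected: 1 < 2, so P is (numerically) singular
                                  # and J = PDP^{-1} does not hold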
