4.3 Orthogonal Matrices and the QR Factorization

A new matrix factorization
Dept. of Electrical and Systems Engineering
University of Pennsylvania
1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 4.3.
2 Learning Objectives

By the end of this page, you should know:
- what an orthogonal matrix is
- the QR factorization of a square matrix
- how to use the QR factorization to solve systems of equations of the form $A\vv{x} = \vv{b}$, with $A$ square

3 Orthogonal Matrices

Rotations and reflections play key roles in geometry, physics, robotics, quantum mechanics, airplanes, computer graphics, data science, and more. These transformations are encoded via orthogonal matrices, that is, matrices whose columns form an orthonormal basis for $\mathbb{R}^n$. They also play a central role in one of the most important methods of linear algebra, the QR factorization.
We start with a definition.
Definition 1 (Orthogonal Matrix)
A square matrix $Q$ is called orthogonal if it satisfies

\begin{align*}
QQ^{\top} = Q^{\top}Q = I.
\end{align*}

This means that $Q^{-1} = Q^{\top}$ (in fact, we could define orthogonal matrices this way instead), and that solving linear systems of the form $Q\vv{x} = \vv{b}$ is very easy: simply set $\vv{x} = Q^\top \vv{b}$!
Notice that $Q^\top Q = I$ implies that the columns of $Q$ are orthonormal. If $Q = [\vv{q_1}, ..., \vv{q_n}]$, then

\begin{align*}
(Q^\top Q)_{ij} = \vv{q_i}^\top \vv{q_j} = I_{ij} = \begin{cases} 1 \quad\text{if $i = j$}\\ 0\quad\text{if $i \neq j$}\end{cases}
\end{align*}

which is exactly the definition of an orthonormal collection of vectors. Further, since there are $n$ such vectors, they must form an orthonormal basis for $\mathbb{R}^n$.
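As a quick numerical sanity check, here is a minimal NumPy sketch (the particular matrix below is just an example we chose) that verifies orthogonality and solves $Q\vv{x} = \vv{b}$ by setting $\vv{x} = Q^\top \vv{b}$:

import numpy as np

# An example orthogonal matrix: its columns are orthonormal vectors in R^3
Q = (1 / 3) * np.array([[2.0, -2.0, 1.0],
                        [1.0, 2.0, 2.0],
                        [2.0, 1.0, -2.0]])

# Check the defining property Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(3)))  # True

# Solve Q x = b without any elimination: x = Q^T b
b = np.array([1.0, 2.0, 3.0])
x = Q.T @ b
print(np.allclose(Q @ x, b))  # True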
Now, let’s explore some of the consequences of this definition.
Example 1 ($2 \times 2$ orthogonal matrices)

A $2\times 2$ matrix $Q = \bm a & b \\ c & d \em$ is orthogonal if and only if
\begin{align*}
Q^\top Q = \bm a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2 \em = \bm 1 & 0 \\ 0 & 1 \em
\end{align*}

or equivalently
\begin{align*}
a^2 + c^2 = 1, \quad ab + cd = 0, \quad b^2 + d^2 = 1
\end{align*}

The first and last equations say that $\bm a \\ c \em$ and $\bm b \\ d \em$ lie on the unit circle in $\mathbb{R}^2$: a convenient and revealing way of writing this is by setting
\begin{align*}
a = \cos \theta, \quad c = \sin \theta, \quad b = \cos \phi, \quad d = \sin \phi
\end{align*}

since $\cos^2 \theta + \sin^2\theta = 1$ for all $\theta \in \mathbb{R}$.
Our remaining condition is $0 = ab + cd = \cos\theta \cos \phi + \sin\theta \sin\phi = \cos(\theta - \phi)$. Now
\begin{align*}
\cos (\theta - \phi) = 0 &\iff \theta - \phi = \frac{\pi}{2} + 2 n \pi \quad \text{or} \quad \theta - \phi = -\frac{\pi}{2} + 2 n \pi \\
&\iff \phi = \theta \pm \frac{\pi}{2}
\end{align*}

This means either $\phi = \theta + \frac{\pi}{2}$, in which case $b = \cos\phi = -\sin\theta$ and $d = \sin\phi = \cos\theta$, or $\phi = \theta - \frac{\pi}{2}$, in which case $b = \sin\theta$ and $d = -\cos\theta$.
As a result, every $2\times 2$ orthogonal matrix has one of two possible forms:
\begin{align*}
\bm \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \em \quad\text{or}\quad \bm \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \em
\end{align*}

where by convention, we restrict $\theta \in [0, 2\pi)$.
The columns of both matrices form an orthonormal basis for $\mathbb{R}^2$. The first is obtained by rotating the standard basis $\vv{e_1}, \vv{e_2}$ through the angle $\theta$; the second by first reflecting about the $x$-axis and then rotating.
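To make this concrete, here is a small NumPy sketch (the angle below is arbitrary) that builds both forms and confirms they are orthogonal; the first has determinant $+1$ (a rotation) and the second determinant $-1$ (a reflection):

import numpy as np

theta = np.pi / 6  # an arbitrary angle

# Rotation (first form) and reflection-then-rotation (second form)
Q_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
Q_ref = np.array([[np.cos(theta), np.sin(theta)],
                  [np.sin(theta), -np.cos(theta)]])

for Q in (Q_rot, Q_ref):
    # Each satisfies Q^T Q = I; determinants are +1 and -1 respectively
    print(np.allclose(Q.T @ Q, np.eye(2)), round(np.linalg.det(Q)))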
If we think about the map $\vv{x} \mapsto Q \vv{x}$ defined by multiplication with an orthogonal matrix as rotating and/or reflecting the vector $\vv{x}$, then the following property should not be surprising: the product of two orthogonal matrices is itself orthogonal.

Before grinding through some algebra, let's think about this through the lens of rotations and reflections. Multiplying $\vv{x}$ by a product of orthogonal matrices $Q_2Q_1$ is the same as first rotating/reflecting $\vv{x}$ by $Q_1$ to obtain $Q_1 \vv{x}$, and then rotating/reflecting $Q_1 \vv{x}$ by $Q_2$ to get $Q_2 Q_1 \vv{x}$. Now, a sequence of rotations and reflections is still ultimately a rotation and/or reflection, so we must have $Q_2 Q_1 \vv{x} = Q \vv{x}$ for some orthogonal $Q = Q_2 Q_1$.
Let's check that this intuition carries over to the math. Since $Q_1$ and $Q_2$ are orthogonal, we have that

\begin{align*}
Q^\top_1 Q_1 = I = Q_2^\top Q_2.
\end{align*}

Let's check that $(Q_1Q_2)^\top (Q_1Q_2) = I$:
\begin{align*}
(Q_1Q_2)^\top (Q_1Q_2) = Q_2^\top \underbrace{Q_1^\top Q_1}_{I} Q_2 = \underbrace{Q_2^\top Q_2}_{I} = I
\end{align*}

Therefore $(Q_1 Q_2)^{-1} = (Q_1 Q_2)^\top$, and we indeed have that $Q_1Q_2$ is orthogonal.
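We can also check this numerically. A minimal sketch, using the $2 \times 2$ rotation matrices from Example 1 (the angles are arbitrary), showing that the product of two orthogonal matrices is again orthogonal:

import numpy as np

def rotation(theta):
    # 2x2 rotation matrix through angle theta (first form in Example 1)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

Q1, Q2 = rotation(0.3), rotation(1.1)
Q = Q2 @ Q1

print(np.allclose(Q.T @ Q, np.eye(2)))      # True: the product is orthogonal
print(np.allclose(Q, rotation(0.3 + 1.1)))  # True: rotations compose by adding angles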
This multiplicative property, combined with the fact that the inverse of an orthogonal matrix is orthogonal (why?), says that the set of all orthogonal matrices (of dimension $n$) forms a group (under matrix multiplication).
Group theory underlies much of modern physics and quantum mechanics and plays a central role in robotics. Although we will not spend too much time on groups in this class, you are sure to see them again in the future.
The aforementioned orthogonal group in particular is central to rigid body mechanics, atomic structure and chemistry, and computer graphics, among many other applications.
4 The QR Factorization

The GSP (Gram-Schmidt Process), when applied to orthonormalize a basis of $\mathbb{R}^n$, in fact gives us the famous, incredibly useful QR factorization of a matrix:
Let us start with a basis $\vv{b_1}, ..., \vv{b_n}$ for $\mathbb{R}^n$, and let $\vv{u_1}, ..., \vv{u_n}$ be the result of applying the GSP to it. Define the matrices:
\begin{align*}
A = \bm \vv{b_1} & \vv{b_2} & ... & \vv{b_n} \em, \quad Q = \bm \vv{u_1} & \vv{u_2} & ... & \vv{u_n} \em.
\end{align*}

$Q$ is an orthogonal matrix because the $\vv{u_i}$ form an orthonormal basis.
Now, let’s revisit the GSP equations:
\begin{align*}
\vv{v_1} &= \vv{b_1} \\
\vv{v_2} &= \vv{b_2} - \frac{\langle \vv{b_2}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1}\\
\vv{v_3} &= \vv{b_3} - \frac{\langle \vv{b_3}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - \frac{\langle \vv{b_3}, \vv{v_2} \rangle}{\langle \vv{v_2}, \vv{v_2} \rangle}\vv{v_2}\\
\vdots\\
\vv{v_n} &= \vv{b_n} - \frac{\langle \vv{b_n}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - ... - \frac{\langle \vv{b_n}, \vv{v_{n-1}} \rangle}{\langle \vv{v_{n-1}}, \vv{v_{n-1}} \rangle}\vv{v_{n-1}}
\end{align*}

We start by replacing each element $\vv{v_i}$ with its normalized form, $\vv{u_i} = \frac{\vv{v_i}}{\| \vv{v_i} \|}$. Rearranging the above, we can write the original basis elements $\vv{b_i}$ in terms of the orthonormal basis $\vv{u_i}$ via the triangular system
\begin{align*}
\vv{b_1} &= r_{11}\vv{u_1}\\
\vv{b_2} &= r_{12}\vv{u_1} + r_{22}\vv{u_2}\\
\vv{b_3} &= r_{13}\vv{u_1} + r_{23}\vv{u_2} + r_{33}\vv{u_3}\\
\vdots\\
\vv{b_n} &= r_{1n}\vv{u_1} + r_{2n}\vv{u_2} + ... + r_{nn}\vv{u_n}
\end{align*}

Using our usual trick of taking inner products with both sides, we see that
\begin{align*}
\langle \vv{b_j}, \vv{u_i}\rangle &= \langle r_{1j}\vv{u_1} + ... + r_{jj}\vv{u_j}, \vv{u_i}\rangle \\
&= r_{1j} \langle \vv{u_1}, \vv{u_i} \rangle + ... + r_{ij} \langle \vv{u_i}, \vv{u_i} \rangle + ... + r_{jj} \langle \vv{u_j}, \vv{u_i} \rangle \\
&= r_{ij}
\end{align*}

So we conclude that $r_{ij} = \langle \vv{b_j}, \vv{u_i} \rangle$.
Now, returning to the triangular system above, we observe that if we define the upper triangular matrix
\begin{align*}
R = \bm
r_{11} & r_{12} & \dots & r_{1n}\\
0 & r_{22} & \dots & r_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & r_{nn}
\em
\end{align*}

we can write $A = QR$. Since the GSP works on any basis, the only requirement for $A$ to have a QR factorization is that its columns form a basis for $\mathbb{R}^n$, i.e., that $A$ be nonsingular.
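Since $r_{ij} = \langle \vv{b_j}, \vv{u_i} \rangle$ is exactly the $(i, j)$ entry of $Q^\top A$, we can sanity-check this construction numerically. A minimal sketch using NumPy's built-in np.linalg.qr (whose sign conventions may differ from ours, but which satisfies the same identities):

import numpy as np

# An example invertible matrix (chosen arbitrarily)
A = np.array([[2.0, 1.0, 3.0],
              [-1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])

Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(3)))  # Q is orthogonal
print(np.allclose(Q.T @ A, R))          # R = Q^T A, i.e. r_ij = <b_j, u_i>
print(np.allclose(Q @ R, A))            # and A = QR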
5 Pseudocode for the QR Factorization Algorithm

We can condense the above process into an algorithm, for which we give pseudocode below. Note that this algorithm assumes that the underlying inner product is the dot product, but it can easily be adapted to any inner product.
Algorithm 1 (QR Factorization)
Inputs: An invertible $n \times n$ matrix $A$ (with entries $a_{ij}$)

Output: Invertible $n \times n$ matrices $Q$ (with entries $q_{ij}$) and $R$ (with entries $r_{ij}$) such that $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular
$Q \gets A$
$R \gets$ empty $n \times n$ matrix
for $j = 1$ to $n$:
$\quad r_{jj} \gets \sqrt{q_{1j}^2 + \dots + q_{nj}^2}$
$\quad$ if $r_{jj} = 0$, stop; print "A has linearly dependent columns"
$\quad\quad$ else for $i = 1$ to $n$
$\quad\quad\quad q_{ij} \gets q_{ij} / r_{jj}$
$\quad$ for $k = j + 1$ to $n$
$\quad\quad r_{jk} \gets q_{1j}q_{1k} + \dots + q_{nj}q_{nk}$
$\quad\quad$ for $i = 1$ to $n$
$\quad\quad\quad q_{ik} \gets q_{ik} - q_{ij}r_{jk}$
return $Q$, $R$
At first glance, this algorithm might look a little different than the process we just outlined. However, it's really the same idea; only the order in which components are subtracted off differs. Instead of visiting each basis vector and subtracting off the components parallel to the vectors before it (which we did in the Gram-Schmidt Process), we visit each basis vector and subtract the components parallel to it from every vector after it. A small numerical comparison of the two orderings is sketched below.
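Here is a minimal sketch (matrix chosen arbitrarily) comparing the two orderings; in exact arithmetic they produce the same orthonormal basis:

import numpy as np

def gs_column_at_a_time(A):
    # Subtract from each column the components along previously processed columns
    # (the ordering used in the Gram-Schmidt Process above)
    n = A.shape[1]
    Q = A.copy()
    for j in range(n):
        for i in range(j):
            Q[:, j] -= np.dot(Q[:, i], Q[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q

def gs_subtract_forward(A):
    # Normalize each column, then subtract its component from every later column
    # (the ordering used in Algorithm 1)
    n = A.shape[1]
    Q = A.copy()
    for j in range(n):
        Q[:, j] /= np.linalg.norm(Q[:, j])
        for k in range(j + 1, n):
            Q[:, k] -= np.dot(Q[:, j], Q[:, k]) * Q[:, j]
    return Q

A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
print(np.allclose(gs_column_at_a_time(A), gs_subtract_forward(A)))  # True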
5.1 Python break!

We give an implementation of Algorithm 1 in NumPy below, and run it on some test cases.
import numpy as np
# Implementation of Algorithm 1: QR factorization via the Gram-Schmidt process
def qr_factorization(A):
    if A.shape[0] != A.shape[1]:
        print('A is not square')
        return None, None
    n = A.shape[0]
    Q = A.copy()           # the columns of Q are orthonormalized in place
    R = np.zeros((n, n))
    for j in range(n):
        # r_jj is the norm of column j; a (near-)zero value signals linear dependence
        R[j, j] = np.linalg.norm(Q[:, j])
        if R[j, j] < 1e-8:
            print('A has linearly dependent columns')
            return None, None
        else:
            # normalize column j
            for i in range(n):
                Q[i, j] = Q[i, j] / R[j, j]
        # subtract the component along column j from every later column
        for k in range(j + 1, n):
            R[j, k] = np.dot(Q[:, j], Q[:, k])
            for i in range(n):
                Q[i, k] = Q[i, k] - Q[i, j] * R[j, k]
    return Q, R
print('Test case with invertible matrix:')
A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
print('A:')
print(A)
print('Q:')
Q, R = qr_factorization(A)
print(Q)
print('R:')
print(R)
print('QR:')
print(np.round(Q @ R, 2))
print('(Q^T)Q:')
print(np.round(Q.T @ Q, 2))
print('\nTest case with noninvertible matrix:')
A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [1.0, 1.0, 4.0]])
print('A:')
print(A)
Q, R = qr_factorization(A)
print('Q:')
print(Q)
print('R:')
print(R)
Test case with invertible matrix:
A:
[[ 2. 1. 3.]
[-1. 0. 1.]
[ 0. 2. -1.]]
Q:
[[ 0.89442719 0.09759001 0.43643578]
[-0.4472136 0.19518001 0.87287156]
[ 0. 0.97590007 -0.21821789]]
R:
[[ 2.23606798 0.89442719 2.23606798]
[ 0. 2.04939015 -0.48795004]
[ 0. 0. 2.40039679]]
QR:
[[ 2. 1. 3.]
[-1. -0. 1.]
[ 0. 2. -1.]]
(Q^T)Q:
[[ 1. 0. -0.]
[ 0. 1. -0.]
[-0. -0. 1.]]
Test case with noninvertible matrix:
A:
[[ 2. 1. 3.]
[-1. 0. 1.]
[ 1. 1. 4.]]
A has linearly dependent columns
Q:
None
R:
None
6 Solving linear systems with a QR factorization

Solving linear systems using a QR factorization is easy. If our goal is to solve $A\vv{x} = \vv{b}$ and a QR factorization $A = QR$ is available, we first notice that
\begin{align*}
QR \vv{x} = \vv{b} \iff R \vv{x} = Q^\top \vv{b} = \vv{\tilde b}
\end{align*}

since $Q^\top Q = I$. Now, solving $R\vv{x} = \vv{\tilde b}$ can be easily accomplished via backsubstitution, since $R$ is an upper triangular matrix!
Exercise 1 (Solving a linear system via QR factorization)
Using a QR factorization, factor the matrix
\begin{align*}
A = \bm 1 & 3 \\ 2 & 1 \em
\end{align*}

as $A = QR$. Then, use the QR factorization of $A$ to solve the linear system
\begin{align*}
A \vv{x} = \bm 8 \\ 1 \em
\end{align*}

Denote the columns of $A$ as
\begin{align*}
\vv{a_1} = \bm 1 \\ 2 \em, \quad \vv{a_2} = \bm 3 \\ 1 \em
\end{align*}

First, we apply the Gram-Schmidt process to find an orthogonal basis for $\text{span}(\vv{a_1}, \vv{a_2})$ (the column space of $A$). We find that one such orthogonal basis is given by
\begin{align*}
\vv{v_1} &= \vv{a_1} = \bm 1 \\ 2 \em\\
\vv{v_2} &= \vv{a_2} - \frac{\langle \vv{a_2}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} = \bm 2 \\ -1 \em
\end{align*}

and an orthonormal basis is given by
\begin{align*}
\vv{u_1} = \bm \frac{1}{\sqrt 5} \\ \frac{2}{\sqrt 5} \em, \quad \vv{u_2} = \bm \frac{2}{\sqrt 5} \\ -\frac{1}{\sqrt 5} \em
\end{align*}

So we have that our $Q$ matrix is given by
\begin{align*}
Q = \boxed{\bm \frac{1}{\sqrt 5} & \frac{2}{\sqrt 5} \\ \frac{2}{\sqrt 5} & -\frac{1}{\sqrt 5} \em}
\end{align*}

Now we compute the $R$ matrix. Recall that
\begin{align*}
R = \bm \langle \vv{u_1}, \vv{a_1} \rangle & \langle \vv{u_1}, \vv{a_2} \rangle \\ 0 & \langle \vv{u_2}, \vv{a_2} \rangle \em
\end{align*}

Substituting in values for $\vv{u_i}$ and $\vv{a_j}$, we have that our $R$ matrix is given by
\begin{align*}
R = \boxed{\bm
\sqrt 5 & \sqrt 5 \\ 0 & \sqrt 5
\em}
\end{align*}

You can check that $Q$ is orthogonal ($Q^\top Q = I$), $R$ is invertible and upper triangular, and $A = QR$.
Next, we solve $A\vv{x} = \bm 8 \\ 1 \em$:
\begin{align*}
A\vv{x} = \bm 8 \\ 1 \em &\iff QR \vv{x} = \bm 8 \\ 1 \em\\
&\iff \underbrace{(Q^\top Q)}_{I} R \vv{x} = Q^\top \bm 8 \\ 1 \em\\
&\iff \bm \sqrt 5 & \sqrt 5 \\ 0 & \sqrt 5 \em \bm x_1 \\ x_2 \em = \bm \frac{1}{\sqrt 5} & \frac{2}{\sqrt 5} \\ \frac{2}{\sqrt 5} & -\frac{1}{\sqrt 5} \em \bm 8 \\ 1 \em = \bm 2\sqrt 5 \\ 3\sqrt 5 \em
\end{align*}

Solving this with backsubstitution, we find that $(x_1, x_2) = \boxed{(-1, 3)}$.
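As a quick numerical check of this exercise (using NumPy's built-in solver only to confirm the hand computation):

import numpy as np

A = np.array([[1.0, 3.0], [2.0, 1.0]])
Q = np.array([[1.0, 2.0], [2.0, -1.0]]) / np.sqrt(5)
R = np.sqrt(5) * np.array([[1.0, 1.0], [0.0, 1.0]])
b = np.array([8.0, 1.0])

print(np.allclose(Q @ R, A))             # A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))   # Q is orthogonal
print(np.linalg.solve(R, Q.T @ b))       # [-1.  3.]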
7 Optional: Generalized QR factorizations

In an earlier section, we stated that any real invertible square matrix $A$ has a QR decomposition $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular and invertible.

While this special case (where the $R$ matrix is invertible) is incredibly useful for solving systems of linear equations, it turns out that every $m\times n$ matrix $A$ has a decomposition of the form $A = QR$, where $Q$ is an orthogonal $m\times m$ matrix and $R$ is an upper triangular $m\times n$ matrix.
Definition 3 (The QR Factorization)
Any real $m \times n$ matrix $A$ may be written in the form

\begin{align*}
A = QR
\end{align*}

where $Q$ is an orthogonal $m \times m$ matrix and $R$ is an upper triangular $m \times n$ matrix.
Furthermore, if A A A is a real, invertible, square matrix then R R R is invertible.
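In NumPy, this generalized factorization of a rectangular matrix can be computed with np.linalg.qr using mode='complete'. A minimal sketch with an arbitrary 4 x 2 example:

import numpy as np

# An arbitrary 4 x 2 matrix
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0]])

Q, R = np.linalg.qr(A, mode='complete')

print(Q.shape, R.shape)                 # (4, 4) (4, 2)
print(np.allclose(Q.T @ Q, np.eye(4)))  # Q is a 4 x 4 orthogonal matrix
print(np.allclose(Q @ R, A))            # A = QR, with R upper triangular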