
4.3 Orthogonal Matrices and the QR Factorization

A new matrix factorization

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 4.3.

2 Learning Objectives

By the end of this page, you should know:

  • what an orthogonal matrix is
  • the QR factorization of a square matrix
  • how to use the QR factorization to solve systems of equations of the form $A\vv{x} = \vv{b}$, with $A$ square

3 Orthogonal Matrices

Rotations and reflections play key roles in geometry, physics, robotics, quantum mechanics, aviation, computer graphics, data science, and more. These transformations are encoded via orthogonal matrices, that is, matrices whose columns form an orthonormal basis for $\mathbb{R}^n$. They also play a central role in one of the most important methods of linear algebra: the QR factorization.

We start with a definition: an $n \times n$ matrix $Q$ is orthogonal if

\begin{align*} Q^\top Q = Q Q^\top = I, \end{align*}

i.e., if $Q^{-1} = Q^\top$. Equivalently, $Q$ is orthogonal if and only if its columns form an orthonormal basis for $\mathbb{R}^n$.

Now, let’s explore some of the consequences of this definition.
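As a quick numerical sanity check of the definition, here is a minimal NumPy sketch (the rotation angle is arbitrary) verifying that a 2D rotation matrix is orthogonal and that multiplying by it preserves lengths:

import numpy as np

theta = 0.7  # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # 2D rotation matrix

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q^T Q = I

x = np.array([3.0, 4.0])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))  # both 5.0: lengths are preserved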

If we think about the map $\vv{x} \mapsto Q\vv{x}$ defined by multiplication with an orthogonal matrix as rotating and/or reflecting the vector $\vv{x}$, then the following property should not be surprising: if $Q_1$ and $Q_2$ are orthogonal, then so is their product $Q_2 Q_1$.

Before grinding through some algebra, let’s think about this through the lens of rotations and reflections. Multiplying $\vv{x}$ by a product of orthogonal matrices $Q_2 Q_1$ is the same as first rotating/reflecting $\vv{x}$ by $Q_1$ to obtain $Q_1\vv{x}$, and then rotating/reflecting $Q_1\vv{x}$ by $Q_2$ to get $Q_2 Q_1 \vv{x}$. Now, a sequence of rotations and reflections is still ultimately a rotation and/or reflection, so we must have $Q_2 Q_1 \vv{x} = Q\vv{x}$ for some orthogonal $Q = Q_2 Q_1$.

Let’s check that this intuition carries over into the math. Since $Q_1$ and $Q_2$ are orthogonal, we have

\begin{align*} Q_1^\top Q_1 = I = Q_2^\top Q_2. \end{align*}

Let’s check that $(Q_1 Q_2)^\top (Q_1 Q_2) = I$:

\begin{align*} (Q_1 Q_2)^\top (Q_1 Q_2) = Q_2^\top \underbrace{Q_1^\top Q_1}_{I} Q_2 = \underbrace{Q_2^\top Q_2}_{I} = I. \end{align*}

Therefore $(Q_1 Q_2)^{-1} = (Q_1 Q_2)^\top$, and we indeed have that $Q_1 Q_2$ is orthogonal.
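We can also verify this numerically. Below is a minimal sketch (the two rotation angles are arbitrary) checking that the product of two rotation matrices is again orthogonal, and that rotations compose by adding their angles:

import numpy as np

def rotation(theta):
    # 2D rotation matrix by angle theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

Q1, Q2 = rotation(0.3), rotation(1.1)       # arbitrary angles
P = Q2 @ Q1
print(np.allclose(P.T @ P, np.eye(2)))      # True: the product is orthogonal
print(np.allclose(P, rotation(0.3 + 1.1)))  # True: rotations compose by adding angles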

4 The QR Factorization

The Gram-Schmidt Process (GSP), when applied to orthonormalize a basis of $\mathbb{R}^n$, in fact gives us the famous, incredibly useful QR factorization of a matrix: every nonsingular matrix $A$ can be factored as $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular and invertible.

Let us start with a basis $\vv{b_1}, \dots, \vv{b_n}$ for $\mathbb{R}^n$, and let $\vv{u_1}, \dots, \vv{u_n}$ be the result of applying the GSP to it. Define the matrices:

\begin{align*} A = \bm \vv{b_1} & \vv{b_2} & \dots & \vv{b_n} \em, \quad Q = \bm \vv{u_1} & \vv{u_2} & \dots & \vv{u_n} \em. \end{align*}

$Q$ is an orthogonal matrix because the $\vv{u_i}$ form an orthonormal basis.

Now, let’s revisit the GSP equations:

\begin{align*} \vv{v_1} &= \vv{b_1} \\ \vv{v_2} &= \vv{b_2} - \frac{\langle \vv{b_2}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1}\\ \vv{v_3} &= \vv{b_3} - \frac{\langle \vv{b_3}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - \frac{\langle \vv{b_3}, \vv{v_2} \rangle}{\langle \vv{v_2}, \vv{v_2} \rangle}\vv{v_2}\\ \vdots\\ \vv{v_n} &= \vv{b_n} - \frac{\langle \vv{b_n}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - \dots - \frac{\langle \vv{b_n}, \vv{v_{n-1}} \rangle}{\langle \vv{v_{n-1}}, \vv{v_{n-1}} \rangle}\vv{v_{n-1}} \end{align*}

We start by replacing each element $\vv{v_i}$ with its normalized form $\vv{u_i} = \frac{\vv{v_i}}{\|\vv{v_i}\|}$. Rearranging the above, we can write the original basis elements $\vv{b_i}$ in terms of the orthonormal basis $\vv{u_i}$ via the triangular system

\begin{align*} \vv{b_1} &= r_{11}\vv{u_1}\\ \vv{b_2} &= r_{12}\vv{u_1} + r_{22}\vv{u_2}\\ \vv{b_3} &= r_{13}\vv{u_1} + r_{23}\vv{u_2} + r_{33}\vv{u_3}\\ \vdots\\ \vv{b_n} &= r_{1n}\vv{u_1} + r_{2n}\vv{u_2} + \dots + r_{nn}\vv{u_n} \end{align*}

Using our usual trick of taking the inner product of both sides with $\vv{u_i}$ (for $i \leq j$), we see that

\begin{align*} \langle \vv{b_j}, \vv{u_i}\rangle &= \langle r_{1j}\vv{u_1} + \dots + r_{jj}\vv{u_j}, \vv{u_i}\rangle \\ &= r_{1j} \langle \vv{u_1}, \vv{u_i} \rangle + \dots + r_{ij} \langle \vv{u_i}, \vv{u_i} \rangle + \dots + r_{jj} \langle \vv{u_j}, \vv{u_i} \rangle \\ &= r_{ij} \end{align*}

since the $\vv{u_i}$ are orthonormal, so every inner product above vanishes except $\langle \vv{u_i}, \vv{u_i} \rangle = 1$. We conclude that $r_{ij} = \langle \vv{b_j}, \vv{u_i} \rangle$ for $i \leq j$.

Now, returning to the triangular system (13), we observe that if we define the upper triangular matrix

\begin{align*} R = \bm r_{11} & r_{12} & \dots & r_{1n}\\ 0 & r_{22} & \dots & r_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & r_{nn} \em \end{align*}

we can write $A = QR$. Since the GSP works on any basis, the only requirement for $A$ to have a QR factorization is that its columns form a basis for $\mathbb{R}^n$, i.e., that $A$ be nonsingular.
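Since $Q^\top Q = I$, the factorization also tells us that $R = Q^\top A$. Here is a quick check using NumPy’s built-in QR routine (a minimal sketch; note that np.linalg.qr may flip the signs of some columns of $Q$ and rows of $R$ relative to the GSP, but the checks below hold either way):

import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [-1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])

Q, R = np.linalg.qr(A)             # built-in QR factorization
print(np.allclose(A, Q @ R))       # True: A = QR
print(np.allclose(R, Q.T @ A))     # True: R = Q^T A
print(np.allclose(np.triu(R), R))  # True: R is upper triangular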

5 Pseudocode for the QR Factorization Algorithm

We can condense the above process into an algorithm, for which we give pseudocode below. Note that this algorithm assumes that the underlying inner product is the dot product, but it can easily be adapted to any inner product.
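In outline, Algorithm 1 proceeds as follows (a sketch consistent with the NumPy implementation in the Python break below):

Algorithm 1: QR factorization of a nonsingular n x n matrix A
  Q ← A (with columns q_1, ..., q_n), R ← n x n zero matrix
  for j = 1, ..., n:
      R[j][j] ← ||q_j||                (norm of the jth column of Q)
      if R[j][j] = 0: stop             (columns of A are linearly dependent)
      q_j ← q_j / R[j][j]              (normalize the jth column)
      for k = j + 1, ..., n:
          R[j][k] ← <q_j, q_k>         (component of column k along q_j)
          q_k ← q_k − R[j][k] q_j      (subtract it off)
  return Q, R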

At first glance, this algorithm might look a little different from the process we just outlined. However, it’s really the same idea; only the order of subtracting off components differs. Instead of visiting each basis vector and subtracting off the components parallel to the vectors before it (which we did in the Gram-Schmidt Process), we visit each basis vector and subtract the component parallel to it from every vector after it. (This reordering is often called the modified Gram-Schmidt process, and it behaves better numerically.)

5.1 Python break!

We give an implementation of Algorithm 1 in NumPy below, and run it on some test cases.

import numpy as np

def qr_factorization(A):
    # QR factorization of a square matrix A via (modified) Gram-Schmidt
    if A.shape[0] != A.shape[1]:
        print('A is not square')
        return None, None

    n = A.shape[0]
    Q = A.astype(np.float64)                    # copy of A; its columns are orthonormalized in place
    R = np.zeros((n, n))

    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])       # norm of the jth column
        if R[j, j] < 1e-8:
            print('A has linearly dependent columns')
            return None, None
        Q[:, j] /= R[j, j]                      # normalize column j
        for k in range(j + 1, n):
            R[j, k] = np.dot(Q[:, j], Q[:, k])  # component of column k along u_j
            Q[:, k] -= Q[:, j] * R[j, k]        # subtract off that component

    return Q, R

print('Test case with invertible matrix:')

A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
print('A:')
print(A)
Q, R = qr_factorization(A)
print('Q:')
print(Q)
print('R:')
print(R)
print('QR:')
print(np.round(Q @ R, 2))
print('(Q^T)Q:')
print(np.round(Q.T @ Q, 2))

print('\nTest case with noninvertible matrix:')

A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [1.0, 1.0, 4.0]])
print('A:')
print(A)

Q, R = qr_factorization(A)
print('Q:')
print(Q)
print('R:')
print(R)
Test case with invertible matrix:
A:
[[ 2.  1.  3.]
 [-1.  0.  1.]
 [ 0.  2. -1.]]
Q:
[[ 0.89442719  0.09759001  0.43643578]
 [-0.4472136   0.19518001  0.87287156]
 [ 0.          0.97590007 -0.21821789]]
R:
[[ 2.23606798  0.89442719  2.23606798]
 [ 0.          2.04939015 -0.48795004]
 [ 0.          0.          2.40039679]]
QR:
[[ 2.  1.  3.]
 [-1. -0.  1.]
 [ 0.  2. -1.]]
(Q^T)Q:
[[ 1.  0. -0.]
 [ 0.  1. -0.]
 [-0. -0.  1.]]

Test case with noninvertible matrix:
A:
[[ 2.  1.  3.]
 [-1.  0.  1.]
 [ 1.  1.  4.]]
A has linearly dependent columns
Q:
None
R:
None

6 Solving linear systems with a QR factorization

Solving linear systems using a QR factorization is easy. If our goal is to solve $A\vv{x} = \vv{b}$ and a factorization $A = QR$ is available, we first notice that

\begin{align*} QR\vv{x} = \vv{b} \iff R\vv{x} = Q^\top \vv{b} = \vv{\tilde b}, \end{align*}

since $Q^\top Q = I$. Now, solving $R\vv{x} = \vv{\tilde b}$ can be easily accomplished via back substitution, since $R$ is an upper triangular matrix!
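Here is a minimal sketch of this recipe in NumPy (using the built-in np.linalg.qr; the matrix and right-hand side are just illustrative):

import numpy as np

def back_substitution(R, b):
    # solve R x = b for an upper triangular, invertible R
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

A = np.array([[2.0, 1.0, 3.0],
              [-1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)             # A = QR
x = back_substitution(R, Q.T @ b)  # solve R x = Q^T b
print(np.allclose(A @ x, b))       # True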

7 Optional: Generalized QR factorizations

In an earlier section, we stated that any real invertible square matrix $A$ has a QR factorization $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular and invertible.

While this special case (where the $R$ matrix is invertible) is incredibly useful for solving systems of linear equations, it turns out that every $m \times n$ matrix $A$ has a decomposition of the form $A = QR$, where $Q$ is an orthogonal $m \times m$ matrix and $R$ is an upper triangular $m \times n$ matrix.
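NumPy computes this general form as well, via the mode='complete' option of np.linalg.qr. A minimal sketch on a tall, nonsquare matrix (the entries are arbitrary):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                # a 3 x 2 matrix

Q, R = np.linalg.qr(A, mode='complete')   # Q is 3 x 3, R is 3 x 2
print(Q.shape, R.shape)                   # (3, 3) (3, 2)
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: Q is orthogonal
print(np.allclose(A, Q @ R))              # True: A = QR
print(np.allclose(np.triu(R), R))         # True: R is upper triangular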
