
1.6 The Permuted LU-Factorization

Let's mix things up a little

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in ALA Ch. 1.4, LAA Ch 2.5, and ILA Ch. 2.4. These notes are mostly based on ALA Ch 1.4.

2 Learning Objectives

By the end of this page, you should know:

  • how to solve linear equations when A is not regular
  • what permutation matrices are
  • how to use the permuted LU factorization

3 Interchanging Rows

The method of Gaussian Elimination works only if the matrix is regular. However, not every matrix is regular, as the example below shows.

\begin{align*} 2y + z & = 2,\\ 2x + 6y + z & = 7,\\ x + y + 4z & = 3, \end{align*}

The augmented coefficient matrix for the above set of equations is

\left[ \begin{array}{ccc|c} 0 & 2 & 1 & 2\\ 2 & 6 & 1 & 7\\ 1 & 1 & 4 & 3 \end{array}\right].

In the above example, the (1,1) entry is 0: it sits in the first diagonal position, but cannot serve as a pivot. The “problem” is that x does not appear in the first equation. This is actually a good thing, because we already have an equation with only two variables in it. Hence, we need to eliminate x from only one of the other two equations. Let us interchange the first two rows of the augmented matrix above:

\left[ \begin{array}{ccc|c} 2 & 6 & 1 & 7\\ 0 & 2 & 1 & 2\\ 1 & 1 & 4 & 3 \end{array}\right],

which clearly does not change the solution set, but now we have pivots at (1,1) and (2,2). Interchanging the first and second rows is equivalent to swapping the first and second equations. Now we can proceed as in Gaussian Elimination, zeroing out the (3,1) and (3,2) entries using elementary row operations to get

\left[ \begin{array}{c|c} U & \textbf{c} \end{array}\right] = \left[ \begin{array}{ccc|c} 2 & 6 & 1 & 7\\ 0 & 2 & 1 & 2\\ 0 & 0 & \frac{9}{2} & \frac{3}{2} \end{array}\right].

The pivots are 2, 2, \frac{9}{2}, and solving via back substitution yields the solution x = \frac{5}{6}, y = \frac{5}{6}, z = \frac{1}{3}.
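Assuming NumPy is available, the hand computation above can be reproduced numerically. This is only a sketch of this particular example, not a general solver:

```python
import numpy as np

# The system from above, with the zero in the (1,1) position
A = np.array([[0., 2., 1.],
              [2., 6., 1.],
              [1., 1., 4.]])
b = np.array([2., 7., 3.])

# Type 2 operation: swap rows 1 and 2 so that (1,1) holds a nonzero pivot
A[[0, 1]] = A[[1, 0]]
b[[0, 1]] = b[[1, 0]]

# Type 1 operations: zero out the entries below the pivots
m = A[2, 0] / A[0, 0]                 # multiplier 1/2 for the (3,1) entry
A[2] -= m * A[0]; b[2] -= m * b[0]
m = A[2, 1] / A[1, 1]                 # multiplier -1 for the (3,2) entry
A[2] -= m * A[1]; b[2] -= m * b[1]

# Back substitution on the resulting upper triangular system
z = b[2] / A[2, 2]
y = (b[1] - A[1, 2] * z) / A[1, 1]
x = (b[0] - A[0, 1] * y - A[0, 2] * z) / A[0, 0]
print(x, y, z)   # 5/6, 5/6, 1/3
```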

This row swapping operation is pretty useful, and so we’ll make it another elementary row operation, in addition to the type 1 operation of adding a scalar multiple of one row to another that we’ve been relying on so far.

Square matrices A that can be reduced to upper triangular form with nonzeros along the diagonal via Gaussian Elimination with pivoting play a distinguished role in linear algebra. As we’ll see shortly, when such a matrix defines a system of linear equations A\vv x = \vv b, there is a unique solution for any choice of right-hand side \vv b. Because these kinds of matrices are so important, we’ll give them a name and call them nonsingular. Why we call them this will become clearer later in the course, but for now, this will just be our name for “nice” matrices A as described below.

In contrast, a singular square matrix cannot be reduced to such upper triangular form by such row operations, because at some stage in the elimination procedure the diagonal pivot entry and all of the entries below it are zero.
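As a minimal (hypothetical) illustration, consider a 2x2 matrix whose second row is twice its first; one elimination step wipes out the second pivot position entirely:

```python
import numpy as np

# Hypothetical singular matrix: row 2 is twice row 1
A = np.array([[1., 2.],
              [2., 4.]])

# Type 1 operation: subtract 2x row 1 from row 2
A[1] -= 2 * A[0]
print(A)
# The (2,2) pivot position, and everything below it, is now zero,
# so no row interchange can supply a nonzero second pivot.
```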

We summarize the discussion about unique solutions of systems of equations for nonsingular A in the theorem below.

We are able to prove the “if” part of this theorem: when A is nonsingular, Gaussian Elimination with pivoting succeeds, so the solution exists and is unique. We’ll come back to prove the “only if” part later.

4 Small Pivots Cause Numerical Issues

It’s important to remember that when linear algebra is applied in engineering, data science, economics, and scientific domains, it is done using computers. Computers store data using bits (0s and 1s), and as such have limited precision. This means that computers eventually have to round off numbers, because they can’t keep infinitely many decimal places. This matters because when the numbers you are working with span a very large range (i.e., some numbers are very big and others are very small), even very small rounding errors introduced by the limits of your computer can cause significant errors in your solutions. Here we’ll look at a simple example of such numerical ill-conditioning, and see that by being cognizant of these issues, we can find a workaround.

We’ll look at the simple system of two equations in two unknowns:

\begin{align*} 0.01x + 1.6y & = 32.1,\\ x + 0.6y & = 22, \end{align*}

where the exact solution is x = 10, y = 20 (you can check this). Now let’s assume that we only have a very primitive computer for solving this system of equations, one that retains just three significant digits for any number we store in it. Of course, modern computers have much higher, but still finite, precision. We can modify the example below to produce similar issues on modern computers.

The exact augmented matrix, and its exact reduction to upper triangular form, are given below.

\left[ \begin{array}{cc|c} .01 & 1.6 & 32.1\\ 1 & .6 & 22 \end{array}\right] \leftrightarrow \left[ \begin{array}{cc|c} .01 & 1.6 & 32.1\\ 0 & -159.4 & -3188 \end{array}\right].

Since our primitive computer retains only three significant digits, the matrix it actually computes is instead

\left[ \begin{array}{cc|c} .01 & 1.6 & 32.1\\ 0 & -159 & -3190 \end{array}\right].

If we use back substitution with the rounded augmented matrix to solve the system, we first get y = \frac{-3190}{-159} \approx 20.1. Well, that’s not too bad: we’re only off by 0.1 in y. But if we proceed to solve for x using this value of y, we get x \approx -10! This is a big problem! A small error in y, i.e., 0.1, produced a huge error in x! The culprit is the small pivot 0.01 at (1,1). The small error in y is scaled up when evaluating the first equation, where we have to divide the expression by the small pivot 0.01. If we instead interchange the rows so that the pivot is 1 and proceed from there via Gaussian Elimination, the approximate solution is y \approx 20.1, x \approx 9.9, which is much more reasonable.
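We can see this effect concretely by simulating our three-digit computer in Python. The helper `round3` below is a hypothetical stand-in that rounds every intermediate result to three significant digits; this is a sketch of the two elimination orders, not a general algorithm:

```python
def round3(v):
    """Round to three significant digits, mimicking our primitive computer."""
    return float(f"{v:.3g}")

# Elimination with the small pivot 0.01 first (no row swap)
m = round3(1 / 0.01)                   # multiplier 100
a22 = round3(0.6 - m * 1.6)            # -159.4 rounds to -159
b2 = round3(22 - m * 32.1)             # -3188 rounds to -3190
y = round3(b2 / a22)                   # about 20.1
x = round3(round3(32.1 - round3(1.6 * y)) / 0.01)
print(x, y)                            # -10.0 20.1  (x is wildly wrong)

# Now swap rows first so the pivot is 1
m = round3(0.01 / 1)
a22s = round3(1.6 - m * 0.6)           # 1.59
b2s = round3(32.1 - m * 22)            # 31.9
ys = round3(b2s / a22s)                # about 20.1
xs = round3(22 - round3(0.6 * ys))     # about 9.9
print(xs, ys)                          # 9.9 20.1  (much closer to x = 10)
```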

5 Permutation matrices

Similar to how type 1 operations are represented using elementary matrices, we can represent type 2 operations using permutation matrices.

For example, to swap rows 1 and 2 of the matrix AA, we left multiply AA by PP as follows.

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \Rightarrow PA = \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}

5.1 Python Break!

import numpy as np

P = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Swap rows 1 and 2 of A
P @ A
array([[4, 5, 6], [1, 2, 3], [7, 8, 9]])

6 Permuted LU-factorization

From Definition 1, every nonsingular matrix A can be reduced to upper triangular form using type 1 operations and type 2 operations. If we perform all the permutations first using a permutation matrix P, then we obtain a regular matrix PA that admits an LU factorization. Hence we can define the permuted LU factorization of a nonsingular matrix A as follows.

Given a permuted LU factorization PA = LU of the matrix A, we can solve the system of equations A\textbf{x} = \textbf{b} by first doing forward substitution to solve L\textbf{z} = P\textbf{b}, and then back substitution to solve U\textbf{x} = \textbf{z}.
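As a sketch, here is this two-step recipe applied to our running example, assuming the factors worked out earlier (P swaps the first two rows, L has a unit diagonal, and PA = LU):

```python
import numpy as np

# Factors of PA = LU for the running example A = [[0,2,1],[2,6,1],[1,1,4]]
P = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])
L = np.array([[1.,   0., 0.],
              [0.,   1., 0.],
              [0.5, -1., 1.]])
U = np.array([[2., 6., 1. ],
              [0., 2., 1. ],
              [0., 0., 4.5]])
b = np.array([2., 7., 3.])

# Forward substitution: solve L z = P b (L has unit diagonal, so no division)
pb = P @ b
z = np.zeros(3)
for i in range(3):
    z[i] = pb[i] - L[i, :i] @ z[:i]

# Back substitution: solve U x = z
x = np.zeros(3)
for i in reversed(range(3)):
    x[i] = (z[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

print(x)   # [5/6, 5/6, 1/3], matching the solution found earlier
```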

6.1 Python Break!

Implementing the algorithms you learn in this class from scratch in NumPy is a great way to both gain a deeper understanding of the math and to practice your coding skills. However, when applying these ideas in real-world situations, it is usually better to use built-in functions: these typically have 100s, if not 1000s, of engineering hours behind them, and have been optimized for reliability, performance, and scalability. In the code snippet below, we use the standard scientific computing package SciPy’s built-in lu implementation for the LU-factorization (with pivots). Note that this implementation is slightly different from the one described in our notes, and instead computes P, L, U such that A = PLU. By the end of the next section on matrix inverses, you’ll know how to relate this expression to the one we described above.

# Permuted LU factorization

import numpy as np
from scipy.linalg import lu

A = np.array([[0, 2, 1],
              [2, 6, 1],
              [1, 1, 4]])
P, L, U = lu(A)
print("P: \n", P, "\nL: \n", L, "\nU: \n", U)
print(f'A - P L U =\n {A - P @ L @ U}')
P: 
 [[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]] 
L: 
 [[ 1.   0.   0. ]
 [ 0.   1.   0. ]
 [ 0.5 -1.   1. ]] 
U: 
 [[2.  6.  1. ]
 [0.  2.  1. ]
 [0.  0.  4.5]]
A - P L U =
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
# Let's try a bigger example!  How big can you make n before it takes too long to solve?

n = 2000
A = np.random.randn(n,n)

P, L, U = lu(A)
print(f'max|(A - P L U)_ij| =\n {np.max(np.abs(A - P @ L @ U))}')
max|(A - P L U)_ij| =
 1.461053500406706e-13
