1 Reading

Material related to this page, as well as additional exercises, can be found in ALA 4.1.
2 Learning Objectives

By the end of this page, you should know:

- orthogonal and orthonormal bases and their examples
- how to check if a basis is orthogonal or orthonormal
- how to write the coordinates of a vector in an orthogonal or orthonormal basis

3 Orthogonality

Orthogonality is a generalization/abstraction of perpendicularity (right angles) to general inner product spaces. Algorithms based on orthogonality are at the core of modern linear algebra, and include the Gram-Schmidt algorithm, the QR decomposition, and the least-squares algorithm, all of which we shall see in this lecture.
More abstract applications of orthogonality, which you will see, for example, in ESE 2240, include the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT), algorithms that lie at the heart of modern digital media (e.g., JPEG image compression and MP3 audio compression).
4 Orthogonal and Orthonormal Bases

Let $V$ be an inner product space (as usual, we assume that the scalars over which $V$ is defined are real valued). Recall that $\vv v, \vv w \in V$ are orthogonal if $\langle \vv v, \vv w \rangle = 0$. If $\vv v, \vv w \in \mathbb{R}^n$ and $\langle \vv v, \vv w \rangle = \vv v \cdot \vv w$ is the dot product, this simply means that $\vv v$ and $\vv w$ are perpendicular (meet at a right angle). For example, $\vv v = \bm 1 \\ 1\em$ and $\vv w = \bm 1 \\ -1\em$ are orthogonal in $\mathbb{R}^2$, since $\vv v \cdot \vv w = 1 - 1 = 0$.
Orthogonal vectors are useful because they point in completely different directions, making them particularly well-suited for defining bases. They give rise to the concept of an orthogonal basis.
Definition 1 (Orthogonal Basis)
A basis $\vv{b_1}, \ldots, \vv{b_n}$ of an $n$-dimensional inner product space $V$ is called orthogonal if $\langle \vv{b_i}, \vv{b_j} \rangle = 0$ for all $i \neq j$. In this case, the vectors $\vv{b_i}$ are said to be mutually orthogonal, i.e., every pair of distinct vectors is orthogonal.
If each basis vector in an orthogonal basis is a unit vector (has norm equal to one), then the basis is a special type of orthogonal basis known as an orthonormal basis.
Definition 2 (Orthonormal Basis)
An orthogonal basis $\vv{b_1}, \ldots, \vv{b_n}$ of an $n$-dimensional inner product space $V$ is called orthonormal if $\|\vv{b_i}\| = 1$ for each $i$. Here, $\|\vv v\| = \sqrt{\langle \vv v, \vv v \rangle}$ is the norm induced by the inner product.
A simple way to construct an orthonormal basis from an orthogonal basis is to normalize each of its elements, that is, to replace each basis element $\vv{b_i}$ with its normalized counterpart $\frac{\vv{b_i}}{\|\vv{b_i}\|}$. As an exercise, can you formally verify that $\frac{\vv{b_1}}{\|\vv{b_1}\|}, \ldots, \frac{\vv{b_n}}{\|\vv{b_n}\|}$ is an orthonormal basis if $\vv{b_1}, \ldots, \vv{b_n}$ is an orthogonal one? Can you explain why rescaling each element does not affect the mutual orthogonality of the set?
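These definitions are easy to test numerically. Below is a minimal sketch (the helper names is_orthogonal and is_orthonormal are our own, purely illustrative) of how one might check a set of vectors stored as the columns of a NumPy matrix B: the Gram matrix of pairwise inner products B.T @ B is diagonal for an orthogonal set, and equal to the identity for an orthonormal one.

# Checking orthogonality / orthonormality (illustrative helpers)
import numpy as np

def is_orthogonal(B, tol=1e-10):
    # Gram matrix: G[i, j] = <b_i, b_j> for the columns of B
    G = B.T @ B
    off_diag = G - np.diag(np.diag(G))  # zero out the diagonal
    return bool(np.all(np.abs(off_diag) < tol))

def is_orthonormal(B, tol=1e-10):
    # Orthonormal columns satisfy B^T B = I
    return np.allclose(B.T @ B, np.eye(B.shape[1]), atol=tol)

For instance, is_orthonormal(np.eye(3)) returns True, while the orthogonal basis of Example 2 below passes is_orthogonal but fails is_orthonormal.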
Example 1 (The standard basis for $\mathbb{R}^n$)

A familiar example of an orthonormal basis for $\mathbb{R}^n$ equipped with the standard inner product is the collection of standard basis elements:

\begin{align*}
\vv{e_1} = \bm 1 \\ 0 \\ \vdots \\ 0\em, \quad \vv{e_2} = \bm 0 \\ 1 \\ \vdots \\ 0\em, \quad \ldots, \quad \vv{e_n} = \bm 0 \\ 0 \\ \vdots \\ 1\em
\end{align*}

This is known as the standard basis of $\mathbb{R}^n$.
A very useful property of a collection of mutually orthogonal vectors is that they are automatically linearly independent. In particular, if $\vv{v_1}, \ldots, \vv{v_k}$ satisfy $\langle \vv{v_i}, \vv{v_j} \rangle = 0$ for all $i \neq j$ (and $\vv{v_i} \neq \vv 0$ for all $i$), then they are linearly independent.
To see this, we take an arbitrary linear combination of the $\vv{v_i}$ and set it to $\vv 0$:
\begin{align*}
c_1 \vv{v_1} + c_2 \vv{v_2} + \ldots + c_k \vv{v_k} = \vv 0
\end{align*}

Let's take the inner product of both sides of this equation with any $\vv{v_i}$:

\begin{align*}
0 = \langle \vv 0, \vv{v_i} \rangle &= \langle c_1 \vv{v_1} + c_2 \vv{v_2} + \ldots + c_k \vv{v_k}, \vv{v_i} \rangle \\
&= c_1 \langle \vv{v_1}, \vv{v_i} \rangle + \ldots + c_i \langle \vv{v_i}, \vv{v_i} \rangle + \ldots + c_k \langle \vv{v_k}, \vv{v_i} \rangle \quad \text{(linearity of $\langle \cdot, \vv{v_i} \rangle$)} \\
&= c_i \langle \vv{v_i}, \vv{v_i} \rangle = c_i \|\vv{v_i}\|^2 \quad \text{(orthogonality)}
\end{align*}

Since $\vv{v_i} \neq \vv 0$, we have $\|\vv{v_i}\|^2 > 0$, which means $c_i = 0$. We can repeat this game with each $\vv{v_i}$ for $i = 1, \ldots, k$, to conclude that the linear combination above equals $\vv 0$ only if $c_1 = c_2 = \ldots = c_k = 0$. Hence, the mutually orthogonal collection $\vv{v_1}, \ldots, \vv{v_k}$ is linearly independent.
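As a quick numerical illustration of this fact, here is a minimal sketch (assuming NumPy; the columns are the mutually orthogonal vectors from Example 2 below): stacking mutually orthogonal nonzero vectors as the columns of a matrix always yields full column rank.

# Mutual orthogonality implies linear independence: a numerical check
import numpy as np

V = np.array([[1, 0, 5],
              [2, 1, -2],
              [-1, 2, 1]])   # mutually orthogonal columns

print(np.linalg.matrix_rank(V))   # prints 3: the columns are linearly independent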
Example 2 (Normalizing an orthogonal basis)
The vectors

\begin{align*}
\vv{b_1} = \bm 1 \\ 2 \\ -1\em, \quad \vv{b_2} = \bm 0 \\ 1 \\ 2\em, \quad \vv{b_3} = \bm 5 \\ -2 \\ 1\em
\end{align*}

are an orthogonal basis for $\mathbb{R}^3$. One easy way to check this is to confirm that $\vv{b_i} \cdot \vv{b_j} = 0$ for all $i \neq j$ (this is indeed true, as shown below). Since $\dim(\mathbb{R}^3) = 3$, and $\vv{b_1}, \vv{b_2}, \vv{b_3}$ are linearly independent, they must be a basis.
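Concretely, the three pairwise dot products are

\begin{align*}
\vv{b_1} \cdot \vv{b_2} &= (1)(0) + (2)(1) + (-1)(2) = 0, \\
\vv{b_1} \cdot \vv{b_3} &= (1)(5) + (2)(-2) + (-1)(1) = 0, \\
\vv{b_2} \cdot \vv{b_3} &= (0)(5) + (1)(-2) + (2)(1) = 0.
\end{align*}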
To turn them from an orthogonal basis into an orthonormal basis, we simply divide every vector by its length to obtain

\begin{align*}
\vv{v_1} = \frac{\vv{b_1}}{\|\vv{b_1}\|} = \frac{1}{\sqrt 6}\bm 1 \\ 2 \\ -1\em, \quad \vv{v_2} = \frac{\vv{b_2}}{\|\vv{b_2}\|} = \frac{1}{\sqrt 5}\bm 0 \\ 1 \\ 2\em, \quad \vv{v_3} = \frac{\vv{b_3}}{\|\vv{b_3}\|} = \frac{1}{\sqrt{30}}\bm 5 \\ -2 \\ 1\em
\end{align*}

This example highlights a more general principle, which is again quite useful: if $\vv{v_1}, \ldots, \vv{v_n}$ are mutually orthogonal (and nonzero), then they form a basis for their span $W = \text{span}\{\vv{v_1}, \ldots, \vv{v_n}\} \subseteq V$, which is thus a subspace with $\dim(W) = n$.
It then follows that if $\dim(V) = n$, then $\vv{v_1}, \ldots, \vv{v_n}$ are an orthogonal basis for $V$ (this is precisely the observation we used in this example).
4.1 Python break!

In the following code, we demonstrate how to normalize a set of vectors (an orthogonal basis) represented as the columns of a matrix using np.linalg.norm, so that we obtain a normalized set of vectors (an orthonormal basis).
# Normalizing
import numpy as np
b = np.array([[1, 0, 5],
              [2, 1, -2],
              [-1, 2, 1]])
print("The basis represented as a matrix: \n", b)
b_norm = np.linalg.norm(b, axis=0) # axis=0: the norm of each column, i.e., of each basis vector
b_normalized = b / b_norm # Dividing a matrix by a vector! Broadcasting divides each column by its norm
print("Normalized basis: \n", b_normalized)
print("Norm of each basis vector: \n", np.linalg.norm(b_normalized, axis=0))
The basis represented as a matrix:
[[ 1 0 5]
[ 2 1 -2]
[-1 2 1]]
Normalized basis:
[[ 0.40824829 0. 0.91287093]
[ 0.81649658 0.4472136 -0.36514837]
[-0.40824829 0.89442719 0.18257419]]
Norm of each basis vector:
[1. 1. 1.]
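A note on the axis argument: axis=0 treats each column of b as a basis vector. If your basis vectors were instead stored as rows, a small sketch of the same computation (reusing b from above) would be:

# Variant: basis vectors stored as rows instead of columns
b_rows = b.T                                   # one basis vector per row
row_norms = np.linalg.norm(b_rows, axis=1)     # norm of each row
b_rows_normalized = b_rows / row_norms[:, np.newaxis]  # divide each row by its norm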
5 Working in Orthogonal Bases

So why do we care about orthogonal (or even better, orthonormal) bases? It turns out they make a lot of the computations we've been doing so far much easier.
We’ll start with some important properties of computing a vector’s coordinates with respect to an orthogonal basis.
Theorem 1 (Coordinates and Norm in an Orthonormal Basis)
Let $\vv{u_1}, \ldots, \vv{u_n}$ be an orthonormal basis for an inner product space $V$. Then any $\vv v \in V$ is a linear combination

\begin{align*}
\vv v = c_1 \vv{u_1} + \ldots + c_n \vv{u_n}
\end{align*}

in which its coordinates are given by

\begin{align*}
c_i = \langle \vv v, \vv{u_i} \rangle, \quad i = 1, \ldots, n
\end{align*}

Moreover, its norm is given by the Pythagorean formula,

\begin{align*}
\|\vv v\|^2 = c_1^2 + \ldots + c_n^2 = \sum_{i=1}^{n} \langle \vv v, \vv{u_i} \rangle^2
\end{align*}

The trick here is to exploit that
\begin{align*}
\langle \vv{u_i}, \vv{u_j} \rangle = \begin{cases} 0 \quad\text{if $i \neq j$} \\ 1 \quad\text{if $i = j$} \end{cases}
\end{align*}

Let's compute:
\begin{align*}
\langle \vv v, \vv{u_i} \rangle &= \langle c_1 \vv{u_1} + \ldots + c_n \vv{u_n}, \vv{u_i} \rangle \\
&= c_1 \langle \vv{u_1}, \vv{u_i} \rangle + \ldots + c_i \langle \vv{u_i}, \vv{u_i} \rangle + \ldots + c_n \langle \vv{u_n}, \vv{u_i} \rangle \quad \text{(linearity of $\langle \cdot, \vv{u_i} \rangle$)} \\
&= c_i \|\vv{u_i}\|^2 \quad \text{(orthogonality)} \\
&= c_i \quad \text{($\|\vv{u_i}\| = 1$)}
\end{align*}

So we have $c_i = \langle \vv v, \vv{u_i} \rangle$.
Now to compute the norm, we again use a similar trick:

\begin{align*}
\|\vv v\|^2 = \langle \vv v, \vv v \rangle &= \left\langle \sum_{i=1}^{n} c_i \vv{u_i}, \sum_{j=1}^{n} c_j \vv{u_j} \right\rangle \\
&= \sum_{i=1}^{n} c_i \left\langle \vv{u_i}, \sum_{j=1}^{n} c_j \vv{u_j} \right\rangle \quad \text{(linearity of $\left\langle \cdot, \sum_{j=1}^{n} c_j \vv{u_j} \right\rangle$)} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \langle \vv{u_i}, \vv{u_j} \rangle \quad \text{(linearity of $\langle \vv{u_i}, \cdot \rangle$)} \\
&= \sum_{i=1}^{n} c_i^2 \|\vv{u_i}\|^2 \quad \text{(orthogonality)} \\
&= \sum_{i=1}^{n} c_i^2 \quad \text{($\|\vv{u_i}\| = 1$)}
\end{align*}

A very small change to the above allows us to extend these ideas to orthogonal, but not orthonormal, bases:
Theorem 2 (Coordinates and Norm in an Orthogonal Basis)
If $\vv{v_1}, \ldots, \vv{v_n}$ are an orthogonal basis, then $\vv v \in V$ can be written as

\begin{align*}
\vv v = a_1 \vv{v_1} + \ldots + a_n \vv{v_n} \quad \text{with $a_i = \frac{\langle \vv v, \vv{v_i} \rangle}{\|\vv{v_i}\|^2}$}
\end{align*}

and its norm is given by

\begin{align*}
\|\vv v\|^2 = a_1^2 \|\vv{v_1}\|^2 + \ldots + a_n^2 \|\vv{v_n}\|^2
\end{align*}

This is derived by applying our theorem for orthonormal bases to the rescaled vectors $\frac{\vv{v_i}}{\|\vv{v_i}\|}$, which form an orthonormal basis.
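As a final sanity check, here is a minimal numerical verification of Theorem 2 (a sketch assuming NumPy, reusing the orthogonal, not yet normalized, basis from Example 2 and the same arbitrary test vector v as before):

# Coordinates in an orthogonal (non-normalized) basis: a_i = <v, v_i> / ||v_i||^2
import numpy as np

B = np.array([[1.0, 0.0, 5.0],
              [2.0, 1.0, -2.0],
              [-1.0, 2.0, 1.0]])   # orthogonal, but not orthonormal, columns

v = np.array([3.0, 1.0, 2.0])      # arbitrary test vector

norms_sq = np.sum(B**2, axis=0)    # ||v_i||^2 for each basis vector
a = (B.T @ v) / norms_sq           # a_i = <v, v_i> / ||v_i||^2

print(np.allclose(B @ a, v))                        # True: v = a_1 v_1 + ... + a_n v_n
print(np.allclose(np.sum(a**2 * norms_sq), v @ v))  # True: the weighted Pythagorean formula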