5.1 Linear Functions: a simple function class
Dept. of Electrical and Systems Engineering
University of Pennsylvania
1 Reading
Material related to this page, as well as additional exercises, can be found in ALA 7.1.
2 Learning Objectives
By the end of this page, you should know:
the definition of linear functions and some examples
how to verify linearity of functions
how matrix-vector multiplication relates to linear functions
composition of linear functions
inverses of linear functions
3 Introduction to Linearity
A strategy that we have embraced so far has been to turn algebraic questions into geometric ones. Our foundation for this strategy has been the vector space, which allows us to reason about a wide range of objects (vectors, polynomials, word histograms, and functions) as “arrows” that we can add, stretch, flip, and rotate. Our canonical approach to transforming one vector into another has been through matrix-vector multiplication: we start with a vector $\vv x$ and create a new vector via the mapping $\vv x \mapsto A \vv x$.
Our goal in this lecture is to give you a brief introduction to the theory of linear functions, of which the function $f(\vv x) = A \vv x$ is a special case. Linear functions are also known as linear maps, or, when applied to function spaces, linear operators. These functions lie at the heart of robotics, computer graphics, quantum mechanics, and dynamical systems. We will see that by introducing just a little bit more abstraction, we can reason about all of these different settings using the same mathematical machinery.
4 Linear Functions
We start with the basic definition of a linear function, which captures the fundamental idea of linearity: a linear function respects vector addition and scalar multiplication. A formal definition is given below.
Definition 1 (Linear Function)
Let $V$ and $W$ be real vector spaces. A function $L: V \to W$ mapping the domain $V$ to the codomain $W$ is called linear if it obeys two basic rules:
$$\begin{align*}
L(\vv v + \vv w) &= L(\vv v) + L(\vv w) &\text{(L1)} \\
L(c\vv v) &= cL(\vv v) &\text{(L2)}
\end{align*}$$
for all $\vv v, \vv w \in V$ and $c \in \mathbb{R}$.
Before looking at some common examples, we make a few comments:
Setting $c = 0$ in rule (L2) tells us that a linear function always maps the zero element $\vv 0 \in V$ to the zero element $\vv 0 \in W$ (note: these are different zero elements, as they live in different vector spaces!).
A commonly used trick for verifying linearity is to combine (L1) and (L2) into the single rule:
$$L(c\vv v + d\vv w) = cL(\vv v) + dL(\vv w) \quad \text{for all} \quad \vv v, \vv w \in V, \quad c, d \in \mathbb{R} \quad \text{(L)}$$
We can extend rule (L) to any finite linear combination:
$$L(c_1\vv v_1 + \cdots + c_k\vv v_k) = c_1 L(\vv v_1) + \cdots + c_k L(\vv v_k) \quad \text{(LL)}$$
for all $c_1, \ldots, c_k \in \mathbb{R}$ and $\vv v_1, \ldots, \vv v_k \in V$.
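To make rule (L) concrete, here is a minimal numerical sketch (ours, not from the original notes): it spot-checks (L) on random inputs for scaling by 3, which is linear, and for the map that adds 1 to every entry, which is not. Passing a random test like this does not prove linearity, but failing it does certify that a function is not linear.

import numpy as np

rng = np.random.default_rng(0)

def check_rule_L(f, n=3, trials=100, tol=1e-9):
    # Spot-check rule (L): f(c v + d w) == c f(v) + d f(w) on random inputs
    for _ in range(trials):
        v, w = rng.standard_normal(n), rng.standard_normal(n)
        c, d = rng.standard_normal(2)
        if not np.allclose(f(c*v + d*w), c*f(v) + d*f(w), atol=tol):
            return False
    return True

print(check_rule_L(lambda v: 3*v))    # True: scalar multiplication is linear
print(check_rule_L(lambda v: v + 1))  # False: shifting every entry by 1 violates (L)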
Finally, a quick note on terminology: we will use linear function and linear map interchangeably when $V$ and $W$ are both finite dimensional, linear transformation when $V = W$, and linear operator when $V$ and $W$ are function spaces.
Example 1 (Zero, Identity, and Scalar Multiplication Functions)
The zero function $O(\vv v) = \vv 0$, which maps any $\vv v \in V$ to $\vv 0 \in W$, is easily checked to satisfy rule (L) (both sides are zero!). The identity function $I(\vv v) = \vv v$, which leaves any vector $\vv v \in V$ unchanged, satisfies rule (L) because both $I(c\vv v + d\vv w) = c\vv v + d\vv w$ and $cI(\vv v) + dI(\vv w) = c\vv v + d\vv w$. The scalar multiplication function $M_a(\vv v) = a\vv v$, which scales an element $\vv v \in V$ by the scalar $a \in \mathbb{R}$, defines a linear function from $V$ to itself, with $M_0(\vv v) = O(\vv v)$ and $M_1(\vv v) = I(\vv v)$ appearing as special cases. We made no assumptions about $V$ and $W$ in Example 1 beyond them being vector spaces. They could be Euclidean spaces, function spaces, or even matrix spaces, and our statements would be equally valid.
Example 2 (Matrix Multiplication)
Let $V = \mathbb{R}^n$, $W = \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Then the function $L(\vv v) = A\vv v$ is a linear function since
$$A(c\vv v + d\vv w) = cA\vv v + dA\vv w \quad \text{for all} \quad \vv v, \vv w \in \mathbb{R}^n \quad \text{and} \quad c, d \in \mathbb{R}$$
by the basic properties of matrix-vector multiplication.
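As a quick sanity check (a sketch with arbitrarily chosen values, not part of the original notes), we can verify this identity numerically for a random $A$, $\vv v$, $\vv w$ and scalars $c$, $d$:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # A in R^{4x3}, so L maps R^3 to R^4
v, w = rng.standard_normal(3), rng.standard_normal(3)
c, d = 2.0, -0.5

lhs = A @ (c*v + d*w)             # A(cv + dw)
rhs = c*(A @ v) + d*(A @ w)       # cAv + dAw
print(np.allclose(lhs, rhs))      # True, up to floating-point error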
In fact, matrix-vector multiplications are not only a familiar example of linear maps between Euclidean spaces, they are the only ones!
Pay attention to the order of $m$ and $n$: when $L: \mathbb{R}^n \to \mathbb{R}^m$, i.e., from $\mathbb{R}^n$ to $\mathbb{R}^m$, we have $A \in \mathbb{R}^{m \times n}$, with $m$ rows and $n$ columns!
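This shape convention is easy to confirm in NumPy (the dimensions below are chosen purely for illustration):

import numpy as np

m, n = 3, 2
A = np.ones((m, n))      # A has m rows and n columns
v = np.ones(n)           # v lives in R^n
print((A @ v).shape)     # (3,) -- the output lives in R^m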
Example 3 (Rotation)
Let’s consider the function $R_{\theta}: \mathbb{R}^2 \to \mathbb{R}^2$ that rotates a vector $\vv v \in \mathbb{R}^2$ counter-clockwise by $\theta$ radians. To find its matrix representation, we look at the figure below and apply a little high school trigonometry (SOHCAHTOA anyone?).
Recalling that $\|\vv e_1\| = \|\vv e_2\| = 1$, and that rotations preserve length, we have:
$$R_{\theta}(\vv e_1) = \begin{bmatrix} \cos \theta \\ \sin \theta \end{bmatrix}, \quad R_{\theta}(\vv e_2) = \begin{bmatrix} -\sin \theta \\ \cos \theta \end{bmatrix}$$
which, when stacked together, give the matrix representation $R_{\theta}(\vv v) = A_{\theta} \vv v$ with
$$A_{\theta} = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$$
This looks familiar! Indeed, this is the same expression we found when characterizing orthogonal 2x2 matrices. If we then apply $\vv v \mapsto A_{\theta} \vv v$ we obtain:
$$\hat{\vv v} = R_{\theta}(\vv v) = A_{\theta} \vv v = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} v_1 \cos \theta - v_2 \sin \theta \\ v_1 \sin \theta + v_2 \cos \theta \end{bmatrix}$$
You can check that these components are correct using trigonometry, but they follow directly from the linearity of rotation.
4.1 Python break!
In the example below, we illustrate the rotation of vectors in Python by constructing a rotation matrix as given in Example 3 and applying the linear transformation to the original vector.
import numpy as np
import matplotlib.pyplot as plt

def plot_vecs(origin, v1, v2):
    # Plot the original and rotated vectors as arrows based at the origin
    fig, ax = plt.subplots()
    ax.quiver(*origin, *v1, angles='xy', scale_units='xy', scale=1, color='r', label='Original')
    ax.quiver(*origin, *v2, angles='xy', scale_units='xy', scale=1, color='b', label='Rotated')
    ax.set_xlim(-3, 3)
    ax.set_ylim(-3, 3)
    plt.legend()
    plt.grid()
    ax.set_xlabel('X-axis')
    ax.set_ylabel('Y-axis')
    plt.show()

def rot_mat_cons(theta):
    # Construct the 2x2 rotation matrix A_theta from Example 3
    return np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

theta = np.pi/6  # change this and observe how much the vector is rotated
rot_mat = rot_mat_cons(theta)
v1 = np.array([[1, 2]]).T
v2 = rot_mat @ v1

# Define the origin
origin = np.array([[0, 0]]).T

plot_vecs(origin, v1, v2)
5 Composition
Applying one linear function after another is called composition: let $V$, $W$, $Z$ be vector spaces. If $L: V \to W$ and $M: W \to Z$ are linear functions, then the composite function $M \circ L: V \to Z$, defined by $(M \circ L)(\vv v) = M(L(\vv v))$, is also linear (easily checked to satisfy rule (L)).
This gives us a “dynamic” interpretation of matrix-matrix multiplication. If $L(\vv v) = A\vv v$ maps $\mathbb{R}^n$ to $\mathbb{R}^m$ and $M(\vv w) = B\vv w$ maps $\mathbb{R}^m$ to $\mathbb{R}^l$, so that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{l \times m}$, then:
$$(M \circ L)(\vv v) = M(L(\vv v)) = B(A\vv v) = (BA)\vv v$$
so that the matrix representation of $M \circ L: \mathbb{R}^n \to \mathbb{R}^l$ is the matrix product $BA \in \mathbb{R}^{l \times n}$. And, like matrix multiplication, composition of linear functions is in general not commutative (the order in which the functions are applied matters!).
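Here is a brief numerical sketch (dimensions and values of our choosing, not from the notes) confirming that composing the maps agrees with multiplying the matrices, and that the shapes make order matter:

import numpy as np

rng = np.random.default_rng(2)
n, m, l = 3, 4, 2
A = rng.standard_normal((m, n))   # L maps R^3 to R^4
B = rng.standard_normal((l, m))   # M maps R^4 to R^2
v = rng.standard_normal(n)

print(np.allclose(B @ (A @ v), (B @ A) @ v))  # True: applying L then M equals applying BA
print((B @ A).shape)                          # (2, 3): BA represents the composition, R^3 -> R^2
# Note: A @ B is not even defined here (shapes (4, 3) and (2, 4) do not align),
# one way to see that the order of composition matters.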
Example 4 (Composing rotations)
Composing two rotations results in another rotation: $R_{\phi} \circ R_{\theta} = R_{\phi + \theta}$, i.e., if we first rotate by $\theta$ and then by $\phi$, it is the same as rotating by $\theta + \phi$. Using matrices:
$$\begin{bmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{bmatrix} \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} = A_{\phi} A_{\theta} = A_{\phi+\theta} = \begin{bmatrix} \cos(\phi+\theta) & -\sin(\phi+\theta) \\ \sin(\phi+\theta) & \cos(\phi+\theta) \end{bmatrix}$$
Working out the LHS above, we can derive the well-known trigonometric addition formulas:
$$\cos(\phi+\theta) = \cos \phi \cos \theta - \sin \phi \sin \theta, \quad \sin(\phi+\theta) = \cos \phi \sin \theta + \sin \phi \cos \theta$$
In fact, this counts as a proof!
5.1 Python break!
In the code below, we illustrate composition of rotations in Python by multiplying rotation matrices.
theta1 = np.pi/6
theta2 = np.pi/2
rot1 = rot_mat_cons(theta1)
rot2 = rot_mat_cons(theta2)
v1 = np.array([[3, 0]]).T
v2 = rot1 @ rot2 @ v1 # composition of rotations
plot_vecs(origin, v1, v2)
## A reverse rotation
theta3 = -np.pi/2
rot3 = rot_mat_cons(theta3)
v3 = rot3 @ rot1 @ rot2 @ v1 # How can you get to v3 from v2?
plot_vecs(origin, v1, v3)
6 Inverses
Just as with square matrices, we can define the inverse of a linear function. Let $L: V \to W$ be a linear function. If $M: W \to V$ is a linear function such that:
$$L \circ M = I_W \quad \text{and} \quad M \circ L = I_V$$
where $I_W$ and $I_V$ are the identity maps on $W$ and $V$ respectively, then $M$ is the inverse of $L$ and is denoted $M = L^{-1}$.
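For the special case $L(\vv v) = A\vv v$ with $A$ square and invertible, the inverse map is $M(\vv w) = A^{-1}\vv w$. Below is a small sketch (with an $A$ chosen only for illustration) checking both conditions numerically:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # an invertible 2x2 matrix, chosen for illustration
A_inv = np.linalg.inv(A)          # matrix of the inverse map M = L^{-1}

I = np.eye(2)
print(np.allclose(A @ A_inv, I))  # L composed with M is the identity on W
print(np.allclose(A_inv @ A, I))  # M composed with L is the identity on V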
Example 5 (Mapping polynomials $P^{(n)}$ to $\mathbb{R}^{n+1}$ and back again)
Let $V = P^{(n)}$ be the space of polynomials of degree $\leq n$, and let $W = \mathbb{R}^{n+1}$. Define the linear map $L: P^{(n)} \to \mathbb{R}^{n+1}$ as follows: for $p(x) = a_0 + a_1 x + \cdots + a_n x^n$,
$$L(p) = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix},$$
i.e., $L(p)$ stacks the coefficients of $p(x)$ into a vector $L(p) \in \mathbb{R}^{n+1}$.
The inverse map $L^{-1}(\vv a)$ is simply the mapping that takes a vector $\vv a = \begin{bmatrix} a_0 & a_1 & \cdots & a_n \end{bmatrix}^{\top} \in \mathbb{R}^{n+1}$ and outputs the polynomial $L^{-1}(\vv a)(x) = a_0 + a_1 x + \cdots + a_n x^n$. We check that it satisfies
$$L \circ L^{-1} = I_{\mathbb{R}^{n+1}} \quad \text{and} \quad L^{-1} \circ L = I_{P^{(n)}}$$
First,
$$(L \circ L^{-1})(\vv a) = L(L^{-1}(\vv a)) = L(a_0 + a_1 x + \cdots + a_n x^n) = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \vv a$$
for any $\vv a \in \mathbb{R}^{n+1}$, so that $L \circ L^{-1} = I_{\mathbb{R}^{n+1}}$. Next, we check, for any $p(x) = a_0 + a_1 x + \cdots + a_n x^n$:
$$(L^{-1} \circ L)(p) = L^{-1}(L(p)) = L^{-1}\left(\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}\right) = L^{-1}(\vv a) = a_0 + a_1 x + \cdots + a_n x^n = p(x)$$
so that $L^{-1} \circ L = I_{P^{(n)}}$.
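Below is a small sketch of this correspondence in NumPy; the helper names L and L_inv are ours, chosen for illustration, and a polynomial is represented with numpy.polynomial.Polynomial, which stores exactly the coefficient vector $\vv a$.

import numpy as np
from numpy.polynomial import Polynomial

def L(p):
    # L: P^(n) -> R^(n+1); stack the coefficients of p(x) into a vector
    return p.coef

def L_inv(a):
    # L^{-1}: R^(n+1) -> P^(n); rebuild the polynomial a_0 + a_1 x + ... + a_n x^n
    return Polynomial(a)

a = np.array([1.0, -2.0, 3.0])                 # a in R^3 (so n = 2)
p = L_inv(a)                                   # p(x) = 1 - 2x + 3x^2
print(p(2.0))                                  # 9.0
print(L(L_inv(a)))                             # [ 1. -2.  3.] -- back to the vector a
print(np.allclose(L_inv(L(p)).coef, p.coef))   # True -- back to the polynomial p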
Because there exists an invertible linear map between $\mathbb{R}^{n+1}$ and $P^{(n)}$, they are said to be isomorphic. As we saw earlier in the semester, this means that “they behave the same” and we can do vector space operations in either $\mathbb{R}^{n+1}$ or $P^{(n)}$, whichever is convenient to us.
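For instance, here is a brief sketch (with polynomials of our own choosing) of performing addition on the coefficient vectors in $\mathbb{R}^{n+1}$ and then mapping back, which gives the same answer as adding the polynomials directly:

import numpy as np
from numpy.polynomial import Polynomial

p = Polynomial([1.0, 0.0, 2.0])    # p(x) = 1 + 2x^2
q = Polynomial([0.0, 3.0, -1.0])   # q(x) = 3x - x^2

# Add the coefficient vectors in R^{n+1}, then map back to a polynomial
s = Polynomial(p.coef + q.coef)    # s(x) = 1 + 3x + x^2
print(s(2.0), (p + q)(2.0))        # 11.0 11.0 -- the same answer either way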
In fact, an even more general statement holds: any vector space of dimension $n$ is isomorphic to $\mathbb{R}^n$, and so by studying Euclidean space, we are in fact gaining an understanding of all finite dimensional vector spaces.