
3.2 Angles, the Cauchy–Schwarz Inequality, and General Norms

Computing angles between general vectors

Dept. of Electrical and Systems Engineering
University of Pennsylvania


1 Reading

Material related to this page, as well as additional exercises, can be found in ALA Ch. 3.2.

2 Learning Objectives

By the end of this page, you should know:

  • the Cauchy–Schwarz Inequality
  • the generalized angle between vectors
  • orthogonality between vectors
  • the triangle inequality
  • definition and examples of norms

3 Generalized Angle

Our starting point in defining the notion of angle in a general inner product space is the familiar formula

\vv v \cdot \vv w = \|\vv v\| \|\vv w\| \cos(\theta),

where $\theta$ measures the angle between $\vv v$ and $\vv w$.

[Figure: the angle θ between two vectors]

Since $|\cos(\theta)| \leq 1$, we can bound the magnitude of $\vv v \cdot \vv w$ as

|\vv v \cdot \vv w| \leq \|\vv v\| \|\vv w\|.

This bound is the Cauchy–Schwarz inequality, and it holds in any inner product space: $|\langle \vv v, \vv w \rangle| \leq \|\vv v\| \|\vv w\|$. It is precisely what lets us define the angle between vectors $\vv v$ and $\vv w$ via $\cos(\theta) = \frac{\langle \vv v, \vv w \rangle}{\|\vv v\| \|\vv w\|}$ (Definition 1): the definition makes sense because, by Cauchy–Schwarz, we know that

-1 \leq \frac{\langle \vv v, \vv w \rangle}{\|\vv v\| \|\vv w\|} \leq 1.

Hence, $\theta$ is well defined, and unique if restricted to lie in $[0, \pi]$.
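
As a quick numerical sanity check (an illustrative sketch of ours, not from ALA), we can verify the Cauchy–Schwarz bound on randomly drawn vectors:

# numerically spot-check Cauchy-Schwarz: |<v, w>| <= ||v|| ||w||
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(4)
    w = rng.standard_normal(4)
    lhs = abs(np.dot(v, w))                      # |<v, w>|
    rhs = np.linalg.norm(v) * np.linalg.norm(w)  # ||v|| ||w||
    print(lhs <= rhs)                            # True on every draw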

4 Angles between Generic Vectors

Python break!

We show how to use NumPy functions (np.dot, np.linalg.norm, np.arccos) to compute the angle between vectors. We also show how to compute the cosine similarity from the cosine distance between vectors using the scipy.spatial library.

# angle between vectors
import numpy as np
from scipy.spatial import distance

v = np.array([1, 0, 1])
w = np.array([0, 1, 1])

# cos(theta) = <v, w> / (||v|| ||w||)
cos_theta = np.dot(v, w)/(np.linalg.norm(v)*np.linalg.norm(w))
theta = np.arccos(cos_theta)
print("Angle between v and w is: ", theta, " rad")

# cosine distance -> cosine similarity
cosine_dist = distance.cosine(v, w)
cosine_sim = 1 - cosine_dist
print("Cosine similarity between v and w is: ", cosine_sim)
Angle between v and w is:  1.0471975511965979  rad
Cosine similarity between v and w is:  0.5

5 Orthogonal Vectors

The notion of perpendicular vectors is an important one in Euclidean geometry. These are vectors that meet at a right angle, i.e., $\theta = \frac{\pi}{2}$ or $\theta = -\frac{\pi}{2}$, with $\cos\theta = 0$. This tells us that vectors $\vv v$ and $\vv w$ are perpendicular if and only if their dot product vanishes: $\vv v \cdot \vv w = 0$ (can you see why via Cauchy–Schwarz?).

We continue with our strategy of extending familiar geometric concepts in Euclidean space to general inner product spaces. For historic reasons, we use the term orthogonal instead of perpendicular.

Orthogonality is an incredibly useful and practical idea that appears all over the place in engineering, AI, and economics, which we will explore in detail next lecture.
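
Before moving on, here is a minimal sketch (ours, using the same NumPy functions as above) that checks orthogonality via a vanishing dot product:

# v and w are orthogonal iff their dot product vanishes
import numpy as np

v = np.array([1, 0, 1])
w = np.array([0, 1, 0])
print(np.dot(v, w))   # 0 -> v and w are orthogonal

# equivalently, the angle between them is pi/2
theta = np.arccos(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))
print(np.isclose(theta, np.pi / 2))   # True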

6 The Triangle Inequality

We know, e.g., from the law of cosines, that the length of one side of a triangle is at most the sum of the lengths of the other two sides.

[Figure: a triangle with side lengths a, b, c and angle θ]
\begin{align*}
c^2 &= a^2 + b^2 - 2ab\cos(\theta) \\
&\leq a^2 + b^2 + 2ab \quad (\textrm{since} \ -\cos(\theta) \leq 1) \\
&= (a+b)^2 \\
\Rightarrow \quad c &\leq a+b
\end{align*}

The computation above extends directly to the setting where we want to relate the length $\|\vv v + \vv w\|$ of the sum of vectors $\vv v$, $\vv w$ to the lengths $\|\vv v\|$ and $\|\vv w\|$: in any inner product space, the triangle inequality $\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\|$ holds.
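
As a quick check (an illustrative sketch, not from ALA), we can confirm the triangle inequality numerically for the vectors used earlier:

# numerically verify ||v + w|| <= ||v|| + ||w||
import numpy as np

v = np.array([1.0, 0.0, 1.0])
w = np.array([0.0, 1.0, 1.0])
lhs = np.linalg.norm(v + w)                  # ||v + w|| = sqrt(6) ~ 2.449
rhs = np.linalg.norm(v) + np.linalg.norm(w)  # ||v|| + ||w|| = 2 sqrt(2) ~ 2.828
print(lhs, "<=", rhs, ":", lhs <= rhs)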

7 Norms

We have seen that inner products allow us to define a natural notion of length. However, there are other sensible ways of measuring the size of a vector that do not arise from an inner product. For example, suppose we choose to measure the size of a vector by its "taxicab distance," where we pretend we are a cab driver in Manhattan and can only drive north-south and east-west. We then end up with a different measure of length that makes lots of sense!
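
To make the taxicab picture concrete, here is a short sketch (our illustration) comparing the taxicab length of a trip, computed with the 1-norm, to its straight-line Euclidean length:

# taxicab (1-norm) vs. straight-line (2-norm) length of a displacement
import numpy as np

trip = np.array([3, 4])              # 3 blocks east, 4 blocks north
print(np.linalg.norm(trip, ord=1))   # 7.0: blocks actually driven
print(np.linalg.norm(trip))          # 5.0: distance as the crow flies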

To define a general norm on a vector space, we will extract properties that "make sense" as a measure of distance but that do not directly rely on an inner product structure (like angles). This leads to the following definition (Definition 3): a norm on a vector space $V$ assigns a length $\|\vv v\|$ to each vector $\vv v \in V$, subject to (i) positivity: $\|\vv v\| \geq 0$, with $\|\vv v\| = 0$ if and only if $\vv v = \vv 0$; (ii) homogeneity: $\|c \vv v\| = |c| \|\vv v\|$ for all scalars $c \in \mathbb{R}$; and (iii) the triangle inequality: $\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\|$ for all $\vv v, \vv w \in V$.

7.1 Describing Definition 3

Axiom (i) says "length" should always be non-negative, and only the zero vector has zero length (seems reasonable!).

Axiom (ii) says if I stretch/shrink a vector $\vv v$ by a factor $c \in \mathbb{R}$, then the length should scale accordingly (this is why we call $c \in \mathbb{R}$ a scalar!). Note that $c < 0$ means we stretch/shrink and flip $\vv v$, but flipping shouldn't affect length, so $\|c \vv v\| = \|{-c} \vv v\| = |c| \|\vv v\|$.

Axiom (iii) tells us that lengths of sums of vectors should "behave as if there were a cosine rule," even if there is no notion of angle. This is the least intuitive of the three properties, but it has proven to be a key one in making norms useful to work with; we spot-check all three axioms numerically below.
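
Here is a small sketch (ours, not from ALA) that spot-checks all three axioms for the 1-norm on concrete vectors:

# spot-check the three norm axioms for the 1-norm
import numpy as np

def norm1(x):
    return np.linalg.norm(x, ord=1)

v = np.array([1.0, -2.0])
w = np.array([3.0, 0.5])
c = -3.0

print(norm1(v) > 0 and norm1(np.zeros(2)) == 0)     # (i) positivity
print(np.isclose(norm1(c * v), abs(c) * norm1(v)))  # (ii) homogeneity
print(norm1(v + w) <= norm1(v) + norm1(w))          # (iii) triangle inequality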

We will introduce two other commonly used norms in practice, but you should know that there are many, many more.

# Different norms
import numpy as np

v = np.array([1, -2])
v1 = np.linalg.norm(v, ord=1)         # 1-norm (taxicab): |1| + |-2| = 3
v2 = np.linalg.norm(v)                # 2-norm (Euclidean), the default
vinf = np.linalg.norm(v, ord=np.inf)  # infinity norm: max(|1|, |-2|) = 2
print("\n1-norm: ", v1, "\n2-norm: ", v2, "\ninfinity norm: ", vinf)

1-norm:  3.0 
2-norm:  2.23606797749979 
infinity norm:  2.0
