
8.2 Markov Processes

transitioning with short-term memory

Dept. of Electrical and Systems Engineering
University of Pennsylvania


Lecture notes

1 Reading

Material related to this page, as well as additional exercises, can be found in Section 4.9 and Chapter 10 of LAA $5^{th}$ edition, and ALA 9.3.

2 Learning Objectives

By the end of this page, you should know:

  • what a Markov chain model is, illustrated by a weather prediction example
  • what a probability vector and a (regular) transition matrix are
  • when a Markov chain converges to a unique probability vector
  • how the eigenvalues and eigenvectors of the transition matrix relate to the convergence of the Markov chain

3 Weather Prediction: Introduction

We will spend this section on Markov chains, linear iterative models widely used to describe situations in biology, business, chemistry, engineering, physics, and elsewhere.

In each case, the model is used to describe an experiment or measurement that is performed many times in the same way. The outcome of an experiment can be one of several known possible outcomes, and importantly, the outcome of one experiment depends only on the experiment conducted immediately before it. Before introducing a formal model for Markov chains, let’s look at an example.
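Before the formal definitions, here is a minimal simulation sketch of such a model. It assumes the two-state weather chain analyzed later in this section (state 1 = sunny, state 2 = cloudy), with transition probabilities reconstructed from the steady-state computation that appears below:

```python
import numpy as np

# Two-state weather chain (assumed from the example in this section):
# state 1 = sunny, state 2 = cloudy. Column j holds the probabilities
# of tomorrow's weather given that today is in state j; each column
# sums to 1.
P = np.array([[0.7, 0.2],
              [0.3, 0.8]])

x = np.array([1.0, 0.0])   # today is sunny with certainty
for k in range(10):
    x = P @ x              # one step of x(k+1) = P x(k)
    print(k + 1, x)        # the entries keep summing to 1
```

Even after a few steps, the state vector settles near a fixed distribution; the rest of this section explains why.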

4 Convergence in Markov Chains

Let’s try to understand why the convergence in Example 1 happens, and then we’ll look at some interesting applications of Markov chains.

Our starting point is a general definition of a probability vector.

In general, a Markov chain is given by the first order linear iterative system

$$\vv x(k+1) = P \vv x(k) \qquad (\text{MC})$$

whose initial state $\vv x(0)$ is a probability vector. The entries of the transition matrix $P$ must satisfy

$$0 \leq p_{ij} \leq 1 \quad \text{and} \quad p_{1j} + \cdots + p_{nj} = 1 \qquad (\text{TM})$$

for all $i,j = 1, \ldots, n$. The entry $p_{ij}$ is the transition probability that the system will switch from state $j$ to state $i$. Because this covers all possible transitions out of state $j$, each column of $P$ sums to 1. Under these conditions, we can guarantee that if $\vv x(k)$ is a probability vector, so is $\vv x(k+1) = P \vv x(k)$. To see this, note that $\vv 1^{\top} P = \begin{bmatrix} \vv 1^{\top} \vv p_1 & \cdots & \vv 1^{\top} \vv p_n \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} = \vv 1^{\top}$, so that $\vv 1^{\top} \vv x(k+1) = \vv 1^{\top} P \vv x(k) = \vv 1^{\top} \vv x(k) = 1$. That $\vv x(k+1)$ is entrywise non-negative follows from $P$ and $\vv x(k)$ being entrywise non-negative.
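This closure property is easy to check numerically. The sketch below uses a hypothetical 3-state transition matrix (not from the text) satisfying the conditions (TM):

```python
import numpy as np

# Hypothetical 3-state transition matrix: entries in [0, 1] and each
# column sums to 1, so it satisfies the conditions (TM).
P = np.array([[0.5, 0.1, 0.3],
              [0.2, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

x = np.array([0.2, 0.5, 0.3])   # a probability vector
y = P @ x                        # one Markov step: y = x(k+1)

print(y)                         # entrywise non-negative
print(y.sum())                   # sums to 1 (up to floating point)
```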

Next, let’s investigate convergence properties. We first need to impose a very mild technical condition on the transition matrix $P$: we assume that it is regular, meaning that some power $P^k$ has all strictly positive entries.
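Regularity, i.e. some power $P^k$ being entrywise positive, can be tested by brute force. A small sketch, with the power capped at an arbitrary bound for illustration:

```python
import numpy as np

def is_regular(P, max_power=25):
    """Return True if some power P^k (k <= max_power) is entrywise positive."""
    Q = np.eye(P.shape[0])
    for _ in range(max_power):
        Q = Q @ P
        if (Q > 0).all():
            return True
    return False

# The weather chain's matrix is already entrywise positive, hence regular.
weather = np.array([[0.7, 0.2],
                    [0.3, 0.8]])
# A chain that deterministically swaps two states is NOT regular:
# its powers alternate between the swap matrix and the identity,
# so no power is entrywise positive.
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

print(is_regular(weather))  # True
print(is_regular(swap))     # False
```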

The long-term behavior of a Markov chain with regular transition matrix $P$ is governed by the Perron-Frobenius theorem, which we state next. The proof is quite involved, so we won’t cover it, but if you’re curious, check out the end of ALA 9.3.

This is a very exciting development! It tells us that we can understand the long-term behavior of a regular Markov chain by solving for the eigenvector $\vv x^*$ associated with the eigenvalue $\lambda_1 = 1$ of $P$.
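In code, we can hand this eigenvector computation to NumPy. A sketch using the weather chain’s transition matrix, with its entries recovered from the $P - I$ used in the computation that follows:

```python
import numpy as np

# Weather chain transition matrix (state 1 = sunny, state 2 = cloudy),
# recovered from the (P - I) matrix in the text.
P = np.array([[0.7, 0.2],
              [0.3, 0.8]])

vals, vecs = np.linalg.eig(P)
i = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue lambda = 1
v = np.real(vecs[:, i])
x_star = v / v.sum()                # rescale so the entries sum to 1
print(x_star)                       # approximately [0.4, 0.6]
```

Dividing by `v.sum()` both normalizes the entries to sum to 1 and fixes the sign, since `eig` may return the eigenvector scaled by any nonzero constant.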

Returning to our weather prediction example, we compute the steady state probability vector $\vv x^*$ by just solving $(P - I)\vv v = \vv 0$:

$$(P - I)\vv v = \begin{bmatrix} -.3 & .2 \\ .3 & -.2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \vv 0 \;\Rightarrow\; v_1 = \tfrac{2}{3} v_2 \;\Rightarrow\; \vv v = \begin{bmatrix} \tfrac{2}{3} \\ 1 \end{bmatrix}$$

and then normalizing $\vv v$ so that its entries add up to 1:

$$\vv x^* = \frac{1}{1 + \frac{2}{3}} \begin{bmatrix} \frac{2}{3} \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$$

This special eigenvector $\vv x^*$ tells us that no matter the initial state $\vv x(0)$, the long-term behavior is that we are in State 1 (sunny) 40% of days and State 2 (cloudy) 60% of days.
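We can confirm this insensitivity to the initial state numerically: iterating (MC) from two opposite starting vectors, both trajectories land on the same $\vv x^*$ (a sketch, with $P$ recovered from the $P - I$ shown above):

```python
import numpy as np

# P recovered from the (P - I) matrix in the weather example.
P = np.array([[0.7, 0.2],
              [0.3, 0.8]])

for x0 in (np.array([1.0, 0.0]),    # start all-sunny
           np.array([0.0, 1.0])):   # start all-cloudy
    x = x0
    for _ in range(50):
        x = P @ x                   # iterate x(k+1) = P x(k)
    print(x)                        # both converge to [0.4, 0.6]
```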
