ML Wiki

Stochastic Matrices

Stochastic matrices (or Markov matrices) - matrices used to describe transitions in Markov Chains

A stochastic matrix is a matrix $A$ which

• is square $n \times n$
• for all entires $0 \leqslant a_{ij} \leqslant 1$
• sum over columns is 1

Properties

• for any $k$, $A^k$ is also stochastic
• its largest eigenvalue is $\lambda_1 = 1$
• all other eigenvalues are $| \lambda_i | \leqslant 1$, and usually it's strictly less

$\lambda_1 = 1$

Let take any stochastic $A$

• suppose $\lambda = 1$ is an eigenvalue
• then $A - \lambda I = A - I$ must be singular
• sum of columns of $A$ is 1. But when we subtract $1$ from diagonal, the sum is 0 for all columns
• so columns are now linearly dependent, and it means that rows are also linearly dependent
• then $(1,1,1) \in N(A^T)$ (left Nullspace of $A$)
• and the Nullspace $N(A)$ also contains something: it contains the eigenvector $\mathbf v_1$ that corresponds to $\lambda_1 = 0$

If $\lambda_1 < 0$, then $A^k$ for large $k$ will converge to $\mathbf O$ - a matrix with all zeros.

Eigenvalues

• for stochastic matrices, eigenvalues of $A$ are the same as eigenvalues of $A^T$
• $\text{det } (A - \lambda I) = 0$
• $\text{det } (A - \lambda I)^T = \text{det } (A^T - \lambda I) = 0$

Recurrent Equation

• $\mathbf u_k = A^k \mathbf u_0$
• Let's use the eigenvectors $\mathbf v_1 , \ ... \ , \mathbf v_n$ of $A$ as basis
• then $\mathbf u_k = A^k \mathbf u_0 = c_1 \lambda_1^k \mathbf v_1 + c_2 \lambda_2^k \mathbf v_2 + \ ... \ c_n \lambda_n^k \mathbf v_n$
• $\lambda_1 = 1$ and the rest are less than 1, so for large $k$ we have
• $\mathbf u_k \approx c_1 \mathbf v_1$
• $c_1 \mathbf v_1$ is the steady state
• This an application of the Power Iteration method

Example

Example 1

• $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$
• $\lambda_1 = 1, \lambda_2 = 0.5$
• eigenvectors $\mathbf v_1 = (0.6, 0.4)$ and $(1, -1)$
• let's apply Eigendecomposition of $A$:
• $A = S \Lambda S^{-1}$: $\begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} = \begin{bmatrix} 0.6 & 1 \\ 0.4 & -1 \end{bmatrix} \, \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \, \begin{bmatrix} 1 & 1 \\ 0.4 & -0.6 \end{bmatrix}$

• $A^2$ has the same $S$: $A^2 = S \Lambda S^{-1} \, S \Lambda S^{-1} = S \Lambda^2 S^{-1}$
• $A^k = S \Lambda^k S^{-1}$
• Thus, $A^k = \begin{bmatrix} 0.6 & 1 \\ 0.4 & -1 \end{bmatrix} \, \begin{bmatrix} 1^k & 0 \\ 0 & (0.5)^k \end{bmatrix} \, \begin{bmatrix} 1 & 1 \\ 0.4 & -0.6 \end{bmatrix}$

• as $k$ increases, $0.5^k$ becomes very small, so
• as $k \to \infty$, $A^k \to \begin{bmatrix} 0.6 & 1 \\ 0.4 & -1 \end{bmatrix} \, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \, \begin{bmatrix} 1 & 1 \\ 0.4 & -0.6 \end{bmatrix} = \begin{bmatrix} 0.6 & 0.6 \\ 0.4 & 0.4 \end{bmatrix}$

Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".