Power Iteration

Power Iteration is a Linear Algebra method for approximating the dominant Eigenvalues and Eigenvectors of a matrix

Suppose $A$ is symmetric

  • then the Eigendecomposition of $A$ is $A = Q \Lambda Q^T$
  • and $A^k = Q \Lambda^k Q^T$
  • let $\mathbf q_i$ be the columns of $Q$

Dominant eigenvalues and eigenvectors

  • $\lambda_1$ is the dominant eigenvalue of $A$ if $|\lambda_1| > |\lambda_i|$ for all $i = 2, \ldots, n$
  • The corresponding eigenvector $\mathbf q_1$ is also called dominant


The Power Method

Algorithm

  • initial approximation - random unit vector $x_0$
  • $x_1 = A x_0$
  • $x_2 = A A x_0 = A^2 x_0$
  • $x_3 = A A A x_0 = A^3 x_0$
  • ...
  • repeat until the sequence converges

For large $k$, we obtain a good approximation of the dominant eigenvector


Finding the eigenvalue:

  • if $v$ is an eigenvector of $A$, then its eigenvalue is
  • $\lambda = \cfrac{v^T A^T v}{v^T v} = \cfrac{(A v)^T v}{v^T v}$
  • This is called the Rayleigh Quotient
  • proof: suppose $\lambda$ is an eigenvalue of $A$ and $v$ is the corresponding eigenvector
  • $\cfrac{(A v)^T v}{v^T v} = \cfrac{\lambda \, v^T v}{v^T v} = \lambda$


Performance:

  • computing the matrix power $A^k = A \cdot A^{k-1}$ directly is costly: each step is a matrix-matrix product
  • go the other direction:
  • use the recurrence $x_{k} = A x_{k-1}$, which only needs a matrix-vector product at each step


If the power method generates a good approximation of the eigenvector, the approximation of the eigenvalue (via the Rayleigh Quotient) is also good


Normalization

The norms of the vectors produced by the power method quickly become very large (or very small):

  • better if we scale them down
  • we should scale the values at each iteration
  • for example, we can find the largest component of $x_k$ and divide by it
  • more commonly, we just unit-normalize it


Implementation

Python with Numpy:

import numpy as np

def eigenvalue(A, v):
    # Rayleigh quotient for a unit vector v: lambda = v^T A v
    Av = A.dot(v)
    return v.dot(Av)

def power_iteration(A):
    n, d = A.shape

    # initial approximation: a unit-norm vector
    v = np.ones(d) / np.sqrt(d)
    ev = eigenvalue(A, v)

    while True:
        # one power step, then re-normalize
        Av = A.dot(v)
        v_new = Av / np.linalg.norm(Av)

        # stop when the eigenvalue estimate no longer changes much
        ev_new = eigenvalue(A, v_new)
        if np.abs(ev - ev_new) < 0.01:
            break

        v = v_new
        ev = ev_new

    return ev_new, v_new
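
For example, a quick sanity check on a small symmetric matrix (the matrix and the comparison with np.linalg.eigh are just an illustration):

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

ev, v = power_iteration(A)
print(ev, v)                # dominant eigenvalue is (5 + sqrt(5)) / 2, about 3.62
print(np.linalg.eigh(A))    # numpy's exact answer, for comparison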


Convergence of the Power Method

Theorem:

  • if $A$ is diagonalizable and has a dominant eigenvalue,
  • then the power iteration sequence $Ax, A^2 x, A^3 x, \ldots$ converges to the dominant eigenvector (up to scaling)

Proof:

  • since $A$ is diagonalizable, it has $n$ linearly independent eigenvectors $v_1, ..., v_n$ and corresponding eigenvalues $\lambda_1, ..., \lambda_n$.
  • let $v_1$ and $\lambda_1$ be dominant.
  • because the eigenvectors are independent, they form a basis
    • so any vector $x_0$ can be represented as $x_0 = c_1 v_1 + ... + c_n v_n$
    • and $c_1$ must be non-zero; otherwise $x_0$ is a bad choice, and we need to pick another initial vector
  • let's multiply both sides by $A$:
    • $A x_0 = c_1 A v_1 + ... + c_n A v_n = c_1 \lambda_1 v_1 + ... + c_n \lambda_n v_n$
  • repeat that $k$ times:
    • $A^k x_0 = c_1 \lambda_1^k v_1 + ... + c_n \lambda_n^k v_n$
    • or $A^k x_0 = \lambda_1^k \left[ c_1 v_1 + c_2 \left(\cfrac{\lambda_2}{\lambda_1} \right)^k v_2 + \ldots + c_n \left(\cfrac{\lambda_n}{\lambda_1} \right)^k v_n \right]$
  • since $\lambda_1$ is dominant, the ratios $\left(\cfrac{\lambda_i}{\lambda_1} \right)^k \to 0$ as $k \to \infty$ for all $i \geq 2$
  • so $A^k x_0 \approx \lambda_1^k c_1 v_1$, and the approximation gets better as $k$ grows

$\square$


From the proof we can also see that if the ratio $|\lambda_2 / \lambda_1|$ is small, the method converges quickly, but if the ratio is close to 1, it takes many iterations

Examples:

  • $A = \begin{bmatrix} 4 & 5 \\ 6 & 5 \end{bmatrix}$ - needs only 6 steps, the ratio is $0.1$
  • $A = \begin{bmatrix} -4 & 10 \\ 7 & 5 \end{bmatrix}$ - needs 68 steps, the ratio is $0.9$
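
We can verify the ratios with numpy (a quick check; assumes numpy is imported as np):

A1 = np.array([[4.0, 5.0], [6.0, 5.0]])    # eigenvalues 10 and -1
A2 = np.array([[-4.0, 10.0], [7.0, 5.0]])  # eigenvalues 10 and -9

for A in [A1, A2]:
    lam = np.linalg.eigvals(A)
    lam = lam[np.argsort(-np.abs(lam))]    # sort by magnitude, descending
    print(lam, 'ratio:', np.abs(lam[1] / lam[0]))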


Finding Other Eigenvectors

Naive Method

We can just remove the dominant direction from the matrix and repeat; this procedure is known as deflation

So:

  • $A = Q \Lambda Q^T$, so $A = \sum_{i = 1}^n \lambda_i \mathbf q_i \mathbf q_i^T$
  • use power iteration to find $\mathbf q_1$ and $\lambda_1$
  • then let $A_2 \leftarrow A - \lambda_1 \mathbf q_1 \mathbf q_1^T$
  • repeat power iteration on $A_2$ to find $\mathbf q_2$ and $\lambda_2$
  • continue like this for $\lambda_3, \ ... \ , \lambda_n$
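
A minimal sketch of this procedure, reusing power_iteration from the implementation above (assumes $A$ is symmetric; the function name top_k_eigen is just for illustration):

def top_k_eigen(A, k):
    # find the k largest-magnitude eigenpairs by repeated power iteration + deflation
    A = A.copy()
    eigenvalues = []
    eigenvectors = []

    for _ in range(k):
        ev, v = power_iteration(A)
        eigenvalues.append(ev)
        eigenvectors.append(v)
        A = A - ev * np.outer(v, v)    # A_{i+1} = A_i - lambda_i q_i q_i^T

    return np.array(eigenvalues), np.array(eigenvectors).T   # eigenvectors as columns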


Inverse Iteration

Inverse iteration is the power method applied to $A^{-1}$: the dominant eigenvalue of $A^{-1}$ is $1 / \lambda_n$, where $\lambda_n$ is the eigenvalue of $A$ with the smallest magnitude, so inverse iteration finds the smallest eigenvalue of $A$. See http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf
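
A minimal sketch (an illustration, not a production implementation; assumes $A$ is invertible): instead of forming $A^{-1}$ explicitly, we solve a linear system at each step.

def inverse_iteration(A, n_iter=100):
    # power method on A^{-1}: converges to the eigenvector of A
    # whose eigenvalue has the smallest magnitude
    n, d = A.shape
    v = np.ones(d) / np.sqrt(d)

    for _ in range(n_iter):
        w = np.linalg.solve(A, v)      # w = A^{-1} v, without inverting A
        v = w / np.linalg.norm(w)

    return eigenvalue(A, v), v         # Rayleigh quotient w.r.t. A gives the eigenvalue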


Shifts

We can focus on any other eigenvalue by shifting our matrix

  • Shifting is the procedure of adding some value $s$ to the main diagonal: $A + s I$
  • let $\mu$ be the smallest (in magnitude) eigenvalue of the shifted matrix $A + s I$ - we can find it with inverse iteration
    • i.e. $(A + s I) \mathbf x = \mu \mathbf x$
    • or $(A + s I) \mathbf x - \mu \mathbf x = (A + s I - \mu I) \mathbf x = 0$
    • so the matrix $A + s I - \mu I = A - (\mu - s) I$ is singular
  • since $A - (\mu - s) I$ is singular, $\lambda = \mu - s$ is an eigenvalue of $A$, and because $|\mu|$ is the smallest, $\lambda$ is the eigenvalue of $A$ closest to $-s$
  • so, to find the eigenvalue of $A$ closest to some target value $t$, we shift by $s = -t$ (i.e. work with $A - t I$) and apply inverse iteration
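
Combining shifts with inverse iteration gives shift-and-invert iteration; a sketch along the same lines as above (the name shift_invert_iteration and the target parameter t are illustrative):

def shift_invert_iteration(A, t, n_iter=100):
    # converges to the eigenpair of A whose eigenvalue is closest to t
    # (assumes t is not exactly an eigenvalue, otherwise A - t I is singular)
    n, d = A.shape
    B = A - t * np.eye(d)
    v = np.ones(d) / np.sqrt(d)

    for _ in range(n_iter):
        w = np.linalg.solve(B, v)      # power step with (A - t I)^{-1}
        v = w / np.linalg.norm(w)

    return eigenvalue(A, v), v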


Eigenvectors are not affected by shifts:

  • suppose $A \mathbf x = \lambda \mathbf x$
  • let's shift by $m$: $(A + m I) \mathbf x = A \mathbf x + m \mathbf x = \lambda \mathbf x + m \mathbf x = (\lambda + m) \mathbf x$
  • for the shifted $A$, the eigenvalue is $\lambda + m$, and the eigenvector stays the same


Orthogonal Iteration

Idea:

  • we know that other eigenvectors are orthogonal to the dominant one
  • so we can use the power method, and force the second vector to be orthogonal to the first one at each step
  • this way we guarantee that they will converge to two different eigenvectors
  • we can do this for many vectors, not just two
  • this is called "Orthogonal Iteration"

Orthogonal Iteration

  • Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Iteration"
  • Instead of multiplying $A$ by just one vector, we multiply it by multiple vectors $\{ q_1, \ldots, q_r \}$, which we put in a matrix $Q$.
  • At each step we re-normalize the vectors, e.g. with QR Decomposition


Algorithm:

  • choose $Q_0$ such that $Q_0^T Q_0 = I$
  • for $k = 1, 2, ...$:
    • $Z_k = A Q_{k-1}$
    • $Q_k R_k = Z_k$ (QR decomposition)


Let's look at the sequence of $\{ R_k \}$:

  • $R_k = Q_k^T Z_k = Q_k^T A Q_{k - 1}$
  • if $\{ Q_k \}$ converges to some $Q$ then $Q^T A Q = R$ is upper triangular
  • This gives a Schur Decomposition of $A$: $A = Q R Q^T$
  • Thus, the eigenvalues of $A$ are located on the main diagonal of $R$
  • For symmetric $A$, $R$ is diagonal and the columns of $Q$ are the eigenvectors (in general they are Schur vectors, not eigenvectors)


Implementation

Python with numpy:

def simultaneous_power_iteration(A, k):
    # power iteration on a block of k vectors, re-orthonormalized
    # with a QR decomposition at each step
    n, m = A.shape
    Q = np.random.rand(n, k)
    Q, _ = np.linalg.qr(Q)
    Q_prev = Q

    for i in range(1000):
        Z = A.dot(Q)
        Q, R = np.linalg.qr(Z)

        # can use other stopping criteria as well
        err = ((Q - Q_prev) ** 2).sum()
        if i % 10 == 0:
            print(i, err)

        Q_prev = Q
        if err < 1e-3:
            break

    return np.diag(R), Q
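
For example, on a small symmetric matrix (again just an illustration; the recovered eigenvectors are determined only up to sign):

A = np.array([[6.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 1.0]])

eigenvalues, Q = simultaneous_power_iteration(A, k=3)
print(eigenvalues)               # sorted by decreasing magnitude
print(np.linalg.eigvalsh(A))     # numpy's answer, sorted in ascending order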


Algorithms based on Power Iteration

QR Algorithm

Simultaneous Iteration typically takes a while to converge

  • Another, better way to do it is the QR algorithm
  • It's based on the Simultaneous Iteration algorithm
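
A minimal sketch of the unshifted QR algorithm (practical implementations first reduce $A$ to Hessenberg form and add shifts to speed up convergence):

def qr_algorithm(A, n_iter=500):
    # A_{k+1} = R_k Q_k = Q_k^T A_k Q_k, so every iterate is similar to A
    Ak = A.copy()
    for _ in range(n_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R.dot(Q)
    return np.diag(Ak)             # eigenvalues on the diagonal (for real eigenvalues)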


Lanczos Algorithm

Lanczos algorithm is used for tridiagonalization of a symmetric matrix $A$

  • It's based on the power iteration algorithm
  • and can be used to find (extremal) eigenvalues and eigenvectors
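
A minimal sketch of the Lanczos iteration for symmetric $A$ (without the reorthogonalization that practical implementations need for numerical stability):

def lanczos(A, m):
    # build an n x m orthonormal basis Q and an m x m tridiagonal T with Q^T A Q = T;
    # the eigenvalues of T approximate the extremal eigenvalues of A
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)

    q = np.random.rand(n)
    q = q / np.linalg.norm(q)
    q_prev = np.zeros(n)
    b = 0.0

    for j in range(m):
        Q[:, j] = q
        w = A.dot(q) - b * q_prev      # one power step, minus the previous direction
        alpha[j] = q.dot(w)
        w = w - alpha[j] * q
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:                  # found an invariant subspace, stop early
            break
        q_prev = q
        q = w / b

    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return T, Q

The eigenvalues of $T$ (e.g. np.linalg.eigvalsh(T)) then approximate the largest and smallest eigenvalues of $A$.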


Applications


Sources