ML Wiki

Power Iteration

Power Iteration is a Linear Algebra method for approximating the dominant Eigenvalues and Eigenvectors of a matrix

Suppose $A$ is symmetric

then Eigendecomposition of $A$ is $A = Q \Lambda Q^T$
and $A^k = Q \Lambda^k Q^T$
let $\mathbf q_i$ be the columns of $Q$

Dominant eigenvalues and eigenvectors

$\lambda_1$ is the dominant eigenvalue of $A$ if $|\lambda_1| > |\lambda_i|$ for all $i = 2, ...$
The corresponding eigenvector $\mathbf q_1$ is also called dominant

The Power Method

Algorithm

initial approximation - random unit vector $x_0$
$x_1 = A x_0$
$x_2 = A A x_0 = A^2 x_0$
$x_3 = A A A x_0 = A^3 x_0$
...
until converges

For large powers of $k$, we will obrain a good approximation of the dominant eigenvector

Finding the eigenvalue:

if $v$ is an eigenvector of $A$, then its eigenvalue is
$\lambda = \cfrac{v^T A^T v}{v^T v} = \cfrac{(A v)^T v}{v^T v}$
This is called Rayleigh Quotent
proof: suppose $\lambda$ is eigenvalue of $A$ and $v$ is its eigenvector
$\cfrac{(A v)^T v}{v^T v} = \cfrac{\lambda \, v^T v}{v^T v} = \lambda$

Performance:

computing $A \cdot A^{k-1}$ is costly
go the other direction:
use recurrent relation $x_{k} = A x_{k-1}$

if the power method generates a good approximation of the eigenvector, the approximation of the eigenvalue is also good

Normalization

The values produced by the power method are quite large:

better if we scale them down
should scale the values at each iteartion
for example, can find the largest component of $x_k$ and divide by it
more commonly, we just unit-normalize it

Implementation

Python with Numpy:

def eigenvalue(A, v):
    Av = A.dot(v)
    return v.dot(Av)

def power_iteration(A):
    n, d = A.shape

    v = np.ones(d) / np.sqrt(d)
    ev = eigenvalue(A, v)

    while True:
        Av = A.dot(v)
        v_new = Av / np.linalg.norm(Av)

        ev_new = eigenvalue(A, v_new)
        if np.abs(ev - ev_new) < 0.01:
            break

        v = v_new
        ev = ev_new

   return ev_new, v_new

Convergence of the Power Method

Theorem:

if $A$ is diagonalizable and has dominant eigenvalue,
then power iteration sequence $Ax, A^2 x, A^3 x, ...$ converges to the dominant eigenvector (scaled)

Proof:

since $A$ is diagonalizable, it has $n$ linearly independent eigenvectors $v_1, ..., v_n$ and corresponding eigenvalues $\lambda_1, ..., \lambda_n$.
let $v_1$ and $\lambda_1$ be dominant.
because the eigenvectors are independent, they form a basis
- so any vector $x_0$ can be represented as $x_0 = c_1 v_1 + ... + c_n v_n$
- and $c_1$ must be non zero - otherwise $x_0$ is not a good choice, and we need to choose another initial vector
let's multiply both sides by $A$:
- $A x_0 = c_1 A v_1 + ... + c_n A v_n = c_1 \lambda_1 v_1 + ... + c_n \lambda_n v_n$
repeat that $k$ times:
- $A^k x_0 = c_1 \lambda_1^k v_1 + ... + c_n \lambda_n^k v_n$
- or $A^k x_0 = \lambda_1^k \left[ c_1 v_1 + c_2 \left(\cfrac{\lambda_2}{\lambda_1} \right)^k v_2 ... + c_n \left(\cfrac{\lambda_n}{\lambda_1} \right)^k v_n \right]$
since $\lambda_1$ is dominating, the ratios $\left(\cfrac{\lambda_i}{\lambda_1} \right)^k \to 0$ as $k \to \infty$ for all $i$
so $A^k x_0 = \lambda_1^k c_1 v_1$ and it gets better as $k$ grows

$\square$

From the proof we can also see that if the ratio $\lambda_2 / \lambda_1$ is small, it will converge quickly, but if it's not - it will take many iterations

Examples:

A = [4 5; 6 5] - needs only 6 steps, ratio is 0.1
A = [-4 10; 7 5] - needs 68 steps, ratio is 0.9

Inverse Iteration

Inverse iteration - to find the smallest eigenvalue http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf

Finding Other Eigenvectors

Naive Method

We can just remove the dominant direction from the matrix and repeat

So:

$A = Q \Lambda Q^T$, so $A = \sum_{i = 1}^n \lambda_i \mathbf q_i \mathbf q_i^T$
use power iteration to find $\mathbf q_1$ and $\lambda_1$
then let $A_2 \leftarrow A - \lambda_1 \mathbf q_1 \mathbf q_1^T$
repeat power iteration on $A_2$ to find $\mathbf q_2$ and $\lambda_2$
continue like this for $\lambda_3, \ ... \ , \lambda_n$

Orthogonal Iteration

Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Interation"

Instead of multiplying $A$ by just one vector, we multiply by multiple vectors $\{ q_1, ... q_r \}$, which we put in a matrix $Q$.
At each step we re-normalize the vectors, e.g. with QR Decomposition

Algorithm:

choose $Q_0$ such that $Q_0^T Q_0 = I$
for $k = 1, 2, ...$:
- $Z_k = A Q_{k-1}$
- $Q_k R_k = Z_k$ (QR decomposition)

Let's look at the sequence of $\{ R_k \}$:

$R_k = Q_k^T Z_k = Q_k^T A Q_{k - 1}$
if $\{ Q_k \}$ converges to some $Q$ then $Q^T A Q = R$ is upper triangular
This is a Schur Decomposition of $A$
Thus, the eigenvalues of $A$ are located on the main diagonal of $R$
And the columns of $Q$ are the eigenvectors

Implementation

def simultaneous_power_iteration(A, k):
    n, m = A.shape
    Q = np.random.rand(n, k)
    Q, _ = np.linalg.qr(Q)
    Q_prev = Q
 
   for i in range(1000):
        Z = A.dot(Q)
        Q, R = np.linalg.qr(Z)

        err = ((Q - Q_prev) ** 2).sum()
        if i % 10 == 0:
            print(i, err)

        Q_prev = Q
        if err < 1e-3:
            break

    ev = np.diag(R)
    # or: ev = (A.dot(Q) * Q).sum(axis=0)
    return ev, Q

Deflation

TODO: Power method can be modified to approximate other eigenvalues using "deflation". Deflation - by performing a Householder Tranformation

Algorithms based on Power Iteration

Applications

QR Algorithm

Simultaneous Iteration typically takes a while to converge

Another, better way to do it is QR algorithm.

Lanczos Algorithm

Lanczos algorithm is used for tridiagonalization of a matrix $A$

It's based on the power iteration algorithm
And can be used to find eigenvalues and eigenvectors

Sources

Machine Learning 1 (TUB)
Hopcroft, John, and Ravindran Kannan. "Foundations of Data Science1." (2014) [1]
https://www2.math.ethz.ch/education/bachelor/seminars/fs2008/nas/grasic.pdf
http://college.cengage.com/mathematics/larson/elementary_linear/5e/students/ch08-10/chap_10_3.pdf
http://www.math.usm.edu/lambers/mat610/sum10/lecture14.pdf

ML Wiki

Power Iteration
Revision as of 17:46, 22 June 2017 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Power Iteration

The Power Method

Normalization

Implementation

Convergence of the Power Method

Inverse Iteration

Finding Other Eigenvectors

Naive Method

Orthogonal Iteration

Implementation

Deflation

Algorithms based on Power Iteration

Applications

QR Algorithm

Lanczos Algorithm

Sources

Power IterationRevision as of 17:46, 22 June 2017 by Alexey (Talk | contribs) (diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Power Iteration

The Power Method

Normalization

Implementation

Convergence of the Power Method

Inverse Iteration

Finding Other Eigenvectors

Naive Method

Orthogonal Iteration

Implementation

Deflation

Algorithms based on Power Iteration

Applications

QR Algorithm

Lanczos Algorithm

Sources

Power Iteration
Revision as of 17:46, 22 June 2017 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)