Power Iteration
Power Iteration is a Linear Algebra method for approximating the dominant Eigenvalues and Eigenvectors of a matrix
Suppose $A$ is symmetric
 then Eigendecomposition of $A$ is $A = Q \Lambda Q^T$
 and $A^k = Q \Lambda^k Q^T$
 let $\mathbf q_i$ be the columns of $Q$
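We can verify these identities numerically (a quick check; np.linalg.eigh returns the eigenvalues as a vector and the eigenvectors as the columns of $Q$):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric
lam, Q = np.linalg.eigh(A)               # A = Q diag(lam) Q^T
print(np.allclose(A, Q @ np.diag(lam) @ Q.T))        # True
print(np.allclose(np.linalg.matrix_power(A, 5),
                  Q @ np.diag(lam ** 5) @ Q.T))      # True: A^5 = Q Λ^5 Q^T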
Dominant eigenvalues and eigenvectors
 $\lambda_1$ is the dominant eigenvalue of $A$ if $|\lambda_1| > |\lambda_i|$ for all $i = 2, \ ... \ , n$
 The corresponding eigenvector $\mathbf q_1$ is also called dominant
The Power Method
Algorithm
 initial approximation: a random unit vector $x_0$
 $x_1 = A x_0$
 $x_2 = A A x_0 = A^2 x_0$
 $x_3 = A A A x_0 = A^3 x_0$
 ...
 repeat until convergence
For large $k$, we will obtain a good approximation of the dominant eigenvector
Finding the eigenvalue:
 if $v$ is an eigenvector of $A$, then its eigenvalue is
 $\lambda = \cfrac{v^T A v}{v^T v} = \cfrac{(A v)^T v}{v^T v}$
 This is called the Rayleigh Quotient
 proof: suppose $\lambda$ is an eigenvalue of $A$ and $v$ is a corresponding eigenvector
 $\cfrac{(A v)^T v}{v^T v} = \cfrac{\lambda \, v^T v}{v^T v} = \lambda$
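A tiny numeric illustration (the example matrix is mine, not from the text):

import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])
v = np.array([1.0, 0.0])              # eigenvector of A with eigenvalue 2
print(A.dot(v).dot(v) / v.dot(v))     # 2.0, the Rayleigh quotient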
Performance:
 computing $A^k = A \cdot A^{k-1}$ by matrix-matrix products is costly
 go the other direction:
 use the recurrence relation $x_k = A x_{k-1}$, which needs only matrix-vector products
if the power method generates a good approximation of the eigenvector, the approximation of the eigenvalue is also good
Normalization
The values produced by the power method grow quickly (roughly like $\lambda_1^k$):
 better if we scale them down
 we should rescale the values at each iteration
 for example, we can find the largest (in absolute value) component of $x_k$ and divide by it
 more commonly, we just unit-normalize it: $x_k \leftarrow x_k / \|x_k\|$
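The two scalings side by side (a small sketch with an arbitrary example matrix; both keep the direction of $x_k$):

import numpy as np

A = np.array([[4.0, 5.0], [6.0, 5.0]])
x = A.dot(np.array([1.0, 1.0]))
print(x / np.abs(x).max())        # scale by the largest (absolute) component
print(x / np.linalg.norm(x))      # unit-normalize, as in the code below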
Implementation
Python with Numpy:
import numpy as np

def eigenvalue(A, v):
    # Rayleigh quotient for a unit vector v: v^T A v
    Av = A.dot(v)
    return v.dot(Av)

def power_iteration(A):
    n, d = A.shape
    # initial approximation: a unit vector
    v = np.ones(d) / np.sqrt(d)
    ev = eigenvalue(A, v)

    while True:
        Av = A.dot(v)
        v_new = Av / np.linalg.norm(Av)   # unit-normalize at each iteration

        ev_new = eigenvalue(A, v_new)
        if np.abs(ev - ev_new) < 0.01:    # stop when the eigenvalue stabilizes
            break

        v = v_new
        ev = ev_new

    return ev_new, v_new
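Usage on a small example (this matrix has eigenvalues $10$ and $-1$, so convergence is fast):

A = np.array([[4.0, 5.0], [6.0, 5.0]])
ev, v = power_iteration(A)
print(ev)   # ~10, the dominant eigenvalue
print(v)    # approximation of the dominant eigenvector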
Convergence of the Power Method
Theorem:
 if $A$ is diagonalizable and has a dominant eigenvalue,
 then the power iteration sequence $Ax, A^2 x, A^3 x, \ ...$ converges to the dominant eigenvector (up to scaling)
Proof:
 since $A$ is diagonalizable, it has $n$ linearly independent eigenvectors $v_1, ..., v_n$ and corresponding eigenvalues $\lambda_1, ..., \lambda_n$.
 let $v_1$ and $\lambda_1$ be dominant.
 because the eigenvectors are independent, they form a basis
 so any vector $x_0$ can be represented as $x_0 = c_1 v_1 + ... + c_n v_n$
 and $c_1$ must be non-zero - otherwise $x_0$ has no component in the dominant direction, and we need to choose another initial vector
 let's multiply both sides by $A$:
 $A x_0 = c_1 A v_1 + ... + c_n A v_n = c_1 \lambda_1 v_1 + ... + c_n \lambda_n v_n$
 repeat that $k$ times:
 $A^k x_0 = c_1 \lambda_1^k v_1 + ... + c_n \lambda_n^k v_n$
 or $A^k x_0 = \lambda_1^k \left[ c_1 v_1 + c_2 \left(\cfrac{\lambda_2}{\lambda_1} \right)^k v_2 + \ ... \ + c_n \left(\cfrac{\lambda_n}{\lambda_1} \right)^k v_n \right]$
 since $\lambda_1$ is dominant, the ratios $\left(\cfrac{\lambda_i}{\lambda_1} \right)^k \to 0$ as $k \to \infty$ for all $i \geq 2$
 so $A^k x_0 \approx \lambda_1^k c_1 v_1$, and the approximation gets better as $k$ grows
$\square$
From the proof we can also see that if the ratio $|\lambda_2 / \lambda_1|$ is small, the method converges quickly, but if it is close to 1, it takes many iterations
Examples:

A = [4 5; 6 5]
 needs only 6 steps, ratio is 0.1

A = [-4 10; 7 5]
 needs 68 steps, ratio is 0.9
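We can verify the ratios with NumPy (a quick sanity check):

import numpy as np

for A in [np.array([[4.0, 5.0], [6.0, 5.0]]),
          np.array([[-4.0, 10.0], [7.0, 5.0]])]:
    lam = np.linalg.eigvals(A)
    lam = lam[np.argsort(-np.abs(lam))]      # sort by magnitude, descending
    print(lam.real, abs(lam[1] / lam[0]))
# first matrix:  eigenvalues 10 and -1, ratio 0.1
# second matrix: eigenvalues 10 and -9, ratio 0.9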
Inverse Iteration
Inverse iteration - power iteration applied to $A^{-1}$, used to find the smallest (in absolute value) eigenvalue: the dominant eigenvalue of $A^{-1}$ is $1 / \lambda_n$, where $\lambda_n$ is the smallest eigenvalue of $A$
http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf
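A minimal sketch of inverse iteration (reusing eigenvalue from above; it assumes $A$ is invertible and the smallest-magnitude eigenvalue is simple). Instead of forming $A^{-1}$, we solve a linear system at each step:

def inverse_iteration(A, n_iter=100):
    n, d = A.shape
    v = np.ones(d) / np.sqrt(d)
    for _ in range(n_iter):
        w = np.linalg.solve(A, v)     # w = A^{-1} v without computing A^{-1}
        v = w / np.linalg.norm(w)
    # v now approximates the eigenvector of the smallest eigenvalue of A
    return eigenvalue(A, v), v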
Finding Other Eigenvectors
Naive Method
We can just remove the dominant direction from the matrix and repeat
So:
 $A = Q \Lambda Q^T$, so $A = \sum_{i = 1}^n \lambda_i \mathbf q_i \mathbf q_i^T$
 use power iteration to find $\mathbf q_1$ and $\lambda_1$
 then let $A_2 \leftarrow A - \lambda_1 \mathbf q_1 \mathbf q_1^T$
 repeat power iteration on $A_2$ to find $\mathbf q_2$ and $\lambda_2$
 continue like this for $\lambda_3, \ ... \ , \lambda_n$
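A minimal sketch of this deflate-and-repeat loop for symmetric $A$, reusing power_iteration from above:

def top_k_eigenpairs(A, k):
    A = A.astype(float).copy()
    evs, vecs = [], []
    for _ in range(k):
        ev, v = power_iteration(A)
        evs.append(ev)
        vecs.append(v)
        A = A - ev * np.outer(v, v)   # remove the dominant direction
    return np.array(evs), np.array(vecs).T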
Orthogonal Iteration
Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Iteration"
 Instead of multiplying $A$ by just one vector, we multiply by multiple vectors $\{ q_1, \ ... \ , q_r \}$, which we put in a matrix $Q$.
 At each step we renormalize the vectors, e.g. with QR Decomposition
Algorithm:
 choose $Q_0$ such that $Q_0^T Q_0 = I$
 for $k = 1, 2, ...$:
 $Z_k = A Q_{k-1}$
 $Q_k R_k = Z_k$ (QR decomposition)
Let's look at the sequence of $\{ R_k \}$:
 $R_k = Q_k^T Z_k = Q_k^T A Q_{k-1}$
 if $\{ Q_k \}$ converges to some $Q$ then $Q^T A Q = R$ is upper triangular
 This is a Schur Decomposition of $A$
 Thus, the eigenvalues of $A$ are located on the main diagonal of $R$
 And for symmetric $A$, the columns of $Q$ are the eigenvectors
Implementation
def simultaneous_power_iteration(A, k):
    n, m = A.shape
    # start with k random orthonormal columns
    Q = np.random.rand(n, k)
    Q, _ = np.linalg.qr(Q)
    Q_prev = Q

    for i in range(1000):
        Z = A.dot(Q)
        Q, R = np.linalg.qr(Z)   # re-orthonormalize at each step

        err = ((Q - Q_prev) ** 2).sum()
        if i % 10 == 0:
            print(i, err)

        Q_prev = Q
        if err < 1e-3:
            break

    ev = np.diag(R)
    # or: ev = (A.dot(Q) * Q).sum(axis=0)
    return ev, Q
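Usage on a small symmetric matrix (the signs of the columns of Q may differ from other solvers):

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
ev, Q = simultaneous_power_iteration(A, 3)
print(ev)                      # approximate eigenvalues of A
print(np.linalg.eigvalsh(A))   # NumPy's answer (in ascending order)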
Deflation
TODO:
Power method can be modified to approximate other eigenvalues using "deflation".
Deflation - by performing a Householder Transformation; see the sketch below
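A minimal sketch of how one such deflation step could look (my own illustration, assuming a computed dominant eigenpair $(\lambda_1, v_1)$ with $\|v_1\| = 1$): build a Householder reflector $H$ with $H v_1 = \pm e_1$; then $H A H$ has $\lambda_1$ in the top-left corner and zeros below it, and the bottom-right $(n-1) \times (n-1)$ block carries the remaining eigenvalues:

def householder_deflate(A, v):
    # reflector H = I - 2 u u^T / (u^T u) mapping v to a multiple of e1
    u = v.astype(float).copy()
    s = 1.0 if v[0] >= 0 else -1.0    # sign choice avoids cancellation
    u[0] += s * np.linalg.norm(v)
    H = np.eye(len(v)) - 2.0 * np.outer(u, u) / u.dot(u)
    B = H @ A @ H                     # similarity transform: same eigenvalues
    return B[1:, 1:]                  # block with the remaining eigenvalues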
Algorithms based on Power Iteration
Applications
Simultaneous Iteration typically takes a while to converge
 Another, better way is the QR algorithm
Lanczos algorithm is used for tridiagonalization of a matrix $A$
 It's based on the power iteration algorithm
 And can be used to find eigenvalues and eigenvectors
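A minimal sketch of the basic Lanczos iteration for symmetric $A$ (no reorthogonalization, so it is numerically fragile; the eigenvalues of the tridiagonal $T$ approximate the extreme eigenvalues of $A$):

def lanczos(A, k):
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)        # diagonal of T
    beta = np.zeros(k - 1)     # off-diagonal of T
    q = np.random.rand(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = A @ Q[:, j]                     # one power-iteration-style step
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]             # orthogonalize against q_j
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]  # ... and against q_{j-1}
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return T, Q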