Power Iteration
Power Iteration is a Linear Algebra method for approximating the dominant eigenvalue and eigenvector of a matrix
Suppose $A$ is symmetric
 then the Eigendecomposition of $A$ is $A = Q \Lambda Q^T$
 and $A^k = Q \Lambda^k Q^T$
 let $\mathbf q_i$ be the columns of $Q$
Dominant eigenvalues and eigenvectors
 $\lambda_1$ is the dominant eigenvalue of $A$ if $|\lambda_1| > |\lambda_i|$ for all $i = 2, ..., n$
 The corresponding eigenvector $\mathbf q_1$ is also called dominant
The Power Method
Algorithm
 initial approximation: a random unit vector $x_0$
 $x_1 = A x_0$
 $x_2 = A A x_0 = A^2 x_0$
 $x_3 = A A A x_0 = A^3 x_0$
 ...
 repeat until it converges
For large $k$, we will obtain a good approximation of the dominant eigenvector
Finding the eigenvalue:
 if $v$ is an eigenvector of $A$, then its eigenvalue is
 $\lambda = \cfrac{v^T A^T v}{v^T v} = \cfrac{(A v)^T v}{v^T v}$
 This is called the Rayleigh Quotient
 proof: suppose $\lambda$ is eigenvalue of $A$ and $v$ is its eigenvector
 $\cfrac{(A v)^T v}{v^T v} = \cfrac{\lambda \, v^T v}{v^T v} = \lambda$
Performance:
 computing $A \cdot A^{k-1}$ (a matrix-matrix product) is costly
 go the other direction:
 use the recurrence relation $x_k = A x_{k-1}$, which costs only one matrix-vector product per step
if the power method generates a good approximation of the eigenvector, the approximation of the eigenvalue is also good
Normalization
The values produced by the power method grow like $\lambda_1^k$, so they quickly become very large:
 better if we scale them down
 we should scale the values at each iteration
 for example, we can find the largest component of $x_k$ and divide by it
 more commonly, we just unit-normalize it: $x_k \leftarrow x_k / \| x_k \|$ (both options are sketched below)
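A minimal illustration of the two scaling options on a hypothetical vector x:

import numpy as np

x = np.array([3.0, -4.0])

x_scaled = x / np.abs(x).max()    # divide by the largest (in magnitude) component
x_unit = x / np.linalg.norm(x)    # unit-normalize: divide by the Euclidean norm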
Implementation
Python with Numpy:
import numpy as np

def eigenvalue(A, v):
    # Rayleigh quotient for a unit vector v: v^T (A v)
    Av = A.dot(v)
    return v.dot(Av)

def power_iteration(A):
    n, d = A.shape

    # initial approximation: a unit vector
    v = np.ones(d) / np.sqrt(d)
    ev = eigenvalue(A, v)

    while True:
        Av = A.dot(v)
        v_new = Av / np.linalg.norm(Av)

        ev_new = eigenvalue(A, v_new)
        if np.abs(ev - ev_new) < 0.01:
            break

        v = v_new
        ev = ev_new

    return ev_new, v_new
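A quick usage check, continuing with the definitions above (the matrix is the first example from the convergence section below; its dominant eigenvalue is 10, with eigenvector proportional to $(5, 6)$):

A = np.array([[4.0, 5.0],
              [6.0, 5.0]])

ev, v = power_iteration(A)
print(ev)   # approximately 10 (within the 0.01 tolerance)
print(v)    # approximately (5, 6) normalized, i.e. about (0.64, 0.77)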
Convergence of the Power Method
Theorem:
 if $A$ is diagonalizable and has a dominant eigenvalue,
 then the power iteration sequence $Ax, A^2 x, A^3 x, ...$ converges to the dominant eigenvector (up to scaling)
Proof:
 since $A$ is diagonalizable, it has $n$ linearly independent eigenvectors $v_1, ..., v_n$ and corresponding eigenvalues $\lambda_1, ..., \lambda_n$.
 let $v_1$ and $\lambda_1$ be dominant.
 because the eigenvectors are independent, they form a basis
 so any vector $x_0$ can be represented as $x_0 = c_1 v_1 + ... + c_n v_n$
 and $c_1$ must be non-zero, otherwise $x_0$ has no component in the dominant direction and we need to choose another initial vector
 let's multiply both sides by $A$:
 $A x_0 = c_1 A v_1 + ... + c_n A v_n = c_1 \lambda_1 v_1 + ... + c_n \lambda_n v_n$
 repeat that $k$ times:
 $A^k x_0 = c_1 \lambda_1^k v_1 + ... + c_n \lambda_n^k v_n$
 or $A^k x_0 = \lambda_1^k \left[ c_1 v_1 + c_2 \left(\cfrac{\lambda_2}{\lambda_1} \right)^k v_2 + \ ... \ + c_n \left(\cfrac{\lambda_n}{\lambda_1} \right)^k v_n \right]$
 since $\lambda_1$ is dominant, the ratios $\left(\cfrac{\lambda_i}{\lambda_1} \right)^k \to 0$ as $k \to \infty$ for all $i \geq 2$
 so $A^k x_0 \approx \lambda_1^k c_1 v_1$, and the approximation gets better as $k$ grows
$\square$
From the proof we can also see that the speed of convergence is governed by the ratio $|\lambda_2 / \lambda_1|$: if it is small, the method converges quickly, but if it is close to 1, it takes many iterations
Examples:

A = [4 5; 6 5]
 needs only 6 steps; the ratio $|\lambda_2 / \lambda_1|$ is 0.1

A = [-4 10; 7 5]
 needs 68 steps; the ratio is 0.9
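A rough empirical check of these claims (a sketch; the exact step counts depend on the stopping rule, the tolerance, and the starting vector, so they may not reproduce 6 and 68 exactly, but the large gap between the two matrices should be visible):

import numpy as np

def count_steps(A, tol=1e-6, max_iter=10000):
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for k in range(1, max_iter + 1):
        v_new = A.dot(v)
        v_new = v_new / np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            return k
        v = v_new
    return max_iter

print(count_steps(np.array([[4.0, 5.0], [6.0, 5.0]])))    # fast: ratio 0.1
print(count_steps(np.array([[-4.0, 10.0], [7.0, 5.0]])))  # slow: ratio 0.9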
Finding Other Eigenvectors
Naive Method
We can just remove the dominant direction from the matrix and repeat; this is sometimes called deflation
So:
 $A = Q \Lambda Q^T$, so $A = \sum_{i = 1}^n \lambda_i \mathbf q_i \mathbf q_i^T$
 use power iteration to find $\mathbf q_1$ and $\lambda_1$
 then let $A_2 \leftarrow A - \lambda_1 \mathbf q_1 \mathbf q_1^T$
 repeat power iteration on $A_2$ to find $\mathbf q_2$ and $\lambda_2$
 continue like this for $\lambda_3, \ ... \ , \lambda_n$ (see the sketch below)
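A minimal sketch of this deflation idea, reusing the power_iteration function from the implementation above (assumes $A$ is symmetric; note that approximation errors accumulate from one deflation step to the next):

import numpy as np

def top_k_eigen(A, k):
    A_i = A.copy()
    eigenvalues = []
    eigenvectors = []
    for _ in range(k):
        lam, q = power_iteration(A_i)
        eigenvalues.append(lam)
        eigenvectors.append(q)
        # remove the found direction: A_{i+1} = A_i - lambda_i * q_i q_i^T
        A_i = A_i - lam * np.outer(q, q)
    return np.array(eigenvalues), np.array(eigenvectors).T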
Inverse Iteration
Inverse iteration is the power method applied to $A^{-1}$: its dominant eigenvalue is $1 / \lambda_n$, so the iteration finds the smallest (in magnitude) eigenvalue of $A$ (see the sketch below)
http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf
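A minimal sketch of inverse iteration (assuming $A$ is invertible; a fixed iteration count is used instead of a convergence test to keep it short):

import numpy as np

def inverse_iteration(A, num_iter=100):
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n)
    for _ in range(num_iter):
        # one power-method step with A^{-1}: solve A w = v
        # instead of explicitly computing the inverse
        w = np.linalg.solve(A, v)
        v = w / np.linalg.norm(w)
    # the Rayleigh quotient gives the smallest (in magnitude) eigenvalue of A
    return v.dot(A.dot(v)), v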
Shifts
We can focus on any other eigenvalue by shifting the matrix
 Shifting is the procedure of adding some value $s$ to the main diagonal: $A + s I$
 let $\mu$ be the smallest (in magnitude) eigenvalue of the shifted matrix $A + s I$
 i.e. $(A + s I) \mathbf x = \mu \mathbf x$
 or $(A + s I) \mathbf x - \mu \mathbf x = (A + s I - \mu I) \mathbf x = 0$
 so the matrix $A + s I - \mu I$ is singular
 since $A + s I - \mu I$ is singular, $\lambda = \mu - s$ is an eigenvalue of $A$, and it's the one closest to $-s$
 So shifting allows us to focus on the eigenvalues closest to any particular value
Eigenvectors are not affected by shifts:
 suppose $A \mathbf x = \lambda \mathbf x$
 let's shift by $m$: $(A + m I) \mathbf x = A \mathbf x + m \mathbf x = \lambda \mathbf x + m \mathbf x = (\lambda + m) \mathbf x$
 for the shifted matrix, the eigenvalue is $\lambda + m$, and the eigenvector stays the same (see the shifted inverse iteration sketch below)
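Combining shifts with inverse iteration gives shifted inverse iteration. A sketch below; here the shift is subtracted ($A - s I$) so that the method converges to the eigenvalue of $A$ closest to the target $s$ (it breaks down if $s$ is exactly an eigenvalue, since then $A - s I$ is singular):

import numpy as np

def shifted_inverse_iteration(A, s, num_iter=100):
    n = A.shape[0]
    B = A - s * np.eye(n)          # shifted matrix
    v = np.ones(n) / np.sqrt(n)
    for _ in range(num_iter):
        w = np.linalg.solve(B, v)  # inverse iteration step with B
        v = w / np.linalg.norm(w)
    # Rayleigh quotient with the original A recovers the eigenvalue near s
    return v.dot(A.dot(v)), v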
Orthogonal Iteration
Idea:
 for symmetric $A$, the other eigenvectors are orthogonal to the dominant one
 so we can use the power method, and force the second vector to be orthogonal to the first one
 this way we guarantee that they will converge to two different eigenvectors
 we can do this for many vectors, not just two
 this is called "Orthogonal Iteration"
Orthogonal Iteration
 Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Iteration"
 Instead of multiplying $A$ by just one vector, we multiply by multiple vectors $\{ q_1, \ ... \ , q_r \}$, which we put in a matrix $Q$.
 At each step we renormalize the vectors, e.g. with QR Decomposition
Algorithm:
 choose $Q_0$ such that $Q_0^T Q_0 = I$
 for $k = 1, 2, ...$:
 $Z_k = A Q_{k-1}$
 $Q_k R_k = Z_k$ (QR decomposition)
Let's look at the sequence of $\{ R_k \}$:
 $R_k = Q_k^T Z_k = Q_k^T A Q_{k-1}$
 if $\{ Q_k \}$ converges to some $Q$ then $Q^T A Q = R$ is upper triangular
 This is a Schur Decomposition of $A$
 Thus, the eigenvalues of $A$ are located on the main diagonal of $R$
 And the columns of $Q$ are the corresponding eigenvectors (for symmetric $A$)
Implementation
Python with numpy:
import numpy as np

def simultaneous_power_iteration(A, k):
    n, m = A.shape

    # start from a random orthonormal set of k vectors
    Q = np.random.rand(n, k)
    Q, _ = np.linalg.qr(Q)
    Q_prev = Q

    for i in range(1000):
        Z = A.dot(Q)
        Q, R = np.linalg.qr(Z)

        # can use other stopping criteria as well
        err = ((Q - Q_prev) ** 2).sum()
        if i % 10 == 0:
            print(i, err)

        Q_prev = Q
        if err < 1e-3:
            break

    return np.diag(R), Q
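A quick usage check on a random positive-definite matrix (chosen so that all eigenvalues are positive and the sign ambiguity of the QR factors is not an issue):

np.random.seed(0)
X = np.random.rand(5, 5)
A = X.dot(X.T)                 # symmetric positive (semi-)definite

eigvals, Q = simultaneous_power_iteration(A, 5)
print(np.sort(eigvals))        # approximated eigenvalues, ascending
print(np.linalg.eigh(A)[0])    # reference eigenvalues, also ascending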
Algorithms based on Power Iteration
Simultaneous Iteration typically takes a while to converge
 Another, better way to do it is the QR algorithm (a minimal version is sketched below)
 It's based on the Simultaneous Iteration algorithm
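A minimal sketch of the (unshifted) QR algorithm: factor $A_k = Q_k R_k$, then form $A_{k+1} = R_k Q_k$; each $A_{k+1}$ is similar to $A_k$, and under suitable conditions the iterates approach a triangular matrix with the eigenvalues on the diagonal (practical versions add shifts and deflation):

import numpy as np

def qr_algorithm(A, num_iter=500):
    A_k = A.copy()
    for _ in range(num_iter):
        Q, R = np.linalg.qr(A_k)
        A_k = R.dot(Q)    # R Q = Q^T A_k Q, so the eigenvalues are preserved
    return np.diag(A_k)   # approximate eigenvalues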
Lanczos algorithm is used for tridiagonalization of a matrix $A$
 It's based on the power iteration algorithm
 And can be used to find eigenvalues and eigenvectors (the basic recurrence is sketched below)
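A minimal sketch of the Lanczos three-term recurrence for symmetric $A$ (no reorthogonalization and no breakdown handling, so it is numerically fragile; the eigenvalues of the small tridiagonal $T$ approximate the extreme eigenvalues of $A$):

import numpy as np

def lanczos(A, k):
    n = A.shape[0]
    Q = np.zeros((n, k))      # orthonormal basis of the Krylov subspace
    alpha = np.zeros(k)       # diagonal of T
    beta = np.zeros(k - 1)    # off-diagonal of T

    q = np.random.rand(n)
    Q[:, 0] = q / np.linalg.norm(q)

    for j in range(k):
        w = A.dot(Q[:, j])                     # one power-iteration-style step
        alpha[j] = Q[:, j].dot(w)
        w = w - alpha[j] * Q[:, j]             # orthogonalize against q_j
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]  # ... and against q_{j-1}
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]

    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return T, Q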