Examples:
* <code>A = [4 5; 6 5]</code> - needs only 6 steps, ratio is 0.1
* <code>A = [-4 10; 7 5]</code> - needs 68 steps, ratio is 0.9
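This difference is easy to check numerically. A small sketch (the helper <code>power_steps</code> is made up for this example; the exact step counts depend on the starting vector and tolerance, so only the relative speed matters):

```python
import numpy as np

def power_steps(A, tol=0.01, max_iter=10000):
    # count power-iteration steps until the eigenvalue estimate stabilizes
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n)
    ev = v @ A @ v
    for step in range(1, max_iter + 1):
        Av = A @ v
        v = Av / np.linalg.norm(Av)
        ev_new = v @ A @ v
        if abs(ev - ev_new) < tol:
            return step
        ev = ev_new
    return max_iter

fast = power_steps(np.array([[4.0, 5.0], [6.0, 5.0]]))    # eigenvalues 10 and -1, ratio 0.1
slow = power_steps(np.array([[-4.0, 10.0], [7.0, 5.0]]))  # eigenvalues 10 and -9, ratio 0.9
print(fast, slow)
```

The first matrix stabilizes almost immediately, while the second needs many more iterations.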
To find the other eigenvalues, we can remove the dominant direction from the matrix and repeat (this is called ''deflation''):
* for symmetric $A$, let $A_2 = A - \lambda_1 \mathbf q_1 \mathbf q_1^T$, which removes the dominant direction
* repeat power iteration on $A_2$ to find $\mathbf q_2$ and $\lambda_2$
* continue like this for $\lambda_3, \ ... \ , \lambda_n$
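One standard deflation step for symmetric $A$ is $A_2 = A - \lambda_1 \mathbf q_1 \mathbf q_1^T$. A small sketch (the matrix and iteration counts are made up for the example):

```python
import numpy as np

def dominant_pair(A, n_steps=500):
    # plain power iteration: dominant eigenvalue and eigenvector
    v = np.random.rand(A.shape[0])
    for _ in range(n_steps):
        v = A @ v
        v = v / np.linalg.norm(v)
    return v @ A @ v, v

np.random.seed(0)
A = np.array([[2.0, 1.0], [1.0, 2.0]])  # symmetric, eigenvalues 3 and 1

l1, q1 = dominant_pair(A)
A2 = A - l1 * np.outer(q1, q1)  # deflation: remove the dominant direction
l2, q2 = dominant_pair(A2)
print(l1, l2)
```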
=== Inverse Iteration ===
Inverse iteration is used to find the smallest (in absolute value) eigenvalue of $A$:
* it is the power method applied to $A^{-1}$: the eigenvalues of $A^{-1}$ are $1 / \lambda_i$, so the smallest eigenvalue of $A$ becomes the dominant one
* in practice, instead of inverting $A$, at each step we solve the system $A \mathbf x_{k+1} = \mathbf x_k$
* see http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf
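The notes linked above cover the details; a minimal numpy sketch of the idea (each step solves $A \mathbf x_{k+1} = \mathbf x_k$, i.e. runs the power method on $A^{-1}$; the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1

np.random.seed(1)
x = np.random.rand(2)
for _ in range(100):
    # one step of power iteration with A^{-1}: solve instead of inverting
    x = np.linalg.solve(A, x)
    x = x / np.linalg.norm(x)

smallest = x @ A @ x  # Rayleigh quotient of the converged vector
print(smallest)
```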
=== Shifts ===
We can focus on any other eigenvalue by shifting our matrix:
* ''Shifting'' is the procedure of adding some value $s$ to the main diagonal: $A + s I$
* let $\mu$ be the smallest (in absolute value) eigenvalue of the shifted matrix $A + s I$
** i.e. $(A + s I) \mathbf x = \mu \mathbf x$
** or $(A + s I) \mathbf x - \mu \mathbf x = (A + s I - \mu I) \mathbf x = 0$
** so the matrix $A + s I - \mu I$ is singular
* since $A + s I - \mu I$ is singular, $\lambda = \mu - s$ is an eigenvalue of $A$, and it's the one closest to $-s$
* so shifting (combined with inverse iteration) allows us to focus on the eigenvalue closest to any particular value: to target a value $\sigma$, we shift by $s = -\sigma$
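A quick sketch of targeting an eigenvalue near a chosen value $\sigma$ with inverse iteration on the shifted matrix $A - \sigma I$ (i.e. $s = -\sigma$; the matrix and target are made up):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1
sigma = 2.6                             # target: the eigenvalue closest to 2.6

B = A - sigma * np.eye(2)  # shifted matrix, eigenvalues 0.4 and -1.6
np.random.seed(2)
x = np.random.rand(2)
for _ in range(100):
    x = np.linalg.solve(B, x)  # inverse iteration on the shifted matrix
    x = x / np.linalg.norm(x)

closest = x @ A @ x  # the eigenvector is shared with A, so this gives lambda
print(closest)
```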
Eigenvectors are not affected by shifts:
* suppose $A \mathbf x = \lambda \mathbf x$
* let's shift by $m$: $(A + m I) \mathbf x = A \mathbf x + m \mathbf x = \lambda \mathbf x + m \mathbf x = (\lambda + m) \mathbf x$
* for the shifted matrix, the eigenvalue is $\lambda + m$, and the eigenvector stays the same
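This is easy to verify numerically (the example matrix and shift are made up):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 1 and 3
m = 5.0

evals_A, evecs_A = np.linalg.eigh(A)
evals_S, evecs_S = np.linalg.eigh(A + m * np.eye(2))  # shifted matrix

# eigenvalues move by m, eigenvectors stay the same (up to sign)
print(evals_A, evals_S)
```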
=== Orthogonal Iteration ===
Idea:
* we know that the other eigenvectors are orthogonal to the dominant one
* so we can use the power method, and force the second vector to be orthogonal to the first one
* this way we guarantee that they will converge to two different eigenvectors
* we can do this for many vectors, not just two
* this is called "Orthogonal Iteration"

Orthogonal Iteration:
* Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Iteration"
* Instead of multiplying $A$ by just one vector, we multiply by multiple vectors $\{ \mathbf q_1, \ ... \ , \mathbf q_r \}$, which we put in a matrix $Q$
* At each step we re-normalize the vectors, e.g. with [[QR Decomposition]]
Algorithm:
* choose $Q_0$ such that $Q_0^T Q_0 = I$
* for $k = 1, 2, \ ...$:
** $Z_k = A \, Q_{k-1}$
** factorize with [[QR Decomposition]]: $Z_k = Q_k R_k$

Let's look at the sequence of $\{ R_k \}$: as the columns of $Q_k$ converge to eigenvectors, the diagonal entries of $R_k$ converge to the corresponding eigenvalues of $A$.
=== Implementation ===
Python with numpy:

 import numpy as np
 
 def simultaneous_power_iteration(A, k):
     n, m = A.shape
     Q = np.random.rand(n, k)
     Q, _ = np.linalg.qr(Q)
     Q_prev = Q
 
     for i in range(1000):
         Z = A.dot(Q)
         Q, R = np.linalg.qr(Z)
 
         # can use other stopping criteria as well
         err = ((Q - Q_prev) ** 2).sum()
         if i % 10 == 0:
             print(i, err)
 
         Q_prev = Q
         if err < 1e-3:
             break
 
     return np.diag(R), Q
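To sanity-check the routine, we can compare its output with <code>np.linalg.eigh</code> on a matrix with a known spectrum (the iteration is repeated inline here so the snippet is self-contained; the test matrix is constructed for the example):

```python
import numpy as np

np.random.seed(3)
# symmetric test matrix with known eigenvalues 9, 5, 2, 1, 0.5
V, _ = np.linalg.qr(np.random.rand(5, 5))   # random orthogonal basis
A = V @ np.diag([9.0, 5.0, 2.0, 1.0, 0.5]) @ V.T

k = 2
Q = np.random.rand(5, k)
Q, _ = np.linalg.qr(Q)
for _ in range(2000):
    Z = A @ Q                 # block power step
    Q, R = np.linalg.qr(Z)    # re-orthonormalize

# |diag(R)| approximates the two largest eigenvalues
top = np.sort(np.abs(np.diag(R)))[::-1]
print(top)
```

The absolute value is taken because the sign convention of the QR factorization can flip the signs of the diagonal of $R$.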
== Algorithms based on Power Iteration ==
=== [[QR Algorithm]] ===
Simultaneous Iteration typically takes a while to converge:
* Another, better way to do it is the QR Algorithm
* It's based on the Simultaneous Iteration algorithm
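A minimal sketch of the (unshifted) QR algorithm: factor $A_k = Q_k R_k$, then set $A_{k+1} = R_k Q_k = Q_k^T A_k Q_k$; the iterates stay similar to $A$, and for a symmetric matrix with distinct eigenvalue magnitudes the diagonal converges to the eigenvalues (the test matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])  # symmetric

Ak = A.copy()
for _ in range(500):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q  # similarity transform: R Q = Q^T A_k Q

approx = np.sort(np.diag(Ak))
exact = np.sort(np.linalg.eigvalsh(A))
print(approx, exact)
```

Production implementations add shifts and a preliminary reduction to tridiagonal form to speed this up dramatically.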
=== [[Lanczos Algorithm]] ===
Lanczos algorithm is used for tridiagonalization of a matrix $A$
* It's based on the power iteration algorithm
* And can be used to find eigenvalues and eigenvectors
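A minimal sketch of the Lanczos recurrence for a symmetric $A$, with full reorthogonalization for robustness (plain Lanczos loses orthogonality in floating point; the example matrix is made up):

```python
import numpy as np

def lanczos(A, m):
    # Lanczos tridiagonalization with full reorthogonalization
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)      # diagonal of T
    beta = np.zeros(m - 1)   # off-diagonal of T

    np.random.seed(4)
    q = np.random.rand(n)
    Q[:, 0] = q / np.linalg.norm(q)

    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # subtract components along all previous Lanczos vectors
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]

    # assemble the tridiagonal matrix T = Q^T A Q
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
T = lanczos(A, 3)
eigs_T = np.sort(np.linalg.eigvalsh(T))
print(eigs_T)
```

With as many steps as the dimension of $A$, the eigenvalues of $T$ match those of $A$; with fewer steps they approximate the extreme ones.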
=== Applications ===
* [[Principal Component Analysis]]
* [[Stochastic Matrices]] and [[PageRank]]
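PageRank, for instance, is essentially power iteration applied to a stochastic matrix; a toy sketch with a made-up 3-page link structure:

```python
import numpy as np

# column-stochastic link matrix for 3 pages:
# entry [i, j] is the probability of going from page j to page i
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

d = 0.85  # damping factor
n = 3
G = d * M + (1 - d) / n * np.ones((n, n))  # "Google matrix"

r = np.array([1.0, 0.0, 0.0])  # start all mass on page 0
for _ in range(100):
    r = G @ r  # power iteration; r stays a probability vector

print(r)
```

For this symmetric link structure the ranks converge to the uniform distribution.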
Power Iteration is a Linear Algebra method for approximating the dominant Eigenvalues and Eigenvectors of a matrix. Suppose $A$ is symmetric.

Dominant eigenvalues and eigenvectors:

Algorithm:
* for large powers $k$, we will obtain a good approximation of the dominant eigenvector

Finding the eigenvalue:

Performance:
* if the power method generates a good approximation of the eigenvector, the approximation of the eigenvalue is also good
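One common way to get the eigenvalue from an (approximate) eigenvector is the Rayleigh quotient $\lambda \approx \mathbf v^T A \mathbf v / \mathbf v^T \mathbf v$, which is what the implementation below computes; a quick sketch (matrix made up):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
v = np.array([1.0, 1.0]) / np.sqrt(2.0)  # dominant eigenvector, normalized

# Rayleigh quotient; the denominator is 1 here since v is normalized
ev = (v @ A @ v) / (v @ v)
print(ev)
```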
The values produced by the power method can grow quite large, so we re-normalize the vector at each step.

Python with Numpy:

 import numpy as np
 
 def eigenvalue(A, v):
     Av = A.dot(v)
     return v.dot(Av)
 
 def power_iteration(A):
     n, d = A.shape
     v = np.ones(d) / np.sqrt(d)
     ev = eigenvalue(A, v)
 
     while True:
         Av = A.dot(v)
         v_new = Av / np.linalg.norm(Av)
 
         ev_new = eigenvalue(A, v_new)
         if np.abs(ev - ev_new) < 0.01:
             break
 
         v = v_new
         ev = ev_new
 
     return ev_new, v_new
Theorem:
Proof:
$\square$
From the proof we can also see that if the ratio $\lambda_2 / \lambda_1$ is small, the method converges quickly, but if the ratio is close to 1, it takes many iterations