== Power Iteration ==
Power Iteration is a [[Linear Algebra]] method for approximating the dominant [[Eigenvalues and Eigenvectors]] of a matrix.

Suppose $A$ is symmetric:
* then the [[Eigendecomposition]] of $A$ is $A = Q \Lambda Q^T$
* where $\Lambda$ is a diagonal matrix whose entries are the eigenvalues $\lambda_i$
* let $\mathbf q_i$ be the columns of $Q$

Dominant eigenvalues and eigenvectors:
* $\lambda_1$ is the dominant eigenvalue of $A$ if $|\lambda_1| > |\lambda_i|$ for all $i = 2, \ldots, n$
* the corresponding eigenvector $\mathbf q_1$ is also called dominant

=== The Power Method ===
Algorithm:
* initial approximation: a random unit vector $x_0$
* $x_1 = A x_0$
* $x_2 = A A x_0 = A^2 x_0$
* $x_3 = A A A x_0 = A^3 x_0$
* ...
* repeat until convergence

For large $k$, we obtain a good approximation of the dominant eigenvector.
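
A minimal sketch of these raw power steps (the matrix here is an assumed $2 \times 2$ example with eigenvalues 3 and 1):

 import numpy as np
 
 A = np.array([[2.0, 1.0],
               [1.0, 2.0]])      # symmetric; eigenvalues 3 and 1
 x = np.array([1.0, 0.0])        # initial approximation
 
 for _ in range(20):
     x = A.dot(x)                # x_k = A x_{k-1} = A^k x_0
 
 print(x / np.linalg.norm(x))    # ~ [0.707, 0.707], the dominant eigenvector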

Finding the eigenvalue:
* if $v$ is an eigenvector of $A$, then its eigenvalue is
* $\lambda = \cfrac{v^T A^T v}{v^T v} = \cfrac{(A v)^T v}{v^T v}$
* this is called the [[Rayleigh Quotient]] (checked numerically below)
* proof: suppose $\lambda$ is an eigenvalue of $A$ and $v$ is its eigenvector
* $\cfrac{(A v)^T v}{v^T v} = \cfrac{\lambda \, v^T v}{v^T v} = \lambda$
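
A quick numeric check of the Rayleigh quotient (a sketch; the matrix and eigenvector are assumed examples):

 import numpy as np
 
 A = np.array([[2.0, 1.0], [1.0, 2.0]])
 v = np.array([1.0, 1.0]) / np.sqrt(2)   # eigenvector with eigenvalue 3
 print(A.dot(v).dot(v) / v.dot(v))       # Rayleigh quotient -> 3.0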
Performance:
* computing $A \cdot A^{k-1}$ is costly
* instead, go the other direction:
* use the recurrence relation $x_k = A x_{k-1}$, which needs only one matrix-vector product per step

If the power method produces a good approximation of the eigenvector, the resulting approximation of the eigenvalue is also good.

=== Normalization ===
The entries of $x_k$ grow (or shrink) roughly like $\lambda_1^k$, so they can quickly overflow or underflow:
* better if we scale them down
* we should rescale the values at each iteration, as sketched after this list
* for example, we can find the largest component of $x_k$ and divide by it
* more commonly, we just unit-normalize it
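
One rescaled step might look like this (a sketch; <code>A</code> and <code>x</code> are assumed to be defined as above):

 y = A.dot(x)
 x_scaled = y / np.abs(y).max()     # option 1: divide by the largest component
 x_unit   = y / np.linalg.norm(y)   # option 2 (more common): unit-normalize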

=== Implementation ===
Python with NumPy:

 import numpy as np
 
 def eigenvalue(A, v):
     # Rayleigh quotient for a unit vector v
     Av = A.dot(v)
     return v.dot(Av)
 
 def power_iteration(A):
     n, d = A.shape
 
     v = np.ones(d) / np.sqrt(d)    # initial approximation (could also be random)
     ev = eigenvalue(A, v)
 
     while True:
         Av = A.dot(v)
         v_new = Av / np.linalg.norm(Av)    # unit-normalize at each step
 
         ev_new = eigenvalue(A, v_new)
         if np.abs(ev - ev_new) < 0.01:
             break
 
         v = v_new
         ev = ev_new
 
     return ev_new, v_new
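
A quick usage check (the matrix is an assumed example with dominant eigenvalue 3):

 A = np.array([[2.0, 1.0], [1.0, 2.0]])
 ev, v = power_iteration(A)
 print(ev)   # ~ 3.0
 print(v)    # ~ [0.707, 0.707]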

=== Convergence of the Power Method ===
Theorem:
* if $A$ is diagonalizable and has a dominant eigenvalue,
* then the power iteration sequence $Ax, A^2 x, A^3 x, \ldots$ converges to the dominant eigenvector (up to scaling)

Proof:
* since $A$ is diagonalizable, it has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ and corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$
* let $v_1$ and $\lambda_1$ be dominant
* because the eigenvectors are independent, they form a basis
** so any vector $x_0$ can be represented as $x_0 = c_1 v_1 + \ldots + c_n v_n$
** and $c_1$ must be non-zero; otherwise $x_0$ is a bad choice, and we need to pick another initial vector
* let's multiply both sides by $A$:
** $A x_0 = c_1 A v_1 + \ldots + c_n A v_n = c_1 \lambda_1 v_1 + \ldots + c_n \lambda_n v_n$
* repeat that $k$ times:
** $A^k x_0 = c_1 \lambda_1^k v_1 + \ldots + c_n \lambda_n^k v_n$
** or $A^k x_0 = \lambda_1^k \left[ c_1 v_1 + c_2 \left(\cfrac{\lambda_2}{\lambda_1} \right)^k v_2 + \ldots + c_n \left(\cfrac{\lambda_n}{\lambda_1} \right)^k v_n \right]$
* since $\lambda_1$ is dominant, the ratios $\left(\cfrac{\lambda_i}{\lambda_1} \right)^k \to 0$ as $k \to \infty$ for all $i \geq 2$
* so $A^k x_0 \approx \lambda_1^k c_1 v_1$, and the approximation improves as $k$ grows

$\square$

From the proof we can also see that the speed of convergence is governed by the ratio $|\lambda_2 / \lambda_1|$: if it is small, the method converges quickly, but if it is close to 1, many iterations are needed.

Examples (the ratios are checked numerically below):
* <code>A = [4 5; 6 5]</code> needs only 6 steps; the ratio is 0.1
* <code>A = [-4 10; 7 5]</code> needs 68 steps; the ratio is 0.9
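
A quick numeric check of these ratios (a sketch using <code>np.linalg.eigvals</code>):

 import numpy as np
 
 for A in [np.array([[4.0, 5.0], [6.0, 5.0]]),
           np.array([[-4.0, 10.0], [7.0, 5.0]])]:
     ev = sorted(np.abs(np.linalg.eigvals(A)), reverse=True)
     print(ev[1] / ev[0])   # 0.1 for the first matrix, 0.9 for the second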
== Finding Other Eigenvectors ==

=== Naive Method ===
We can just remove the dominant direction from the matrix and repeat; this is sometimes called ''deflation'' (see the sketch below).

So:
* $A = Q \Lambda Q^T$, so $A = \sum_{i = 1}^n \lambda_i \mathbf q_i \mathbf q_i^T$
* use power iteration to find $\mathbf q_1$ and $\lambda_1$
* then run power iteration on $A - \lambda_1 \mathbf q_1 \mathbf q_1^T$ to find the next pair $\mathbf q_2, \lambda_2$, and so on
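
A minimal deflation sketch, reusing the <code>power_iteration</code> function above (assumes $A$ is symmetric, so the returned $\mathbf q_1$ is a unit eigenvector):

 lam1, q1 = power_iteration(A)
 A1 = A - lam1 * np.outer(q1, q1)   # remove the dominant direction
 lam2, q2 = power_iteration(A1)     # the next eigenpair of the original A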

=== Inverse Iteration ===
Inverse iteration applies the power method to $A^{-1}$: the eigenvalues of $A^{-1}$ are $1 / \lambda_i$, so the dominant eigenvalue of $A^{-1}$ corresponds to the smallest (in magnitude) eigenvalue of $A$.

http://ranger.uta.edu/~huber/cse4345/Notes/Eigenvalues.pdf
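
A minimal sketch (assumes $A$ is invertible; <code>inverse_iteration</code> is a hypothetical helper that reuses the <code>eigenvalue</code> function above, and solves a linear system instead of forming $A^{-1}$ explicitly):

 import numpy as np
 
 def inverse_iteration(A, num_iters=100):
     n, _ = A.shape
     x = np.ones(n) / np.sqrt(n)
     for _ in range(num_iters):
         y = np.linalg.solve(A, x)     # y = A^{-1} x
         x = y / np.linalg.norm(y)
     return eigenvalue(A, x), x        # smallest-magnitude eigenpair of A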

=== Shifts ===
We can focus on any other eigenvalue by shifting the matrix:
* ''shifting'' is the procedure of adding some value $s$ to the main diagonal: $A + s I$
* let $\mu$ be an eigenvalue of the shifted matrix
** i.e. $(A + s I) \mathbf x = \mu \mathbf x$
** or $(A + s I) \mathbf x - \mu \mathbf x = (A + s I - \mu I) \mathbf x = 0$
** so the matrix $A + s I - \mu I = A - (\mu - s) I$ is singular
* since $A - (\mu - s) I$ is singular, $\lambda = \mu - s$ is an eigenvalue of $A$
* if $\mu$ is the smallest (in magnitude) eigenvalue of $A + s I$, found e.g. with inverse iteration, then $\lambda = \mu - s$ is the eigenvalue of $A$ closest to $-s$
* so shifting combined with inverse iteration lets us focus on the eigenvalue closest to any particular value (see the sketch below)

Eigenvectors are not affected by shifts:
* suppose $A \mathbf x = \lambda \mathbf x$
* let's shift by $m$: $(A + m I) \mathbf x = A \mathbf x + m \mathbf x = \lambda \mathbf x + m \mathbf x = (\lambda + m) \mathbf x$
* for the shifted matrix, the eigenvalue is $\lambda + m$, and the eigenvector stays the same
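
A sketch combining shifts with the hypothetical <code>inverse_iteration</code> helper above: to target the eigenvalue of $A$ closest to some value $\sigma$, shift by $s = -\sigma$:

 sigma = 4.0                    # we want the eigenvalue of A closest to sigma
 n = A.shape[0]
 mu, x = inverse_iteration(A - sigma * np.eye(n))
 lam = mu + sigma               # eigenvalue of A closest to sigma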

=== Orthogonal Iteration ===
Idea:
* we know that the other eigenvectors are orthogonal to the dominant one
* so we can use the power method, and force the second vector to be orthogonal to the first one
* this way we guarantee that they converge to two different eigenvectors
* we can do this for many vectors, not just two
* this is called "Orthogonal Iteration"

Orthogonal Iteration:
* Orthogonal Iteration is a block version of the Power Method, also sometimes called "Simultaneous (Power) Iteration"
* instead of multiplying $A$ by just one vector, we multiply it by multiple vectors $\{ q_1, \ldots, q_r \}$, which we put in a matrix $Q$
* at each step we re-orthonormalize the vectors, e.g. with [[QR Decomposition]]

Algorithm:
* choose $Q_0$ such that $Q_0^T Q_0 = I$
* for $k = 1, 2, \ldots$:
** $Z_k = A Q_{k-1}$
** $Q_k R_k = Z_k$ (QR decomposition)

Let's look at the sequence $\{ R_k \}$:
* $R_k = Q_k^T Z_k = Q_k^T A Q_{k - 1}$
* if $\{ Q_k \}$ converges to some $Q$, then $Q^T A Q = R$ is upper triangular
* this is a [[Schur Decomposition]] of $A$
* thus, the eigenvalues of $A$ are located on the main diagonal of $R$
* and the columns of $Q$ are the eigenvectors (for symmetric $A$; in general they are the Schur vectors)

=== Implementation ===
Python with NumPy:

 import numpy as np
 
 def simultaneous_power_iteration(A, k):
     n, m = A.shape
     # start from k random orthonormal columns
     Q = np.random.rand(n, k)
     Q, _ = np.linalg.qr(Q)
     Q_prev = Q
 
     for i in range(1000):
         # multiply by A, then re-orthonormalize via QR
         Z = A.dot(Q)
         Q, R = np.linalg.qr(Z)
 
         # can use other stopping criteria as well
         err = ((Q - Q_prev) ** 2).sum()
         if i % 10 == 0:
             print(i, err)
 
         Q_prev = Q
         if err < 1e-3:
             break
 
     return np.diag(R), Q
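
A quick usage check (the symmetric matrix is an assumed example with eigenvalues 11, 2 and 1):

 A = np.array([[2.0, 0.0, 0.0],
               [0.0, 3.0, 4.0],
               [0.0, 4.0, 9.0]])
 eigenvalues, Q = simultaneous_power_iteration(A, 2)
 print(eigenvalues)   # ~ [11, 2], the two largest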

== Algorithms based on Power Iteration ==

=== [[QR Algorithm]] ===
Simultaneous Iteration typically takes a while to converge:
* another, better way to do it is the QR algorithm
* it is based on the Simultaneous Iteration algorithm (a minimal sketch is shown below)
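
A minimal sketch of the basic (unshifted) QR algorithm; practical implementations add shifts and deflation for speed:

 import numpy as np
 
 def qr_algorithm(A, num_iters=500):
     Ak = A.copy()
     for _ in range(num_iters):
         Q, R = np.linalg.qr(Ak)
         Ak = R.dot(Q)       # similar to A_k, so the eigenvalues are preserved
     return np.diag(Ak)      # approximate eigenvalues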

=== [[Lanczos Algorithm]] ===
The Lanczos algorithm is used for tridiagonalization of a symmetric matrix $A$:
* it is based on the power iteration algorithm
* and can be used to find eigenvalues and eigenvectors (see the sketch below)
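
A minimal Lanczos sketch without reorthogonalization (assumes $A$ is symmetric); the eigenvalues of the tridiagonal $T$, e.g. via <code>np.linalg.eigvalsh(T)</code>, approximate the extremal eigenvalues of $A$:

 import numpy as np
 
 def lanczos(A, k):
     n = A.shape[0]
     Q = np.zeros((n, k))    # orthonormal Lanczos vectors
     T = np.zeros((k, k))    # tridiagonal matrix with Q^T A Q ~ T
     q = np.random.rand(n)
     Q[:, 0] = q / np.linalg.norm(q)
     q_prev, beta = np.zeros(n), 0.0
 
     for j in range(k):
         w = A.dot(Q[:, j]) - beta * q_prev    # three-term recurrence
         alpha = Q[:, j].dot(w)
         w = w - alpha * Q[:, j]
         T[j, j] = alpha
         beta = np.linalg.norm(w)
         if j + 1 == k or beta < 1e-12:
             break
         T[j, j + 1] = T[j + 1, j] = beta
         q_prev = Q[:, j]
         Q[:, j + 1] = w / beta
 
     return T, Q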

=== Applications ===
* [[Principal Component Analysis]]
* [[Stochastic Matrices]] and [[PageRank]]

== Sources ==
* [[Machine Learning 1 (TUB)]]
* Hopcroft, John, and Ravindran Kannan. "Foundations of Data Science." (2014) [http://research.microsoft.com/en-US/people/kannan/book-mar-30-2014.pdf]
* https://www2.math.ethz.ch/education/bachelor/seminars/fs2008/nas/grasic.pdf
* http://college.cengage.com/mathematics/larson/elementary_linear/5e/students/ch08-10/chap_10_3.pdf
* http://www.math.usm.edu/lambers/mat610/sum10/lecture14.pdf
[[Category:Linear Algebra]]
[[Category:Python]]
[[Category:Machine Learning]]