$\require{cancel}$

In Linear Algebra, the Gram-Schmidt process is a method for orthogonalization.

Suppose we have two linearly independent vectors $\mathbf a$ and $\mathbf b$. How do we produce a vector $\mathbf v_2$ from $\mathbf b$ such that $\mathbf v_2 \; \bot \; \mathbf v_1 = \mathbf a$?
* we keep $\mathbf v_1 = \mathbf a$ and subtract from $\mathbf b$ its projection onto $\mathbf v_1$: $\mathbf v_2 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \, \mathbf v_1$

To find the final solution, we just normalize $\mathbf v_1$ and $\mathbf v_2$: $\mathbf q_1 = \cfrac{\mathbf v_1}{\| \mathbf v_1 \|}$ and $\mathbf q_2 = \cfrac{\mathbf v_2}{\| \mathbf v_2 \|}$

What if we have 3 vectors $\mathbf a, \mathbf b, \mathbf c$? We build $\mathbf v_1$ and $\mathbf v_2$ as before and then subtract from $\mathbf c$ its projections onto both: $\mathbf v_3 = \mathbf c - \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1} \, \mathbf v_1 - \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2} \, \mathbf v_2$

To get $\mathbf q_1, \mathbf q_2, \mathbf q_3$, we just normalize: $\mathbf q_i = \cfrac{\mathbf v_i}{\| \mathbf v_i \|}$

[[File:Gram-Schmidt orthonormalization process.gif]]

Source: http://en.wikipedia.org/wiki/File:Gram-Schmidt_orthonormalization_process.gif
=== [[Column Space]]s ===
Claim: The column space of $A$ does not change when we orthogonalize it

Suppose that we take a matrix $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and orthogonalize its columns into $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$
Each $\mathbf v_i$ (and thus each $\mathbf q_i$) is a linear combination of the original columns $\mathbf a_1, \ldots, \mathbf a_i$, and vice versa, so both sets of columns span the same space:
* e.g. $\mathbf v_3 = \mathbf c - \alpha_1 \mathbf v_1 - \alpha_2 \mathbf v_2 = \mathbf c - \alpha_1\mathbf a - \alpha_2 \cdot \left(\mathbf b - \alpha_3 \mathbf a \right) = \mathbf c - \alpha_1 \mathbf a - \alpha_2 \mathbf b + \alpha_2 \alpha_3 \mathbf a$
* $\alpha_1 = \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1}, \alpha_2 = \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2}, \alpha_3 = \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1}$ are just scalars
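
To make this concrete, here is a minimal numeric sketch (assuming NumPy; the three vectors are arbitrary illustrations) that applies exactly these projection formulas and checks that $\mathbf v_1, \mathbf v_2, \mathbf v_3$ come out pairwise orthogonal:

 import numpy as np
 
 a = np.array([1.0, 1.0, 0.0])
 b = np.array([1.0, 0.0, 1.0])
 c = np.array([0.0, 1.0, 1.0])
 
 v1 = a
 v2 = b - (v1 @ b) / (v1 @ v1) * v1                              # alpha_3 step
 v3 = c - (v1 @ c) / (v1 @ v1) * v1 - (v2 @ c) / (v2 @ v2) * v2  # alpha_1, alpha_2 steps
 
 v1 @ v2, v1 @ v3, v2 @ v3  # all ~0: pairwise orthogonal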

=== Implementation ===
==== Java ====
The implementation is taken from Smile: https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/projection/RandomProjection.java

 // excerpt: X holds the p vectors to orthogonalize (one per row);
 // X and p are initialized by the surrounding Smile class
 double[][] X;
 Math.unitize(X[0]);
 for (int i = 1; i < p; i++) {
     for (int j = 0; j < i; j++) {
         // subtract from X[i] its projection onto the already orthonormalized X[j]
         double t = -Math.dot(X[i], X[j]);
         Math.axpy(t, X[j], X[i]);
     }
     Math.unitize(X[i]);
 }

where
* <code>Math.unitize</code> normalizes the vector to unit length in place
* <code>Math.axpy</code> updates an array in place by adding a multiple of another array: <code>y = a * x + y</code>
* here <code>Math</code> is Smile's own <code>smile.math.Math</code> utility class, not <code>java.lang.Math</code>

=== QR Factorization ===
We can think of the Gram-Schmidt Process in the matrix language (like for [[Gaussian Elimination]], which brings us to [[LU Factorization]]): orthogonalizing the columns of $A$ gives a factorization $A = QR$ with orthonormal $Q$. How do we find this $R$?
* since $Q^T Q = I$, we have $R = Q^T A$
* let $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$
* thus we have <math>
R = Q^T A = \begin{bmatrix}
\mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\
\mathbf q_2^T \mathbf a_1 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf q_n^T \mathbf a_1 & \mathbf q_n^T \mathbf a_2 & \cdots & \mathbf q_n^T \mathbf a_n \\
\end{bmatrix}
</math>
Now recall the way we constructed $Q$:
* first we took $\mathbf q_1$ and made all $\mathbf a_2, ..., \mathbf a_n$ orthogonal to it, so $\mathbf q_2^T \mathbf a_1 = \ ... \ = \mathbf q_n^T \mathbf a_1 = 0$
* then we took $\mathbf q_1$ and $\mathbf q_2$ and made all $\mathbf a_3, ..., \mathbf a_n$ orthogonal to them, so $\mathbf q_3^T \mathbf a_2 = \ ... \ = \mathbf q_n^T \mathbf a_2 = 0$
* proceeding this way till the end, we see that $R$ is upper-triangular:
* thus we have <math>
R = Q^T A = \begin{bmatrix}
\mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\
0 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf q_n^T \mathbf a_n \\
\end{bmatrix}
</math>
Note that for the diagonal elements of $R$ we have $\mathbf q_i^T \mathbf a_i = \| \mathbf v_i \|$
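
To see this numerically, here is a small sketch (assuming NumPy; the random $5 \times 5$ matrix is arbitrary): orthogonalize the columns of a random $A$ and check that $Q^T A$ has zeros below the diagonal:

 import numpy as np
 
 A = np.random.rand(5, 5)
 Q, _ = np.linalg.qr(A)  # columns of Q are orthonormal and span C(A)
 R = Q.T @ A
 
 np.allclose(R, np.triu(R))  # True: entries below the diagonal vanish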

=== Usage ===
This is often used for [[Linear Least Squares]] - to make solving the [[Normal Equation]] faster
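
For instance, here is a minimal sketch of the idea (assuming NumPy; the shapes and data are arbitrary). Since $A = QR$ and $Q^T Q = I$, the normal equation $A^T A \mathbf x = A^T \mathbf b$ collapses to the triangular system $R \mathbf x = Q^T \mathbf b$:

 import numpy as np
 
 A = np.random.rand(13, 3)
 b = np.random.rand(13)
 
 Q, R = np.linalg.qr(A)
 x = np.linalg.solve(R, Q.T @ b)  # R x = Q^T b; R is upper-triangular and invertible
 
 np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])  # True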

=== Implementation code ===
==== Python ====
An implementation of the pseudo-code from Strang's book:
 import numpy as np
 
 def qr_factorization(A):
     m, n = A.shape
     Q = np.zeros((m, n))
     R = np.zeros((n, n))
 
     for j in range(n):
         v = A[:, j]
 
         # subtract the projections of a_j onto the already computed columns of Q
         for i in range(j):
             q = Q[:, i]
             R[i, j] = q.dot(v)
             v = v - R[i, j] * q
 
         # what is left is orthogonal to q_1, ..., q_{j-1}; normalize it
         norm = np.linalg.norm(v)
         Q[:, j] = v / norm
         R[j, j] = norm
 
     return Q, R
 A = np.random.rand(13, 10) * 1000
 Q, R = qr_factorization(A)
 Q.shape, R.shape  # ((13, 10), (10, 10))
 
 np.abs((A - Q.dot(R)).sum()) < 1e-6  # True: A is reconstructed by Q R
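
As a cross-check (a sketch that assumes <code>A</code>, <code>Q</code> and <code>R</code> from the snippet above are still in scope), we can compare the result against NumPy's built-in QR; for a full-rank $A$ the factorization is unique up to the signs of the diagonal of $R$:

 Q2, R2 = np.linalg.qr(A)
 
 s = np.sign(np.diag(R) * np.diag(R2))  # flip signs where the diagonals disagree
 np.allclose(Q * s, Q2), np.allclose(s[:, None] * R, R2)  # (True, True)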