$\require{cancel}$
= Gram-Schmidt Process =
In [[Linear Algebra]], the Gram-Schmidt process is a method for orthogonalization:
== 2D Case ==
Suppose we have two linearly independent vectors $\mathbf a$ and $\mathbf b$
* we want to make them orthogonal: $\mathbf a \Rightarrow \mathbf v_1$, $\mathbf b \Rightarrow \mathbf v_2$
* and then we normalize them: $\mathbf v_1 \Rightarrow \mathbf q_1$ and $\mathbf v_2 \Rightarrow \mathbf q_2$
* $\mathbf a$ is already fine, so let's keep its direction for $\mathbf q_1$ as well
* $\mathbf b$ is not fine: we want it to be orthogonal to $\mathbf q_1$
How do we produce a vector $\mathbf v_2$ from $\mathbf b$ such that $\mathbf v_2 \; \bot \; \mathbf v_1 = \mathbf a$?
* let's project $\mathbf b$ onto $\mathbf v_1$: we get $P \cdot \mathbf v_1 = \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
* $\mathbf e$ is our projection error, so $\mathbf e = \mathbf b - P \mathbf v_1 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
* note that $\mathbf v_2 = \mathbf e$: it has the same length and the same direction
* so $\mathbf v_2 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
* interpretation: we take the original vector $\mathbf b$ and remove its projection onto $\mathbf v_1$, which leaves only the orthogonal part
* now $\mathbf v_1 \; \bot \; \mathbf v_2$
* let's check: $\mathbf v_1^T \mathbf v_2 = \mathbf v_1^T \left( \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1 \right) = \mathbf v_1^T \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1^T \mathbf v_1 = \mathbf v_1^T \mathbf b - \mathbf v_1^T \mathbf b = 0$
To find the final solution, we just normalize $\mathbf v_1$ and $\mathbf v_2$:
* $\mathbf q_1 = \mathbf v_1 / \| \mathbf v_1 \|$
* $\mathbf q_2 = \mathbf v_2 / \| \mathbf v_2 \|$
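A minimal numeric sketch of this in NumPy (the vectors <code>a</code> and <code>b</code> here are arbitrary examples, not anything from the text above):

 import numpy as np
 
 a = np.array([3.0, 1.0])
 b = np.array([2.0, 2.0])
 
 v1 = a                               # keep the direction of a
 v2 = b - (v1 @ b) / (v1 @ v1) * v1   # remove the projection of b onto v1
 
 q1 = v1 / np.linalg.norm(v1)         # normalize
 q2 = v2 / np.linalg.norm(v2)
 
 print(q1 @ q2)  # ~0, so q1 and q2 are orthogonal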
== 3D Case ==
What if we have 3 vectors $\mathbf a, \mathbf b, \mathbf c$?
* need to find (pair-wise) orthogonal vectors $\mathbf v_1, \mathbf v_2, \mathbf v_3$ and then normalize them
* we have $\mathbf v_1 = \mathbf a$, $\mathbf v_2 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \mathbf v_1$
* what about $\mathbf v_3$? We know that we want $\mathbf v_3 \; \bot \; \mathbf v_1$ and $\mathbf v_3 \; \bot \; \mathbf v_2$
* for $\mathbf v_3$, we want to subtract from $\mathbf c$ its components in direction $\mathbf v_1$ as well as in direction $\mathbf v_2$
* so $\mathbf v_3 = \mathbf c - \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1 - \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2} \cdot \mathbf v_2$
To get $\mathbf q_1, \mathbf q_2, \mathbf q_3$, we just normalize:
* $\mathbf q_1 = \mathbf v_1 / \| \mathbf v_1 \|$
* $\mathbf q_2 = \mathbf v_2 / \| \mathbf v_2 \|$
* $\mathbf q_3 = \mathbf v_3 / \| \mathbf v_3 \|$
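The same steps for three vectors, as a NumPy sketch (<code>remove_projection</code> is just an illustrative helper name introduced here):

 import numpy as np
 
 def remove_projection(v, u):
     # subtract from v its component in the direction of u
     return v - (u @ v) / (u @ u) * u
 
 a = np.array([1.0, 0.0, 1.0])
 b = np.array([1.0, 1.0, 0.0])
 c = np.array([0.0, 1.0, 1.0])
 
 v1 = a
 v2 = remove_projection(b, v1)
 v3 = remove_projection(remove_projection(c, v1), v2)
 
 q1, q2, q3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
 print(q1 @ q2, q1 @ q3, q2 @ q3)  # all ~0: pairwise orthogonal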
== 3D Case Animation ==
[[File:Gram-Schmidt orthonormalization process.gif]]
Source: http://en.wikipedia.org/wiki/File:Gram-Schmidt_orthonormalization_process.gif
Claim: The column space of $A$ does not change when we orthogonalize it
Suppose that we take a matrix $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and orthogonalize its columns into $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$
* Why $C(A) = C(Q)$?
* at each step of the Gram-Schmidt process we take linear combinations of vectors from $C(A)$!
* e.g. $\mathbf v_3 = \mathbf c - \alpha_1 \mathbf v_1 - \alpha_2 \mathbf v_2 = \mathbf c - \alpha_1\mathbf a - \alpha_2 \cdot \left(\mathbf b - \alpha_3 \mathbf a \right) = \mathbf c - \alpha_1 \mathbf a - \alpha_2 \mathbf b + \alpha_2 \alpha_3 \mathbf a$
* $\alpha_1 = \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1}, \alpha_2 = \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2}, \alpha_3 = \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1}$ are just scalars
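A quick numeric check of the claim (a sketch that uses NumPy's built-in <code>np.linalg.qr</code> instead of the hand-written process): if $C(A) = C(Q)$, appending the columns of $Q$ to $A$ adds no new directions, so the rank doesn't grow.

 import numpy as np
 
 A = np.random.rand(5, 3)
 Q, _ = np.linalg.qr(A)  # orthonormalized columns of A
 
 rank_A = np.linalg.matrix_rank(A)
 rank_AQ = np.linalg.matrix_rank(np.hstack([A, Q]))
 print(rank_A == rank_AQ)  # True: C(A) = C(Q)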
== Implementation ==
=== Java ===
The implementation is taken from Smile: https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/projection/RandomProjection.java
 double[][] X;
 Math.unitize(X[0]);
 
 for (int i = 1; i < p; i++) {
     for (int j = 0; j < i; j++) {
         // X[j] is already a unit vector, so the projection coefficient is just the dot product;
         // axpy then subtracts the projection of X[i] onto X[j] in place
         double t = -Math.dot(X[i], X[j]);
         Math.axpy(t, X[j], X[i]);
     }
     Math.unitize(X[i]);
 }
where
* <code>Math.unitize</code> unit-normalizes the vector
* <code>Math.axpy</code> updates an array by adding a multiple of another array: <code>y = a * x + y</code>
* here <code>X[i]</code> is the $i$th '''''column''''' of $X$
=== Python ===
 import numpy as np
 
 def normalize(v):
     return v / np.sqrt(v.dot(v))
 
 n = A.shape[1]  # number of columns; len(A) would count rows
 
 A[:, 0] = normalize(A[:, 0])
 
 for i in range(1, n):
     Ai = A[:, i]
     for j in range(0, i):
         Aj = A[:, j]
         t = Ai.dot(Aj)     # projection coefficient (Aj is already normalized)
         Ai = Ai - t * Aj   # remove the component along Aj
     A[:, i] = normalize(Ai)
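To try this out, we can wrap the same loop into a function (<code>gram_schmidt</code> is just an illustrative name) and check that the columns come out orthonormal, i.e. $Q^T Q \approx I$:

 import numpy as np
 
 def gram_schmidt(A):
     # same loop as above, wrapped into a function that works on a copy
     A = A.copy()
     n = A.shape[1]
     A[:, 0] = A[:, 0] / np.linalg.norm(A[:, 0])
     for i in range(1, n):
         for j in range(0, i):
             A[:, i] = A[:, i] - A[:, i].dot(A[:, j]) * A[:, j]
         A[:, i] = A[:, i] / np.linalg.norm(A[:, i])
     return A
 
 Q = gram_schmidt(np.random.rand(6, 4))
 print(np.allclose(Q.T @ Q, np.eye(4)))  # True: columns are orthonormal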
== QR Factorization ==
We can think of the Gram-Schmidt Process in matrix language (just as [[Gaussian Elimination]] brings us to the [[LU Factorization]])
* recall that $C(Q) = C(A)$
* because of this, there exists a third matrix $R$ that connects $A$ and $Q$
* or, $A = Q R$
* (here $Q$ is the result of applying Gram-Schmidt to the columns of $A$)
How to find this $R$?
* $A = QR$, let's multiply both sides by $Q^T$:
* $Q^T A = Q^T Q R$
* since $Q^T Q = I$, we have $R = Q^T A$
* let $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$
* thus we have <math>
R = Q^T A = \begin{bmatrix}
\mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\
\mathbf q_2^T \mathbf a_1 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf q_n^T \mathbf a_1 & \mathbf q_n^T \mathbf a_2 & \cdots & \mathbf q_n^T \mathbf a_n \\
\end{bmatrix}
</math>
Now recall the way we constructed $Q$
* we took $\mathbf q_1 = \mathbf a_1 / \| \mathbf a_1 \|$ and made all other $\mathbf a_i$ orthogonal to it, so $\mathbf q_2^T \mathbf a_1 = \ ... \ = \mathbf q_n^T \mathbf a_1 = 0$
* then we took $\mathbf q_1$ and $\mathbf q_2$ and made all $\mathbf a_3, ..., \mathbf a_n$ orthogonal to them, so $\mathbf q_3^T \mathbf a_2 = \ ... \ = \mathbf q_n^T \mathbf a_2 = 0$
* proceeding this way till the end, we see that $R$ is upper-triangular:
* thus we have <math>
R = Q^T A = \begin{bmatrix}
\mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\
0 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf q_n^T \mathbf a_n \\
\end{bmatrix}
</math>
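Both facts are easy to verify numerically with NumPy's built-in QR (a sketch; <code>np.linalg.qr</code> may flip signs compared to the construction above, but the identities still hold):

 import numpy as np
 
 A = np.random.rand(5, 3)
 Q, R = np.linalg.qr(A)
 
 print(np.allclose(R, Q.T @ A))     # True: R = Q^T A
 print(np.allclose(R, np.triu(R)))  # True: R is upper-triangular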
Note that for the diagonal elements of $R$, $\mathbf q_i^T \mathbf a_i = \| \mathbf v_i \|$
* why?
* $\mathbf q_i^T \mathbf a_i = \| \mathbf q_i \| \| \mathbf a_i \| \cos \theta$ by the [[Dot Product]] definition
* $\| \mathbf q_i \| = 1$ and $\cos \theta = \cfrac{\| \mathbf v_i \|}{\| \mathbf a_i \|}$, so
* $\mathbf q_i^T \mathbf a_i = \| \mathbf q_i \| \| \mathbf a_i \| \cos \theta = 1 \cdot \cancel{\| \mathbf a_i \|} \cfrac{\| \mathbf v_i \|}{ \cancel{\| \mathbf a_i \|} } = \| \mathbf v_i \|$
* this can be used to speed up the computation a bit
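A quick numeric check of this identity on the 2D example from above (a sketch, arbitrary vectors):

 import numpy as np
 
 a1 = np.array([3.0, 1.0])
 a2 = np.array([2.0, 2.0])
 
 v1 = a1
 v2 = a2 - (v1 @ a2) / (v1 @ v1) * v1
 q1 = v1 / np.linalg.norm(v1)
 q2 = v2 / np.linalg.norm(v2)
 
 # the diagonal entries of R equal the norms of the unnormalized v's
 print(np.isclose(q1 @ a1, np.linalg.norm(v1)))  # True
 print(np.isclose(q2 @ a2, np.linalg.norm(v2)))  # True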
=== Usage ===
This decomposition is often used for [[Inverse Matrices|inverting the matrix]]
* often used for [[Linear Least Squares]] - to make the [[Normal Equation]] faster
* let $X = A^{-1}$, so we want to solve $AX = I$
* decompose $A = QR$, so $QRX = I$
* multiply both sides by $Q^T$: $RX = Q^T$
* since $R$ is upper-triangular, this system is very easy to solve - just use back substitution as in [[Gaussian Elimination]] (see the sketch below)
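A sketch of this recipe using library routines (assumes NumPy and SciPy; <code>scipy.linalg.solve_triangular</code> performs the back substitution):

 import numpy as np
 from scipy.linalg import solve_triangular
 
 A = np.random.rand(4, 4)  # almost surely invertible
 Q, R = np.linalg.qr(A)
 
 # A X = I  =>  Q R X = I  =>  R X = Q^T, solved column by column
 A_inv = solve_triangular(R, Q.T, lower=False)
 print(np.allclose(A @ A_inv, np.eye(4)))  # True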
=== Implementation code ===
==== Python ====
Implementation of the pseudo-code from Strang's book:
 import numpy as np
 
 def qr_factorization(A):
     m, n = A.shape
     Q = np.zeros((m, n))
     R = np.zeros((n, n))
 
     for j in range(n):
         v = A[:, j]
         for i in range(j):         # project onto all previous q's (range(j - 1) would skip one)
             q = Q[:, i]
             R[i, j] = q.dot(v)
             v = v - R[i, j] * q
 
         norm = np.linalg.norm(v)   # the diagonal of R holds the norms
         Q[:, j] = v / norm
         R[j, j] = norm
 
     return Q, R
 A = np.random.rand(13, 10) * 1000
 Q, R = qr_factorization(A)
 Q.shape, R.shape
 np.abs(A - Q.dot(R)).max() < 1e-6  # True: QR reproduces A
== Sources ==