
$\require{cancel}$

Gram-Schmidt Process

In Linear Algebra, the Gram-Schmidt process is a method for orthogonalizing a set of linearly independent vectors:


2D Case

Suppose we have two linearly independent vectors $\mathbf a$ and $\mathbf b$

  • we want to make them orthogonal $\mathbf a \Rightarrow \mathbf v_1$, $\mathbf b \Rightarrow \mathbf v_2$
  • and then we normalize them: $\mathbf v_1 \Rightarrow \mathbf q_1$ and $\mathbf v_2 \Rightarrow \mathbf q_2$
  • $\mathbf a$ is already fine, let's keep its direction for $\mathbf q_1$ as well
  • $\mathbf b$ is not fine: we want it to be orthogonal to $\mathbf q_1$


How do we produce a vector $\mathbf v_2$ from $\mathbf b$ such that $\mathbf v_2 \; \bot \; \mathbf v_1$, where $\mathbf v_1 = \mathbf a$?

  • let's project $\mathbf b$ on $\mathbf v_1$, and we get $P \cdot \mathbf v_1 = \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
  • $\mathbf e$ is our projection error, so $\mathbf e = \mathbf b - P \mathbf v_1 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
  • note that $\mathbf v_2 = \mathbf e$! it has the same length and the same direction
  • so $\mathbf v_2 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1$
  • interpretation: we take the original vector $\mathbf b$ and subtract its projection onto $\mathbf v_1$; what is left is exactly the part of $\mathbf b$ orthogonal to $\mathbf v_1$
  • now $\mathbf v_1 \; \bot \mathbf v_2$
  • let's check: $\mathbf v_1^T \mathbf v_2 = \mathbf v_1^T \left( \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1 \right) = \mathbf v_1^T \mathbf b - \mathbf v_1^T \cdot \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1 = \mathbf v_1^T \mathbf b - \mathbf v_1^T \mathbf b = 0$


To find the final solution, we just normalize $\mathbf v_1$ and $\mathbf v_2$:

  • $\mathbf q_1 = \mathbf v_1 / \| \mathbf v_1 \|$
  • $\mathbf q_2 = \mathbf v_2 / \| \mathbf v_2 \|$
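
A minimal NumPy sketch of the 2D steps above (the vectors $\mathbf a$ and $\mathbf b$ here are arbitrary example data, not from the text):

import numpy as np

a = np.array([3.0, 1.0])
b = np.array([2.0, 2.0])

v1 = a                                    # keep the direction of a
v2 = b - (v1 @ b) / (v1 @ v1) * v1        # remove the projection of b onto v1

q1 = v1 / np.linalg.norm(v1)              # normalize
q2 = v2 / np.linalg.norm(v2)

print(q1 @ q2)                            # ~0: q1 and q2 are orthogonal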


3D Case

What if we have 3 vectors $\mathbf a, \mathbf b, \mathbf c$?

  • need to find (pair-wise) orthogonal vectors $\mathbf v_1, \mathbf v_2, \mathbf v_3$ and then normalize them
  • we already have $\mathbf v_1 = \mathbf a$ and $\mathbf v_2 = \mathbf b - \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1} \mathbf v_1$
  • what about $\mathbf v_3$? We know we want $\mathbf v_3 \; \bot \; \mathbf v_1$ and $\mathbf v_3 \; \bot \; \mathbf v_2$
  • so for $\mathbf v_3$ we take $\mathbf c$ and subtract its components in the direction of $\mathbf v_1$ and in the direction of $\mathbf v_2$
  • so $\mathbf v_3 = \mathbf c - \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1} \cdot \mathbf v_1 - \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2} \cdot \mathbf v_2$


To get $\mathbf q_1, \mathbf q_2, \mathbf q_3$, we just normalize:

  • $\mathbf q_1 = \mathbf v_1 / \| \mathbf v_1 \|$
  • $\mathbf q_2 = \mathbf v_2 / \| \mathbf v_2 \|$
  • $\mathbf q_3 = \mathbf v_3 / \| \mathbf v_3 \|$
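
A self-contained NumPy sketch of the 3D case (again with made-up example vectors):

import numpy as np

def project(v, x):
    # projection of x onto v
    return (v @ x) / (v @ v) * v

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])

v1 = a
v2 = b - project(v1, b)
v3 = c - project(v1, c) - project(v2, c)

q1, q2, q3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))

print(q1 @ q2, q1 @ q3, q2 @ q3)          # all ~0: pairwise orthogonal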


3D Case Animation

File:Gram-Schmidt orthonormalization process.gif

Source: http://en.wikipedia.org/wiki/File:Gram-Schmidt_orthonormalization_process.gif


Column Spaces

Claim: The column space of $A$ does not change when we orthogonalize it


Suppose that we take a matrix $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and orthogonalize its columns into $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$

  • Why $C(A) = C(Q)$?
  • at each step of the Gram-Schmidt process we only take linear combinations of vectors that already lie in $C(A)$, so every $\mathbf v_i$ (and hence every $\mathbf q_i$) stays in $C(A)$
  • e.g. $\mathbf v_3 = \mathbf c - \alpha_1 \mathbf v_1 - \alpha_2 \mathbf v_2 = \mathbf c - \alpha_1\mathbf a - \alpha_2 \cdot \left(\mathbf b - \alpha_3 \mathbf a \right) = \mathbf c - \alpha_1 \mathbf a - \alpha_2 \mathbf b + \alpha_2 \alpha_3 \mathbf a$
  • $\alpha_1 = \cfrac{\mathbf v_1^T \mathbf c}{\mathbf v_1^T \mathbf v_1}, \alpha_2 = \cfrac{\mathbf v_2^T \mathbf c}{\mathbf v_2^T \mathbf v_2}, \alpha_3 = \cfrac{\mathbf v_1^T \mathbf b}{\mathbf v_1^T \mathbf v_1}$ are just scalars
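
One way to check this claim numerically (a sketch using NumPy's built-in QR, which orthonormalizes the columns of $A$): since the columns of $Q$ form an orthonormal basis of $C(Q)$, projecting the columns of $A$ onto $C(Q)$ must return them unchanged if $C(A) = C(Q)$.

import numpy as np

A = np.random.rand(5, 3)
Q, _ = np.linalg.qr(A)                 # reduced QR: columns of Q are orthonormal

# project every column of A onto C(Q); if C(A) = C(Q), nothing changes
print(np.allclose(Q @ (Q.T @ A), A))   # True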


Implementation

Java

The implementation is taken from Smile: https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/projection/RandomProjection.java

double[][] X;                       // rows of X are the vectors to orthonormalize
Math.unitize(X[0]);                 // normalize the first vector
for (int i = 1; i < p; i++) {
    for (int j = 0; j < i; j++) {
        // subtract from X[i] its projection onto the (already unit-length) X[j]
        double t = -Math.dot(X[i], X[j]);
        Math.axpy(t, X[j], X[i]);   // X[i] = X[i] + t * X[j]
    }
    Math.unitize(X[i]);             // normalize what remains
}

where

  • Math.unitize unit-normalizes the vector
  • Math.axpy updates an array by adding a multiple of another array y = a * x + y.


QR Factorization

We can describe the Gram-Schmidt Process in matrix language (just as Gaussian Elimination leads to the LU Factorization)

  • recall that $C(Q) = C(A)$
  • because of this, there exists a matrix $R$ that expresses the columns of $A$ as combinations of the columns of $Q$
  • or, $A = Q R$
  • (here Gram-Schmidt is applied to the columns of $A$)


How to find this $R$?

  • $A = QR$, let's multiply both sides by $Q^T$:
  • $Q^T A = Q^T Q R$
  • since $Q^T Q = I$, we have $R = Q^T A$
  • let $A = \Bigg[ \mathop{\mathbf a_1}\limits_|^| \ \mathop{\mathbf a_2}\limits_|^| \ \cdots \ \mathop{\mathbf a_n}\limits_|^| \Bigg]$ and $Q = \Bigg[ \mathop{\mathbf q_1}\limits_|^| \ \mathop{\mathbf q_2}\limits_|^| \ \cdots \ \mathop{\mathbf q_n}\limits_|^| \Bigg]$
  • thus we have $R = Q^T A = \begin{bmatrix} \mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\ \mathbf q_2^T \mathbf a_1 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf q_n^T \mathbf a_1 & \mathbf q_n^T \mathbf a_2 & \cdots & \mathbf q_n^T \mathbf a_n \end{bmatrix}$


Now recall the way we constructed $Q$

  • we took $\mathbf q_1 = \mathbf a_1 / \| \mathbf a_1 \|$ and made all other $\mathbf a_i$ orthogonal to it, so $\mathbf q_2^T \mathbf a_1 = \ ... \ = \mathbf q_n^T \mathbf a_1 = 0$
  • then we took $\mathbf q_1$ and $\mathbf q_2$ and made all $\mathbf a_3, ..., \mathbf a_n$ orthogonal to them, so $\mathbf q_3^T \mathbf a_2 = \ ... \ = \mathbf q_n^T \mathbf a_2 = 0$
  • proceeding this way to the end, we see that $R$ is upper-triangular:
  • thus we have $R = Q^T A = \begin{bmatrix} \mathbf q_1^T \mathbf a_1 & \mathbf q_1^T \mathbf a_2 & \cdots & \mathbf q_1^T \mathbf a_n \\ 0 & \mathbf q_2^T \mathbf a_2 & \cdots & \mathbf q_2^T \mathbf a_n \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf q_n^T \mathbf a_n \end{bmatrix}$
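
A small numerical illustration of both facts (a sketch with NumPy's built-in QR; it may flip the signs of some rows of $R$ and columns of $Q$ compared to the hand derivation):

import numpy as np

A = np.random.rand(6, 4)
Q, R = np.linalg.qr(A)

print(np.allclose(R, Q.T @ A))         # R really is Q^T A
print(np.allclose(R, np.triu(R)))      # and R is upper-triangular
print(np.allclose(Q @ R, A))           # so A = QR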


Note that for the diagonal elements of $R$, $\mathbf q_i^T \mathbf a_i = \| \mathbf v_i \|$

  • why?
  • $\mathbf q_i^T \mathbf a_i = \| \mathbf q_i \| \| \mathbf a_i \| \cos \theta$ by the Dot Product definition
  • $\| \mathbf q_i \| = 1$ and $\cos \theta = \cfrac{\| \mathbf v_i \|}{\| \mathbf a_i \|}$, so
  • $\mathbf q_i^T \mathbf a_i = \| \mathbf q_i \| \| \mathbf a_i \| \cos \theta = 1 \cdot \cancel{\| \mathbf a_i \|} \cfrac{\| \mathbf v_i \|}{ \cancel{\| \mathbf a_i \|} } = \| \mathbf v_i \|$
  • this can be used to speed up the computation a bit
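
A tiny check of this identity for the second column (made-up example vectors): $\mathbf q_2^T \mathbf a_2$ should equal $\| \mathbf v_2 \|$, the norm of $\mathbf a_2$ after its component along $\mathbf v_1$ is removed.

import numpy as np

a1 = np.array([2.0, 1.0, 0.0])
a2 = np.array([1.0, 3.0, 1.0])

v1 = a1
v2 = a2 - (v1 @ a2) / (v1 @ v1) * v1   # a2 minus its projection onto v1
q2 = v2 / np.linalg.norm(v2)

print(q2 @ a2, np.linalg.norm(v2))     # the two numbers coincide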


Usage

This is often used for Linear Least Squares: with $A = QR$, the Normal Equation $A^T A \mathbf x = A^T \mathbf b$ reduces to $R \mathbf x = Q^T \mathbf b$, which is solved quickly by back-substitution
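
A sketch of how this works in practice (using NumPy's built-in QR; a dedicated triangular solver such as scipy.linalg.solve_triangular could exploit the structure of $R$, here a generic solve is used for simplicity):

import numpy as np

A = np.random.rand(20, 5)
b = np.random.rand(20)

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)        # solve R x = Q^T b

# same solution as the standard least-squares routine
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))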


Implementation code

Python

Implementation of the pseudo-code from Strang's book:

import numpy as np

def qr_factorization(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))

    for j in range(n):
        v = A[:, j]

        # subtract the projections of the j-th column onto all previous q's
        for i in range(j):
            q = Q[:, i]
            R[i, j] = q.dot(v)
            v = v - R[i, j] * q

        # what remains is orthogonal to q_1, ..., q_{j-1}: normalize it
        norm = np.linalg.norm(v)
        Q[:, j] = v / norm
        R[j, j] = norm
    return Q, R

A = np.random.rand(13, 10) * 1000
Q, R = qr_factorization(A)

Q.shape, R.shape
np.abs(A - Q.dot(R)).sum() < 1e-6   # total reconstruction error is tiny: A = QR
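
As an extra sanity check, continuing the script above, the result can be compared with NumPy's built-in np.linalg.qr; for a full-rank $A$ the two factorizations agree up to the signs of the columns of $Q$ (and the corresponding rows of $R$):

Q_np, R_np = np.linalg.qr(A)
print(np.allclose(np.abs(Q), np.abs(Q_np)))   # True: same Q up to column signs
print(np.allclose(np.abs(R), np.abs(R_np)))   # True: same R up to row signs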


Sources