Reduced Rank Approximation
Given an $m \times n$ matrix $A$, the goal is to describe $A$ using fewer than $m \times n$ entries
- also called '''Total Least Squares''': we approximate matrix $A$ with matrix $B$ by minimizing $| A - B |^2_F$
- i.e. we work in the Matrix Vector Spaces with the Frobenius Norm
Redundancy of Matrix
- rank of $A$ specifies the number of linearly independent columns/rows
- so it’s a good measure of redundancy
- if the rank is low, $A$ has a lot of redundancy
- such matrices can be expressed more efficiently than just a table of entries
- for example, Rank One Matrices are very redundant: instead of $mn$ entries they can be represented by $m + n$ entries
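For illustration, a minimal NumPy sketch (the vectors and dimensions are made up): a rank-1 matrix is just the Outer Product of two vectors, so storing the two vectors is enough.

```python
import numpy as np

m, n = 1000, 500
u = np.random.randn(m)
v = np.random.randn(n)

# a rank-1 matrix is fully determined by one column and one row vector:
# storing u and v takes m + n = 1500 numbers instead of m * n = 500000
A = np.outer(u, v)

print(A.shape)                    # (1000, 500)
print(np.linalg.matrix_rank(A))   # 1
```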
Approximation with SVD
Best Approximation
Finding the approximation
- Suppose our matrix doesn't have rank 1,
- but we want to find the best rank-1 approximation to this matrix
- so we want to express $A$ in terms of two vectors and their Outer Product
How do we define “best”?
- let matrix $B$ be the best rank-one approximation of $A$ if $| B - A |^2_F$ is minimal
- so $B$ is the best in terms of "Total Least Squares"
- which norm to use for matrices? The Frobenius Norm
- so use the element-wise Inner Product $\langle A, B \rangle = \sum_{ij} a_{ij} b_{ij}$, and the norm is given by $| A |^2_F = \langle A, A \rangle$
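A quick numerical check of these definitions (a sketch with random matrices; `np.linalg.norm(A, 'fro')` is NumPy's Frobenius norm):

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(4, 3)

inner = np.sum(A * B)   # element-wise inner product <A, B>
print(np.isclose(inner, np.trace(A.T @ B)))                       # same as trace(A^T B)
print(np.isclose(np.sum(A * A), np.linalg.norm(A, 'fro') ** 2))   # |A|_F^2 = <A, A>
```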
Let’s apply SVD:
- $A = U \Sigma V^T = \sum_i \sigma_i \mathbf u_i \mathbf v_i^T$
- $| A |^2_F = \sum_i | \sigma_i \mathbf u_i \mathbf v_i^T |^2_F$
- the terms are orthogonal w.r.t. the matrix inner product
- so SVD is an orthogonal decomposition into rank-1 matrices
- also, the norm of each rank-1 term is $| \mathbf u_i \mathbf v_i^T |^2_F = | \mathbf u_i |^2 \, | \mathbf v_i |^2 = 1$ because $\mathbf u_i$ and $\mathbf v_i$ are unit vectors
- therefore $| A |^2_F = \sum_i \sigma_i^2$
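Numerically (a small sketch with a random matrix):

```python
import numpy as np

A = np.random.randn(6, 4)
sigma = np.linalg.svd(A, compute_uv=False)

# the squared Frobenius norm equals the sum of squared singular values
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(sigma ** 2)))   # True
```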
Approximation
Let’s express $A$ as $A = S_k + E_k$
- where $S_k = \sum_{i = 1}^k \sigma_i \mathbf u_i \mathbf v_i^T$ - first $k$ vectors
- and $E_k = \sum_{i = k+1}^r \sigma_i \mathbf u_i \mathbf v_i^T$ - the remaining $r - k$ terms
- then $| A |_F^2 = | S_k |_F^2 + | E_k |_F^2$ for any $k$
- also, $| S_k |_F^2 = \sum_{i = 1}^k \sigma_i^2$ and $| E_k |_F^2 = \sum_{i = k+1}^r \sigma_i^2$
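A sketch that checks this split numerically (random matrix, $k$ chosen arbitrarily):

```python
import numpy as np

A = np.random.randn(8, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # first k terms of the SVD
E_k = A - S_k                                 # remaining terms

fro2 = lambda M: np.linalg.norm(M, 'fro') ** 2
print(np.isclose(fro2(A), fro2(S_k) + fro2(E_k)))   # |A|^2 = |S_k|^2 + |E_k|^2
print(np.isclose(fro2(S_k), np.sum(s[:k] ** 2)))    # sum of the first k sigma_i^2
print(np.isclose(fro2(E_k), np.sum(s[k:] ** 2)))    # sum of the remaining sigma_i^2
```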
Rank-1 Approximation
- '''Theorem'''
- The best rank-1 approximation of $A$ is $\sigma_1 \mathbf u_1 \mathbf v_1^T$
Proof
- $S_1 = \sigma_1 \mathbf u_1 \mathbf v_1^T$ and $| E_1 |^2_F = \sigma_2^2 + \dots + \sigma_r^2$
- show that $| E_1 |^2_F$ is the best achievable error
- let $A_1$ be any rank-1 approximation, so its error is $| A - A_1 |^2_F$
- this norm is preserved under multiplication by orthogonal matrices (see Frobenius Norm)
- so $| A - A_1 |^2_F = | U \Sigma V^T - A_1 |^2_F = | \Sigma V^T - U^T A_1 |^2_F = | \Sigma - U^T A_1 V |^2_F$
- let's write $U^T A_1 V$ as $\alpha \mathbf x \mathbf y^T$ with $\alpha > 0$ and unit vectors $\mathbf x \in \mathbb R^m$ and $\mathbf y \in \mathbb R^n$
- then $| \Sigma - \alpha \mathbf x \mathbf y^T |^2_F = | \Sigma |^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 | \mathbf x \mathbf y^T |^2_F$
- $| \mathbf x \mathbf y^T |^2_F = 1$ because both $\mathbf x$ and $\mathbf y$ are unit vectors
- $\langle \Sigma, \mathbf x \mathbf y^T \rangle = \sum_{i=1}^r \sigma_i x_i y_i \leqslant \sum_{i=1}^r \sigma_i | x_i | \, | y_i | \leqslant \sigma_1 \sum_{i=1}^r | x_i | \, | y_i |$
- recall that in SVD $\sigma_1$ is the biggest singular value
- let $\mathbf x^* = ( |x_1| , \dots , |x_r| )$ and $\mathbf y^* = ( |y_1| , \dots , |y_r| )$, so $\sum_{i=1}^r | x_i | \, | y_i | = \langle \mathbf x^*, \mathbf y^* \rangle$
- by the Cauchy-Schwartz Inequality, $\langle \mathbf x^*, \mathbf y^* \rangle \leqslant | \mathbf x^* | \, | \mathbf y^* | \leqslant | \mathbf x | \, | \mathbf y | = 1$
- so $\langle \Sigma, \mathbf x \mathbf y^T \rangle \leqslant \sigma_1 \langle \mathbf x^*, \mathbf y^* \rangle \leqslant \sigma_1$
- putting this together, $| \Sigma - \alpha \mathbf x \mathbf y^T |^2_F = | \Sigma |^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 \geqslant | \Sigma |^2_F - 2 \alpha \sigma_1 + \alpha^2$
- complete the square: $| \Sigma |^2_F - 2 \alpha \sigma_1 + \alpha^2 = | \Sigma |^2_F + (\alpha - \sigma_1)^2 - \sigma_1^2$, which is minimal when $\alpha = \sigma_1$
- so $| \Sigma - U^T A_1 V |^2_F \geqslant | \Sigma |^2_F - \sigma_1^2 = | E_1 |^2_F$
- the minimum is attained when $\alpha = \sigma_1$, $\mathbf x = \mathbf e_1 \in \mathbb R^m$, $\mathbf y = \mathbf e_1 \in \mathbb R^n$
- finally, $A_1 = \alpha (U \mathbf x) (V \mathbf y)^T = \sigma_1 \mathbf u_1 \mathbf v_1^T$
$\square$
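A numerical illustration of the theorem (a sketch; the second rank-1 matrix is just an arbitrary competitor, not part of the proof):

```python
import numpy as np

A = np.random.randn(8, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# best rank-1 approximation from the theorem: sigma_1 * u_1 * v_1^T
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
best_err = np.linalg.norm(A - A1, 'fro') ** 2
print(np.isclose(best_err, np.sum(s[1:] ** 2)))   # error is sigma_2^2 + ... + sigma_r^2

# any other rank-1 matrix does no better
x, y = np.random.randn(8), np.random.randn(5)
other_err = np.linalg.norm(A - np.outer(x, y), 'fro') ** 2
print(other_err >= best_err)   # True
```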
Greedy Approach to Rank-$k$ Approximation
Greedy approach:
- find rank-1 $A_1$ for which $E_1 = A - A_1$ has minimal norm
- next choose a rank-1 matrix $A_2$ for which $E_2 = E_1 - A_2 = A - A_1 - A_2$ has the minimal norm
- $A_1 + A_2$ is rank-2 approximation
- at each step choose such $A_i$ that minimizes the norm of $E_i = E_{i-1} - A_i$
- after $k$ steps we have a rank-$k$ approximation $A \approx A_1 + \dots + A_k$
Step 1:
- find best rank-1 approximation
- already know how to do this
Step 2:
- now $E_1 = \sum_{i=2}^r \sigma_i \mathbf u_i \mathbf v_i^T$
- and the SVD of $E_1$ consists of exactly these terms
- so the best rank-1 approximation to $E_1$ is $\sigma_2 \mathbf u_2 \mathbf v_2^T$
We can repeat this argument for all $E_i$.
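A sketch of the greedy procedure in NumPy (assumed setup with a random matrix); it recovers exactly the truncated SVD:

```python
import numpy as np

A = np.random.randn(10, 6)
k = 3

# greedy: repeatedly subtract the best rank-1 approximation of the residual
E = A.copy()
S = np.zeros_like(A)
for _ in range(k):
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    A_i = s[0] * np.outer(U[:, 0], Vt[0, :])   # best rank-1 approximation of E
    S += A_i
    E -= A_i

# direct truncated SVD of A gives the same rank-k matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(S, S_k))   # True
```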
Best Rank-$k$ Approximation
In fact, the greedy approach gives the best possible rank-$k$ approximation.
Proof:
- Lawson, Charles L., and Richard J. Hanson. Solving Least Squares Problems. Vol. 161. Englewood Cliffs, NJ: Prentice-Hall, 1974.
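A small numerical illustration of this claim (a sketch; the column-subset matrix below is just one arbitrary competing rank-$k$ candidate):

```python
import numpy as np

A = np.random.randn(10, 6)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
svd_err = np.linalg.norm(A - S_k, 'fro') ** 2
print(np.isclose(svd_err, np.sum(s[k:] ** 2)))   # error = sigma_{k+1}^2 + ... + sigma_r^2

# another rank-k candidate: keep only the k largest-norm columns of A, zero the rest
B = np.zeros_like(A)
cols = np.argsort(np.linalg.norm(A, axis=0))[-k:]
B[:, cols] = A[:, cols]
print(np.linalg.norm(A - B, 'fro') ** 2 >= svd_err)   # the truncated SVD is never worse
```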
Fourier Transformation vs Reduced Rank Approximation
DFT
- it’s about representing a data vector in a special orthogonal basis
- the basis is sines and cosines
- so the Fourier Decomposition represents data as a sum of basis functions with specific amplitudes
- often there are only a few principal frequencies that account for most variability in the data
- and the rest can be discarded
RRA with SVD
- the idea is the same as in the DFT
- but here we find the best basis ourselves with SVD
- so we can see SVD as an adaptive generalization of the DFT
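A sketch of this comparison on made-up data (a matrix whose rows are a few sines, so both methods should compress it well; the sizes and frequencies are assumptions for illustration):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
A = np.array([np.sin((i % 4 + 1) * t) for i in range(32)])   # rows built from 4 frequencies

# DFT: fixed sine/cosine basis; keep the 4 largest-magnitude coefficients per row
F = np.fft.fft(A, axis=1)
drop = np.argsort(np.abs(F), axis=1)[:, :-4]   # indices of all but the 4 largest
F_trunc = F.copy()
np.put_along_axis(F_trunc, drop, 0, axis=1)
A_dft = np.real(np.fft.ifft(F_trunc, axis=1))

# SVD: basis adapted to the data; keep 4 singular components
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = U[:, :4] @ np.diag(s[:4]) @ Vt[:4, :]

print(np.linalg.norm(A - A_dft, 'fro'))   # ~0: the data really is a few fixed frequencies
print(np.linalg.norm(A - A_svd, 'fro'))   # ~0: SVD finds an adapted 4-dimensional basis
```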
Sources
- Kalman, Dan. “A singularly valuable decomposition: the SVD of a matrix.” (1996). [http://www.math.washington.edu/~morrow/498_13/svd.pdf]