Reduced Rank Approximation

Given an $m \times n$ matrix $A$, the goal is to describe $A$ using fewer than $m \times n$ entries

  • also called Total Least Squares: we approximate the matrix $A$ with a matrix $B$ by minimizing $\| A - B \|^2_F$
  • here we work in Matrix Vector Spaces equipped with the Frobenius Norm


Redundancy of Matrix

  • rank of $A$ specifies the number of linearly independent columns/rows
  • so it's a good measure of redundancy
  • if the rank is low, $A$ has a lot of redundancy
  • such matrices can be expressed more efficiently than just a table of entries
  • for example, Rank One Matrices are very redundant: instead of $mn$ entries they can be represented by $m + n$ entries
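A minimal numpy sketch of this point (the sizes and the vectors are made up, purely for illustration): a rank-one matrix is fully determined by one column vector and one row vector, so $m + n$ numbers suffice to rebuild all $mn$ entries.

```python
import numpy as np

m, n = 100, 50
u = np.random.randn(m)   # column factor: m entries
v = np.random.randn(n)   # row factor: n entries

A = np.outer(u, v)       # rank-1 matrix with m * n = 5000 entries

# storing u and v takes only m + n = 150 numbers, yet they rebuild A exactly
print(np.linalg.matrix_rank(A))                   # 1
print(np.allclose(A, u[:, None] * v[None, :]))    # True
```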


Approximation with SVD

Best Approximation

Finding the approximation

  • suppose our matrix doesn't have rank 1,
  • but we want to find the best rank-1 approximation to this matrix
  • so we want to express $A$ in terms of two vectors and their Outer Product


How do we define "best"?

  • call a matrix $B$ the best rank-one approximation of $A$ if $\| B - A \|^2_F$ is minimal over all rank-one matrices
  • so $B$ is the best in terms of "Total Least Squares"
  • norm for a matrix? Use the Frobenius Norm
  • so use the element-wise Inner Product $\langle A, B \rangle = \sum_{ij} a_{ij} b_{ij}$; the induced norm satisfies $\| A \|^2_F = \langle A, A \rangle$
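A quick numpy check of these definitions (random matrices, purely illustrative): the element-wise inner product and the Frobenius norm it induces.

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(4, 3)

inner = np.sum(A * B)     # <A, B> = sum_ij a_ij * b_ij
fro_sq = np.sum(A * A)    # ||A||_F^2 = <A, A>

print(np.isclose(inner, np.trace(A.T @ B)))                    # True: <A, B> = tr(A^T B)
print(np.isclose(np.sqrt(fro_sq), np.linalg.norm(A, 'fro')))   # True
```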


Let's apply SVD:

  • $A = U \Sigma V^T = \sum_i \sigma_i \mathbf u_i \mathbf v_i^T$
  • $\| A \|^2_F = \sum_i \| \sigma_i \mathbf u_i \mathbf v_i^T \|^2_F$
    • this holds because the terms are orthogonal w.r.t. the matrix inner product:
    • $\langle \sigma_i \mathbf u_i \mathbf v_i^T, \sigma_j \mathbf u_j \mathbf v_j^T \rangle = \sigma_i \sigma_j (\mathbf u_i^T \mathbf u_j)(\mathbf v_i^T \mathbf v_j) = 0$ for $i \neq j$
  • SVD is orthogonal decomposition into rank-1 matrices
  • also, because the norm of a rank-1 matrix is $\| \mathbf u_i \mathbf v_i^T \|^2_F = \| \mathbf u_i \|^2 \|\mathbf v_i \|^2$ and $\mathbf u_i$, $\mathbf v_i$ are unit vectors, we have
  • $\| A \|^2_F = \sum_i \sigma_i^2$
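A numerical sanity check of $\| A \|^2_F = \sum_i \sigma_i^2$ (the matrix is random, just for illustration):

```python
import numpy as np

A = np.random.randn(6, 4)
sigma = np.linalg.svd(A, compute_uv=False)   # singular values only

# Frobenius norm squared equals the sum of squared singular values
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(sigma ** 2)))   # True
```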


Approximation

Let's express $A$ as $A = S_k + E_k$

  • where $S_k = \sum_{i = 1}^k \sigma_i \mathbf u_i \mathbf v_i^T$ - the first $k$ terms
  • and $E_k = \sum_{i = k+1}^r \sigma_i \mathbf u_i \mathbf v_i^T$ - the remaining $r - k$ terms
  • then $\| A \|_F^2 = \| S_k \|_F^2 + \| E_k \|_F^2 $ for any $k$
  • also, $\| S_k \|_F^2 = \sum_{i = 1}^k \sigma_i^2$ and $\| E_k \|_F^2 = \sum_{i = k+1}^r \sigma_i^2$
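A short numpy sketch of this split (random matrix and an arbitrary $k$, both just for illustration): build $S_k$ from the first $k$ SVD terms and check the Pythagorean identity for the Frobenius norm.

```python
import numpy as np

A = np.random.randn(8, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # first k terms of the SVD
E_k = A - S_k                                 # remaining terms

fro2 = lambda M: np.linalg.norm(M, 'fro') ** 2
print(np.isclose(fro2(A), fro2(S_k) + fro2(E_k)))   # True
print(np.isclose(fro2(S_k), np.sum(s[:k] ** 2)))    # True
print(np.isclose(fro2(E_k), np.sum(s[k:] ** 2)))    # True
```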


Rank-1 Approximation

Theorem

The best rank-1 approximation of $A$ is $\sigma_1 \mathbf u_1 \mathbf v_1^T$

Proof

  • $S_1 = \sigma_1 \mathbf u_1 \mathbf v_1^T$ and $\| E_1 \|^2_F = \sigma_2^2 + \ ... \ + \sigma_r^2$
  • we need to show that $\| E_1 \|^2_F$ is the smallest achievable error
  • let $A_1$ be any rank-1 approximation, so its error is $\| A - A_1 \|^2_F$
  • this norm is preserved under multiplication by orthogonal matrices (see Frobenius Norm)
  • so $\| A - A_1 \|^2_F = \| U \Sigma V^T - A_1 \|^2_F = \| \Sigma V^T - U^T A_1 \|^2_F = \| \Sigma - U^T A_1 V \|^2_F$
  • let's write $U^T A_1 V$ as $\alpha \mathbf x \mathbf y^T$ with $\alpha > 0$ and unit vectors $\mathbf x \in \mathbb R^m$ and $\mathbf y \in \mathbb R^n$
  • so, $\| \Sigma - U^T A_1 V \|^2_F = \| \Sigma \|^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 \| \mathbf x \mathbf y^T \|^2_F$
  • $\| \mathbf x \mathbf y^T \|^2_F = 1$ because both $\mathbf x$ and $\mathbf y$ are unit vectors
  • $\langle \Sigma, \mathbf x \mathbf y^T \rangle = \sum_{i=1}^r \sigma_i x_i y_i \leqslant \sum_{i=1}^r \sigma_i | x_i | \, | y_i | \leqslant \sigma_1 \sum_{i=1}^r | x_i | \, | y_i |$
    • recall that in the SVD $\sigma_1$ is the biggest singular value
  • let $\mathbf x^* = (|x_1|, \ ... \ , |x_r|)$ and $\mathbf y^* = (|y_1|, \ ... \ , |y_r|)$
  • so $\sum_{i=1}^r | x_i | \, | y_i | = \langle \mathbf x^*, \mathbf y^* \rangle$
  • by the Cauchy-Schwarz Inequality we have $\langle \mathbf x^*, \mathbf y^* \rangle \leqslant \| \mathbf x^* \| \, \| \mathbf y^* \| \leqslant \| \mathbf x \| \, \| \mathbf y \| = 1$
  • so $\sigma_1 \langle \mathbf x^*, \mathbf y^* \rangle \leqslant \sigma_1$
  • and therefore $\langle \Sigma, \mathbf x \mathbf y^T \rangle \leqslant \sigma_1$
  • $\| \Sigma - U^T A_1 V \|^2_F = \| \Sigma \|^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 \| \mathbf x \mathbf y^T \|^2_F \geqslant \| \Sigma \|^2_F - 2 \alpha \sigma_1 + \alpha^2$
  • complete the square: $\| \Sigma \|^2_F - 2 \alpha \sigma_1 + \alpha^2 = \| \Sigma \|^2_F + (\alpha - \sigma_1)^2 - \sigma_1^2$
  • this is smallest when $\alpha = \sigma_1$, so for any $\alpha$ it is at least $\| \Sigma \|^2_F - \sigma_1^2$
  • so $\| \Sigma - U^T A_1 V \|^2_F \geqslant \| \Sigma \|^2_F - \sigma_1^2 = \| E_1 \|^2_F$
  • the minimum is attained exactly when $\alpha = \sigma_1, \mathbf x = \mathbf e_1 \in \mathbb R^m, \mathbf y = \mathbf e_1 \in \mathbb R^n$
  • finally, $A_1 = \alpha (U \mathbf x) (V \mathbf y)^T = \sigma_1 \mathbf u_1 \mathbf v_1^T$

$\square$


Greedy Approach to Rank-$k$ Approximation

Greedy approach:

  • find rank-1 $A_1$ for which $E_1 = A - A_1$ has minimal norm
  • next choose a rank-1 matrix $A_2$ for which $E_2 = E_1 - A_2 = A - A_1 - A_2$ has the minimal norm
    • $A_1 + A_2$ is rank-2 approximation
  • at each step choose such $A_i$ that minimizes the norm of $E_i = E_{i-1} - A_i$
  • after $k$ steps we have a rank-$k$ approximation $A \approx A_1 + \ ... \ + A_k$


Step 1:

  • find best rank-1 approximation
  • already know how to do this


Step 2:

  • now $E_1 = \sum_{i=2}^r \sigma_i \mathbf u_i \mathbf v_i^T$
  • this is already an SVD of $E_1$: the same singular values and vectors as for $A$, just without the leading term
  • so the best rank-1 approximation to $E_1$ is $\sigma_2 \mathbf u_2 \mathbf v_2^T$


Can repeat this argument for all $E_i$
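A sketch of the greedy procedure under these assumptions (the helper `best_rank1` is a hypothetical name, built on numpy's SVD): at each step peel off the best rank-1 approximation of the current residual, then check that after $k$ steps the sum coincides with the truncated SVD $S_k$.

```python
import numpy as np

def best_rank1(M):
    """Best rank-1 approximation of M: sigma_1 * u_1 * v_1^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0, :])

A = np.random.randn(7, 5)
k = 3

approx = np.zeros_like(A)
residual = A.copy()
for _ in range(k):
    A_i = best_rank1(residual)   # best rank-1 approximation of E_{i-1}
    approx += A_i
    residual -= A_i              # E_i = E_{i-1} - A_i

# compare with the truncated SVD S_k of the original matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(approx, S_k))   # True: the greedy sum equals S_k
```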


Best Rank-$k$ Approximation

Actually, the greedy approach gives the best possible rank-$k$ approximation (this is the Eckart-Young theorem)

Proof:

  • Lawson, Charles L., and Richard J. Hanson. Solving least squares problems. Vol. 161. Englewood Cliffs, NJ: Prentice-hall, 1974.
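A numerical illustration of the claim (random matrices only, so this is a sanity check rather than a proof): the truncated SVD's squared error equals $\sum_{i > k} \sigma_i^2$, and a batch of arbitrary rank-$k$ competitors never does better.

```python
import numpy as np

A = np.random.randn(10, 6)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
svd_err = np.linalg.norm(A - S_k, 'fro') ** 2
print(np.isclose(svd_err, np.sum(s[k:] ** 2)))   # True

# random rank-k matrices never beat the truncated SVD
for _ in range(1000):
    B = np.random.randn(10, k) @ np.random.randn(k, 6)   # an arbitrary rank-k matrix
    assert np.linalg.norm(A - B, 'fro') ** 2 >= svd_err
```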


Fourier Transformation vs Reduced Rank Approximation

DFT

  • it's about representing a data vector in a special orthogonal basis
  • the basis is sines and cosines
  • so, the Fourier Decomposition represents the data as a sum of basis functions with specific amplitudes
  • often there are only a few principal frequencies that account for most variability in the data
  • and the rest can be discarded
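A small numpy sketch of this idea (the signal and the number of kept coefficients are made up): keep only the largest DFT coefficients and reconstruct.

```python
import numpy as np

# a signal dominated by two frequencies plus a bit of noise
t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t) \
    + 0.05 * np.random.randn(t.size)

X = np.fft.fft(x)
keep = 8                                   # how many DFT coefficients to keep
smallest = np.argsort(np.abs(X))[:-keep]   # indices of all but the largest ones
X_compressed = X.copy()
X_compressed[smallest] = 0                 # discard the small coefficients

x_approx = np.real(np.fft.ifft(X_compressed))
rel_err = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
print(rel_err)   # small: a few frequencies capture most of the signal
```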

RRA with SVD

  • the idea is the same as in the DFT
  • but here we find the best basis ourselves with the SVD
  • so we can see the SVD as an adaptive generalization of the DFT


Sources

  • Kalman, Dan. "A singularly valuable decomposition: the SVD of a matrix." (1996).