# ML Wiki

## Reduced Rank Approximation

Given an $m \times n$ matrix $A$, the goal is to describe $A$ using fewer than $m \times n$ entries

• also called Total Least Squares: we approximate the matrix $A$ with a matrix $B$ by minimizing $\| A - B \|^2_F$
• i.e. we work in the Matrix Vector Spaces equipped with the Frobenius Norm

### Redundancy of a Matrix

• rank of $A$ specifies the number of linearly independent columns/rows
• so it's a good measure of redundancy
• if the rank is low, $A$ has a lot of redundancy
• such matrices can be expressed more efficiently than just a table of entries
• for example, Rank One Matrices are very redundant: instead of $mn$ entries they can be represented by $m + n$ entries
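A minimal NumPy sketch of this saving (the sizes and variable names are illustrative):

```python
import numpy as np

# a rank-1 matrix is fully determined by one column vector u (m entries)
# and one row vector v (n entries)
m, n = 1000, 500
u = np.random.randn(m)
v = np.random.randn(n)

A = np.outer(u, v)       # full m x n table: m * n = 500,000 entries
compact = (u, v)         # the same information in m + n = 1,500 entries

# reconstructing the full matrix from the compact form is exact
assert np.allclose(A, np.outer(*compact))
```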

## Approximation with SVD

### Best Approximation

Finding the approximation

• Suppose our matrix does not have rank 1,
• but we want to find the best rank-1 approximation to it
• i.e. we want to express $A$ (approximately) in terms of two vectors and their Outer Product

How do we define "best"?

• matrix $B$ is the best rank-one approximation of $A$ if $\| B - A \|^2_F$ is minimal
• so $B$ is the best in terms of "Total Least Squares"
• which norm to use for a matrix? The Frobenius Norm
• so use the element-wise Inner Product $\langle A, B \rangle = \sum_{ij} a_{ij} b_{ij}$, and the norm is $\| A \|^2_F = \langle A, A \rangle$
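A quick sanity check of these definitions (a small NumPy sketch, names are illustrative):

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(4, 3)

inner = np.sum(A * B)        # <A, B> = sum_ij a_ij * b_ij
frob_sq = np.sum(A * A)      # ||A||_F^2 = <A, A>

# NumPy's built-in Frobenius norm agrees: ||A||_F = sqrt(<A, A>)
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(frob_sq))
```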

Let's apply SVD:

• $A = U \Sigma V^T = \sum_i \sigma_i \mathbf u_i \mathbf v_i^T$
• $\| A \|^2_F = \sum_i \| \sigma_i \mathbf u_i \mathbf v_i^T \|^2_F$
• why? the terms are pairwise orthogonal w.r.t. the matrix inner product: for $i \ne j$, $\langle \mathbf u_i \mathbf v_i^T, \mathbf u_j \mathbf v_j^T \rangle = (\mathbf u_i^T \mathbf u_j)(\mathbf v_i^T \mathbf v_j) = 0$, so the squared norms add up (Pythagoras)
• so SVD is an orthogonal decomposition into rank-1 matrices
• also, the norm of each rank-1 term is $\| \mathbf u_i \mathbf v_i^T \|^2_F = \| \mathbf u_i \|^2 \|\mathbf v_i \|^2 = 1$ because $\mathbf u_i$ and $\mathbf v_i$ are unit vectors, so we have
• $\| A \|^2_F = \sum_i \sigma_i^2$
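A small numerical check of this identity (assuming nothing beyond NumPy's `linalg.svd`):

```python
import numpy as np

A = np.random.randn(6, 4)
sigma = np.linalg.svd(A, compute_uv=False)     # singular values only

# the squared Frobenius norm equals the sum of squared singular values
assert np.isclose(np.sum(A**2), np.sum(sigma**2))
```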

### Approximation

Let's express $A$ as $A = S_k + E_k$

• where $S_k = \sum_{i = 1}^k \sigma_i \mathbf u_i \mathbf v_i^T$ - the first $k$ terms
• and $E_k = \sum_{i = k+1}^r \sigma_i \mathbf u_i \mathbf v_i^T$ - the remaining $r - k$ terms
• then $\| A \|_F^2 = \| S_k \|_F^2 + \| E_k \|_F^2$ for any $k$
• also, $\| S_k \|_F^2 = \sum_{i = 1}^k \sigma_i^2$ and $\| E_k \|_F^2 = \sum_{i = k+1}^r \sigma_i^2$
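A short NumPy sketch of this split (the choice $k = 2$ is arbitrary):

```python
import numpy as np

A = np.random.randn(8, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
S_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # first k rank-1 terms
E_k = A - S_k                                   # the remaining terms

# Pythagorean split of the squared Frobenius norm
assert np.isclose(np.sum(S_k**2), np.sum(s[:k]**2))
assert np.isclose(np.sum(E_k**2), np.sum(s[k:]**2))
assert np.isclose(np.sum(A**2), np.sum(S_k**2) + np.sum(E_k**2))
```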

### Rank-1 Approximation

Theorem

The best rank-1 approximation of $A$ is $\sigma_1 \mathbf u_1 \mathbf v_1^T$

Proof

• $S_1 = \sigma_1 \mathbf u_1 \mathbf v_1^T$ and $\| E_1 \|^2_F = \sigma_2^2 + \ ... \ + \sigma_r^2$
• we need to show that $\| E_1 \|^2_F$ is the smallest achievable error
• let $A_1$ be any rank-1 approximation, so its error is $\| A - A_1 \|^2_F$
• this norm is preserved under multiplication by orthogonal matrices (see Frobenius Norm)
• so $\| A - A_1 \|^2_F = \| U \Sigma V^T - A_1 \|^2_F = \| \Sigma V^T - U^T A_1 \|^2_F = \| \Sigma - U^T A_1 V \|^2_F$
• $U^T A_1 V$ also has rank 1, so we can write it as $\alpha \mathbf x \mathbf y^T$ with $\alpha > 0$ and unit vectors $\mathbf x \in \mathbb R^m$ and $\mathbf y \in \mathbb R^n$
• so, $\| \Sigma - U^T A_1 V \|^2_F = \| \Sigma - \alpha \mathbf x \mathbf y^T \|^2_F = \| \Sigma \|^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 \| \mathbf x \mathbf y^T \|^2_F$
• $\| \mathbf x \mathbf y^T \|^2_F = 1$ because both $\mathbf x$ and $\mathbf y$ are unit vectors
• $\langle \Sigma, \mathbf x \mathbf y^T \rangle = \sum_{i=1}^r \sigma_i x_i y_i \leqslant \sum_{i=1}^r \sigma_i | x_i | \, | y_i | \leqslant \sigma_1 \sum_{i=1}^r | x_i | \, | y_i |$
• recall that in the SVD $\sigma_1$ is the biggest singular value
• let $\mathbf x^* = (|x_1|, \ ... \ , |x_r|)$ and $\mathbf y^* = (|y_1|, \ ... \ , |y_r|)$
• so $\sum_{i=1}^r | x_i | \, | y_i | = \langle \mathbf x^*, \mathbf y^* \rangle$
• by the Cauchy–Schwarz Inequality we have $\langle \mathbf x^*, \mathbf y^* \rangle \leqslant \| \mathbf x^* \| \, \| \mathbf y^* \| \leqslant \| \mathbf x \| \, \| \mathbf y \| = 1$ (since $\mathbf x^*$ and $\mathbf y^*$ keep only the first $r$ components of $\mathbf x$ and $\mathbf y$)
• so $\sigma_1 \langle \mathbf x^*, \mathbf y^* \rangle \leqslant \sigma_1$
• and therefore $\langle \Sigma, \mathbf x \mathbf y^T \rangle \leqslant \sigma_1$
• $\| \Sigma - U^T A_1 V \|^2_F = \| \Sigma \|^2_F - 2 \alpha \langle \Sigma, \mathbf x \mathbf y^T \rangle + \alpha^2 \| \mathbf x \mathbf y^T \|^2_F \geqslant \| \Sigma \|^2_F - 2 \alpha \sigma_1 + \alpha^2$
• complete the square: $\| \Sigma \|^2_F - 2 \alpha \sigma_1 + \alpha^2 = \| \Sigma \|^2_F + (\alpha - \sigma_1)^2 - \sigma_1^2$
• this lower bound is smallest when $\alpha = \sigma_1$
• so $\| \Sigma - U^T A_1 V \|^2_F \geqslant \| \Sigma \|^2_F - \sigma_1^2 = \| E_1 \|^2_F$
• the minimum is attained exactly when $\alpha = \sigma_1$, $\mathbf x = \mathbf e_1 \in \mathbb R^m$, $\mathbf y = \mathbf e_1 \in \mathbb R^n$
• finally, $A_1 = \alpha (U \mathbf x) (V \mathbf y)^T = \sigma_1 \mathbf u_1 \mathbf v_1^T$

$\square$
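A numerical illustration of the theorem (not a proof): the rank-1 term taken from the SVD should never lose to randomly drawn rank-1 matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

S_1 = s[0] * np.outer(U[:, 0], Vt[0, :])     # sigma_1 * u_1 * v_1^T
best_err = np.sum((A - S_1)**2)              # should equal sigma_2^2 + ... + sigma_r^2
assert np.isclose(best_err, np.sum(s[1:]**2))

# no random rank-1 matrix should achieve a smaller error
for _ in range(1000):
    B = np.outer(rng.standard_normal(7), rng.standard_normal(5))
    assert np.sum((A - B)**2) >= best_err
```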

### Greedy Approach to Rank-$k$ Approximation

Greedy approach:

• find a rank-1 matrix $A_1$ for which $E_1 = A - A_1$ has minimal norm
• next choose a rank-1 matrix $A_2$ for which $E_2 = E_1 - A_2 = A - A_1 - A_2$ has minimal norm
• $A_1 + A_2$ is the rank-2 approximation
• at each step choose the $A_i$ that minimizes the norm of $E_i = E_{i-1} - A_i$
• after $k$ steps we have the rank-$k$ approximation $A \approx A_1 + \ ... \ + A_k$

Step 1:

• find best rank-1 approximation
• already know how to do this

Step 2:

• now $E_1 = \sum_{i=2}^r \sigma_i \mathbf u_i \mathbf v_i^T$
• and the SVD of $E_1$ reuses the same singular vectors, with the remaining singular values $\sigma_2, \ ... \ , \sigma_r$
• so the best rank-1 approximation to $E_1$ is $\sigma_2 \mathbf u_2 \mathbf v_2^T$

We can repeat this argument for every $E_i$: step $i$ adds the term $\sigma_i \mathbf u_i \mathbf v_i^T$ (see the sketch below)
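A minimal sketch of the greedy procedure, assuming distinct singular values so the best rank-1 term at each step is unique; it reproduces the truncated SVD:

```python
import numpy as np

def greedy_rank_k(A, k):
    """Greedy rank-k approximation: repeatedly subtract the best rank-1 term."""
    approx = np.zeros_like(A, dtype=float)
    residual = A.astype(float)
    for _ in range(k):
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        A_i = s[0] * np.outer(U[:, 0], Vt[0, :])   # best rank-1 term of the residual
        approx += A_i
        residual -= A_i
    return approx

A = np.random.randn(8, 6)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
truncated = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# the greedy result coincides with keeping the first k terms of the SVD
assert np.allclose(greedy_rank_k(A, k), truncated)
```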

### Best Rank-$k$ Approximation

Actually, the greedy approach gives the best possible rank-$k$ approximation

Proof:

• Lawson, Charles L., and Richard J. Hanson. Solving Least Squares Problems. Vol. 161. Englewood Cliffs, NJ: Prentice-Hall, 1974.

## Fourier Transformation vs Reduced Rank Approximation

DFT

• it's about representing a data vector in a special orthogonal basis
• the basis is sines and cosines
• So, the Fourier Decomposition represents data as a sum of basis functions with specific amplitudes
• often there are only a few principal frequencies that account for most variability in the data
• and the rest can be discarded

RRA with SVD

• the idea is the same as in the DFT
• but here we find the best basis ourselves with the SVD
• so we can see the SVD as an adaptive generalization of the DFT (see the sketch below)
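A small illustrative comparison (the data and the fixed low-frequency basis are made up for this example): we project the rows of a data matrix onto a fixed Fourier-style basis and onto the adaptive basis found by the SVD. Since the truncated SVD is the best rank-$k$ approximation, its error can never be larger.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 64, 40, 5

# each row of the data matrix is a smooth signal: a low-frequency sinusoid plus noise
t = np.arange(n) / n
rows = [rng.standard_normal() * np.sin(2 * np.pi * rng.integers(1, 4) * t)
        + 0.05 * rng.standard_normal(n)
        for _ in range(m)]
A = np.array(rows)                              # shape (m, n)

# fixed basis: constant + low-frequency cosines and sines (Fourier-style)
basis = [np.ones(n)]
for j in range(1, k):
    basis += [np.cos(2 * np.pi * j * t), np.sin(2 * np.pi * j * t)]
F = np.array(basis[:k]).T                       # keep only k basis vectors
Q, _ = np.linalg.qr(F)                          # orthonormalize them
fourier_approx = A @ Q @ Q.T                    # project every row onto this fixed subspace

# adaptive basis: top-k right singular vectors chosen by the SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
svd_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# the truncated SVD is the best rank-k approximation, so its error is never larger
err_fourier = np.sum((A - fourier_approx) ** 2)
err_svd = np.sum((A - svd_approx) ** 2)
print(err_fourier, err_svd)
assert err_svd <= err_fourier + 1e-9
```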

## Sources

• Kalman, Dan. "A singularly valuable decomposition: the SVD of a matrix." (1996).