# ML Wiki

## Frobenius Norm

The Frobenius Norm is a norm for Matrix Vector Spaces: vector spaces whose elements are matrices.

• Define the Inner Product element-wise: $\langle A, B \rangle = \sum_{ij} a_{ij} b_{ij}$
• then the norm induced by this product is $\| A \|_F = \sqrt{\langle A, A \rangle}$
• this norm is the Frobenius Norm

Orthogonality:

• Matrices $A$ and $B$ are orthogonal if $\langle A, B \rangle = 0$
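A quick numerical check of these definitions (a minimal sketch in NumPy; the example matrices are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# element-wise inner product <A, B> = sum_ij a_ij * b_ij
inner = np.sum(A * B)
print(inner)  # 5.0

# Frobenius norm = sqrt(<A, A>); matches NumPy's built-in 'fro' norm
fro = np.sqrt(np.sum(A * A))
print(np.isclose(fro, np.linalg.norm(A, 'fro')))  # True
```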

## Norm of Matrix Multiplication

### Rank-1 Matrices

What is the inner product of two rank-1 matrices?

• let $A = \mathbf x \mathbf y^T$ and $B = \mathbf u \mathbf v^T$
• then $\langle A, B \rangle = \langle \mathbf x \mathbf y^T, \mathbf u \mathbf v^T \rangle$
• $\mathbf x \mathbf y^T = \begin{bmatrix} | & & | \\ \mathbf x y_1 & \cdots & \mathbf x y_n \\ | & & | \\ \end{bmatrix}$ and $\mathbf u \mathbf v^T = \begin{bmatrix} | & & | \\ \mathbf u v_1 & \cdots & \mathbf u v_n \\ | & & | \\ \end{bmatrix}$

• thus, $\langle \mathbf x \mathbf y^T, \mathbf u \mathbf v^T \rangle = \sum\limits_i \langle \mathbf x y_i , \mathbf u v_i \rangle = \langle \mathbf x, \mathbf u \rangle \sum_i y_i v_i = \langle \mathbf x, \mathbf u \rangle \langle \mathbf y, \mathbf v \rangle$
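This identity is easy to verify numerically (a minimal sketch with random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
x, u = rng.standard_normal(4), rng.standard_normal(4)
y, v = rng.standard_normal(5), rng.standard_normal(5)

lhs = np.sum(np.outer(x, y) * np.outer(u, v))  # <x y^T, u v^T>
rhs = np.dot(x, u) * np.dot(y, v)              # <x, u> <y, v>
print(np.isclose(lhs, rhs))  # True
```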

Orthogonality:

• so two rank-1 matrices will be orthogonal if $\mathbf x \; \bot \; \mathbf u$ or $\mathbf y \; \bot \; \mathbf v$
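For example, taking $\mathbf x \perp \mathbf u$ makes the outer products orthogonal regardless of $\mathbf y$ and $\mathbf v$ (a small check):

```python
import numpy as np

x = np.array([1.0, 0.0])
u = np.array([0.0, 1.0])   # x is orthogonal to u
y = np.array([2.0, 3.0])   # y and v are arbitrary
v = np.array([4.0, 5.0])

# <x y^T, u v^T> = <x, u> <y, v> = 0 because <x, u> = 0
print(np.sum(np.outer(x, y) * np.outer(u, v)))  # 0.0
```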

### General Case

• Let $X$ and $Y$ be two matrices
• let $\mathbf x_i$ be the columns of $X$ and $\mathbf y_i^T$ be the rows of $Y$, so that $XY = \sum_i \mathbf x_i \mathbf y_i^T$
• then the squared norm of the product is $\| XY \|^2_F = \langle XY, XY \rangle = \sum_{ij} \langle \mathbf x_i \mathbf y_i^T, \mathbf x_j \mathbf y_j^T \rangle = \sum_{ij} \langle \mathbf x_i, \mathbf x_j \rangle \langle \mathbf y_i, \mathbf y_j \rangle = \sum_i \| \mathbf x_i \|^2 \| \mathbf y_i \|^2 + \sum_{i \ne j} \langle \mathbf x_i, \mathbf x_j \rangle \langle \mathbf y_i, \mathbf y_j \rangle$

if the $\mathbf x_i$ are orthogonal, then

• $\| XY \|^2_F = \sum_i \| \mathbf x_i \|^2 \| \mathbf y_i \|^2$ (the cross terms vanish because of orthogonality)

if the $\mathbf x_i$ are orthonormal, then

• $\| XY \|^2_F = \sum_i \| \mathbf y_i \|^2 = \| Y \|^2_F$, since each $\| \mathbf x_i \| = 1$

The same applies if the $\mathbf y_i$ are orthogonal/orthonormal; see the sketch below.
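A numerical check of the orthonormal case (a minimal sketch; the QR decomposition is used only to produce a matrix with orthonormal columns):

```python
import numpy as np

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # X has orthonormal columns
Y = rng.standard_normal((3, 4))

# with orthonormal x_i, ||XY||_F reduces to ||Y||_F
print(np.isclose(np.linalg.norm(X @ Y, 'fro'), np.linalg.norm(Y, 'fro')))  # True
```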

## Norm of Matrices

### Rank-1 Matrices

Suppose $A$ is a rank-1 matrix, i.e. $A = \mathbf x \mathbf y^T$

• $A = \mathbf x \mathbf y^T = \begin{bmatrix} | & & | \\ \mathbf x y_1 & \cdots & \mathbf x y_n \\ | & & | \\ \end{bmatrix} = \begin{bmatrix} - & x_1 \mathbf y^T & - \\ & \vdots & \\  - & x_n \mathbf y^T & - \end{bmatrix} = \begin{bmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & \ddots & \vdots \\ x_n y_1 & \cdots & x_n y_n \\ \end{bmatrix}$

• thus $\| A \|^2_F = \sum_i \| y_i \mathbf x \|^2 = \sum_i \| x_i \mathbf y \|^2 = \sum_{ij} (x_i y_j)^2$
• can simplify it further: $\| A \|^2_F = \sum_i \| y_i \mathbf x \|^2 = \sum_i y_i^2 \| \mathbf x \|^2 = \| \mathbf x \|^2 \sum_i y_i^2 = \| \mathbf x \|^2 \| \mathbf y \|^2$
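A quick check of $\| \mathbf x \mathbf y^T \|_F = \| \mathbf x \| \, \| \mathbf y \|$ (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(4), rng.standard_normal(6)

lhs = np.linalg.norm(np.outer(x, y), 'fro')   # ||x y^T||_F
rhs = np.linalg.norm(x) * np.linalg.norm(y)   # ||x|| ||y||
print(np.isclose(lhs, rhs))  # True
```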

### General Case

• If $A$ is an $m \times n$ matrix
• and $\mathbf a_i$ are columns of $A$ and $\mathbf r_j$ are rows of $A$, then
• $\| A \|^2_F = \sum_{ij} A_{ij}^2 = \sum_i \| \mathbf a_i \|^2 = \sum_j \| \mathbf r_j \|^2$
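All three expressions agree numerically (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

total   = np.sum(A ** 2)                          # sum_ij A_ij^2
by_cols = np.sum(np.linalg.norm(A, axis=0) ** 2)  # squared column norms
by_rows = np.sum(np.linalg.norm(A, axis=1) ** 2)  # squared row norms
print(np.allclose([total, by_cols, by_rows], np.linalg.norm(A, 'fro') ** 2))  # True
```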

Using SVD, we can find another way:

• the SVD of $A$ is $A = U \Sigma V^T$, or equivalently $A V = U \Sigma$
• then $\| A V \|_F^2 = \| U \Sigma \|_F^2$
• both $V$ and $U$ have orthonormal columns, so by the norm-of-multiplication result above, multiplying by them does not change the Frobenius norm
• thus $\| A \|_F^2 = \| \Sigma \|_F^2$
• or, $\| A \|_F^2 = \sum_{i=1}^r \sigma_i^2$ - the sum of squared singular values - and $\| A \|_F = \sqrt{\sum_{i=1}^r \sigma_i^2}$
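A check of $\| A \|_F^2 = \sum_i \sigma_i^2$ (a minimal sketch using NumPy's SVD):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(sigma ** 2)))  # True
```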

## Properties

For any matrix $A$, $\| A \|_F = \sqrt{\text{tr}(AA^T)} = \sqrt{\text{tr}(A^T A)}$

• $\| A \|_F^2 = \sum_{i=1}^n \| \mathbf a_i \|^2$ where $\mathbf a_i$ are columns of $A$
• consider $A^T A$: on the main diagonal we have $\mathbf a_i^T \mathbf a_i = \| \mathbf a_i \|^2$
• so $\| A \|_F^2 = \text{tr}(A^T A)$
• can show the same way for rows of $A$ via $A A^T$
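The trace identity can be checked directly as well (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))

fro2 = np.linalg.norm(A, 'fro') ** 2
print(np.isclose(fro2, np.trace(A.T @ A)))  # True  (via columns)
print(np.isclose(fro2, np.trace(A @ A.T)))  # True  (via rows)
```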

Can also apply SVD to show that:

• let $A = U \Sigma V^T$ be SVD of A
• then $\| A \|_F^2 = \| \Sigma \|_F^2 = \sum\limits_{i=1}^r \sigma_i^2$
• the $\sigma_i^2$ are the nonzero Eigenvalues of $AA^T$ and $A^TA$
• then, $\sum \sigma_i^2 = \text{tr}(A A^T) = \text{tr}(A^T A)$
• so this also shows that the sum of the eigenvalues is the trace of the matrix
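A sketch confirming that the eigenvalues of $A^T A$ are the squared singular values and that they sum to the trace:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending
eig = np.linalg.eigvalsh(A.T @ A)           # eigenvalues of A^T A, ascending

print(np.allclose(np.sort(sigma ** 2), eig))        # True
print(np.isclose(np.sum(eig), np.trace(A.T @ A)))   # True
```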

## Application

The Frobenius norm is used in Reduced Rank Approximation to show that the truncated SVD gives the best low-rank approximation in the sense of Total Least Squares.
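For instance, a rank-$k$ truncation of the SVD leaves a Frobenius error of exactly $\sqrt{\sum_{i > k} \sigma_i^2}$, which is the quantity the optimality argument is built on (a minimal NumPy sketch of this identity):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncation of the SVD

# Frobenius error of the truncation = sqrt of the discarded sigma_i^2
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2))))  # True
```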

## Sources

• Kalman, Dan. "A singularly valuable decomposition: the SVD of a matrix." (1996).
