ML Wiki

Euclidean LSH

Euclidean LSH - LSH for the Euclidean Space

$p$-Stable Distributions

A family of hash functions $\mathcal H$ is $(r_1, r_2, p_1, p_2)$-sensitive if

• for all $\mathbf p, \mathbf q$
• if $\mathbf p \in B(\mathbf q, r_1)$ then $P_{\mathcal H} \Big[ h(\mathbf q) = h(\mathbf p) \Big] \geqslant p_1$
• if $\mathbf p \not \in B(\mathbf q, r_2)$ then $P_{\mathcal H} \Big[ h(\mathbf q) = h(\mathbf p) \Big] \leqslant p_2$

Here: choose $r_1 = R$ and $r_2 = c \cdot R$

A Probability Distribution $D$ over $\mathbb R$ is $p$-stable

• if there exists $p \geqslant 0$ s.t. for any $n$ real numbers $v_1, ..., v_n$
• and iid samples from distribution $D$: $\ X_1, X_2, \ ... \ , X_n \sim D$
• the Random Variable $\sum v_i \, X_i$ follows the same distribution as $\left( \sum |v_i|^p \right)^{1/p} \cdot X = \| \mathbf v \| \cdot X$ where $X \sim D$

Known $p$-stable Distribution:

In CS $p$-stable distributions are useful for Sketching

• can be used to estimate $\| \mathbf v \|_p$

Papers:

• Nolan, J. "Stable distributions." 2009.  
• Indyk, Piotr. "Stable distributions, pseudorandom generators, embeddings, and data stream computation." 2006. 

E2 LSH

A hash function family is locality-sensitive if

• "similar" $\mathbf v_1, \mathbf v_2$ - i.e. have small $\| \mathbf v_1 - \mathbf v_2 \|_p$
• they should collide: have same hash value with high probability

Consider $\mathbf a \cdot \mathbf v$ - it's a projection of $\mathbf v$ to a real line

• it follows from the $p$-stability that for two $\mathbf v_1, \mathbf v_2$ distance between the projections $\mathbf a \cdot \mathbf v_1 - \mathbf a \cdot \mathbf v_2$ is distributed as $\| \mathbf v_1 - \mathbf v_2 \|_p \cdot X$
• so if we chop the real line into equi-width segments of $w$, then the hash functions should be locality-preserving

Random Projections

The Dot Product is a projection

• this is at the core of LSH
• let $\mathbf v$ be the query and $\mathbf x$ some random vector with each $x_i \sim N(0, 1)$ (components of $\mathbf x$ are sampled from the Normal Distribution)
• the hash function is then defined as $h(\mathbf v) = \mathbf v \cdot \mathbf x$

Quantization:

• then we quantize $h(\mathbf v)$ into a set of hash buckets - hoping that nearby items will fall into the same bin
• i.e. we get the following hash function:
• $h_{\mathbf x, b}(\mathbf v) = \left\lfloor \cfrac{\mathbf v \cdot \mathbf x + b}{w} \right\rfloor$
• where $w$ is the length of each quantization bucket
• and $b$ is a Random Variable sampled from the Uniform Distribution: $b \sim \text{unif}[0, w]$
• $w$ - quantization step

Requirements for the projection operator

• let $\mathcal H$ be a family of LSH functions
• for any $\mathbf p, \mathbf q \in \mathbb R^d$ that are close to each other, i.e. for $\|\mathbf p - \mathbf q \| \leqslant R_1$
• probability that they end up in the same bucket should be high, i.e. $P_{\mathcal H} \Big[ h(\mathbf p) = h(\mathbf q) \Big] \geqslant p_1$
• if $\mathbf p, \mathbf q$ are far apart, i.e. $\|\mathbf p - \mathbf q \| \geqslant R_2$
• probability that they end up in the same bucket is low, i.e. $P_{\mathcal H} \Big[ h(\mathbf p) = h(\mathbf q) \Big] \leqslant p_2$

I.e. $\| h(\mathbf p) - h(\mathbf q) \| \sim \| \mathbf p - \mathbf q \|$ ($\sim$ = "distributed proportionally to")

$L$ Random Projections

Can magnify the difference between $P_1$ and $P_2$ by performing projection on $K$ random directions in parallel

$\left( \cfrac{p_1}{p_2} \right)^K > \cfrac{p_1}{p_2}$

• So given a query $\mathbf v$ apply $K$ individual dot products
• and obtain $K$ real numbers $h_i (\mathbf v) \in \mathbb R$ (all quantified - i.e. put into buckets - separately)
• so $H(\mathbf v) = \Big[ h_1(\mathbf v), h_2(\mathbf v), \ ... \ , h_K(\mathbf v) \Big] \in \mathbb R^K$

Now we say that $\mathbf u$ is a NN candidate for $\mathbf v$ if they end up in the same bucket, i.e. $H(\mathbf u) = H(\mathbf v)$ (for all $h_i(\cdot)$)

Note that for $K$ dot products the probability of having the same bucket is $p_1^K$ - which decreases as we increase $K$

• to reduce this effect form $L$ independent projections - i.e. we'll have $L$ hash functions $H_1, \ ... \ , H_L$
• then $\mathbf u$ is a candidate if it ends up in the same bucket as $\mathbf v$ for at least one projection $H_i$
• it's very unlikely that the true NN will be absent in all $L$ bins

Hash Implementation

(E2 LSH)

$\{ H_i \}$ put each data point into a hash bucket described by $K$-vector

• this $K$-dim space is very sparse - can densify it
• use exact hashes - i.e. "classical" Hash Tables for this
• so let $T_1(\mathbf v) = \mathbf w_2^T H(\mathbf v) \ \mod P_1$ for some random weight vector $\mathbf w$ and a big prime number $P_1$ (which is also the size of the hash table)

Unrelated points may collide to the same bucket

• how to handle it?
• have another function $T_2$ with different weights $\mathbf w_2$ and $P_2$
• let $T_2(\mathbf v)$ be the "fingerprint" of $\mathbf v$
• we store the fingerprint of $\mathbf v$ in the bin chosen by $T_1$
• then during retrieval we can find the true match by comparing fingerprints: they are short and their comparison is fast
• chances that both $T_1$ and $T_2$ collide are very small

Accuracy

• accuracy is determined by the probability it will find the true NN

$p$-Stable Distributions

A distribution is $p$-stable if

• for iid sample $X_1, \ ... \ , X_n \sim D$
• for any $v_1, \ ... \ , v_n \in \mathbb R$
• $\sum v_i \, X_i$ follows the same distribution as the random variable $\left( \sum_i \|v_i \big \|_p \right)^{1 / p} \cdot X$
• where $\| \, \cdot \, \|_p$ is $L_p$ norm
• and $X \sim D$
• PDF of this RV is $f_p$

For $p=2$ Gaussian is $p$-stable

What it means:

• $\mathbf u = \| \mathbf p - \mathbf q \|$ will always be small if $\mathbf p$ and $\mathbf q$ are close
• but because of quantization they may end up in different buckets

Probability of getting to the same bucket:

• $P_{\mathbf a, b} = P \Big [ h_{\mathbf a, b}(\mathbf p) = h_{\mathbf a, b}(\mathbf q) \Big] = \int\limits_{0}^{w} \frac{1}{\mathbf u} \cdot f_p(\frac{t}{\mathbf u}) \, (1 - \frac{t}{w}) , \text{d}t$

Sources

• Slaney, Malcolm, and Michael Casey. "Locality-sensitive hashing for finding nearest neighbors [lecture notes]." 2008. 
• Datar, Mayur, et al. "Locality-sensitive hashing scheme based on p-stable distributions." 2004.