This is a stub. Edit it. |

This page is currently a draft. Edit it. |

KNN Graph a graph that contains links only between $k$ closest neighborhoods used in

Simple and popular method for many areas of machine learning, statistics and beyond

KNN approach is used in SNN Clustering

- can use the number of shared neighbors as a similarity measure

- If $k$th nearest neighbor is close, then the region is most likely of high density
- so the distance to $k$th neighbor gives a measure of density of a point
- can use it with Euclidean Distance, Cosine Similarity or SNN Similarity (see SNN Clustering)

Also note that for high dimensional data many distance/similarity measures become less meaningful

- especially Euclidean Distance
- can use special functions that can handle high dimensional data: SNN Similarity (see SNN Clustering)

Brute force search for $k$NN takes $O(N)$ where $N$ is the size of the database

- need to use Multi-Dimensional Indexes, for example, trees: Kd-Trees or R-Trees
- however for high dimensional data tree performance degrades from $O(\log N)$ to $O(N)$
- see Weber98: all indexing techniques degrade to linear search for large dimensionality
- also can't use classical hash-based indexes (like Linear Hashing or Extensible Hashing): they aim at exact match and don't handle KNN queries

Trees

- Metric Trees
- Spill-Trees gives approximate answer to KNN
- both don't work well in high dimensions, but can apply Random Projections to make them work

Hashes:

How to deal with the Curse of Dimensionality?

Pre-aggregating data:

- can use Locality Sensitive Hashing: probabilistic/approximate indexing techniques that return the true KNNs most of the time correctly
- it would put data items into hash buckets, and when we look for KNN of $q$, we look into buckets where $q$ landed
- alternatively, can use canopies

- Weber, Roger, Hans-Jörg Schek, and Stephen Blott. "A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces." 1998. [1]
- Nearest-Neighbor Methods in Learning and Vision: Theory and Practice [2]

- Ertöz, Levent, Michael Steinbach, and Vipin Kumar. "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data." 2003. [3]