Distance
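A metric function (or distance) is a generalization of geometric distance (i.e. Euclidean distance): it is non-negative, zero only between identical points, symmetric, and satisfies the triangle inequality.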
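The metric axioms can be spot-checked numerically on a small sample of points. The following is a minimal sketch; the helper names (`euclidean`, `squared_euclidean`, `satisfies_metric_axioms`) and the sample points are assumptions made for illustration. Passing the check on a finite sample does not prove that a function is a metric; failing it shows that it is not.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean distance between two equal-length tuples of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance; included as a function that is not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the metric axioms for d on every pair and triple of sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                       # non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):      # zero only between identical points
            return False
        if abs(d(x, y) - d(y, x)) > tol:      # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


# Hypothetical sample points, chosen only for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (-2.0, 5.0)]

euclidean_ok = satisfies_metric_axioms(euclidean, points)
squared_ok = satisfies_metric_axioms(squared_euclidean, points)
```

On this sample the Euclidean distance passes, while the squared Euclidean distance fails the triangle inequality (for example, going from (0, 0) to (3, 4) via (1, 1)).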
Direct similarity measures are not always reliable for high-dimensional clustering (see Guha et al., 1999).
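Similarity is the opposite of distance: the more similar two objects are, the smaller the distance between them
- A similarity measure can usually be converted into a distance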
- Cosine Similarity and Dot Product
- Jaccard Coefficient
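As an illustration of turning the similarities listed above into distances, here is a minimal sketch using the common "one minus similarity" conversion. The function names and the convention for two empty sets are assumptions made for this example. Note that the resulting cosine distance does not satisfy the triangle inequality in general, so it is not a true metric, whereas the Jaccard distance is. The raw dot product is unbounded, so this simple conversion does not apply to it without normalization.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """1 - cosine similarity; ranges over [0, 2]."""
    return 1.0 - cosine_similarity(x, y)


def jaccard_coefficient(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """1 - Jaccard coefficient; ranges over [0, 1]."""
    return 1.0 - jaccard_coefficient(a, b)
```

For example, `jaccard_distance({1, 2, 3}, {2, 3, 4})` shares two of four distinct elements and evaluates to 0.5, and orthogonal vectors such as `(1, 0)` and `(0, 1)` have cosine distance 1.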
Non-metric
Resources
- Strehl, Alexander, Joydeep Ghosh, and Raymond Mooney. “Impact of similarity measures on web-page clustering.” 2000. [http://strehl.com/download/strehl-aaai00.pdf]
- Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. “ROCK: A robust clustering algorithm for categorical attributes.” 1999. [http://www.cacs.louisiana.edu/~jyoon/grad/adb/References/clustering/ROCK-clus99icde.pdf]
Source
- http://en.wikipedia.org/wiki/Metric_%28mathematics%29