Chameleon Clustering
Combines an initial partitioning of the data with agglomerative hierarchical clustering
Clusters are modeled dynamically as they are merged
Step 1:
- generate a k-nearest-neighbor (k-NN) graph of the data (sketched below)
- because the graph is local, it reduces the influence of noise and outliers
- and it adjusts automatically to regions of different density
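A minimal sketch of Step 1 in Python, assuming the data sits in a NumPy array and using scikit-learn's kneighbors_graph; the choice of k and the inverse-distance edge weights are assumptions, not something these notes fix:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_graph(X, k=10):
    """Build a symmetric, similarity-weighted k-NN graph from X (n_samples x n_features)."""
    # Sparse adjacency matrix holding distances to the k nearest neighbors
    A = kneighbors_graph(X, n_neighbors=k, mode="distance")
    # Convert distances to similarities (assumption: 1 / (1 + d))
    A.data = 1.0 / (1.0 + A.data)
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbor
    A = A.maximum(A.T)
    return A
```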
Step 2:
- use METIS, a graph partitioning algorithm
- to split the k-NN graph into roughly equal-sized groups of well-connected vertices
- this produces "sub-clusters": small pieces that each lie inside a single true cluster (see the sketch below)
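A possible continuation of the sketch for Step 2, assuming the third-party pymetis bindings are available; the number of partitions (50) is an arbitrary illustration value:

```python
import numpy as np
import pymetis  # third-party METIS bindings (assumption: installed as "pymetis")

def partition_graph(A, n_parts=50):
    """Split a sparse adjacency matrix A into n_parts sub-clusters with METIS.

    Returns an array with the sub-cluster id of each point.
    """
    A = A.tocsr()
    # pymetis expects an adjacency list: for each vertex, the ids of its neighbors
    adjacency = [A.indices[A.indptr[i]:A.indptr[i + 1]].tolist()
                 for i in range(A.shape[0])]
    _, membership = pymetis.part_graph(n_parts, adjacency=adjacency)
    return np.asarray(membership)
```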
Step 3:
- recombine the sub-clusters bottom-up
- merge two clusters only if
- they are relatively close (relative closeness)
- they are relatively interconnected (relative interconnectivity)
- so they are merged only if the new, larger cluster still resembles the original ones (scored in the sketch below)
- i.e. when "self-similarity" is preserved (similar to the join operation in Scatter/Gather)
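A simplified sketch of the Step 3 merge criterion. The paper combines relative interconnectivity (RI) and relative closeness (RC) into a single score, e.g. RI * RC^alpha; measuring a cluster's own interconnectivity via its internal edges (rather than the paper's min-cut bisector) and the value of alpha are simplifying assumptions here:

```python
import numpy as np

def merge_score(A, labels, i, j, alpha=2.0):
    """Score merging sub-clusters i and j as RI * RC**alpha on the k-NN graph A."""
    A = A.tocoo()
    src, dst, w = A.row, A.col, A.data

    in_i, in_j = labels[src] == i, labels[src] == j
    to_i, to_j = labels[dst] == i, labels[dst] == j

    between = w[(in_i & to_j) | (in_j & to_i)]   # edges crossing i <-> j
    internal_i = w[in_i & to_i]                  # edges inside i
    internal_j = w[in_j & to_j]                  # edges inside j
    if between.size == 0 or internal_i.size == 0 or internal_j.size == 0:
        return 0.0

    # Relative interconnectivity: cross-edge weight vs. average internal edge weight
    ri = between.sum() / (0.5 * (internal_i.sum() + internal_j.sum()))

    # Relative closeness: mean cross-edge weight vs. size-weighted mean internal edge weight
    n_i, n_j = (labels == i).sum(), (labels == j).sum()
    rc = between.mean() / ((n_i * internal_i.mean() + n_j * internal_j.mean()) / (n_i + n_j))

    return ri * rc ** alpha
```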
But
- the Curse of Dimensionality makes similarity functions behave poorly
- distances between points become more uniform as dimensionality grows
- and this makes clustering difficult
Similarity between two points in high-dimensional space can be misleading
- points may look similar even when they belong to different clusters (illustrated below)
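A small experiment that illustrates the distance-concentration effect: for uniformly random points, the relative gap between the nearest and farthest point from a query shrinks as the dimensionality grows (the sample size and dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))          # 1000 uniform points in [0, 1]^d
    q = rng.random(d)                  # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    # Relative contrast: how much farther is the farthest point than the nearest?
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:5d}  relative contrast = {contrast:.2f}")
```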
Solutions:
References
- Karypis, George, Eui-Hong Han, and Vipin Kumar. "Chameleon: Hierarchical clustering using dynamic modeling." IEEE Computer 32.8 (1999): 68-75. [1]
Sources
- Ertöz, Levent, Michael Steinbach, and Vipin Kumar. "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data." Proceedings of the 2003 SIAM International Conference on Data Mining (SDM), 2003. [2]