Canopy Clustering
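repeat until no points remain
- sample a point from the remaining points; it becomes the center of a new group (a canopy)
- add to the group all other remaining points within a loose similarity threshold T1 of the center (usually measured with a cheap, approximate distance)
- remove from the remaining points the center and every point within a tighter threshold T2 (T2 < T1)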
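A minimal Python sketch of the loop above. It assumes the two-threshold variant described in the linked Wikipedia article: a loose threshold `t1` decides canopy membership and a tight threshold `t2 < t1` decides which points are removed. The Chebyshev distance is an arbitrary stand-in for a "cheap" metric, and the names `canopy_clustering`, `chebyshev` and the `seed` parameter are hypothetical, chosen for illustration only.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def chebyshev(a: Point, b: Point) -> float:
    """Cheap distance (largest coordinate difference); any inexpensive metric would do."""
    return max(abs(x - y) for x, y in zip(a, b))


def canopy_clustering(
    points: List[Point],
    t1: float,
    t2: float,
    dist: Callable[[Point, Point], float] = chebyshev,
    seed: int = 0,
) -> List[List[int]]:
    """Group point indices into possibly overlapping canopies (requires t1 > t2)."""
    if t1 <= t2:
        raise ValueError("loose threshold t1 must be larger than tight threshold t2")
    rng = random.Random(seed)
    remaining = set(range(len(points)))
    canopies: List[List[int]] = []
    while remaining:
        # sample a point that has not been removed yet; it becomes the canopy center
        center = rng.choice(sorted(remaining))
        remaining.discard(center)
        members = [center]
        for i in sorted(remaining):  # iterate over a snapshot, since `remaining` shrinks below
            d = dist(points[center], points[i])
            if d < t1:
                members.append(i)  # within the loose threshold: joins this canopy
            if d < t2:
                remaining.discard(i)  # within the tight threshold: excluded from later canopies
        canopies.append(members)
    return canopies


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    print(sorted(sorted(c) for c in canopy_clustering(pts, t1=1.0, t2=0.5)))  # [[0, 1, 2], [3, 4, 5]]
```

Points that fall between `t2` and `t1` of a center stay in the pool, so they can also join a later canopy; this is where the overlap between groups comes from.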
result
- a set of (potentially overlapping) groups, called canopies
- each canopy is typically much smaller than the original dataset
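canopies reduce computation time, because expensive distance computations are only done between points that share a canopy:
- in cluster analysis (e.g. as a preprocessing step for k-means)
- more generally, in KNN queries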
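To illustrate the saving for KNN queries, here is a hypothetical helper, `knn_within_canopies`, that computes exact distances only to points sharing at least one canopy with the query. Canopies are assumed to be given as lists of point indices (for example, produced by a canopy pass), and Euclidean distance via `math.dist` stands in for the expensive metric. The result is approximate: a true neighbor that shares no canopy with the query is never considered.

```python
import heapq
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_within_canopies(
    points: List[Point], canopies: List[List[int]], query: int, k: int
) -> List[int]:
    """Indices of the k nearest neighbors of points[query], searching only
    points that share at least one canopy with it."""
    # candidate set: union of every canopy that contains the query point
    candidates = set()
    for members in canopies:
        if query in members:
            candidates.update(members)
    candidates.discard(query)
    # the expensive (here: Euclidean) distance is computed only for the candidates
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(points[query], points[i]))


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
    canopies = [[0, 1, 2], [3, 4]]  # hand-written canopies for illustration
    print(knn_within_canopies(pts, canopies, query=0, k=2))  # [1, 2]
```

With n points and canopies of size m much smaller than n, each query costs on the order of m expensive distance computations instead of n.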
Links
- http://en.wikipedia.org/wiki/Canopy_clustering_algorithm
- http://www.kamalnigam.com/papers/canopy-kdd00.pdf