CURE Algorithm for Clustering
Use a set of representative points to find non-globular clusters
- these points capture the geometry and shape of clusters
TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures
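Choose representative points for each cluster
- start with the 2 points that are farthest apart
- then the 3rd and so on - each one the farthest away from the previously chosen ones
- this procedure ensures that the points are well distributed over the cluster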
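The selection procedure above can be sketched as a greedy farthest-point search. This is only an illustration of the steps as written in these notes, not the paper's implementation: the function name `pick_representatives` and the parameter `k` (number of representatives) are assumptions, and the original paper may seed the selection differently.

```python
import math

def pick_representatives(points, k):
    """Greedy farthest-point selection (hypothetical helper, not from the paper)."""
    if k <= 0:
        return []
    if k >= len(points):
        return list(points)
    # Step 1: start with the pair of points that are farthest apart.
    first, second = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    reps = [first, second][:k]
    # Step 2: repeatedly add the point whose nearest already-chosen
    # representative is as far away as possible.
    while len(reps) < k:
        candidate = max(points, key=lambda p: min(math.dist(p, r) for r in reps))
        reps.append(candidate)
    return reps

cluster = [(0, 0), (1, 0), (10, 0), (5, 5), (5, 0)]
reps = pick_representatives(cluster, 3)
```

The brute-force pair search is quadratic in the cluster size, which is fine for a sketch but would need a cheaper seeding step on large clusters.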
Then shrink the points toward the cluster centroid by a factor of $\alpha$
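One way to read the shrinking step is $p' = p + \alpha (c - p)$, where $c$ is the cluster centroid, so $\alpha = 0$ leaves a point unchanged and $\alpha = 1$ moves it onto the centroid. The sketch below assumes this convention; the name `shrink_toward_centroid` is hypothetical.

```python
def shrink_toward_centroid(reps, cluster_points, alpha):
    """Move each representative a fraction alpha of the way to the centroid.

    Assumed convention: alpha=0 keeps the point, alpha=1 collapses it
    onto the centroid.
    """
    n = len(cluster_points)
    dim = len(cluster_points[0])
    # Centroid = coordinate-wise mean of all points in the cluster.
    centroid = tuple(sum(p[d] for p in cluster_points) / n for d in range(dim))
    return [
        tuple(r[d] + alpha * (centroid[d] - r[d]) for d in range(dim))
        for r in reps
    ]

cluster = [(0, 0), (4, 0), (2, 6)]          # centroid is (2, 2)
shrunk = shrink_toward_centroid([(0, 0), (4, 0)], cluster, alpha=0.5)
```

Shrinking pulls the representatives away from the cluster boundary, which makes them less sensitive to points lying on the fringe of the cluster.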
CURE eliminates outliers by discarding small, slowly growing clusters
- but it has a notion of a center - not all shapes have a natural center
- Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. "Cure: an efficient clustering algorithm for large databases." (2001) [1]