CURE Algorithm for Clustering

Use a set of representative points per cluster to find non-globular (non-spherical) clusters

  • these points capture the geometry and shape of clusters

TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures


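Choose representative points for each cluster

  • start with 2 points that are far away from each other
  • then add a 3rd, a 4th and so on, each time taking the point furthest away from the ones already chosen
  • this procedure keeps the chosen points well distributed over the cluster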
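
The selection steps above can be sketched as a greedy farthest-point search. This is a minimal illustration, not the reference implementation: the name `pick_representatives` is hypothetical, and seeding with the point farthest from the centroid is an assumption (the notes only say to start from far-apart points).

```python
import math

def pick_representatives(points, k):
    """Greedy farthest-point selection of up to k representatives.

    Hypothetical helper for illustration; `points` is a list of
    equal-length coordinate tuples belonging to one cluster.
    """
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: seed with the point farthest from the cluster centroid.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    chosen = [first]
    # min_dist[i] = distance from point i to its nearest chosen representative.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Take the point that is furthest from everything chosen so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return [points[i] for i in chosen]

cluster = [(0, 0), (1, 0), (10, 0), (5, 0), (0, 1)]
reps = pick_representatives(cluster, 3)
```

Tracking only the distance to the nearest chosen representative keeps each round linear in the cluster size, so picking k points costs roughly k passes over the cluster.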

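Then shrink the points towards the centroid of their cluster by a factor of $\alpha$: a representative $p$ of a cluster with centroid $c$ moves to $p + \alpha (c - p)$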
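
A minimal sketch of the shrinking step, assuming each representative $p$ is moved to $p + \alpha (c - p)$, where $c$ is the cluster centroid. The function name `shrink` is hypothetical.

```python
def shrink(representatives, centroid, alpha):
    """Move every representative towards the centroid by the fraction alpha.

    alpha = 0 leaves the points where they are; alpha = 1 collapses
    them all onto the centroid.
    """
    return [
        tuple(r_d + alpha * (c_d - r_d) for r_d, c_d in zip(rep, centroid))
        for rep in representatives
    ]

shrunk = shrink([(10.0, 0.0), (0.0, 4.0)], (2.0, 2.0), 0.5)
```

Shrinking pulls in the points that lie furthest out more than the ones near the center, which dampens the effect of outliers picked as representatives.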


CURE eliminates outliers by discarding small, slowly growing clusters

  • but it still relies on a notion of a center, and not all shapes have a natural center


References

  • Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. "CURE: an efficient clustering algorithm for large databases." (2001) [1]

Sources