TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures


Canopy Clustering

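repeat

  • sample a point (the canopy center) from the remaining candidates
  • form a group (a canopy) around this point
    • all points that are within a loose similarity threshold of the center
  • remove the closest points (those within a tighter threshold) from the candidates

until no candidates remain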
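The loop above can be sketched in Python. This is a minimal sketch, not a reference implementation. It uses Euclidean distance as the cheap measure and two thresholds, a loose `t1` (join the canopy) and a tight `t2 < t1` (leave the candidate list), which is the usual formulation of canopy clustering. The function name `canopy_clustering` and the fixed `seed` are illustrative assumptions. In this variant the center is compared against all points, not only the remaining candidates, so a point can end up in several canopies.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy_clustering(points: Sequence[Point], t1: float, t2: float,
                      seed: int = 0) -> List[List[int]]:
    """Return canopies as lists of point indices (hypothetical helper).

    t1: loose threshold, points closer than t1 to the center join the canopy
    t2: tight threshold (t2 < t1), points closer than t2 stop being candidates
    """
    if not t2 < t1:
        raise ValueError("expected t2 < t1")
    rng = random.Random(seed)                 # seeded for reproducible runs
    candidates = set(range(len(points)))      # points that may still become centers
    canopies: List[List[int]] = []
    while candidates:
        center = rng.choice(sorted(candidates))   # sample a point
        candidates.discard(center)                # guarantees the loop terminates
        canopy = []
        for i, p in enumerate(points):
            d = math.dist(points[center], p)      # cheap distance (assumed Euclidean)
            if d < t1:
                canopy.append(i)                  # loose threshold: join the canopy
            if d < t2:
                candidates.discard(i)             # tight threshold: remove closest points
        canopies.append(canopy)
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
    for group in canopy_clustering(pts, t1=2.0, t2=1.0):
        print(group)
```

With these toy points the two far-apart clumps never share a canopy, while points inside a clump can appear in more than one.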


result

  • set of (potentially overlapping) groups
  • each group is much smaller than the original dataset


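canopies reduce the computation time: the expensive distance measure used by the actual clustering algorithm (e.g. k-means) only has to be computed for pairs of points that share a canopy, while the canopies themselves are built with a cheap, approximate distance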
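To make the saving concrete, here is a small sketch that counts the distinct pairs of points sharing at least one canopy and compares that with all pairs. The canopy sets in the usage are made-up inputs for illustration, and the helper names `candidate_pairs` and `all_pairs` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def candidate_pairs(canopies: Iterable[Set[int]]) -> Set[Tuple[int, int]]:
    """Distinct index pairs that share at least one canopy.

    Only these pairs need the expensive distance computation.
    """
    pairs: Set[Tuple[int, int]] = set()
    for canopy in canopies:
        # sort so each pair is stored once as (smaller, larger)
        pairs.update(combinations(sorted(canopy), 2))
    return pairs


def all_pairs(n: int) -> int:
    """Number of pairs without canopies: n choose 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    canopies = [{0, 1, 2, 3}, {3, 4, 5}, {6, 7, 8, 9}]  # illustrative, 10 points
    print(len(candidate_pairs(canopies)), "of", all_pairs(10))  # 15 of 45
```

Here the three canopies need 6 + 3 + 6 = 15 expensive comparisons instead of 45; the gap grows quadratically as the dataset gets larger while canopies stay small.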


Links