Canopy Clustering


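  • sample a point from the dataset
  • form a group (a canopy) around this point
    • it contains the other points that are within some similarity threshold of the sampled point
  • remove the closest points (those within a second, tighter threshold) from the set of candidates, so that they cannot seed a new canopy
  • repeat until no candidates are left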
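
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `canopy_clustering`, `t1` (loose threshold for membership) and `t2` (tight threshold for removal) are assumptions, Euclidean distance stands in for whatever cheap similarity measure is actually used, and membership is drawn only from the points still in the candidate pool.

```python
import math
import random


def canopy_clustering(points, t1, t2, distance=math.dist, seed=0):
    """Group points into (possibly overlapping) canopies.

    t1: loose threshold, decides canopy membership (assumed name).
    t2: tight threshold, decides removal from the candidate pool (assumed name).
    Returns a list of (center_index, member_indices) tuples.
    """
    if t1 < t2:
        raise ValueError("t1 (loose) must be >= t2 (tight)")
    rng = random.Random(seed)
    candidates = list(range(len(points)))  # indices that may still seed a canopy
    canopies = []
    while candidates:
        # Sample a point from the remaining candidates.
        center = rng.choice(candidates)
        # Form a group: remaining points within the loose threshold.
        # (Assumption: only points still in the pool are considered.)
        members = [i for i in candidates
                   if distance(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # Remove the closest points (within the tight threshold);
        # this always removes the center itself, so the loop terminates.
        candidates = [i for i in candidates
                      if distance(points[center], points[i]) > t2]
    return canopies


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 0.0), (10.5, 0.0)]
    for center, members in canopy_clustering(data, t1=2.0, t2=1.0):
        print("center", center, "members", members)
```

Points that fall between the two thresholds stay in the pool, which is why the resulting groups can overlap.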


  • the result is a set of (potentially overlapping) groups, the canopies
  • each canopy is much smaller than the original dataset

canopies reduce the computation time: a more expensive distance measure only has to be computed between points that share a canopy
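
As a rough illustration of the saving, the sketch below counts how many point pairs share at least one canopy and compares that with the number of all pairs. The helper name `pairs_sharing_a_canopy` and the example canopies are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairs_sharing_a_canopy(canopies):
    """Return the set of unordered index pairs that co-occur in some canopy."""
    pairs = set()
    for members in canopies:
        # Sorting makes (i, j) and (j, i) collapse to the same tuple.
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical canopies over 6 points (indices 0..5); index 2 is in two canopies.
canopies = [[0, 1, 2], [2, 3], [4, 5]]
n_points = 6

all_pairs = n_points * (n_points - 1) // 2
candidate_pairs = pairs_sharing_a_canopy(canopies)
print(len(candidate_pairs), "of", all_pairs, "pairs need an exact distance")
# prints: 5 of 15 pairs need an exact distance
```

The smaller the canopies are relative to the dataset, the larger the gap between the two counts becomes.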


