TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures


Canopy Clustering

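repeat until no candidate points remain

  • sample a point from the remaining candidates
  • form a group (canopy) around this point
    • it contains the other points that are within some loose similarity threshold
  • remove the closest points (those within a tighter threshold) from the candidates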
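
The loop above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the names `canopy_clustering`, `t1` (loose threshold for membership) and `t2` (tight threshold for removal) are illustrative, Euclidean distance stands in for the similarity measure, and each canopy is formed over all points of the dataset rather than only the remaining candidates (implementations differ on this).

```python
import math
import random


def canopy_clustering(points, t1, t2, distance=math.dist, seed=0):
    """Group points into (possibly overlapping) canopies.

    t1: loose threshold, decides canopy membership (assumed name).
    t2: tight threshold, decides removal from the candidate pool (assumed name).
    Returns a list of (center_index, member_indices) tuples.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("expected 0 <= t2 <= t1")
    rng = random.Random(seed)
    candidates = list(range(len(points)))  # indices that may still become centers
    canopies = []
    while candidates:
        # sample a point
        center = rng.choice(candidates)
        # form a group around it: every point within the loose threshold
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # remove the closest points (within the tight threshold) from the pool;
        # the center itself is at distance 0, so the loop always terminates
        candidates = [i for i in candidates
                      if distance(points[center], points[i]) > t2]
    return canopies


points = [(0, 0), (0.5, 0), (1.5, 0), (10, 10), (10.5, 10)]
for center, members in canopy_clustering(points, t1=2.0, t2=1.0):
    print(center, members)
```

Because points between `t2` and `t1` stay in the candidate pool, they can end up in several canopies, which is where the overlap comes from.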


result

  • a set of (potentially overlapping) groups
  • each group is much smaller than the original dataset


canopies reduce the computation time: expensive exact distances need to be computed only between points that share a canopy
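
One way to see the saving is to count how many point pairs share at least one canopy and compare that with the number of all pairs. The snippet below is a rough sketch: the helper name `candidate_pairs` and the example canopies are made up for illustration, and canopies are given simply as lists of point indices.

```python
from itertools import combinations


def candidate_pairs(canopies):
    """Return the set of index pairs that share at least one canopy."""
    pairs = set()
    for members in canopies:
        # sorting makes (i, j) canonical, so overlaps are not counted twice
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical canopies over 6 points (indices 0..5), overlapping at point 2.
canopies = [[0, 1, 2], [2, 3], [4, 5]]
n_points = 6

all_pairs = n_points * (n_points - 1) // 2
within_canopies = candidate_pairs(canopies)

print(all_pairs)             # 15
print(len(within_canopies))  # 5
```

Only the pairs in `within_canopies` would need the expensive distance; pairs such as (0, 3) never share a canopy and are skipped.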

