TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures
Canopy Clustering
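repeat until no candidate points remain
- sample a point from the remaining candidates
- form a group (canopy) around this point
- the group contains all other points that are within some loose similarity threshold
- remove the closest points (those within a tighter threshold) from the candidates, so they cannot start a new group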
result
- set of (potentially overlapping) groups
- each group is much smaller than the original dataset
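
The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `canopy_clustering`, the parameter names `t_loose` and `t_tight`, and the toy 1-D data are all assumptions made for the example. It also assumes that group membership is checked against all points (some descriptions only check the points still in the candidate pool).

```python
import random

def canopy_clustering(points, distance, t_loose, t_tight, seed=0):
    """Return a list of (center_index, member_indices) groups.

    Hypothetical helper for illustration; requires 0 <= t_tight <= t_loose.
    """
    if not 0 <= t_tight <= t_loose:
        raise ValueError("need 0 <= t_tight <= t_loose")
    rng = random.Random(seed)
    remaining = list(range(len(points)))  # indices that may still start a group
    canopies = []
    while remaining:
        # sample a point
        center = rng.choice(remaining)
        # form a group: every point within the loose threshold of the center
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) <= t_loose]
        canopies.append((center, members))
        # remove the closest points (within the tight threshold) from the pool;
        # the center itself has distance 0, so the loop always makes progress
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) > t_tight]
    return canopies

# toy 1-D data with absolute difference as the (cheap) distance
points = [0.0, 0.5, 1.0, 5.0, 5.5, 10.0]
canopies = canopy_clustering(points, lambda a, b: abs(a - b),
                             t_loose=2.0, t_tight=1.0)
for center, members in canopies:
    print(points[center], [points[i] for i in members])
```

Which groups come out depends on the sampled centers, but every point ends up in at least one group, and a point lying between the two thresholds of a center can appear in more than one.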
canopies reduce the computation time: an expensive distance only has to be computed between points that share a group
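
One way to see the saving is to count how many point pairs would need an expensive distance computation if only pairs sharing a group are compared. The sketch below assumes that setting; the helper name `candidate_pairs` and the three example groups are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def candidate_pairs(canopies):
    """Collect the distinct index pairs that share at least one group."""
    pairs = set()
    for members in canopies:
        # sorting makes (a, b) and (b, a) map to the same tuple
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs

# hypothetical groups over 6 points; point 2 belongs to two groups
example_canopies = [[0, 1, 2], [2, 3], [4, 5]]
n_points = 6

all_pairs = n_points * (n_points - 1) // 2   # every pair in the dataset
within = candidate_pairs(example_canopies)   # only pairs inside a group

print(all_pairs, len(within))
```

Here 5 pairs remain out of 15. The gap grows with the dataset size as long as the groups stay small.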
Links