# ML Wiki

TODO: see Scalable Data Analytics and Data Mining AIM3 (TUB) lectures

## Canopy Clustering

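repeat until no points are left:

• sample a point from the remaining points
• form a group (a canopy) around this point
• add to the group all other points that are within some similarity threshold of it
• remove the closest points (including the sampled point itself) from the remaining points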

result

• a set of (potentially overlapping) groups
• each group is much smaller than the original dataset
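
The procedure above can be sketched in Python. This is a minimal illustration, not a reference implementation. The names `canopy_clustering`, `t_loose`, `t_tight`, `distance` and `seed` are assumptions made for the example. In particular, it reads "similarity threshold" as a loose distance threshold `t_loose` and "closest points" as those within a tighter threshold `t_tight`, which the notes do not state explicitly.

```python
import math
import random


def canopy_clustering(points, t_loose, t_tight, distance=math.dist, seed=0):
    """Group points into (possibly overlapping) canopies.

    Assumed interpretation: points within t_loose of the sampled point
    join its canopy; points within t_tight are the "closest points" and
    are removed from the pool of remaining points.
    """
    if not 0 <= t_tight <= t_loose:
        raise ValueError("expected 0 <= t_tight <= t_loose")

    rng = random.Random(seed)
    remaining = list(range(len(points)))  # indices not yet removed
    canopies = []  # list of (center index, member indices)

    while remaining:
        # sample a point
        center = rng.choice(remaining)
        dists = {i: distance(points[center], points[i]) for i in remaining}

        # form a group from the remaining points within the loose threshold
        members = [i for i in remaining if dists[i] <= t_loose]
        canopies.append((center, members))

        # remove the closest points; the center has distance 0, so it is
        # always removed and the loop terminates
        remaining = [i for i in remaining if dists[i] > t_tight]

    return canopies


# Usage: two well-separated groups of 2-D points (made-up data)
points = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = canopy_clustering(points, t_loose=2.0, t_tight=1.0)
for center, members in canopies:
    print(center, members)
```

Points that fall between the two thresholds stay in the pool, so they can end up in more than one canopy. That is where the overlap in the result comes from.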

canopies reduce the computation time: