ML Wiki
Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm.

The density of a point is the number of points within distance $\epsilon$ of it, i.e. the size of its $\epsilon$-neighborhood (the point itself included).
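This is a minimal sketch that assumes Euclidean distance and points stored as coordinate tuples. The helper name `density` is hypothetical. It computes the count with a brute-force scan over all points:

```python
import math

def density(points, i, eps):
    """Number of points within distance eps of points[i], counting points[i] itself."""
    return sum(1 for q in points if math.dist(points[i], q) <= eps)

points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (5.0, 5.0)]
print([density(points, i, eps=1.0) for i in range(len(points))])
```

The isolated point `(5.0, 5.0)` has density 1, because only the point itself lies in its neighborhood.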

  • points with density $\geqslant \text{min\_pts}$ are “core points”
  • clusters are formed around the core points
  • if two core points are within distance $\epsilon$ of each other, they belong to the same cluster
  • non-core points within $\epsilon$ of some core point (“border points”) are assigned to that core point's cluster
  • all remaining points are noise and are discarded
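The rules above can be sketched as a short, unoptimized implementation. It assumes Euclidean distance and counts a point in its own neighborhood. The names `dbscan` and `NOISE` are hypothetical. The sketch uses a brute-force O(n²) neighborhood search rather than a spatial index:

```python
import math
from collections import deque

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point, NOISE (-1) for noise."""
    n = len(points)
    # eps-neighborhood of every point (includes the point itself)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # core points have density >= min_pts
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue  # only an unlabeled core point starts a new cluster
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id  # core or border point joins the cluster
                    if is_core[q]:
                        queue.append(q)  # only core points expand the cluster further
        cluster_id += 1
    return labels  # points never reached from a core point stay NOISE

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

In this example `(2, 2)` has only two points in its neighborhood, so it is not a core point. It still joins the first cluster as a border point because the core point `(1, 1)` lies within $\epsilon$. The point `(5, 5)` is reached by no core point and stays noise. A border point within $\epsilon$ of two clusters keeps whichever label reaches it first, so in this sketch its assignment depends on processing order.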

Disadvantages

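  • it can find clusters of arbitrary shape, but not clusters of different densities: a single global $\epsilon$ and $\text{min\_pts}$ apply to every cluster, so a setting that separates dense clusters can mark a sparse cluster as noise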
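To illustrate this limitation, the sketch below runs a compact DBSCAN on hypothetical 1-D toy data. The data has two dense groups (spacing 0.1) close together and one sparse group (spacing 1.0) far away. It fixes $\text{min\_pts} = 3$ as an assumption, and the names `dbscan_1d` and `n_clusters` are hypothetical:

```python
from collections import deque

def dbscan_1d(xs, eps, min_pts):
    """Compact DBSCAN sketch for 1-D data; returns labels, -1 for noise."""
    n = len(xs)
    nbrs = [[j for j in range(n) if abs(xs[i] - xs[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels, cid = [-1] * n, 0
    for i in range(n):
        if core[i] and labels[i] == -1:
            labels[i] = cid
            queue = deque([i])
            while queue:
                for q in nbrs[queue.popleft()]:
                    if labels[q] == -1:
                        labels[q] = cid
                        if core[q]:
                            queue.append(q)  # expand only through core points
            cid += 1
    return labels

def n_clusters(labels):
    """Number of distinct cluster labels, ignoring noise (-1)."""
    return len(set(labels) - {-1})

dense_a = [0.0, 0.1, 0.2, 0.3, 0.4]
dense_b = [1.0, 1.1, 1.2, 1.3, 1.4]
sparse = [10.0, 11.0, 12.0, 13.0, 14.0]
xs = dense_a + dense_b + sparse

for eps in [0.15, 0.5, 1.0, 2.0]:
    labels = dbscan_1d(xs, eps, min_pts=3)
    print(eps, n_clusters(labels), labels.count(-1))  # eps, #clusters, #noise points
```

None of these settings recovers all three groups. For $\epsilon$ below 1.0 the two dense groups are separate, but the sparse group is entirely noise. Once $\epsilon$ is large enough to cluster the sparse group, the gap of 0.6 between the dense groups falls within $\epsilon$, and they merge.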

Extensions

SNN Clustering

  • an extension of DBSCAN that defines similarity by the number of shared nearest neighbors and works better for high-dimensional data
  • can also find clusters of different densities
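This minimal sketch covers only the shared-nearest-neighbor (SNN) similarity step. It rests on three assumptions: Euclidean distance, k-NN lists that exclude the point itself, and a link only between points that appear in each other's k-NN lists. The helper names are hypothetical. The later steps would roughly reapply the DBSCAN rules above, with SNN similarity in place of distance: SNN density, core points, and cluster formation.

```python
import math

def knn_lists(points, k):
    """Set of indices of the k nearest neighbors of each point (excluding the point itself)."""
    n = len(points)
    result = []
    for i in range(n):
        # sort other points by distance, breaking ties by index
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (math.dist(points[i], points[j]), j))
        result.append(set(others[:k]))
    return result

def snn_similarity(points, k):
    """SNN similarity: shared k-NN count, kept only for pairs in each other's k-NN lists."""
    knn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and j in knn[i] and i in knn[j]:
                sim[i][j] = len(knn[i] & knn[j])
    return sim

# a dense group and a sparse group with the same relative layout
points = [(0.0,), (0.1,), (0.2,), (10.0,), (11.0,), (12.0,)]
sim = snn_similarity(points, k=2)
print(sim[0][1], sim[3][4], sim[0][3])
```

SNN similarity depends on neighbor rankings rather than absolute distances. The dense pair and the sparse pair therefore receive the same similarity, while points from different groups are not linked at all. This is the intuition behind SNN handling clusters of different densities.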

References

  • Ester, Martin, et al. “A density-based algorithm for discovering clusters in large spatial databases with noise.” 1996. [http://www.aaai.org/Papers/KDD/1996/KDD96-037]

Sources

  • Ertöz, Levent, Michael Steinbach, and Vipin Kumar. “Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.” 2003. [http://static.msi.umn.edu/rreports/2003/73.pdf]