# ML Wiki

## Agglomerative Clustering

General concept: merge items into clusters based on distance/similarity

• usually based on best pairwise similarity

Typical steps:

• at the beginning each document is a cluster on its own
• then we compute similarity between all pairs of clusters and store the results in a similarity matrix
• merge two most similar clusters
• update the similarity matrix
• repeat until everything belongs to the same cluster

How to join two clusters?

Suppose we have clusters $A, B, C, ...$ that we want to merge

TODO: need more mathematical definitions

### Single Linkage

Merge two groups $A$ and $B$ based on their closest pair: the inter-cluster similarity is the similarity of the single most similar pair $a \in A$, $b \in B$

Implementation:

• compute all pairwise similarities
• sort them in decreasing order
• process the pairs in this order, merging the clusters they connect

• efficient to implement
• equivalent to building a Minimum Spanning Tree on the complete graph of pairwise distances TODO: Link to Algo 2 from Coursera!
• can use Prim's algorithm to build the tree
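The sorted-pair view above can be sketched as Kruskal's MST algorithm with a union-find, stopping once the desired number of clusters remains (function and variable names are made up for illustration):

```python
import itertools
import numpy as np

def single_linkage_kruskal(points, n_clusters):
    """Single linkage via sorted pairwise distances + union-find (Kruskal-style)."""
    n = len(points)
    parent = list(range(n))

    def find(x):                        # find the component root, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # all pairwise distances, processed from smallest to largest
    pairs = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    remaining = n
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                    # joining two components = one linkage merge
            parent[ri] = rj
            remaining -= 1
            if remaining == n_clusters:
                break
    return [find(i) for i in range(n)]  # cluster label per point

labels = single_linkage_kruskal(
    np.array([[0.0], [0.2], [0.4], [10.0], [10.2]]), n_clusters=2)
```

Stopping early like this is the same as cutting the dendrogram at a given number of clusters.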

Drawbacks

• encourages chaining
• similarity is usually not transitive:
• i.e. if $A$ is similar to $B$, and $B$ is similar to $C$, it doesn't mean that $A$ must be similar to $C$
• but single linkage encourages grouping through transitivity chains

References:

• Sibson, Robin. "SLINK: an optimally efficient algorithm for the single-link cluster method." 1973. [1]

### Complete Linkage

Merge two groups $A$ and $B$ based on their worst-case (farthest) pair:

• avoids chaining altogether
• but it's very expensive computationally
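A minimal sketch contrasting the two merge criteria (illustrative helper names; distances are used instead of similarities, so smaller means more similar):

```python
import numpy as np

def single_link_distance(A, B):
    """Best-case linkage: distance of the CLOSEST cross-cluster pair."""
    return float(min(np.linalg.norm(a - b) for a in A for b in B))

def complete_link_distance(A, B):
    """Worst-case linkage: distance of the FARTHEST cross-cluster pair."""
    return float(max(np.linalg.norm(a - b) for a in A for b in B))

A = np.array([[0.0], [1.0]])
B = np.array([[1.5], [9.0]])
single_link_distance(A, B)    # one close pair is enough to trigger a merge
complete_link_distance(A, B)  # the stretched-out pair blocks the merge
```

Because a single distant pair vetoes a merge, complete linkage cannot chain, but it must examine every cross-pair.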

References:

• Defays, Daniel. "An efficient algorithm for a complete link method." 1977. [2]

### Group Average

The similarity between groups $A$ and $B$ is calculated as the average similarity between each $a \in A$ and $b \in B$

• It's way slower than single linkage,
• but it's more robust: it doesn't show the chaining behavior

Speeding up:

• can approximate it by using the distance between centroids: mean doc in $A$ and mean doc in $B$
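A sketch of the exact group-average score next to the centroid shortcut (illustrative names; distances instead of similarities):

```python
import numpy as np

def group_average(A, B):
    """Exact group-average distance: mean over all cross-pairs, O(|A| * |B|)."""
    return float(np.mean([np.linalg.norm(a - b) for a in A for b in B]))

def centroid_distance(A, B):
    """Cheap approximation: distance between the mean documents, O(|A| + |B|)."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))

A = np.array([[0.0], [2.0]])
B = np.array([[10.0], [12.0]])
group_average(A, B)      # exact average over the 4 cross-pairs
centroid_distance(A, B)  # one distance between the two mean docs
```

The two agree on this symmetric toy example, but in general the centroid distance is only an approximation of the full pairwise average.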

### Ward's Method

Merge the pair of clusters whose merge gives the smallest increase in the total within-group error: the sum of squared distances between each document and its cluster centroid.
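In standard notation (this formula is not in the source text), the increase in within-group sum of squares from merging clusters $A$ and $B$ has a closed form:

$$\Delta(A, B) = \frac{|A| \, |B|}{|A| + |B|} \, \lVert \mu_A - \mu_B \rVert^2$$

where $\mu_A$ and $\mu_B$ are the cluster centroids; Ward's method merges the pair with the smallest $\Delta$.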

Result:

• spherical, tightly bound clusters
• less sensitive to outliers

References:

• El-Hamdouchi, Abdelmoula, and Peter Willett. "Hierarchic document classification using Ward's clustering method." 1986. [3]

### Pros and Cons

• single-link algorithms are best at capturing clusters of different sizes and shapes
• but they are also sensitive to noise
• complete link and group average are less affected by noise, but have a bias towards finding globular patterns

Computational complexity:

• only single-link is computationally feasible for large datasets, but it doesn't give good results because it uses too little information

## References

• Rasmussen, Edie M. "Clustering Algorithms." Information retrieval: data structures & algorithms 1992. [4]
• Voorhees, Ellen M. "Implementing agglomerative hierarchic clustering algorithms for use in document retrieval." 1986. [5]

## Sources

• Ertöz, Levent, Michael Steinbach, and Vipin Kumar. "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data." 2003. [6]
• Oikonomakou, Nora, and Michalis Vazirgiannis. "A review of web document clustering approaches." Data mining and knowledge discovery handbook. 2010. [7]