Co-Clustering

Co-clustering (also called bi-clustering) is a set of techniques in Cluster Analysis that cluster the rows and the columns of a data matrix at the same time.

  • let $A$ be an $m \times n$ matrix
  • the goal is to find biclusters (co-clusters): subsets of rows that behave similarly across a subset of columns, or vice versa
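As a small illustration of this definition, the sketch below extracts one bicluster from a toy matrix. The matrix values and the helper name `submatrix` are made up for the example, and the row and column subsets are picked by hand rather than found by an algorithm.

```python
# Toy 4 x 4 data matrix (hypothetical values chosen for illustration).
A = [
    [9, 1, 9, 2],
    [3, 7, 1, 8],
    [9, 2, 9, 1],
    [2, 8, 3, 7],
]


def submatrix(A, rows, cols):
    """Return the entries of A restricted to the given rows and columns."""
    return [[A[i][j] for j in cols] for i in rows]


# Rows 0 and 2 behave alike on columns 0 and 2, so together they form a bicluster.
bicluster = submatrix(A, rows=[0, 2], cols=[0, 2])

# Spread of the values inside the bicluster: 0 means the block is constant.
values = [a for row in bicluster for a in row]
spread = max(values) - min(values)
print(bicluster, spread)
```

On the full set of columns, rows 0 and 2 are only roughly similar; restricted to columns 0 and 2 they are identical, which is what makes the block a bicluster.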


Co-clustering can be defined as a pair of mapping functions:

  • rows -> row cluster indexes
  • columns -> column cluster indexes
  • both mappings are learned simultaneously
  • this is unlike Two-Phase Document Clustering, where we first cluster the columns and then use the result to cluster the rows
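The two mappings can be represented as two label lists, one entry per row and one per column. The sketch below only shows this representation and how a pair of mappings partitions the matrix into blocks. The labels are assigned by hand rather than learned, and the function name `block_means` is an assumption made for the example.

```python
# Toy 4 x 4 data matrix (hypothetical values chosen for illustration).
A = [
    [9, 1, 9, 2],
    [3, 7, 1, 8],
    [9, 2, 9, 1],
    [2, 8, 3, 7],
]

# The two mappings: row index -> row cluster, column index -> column cluster.
# Hand-assigned here; a co-clustering algorithm would learn both together.
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]


def block_means(A, row_labels, col_labels):
    """Average the entries of A inside each (row cluster, column cluster) block.

    Assumes every cluster index from 0 to the maximum label is used at least once.
    """
    n_r = max(row_labels) + 1
    n_c = max(col_labels) + 1
    sums = [[0.0] * n_c for _ in range(n_r)]
    counts = [[0] * n_c for _ in range(n_r)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            r, c = row_labels[i], col_labels[j]
            sums[r][c] += a
            counts[r][c] += 1
    return [[sums[r][c] / counts[r][c] for c in range(n_c)] for r in range(n_r)]


means = block_means(A, row_labels, col_labels)
print(means)
```

Blocks with a clearly high or low mean compared to the rest of the matrix are the co-clusters induced by the two mappings.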


Subspace Clustering

Subspace clustering can be used for co-clustering:

  • subspace clustering $\approx$ local feature selection
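One way to read "local feature selection" is that each cluster of rows keeps only the columns on which its members agree. The sketch below uses a variance threshold as the selection rule. This rule, the function name `local_features`, and the data are assumptions made for illustration, not a specific published subspace clustering algorithm.

```python
from statistics import pvariance

# Toy matrix: rows 0 and 2 agree on columns 0 and 2, rows 1 and 3 on columns 1 and 3.
A = [
    [9, 1, 9, 6],
    [3, 7, 1, 7],
    [9, 5, 9, 0],
    [0, 7, 6, 7],
]


def local_features(A, rows, max_var):
    """Return the columns whose variance within the given rows is at most max_var."""
    columns = zip(*[A[i] for i in rows])
    return [j for j, col in enumerate(columns) if pvariance(col) <= max_var]


# Each row cluster selects its own subset of features (its "subspace").
features_a = local_features(A, rows=[0, 2], max_var=0.5)
features_b = local_features(A, rows=[1, 3], max_var=0.5)
print(features_a, features_b)
```

Each row cluster paired with its selected columns is then a candidate co-cluster.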


Non-Negative Matrix Factorization

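One way of doing Co-Clustering is via NMF:

  • let $A \approx UV^T$ where $U \geq 0$ is $m \times k$ and $V \geq 0$ is $n \times k$
  • then the columns of $U$ may correspond to clusters of rows, and the columns of $V$ to clusters of columns: row $i$ of $U$ gives the affinity of row $i$ of $A$ to each of the $k$ clusters, and row $j$ of $V$ does the same for column $j$ of $A$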
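A minimal sketch of this idea, assuming the standard multiplicative update rules for the squared Frobenius error and a hard assignment of each row and column to its largest factor entry. Both choices, along with the helper names and the toy matrix, are assumptions made for the example; other NMF variants and assignment rules are possible.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(A, U, V):
    """Squared Frobenius norm of A - U V^T."""
    R = matmul(U, transpose(V))
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factor A (m x n, non-negative) as U V^T with U (m x k) and V (n x k)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random initialization.
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    for _ in range(n_iter):
        # U <- U * (A V) / (U V^T V), element-wise
        AV = matmul(A, V)
        UVtV = matmul(U, matmul(transpose(V), V))
        U = [[U[i][c] * AV[i][c] / (UVtV[i][c] + eps) for c in range(k)]
             for i in range(m)]
        # V <- V * (A^T U) / (V U^T U), element-wise
        AtU = matmul(transpose(A), U)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[V[j][c] * AtU[j][c] / (VUtU[j][c] + eps) for c in range(k)]
             for j in range(n)]
    return U, V


def hard_labels(F):
    """Assign each row of a factor matrix to the index of its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy matrix with two blocks (hypothetical values chosen for illustration).
A = [
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
U, V = nmf(A, k=2)
row_labels = hard_labels(U)  # row i of A -> row cluster
col_labels = hard_labels(V)  # column j of A -> column cluster
print(row_labels, col_labels, frobenius_error(A, U, V))
```

Because the result depends on the random initialization, the cluster indexes are only defined up to a permutation, and different seeds may give different local optima.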


