Co-Clustering
Co-clustering is a set of techniques in Cluster Analysis that cluster the rows and the columns of a matrix at the same time.
Co-clustering is also called bi-clustering.
- let $A$ be an $m \times n$ matrix
- the goal is to generate biclusters (co-clusters): subsets of rows that exhibit similar behavior across a subset of columns, or vice versa
Co-clustering is defined by two map functions:
- rows -> row cluster indexes
- columns -> column cluster indexes
- these map functions are learned simultaneously
- this is unlike Two-Phase Document Clustering, where we first cluster the columns and then use the result to cluster the rows
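The two map functions can be pictured as two label arrays, one over row indexes and one over column indexes. The snippet below is only an illustrative sketch: the toy matrix, the hand-written labels, and the helper name `cocluster_block` are assumptions made for this example, not something produced by a particular co-clustering algorithm.

```python
import numpy as np

# Hypothetical 4 x 6 toy matrix with two interleaved blocks.
A = np.array([
    [5, 0, 5, 0, 5, 0],
    [0, 4, 0, 4, 0, 4],
    [5, 0, 5, 0, 5, 0],
    [0, 4, 0, 4, 0, 4],
])

# The two maps (written by hand here; an algorithm would learn them jointly):
row_labels = np.array([0, 1, 0, 1])        # row index    -> row cluster index
col_labels = np.array([0, 1, 0, 1, 0, 1])  # column index -> column cluster index

def cocluster_block(A, row_labels, col_labels, r, c):
    """Submatrix made of the rows in row cluster r and the columns in column cluster c."""
    return A[np.ix_(row_labels == r, col_labels == c)]

# Sorting rows and columns by their cluster index makes the blocks contiguous.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
A_sorted = A[np.ix_(row_order, col_order)]

print(cocluster_block(A, row_labels, col_labels, 0, 0))
print(A_sorted)
```

Each (row cluster, column cluster) pair selects one block of $A$; in this toy case the blocks `(0, 0)` and `(1, 1)` are constant and the off-diagonal blocks are zero.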
Subspace clustering can be used for co-clustering
- subspace clustering $\approx$ local feature selection
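One way of doing co-clustering is via Non-Negative Matrix Factorization (NMF):
- let $A \approx UV^T$ where $U$ is $m \times k$ and $V$ is $n \times k$, both with non-negative entries
- then the columns of $U$ may correspond to clusters of rows, and the columns of $V$ to clusters of columns: row $i$ of $U$ holds the weights of row $i$ of $A$ over the $k$ clusters, and likewise row $j$ of $V$ for column $j$ of $A$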
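A minimal sketch of this idea, under several assumptions not stated above: the factorization is fitted with the standard multiplicative updates for the squared Frobenius error, hard labels are read off by taking the largest entry in each row of $U$ and $V$ (one simple heuristic among several), and the function name `nmf_cocluster`, the toy matrix, and the parameters `n_iter`, `eps`, `seed` are made up for illustration.

```python
import numpy as np

def nmf_cocluster(A, k, n_iter=500, eps=1e-9, seed=0):
    """Fit A ~ U @ V.T with non-negative U (m x k) and V (n x k), then read off labels."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps
    for _ in range(n_iter):
        # Multiplicative updates; they keep U and V non-negative.
        U *= (A @ V) / (U @ (V.T @ V) + eps)
        V *= (A.T @ U) / (V @ (U.T @ U) + eps)
    # Assumption: assign each row/column to the factor with the largest weight.
    row_labels = U.argmax(axis=1)
    col_labels = V.argmax(axis=1)
    return U, V, row_labels, col_labels

# Hypothetical toy matrix with two blocks.
A = np.array([
    [5, 5, 5, 0, 0, 0],
    [5, 5, 5, 0, 0, 0],
    [0, 0, 0, 4, 4, 4],
    [0, 0, 0, 4, 4, 4],
], dtype=float)

U, V, row_labels, col_labels = nmf_cocluster(A, k=2)
rel_err = np.linalg.norm(A - U @ V.T) / np.linalg.norm(A)
print(row_labels, col_labels, rel_err)
```

On a clean block matrix like this one would expect the two factors to line up with the two blocks, but NMF depends on the random initialization, and the arg-max rule is sensitive to how scale is split between $U$ and $V$, so the labels should be treated as a rough reading of the factors rather than a guaranteed partition.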
References
- Dhillon, Inderjit S. "Co-clustering documents and words using bipartite spectral graph partitioning." 2001. [1]
- Dhillon, Inderjit S., Subramanyam Mallela, and Dharmendra S. Modha. "Information-theoretic co-clustering." 2003. [2]
- Li, Tao, Sheng Ma, and Mitsunori Ogihara. "Document clustering via adaptive subspace iteration." 2004. [3]
Sources