Concept Decomposition

Clustering can be used for dimensionality reduction of text data.

First cluster the documents, then:

  • the most frequent terms in each centroid form the basis
  • the basis vectors are almost orthogonal: the top terms of one cluster should rarely appear in the other clusters
  • each document is then represented in terms of this basis
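
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a reference implementation: the toy corpus, the function name concept_basis, the parameter top_n, the cosine-based k-means with a naive deterministic initialisation, and the use of simple dot products as the new coordinates are all choices made for this example (a least-squares fit onto the basis would be the more careful option).

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length (zero vectors stay unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def concept_basis(docs, k, top_n, n_iter=10):
    """Cluster tokenised docs; return (vocab, labels, basis, reduced docs)."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-frequency vectors, normalised to unit length.
    X = []
    for d in docs:
        v = [0.0] * len(vocab)
        for t in d:
            v[index[t]] += 1.0
        X.append(unit(v))

    # Assumption: naive initialisation from k evenly spaced documents.
    step = max(1, len(X) // k)
    centroids = [X[i * step] for i in range(k)]

    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assign each document to the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda c: dot(x, centroids[c])) for x in X]
        # Recompute each centroid as the normalised sum of its members.
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centroids[c] = unit([sum(col) for col in zip(*members)])

    # Keep only the top_n heaviest terms of each centroid as a basis vector.
    basis = []
    for cen in centroids:
        top = sorted(range(len(vocab)), key=lambda i: -cen[i])[:top_n]
        basis.append(unit([cen[i] if i in top else 0.0 for i in range(len(vocab))]))

    # Represent each document by its coordinates along the basis vectors.
    reduced = [[dot(x, b) for b in basis] for x in X]
    return vocab, labels, basis, reduced

# Hypothetical toy corpus with two obvious topics.
docs = [
    "cat dog pet cat".split(),
    "dog pet vet dog".split(),
    "cat pet food".split(),
    "stock market trade stock".split(),
    "market trade bank market".split(),
    "stock bank trade".split(),
]

vocab, labels, basis, reduced = concept_basis(docs, k=2, top_n=3)

for c, b in enumerate(basis):
    print("cluster", c, "basis terms:", [vocab[i] for i, w in enumerate(b) if w > 0])
print("basis overlap:", dot(basis[0], basis[1]))
for d, r in zip(docs, reduced):
    print(" ".join(d), "->", [round(x, 2) for x in r])
```

Each document goes from one coordinate per vocabulary term down to k coordinates. On this toy corpus the two basis vectors share no terms, so their dot product is zero; on real text the overlap is typically small but not zero, which is the "almost orthogonal" property in the list above.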



