This is a stub. Edit it. |

Textual data (esp in Vector Space Models) suffers from the Curse of Dimensionality

- so need to use Dimensionality Reduction

But often we can use the following idea:

- correlation in high-dimensional data is usually local (esp. in text data)
- for some data items features are correlated, but for some the same features are not

Subspace clustering:

- it's the task of detecting all clusters in all subspaces
- a data point may belong to many different clusters - with each cluster in some subspace

Survey paper:

- Parsons, Lance, Ehtesham Haque, and Huan Liu. "Subspace clustering for high dimensional data: a review." (2004). [1]
- Domeniconi, Carlotta, et al. "Subspace Clustering of High Dimensional Data." SDM. Vol. 73. 2004. [2]

There are two types of subspace clustering

- based on how they determine a measure of locality for evaluating subspaces
- bottom-up subspace search
- top-down search

Use downward closure property for density to reduce the search space (Apriori-style)

CLIQUE:

- Agrawal, Rakesh, et al. Automatic subspace clustering of high dimensional data for data mining applications.1998. [3]

ENCLUS

- Cheng, Chun-Hung, Ada Waichee Fu, and Yi Zhang. "Entropy-based subspace clustering for mining numerical data." 1999. [4]

MAFIA

- Goil, Sanjay, Harsha Nagesh, and Alok Choudhary. "MAFIA: Efficient and scalable subspace clustering for very large data sets." 1999. [5]

CBF

- Chang, Jae-Woo, and Du-Seok Jin. "A new cell-based clustering method for large, high-dimensional data in data mining applications.", 2002. [6]

CLTree

- clustering via decision trees

DOC

- Procopiuc, Cecilia M., et al. "A Monte Carlo algorithm for fast projective clustering." 2002. [7]

PROCLUS

- Aggarwal, Charu C., et al. "Fast algorithms for projected clustering." 1999. [8]

ORCLUS

- Aggarwal, Charu C., and Philip S. Yu. Finding generalized projected clusters in high dimensional spaces. 2000. [9]

FINDIT

- Woo, Kyoung-Gu, et al. "FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting." (2004). [10]

$\beta$-Clusters

- Yang, Jiong, et al. "δ-clusters: Capturing subspace correlation in a large data set." 2002. [11]

COSA

- Friedman, Jerome H., and Jacqueline J. Meulman. "Clustering objects on subsets of attributes (with discussion)." (2004). [12]

Text data is very high-dimensional

- creates problems for Document Clustering

Adaptive subspace iteration:

- reduce data
- identify subspaces

- Jing, Liping. "Survey of text clustering." (2008). [13]
- http://en.wikipedia.org/wiki/Clustering_high-dimensional_data