# ML Wiki

## Frequent Word Patterns

Frequent word pattern mining is a Local Pattern Discovery technique applied to text documents:

• we can treat a document as a transaction and its words as items
• then we want to find frequent itemsets of words across these documents, as in Frequent Patterns Mining with Apriori or Eclat
• frequent itemset $\equiv$ frequent wordset
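Here is a minimal sketch of the transaction view. It assumes whitespace tokenization, lowercasing, and an absolute support threshold (a count of documents). `frequent_wordsets` is a hypothetical helper that runs a plain Apriori-style level-wise search. It is not a library API.

```python
from itertools import combinations

def frequent_wordsets(docs, min_support):
    """Level-wise (Apriori-style) mining of frequent wordsets.

    docs: list of strings; each document becomes a transaction (a set of words).
    min_support: minimum number of documents that must contain a wordset.
    Returns a dict mapping frozenset(words) -> support count.
    """
    transactions = [set(doc.lower().split()) for doc in docs]

    def support(itemset):
        # Number of documents that contain every word of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single words.
    words = sorted(set().union(*transactions))
    current = {frozenset([w]) for w in words if support(frozenset([w])) >= min_support}
    result = {s: support(s) for s in current}

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-wordsets into k-wordsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        result.update({c: support(c) for c in current})
        k += 1
    return result
```

For example, with the documents `["apple banana cherry", "apple banana", "apple cherry", "banana cherry apple"]` and `min_support=3`, the frequent wordsets are the three single words plus `{apple, banana}` and `{apple, cherry}`. `{banana, cherry}` appears in only two documents.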

## Document Clustering

We can use Frequent Pattern Mining (FPM) for frequent term-based document clustering:

• a cluster is the set of all documents that contain a given frequent term set
• so each frequent term set describes its cluster
• the clustering is not strict (as in Fuzzy Clustering): a document that contains several frequent term sets belongs to several clusters, so clusters can overlap
• such overlap is often natural for text documents, which may cover more than one topic
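A minimal sketch of how frequent term sets define possibly overlapping clusters follows. It assumes whitespace tokenization and that the frequent term sets were already mined. `clusters_from_term_sets` is a hypothetical helper name.

```python
def clusters_from_term_sets(docs, term_sets):
    """Map each frequent term set to the indices of the documents containing it.

    A document that contains several term sets appears in several clusters,
    so the resulting clusters may overlap.
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    return {
        fts: {i for i, words in enumerate(tokenized) if fts <= words}
        for fts in term_sets
    }

docs = ["data mining text", "text clustering", "data mining clustering"]
term_sets = [frozenset({"data", "mining"}), frozenset({"clustering"})]
clusters = clusters_from_term_sets(docs, term_sets)
# Document 2 contains both term sets, so it belongs to both clusters.
```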

Problem formalization:

• let $R$ be the set of chosen frequent term sets (FTSs), and let $d_1, \dots, d_n$ be the documents
• let $f_i$ be the number of FTSs from $R$ contained in document $d_i$
• constraint: $f_i \geq 1$ for every document, to ensure complete coverage (no document is left without a cluster)
• objective: minimize the overlap, i.e. the average value of $f_i - 1$, which is $\frac{1}{n} \sum_{i=1}^{n} (f_i - 1)$
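The formalization can be checked numerically with the sketch below. It takes a chosen set $R$ of term sets, computes each $f_i$, tests the coverage constraint $f_i \geq 1$, and averages $f_i - 1$. It assumes whitespace tokenization, and `overlap_stats` is a hypothetical helper name.

```python
def overlap_stats(docs, selected):
    """Evaluate a chosen set R of frequent term sets on a document collection.

    Returns (f, covered, avg_overlap):
      f[i]        -- number of term sets in `selected` contained in document i
      covered     -- True if every document has f_i >= 1
      avg_overlap -- mean of (f_i - 1) over all documents
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    f = [sum(1 for fts in selected if fts <= words) for words in tokenized]
    covered = all(fi >= 1 for fi in f)
    avg_overlap = sum(fi - 1 for fi in f) / len(f)
    return f, covered, avg_overlap

docs = ["data mining text", "text clustering", "data mining clustering"]
R = [frozenset({"data", "mining"}), frozenset({"clustering"})]
f, covered, avg = overlap_stats(docs, R)
# f == [1, 1, 2]: only document 2 is counted twice, so avg == 1/3
```

The coverage constraint is needed because an uncovered document has $f_i - 1 = -1$. Without the constraint, it would lower the average and reward leaving documents unclustered.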

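Algorithm (greedy):

• at each iteration, pick the FTS whose cluster has minimal overlap with the other candidate clusters
• add it to the selection and remove the documents it covers from further consideration
• repeat until all documents are covered
• see Beil et al. (2002) for details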
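Below is a simplified sketch of the greedy loop. As its overlap score, it uses the average of $f_i - 1$ over a candidate's cover, where $f_i$ counts the remaining candidates that document $i$ supports. This score is an assumption, and the exact overlap measure in the reference may differ. `greedy_ftc` is a hypothetical name. Ties are broken by the order of `term_sets`.

```python
def greedy_ftc(docs, term_sets):
    """Greedy frequent-term-based clustering (simplified sketch).

    docs: list of strings; term_sets: candidate frequent term sets (frozensets).
    Returns a list of (term_set, doc_indices) pairs in selection order.
    Each document is assigned to the first selected term set that covers it,
    so the returned clusters are disjoint.
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    remaining_docs = set(range(len(docs)))
    candidates = list(term_sets)
    clustering = []

    while remaining_docs and candidates:
        # Cover of each candidate, restricted to documents not yet clustered.
        covers = {
            fts: {i for i in remaining_docs if fts <= tokenized[i]}
            for fts in candidates
        }
        # Drop candidates that no longer cover any remaining document.
        covers = {fts: c for fts, c in covers.items() if c}
        if not covers:
            break  # leftover documents contain none of the candidate term sets
        # f[i]: number of remaining candidates supported by document i.
        f = {i: sum(1 for c in covers.values() if i in c) for i in remaining_docs}

        def overlap(fts):
            # Average of (f_i - 1) over the documents this candidate covers.
            cover = covers[fts]
            return sum(f[i] - 1 for i in cover) / len(cover)

        # Select the candidate whose cluster overlaps least with the others.
        best = min(covers, key=overlap)
        clustering.append((best, covers[best]))
        remaining_docs -= covers[best]
        candidates = [fts for fts in covers if fts != best]
    return clustering

docs = ["data mining text", "text clustering", "data mining clustering", "graph clustering"]
term_sets = [frozenset({"data", "mining"}), frozenset({"clustering"}), frozenset({"text"})]
result = greedy_ftc(docs, term_sets)
```

In this example, `{clustering}` is picked first. Its cover `{1, 2, 3}` has average overlap 2/3, while the other two candidates have 1. Only document 0 remains after that, and it goes to `{data, mining}`.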

## References

• Beil, Florian, Martin Ester, and Xiaowei Xu. "Frequent term-based text clustering." 2002.

## Sources

• Aggarwal, Charu C., and ChengXiang Zhai. "A survey of text clustering algorithms." Mining Text Data. Springer US, 2012.
