Sources Index
Only papers I read and used as sources (or small books that don’t deserve a separate wiki page)
- ordered by first author
A
- Aggarwal, Charu C., and ChengXiang Zhai. “A survey of text clustering algorithms.” Mining Text Data. 2012. Document Clustering, K-Means, K-Medoids, Co-Clustering, Two-Phase Document Clustering, Non-Negative Matrix Factorization, Semi-Supervised Clustering, Topic Models, Probabilistic LSA, Term Strength, Term Contribution, Stop Words
B
C
- Cristianini, Nello, John Shawe-Taylor, and Huma Lodhi. “Latent semantic kernels.” 2002. [Kernel Methods](Kernel_Methods), Latent Semantic Kernels
- Cutting, et al. “Scatter/gather: A cluster-based approach to browsing large document collections.” 1992. [Scatter/Gather](Scatter_Gather)
D
- Datar, Mayur, et al. “Locality-sensitive hashing scheme based on p-stable distributions.” 2004. [Locality Sensitive Hashing](Locality_Sensitive_Hashing), Euclidean LSH
- De Smet, Yves. “An introduction to multicriteria decision aid: The PROMETHEE and GAIA methods.” PROMETHEE
- Deerwester, Scott C., et al. “Indexing by latent semantic analysis.” 1990. [Latent Semantic Analysis](Latent_Semantic_Analysis)
- Domingos, Pedro. “A few useful things to know about machine learning.” 2012. [Overfitting](Overfitting)
E
- Elsayed, Tamer, Jimmy Lin, and Douglas W. Oard. “Pairwise document similarity in large collections with MapReduce.” 2008. [Inverted Index](Inverted_Index)
- Ertöz, Levent et al. “Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.” 2003. [Document Clustering](Document_Clustering), DBSCAN, SNN Clustering, Euclidean Distance, Curse of Dimensionality, Chameleon Clustering, CURE Clustering, ROCK Clustering
F
G
- Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. “Similarity search in high dimensions via hashing.” 1999. [Locality Sensitive Hashing](Locality_Sensitive_Hashing), Bit Sampling LSH
H
- Hopcroft, John, and Ravindran Kannan. “Foundations of Data Science.” 2014. Power Iteration
I
J
- Jauregui, Jeff. “Principal component analysis with linear algebra.” 2012. [SVD](SVD), Principal Component Analysis
- Jing, Liping. “Survey of text clustering.” 2008. [Vector Space Model](Vector_Space_Model), Document Clustering, Cluster Analysis, Subspace Clustering, Semi-Supervised Clustering
K
- Kalman, Dan. “A singularly valuable decomposition: the SVD of a matrix.” 1996. [SVD](SVD)
- Koll, Matthew B. “WEIRD: An approach to concept-based information retrieval.” 1979. Latent Semantic Analysis
- Korenius, Tuomo, Jorma Laurikkala, and Martti Juhola. “On principal component analysis, cosine and Euclidean measures in information retrieval.” 2007. [Principal Component Analysis](Principal_Component_Analysis), Latent Semantic Analysis, Distance Functions, Cosine Similarity, Euclidean Distance
- Kristianto, et al. “Extracting definitions of mathematical expressions in scientific papers.” 2012. [Mathematical Definition Extraction](Mathematical_Definition_Extraction), Math-Aware POS Tagging
- Kristianto, et al. “Extracting Textual Descriptions of Mathematical Expressions in Scientific Papers.” 2014. [Mathematical Definition Extraction](Mathematical_Definition_Extraction)
L
- Landauer, T. et al. “An introduction to latent semantic analysis.” 1998. [Latent Semantic Analysis](Latent_Semantic_Analysis)
- Larsen, Bjornar et al. “Fast and effective text mining using linear-time document clustering.” 1999. [Document Clustering](Document_Clustering)
- Li, Yong H., et al. “Classification of text documents.” 1998. [Term Clustering](Term_Clustering)
- Liu, Tao, et al. “An evaluation on feature selection for text clustering.” 2003. [Term Contribution](Term_Contribution)
M
N
O
- Oikonomakou, Nora, and Michalis Vazirgiannis. “A review of web document clustering approaches.” Data mining and knowledge discovery handbook. 2010. [Cluster Analysis](Cluster_Analysis), Agglomerative Clustering, K-Means
- Osinski, Stanislaw. “Improving quality of search results clustering with approximate matrix factorisations.” 2006. [Non-Negative Matrix Factorization](Non-Negative_Matrix_Factorization)
P
- Pagael, Robert, and Moritz Schubotz. “Mathematical Language Processing Project.” 2014. [Mathematical Definition Extraction](Mathematical_Definition_Extraction), Math-Aware POS Tagging
- Paulevé, Loïc, et al. “Locality sensitive hashing: A comparison of hash function types and querying mechanisms.” 2010. [Locality Sensitive Hashing](Locality_Sensitive_Hashing), K-Means LSH
Q
R
S
- Salton, et al. “A vector space model for automatic indexing.” 1975. [Vector Space Model](Vector_Space_Model)
- Salton, Gerard, and Christopher Buckley. “Term-weighting approaches in automatic text retrieval.” 1988. [TF-IDF](TF-IDF)
- Schelter, Sebastian, et al. “Efficient Sample Generation for Scalable Meta Learning.” 2014. [Meta Learning](Meta_Learning)
- Schöneberg et al. “POS Tagging and its Applications for Mathematics.” 2014. Math-Aware POS Tagging
- Sculley, David. “Web-scale k-means clustering.” 2010. [K-Means](K-Means)
- Sebastiani, Fabrizio. “Machine learning in automated text categorization.” 2002. [Document Classification](Document_Classification), Term Clustering
- Slaney, Malcolm, and Michael Casey. “Locality-sensitive hashing for finding nearest neighbors [lecture notes].” 2008. [Locality Sensitive Hashing](Locality_Sensitive_Hashing), Euclidean LSH
- Steinbach, Michael, et al. “A comparison of document clustering techniques.” 2000. Document Clustering, K-Means
- Strang, Gilbert. “The fundamental theorem of linear algebra.” 1993. [SVD](SVD)
T
U
V
W
- Wilbur, W. John, and Karl Sirotkin. “The automatic identification of stop words.” 1992. [Stop Words](Stop_Words), Term Strength
X
- Xu, Wei, Xin Liu, and Yihong Gong. “Document clustering based on non-negative matrix factorization.” 2003. [Cluster Analysis](Cluster_Analysis), Non-Negative Matrix Factorization
Y
Z
- Zhai, ChengXiang. “Statistical language models for information retrieval.” (Book) 2008. Information Retrieval, Statistical Language Models, Multinomial Distribution, Smoothing for Language Models, TF-IDF, Probabilistic Retrieval Model
- Zhukov, Leonid, and David Gleich. “Topic identification in soft clustering using PCA and ICA.” 2004. [Latent Semantic Analysis](Latent_Semantic_Analysis)