Data Reduction

In Machine Learning and Data Mining, data reduction means shrinking the dataset so that computation for our model runs faster.

There are two approaches:

reducing the number of rows
reducing the number of columns

Rows

The main approach is Sampling: randomly select a subset of the rows and work with it instead of the full dataset.
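
A minimal sketch of row reduction by simple random sampling without replacement, using only the Python standard library. The function name `sample_rows`, the `fraction` and `seed` parameters, and the toy data are assumptions made for illustration; they do not come from the source.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample (without replacement) of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Keep at least one row, but never more rows than exist.
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


# Hypothetical dataset: 1000 rows with two columns each.
rows = [(i, i * 2.0) for i in range(1000)]
subset = sample_rows(rows, fraction=0.1)
```

Fixing the seed makes the reduced dataset reproducible between runs; a different seed gives a different subset of the same size.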

Columns

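Main approach: remove variables that depend on other variables, since they carry redundant information. Dependence between two variables can be checked with: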

Correlation coefficient for two Quantitative Variables
Chi-square Test of Independence for two Categorical Variables
One-Way ANOVA F-Test for quantitative vs categorical case
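
For the quantitative case, the idea can be sketched as follows: compute the Pearson correlation coefficient between pairs of columns and drop a column when it is strongly correlated with one already kept. The helper names `pearson` and `drop_correlated`, the 0.95 threshold, and the toy columns are assumptions chosen for illustration, not part of the source. The categorical and mixed cases would use the chi-square and F-tests instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Greedily keep columns whose |r| with every kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return kept


# Hypothetical data: height_in is height_cm in other units, so it is redundant.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [31.0, 22.0, 45.0, 28.0, 39.0],
}
reduced = drop_correlated(columns)
```

The greedy pass depends on column order: of two strongly correlated columns, the one seen first is kept. Pearson correlation only detects linear dependence, so a low coefficient does not prove that two variables are independent.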

Other techniques:

Principal Component Analysis and Singular Value Decomposition
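
A minimal sketch of how Principal Component Analysis can be computed through the Singular Value Decomposition, assuming NumPy is available. The function name `pca_svd`, the synthetic data, and the choice of one component are assumptions for illustration only.

```python
import numpy as np


def pca_svd(X, n_components):
    """Project X onto its first n_components principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # reduced representation of the rows
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    return scores, components, explained[:n_components]


# Synthetic data: three columns that are all nearly linear functions of t.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.01 * rng.normal(size=200),
    -t + 0.01 * rng.normal(size=200),
])
scores, components, explained = pca_svd(X, n_components=1)
```

Here three columns are replaced by a single derived column in `scores`, and `explained` reports how much of the total variance that column retains. Unlike removing dependent variables, the new columns are linear combinations of the originals rather than a subset of them.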

See Also

Bivariate Analysis

Sources

Data Mining (UFRT)
