Data Reduction
In Machine Learning and Data Mining
- How can we speed up computation for our model?
There are two approaches:
- reducing the number of rows
- reducing the number of columns
Rows
The main approach is to randomly sample a subset of the rows of the dataset.
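A minimal sketch of random row sampling in Python, assuming the dataset is held as a list of records. The function name `sample_rows` and its `fraction`/`seed` parameters are hypothetical, chosen for illustration. It draws a simple random sample without replacement using the standard library's `random.Random.sample`.

```python
import random

def sample_rows(rows, fraction, seed=None):
    """Return a simple random sample (without replacement) of `rows`.

    `fraction` is the share of rows to keep, in (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row, otherwise round to the nearest whole row count.
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    return rng.sample(rows, k)

data = [{"id": i, "x": i * 0.5} for i in range(1000)]
subset = sample_rows(data, fraction=0.1, seed=42)
print(len(subset))  # 100
```

Fixing the seed lets you rerun an experiment on exactly the same subset. If the data already lives in a pandas DataFrame, `DataFrame.sample(frac=..., random_state=...)` does the same job.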
Columns
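Main approach: remove redundant variables, i.e. variables that are strongly associated with (dependent on) another variable and therefore add little new information. The strength of the association can be measured with: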
- Correlation coefficient for two quantitative variables
- Chi-square test of independence for two categorical variables
- One-way ANOVA F-test for one quantitative and one categorical variable
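The three measures above can be sketched in plain Python. This is a minimal, illustrative version that computes only the test statistics (Pearson's r, the chi-square statistic with its degrees of freedom, and the one-way ANOVA F value), not p-values; it assumes equal-length inputs, no missing values, and no constant columns. The helper `drop_correlated`, its greedy keep-or-drop rule and its 0.9 threshold are hypothetical choices for illustration, not a prescribed method. For p-values, `scipy.stats.pearsonr`, `scipy.stats.chi2_contingency` and `scipy.stats.f_oneway` are the usual library routes.

```python
import math
from collections import Counter

def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails for a constant column (sxx or syy = 0)

def chi_square_stat(a, b):
    """Chi-square statistic and degrees of freedom for two categorical variables."""
    n = len(a)
    observed = Counter(zip(a, b))           # contingency table counts
    row_tot, col_tot = Counter(a), Counter(b)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n   # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof

def anova_f(values, groups):
    """One-way ANOVA F statistic: quantitative `values` split by categorical `groups`."""
    n = len(values)
    grand = sum(values) / n
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    k = len(by_group)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def drop_correlated(columns, threshold=0.9):
    """Hypothetical greedy rule: keep a column unless |r| >= threshold
    with a column that has already been kept."""
    kept = []
    for name, vals in columns.items():
        if all(abs(pearson_r(vals, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))                   # 1.0
print(chi_square_stat(["m", "m", "f", "f"], ["y", "y", "n", "n"]))    # (4.0, 1)
print(anova_f([1, 2, 3, 7, 8, 9], ["a", "a", "a", "b", "b", "b"]))    # 54.0
print(drop_correlated({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}))  # ['a', 'c']
```

In the last line, column `b` is an exact multiple of `a` (r = 1), so it is dropped as redundant, while `c` (r = -0.4 with `a`) is kept. Where to set the threshold, and which of two redundant columns to keep, are judgment calls that depend on the data.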
Other techniques: