Data Mining Process


  • CRISP-DM (CRoss Industry Standard Process for Data Mining)
  • Figure: datamining-process.png
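  • The process has 6 phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment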

CRISP-DM: four levels of abstraction

  • Phases
    • Example: Data Preparation
  • Generic Tasks
  • Specialized Tasks
    • A specific task that belongs to a generic task
    • Example: Missing Value Handling
  • Process Instances
    • How a specialized task is actually carried out
    • Example: replace missing values with the mean for numeric attributes and with the most frequent value for categorical attributes
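The process-instance example above (mean for numeric attributes, most frequent value for categorical ones) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `impute`, the column names and the sample rows are all hypothetical, and missing values are assumed to be encoded as `None`.

```python
from collections import Counter
from statistics import mean


def impute(rows, numeric_cols, categorical_cols):
    """Return a copy of rows with None replaced column by column."""
    fill = {}
    for col in numeric_cols:
        # Mean of the observed (non-missing) values
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        # Most frequent observed value
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical data set with two missing values
rows = [
    {"age": 20, "city": "Paris"},
    {"age": None, "city": "Paris"},
    {"age": 40, "city": None},
    {"age": 30, "city": "Rome"},
]
clean = impute(rows, numeric_cols=["age"], categorical_cols=["city"])
print(clean[1]["age"], clean[2]["city"])
```

The original rows are left untouched, so the raw data stays available if a different imputation strategy is tried later.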

Business Understanding

Main Objectives

  • Define the success criteria
  • What form should the output take?
  • How to integrate the output with existing technologies?

Data Understanding

Main Objectives

Data Preparation

The data must be prepared so that it can be processed by the models


Prediction Tasks

  • models to predict unknown or future values
  • Classification Models: predict a categorical value
  • Regression Models: predict a continuous value
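The difference between the two prediction tasks can be illustrated with one simple model used in both ways. The sketch below uses k-nearest neighbours as a stand-in (the notes do not prescribe a particular model): a majority vote over the neighbours gives a categorical prediction, and the mean of the neighbours' targets gives a continuous one. The function names and the "hours studied" data are made up for illustration.

```python
from collections import Counter
from statistics import mean


def k_nearest(train, x, k):
    """Targets of the k training points closest to x (single numeric feature)."""
    return [y for _, y in sorted(train, key=lambda p: abs(p[0] - x))[:k]]


def classify(train, x, k=3):
    # Classification: majority vote over categorical targets
    return Counter(k_nearest(train, x, k)).most_common(1)[0][0]


def regress(train, x, k=3):
    # Regression: average of continuous targets
    return mean(k_nearest(train, x, k))


# Hypothetical data: hours studied -> pass/fail label and exam score
labels = [(1, "fail"), (2, "fail"), (3, "fail"), (6, "pass"), (7, "pass"), (8, "pass")]
scores = [(1, 35.0), (2, 40.0), (3, 45.0), (6, 70.0), (7, 75.0), (8, 80.0)]

print(classify(labels, 6.5))  # categorical value: pass
print(regress(scores, 6.5))   # continuous value: 75.0
```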

Description Tasks

  • Goal: find patterns / clusters that describe a data set
  • Cluster Analysis: find clusters in data
  • Extraction of local patterns: find local properties in a data set
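Both description tasks can be sketched with the standard library only. The first function is a bare-bones one-dimensional k-means for cluster analysis; the second counts item pairs that occur in at least a given fraction of transactions, as one simple kind of local pattern. Both algorithms are chosen here for illustration, and the function names, starting centers and data are assumptions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def kmeans_1d(points, centers, iterations=10):
    """Cluster analysis: assign points to the nearest center, then update centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


def frequent_pairs(transactions, min_support):
    """Local patterns: item pairs whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical data
centers, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centers=[0.0, 5.0])
print(centers)  # [2.0, 11.0]

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "milk"}, {"milk"}]
print(frequent_pairs(baskets, min_support=0.5))
```

Cluster analysis describes the data set as a whole, while the frequent pairs describe only the subset of transactions in which they occur.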


Main Questions

Objective Measures:

  • Error rate of a classifier - Error Metrics
  • Confidence of association rules
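Both objective measures listed above reduce to simple ratios. The sketch below computes the error rate as the fraction of misclassified examples, and the confidence of a rule A → B as the fraction of transactions containing A that also contain B. The function names and sample data are hypothetical.

```python
def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_a = [t for t in transactions if antecedent <= set(t)]
    if not with_a:
        return 0.0  # the rule never applies in this data set
    with_both = [t for t in with_a if consequent <= set(t)]
    return len(with_both) / len(with_a)


# Hypothetical classifier output: 2 of 4 predictions are wrong
print(error_rate(["a", "b", "a", "a"], ["a", "a", "a", "b"]))  # 0.5

# Hypothetical transactions: bread occurs in 3, bread and butter together in 2
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "milk"}, {"milk"}]
print(confidence(baskets, {"bread"}, {"butter"}))
```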

Subjective Measures: