Data Mining

Data mining - methods and algorithms to explore and analyze large volumes of data

Goal: to find patterns in data that are

  • valid: they hold with some certainty
    • e.g. "everybody speaks English in Blois" - not true
  • novel: non-obvious for a human
    • e.g. "everybody speaks French in Blois" - obvious
  • useful: the extracted knowledge can be acted upon
  • understandable for humans
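
The "valid: with some certainty" criterion can be made concrete by measuring how often a candidate pattern actually holds in the data. The sketch below is only an illustration: the records are invented toy data, and the helper name `pattern_confidence` is hypothetical, not part of any library.

```python
# Hypothetical toy records: (city, languages spoken). Not real data.
PEOPLE = [
    ("Blois", {"French"}),
    ("Blois", {"French", "English"}),
    ("Blois", {"French"}),
    ("Blois", {"French", "Spanish"}),
]


def pattern_confidence(records, holds):
    """Return the fraction of records for which the candidate pattern holds."""
    records = list(records)
    if not records:
        raise ValueError("no records to evaluate")
    return sum(1 for record in records if holds(record)) / len(records)


# "Everybody speaks English in Blois": holds for 1 of 4 toy records.
speaks_english = pattern_confidence(PEOPLE, lambda r: "English" in r[1])
# "Everybody speaks French in Blois": holds for all 4 toy records.
speaks_french = pattern_confidence(PEOPLE, lambda r: "French" in r[1])

print(speaks_english)  # 0.25
print(speaks_french)   # 1.0
```

On this toy data the first pattern fails the validity test, while the second passes it but would still be rejected as not novel.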

What is DM

What is NOT Data Mining:

  • looking up a phone number in a directory
  • computing the number of customers who bought an iPad in August
  • plain SQL queries are enough for tasks like these
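
To illustrate why such a task is not data mining, the count of August iPad buyers can be answered by a single SQL query. The sketch below uses Python's built-in `sqlite3` module; the `purchases` table, its columns, and the rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical purchases table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "iPad", "2013-08-03"),
        (2, "iPad", "2013-08-17"),
        (2, "iPad", "2013-08-20"),    # same customer buying twice
        (3, "iPhone", "2013-08-05"),  # different product
        (4, "iPad", "2013-09-01"),    # different month
    ],
)

# A plain aggregate query: no pattern discovery is involved.
(august_ipad_buyers,) = conn.execute(
    """
    SELECT COUNT(DISTINCT customer_id)
    FROM purchases
    WHERE product = 'iPad'
      AND strftime('%m', purchase_date) = '08'
    """
).fetchone()
conn.close()

print(august_ipad_buyers)  # 2
```

The query only retrieves and counts facts that are already stored; it does not uncover any previously unknown pattern.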

What is Data Mining:

  • What is the profile of the customers who bought an iPad?
  • Which customers will buy the new iPhone?
  • Which customers will buy which products?
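
As a rough illustration of the first question, one simple way to describe a "profile" is to compare how common an attribute value is among buyers with how common it is among all customers. This is only one possible reading of "profile", not a method prescribed here; the customer records, the `age_group` attribute, and the helper `lift_by_value` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy customer records (invented for illustration).
CUSTOMERS = [
    {"age_group": "18-30", "bought_ipad": True},
    {"age_group": "18-30", "bought_ipad": True},
    {"age_group": "18-30", "bought_ipad": False},
    {"age_group": "31-50", "bought_ipad": True},
    {"age_group": "31-50", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
]


def lift_by_value(customers, attribute, target):
    """For each attribute value, divide its share among buyers by its overall share.

    A ratio above 1 means the value is over-represented among buyers.
    """
    buyers = [c for c in customers if c[target]]
    if not buyers:
        raise ValueError("no customers with the target flag set")
    overall = Counter(c[attribute] for c in customers)
    among_buyers = Counter(c[attribute] for c in buyers)
    return {
        value: (among_buyers[value] / len(buyers)) / (count / len(customers))
        for value, count in overall.items()
    }


profile = lift_by_value(CUSTOMERS, "age_group", "bought_ipad")
for value, ratio in sorted(profile.items()):
    print(f"{value}: {ratio:.2f}")
```

Unlike the SQL count, the interesting output here is not known in advance: the analyst does not specify which attribute values matter, the data does.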


DM is a discipline with roots from

Main Focuses

DM is mostly used in

  • Customer Relationship Management (CRM)
    • churn scoring - predict whether a customer will leave for a competitor
    • direct marketing - show ads only to those who are interested
    • credit scoring
    • sales forecasting
    • etc
  • website/search optimization
  • supply chain optimization
  • many others

Types of Data Mining

Rule Mining

Local Pattern Discovery

Sequence Mining

Graph Mining

  • Social Network Mining


Data Mining Process

CRISP-DM (CRoss-Industry Standard Process for Data Mining)

  • datamining-process.png

Business Understanding

  • Define the success criteria
  • How to integrate the output with existing technologies?

Data Understanding

Data Preparation

Data Modeling