Data Mining
Data mining - methods and algorithms to explore and analyze large volumes of data
Goal: to find patterns in data that are
- valid: hold with some certainty
- e.g. "everybody speaks English in Blois" is not valid, because it is not true
- novel: non-obvious to a human
- e.g. "everybody speaks French in Blois" is not novel, because it is obvious
- useful: the extracted knowledge can be acted on
- understandable for humans
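One common way to put a number on "valid with some certainty" is to measure how often a candidate pattern holds in the data. The sketch below is only an illustration: the records are invented, and `pattern_confidence` is a hypothetical helper name, not something from these notes.

```python
# Invented toy data: one record per person, as (city, set of languages spoken).
people = [
    ("Blois", {"French"}),
    ("Blois", {"French", "English"}),
    ("Blois", {"French"}),
    ("Blois", {"French"}),
    ("Paris", {"French", "English"}),
]

def pattern_confidence(people, city, language):
    """Fraction of people living in `city` who speak `language`."""
    residents = [langs for c, langs in people if c == city]
    if not residents:
        return 0.0  # no evidence at all for this city
    speakers = sum(1 for langs in residents if language in langs)
    return speakers / len(residents)

french_in_blois = pattern_confidence(people, "Blois", "French")
english_in_blois = pattern_confidence(people, "Blois", "English")
print(french_in_blois, english_in_blois)
```

On this toy data the "French in Blois" pattern holds for every resident, while the "English in Blois" pattern holds for only a quarter of them, so the latter would be rejected as not valid. A high score says nothing about novelty: the first pattern is valid but still obvious.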
What is DM
What is NOT Data Mining:
- look up a phone number in a phone directory
- compute the number of customers who bought an iPad in August
- a plain SQL query is enough for that
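To make the contrast concrete, the August count above is a single aggregate query. This is a minimal sketch using Python's built-in `sqlite3` module; the `purchases` table, its columns, and the rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema: purchases(customer_id, product, month).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, month TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "iPad", "August"),
        (1, "iPad", "August"),   # a repeat purchase by the same customer
        (2, "iPad", "August"),
        (3, "iPad", "July"),
        (4, "iPhone", "August"),
    ],
)

def count_buyers(conn, product, month):
    """Number of distinct customers who bought `product` in `month`."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT customer_id) FROM purchases "
        "WHERE product = ? AND month = ?",
        (product, month),
    ).fetchone()
    return row[0]

ipad_august_buyers = count_buyers(conn, "iPad", "August")
print(ipad_august_buyers)
```

The query only reports a fact already stored in the table. It does not discover a pattern, which is why it does not count as data mining.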
What is Data Mining:
- What is the profile of the customers who bought an iPad?
- Which customers will buy the new iPhone?
- Which customers will buy which products?
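A very simple way to approach the "profile" question is to compare how common each attribute value is among buyers with how common it is among all customers. The sketch below does this for one attribute; the customer records, the `age_group` attribute, and the `buyer_profile` helper are invented for illustration, and real profiling would normally use many attributes and a proper model.

```python
from collections import Counter

# Invented toy data: one record per customer.
customers = [
    {"age_group": "18-30", "bought_ipad": True},
    {"age_group": "18-30", "bought_ipad": True},
    {"age_group": "18-30", "bought_ipad": False},
    {"age_group": "31-50", "bought_ipad": True},
    {"age_group": "31-50", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
    {"age_group": "51+", "bought_ipad": False},
]

def buyer_profile(customers, attribute, target):
    """For each value of `attribute`, return its share among buyers
    divided by its share among all customers (often called lift).
    Values above 1 are over-represented among buyers."""
    overall = Counter(c[attribute] for c in customers)
    buyers = Counter(c[attribute] for c in customers if c[target])
    n_all = len(customers)
    n_buyers = sum(buyers.values())
    if n_buyers == 0:
        return {}  # nobody bought, so there is no profile to describe
    return {
        value: (buyers[value] / n_buyers) / (overall[value] / n_all)
        for value in overall
    }

lift = buyer_profile(customers, "age_group", "bought_ipad")
for value, score in sorted(lift.items()):
    print(value, round(score, 2))
```

In this toy data the 18-30 group is over-represented among buyers and the 51+ group is absent from them. Unlike a plain count, the result describes who the buyers are, which is the kind of pattern the questions above ask for.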
Origins
DM is a discipline with roots from
Main Focuses
DM is mostly used in:
- Customer Relationship Management (CRM)
- churn scoring - predict whether a customer will leave for a competitor
- direct marketing - show ads only to those who are interested
- credit scoring
- sales forecasting
- etc
- website/search optimization
- supply chain optimization
- many others
Types of Data Mining
Local Pattern Discovery
Sequence Mining:
Others
Data Mining Process
CRISP-DM (CRoss Industry Standard Process for Data Mining)
Business Understanding
- Define the success criteria
- How to integrate the output with existing technologies?
Data Understanding
Data Preparation
Data Modeling
Evaluation