Data Mining
Data mining is a set of methods and algorithms for exploring and analyzing large volumes of data
Goal: to find patterns in data that are
- valid: they hold with some degree of certainty
  - e.g. "everybody speaks English in Blois" is not valid, because it is not true
- novel: not obvious to a human
  - e.g. "everybody speaks French in Blois" is not novel, because it is obvious
- useful: the extracted knowledge can be acted upon
- understandable by humans
What is DM
What is NOT Data Mining:
- looking up a phone number in a directory
- counting the customers who bought an iPad in August
  - a plain SQL query is enough for that
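As a rough illustration of why the counting question is plain querying rather than data mining, here is a minimal sketch that uses Python's built-in sqlite3 module. The purchases table, its columns, and all rows are hypothetical and invented only for this example.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "iPad", "2013-08-03"),
        (2, "iPad", "2013-08-17"),
        (2, "iPad", "2013-08-20"),    # same customer buying twice
        (3, "iPhone", "2013-08-09"),  # different product
        (4, "iPad", "2013-09-01"),    # outside August
    ],
)

# Count distinct customers who bought an iPad in August 2013.
query = """
    SELECT COUNT(DISTINCT customer_id)
    FROM purchases
    WHERE product = 'iPad'
      AND purchase_date >= '2013-08-01'
      AND purchase_date < '2013-09-01'
"""
ipad_buyers_in_august = conn.execute(query).fetchone()[0]
print(ipad_buyers_in_august)
conn.close()
```

The query only retrieves and aggregates facts that are already stored. It does not discover a pattern or predict anything, which is what separates it from the data mining questions.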
What is Data Mining:
- What is the profile of the customers who bought an iPad?
- Which customers will buy the new iPhone?
- Which customers will buy which products?
Origins
DM is a discipline with roots in
- Artificial Intelligence
- Statistics
- Machine Learning
- Pattern Recognition
- Cognitive Science
- Database Systems
Main Focuses
DM is mostly used in
- Customer Relationship Management (CRM)
  - churn scoring - predict whether a customer will leave for a competitor
  - direct marketing - show ads only to those who are interested
  - credit scoring
  - sales forecasting
  - etc
- website/search optimization
- supply chain optimization
- many others
Types of Data Mining
Rule Mining
Sequence Mining
Graph Mining
- Social Network Mining
Others
- Cluster Analysis
- Web Mining
- Text Mining - part of Natural Language Processing and Information Retrieval
- Stream Mining
- Tree Mining
- Preference Mining
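To make one of these types concrete, the sketch below computes the two basic measures used in rule mining, support and confidence, for a single candidate rule. The transactions and the rule "bread => milk" are hypothetical examples. A real rule miner such as Apriori would also search the space of candidate rules, which this sketch does not do.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


# Candidate rule: bread => milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 3 of the 5 transactions, and bread appears in 4, so the rule holds in about three quarters of the baskets that contain bread.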
Data Mining Process
CRISP-DM (CRoss Industry Standard Process for Data Mining)
Business Understanding
- Define the success criteria
- How to integrate the output with existing technologies?
Data Understanding
- Collect the data from Data Sources
- Summarizing Data: First Look at the Data
- Exploratory Data Analysis
  - Univariate Analysis - analyze how the values of a single variable behave in isolation
  - Bivariate Analysis - analyze how two variables interact
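The two kinds of exploratory analysis can be sketched with the standard library alone. The ages and spend values are hypothetical, and the Pearson correlation coefficient is only one of several possible ways to measure how two variables interact.

```python
import math
import statistics

# Hypothetical customer records (age in years, monthly spend), invented for illustration.
ages = [23, 35, 41, 29, 52, 47]
spend = [120.0, 180.0, 210.0, 150.0, 260.0, 240.0]


def summarize(values):
    """Univariate summary: location and spread of one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Bivariate summary: Pearson correlation coefficient of two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


age_summary = summarize(ages)
age_spend_corr = pearson(ages, spend)
print(age_summary)
print(age_spend_corr)
```

A correlation close to 1 or -1 suggests a strong linear relationship, while a value near 0 says only that there is no linear one. Plotting the two variables against each other is the usual complement to this number.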
Data Preparation
- The data must be prepared so that the models can process it
- Data Cleaning
- Data Transformation
- Data Reduction
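The three preparation steps can be illustrated with one simple technique each. The records below are hypothetical, and the chosen techniques (median imputation, min-max scaling, dropping constant columns) are assumptions made for the example. Many other cleaning, transformation, and reduction methods exist.

```python
import statistics

# Hypothetical raw records; None marks a missing value. Invented for illustration.
rows = [
    {"age": 25, "income": 30000.0, "country": "FR"},
    {"age": None, "income": 45000.0, "country": "FR"},
    {"age": 40, "income": 60000.0, "country": "FR"},
    {"age": 31, "income": None, "country": "FR"},
]
numeric = ["age", "income"]

# Data cleaning: replace missing numeric values with the column median.
for col in numeric:
    median = statistics.median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = median

# Data transformation: min-max scale each numeric column to [0, 1].
for col in numeric:
    lo = min(r[col] for r in rows)
    hi = max(r[col] for r in rows)
    for r in rows:
        r[col] = (r[col] - lo) / (hi - lo) if hi > lo else 0.0

# Data reduction: drop columns that take a single value and so carry no information.
constant = [c for c in rows[0] if len({r[c] for r in rows}) == 1]
prepared = [{k: v for k, v in r.items() if k not in constant} for r in rows]

for r in prepared:
    print(r)
```

The order matters in this sketch: imputing first means the scaling step never sees a missing value, and the constant "country" column is dropped last because it carries no information that would separate one record from another.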
Data Modeling
Evaluation
