ML Wiki
Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Data Mining

Data mining - methods and algorithms to explore and analyze large volumes of data

Goal: to find patterns in data that are

  • valid: hold with some certainty
    • e.g. "everybody speaks English in Blois" - not true, so not valid
  • novel: non-obvious for a human
    • e.g. "everybody speaks French in Blois" - obvious, so not novel
  • useful: something can be done with the extracted knowledge
  • understandable for humans

What is DM

What is NOT Data Mining:

  • looking up a phone number in a phone directory
  • computing the number of customers who bought an iPad in August
  • a plain SQL query is enough for tasks like these
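
To illustrate why the counting task above is a lookup rather than data mining, here is a minimal sketch using Python's built-in sqlite3 module. The `purchases` table, its columns, the sample rows, and the year 2013 are all assumptions made up for this example; they do not come from any real dataset.

```python
import sqlite3

# Hypothetical in-memory table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "iPad", "2013-08-03"),
        (2, "iPad", "2013-08-17"),
        (2, "iPad", "2013-08-20"),    # the same customer buying twice
        (3, "iPhone", "2013-08-09"),  # different product
        (4, "iPad", "2013-09-01"),    # outside August
    ],
)


def count_ipad_buyers_in_august(connection):
    """Count distinct customers who bought an iPad in August (assumed year: 2013)."""
    row = connection.execute(
        "SELECT COUNT(DISTINCT customer_id) FROM purchases "
        "WHERE product = 'iPad' "
        "AND purchase_date BETWEEN '2013-08-01' AND '2013-08-31'"
    ).fetchone()
    return row[0]


print(count_ipad_buyers_in_august(conn))
```

The query only aggregates facts that are already stored; it does not discover a pattern or predict anything, which is what separates it from the data mining questions in the next list.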

What is Data Mining:

  • What is the profile of the customers who bought an iPad?
  • Which customers will buy the new iPhone?
  • Which customers will buy which products?

Origins

DM is a discipline with roots from

Main Focuses

DM is mostly used in

  • Customer Relationship Management (CRM)
    • churn scoring - predict whether a customer will leave for a competitor
    • direct marketing - show ads only to those who are interested
    • credit scoring
    • sales forecasting
    • etc
  • website/search optimization
  • supply chain optimization
  • many others
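
As a rough illustration of churn scoring from the list above, the sketch below scores a customer by the historical churn rate of their segment. This is a deliberately simple baseline, not a method prescribed by this page; the `history` records, the contract-type segments, and the `default` fallback value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical history of (contract_type, churned) pairs, invented for illustration.
history = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("yearly", False), ("yearly", False), ("yearly", True), ("yearly", False),
]


def fit_churn_rates(records):
    """Estimate the churn rate of each segment from past customers."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in records:
        totals[segment] += 1
        churned[segment] += int(did_churn)
    return {segment: churned[segment] / totals[segment] for segment in totals}


def churn_score(rates, segment, default=0.5):
    """Score a customer by their segment's churn rate; unseen segments get `default`."""
    return rates.get(segment, default)


rates = fit_churn_rates(history)
print(churn_score(rates, "monthly"))  # 3 of 4 monthly customers churned
print(churn_score(rates, "yearly"))   # 1 of 4 yearly customers churned
```

In practice a score like this would usually come from a trained classifier over many customer attributes; the per-segment rate is only meant to show what "scoring" means.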

Types of Data Mining

Rule Mining
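
Rule mining is commonly described in terms of the support and confidence of rules such as "customers who buy X also buy Y". The sketch below computes both measures for a single rule; the `transactions` data and the function names are assumptions made for illustration, and the code does not search for rules the way a full algorithm such as Apriori would.

```python
# Hypothetical market baskets, invented for illustration.
transactions = [
    {"iPad", "case"},
    {"iPad", "case", "stylus"},
    {"iPad"},
    {"iPhone", "case"},
    {"iPhone"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / antecedent_support


# Rule: {iPad} -> {case}
print(support({"iPad", "case"}, transactions))
print(confidence({"iPad"}, {"case"}, transactions))
```

Here the rule {iPad} -> {case} appears in 2 of 5 baskets, and 2 of the 3 iPad baskets also contain a case. This ties back to the "valid: with some certainty" criterion: confidence is one way to quantify that certainty.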

Local Pattern Discovery

Sequence Mining

Graph Mining

  • Social Network Mining

Others

Data Mining Process

CRISP-DM (CRoss Industry Standard Process for Data Mining)

Business Understanding

  • Define the success criteria
  • How to integrate the output with existing technologies?

Data Understanding

Data Preparation

Data Modeling

Evaluation
