(Spring 2013)
Syllabus
Part 1: Data Manipulation, at Scale
- Databases
- Traditional Relational Databases and Relational Algebra
- MapReduce
- NoSQL
- Document-Oriented Databases and Eventual Consistency
- Column-Oriented Databases
- Tradeoffs of SQL and NoSQL
- Data cleaning, entity resolution, data integration, information extraction
Part 2: Analytics
- Basic statistical modeling, experiment design
- Introduction to Machine Learning
- Supervised Learning: decision trees/forests, simple nearest neighbor
- Unsupervised Learning: K-Means, multi-dimensional scaling
Part 3: Interpreting and Communicating Results
- Visualization, visual data analytics