Evolution of Distributed DBs

Logical designs for multi-processor databases:

  • Shared memory (easiest to program, but the most expensive)
  • Shared disk (a middle ground: each node has its own memory, but storage is shared)
  • Shared nothing (hardest to program, but scales the best: MapReduce, etc.)


Shared Nothing Architecture

  • each processing unit (node, process, thread) is independent and self-sufficient
  • it has its own memory and storage
  • there is no single point of contention in the system
  • allows individual servers to fail (with proper replication)
  • data records are distributed across nodes and exchanged via message passing (see the sketch after this list)
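
A minimal sketch of the idea in Python (the Node and Message names are illustrative, not from any real system): each node owns its private store, and the only way to touch that data is to send the node a message.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    op: str                      # "put" or "get"
    key: str
    value: Optional[str] = None

@dataclass
class Node:
    node_id: int
    store: dict = field(default_factory=dict)   # private memory/storage, never shared

    def handle(self, msg: Message) -> Optional[str]:
        # messages are the only way to reach this node's data
        if msg.op == "put":
            self.store[msg.key] = msg.value
            return "ok"
        if msg.op == "get":
            return self.store.get(msg.key)
        raise ValueError(f"unknown op: {msg.op}")

nodes = [Node(i) for i in range(3)]
nodes[0].handle(Message("put", "user:42", "alice"))
print(nodes[0].handle(Message("get", "user:42")))   # 'alice'
print(nodes[1].handle(Message("get", "user:42")))   # None -- node 1 owns a separate store
```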


Assumptions we make about distributed systems

Data size

  • the amount of data is too big to fit on one node
  • or even on a single rack
  • $\to$ therefore we need to partition the data across many nodes
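
The simplest way to partition is hashing a key modulo the node count; a quick sketch (NUM_NODES and the MD5 choice are illustrative):

```python
import hashlib

NUM_NODES = 4   # illustrative cluster size

def owner(key: str, num_nodes: int = NUM_NODES) -> int:
    # use a stable hash (Python's built-in hash() is salted per process)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

print(owner("user:42"))   # the same key always routes to the same node
# weakness: changing num_nodes remaps almost every key --
# this is the problem Consistent Hashing (below) solves
```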


Availability

  • the system must be highly available to serve all applications
  • nodes may occasionally crash
  • but data must be safe
  • $\to$ therefore we need to replicate each row to multiple nodes and remain available despite failures
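
A rough sketch of quorum-style replicated writes (N, W, and the Replica class are illustrative assumptions, not a specific system's API): the write succeeds as long as enough replicas acknowledge it, so one crashed node does not block availability.

```python
N, W = 3, 2   # illustrative: N replicas per row, W acks needed for a successful write

class Replica:
    def __init__(self):
        self.store = {}
        self.up = True            # nodes may occasionally crash

    def put(self, key, value):
        if not self.up:
            raise ConnectionError("replica is down")
        self.store[key] = value

def replicated_write(key, value, replicas):
    acks = 0
    for r in replicas:
        try:
            r.put(key, value)
            acks += 1
        except ConnectionError:
            continue              # tolerate individual failures, keep going
    if acks < W:
        raise RuntimeError("not enough replicas acknowledged the write")
    return acks

replicas = [Replica() for _ in range(N)]
replicas[1].up = False                                 # one node crashes
print(replicated_write("user:42", "alice", replicas))  # 2 acks -> still succeeds
```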


Latency

  • the system is meant for real-time use
  • the 95th/99th percentile latency is more important than the average (we care about the tail: the longest latencies users actually experience)
  • we want it to run on cheap commodity hardware
  • $\to$ need to be able to maintain low latency even during recovery operations
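
A small illustration of why tail percentiles matter more than the mean (nearest-rank percentile, toy latency numbers):

```python
import math

def percentile(samples, p):
    # nearest-rank percentile; tiny, dependency-free sketch
    xs = sorted(samples)
    rank = math.ceil(p / 100 * len(xs))
    return xs[rank - 1]

latencies_ms = [12, 11, 13, 12, 14, 11, 250, 12, 13, 12]   # one slow outlier
avg = sum(latencies_ms) / len(latencies_ms)
print(f"avg={avg:.1f}ms  p95={percentile(latencies_ms, 95)}ms")
# avg (~36ms) looks fine, but p95 (250ms) exposes the tail users actually feel
```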

Design Principles

Partitioning / Incremental Scalability

  • Scale out one node at a time with minimal impact (with techniques like Consistent Hashing)
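
A minimal consistent-hashing ring, without the virtual nodes that production systems add for balance (class and node names are illustrative): when a node joins, only the keys between it and its predecessor on the ring move.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._ring = []   # sorted (position, node) pairs on the hash circle
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        bisect.insort(self._ring, (self._hash(node), node))

    def owner(self, key: str) -> str:
        # a key belongs to the first node clockwise from its position
        i = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")                      # scale out by one node
moved = sum(before[k] != ring.owner(k) for k in keys)
print(f"{moved}/1000 keys moved")            # only a fraction, not all of them
```

Compare this with the naive modulo scheme above, where adding a node remaps almost every key.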


Symmetry

  • no node has a special role (with extra responsibilities)
  • simplifies maintenance


Decentralization

  • an extension of Symmetry: favor peer-to-peer techniques over centralized control (see the gossip sketch below)
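
One example of such a peer-to-peer technique is gossip-based membership: in each round every node merges its view with one random peer, and full knowledge spreads without any coordinator. A toy sketch (the convergence loop and node names are illustrative):

```python
import random

def gossip_round(views):
    # each node merges its membership view with one random peer;
    # no coordinator ever holds the authoritative cluster state
    nodes = list(views)
    for node in nodes:
        peer = random.choice([n for n in nodes if n != node])
        merged = views[node] | views[peer]
        views[node] = views[peer] = merged   # anti-entropy: both sides converge

views = {f"n{i}": {f"n{i}"} for i in range(8)}   # each node knows only itself
rounds = 0
while any(len(v) < len(views) for v in views.values()):
    gossip_round(views)
    rounds += 1
print(f"every node learned full membership after {rounds} rounds")
```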


Heterogeneity

  • work distribution must be proportional to the capabilities of individual servers
  • no need to upgrade old servers when adding a newer, more powerful one
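
One common way to honor capacity differences is giving bigger servers more virtual nodes on the hash ring; a sketch with made-up weights:

```python
import bisect
import hashlib
from collections import Counter

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# illustrative weights -- e.g. proportional to a server's CPU/disk capacity
capacities = {"old-node": 1, "new-big-node": 3}

# place weight * 64 virtual points per server on the ring
ring = sorted((_hash(f"{node}#{i}"), node)
              for node, w in capacities.items()
              for i in range(w * 64))

def owner(key: str) -> str:
    i = bisect.bisect(ring, (_hash(key),)) % len(ring)
    return ring[i][1]

load = Counter(owner(f"key-{i}") for i in range(10_000))
print(load)   # new-big-node serves roughly 3x the keys of old-node
```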

Concurrency Control

Main Article: Concurrency Control


See also