CAP Theorem

CAP stands for Consistency, Availability and Partition tolerance

Consistency

In a consistent system values of any objects don't contradict each other

  • Do all applications see the same data?

Availability

An available system is always usable

  • If some nodes fail, does everything still works?

Partition Tolerance

If two parts of your system cannot communicate to each other, can they proceed on their own?

  • if not - sacrifice availability
  • if yes - you need to sacrifice consistency

If a system is partition-tolerant, then it can continue to operate even in presence of failures


Theorem: It is impossible to implement a distributed system which will have all three mentioned properties. Only 2 of the 3 is possible to achieve


Choices

  • Consequently, one of the 3 must be abandoned
  • Different databases choose different options:


cap-triangle.png


Consistency and Availability

  • But no Partition tolerance
  • This is typically preferred by RDBMSs - which is why they usually don't offer scalability
  • Easy to achieve ACID under C+A
  • Need special algorithms to ensure consistency (like Two-Phase Commit)

Consistency and Partition tolerance

  • But no availability
  • HBase chooses consistency and partitioning (no availability)

Availability and Partition tolerance

See also

Sources