To scale incrementally, Distributed Databases need a mechanism to dynamically partition over a set of nodes. Consistent Hashing is one of them: it allows to distribute load across several nodes.
"Regular" hashing
- need assign M data keys to N servers
- assign each key to server number k mod N
What happens if we increase a number of serves from N to 2N?
Every existent key will have to be remapped.
Consistent Hashing approach
- In consistent hashing a hash function is viewed as a ring: largest hash values wrap around to smallest
- The ring is divided onto N regions (N - number of servers)
- Each server has its own key region (its "position" on the ring)
- ⇒ adding or removing a node affects only direct neighbors
for example
- the key region for 2nd server is the area between 1 and 2
- and only 2 is responsible for keys in that region
Suppose we want to add a new server
- we just pick some area
- and divide it on 2 parts
- and then assign the new server one of these two
- the keys that happen to be in that region are moved to the new server
So routing is simple in this schema:
- each server knows the key range which it manages
- so we can route the request to the server that is closes to the key we're looking for
Virtual Nodes
There are some challenges with this basic approach
- random position assignment may lead to non-uniform data/load distribution
- heterogeneity is performance is assumed (that is, we assume that all the servers have same performance)
A variant of Consistent Hashing algorithm addresses this issue:
- instead of mapping a single node to the ring,
- each node gets multiple points there
- so each node has several virtual nodes
A virtual node looks like a single node, but it refers to the real node.
Advantages
- if a node becomes unavailable, the load is distributes across the remained nodes uniformly (not just the closest neighbor gets all the load)
- and when a new node is added, it gets roughly equivalent amount of load from each node
- number of virtual nodes is chosen based on the capabilities of a node
Sources