Concurrency Control

In a typical relational database, modifying a table puts a lock on it, and all other clients that want to access the table are queued until the lock is released.

  • this sequential execution of requests wastes processing power and time
  • under high load, the database may spend a lot of its time working out whose turn is next

MVCC (Multi-Version Concurrency Control) is one of the concurrency control techniques.

  • it is used to manage concurrent access to the data
  • it stays efficient even under high load, because requests do not have to queue behind a lock

couchdb-concurrency.png

(figure source: [1])


This is used in:


Ways to Achieve

B-Tree Storage Engine

This is how MVCC is achieved in CouchDB.

B-Trees are used everywhere in CouchDB, including for internal data, documents, and views

  • Using this data structure imposes an important restriction: data can be accessed only by key.
  • Reason: this restriction is what makes huge performance gains possible

The B-Tree is modified to support MVCC:

  • reads and writes proceed without locking the system
  • writes do not block reads
  • this is possible because of the append-only design


Append-only design

  • old versions are not deleted
  • every time something is updated, a new node is created instead of overwriting the old one, and a new root as well
    • but reads that started earlier still hold references to the old root,
    • so they are able to continue reading without being interrupted,
    • i.e. they see the old consistent state
  • existing data is never overwritten, so it is never lost or corrupted by a later update
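
The append-only idea can be sketched in a few lines of Python. This is only an illustration, not CouchDB's implementation: it uses a plain binary search tree instead of a B-Tree, and the names `Node`, `insert` and `lookup` are made up for this example. An update never changes an existing node; it copies the nodes on the path from the root to the changed key and returns a new root, while untouched subtrees are shared between versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)          # frozen: a node can never be modified in place
class Node:
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str, value: str) -> Node:
    """Return the root of a NEW version; the old version stays intact."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        # copy this node, rebuild only the left side, share the right side
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # same key: new node with the new value, children are shared
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: str) -> Optional[str]:
    """Read a key starting from a given root (i.e. from a given version)."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


# Version 1 of the data
v1 = insert(insert(insert(None, "m", "1"), "c", "2"), "x", "3")
# An update produces version 2 with a new root; v1 is not touched
v2 = insert(v1, "c", "2-updated")

old_value = lookup(v1, "c")      # a read that still holds the old root
new_value = lookup(v2, "c")      # a read that starts from the new root
```

Here `v1` and `v2` are two different roots, but the subtree that was not on the update path (`v1.right`) is the same object in both versions, so an update copies only a small part of the tree.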

Idea:

  • When a new request arrives, it uses the current root for querying
  • If another transaction changes the root in the meantime, this request stays active and still gets the old, valid and consistent value
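
A minimal sketch of this behaviour from the point of view of a request, in Python. The `Store` class and its `begin_read` and `write` methods are hypothetical names for this example, and copying the whole mapping on every write is a simplification standing in for the tree update: the point is only that a request captures the current root once and keeps using it.

```python
from types import MappingProxyType


class Store:
    """Toy versioned store: `root` always points to the latest read-only version."""

    def __init__(self):
        self.root = MappingProxyType({})

    def begin_read(self):
        # A request grabs the current root once and uses it for all its reads
        return self.root

    def write(self, key, value):
        # Build a new version and then switch the root to it;
        # the previous version is left untouched for requests still using it
        new_version = dict(self.root)
        new_version[key] = value
        self.root = MappingProxyType(new_version)


store = Store()
store.write("doc1", "draft")

snapshot = store.begin_read()    # request A starts and captures the root
store.write("doc1", "final")     # another transaction changes the root

seen_by_a = snapshot["doc1"]             # A still sees its old consistent state
seen_by_new = store.begin_read()["doc1"] # a request started now sees the update
```

Request A was never blocked by the write and never saw a half-applied change; only requests that start after the root switch see the new value.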


Sources
