Consistency (databases)

databases distributed-systems

Consistency

For databases, “consistency” means satisfying integrity constraints, which are rules about the correctness of the data in a database. So a database is “consistent” if all its constraints are satisfied.

Integrity Constraints

Some common integrity constraints are:

entity integrity constraint (a primary key cannot be null)
referential integrity constraint (if a tuple $X$ in one relation refers to some other tuple $Y$ in another relation, $Y$ must always exist in that relation)
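The two constraints above can be checked mechanically. Below is a minimal sketch. It models relations as lists of dicts and represents NULL as `None`. Both of these are assumptions for illustration. The relation and attribute names (`departments`, `employees`, `dept_id`, `emp_id`) are hypothetical.

```python
def entity_integrity_ok(relation, key):
    """Entity integrity: no tuple may have a NULL (None) primary key."""
    return all(row.get(key) is not None for row in relation)


def referential_integrity_ok(child, fk, parent, pk):
    """Referential integrity: every value of `fk` in `child` must match
    the primary key `pk` of some tuple in `parent`."""
    parent_keys = {row[pk] for row in parent}
    return all(row[fk] in parent_keys for row in child)


# Hypothetical example relations.
departments = [{"dept_id": 1, "name": "Sales"}, {"dept_id": 2, "name": "R&D"}]
employees = [{"emp_id": 10, "dept_id": 1}, {"emp_id": 11, "dept_id": 2}]

print(entity_integrity_ok(employees, "emp_id"))                                # True
print(referential_integrity_ok(employees, "dept_id", departments, "dept_id"))  # True

# Deleting department 2 leaves employee 11 referring to a tuple that no longer exists.
departments_after = [d for d in departments if d["dept_id"] != 2]
print(referential_integrity_ok(employees, "dept_id", departments_after, "dept_id"))  # False
```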

Examples of predicates that must hold:

$x$ is a key of relation $R$
Functional Dependency $x \to y$ holds in $R$
domain($x$) = {Red, Green, Blue}, i.e. the only allowed values of $x$
no employee should earn more than twice the average salary (enforced with triggers in Active Databases)
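Each of these predicates can be written as a test over the current data. The sketch below uses the same list-of-dicts model as an assumption, and the `emp` relation and its attributes are hypothetical. `is_key` only checks uniqueness. A true key must also be minimal, and that is not checked. The salary rule is evaluated here as a one-off check. An active database would instead enforce it with a trigger.

```python
from statistics import mean


def is_key(relation, attrs):
    """Uniqueness part of the key predicate: no two tuples agree on all of `attrs`.
    (A key is also minimal; minimality is not checked here.)"""
    seen = set()
    for row in relation:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def fd_holds(relation, lhs, rhs):
    """Functional dependency lhs -> rhs: tuples that agree on lhs also agree on rhs."""
    rhs_for = {}
    for row in relation:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if rhs_for.setdefault(x, y) != y:
            return False
    return True


def in_domain(relation, attr, domain):
    """Every value of `attr` comes from the allowed set `domain`."""
    return all(row[attr] in domain for row in relation)


def salary_rule_ok(relation):
    """No employee earns more than twice the average salary."""
    avg = mean(row["salary"] for row in relation)
    return all(row["salary"] <= 2 * avg for row in relation)


# Hypothetical employee relation.
emp = [
    {"id": 1, "dept": "Sales", "floor": 3, "badge": "Red", "salary": 4000},
    {"id": 2, "dept": "Sales", "floor": 3, "badge": "Green", "salary": 5000},
    {"id": 3, "dept": "R&D", "floor": 5, "badge": "Blue", "salary": 6000},
]
print(is_key(emp, ["id"]), fd_holds(emp, ["dept"], ["floor"]))  # True True
print(in_domain(emp, "badge", {"Red", "Green", "Blue"}))        # True
rich = {"id": 4, "dept": "R&D", "floor": 5, "badge": "Red", "salary": 30000}
print(salary_rule_ok(emp + [rich]))                             # False
```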

In a database, we use constraints to specify which data is valid.
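A real DBMS lets you declare such constraints in the schema and rejects any update that violates them. Below is a minimal sketch using Python's built-in `sqlite3` module. The `dept`/`emp` schema is hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. The exact error message text depends on the SQLite version, so it is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript(
    """
    CREATE TABLE dept (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE emp (
        id      INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES dept(id),                       -- referential integrity
        badge   TEXT    NOT NULL CHECK (badge IN ('Red', 'Green', 'Blue'))  -- domain constraint
    );
    INSERT INTO dept (id) VALUES (1);
    """
)


def try_insert(conn, sql, params=()):
    """Run one INSERT in its own transaction; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
        return False


ok = try_insert(conn, "INSERT INTO emp (dept_id, badge) VALUES (?, ?)", (1, "Red"))
bad_domain = try_insert(conn, "INSERT INTO emp (dept_id, badge) VALUES (?, ?)", (1, "Purple"))
bad_ref = try_insert(conn, "INSERT INTO emp (dept_id, badge) VALUES (?, ?)", (99, "Blue"))
```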

Transaction Constraints

Transaction constraints essentially involve two database states: the old state (before a transaction $T$) and the new state (after $T$).

Transaction Consistency

However, it is impossible to keep a database in a consistent state at all times.

Example:

we have $n$ accounts in a bank: $a_1, \ldots, a_n$
suppose that we store the total sum somewhere in the database
constraint: $a_1 + \dots + a_n = \text{TOTAL}$
but during a transaction the database may be in an inconsistent state
transaction: deposit 100 USD to $a_2$
to do that we need:
- update $a_2: a_2 \leftarrow a_2 + 100$
- (at this moment the constraint is not satisfied)
- update TOTAL: $\text{TOTAL} \leftarrow \text{TOTAL} + 100$
so during the transaction we’ll have a state in which the DB is not consistent
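The deposit example can be traced step by step. This minimal sketch stores the database as a plain dict. The balances are hypothetical, and $a_2$ sits at index 1 of a 0-based list. The sketch records whether the constraint holds before, between, and after the two updates.

```python
def is_consistent(db):
    """The constraint a_1 + ... + a_n = TOTAL."""
    return sum(db["accounts"]) == db["total"]


def deposit(db, i, amount):
    """Deposit `amount` into account i as two separate updates and record
    whether the constraint holds before, between, and after them."""
    trace = [is_consistent(db)]
    db["accounts"][i] += amount   # update a_i
    trace.append(is_consistent(db))
    db["total"] += amount         # update TOTAL
    trace.append(is_consistent(db))
    return trace


db = {"accounts": [500, 200, 300], "total": 1000}  # hypothetical balances; a_2 is index 1
print(deposit(db, 1, 100))  # [True, False, True]
```

The middle `False` is the intermediate state described above. The constraint is violated only between the two updates.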

We can define a “transaction” as a sequence of updates on the database.

A transaction “preserves consistency” if executing it brings the database from one consistent state to another.
The database does not have to be consistent during the transaction.
For transactions, consistency is the letter “C” in ACID.
A transaction should also execute in isolation (the letter “I” in ACID).
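In SQL the two updates of the deposit are grouped into a single transaction. Then only the consistent states before and after it are ever committed. Below is a minimal sketch with Python's `sqlite3`, where `with conn:` commits if the block finishes and rolls back if an exception escapes it. The `accounts`/`totals` schema is hypothetical. The "failure" is a raised exception, not a real crash, and isolation between concurrent transactions is not demonstrated.

```python
import sqlite3


def make_bank():
    """Hypothetical schema: three accounts plus a one-row table holding TOTAL."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.execute("CREATE TABLE totals (total INTEGER NOT NULL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 200), (3, 300)])
        conn.execute("INSERT INTO totals VALUES (1000)")
    return conn


def total_matches(conn):
    """Check the constraint a_1 + ... + a_n = TOTAL."""
    (s,) = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()
    (t,) = conn.execute("SELECT total FROM totals").fetchone()
    return s == t


def deposit(conn, account_id, amount, fail_midway=False):
    """Both updates form one transaction: `with conn` commits if the block
    finishes and rolls back if an exception escapes it."""
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, account_id))
        if fail_midway:  # simulate an error between the two updates
            raise RuntimeError("failure after updating the account")
        conn.execute("UPDATE totals SET total = total + ?", (amount,))


conn = make_bank()
deposit(conn, 2, 100)
print(total_matches(conn))  # True

try:
    deposit(conn, 2, 100, fail_midway=True)
except RuntimeError:
    pass
print(total_matches(conn))  # True: the half-done deposit was rolled back
```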

Crash Recovery

But what if during the execution of a transaction a crash occurs?

if we take no action, the database will be left in an inconsistent state
main techniques: Database Transaction Logs
- Undo Logging, Redo Logging, Undo/Redo Logging
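Undo logging is the simplest of these to sketch. Below is a minimal recovery routine with several simplifying assumptions. Log records are Python tuples `("START", T)`, `("UPDATE", T, item, old_value)`, `("COMMIT", T)`, `("ABORT", T)`, which is a hypothetical encoding. Aborted transactions are assumed to have been rolled back before their ABORT record was written. The flush-ordering rules that a real system must obey are not modeled.

```python
def undo_recover(log, db):
    """Undo-logging recovery: restore the old value of every item written by a
    transaction that has neither a COMMIT nor an ABORT record, then mark each
    such transaction as aborted."""
    finished = {rec[1] for rec in log if rec[0] in ("COMMIT", "ABORT")}
    incomplete = [rec[1] for rec in log if rec[0] == "START" and rec[1] not in finished]
    for rec in reversed(log):  # newest first, so the oldest old_value is restored last
        if rec[0] == "UPDATE" and rec[1] in incomplete:
            _, _, item, old_value = rec
            db[item] = old_value
    for t in incomplete:
        log.append(("ABORT", t))
    return db


# Hypothetical crash in the middle of "deposit 100 USD to a_2":
# T0 committed earlier; T1 updated a_2 but crashed before updating TOTAL.
log = [
    ("START", "T0"), ("UPDATE", "T0", "a1", 400), ("UPDATE", "T0", "TOTAL", 900), ("COMMIT", "T0"),
    ("START", "T1"), ("UPDATE", "T1", "a2", 200),
]
disk = {"a1": 500, "a2": 300, "a3": 300, "TOTAL": 1000}  # inconsistent: 1100 != 1000
undo_recover(log, disk)
print(disk)  # {'a1': 500, 'a2': 200, 'a3': 300, 'TOTAL': 1000}
```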

Consistency Models

For distributed databases, maintaining consistency is harder. Consistency models define the rules for the “visibility” and “order” of updates.

Strict Consistency

every replica sees every update in the same order
all reads return the most up-to-date data no matter what replica is asked
need to employ some techniques for commit propagation, for example, Two-Phase Commit
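according to the CAP theorem, strict consistency cannot be guaranteed together with both availability and partition tolerance: during a network partition, a system must give up either consistency or availability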
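The commit-propagation step can be illustrated with a toy two-phase commit. This is a single-process simulation that assumes every participant answers immediately. It leaves out the logging, timeouts, and coordinator-failure handling of the real protocol. The `Participant` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A replica taking part in one distributed transaction (hypothetical)."""
    name: str
    can_commit: bool = True
    state: str = "INIT"

    def prepare(self) -> bool:
        # Phase 1 (voting): a participant votes YES only if it can commit its part.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant voted YES."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: global commit
            p.commit()
        return "COMMITTED"
    for p in participants:                       # phase 2: global abort
        p.abort()
    return "ABORTED"


replicas = [Participant("r1"), Participant("r2"), Participant("r3")]
print(two_phase_commit(replicas))  # COMMITTED
replicas = [Participant("r1"), Participant("r2", can_commit=False), Participant("r3")]
print(two_phase_commit(replicas))  # ABORTED
```

No replica commits until every replica has voted YES. As a result, either all replicas apply the update or none does. In the real protocol, a participant that cannot be reached holds up the decision, which is the availability cost mentioned above.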

Eventual Consistency

the order in which updates are received is important
as $t \to \infty$, all readers will see the writes
but updates are not atomic, as they are in the case of strict consistency
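Below is a toy model of convergence with in-order updates. It assumes, hypothetically, that a single writer tags each update with a sequence number. Each replica applies updates in that order and buffers any that arrive early. A replica can serve stale data for a while, but once every update has been delivered, all replicas hold the same state.

```python
class Replica:
    """Toy replica that applies updates strictly in sequence-number order."""

    def __init__(self):
        self.data = {}
        self.next_seq = 0   # sequence number of the next update to apply
        self.pending = {}   # updates that arrived before their predecessors

    def receive(self, seq, key, value):
        if seq < self.next_seq:  # duplicate of an update already applied
            return
        self.pending[seq] = (key, value)
        # Apply every buffered update whose predecessors have all been applied.
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.data[k] = v
            self.next_seq += 1


updates = [(0, "x", 1), (1, "x", 2), (2, "y", 5)]  # hypothetical update stream
r1, r2 = Replica(), Replica()
for u in updates:
    r1.receive(*u)               # r1 gets everything in order
r2.receive(*updates[2])
r2.receive(*updates[1])
print(r2.data)                   # {} : r2 is stale while update 0 is missing
r2.receive(*updates[0])
print(r1.data == r2.data == {"x": 2, "y": 5})  # True: the replicas have converged
```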

Weak Consistency

every replica will see updates
but there’s no guarantee of the order

in this case, a later update may be overwritten by an earlier one that happens to arrive later
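Below is a minimal sketch of that anomaly. It assumes replicas that carry no ordering metadata and simply apply updates in arrival order. The `NaiveReplica` class is hypothetical. Two replicas receive the same two writes in different orders and end up disagreeing.

```python
class NaiveReplica:
    """Applies updates in whatever order they arrive (no ordering guarantee)."""

    def __init__(self):
        self.data = {}

    def receive(self, key, value):
        self.data[key] = value  # last arrival wins, regardless of when it was issued


first, second = ("x", "old"), ("x", "new")  # a client issues these in this order
a, b = NaiveReplica(), NaiveReplica()
for u in (first, second):
    a.receive(*u)               # delivered in issue order
for u in (second, first):
    b.receive(*u)               # the network reorders the messages
print(a.data, b.data)           # {'x': 'new'} {'x': 'old'}
```

On replica `b`, the later write `"new"` is overwritten by the earlier write `"old"` because it arrived later.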