RDBMSs / Row-Oriented databases

  • Typically have strict schema
  • Declarative query language SQL (excellent for ad-hoc queries, easy joins)
  • Good transactions support (ACID)
  • Algebraic Optimization
  • Caching / Materialized Views
  • Strong Consistency
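The ACID guarantees above can be sketched with SQLite (table and values are illustrative): both updates of a transfer commit together or not at all.

```python
import sqlite3

# Minimal sketch of an ACID transaction in a row-oriented RDBMS (SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# Transfer 30 from account 1 to account 2: atomic, all-or-nothing.
try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure the rollback leaves both rows unchanged

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 70, 2: 80}
```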

Downsides

  • many services don't require complex ad-hoc querying
  • typically choose consistency over availability (see the CAP Theorem)
  • replication solutions are limited
    • rely on traditional commit protocols (like Two-Phase Commit) to preserve strong consistency
    • but data is not made available until the commit finishes (and the database is back to the consistent state)
    • not an option for systems where network failures are possible!
  • as the volume of data grows, queries become inefficient - not easily scalable
    • need to wait too long for all replicas to finish with commit
  • hard to load-balance
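The blocking behavior of Two-Phase Commit described above can be sketched as a toy coordinator (names are illustrative, not a real protocol library): no participant makes the write visible until every participant has voted yes.

```python
# Toy sketch of two-phase commit. Writes are staged in phase 1 and only
# become visible in phase 2, which is why data stays unavailable until
# the commit finishes on all replicas.

class Participant:
    def __init__(self):
        self.committed = {}   # data visible to readers
        self.staged = {}      # prepared but not yet visible

    def prepare(self, key, value):
        self.staged[key] = value
        return True  # vote "yes"; a real node could vote "no" or time out

    def commit(self, key):
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)

def two_phase_commit(participants, key, value):
    # Phase 1: collect votes from all participants
    votes = [p.prepare(key, value) for p in participants]
    # Phase 2: commit only if unanimous; otherwise abort everywhere
    if all(votes):
        for p in participants:
            p.commit(key)
        return True
    for p in participants:
        p.abort(key)
    return False

replicas = [Participant(), Participant(), Participant()]
ok = two_phase_commit(replicas, "x", 42)
print(ok, [p.committed for p in replicas])  # True [{'x': 42}, {'x': 42}, {'x': 42}]
```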

NoSQL

Column-Oriented Databases

Are better for storing large amounts of data, especially when the number of columns is very large

  • Sets of columns are stored together, so a particular record is actually split across several blocks
  • Within each block data is stored in sorted order
  • Need to maintain "join index" - to pull together different blocks that are for the same record
  • Especially good for analytical queries (such as OLAP)
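The layout above can be sketched in a few lines (illustrative, not any particular database's format): each column lives in its own sorted block, a join index maps sorted positions back to record ids, and an analytical query touches only the block it needs.

```python
# Sketch of columnar storage with a join index.
records = [
    {"id": 3, "country": "US", "sales": 120},
    {"id": 1, "country": "DE", "sales": 300},
    {"id": 2, "country": "US", "sales": 80},
]

columns = {}      # column name -> values in sorted order (one "block" per column)
join_index = {}   # column name -> record id at each sorted position
for name in ("country", "sales"):
    pairs = sorted((r[name], r["id"]) for r in records)
    columns[name] = [v for v, _ in pairs]
    join_index[name] = [rid for _, rid in pairs]

# An analytical query (total sales) reads a single block
total = sum(columns["sales"])

# Reassembling one record requires the join index across all blocks
def fetch(rid):
    return {name: columns[name][join_index[name].index(rid)]
            for name in columns}

print(total, fetch(2))  # 500 {'country': 'US', 'sales': 80}
```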

Document-Oriented Databases

The main unit of data is a document - a (typically) self-contained record with all the information at hand

  • no (little) need for joins
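A self-contained document might look like this (field names are illustrative): everything needed to render an order sits in one record, so the "join" against customer and product data is already materialized.

```python
import json

# Sketch of a document-oriented record: one lookup returns the whole order.
order = {
    "_id": "order-1042",
    "customer": {"name": "Alice", "email": "alice@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 24.50},
    ],
}

# No joins needed: the nested data is right there in the document
total = sum(i["qty"] * i["price"] for i in order["items"])
print(json.dumps(order["_id"]), round(total, 2))  # "order-1042" 44.48
```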

In-Memory DBs

  • Real-time transactions
  • Variety of indexing
  • Complex joins - still possible
  • Not for big data
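SQLite's in-memory mode gives a minimal sketch of these points (schema is illustrative): indexing and complex joins work as usual, but the data lives entirely in RAM and vanishes with the process.

```python
import sqlite3

# Sketch of an in-memory database: full SQL, indexes, and joins over RAM-resident data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")
db.execute("CREATE INDEX idx_events_user ON events(user_id)")  # secondary index
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(1, "login"), (1, "click"), (2, "login")])

# Complex joins are still possible
rows = db.execute("""
    SELECT u.name, COUNT(*) FROM users u
    JOIN events e ON e.user_id = u.id
    GROUP BY u.name ORDER BY u.name
""").fetchall()
print(rows)  # [('ann', 2), ('bob', 1)]
```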

NoSQL features

  • No ACID transactions; usually a weaker consistency model instead (BASE)
  • Simpler API - usually no query language
  • restricted joins (for better efficiency)
  • Ability to horizontally scale "simple operations" throughput over many servers
    • simple = key lookups, reads/writes of one record
  • Ability to replicate and partition data over many servers
  • Efficient use of distributed indexes and RAM for data storage
  • The ability to dynamically add new attributes to data records
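The "simple operations" and dynamic attributes above can be sketched as a toy sharded key-value store (all names are illustrative): keys are hashed over several servers, and records are schemaless dicts that can grow new fields at any time.

```python
import hashlib

# Toy sketch of NoSQL simple operations: key lookups and single-record writes,
# sharded over several nodes by key hash.
N_SERVERS = 4
servers = [{} for _ in range(N_SERVERS)]  # each dict stands in for one node

def shard(key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % N_SERVERS]

def put(key, record):
    shard(key)[key] = record       # write of one record

def get(key):
    return shard(key).get(key)     # key lookup

put("user:1", {"name": "Ann"})
# Dynamic attributes: a later write can add fields no schema declared up front
rec = get("user:1")
rec["last_login"] = "2013-05-01"
put("user:1", rec)
print(get("user:1"))  # {'name': 'Ann', 'last_login': '2013-05-01'}
```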


Major impact systems

Memcached

Showed that in-memory indexes can be highly scalable, and that it's possible to distribute and replicate objects over multiple nodes

  • main-memory caching service
    • no persistence, replication or fault-tolerance
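The cache-aside pattern memcached popularized can be sketched with a toy in-process stand-in for the client (names and TTL are illustrative): look in the cache first, fall through to the database on a miss, and populate the cache for next time.

```python
import time

# Toy cache-aside sketch: a dict plays the role of a memcached node.
cache = {}  # key -> (value, expiry); no persistence or replication

def cache_get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]
    return None  # miss or expired

def cache_set(key, value, ttl=60):
    cache[key] = (value, time.monotonic() + ttl)

def load_user(uid, db):
    hit = cache_get(f"user:{uid}")
    if hit is not None:
        return hit                     # served from memory
    value = db[uid]                    # fall through to the database on a miss
    cache_set(f"user:{uid}", value)    # populate the cache for next time
    return value

db = {1: "Ann"}
print(load_user(1, db), load_user(1, db))  # Ann Ann (second call from cache)
```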

Dynamo

Pioneered the idea of eventual consistency as a new way to achieve higher availability and scalability:

  • data fetches are not guaranteed to be up-to-date, but
  • updates are guaranteed to be propagated to all nodes (eventually)
  • DHT (Distributed Hash Table) with replication
  • Reconciliation at read time:
    • writes never fail
    • conflict resolution: last write wins or application specific
  • Configurable Consistency
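Read-time reconciliation with last-write-wins can be sketched as follows (a toy illustration with timestamps, not Dynamo's actual vector-clock scheme): writes succeed on whichever replicas are reachable, and reads pick the newest version and repair the stragglers.

```python
# Toy sketch: writes never fail, conflicts are reconciled at read time (LWW).
replicas = [{}, {}, {}]  # each dict: key -> (timestamp, value)

def write(key, value, ts, reachable):
    # accept the write on whichever replicas are reachable right now
    for r in reachable:
        r[key] = (ts, value)

def read(key):
    # reconcile at read time: the highest timestamp wins
    versions = [r[key] for r in replicas if key in r]
    winner = max(versions)
    for r in replicas:       # read repair: propagate the winner everywhere
        r[key] = winner
    return winner[1]

# A network partition lets two conflicting writes land on different replicas
write("cart", ["book"], ts=1, reachable=replicas[:2])
write("cart", ["book", "pen"], ts=2, reachable=replicas[2:])
print(read("cart"))  # ['book', 'pen'] - and all replicas now agree
```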


BigTable

Showed that persistent record storage could be scaled to thousands of nodes
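BigTable-style scaling rests on range partitioning: rows are kept sorted by key and split into "tablets" that different nodes can serve. A toy sketch of the split logic (sizes and names are illustrative):

```python
import bisect

# Toy sketch of range partitioning: sorted rows split into tablets.
TABLET_SIZE = 3
tablets = [[]]  # each tablet is a sorted list of (row_key, value) pairs

def find_tablet(key):
    # the tablet whose key range covers `key`: last tablet starting at or before it
    for t in reversed(tablets):
        if not t or t[0][0] <= key:
            return t
    return tablets[0]

def put(key, value):
    t = find_tablet(key)
    bisect.insort(t, (key, value))
    if len(t) > TABLET_SIZE:       # split an oversized tablet in half,
        mid = len(t) // 2          # so load can spread over more nodes
        i = tablets.index(t)
        tablets[i:i + 1] = [t[:mid], t[mid:]]

for k in ["d", "a", "c", "b", "e"]:
    put(k, k.upper())

print([[k for k, _ in t] for t in tablets])  # [['a', 'b'], ['c', 'd', 'e']]
```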

