Sparse Index

database-indexes database-systems-architecture

Sparse Index

A sparse index has one (key, pointer) per each block

so it uses less space than Dense Index
but requires more time to find a record

not all keys are referenced by this index
pointers point only to the 1st key of the block
cannot say from index if a key is present or not, always need to follow a pointer, load the block and check the key there

Nested Levels

Key benefit of this

we can build the 2nd index on top of index

Operations

Lookup

Rule to retrieve record with specified $k$

we find such $k$ in index that
- is greater or equal to $k$,
- but less than consequent one

suppose we want to find a record with $k = 20$
- $10 \leqslant k = 20 < 90$, follow 10
- $10 \leqslant k = 20 < 30$, follow 10
- load (10, 20) block
if we looked for $k = 15$
- we also would follow 10, and 10, then read the block and discover it’s not there

Deletion

suppose we want to delete $k = 40$

we locate the record, remove it and leave a tombstone there

if we want to delete $k = 30$

locate the record, delete it and leave a tombstone there
but we want to keep it sequential and continuous: so we move 40 up
also note that we need to update the index as well: not it needs to point to 40 instead of 30

what if we want to delete both $k = 30$ and $k = 40$?

locate the records and delete both
the whole block is no longer needed - we remove the pointers to it along with corresponding index record
so we remove 30 from the index and move 50 and 70 up

Insertion

insert key 34

we first locate where we insert
lucky case: some room in the block, just add the record there

insert key 15

this time no room in the block where we want to insert it
there are two options
- Immediate Reorganization
- Overflow Blocks

’'’Immediate Reorganization’’’

re-distribution data
we try to push the data down
in this case 20 is moved to the next block
note that we also have to update the index key that points to the second block since its first key got updated
worst case: we will move all the data blocks
variant: insert a new block and update the index

’'’Overflow Blocks’’’

we create a new block and create a pointer to it from
it will be reorganized later
it may degenerate to a linked list
- may have to traverse the whole chain only to find out that the value is not there
- i.e. back to linear search

Duplicate Keys

Suppose we have duplicate keys in our database. How to build index?

Option 1

There could be some problems we build it same way as without assuming duplicate keys

careful with looking for 20 or 30

- if we follow the pointer for 20, we’ll loose the previous record for 20

- so in this case will need to also load the previous block to check if it contains something

Option 2

We may point to previous values, so we know the range

so in the index we indicate the ‘‘first new’’ (not seen previously) key from block

Sources

Database Systems Architecture (ULB)

✏️ Edit on GitHub

Sparse Index

Sparse Index

Nested Levels

Operations

Lookup

Deletion

Insertion

Duplicate Keys

Option 1

Option 2

See also

Sources