B-Tree

data-structures database-indexes database-systems-architecture

B+ Tree

A search tree is a way to organize data to allow efficient

'’B-Tree’’ - same idea, but for secondary memory, for blocks
A ‘‘B+ Tree’’ variation of B-Tree. Here if B-Tree is mentioned, it’s usually referred to B+ Tree

consists of ‘‘leaf nodes’’ and ‘‘intermediate nodes’’

Order of Tree

parameter $n$ is the order of a tree

it determines the layout of the block
each block will have
- $n$ search keys
- $n + 1$ pointer

Example

suppose our block has size 4096 bytes
integers are 4 bytes long, pointers are 8 bytes long
suppose there’s no header
then we want such $n$ that $4n + 8(n + 1) \leqslant 4096$
that $n = 340$

Example 2

block size 4096
8 bytes per pointer and 8 bytes per key
$(n + 1) \cdot 8 + n \cdot 8 \leqslant 4096$
$n = 255$: we can store 256 pointers and 255 keys in one block

Height

How to estimate height?

Suppose $128 \times 10^6$ tuples in $R$
we have a b-tree index with order $n = 255$ on attribute $C$
assuming all leaf blocks are full, what’s the height?
- so there are $\left\lceil \cfrac{128 \times 10^6}{255} \right\rceil$ leaf blocks
- $\left\lceil \cfrac{128 \times 10^6}{255^2} \right\rceil$ blocks at level 2
- $\left\lceil \cfrac{128 \times 10^6}{255^3} \right\rceil$ blocks at level 3
- …
- $\left\lceil \cfrac{128 \times 10^6}{255^h} \right\rceil$ blocks at level $h$
- so it’s logarithm
$h = \lceil \log_{255} (128 \cdot 10^6) \rceil = 4$
so #Lookup is 4 + 1 I/Os

What if blocks are half-full?

since leaves are half-full, they keep 128 records
$h = \lceil \log_{128} (128 \cdot 10^6) \rceil = 4$
again 4 + 1 I/Os

Non-Leaf Node

$n$ search keys, $n + 1$ pointers
first pointer points to keys that are strictly less than the first key
last pointer points to keys that are greater or equal to the last key
for in-between pointers, if $k_i$ is a key, and $p_{i-1}$ and $p_i$ are pointers around it, then $p_{i-1} \leqslant k_i < p_i$

Leaf Node

$n$ keys,
$n$ pointers to actual records, 1 extra pointer to the next leaf in the sequence

Balancing

Reasons

I/O cost of looking up is the longest path from the root to a leaf
so we want our tree be balanced:
to have paths as short as possible - with all the leaves at the same depth

Idea

Recall that for $n$ we have $n + 1$ pointers and $n$ keys
we don’t want to have too empty nodes
so we will require all nodes to be at least half-full

Size Invariants

nodes are half-full:

at least $\left\lceil \cfrac{n + 1}{2} \right\rceil$ pointers for non leaf
at least $\left\lfloor \cfrac{n + 1}{2} \right\rfloor$ pointers for leaf
the only exception is root: it can contain any number of pointers

for example

note that for leaf nodes a pointer to the next node counts as well

max # pointers

max # keys

max # pointers to data

min # keys

non-leaf

$n + 1$

$n$

$\left\lceil \cfrac{n + 1}{2} \right\rceil$

$\left\lceil \cfrac{n + 1}{2} \right\rceil - 1$

leaf

$n + 1$

$n$

$\left\lfloor \cfrac{n + 1}{2} \right\rfloor$

root

$n + 1$

$n$

Operations

Cost of operations are expressed in I/O Model of Computation

Lookup

suppose we are looking for $k = 35$

Algorithm

start at the root
follow the suitable pointer (as described in #Leaf Node) - this is the left root pointer
for the next, take the last ($k \geqslant 35$)
and finally read the block

I/O Cost

head of the tree + 1 I/O to read
for $k = 35$: 3 I/O
for $k = 40$ (which doesn’t exist): same path as for $k = 35$, cost is 3 I/O
so I/O cost = the longest path from the root to leaf (which is why we want it balanced)

Range Lookups

BTree supports range queries as well
such as $35 \leqslant k \leqslant 40$
we lookup the key corresponding to the left range boundary
since we have a pointer to the next block, we follow it until we hit the right range boundary
I/O cost in this case is
- length of the path from the root to the leaf
- then the number of leaves that we need to follow
- and also we need to follow a pointer for ‘'’each’’’ key in the range

Insert

4 cases

simple case - space available in leaf
leaf overflow
intermediate node overflow
new root

Simple Case

add $k = 32$
look up $k$ to identify the block where it should be stored
we have some room there - so just add the record there

Leaf Overflow

add $k = 7$
look up the block for 7, but it’s fyll
so we ‘‘split’’ this block
- create a new one and re-distribute items between them
- this way they both become half-full (the invariant is maintained)
then in the new block we have some space (Simple Case) and we can put this record there

Intermediate Node Overflow

add $k = 160$ ($n = 3$)
there’s no room in $\fbox{150, 156, 179}$ to add $160$ (Leaf Overflow case)
- so we create a new block and re-distribute the keys between them
since we created a new block, we need to add a pointer to it, but there is no room in $\fbox{120, 150, 180}$
so we split the intermediate block and move $180$ to the new block
also need to modify the root block to add pointer to the new block $\fbox{180}$

New Root

sometimes we need to create a new root
add $k = 45$ ($n = 3$)
we cannot add it to $\fbox{30, 32, 40}$ (Leaf Overflow case)
- split it to 2 nodes, need to add the pointer to the new node
cannot add it to $\fbox{10, 20, 30}$ (Intermediate Node Overflow case)
- split it into 2 nodes, need to add the pointer to new block
but $\fbox{10, 20, 30}$ is a root - need to split it to two nodes - and promote $\fbox{30}$ to the root

I/O Cost

operations
- search
- create a new block (split, write two blocks): 2 operations
- go level up
so in the worst case 2 I/O at each level + possibly writing a new root (1 I/O)
then the total cost is
- $\text{depth} + 2 \times \text{depth} + 1 = 3 \times \text{depth} + 1$

Deleting

Again 4 cases

Simple - just delete it
Coalesce Leaf With Siblings
Redistribute Keys
Intermediate Nodes Coalescing

Coalesce Leaf With Siblings

we delete $k = 50$ ($n = 4$)
we locate the block with 50 and remove this record from it
now the block becomes too empty (recall #Size Invariants)
so we coalesce it with $\fbox{10, 20, 30}$
now we have a new block, and the old one is not needed anymore - we remove it
additional bookkeeping: need to make sure the next pointer point to the record the old block pointed to

Redistribute Keys

we delete $k = 50$ ($n = 4$)
when we delete 50, the block that contained it becomes almost empty
cannot coalesce with $\fbox{10, 20, 30, 35}$ because it’s full
so to fix that we borrow a key from $\fbox{10, 20, 30, 35}$ to become half-full again
and update the parents accordingly

Intermediate Nodes Coalescing

we delete $k = 37$ ($n = 4$)
first we locate the block with 37 and remove the record from it (Coalesce Leaf With Sibling case)
- since this block $\fbox{30}$ becomes too small we coalesce it with $\fbox{25, 26}$
- need to remove the pointer to the deleted block from the parental node $\fbox{30, 40}$
- i.e. we remove 30 from there
but then the parental also node becomes too small, so we coalesce it with its sibling as well
finally we don’t need the root $\fbox{25}$ anymore since new block $\fbox{10, 20, 25, 40}$ can reference all the records

I/O Cost

Cost

search: depth of the tree
remove and regroup: 2 I/Os at each level
no need to follow the pointer (i.e. don’t do +1 as with #Insertion)
total: $\text{depth} + 2 \times \text{depth} = 3 \times \text{depth}$

Multiple Keys

Sometimes we want to address a key made of several keys

Lexicographical Order

For B-Trees have to be ordered somehow

we may need to compare tuples in ‘‘lexicographical order’’
- so we define this ordering as
- $(x, y, z) \leqslant (x’, y’, z’) \iff \ x < x’ \lor (x = x’ \land y < y’) \lor (x = x’ \land y = y’ \land z \leqslant z’)$

Problem with Lexicographical Order

Assume index of (age, salary) pairs with lexicographical order

btree-lex-ord.png
query age < 20 - ok
- start at the beginning of index ans scan till see 20
query salary < 30 - not fine
- linear scan, need to scan everything
query age < 20 $\land$ sal < 20
- also scan index till see 20,
- meanwhile filtering records with sal < 20
so using lexicographical ordering doesn’t allow all queries we want

need other types of indexes - Multi-Dimensional Indexes

R-Tree in particular: it’s generalization of B-Trees to multidimensional data

Sources

Database Systems Architecture (ULB)

✏️ Edit on GitHub

B-Tree

B+ Tree

Order of Tree

Height

Non-Leaf Node

Leaf Node

Balancing

Size Invariants

Operations

Lookup

I/O Cost

Range Lookups

Insert

Simple Case

Leaf Overflow

Intermediate Node Overflow

New Root

I/O Cost

Deleting

Coalesce Leaf With Siblings

Redistribute Keys

Intermediate Nodes Coalescing

I/O Cost

Multiple Keys

Lexicographical Order

Problem with Lexicographical Order

See also

Sources