B+ Tree
A search tree is a way to organize data to allow efficient
 BTree  same idea, but for secondary memory, for blocks
 A B+ Tree variation of BTree. Here if BTree is mentioned, it's usually referred to B+ Tree
 consists of leaf nodes and intermediate nodes
Order of Tree
parameter $n$ is the order of a tree
 it determines the layout of the block
 each block will have
 $n$ search keys
 $n + 1$ pointer
Example
 suppose our block has size 4096 bytes
 integers are 4 bytes long, pointers are 8 bytes long
 suppose there's no header
 then we want such $n$ that $4n + 8(n + 1) \leqslant 4096$
 that $n = 340$
Example 2
 block size 4096
 8 bytes per pointer and 8 bytes per key
 $(n + 1) \cdot 8 + n \cdot 8 \leqslant 4096$
 $n = 255$: we can store 256 pointers and 255 keys in one block
Height
How to estimate height?
 Suppose $128 \times 10^6$ tuples in $R$
 we have a btree index with order $n = 255$ on attribute $C$
 assuming all leaf blocks are full, what's the height?
 so there are $\left\lceil \cfrac{128 \times 10^6}{255} \right\rceil$ leaf blocks
 $\left\lceil \cfrac{128 \times 10^6}{255^2} \right\rceil$ blocks at level 2
 $\left\lceil \cfrac{128 \times 10^6}{255^3} \right\rceil$ blocks at level 3
 ...
 $\left\lceil \cfrac{128 \times 10^6}{255^h} \right\rceil$ blocks at level $h$
 so it's logarithm
 $h = \lceil \log_{255} (128 \cdot 10^6) \rceil = 4$
 so #Lookup is 4 + 1 I/Os
What if blocks are halffull?
 since leaves are halffull, they keep 128 records
 $h = \lceil \log_{128} (128 \cdot 10^6) \rceil = 4$
 again 4 + 1 I/Os
NonLeaf Node
 $n$ search keys, $n + 1$ pointers
 first pointer points to keys that are strictly less than the first key
 last pointer points to keys that are greater or equal to the last key
 for inbetween pointers, if $k_i$ is a key, and $p_{i1}$ and $p_i$ are pointers around it, then $p_{i1} \leqslant k_i < p_i$
Leaf Node
 $n$ keys,
 $n$ pointers to actual records, 1 extra pointer to the next leaf in the sequence
Balancing
Reasons
 I/O cost of looking up is the longest path from the root to a leaf
 so we want our tree be balanced:
 to have paths as short as possible  with all the leaves at the same depth
Idea
 Recall that for $n$ we have $n + 1$ pointers and $n$ keys
 we don't want to have too empty nodes
 so we will require all nodes to be at least halffull
Size Invariants
nodes are halffull:
 at least $\left\lceil \cfrac{n + 1}{2} \right\rceil$ pointers for non leaf
 at least $\left\lfloor \cfrac{n + 1}{2} \right\rfloor$ pointers for leaf
 the only exception is root: it can contain any number of pointers
for example

 note that for leaf nodes a pointer to the next node counts as well

max # pointers

max # keys

max # pointers to data

min # keys

nonleaf

$n + 1$ 
$n$ 
$\left\lceil \cfrac{n + 1}{2} \right\rceil$ 
$\left\lceil \cfrac{n + 1}{2} \right\rceil  1$

leaf

$n + 1$ 
$n$ 
$\left\lfloor \cfrac{n + 1}{2} \right\rfloor$ 
$\left\lfloor \cfrac{n + 1}{2} \right\rfloor$

root

$n + 1$ 
$n$ 
2 
1

Operations
Cost of operations are expressed in I/O Model of Computation
Lookup
suppose we are looking for $k = 35$
Algorithm
 start at the root
 follow the suitable pointer (as described in #Leaf Node)  this is the left root pointer
 for the next, take the last ($k \geqslant 35$)
 and finally read the block

I/O Cost
 head of the tree + 1 I/O to read
 for $k = 35$: 3 I/O
 for $k = 40$ (which doesn't exist): same path as for $k = 35$, cost is 3 I/O
 so I/O cost = the longest path from the root to leaf (which is why we want it balanced)
Range Lookups
 BTree supports range queries as well
 such as $35 \leqslant k \leqslant 40$
 we lookup the key corresponding to the left range boundary
 since we have a pointer to the next block, we follow it until we hit the right range boundary
 I/O cost in this case is
 length of the path from the root to the leaf
 then the number of leaves that we need to follow
 and also we need to follow a pointer for each key in the range
Insert
4 cases
 simple case  space available in leaf
 leaf overflow
 intermediate node overflow
 new root
Simple Case

 add $k = 32$
 look up $k$ to identify the block where it should be stored
 we have some room there  so just add the record there
Leaf Overflow

 add $k = 7$
 look up the block for 7, but it's fyll
 so we split this block
 create a new one and redistribute items between them
 this way they both become halffull (the invariant is maintained)
 then in the new block we have some space (Simple Case) and we can put this record there
Intermediate Node Overflow

 add $k = 160$ ($n = 3$)
 there's no room in $\fbox{150, 156, 179}$ to add $160$ (Leaf Overflow case)
 so we create a new block and redistribute the keys between them
 since we created a new block, we need to add a pointer to it, but there is no room in $\fbox{120, 150, 180}$
 so we split the intermediate block and move $180$ to the new block
 also need to modify the root block to add pointer to the new block $\fbox{180}$
New Root
 sometimes we need to create a new root

 add $k = 45$ ($n = 3$)
 we cannot add it to $\fbox{30, 32, 40}$ (Leaf Overflow case)
 split it to 2 nodes, need to add the pointer to the new node
 cannot add it to $\fbox{10, 20, 30}$ (Intermediate Node Overflow case)
 split it into 2 nodes, need to add the pointer to new block
 but $\fbox{10, 20, 30}$ is a root!
 need to split it to two nodes
 and promote $\fbox{30}$ to the root
I/O Cost
 operations
 search
 create a new block (split, write two blocks): 2 operations
 go level up
 so in the worst case 2 I/O at each level + possibly writing a new root (1 I/O)
 then the total cost is
 $\text{depth} + 2 \times \text{depth} + 1 = 3 \times \text{depth} + 1$
Deleting
Again 4 cases
 Simple  just delete it
 Coalesce Leaf With Siblings
 Redistribute Keys
 Intermediate Nodes Coalescing
Coalesce Leaf With Siblings

 we delete $k = 50$ ($n = 4$)
 we locate the block with 50 and remove this record from it
 now the block becomes too empty (recall #Size Invariants)
 so we coalesce it with $\fbox{10, 20, 30}$
 now we have a new block, and the old one is not needed anymore  we remove it
 additional bookkeeping: need to make sure the next pointer point to the record the old block pointed to
Redistribute Keys

 we delete $k = 50$ ($n = 4$)
 when we delete 50, the block that contained it becomes almost empty
 cannot coalesce with $\fbox{10, 20, 30, 35}$ because it's full
 so to fix that we borrow a key from $\fbox{10, 20, 30, 35}$ to become halffull again
 and update the parents accordingly
Intermediate Nodes Coalescing

 we delete $k = 37$ ($n = 4$)
 first we locate the block with 37 and remove the record from it (Coalesce Leaf With Sibling case)
 since this block $\fbox{30}$ becomes too small we coalesce it with $\fbox{25, 26}$
 need to remove the pointer to the deleted block from the parental node $\fbox{30, 40}$
 i.e. we remove 30 from there
 but then the parental also node becomes too small, so we coalesce it with its sibling as well
 finally we don't need the root $\fbox{25}$ anymore since new block $\fbox{10, 20, 25, 40}$ can reference all the records
I/O Cost
Cost
 search: depth of the tree
 remove and regroup: 2 I/Os at each level
 no need to follow the pointer (i.e. don't do +1 as with #Insertion)
 total: $\text{depth} + 2 \times \text{depth} = 3 \times \text{depth}$
Multiple Keys
Sometimes we want to address a key made of several keys
Lexicographical Order
For BTrees have to be ordered somehow
 we may need to compare tuples in lexicographical order
 so we define this ordering as
 $(x, y, z) \leqslant (x', y', z') \iff \\ x < x' \lor (x = x' \land y < y') \lor (x = x' \land y = y' \land z \leqslant z')$
Problem with Lexicographical Order
Assume index of (age, salary) pairs with lexicographical order
 btreelexord.png
 query age < 20  ok
 start at the beginning of index ans scan till see 20
 query salary < 30  not fine
 linear scan, need to scan everything
 query age < 20 $\land$ sal < 20
 also scan index till see 20,
 meanwhile filtering records with sal < 20
 so using lexicographical ordering doesn't allow all queries we want
need other types of indexes  MultiDimensional Indexes
 RTree in particular: it's generalization of BTrees to multidimensional data
See also
Sources