# ML Wiki

## External Merge Sort

Full name: External Memory Multi-Way Merge Sort

## Algorithm

This is a multi-way algorithms. That is, the sorting of relation $R$ is done in several passes

• First Pass - in memory sorting
• Second and onwards - merging

Notation:

• $R$ relation we want to sort
• $M$ - max # of blocks we can use in memory

### First Pass

1-st pass:

• read $M$ blocks
• sort all elements in memory with any sorting algorithm (say, Merge Sort or Quick Sort)
• write sorted results back to disk
• repeat for remaining blocks of $R$ until all are processed

After the first pass we have $\left\lceil \cfrac{B(R)}{M} \right\rceil$ sorted sub-results

### Second Pass

• Since we have $M$ available buffers, we can simultaneously process at most $M$ sorted sub-results from the previous pass
• So we divide all sub-results into categories with $M$ sub-results in each
• Since we have $M$ sub-results and $M$ blocks in each, each result of this pass should have $M \times M$ blocks
• After this pass we will have $\left \lceil \cfrac{B(R) / M}{M} \right\rceil = \left \lceil \cfrac{B(R)}{M^2} \right\rceil$ sorted subresults

2-nd pass:

• merge the first $M$ sublists to a single sublist of $M^2$ blocks
• by synchronous iteration

### Synchronous Iteration

This is essentially the merging phase of Merge Sort

Algorithm:

• load block of $R$ to $N_R$, block of $S$ to $N_S$
• iterate over tuples $t_R \in N_R$ and $t_S \in N_S$ synchronously
• if $t_R \geqslant t_S$
• output $t_R$
• move $t_R$ pointer to the next tuple in $R$ (load next block if needed)
• if $t_R > t_S$
• output $t_S$
• move $t_S$ pointer to the next tuple in $S$ (load next block if needed)

### Third and Onwards Passes

• if needed - repeat the second pass again
• until everything is sorted

## Cost

• at each pass we read and write the entire relation once
• there are $P = \lceil \log_M B(R) \rceil$ passes
• so the total cost is $\underbrace{2 \times B(R)}_\text{1 read + 1 write} \times P = 2 \times B(R) \times \lceil \log_M B(R) \rceil$