External Merge Sort

algorithms database-systems-architecture sorting

Full name: ‘‘External Memory Multi-Way Merge Sort’’

Algorithm

This is a multi-way algorithms. That is, the sorting of relation $R$ is done in several passes

Notation:

1-st pass:

read $M$ blocks
sort all elements in memory with any sorting algorithm (say, Merge Sort or Quick Sort)
write sorted results back to disk
repeat for remaining blocks of $R$ until all are processed

After the first pass we have $\left\lceil \cfrac{B(R)}{M} \right\rceil $ sorted sub-results

Since we have $M$ available buffers, we can simultaneously process at most $M$ sorted sub-results from the previous pass
So we divide all sub-results into categories with $M$ sub-results in each
Since we have $M$ sub-results and $M$ blocks in each, each result of this pass should have $M \times M$ blocks
After this pass we will have $\left \lceil \cfrac{B(R) / M}{M} \right\rceil = \left \lceil \cfrac{B(R)}{M^2} \right\rceil$ sorted subresults

2-nd pass:

This is essentially the merging phase of Merge Sort

Algorithm:

at each pass we read and write the entire relation once
there are $P = \lceil \log_M B(R) \rceil$ passes
so the total cost is $\underbrace{2 \times B(R)}_\text{1 read + 1 write} \times P = 2 \times B(R) \times \lceil \log_M B(R) \rceil$

Database Systems Architecture (ULB)
Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom

✏️ Edit on GitHub