Intro
Each node in a Logical Query Plan may be executed in several ways
- there is no single implementation that is always better than the others
- so we need to compare alternatives based on their costs (in I/O Model of Computation it’s # of I/O operations)
Statistics
To estimate a cost we use the following statistics from Database System Catalog:
- $B(R)$ - # of blocks that relation $R$ holds
- $T(R)$ - # of tuples in $R$
- typically can be used to calculate $B(R)$ when we know how many bytes we have per block
- $V(R, A_1, …, A_n) = | \delta \pi_{A_1, …, A_n} (R) |$ - # of distinct values |
Parameters
And also we use
- $M$ - number of available main memory buffers
To simplify we suppose that the situation is ideal
- all buffers of Buffer Manager are available
- there are no other operations that concurrently claim the space
Operators Overview
| Type $\downarrow$ | Bag Union | Set Union | Set Intersection | Bag Intersection | Set Difference | Bag Difference | Join | One-Pass | || || || || || || || Sort-Based | || || || || || || || Hash-Based | || || not here || not here || not here || not here || | Also:
- For Joins: #Nested Loop Join
Bag Union
$R \cup_B S$
- all elements from both
- no need to remove duplicates
- for this operation just need to use one buffer block
- load elements to the block and then just output
’'’Algorithm’’’
- for each block $B_R \in R$
- load $B_R$ to buffer $N$
- for each tuple $t_R \in N$
- output $t_R$
- for each block $B_S \in S$
- load $B_S$ to buffer $N$
- for each tuple $t_S \in N$
- output $t_S$
’'’Cost’’’
- $B(R) + B(S)$
- we don’t count output
- and we must have some available buffers: $M \geqslant 1$
Set Union
$R \cup_B S$
- this time we care about duplicates
- assume that $R$ is smaller than $S$
One-Pass Set Union
$R$ fits into memory:
- We assume that we have $B(R) + 1$ available buffers
- i.e. $R$ can fit into memory and after that there’s at least one remaining available buffer
Idea:
- load all elements of $R$
- using 1 extra buffer go through all blocks of $S$
- output element if only it’s not in $R$
’'’Algorithm’’’
- load $R$ to $N_1, …, N_{B(R)}$ buffers
- for each tuple $t_R \in \cup_i N_i$
- output $t_R$
- for each block $B_S \in S$
- load $B_S$ to $N_0$
- for each tuple $t_S \in N_0$
- if $t_S \not \in \cup_i N_i$, output $t_S$
’'’Cost’’’
- $B(R) + B(S)$
- pass once through $R$ and once through $S$
- we ignore the cost of searching in memory - interested only in I/O cost
Problem:
- We can do it only when one of the relations fin into memory
- usually they don’t fit| | |
Sort-Based Set Union
what if there’s no enough memory available?
Algorithm:
- Sort $R$ with External Merge Sort
- Sort $S$ with External Merge Sort
- Iterate Synchronously over $R$ and $S$
- like in External Merge Sort, but don’t output repeating values
- i.e.: increase both pointers when see a duplicate
Synchronous Iteration:
- load block of $R$ to $N_R$, block of $S$ to $N_S$
- iterate over tuples $t_R \in N_R$ and $t_S \in N_S$ ‘‘synchronously’’
- if $t_R < t_S$
- output $t_R$
- move $t_R$ pointer to the next tuple in $R$ (load next block if needed)
- if $t_R > t_S$
- output $t_S$
- move $t_S$ pointer to the next tuple in $S$ (load next block if needed)
- if $t_R = t_S$
- output $t_S$
- move ‘'’both’’’ $t_R$ and $t_S$
- if $t_R < t_S$
Cost:
- $2 B(R) \lceil \log_M B(R) \rceil$: cost of sorting $R$
- $2 B(S) \lceil \log_M B(S) \rceil$: cost of sorting $S$
- $B(R) + B(S)$ to iterate synchronously to output the results
Optimization
- Synchronous Iteration of Sort-Based Union is very similar to the merge phase of External Merge Sort
-
Sometimes we can combine them - and avoid doing the last pass of Merge Sort Algorithm: - Sort $R$, but don’t execute the last merge phase
- we know that after that $R$ is divided into $l$ sorted lists
- $1 < l \leqslant M$
- Sort $S$, but don’t execute the last merge phase
- we know that after that $S$ is divided into $k$ sorted lists
- $1 < k \leqslant M$
- if $l + k \leqslant M$ then we can apply the optimization
- because there are enough buffers available to synchronously iterate through both set of sub-results
Cost:
- $2 B(R) \big( \lceil \log_M B(R) \rceil - 1 \big)$: cost of sorting $R$ without last pass
- $2 B(S) \big( \lceil \log_M B(S) \rceil - 1 \big)$: cost of sorting $S$ without last pass
- $B(R) + B(S)$ to iterate synchronously to output the results
- or $2 B(R) \lceil \log_M B(R) \rceil + 2 B(S) \lceil \log_M B(S) \rceil - B(R) - B(S)$
- we save $B(R) + B(S)$ I/Os| | | Note that
- this optimization is only possible if $k + l \leqslant M$
- can calculate $k$ and $l$ as
- $l = \left\lceil \cfrac{B(R)}{ M^{\lceil \log_M B(R) \rceil - 1} } \right\rceil $ - # of passes to sort $R$ - 1
- $k = \left\lceil \cfrac{B(S)}{M^{\lceil \log_M B(S) \rceil - 1}} \right\rceil$ - # of passes to sort $S$ - 1
it’s usually sufficient to have only 2 passes for sorting
- in this case can apply optimization if
- $\left\lceil \cfrac{B(R)}{M} \right\rceil + \left\lceil \cfrac{B(S)}{M} \right\rceil \leqslant M$ or
- $B(R) + B(S) \leqslant M^2$
- cost in this case is $3B(R) + 3B(S)$ with optimization
- $5B(R) + 5B(S)$ without optimization
Example
Suppose $M = 15, B(R) = 100, B(S) = 120$
- to sort $R$ need $\lceil \log_M B(R) \rceil = 2$ passes
- to sort $S$ need $\lceil \log_M B(S) \rceil = 2$ passes
- $l = \left\lceil \cfrac{100}{15^2} \right\rceil$
- $k = \left\lceil \cfrac{120}{15^2} \right\rceil$
- $l + k = 15 \leqslant M$, therefore we can apply optimization
- cost is $2 \cdot 100 \cdot 2 + 2 \cdot 120 \cdot 2 - 100 - 120 = 660$
Hash-Based Set Union
Main idea: we want to partition both $R$ and $S$ in such a way that
- if a tuple appears in some bucket from $R$ it should appear in the corresponding bucket from $S$
- each bucket contains no more that $M - 1$ blocks
- so it is possible to apply One-Pass Set Union to each bucket
Record distribution
- $B(R) < B(S)$ - $R$ is smaller than $S$
- we suppose that we can partition $R$ in $k$ buckets $R_i$
- to do that we apply some hash function $h$
- and then distribute tuples from $S$ also into $k$ buckets $S_i$
- also by applying $h$
- all records in $R_i$ and $S_i$ ended up in the bucket $i$ because they have the same hash value
- if there’s a record that occurs both in $R_i$ and $S_i$, it’s a duplicate
- and we need to consider only these buckets, this record cannot appear in buckets other than $i$
Computing Set Union
- to compute set union we compute unions for all buckets $i$: $R_i \cup_S S_i$
- since $R_i$ contains at most $M - 1$ block can do that in one pass
Partitioning $R$
How to partition $R$ in blocks of size at most $M - 1$?
Algo
- first pass
- we load each block of $R$ into buffer $N_0$
- and we have $M - 1$ remaining buffers - we will use them as buckets
- for each tuple $t_R \in R$ we calculate $h(t_R)$ to find which bucket it belongs to
- once a bucket buffer is full, we flush it to disk to some block, empty the buffer and continue
- $\Rightarrow$ after first pass we’ll have $M - 1$ buckets of $\cfrac{B(R)}{M - 1}$ blocks (assuming $h$ distributes records uniformly)
- 2+ passes
- if there are buckets that have more that $M - 1$ blocks we need to hash them again
- but this time with another hash function $h’$ (otherwise the old $h$ will just put all the tuples back to the same bucket)
- so we repeat the first pass again, but for each overfull bucket separately
- $\Rightarrow$ after second pass we’ll have $(M - 1)^2$ buckets of $\cfrac{B(R)}{(M - 1)^2}$ blocks
- continue this process until all buckets have no more than $M - 1$ blocks
- $\Rightarrow$ after $k$ passes we’ll have $(M - 1)^k$ buckets of $\cfrac{B(R)}{(M - 1)^k}$ blocks
One level of partitioning is usually enough
- so we need two passes
- 1st: to partition
- 2nd: to do pair-wise single pass unions of buckets
- it’s sufficient when $\cfrac{B(R)}{M - 1} \leqslant M - 1$ or $\approx B(R) \leqslant M^2$
Cost of partitioning
- $2B(R) \underbrace{\lceil log_{M - 1} B(R) - 1 \rceil}_\text{# of passes}$:
- at each pass we read and write $R$ once
- so for one pass it’s just $2B(R)$
Cost
Total cost of partitioning
- $2B(R) \lceil \log_{M - 1} B(R) - 1 \rceil$ for $R$
- $2B(S) \lceil \log_{M - 1} B({ \color{blue}{R} }) - 1 \rceil$ for $S$
- note that number of passes for $S$ is the same as of $R$
- $B(R) + B(S)$ one pass union for each bucket
- if one pass is sufficient, then the total cost is $3B(R) + 3B(R)$
Set Intersection
$R \cap_S S$
- assume $R$ is smaller than $S$
One-Pass Set Intersection
Essentially same as #One-Pass Set Union
- if $R$ is small enough to fit into $M - 1$ buffers
Algorithm
- load $R$ to $K_R = N_1, …, N_{B(R)}$ buffers
- for each block $B_S \in S$
- load $B_S$ to $N_0$
- for each tuple $t_S \in N_0$
- if $t_S \in K_R$, output $t_S$
Sort-Based Set Intersection
- same as #Sort-Based Set Union
- output $t$ if it appears in both $R$ and $S$
Bag Intersection
$R \cap_B S$
- assume $R$ is smaller than $S$
One-Pass Bag Intersection
Essentially same as #One-Pass Set Union
- if $R$ is small enough to fit into $M - 1$ buffers
- but for each distinct value we associate a ‘'’count’’’ - number of times this tuple occurred
- generally, this structure can take more that $M - 1$ memory buffer if there are few duplicates
- but if there are a lot of duplicates - this way of organizing will take less room than $M - 1$
Algorithm
- load $R$ to $K_R = N_1, …, N_{B(R)}$ buffers
- for each block $B_S \in S$
- load $B_S$ to $N_0$
- for each tuple $t_S \in B_S$
- if $t_S \in K_R$
- output $t_S$
- decrease ‘'’count’’’ of $t_S$ in $K_R$
- if ‘'’count’’’ = 0 then remove this tuple from memory
- if $t_S \in K_R$
Sort-Based Bag Intersection
- same as #Sort-Based Set Union
- output $t$ the number of times it appears both in $R$ and $S$
Set Difference
Note that this operation is not commutative
- $S -_S R$ is not the same as $R -_S S$
- assume that $R$ is smaller than $S$
One-Pass Set Difference
For this we assume $R$ fits into $M-1$ blocks
$S -_S R$ case
- read $R$ into buffers $K_R = N_1, …, N_{B(R)}$
- for each block $B_S \in S$
- load $B_S$ into $N_0$
- for each tuple $t_S \in B_S$
- if $t_S \not \in K_R$ output $t_S$
$R -_S S$ case
- read $R$ into buffers $K_R = N_1, …, N_{B(R)}$
- for each block $B_S \in S$
- load $B_S$ to $N_0$
- for each tuple $t_S \in B_S$
- if $t_S \in K_R$ remove $t_S$ from $K_R$
- for each tuple $t_R$ that is still in $K_R$ output $t_R$
Sort-Based Set Difference
- same as #Sort-Based Set Union
- output $t$ if it appears in $R$ but not in $S$
Bag Difference
- same as in #Bag Intersection: we ‘'’count’’’ the number of occurrences
- also two cases: $S -_B R$ and $R -_B S$
One-Pass Bag Difference
For this we assume $R$ fits into $M-1$ blocks
$S -_B R$ case: count $c$ in this case is $c$ reasons not to output a tuple
- read tuples of $R$ to $K_R = N_1, …, N_{B(R)}$, ‘'’count’’’ the number of occurrences
- load each block $B_S$ to $N_0$
- for each tuple $t_S \in B_S$
- if $t_S \not \in K_R$: output $t_S$
- otherwise
- decrement ‘'’count’’’ of $t_S$ and
- remove $t_S$ from $K_S$ if count = 0
- for each tuple $t_S \in B_S$
$R -_B S$ case
- read tuples of $R$ to $K_R = N_1, …, N_{B(R)}$, ‘'’count’’’ the number of occurrences
- load each $B_S$ to $N_0$
- if $t_S \in K_R$
- decrement ‘'’count’’’ for $t_S$ in $K_R$
- remove $t_S$ from $K_R$ if ‘'’count’’’ = 0
- if $t_S \in K_R$
- for each remaining $t_R \in K_R$
- output $t_R$ ‘'’count’’’ times
Sort-Based Bag Difference
- same as #Sort-Based Set Union
- for each tuple $t$
- let $c_R$ = number of times $t$ appears in $R$
- let $c_S$ = number of times $t$ appears in $S$
- if $c = c_R - c_S \leqslant 0$ don’t output anything
- otherwise output $t$ $c$ times
Join
Join is most costly operator to evaluate
- sometimes it’s quadratic (when it’s equivalent to cartesian product)
Suppose we want to evaluate $R(X, Y) \Join S(Y, Z)$
- $Y$ is a matching attribute and we will join on it
- we again assume that $B(R) < B(S)$
One-Pass Join
To be able to do it in one pass $R$ must fit into memory
- $R(B) \leqslant M -1$
Algo
- load $R$ into buffers $N_1, …, N_{B(R)}$
- for each block $B_S \in S$
- load $B_S$ to $N_0$
- for each tuple $t_S \in N_0$
- for each matching tuple $t_R \in \U_i N_i$
- output $t_R \Join t_S$
Cost
- $B(R) + B(S)$
- again ignore cost of finding matching tuple in memory
Nested Loop Join
Tuple-Based Nested Join Loop
First variant is ‘‘tuple-based nested join loop’’
- for each $r \in R$
- for each $s \in S$
- if $r$ matches $s$ output $r \Join s$
We need only one buffer for $R$ and one buffer for $S$
Cost
-
$T(R) \times T(S)$ - very expensive #### Block-Based Nested Join Loop - We divide $R$ into segments of $M - 1$ blocks
- and for each such segment we go through entire $S$
Algo
- load each segment of $R$ into buffers $N_1, …, N_{M - 1}$
- for each block $B_S \in S$
- load $B_S$ into $N_0$
- for each tuple $t_R \in \cup_i N_i$
- for each matching tuple $t_S \in N_0$: output $t_R \Join t_S$
Sort-Based Join
Essentially the same as #Sort-Based Set Union
- but in this case we need to take care about duplicates that may be in both $R$ and $S$
Algo
- Sort $R$ on matching attribute $Y$
- Sort $S$ on matching attribute $Y$
- Iterate Synchronously through $R$ and $S$
- if $t_R.Y < t_S.Y$ then advance pointer $t_R$
- if $t_R.Y > t_S.Y$ then advance pointer $t_S$
- if $t_R.Y = t_S.Y$ then
- for each pointers $t’_S$ with same $Y$ value that follow $t_S$ (including $t_S$ itself)
- output $t_R \Join t’_S$
- advance pointer $t_R$ and rewind $t’_S$ to $t_S$
- (this way we join each tuple from $R$ that has value $Y$ with each tuple from $S$ that also has value $Y$)
- for each pointers $t’_S$ with same $Y$ value that follow $t_S$ (including $t_S$ itself)
Cost
- usually depends of the number of tuples with equal values
- worst case: all tuples have the same value for $Y$ - in this case the cost is $B(R) \times B(S)$
- but joins are usually performed on foreign keys
- i.e. tuples in $R$ have distinct values for $Y$
- and for each tuple $t_R$ we have several (maybe 0) tuples $t_S$ - one-to-many relationship
- so we don’t need to rewind the pointer for $t_S$
- in this case the cost analysis is similar to the #Sort-Based Set Union
- sorting cost + $B(R) + B(S)$
- it’s also possible to optimize and save additional $B(R) + B(S)$ I/Os
- sorting cost - $B(R) - B(S)$
- ’'’NB’’’: if there’s a clustered index on $Y$ we don’t need to sort it - it’s already sorted
(Partition) Hash Join
- Essentially the same as #Hash-Based Set Union
- the only difference is that we hash the join attribute and not the whole tuple
Algo
- partition $R$ by hashing $Y$ into buckets each with at most $M-1$ blocks
- let $k$ be the number of buckets we got in result
- partition $S$ by hashing $Y$ into $k$ buckets
- let $R_i$ and $S_i$ be blocks of bucket #$i$ that ended up there because their $Y$ values have the same hash
- a tuple $t_S \in S$ matches $t_R \in S$ $\iff$ there $\exists$ a bucket $i$ s.t. $t_R \in R_i$ and $t_S \in S_i$
- we compute join by calculating $R_i \Join S_i$ for all $i$ using #One-Pass Join algorithm
Cost
- same as for Hash-Based Set Union
- $k = \lceil \log_{M - 1} B(R) - 1 \rceil$
- total # of I/Os: $2B(R) \cdot k + 2B(S) \cdot k + B(R) + B(S)$
See also
Sources
- Database Systems Architecture (ULB)
- Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom