Physical Query Plan Optimization
Translating SQL to RA expression is the first step in Query Processing Pipeline
- Input: Optimized Logical Query plan - expression in Extended Relational Algebra
- Output: Optimized Physical Query Plan - expression in Relational algebra with each node assigned some physical algorithm
Cost-Based Plan Selection
We need to select the optimal plan based on
- Cost of Physical Operators and
- on estimated cost of subqueries
This is an Optimization Problem
- One of the possible approaches is Greedy Algorithms
Greedy Algorithm
At each step we choose the operation with least local cost
Bottom-up approach:
- first assign physical operators to leaves
- then to parents of the leaves
- then to their parents
- and so on
- at each step choose an operator that gives the lowest cost
- for join operators use a gredy algorithm for join ordering
Limitations
Doesn’t take into account the properties of the output of an operator
Consider this example:
- suppose optimized sort-merge join is not available - not enough memory
- so the Greedy algorithm decides to use hash-based join (recall that its output is not sorted)
- we could’ve taken slightly more expensive sort-merge join
- but instead of 3-pass duplicate elimination we’d just need one pass for $\delta$
- which would give us overall lower total cost