Translating SQL to RA expression is the first step in Query Processing Pipeline
Suppose we have this query
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid AND y.cid = z.cid AND x.price > 100 AND z.city = 'Seattle'
We translate it to the following expression:
But there is a more optimal way to obtain the same results
The process of finding a cheaper equivalent expression is called (logical) query optimization
A Relational Algebra expression $e$ is optimal if there is no other expression $e'$ s.t.
The Optimization Problem
Undecidability
In practice, most queries are of the form Select-Project-Join (SPJ)
Example
And we can optimize this kind of queries!
SELECT movieTitle FROM StarsIn S1 WHERE starName IN ( SELECT name FROM MovieStar, StarsIn S2 WHERE birthdate = 1960 ANDS2.movieTitle = S1.movieTitle)
Note that this query is equivalent to
SELECT movieTitle FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate = 1960)
This one has one join less to execute (and the join is the most expensive operation!)
Why?
It is possible to automatically translate from first kind of query to second one
The problem:
To do that we could exploit one of the properties of Conjunctive Querys:
The algorithm to remove redundant joins is as follows:
This will eliminate the redundant joins in an RA expression
To optimize an RA expression further (after eliminating redundant joins) we can use some heuristics
For relations $R(A, B), S(C, D)$ consider the following expression
$\pi_A \sigma_{B < D} \big( \sigma_{A = 5}(R) \times S \big)$
$\pi_A \big( \sigma_{A = 5}(R) \Join_{B < D} S \big)$
$\pi_A \big( \sigma_{A = 5}(R) \Join_{B < D} \pi_D (S) \big)$
See LQP Optimization Exercises (DBSA)