(\newcommand{\AntiJoin}{ \ \bar{\Join} \ } )
Relational Algebra
- Relational Algebra repesents the operations on relations for Relational Databases
- Relational Algebra is algebra that consists of operations for constructing new relations from given relations
- (it’s ‘‘closed’’, i.e. each operation always produces another relation)
- RA is not used as a query language, but usually SQL is translated to it in RDBMS
Relational Algebra Operators
Operations of traditional RA:
- usual set operations (union, intersection, difference)
-
- operations that remove some parts of the relation
- selections eliminate tuples (rows)
- projections eliminate attributes (columns)
- operations to combine tuples of two relations: Cartesian product, joins, etc
- renaming
Binary
- Union U
- Intersection
- Difference
Unary
- Projection
- Join
Extended RA
- bags semantic
- duplicate elimination: $d$
- Grouping and aggregation: $\gamma$
- Sorting: $t$
Traditional RA Operations
Set-Based Union
$R_1 \cup R_2$
- Both $R_1$ and $R_2$ must have the same schema
- And the result is a relation with the same schema
- The result contains elements that are in $R_1$ or in $R_2$
SQL:
SELECT * FROM R1
UNION
SELECT * FROM R2
Example:
$\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
a_2 & b_1
\end{array}
\cup
\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
a_3 & b_3
\end{array}
=
\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
a_2 & b_1
a_3 & b_3
\end{array}
$
Set-Based Difference
$R_1 - R_2$
- Both $R_1$ and $R_2$ must have the same schema
- And the result is a relation with the same schema
- The result contains elements that are in $R_1$ but not in $R_2$
SELECT * FROM R1
EXCEPT
SELECT * FROM R2
Example:
- $\begin{array}{ c | c } | A & B
\hline \hline a_1 & b_1 \ a_2 & b_1
\end{array} \cup \begin{array}{ c | c } | A & B
\hline \hline a_1 & b_1
a_3 & b_3
\end{array} = \begin{array}{ c | c } | A & B
\hline \hline a_2 & b_1
\end{array} $
Set-Based Intersection
$R_1 \cap R_2$
- Both $R_1$ and $R_2$ must have the same schema
- And the result is a relation with the same schema
- The result contains elements that are in $R_1$ and in $R_2$
SQL
SELECT * FROM R1
INTERSECT
SELECT * FROM R2
Example
$\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
a_2 & b_1
\end{array}
\cup
\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
a_3 & b_3
\end{array}
=
\begin{array}{ c | c } | A & B
\hline
\hline
a_1 & b_1 \
\end{array}
$
Can be expressed via Union and Difference
- $R_1 \cap R_2 = R_1 - (R_1 - R_2)$
- $R_1 \cap R_2 = R_1 \Join R_2$
Selection
A binary operator
- Takes a relation $R$
- Outputs all tuples of $R$ that satisfy a certain condition $\theta$
- Attributes that take part in $\theta$ must be present in the relation $R$
in SQL:
SELECT ... FROM R WHERE {condition}
For example
- $\sigma_\theta(\text{Relation})$
- $\sigma_{A \geqslant 3} \left (
\begin{array}{ c | c } | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \right) = \begin{array}{ c | c } | A & B
\hline \hline 3 & 4
\end{array} $
A condition can be anything
- salary > 40000 (inequality)
- name = “Smith” (equation)
- etc
Set-Based Projection
$\pi_{A_1, …, A_n}(R)$
A binary operator
- Takes a relation $R$
- Outputs only specified attributes $A_1, …, A_n$ of $R$
- All $A_1, …, A_n$ must be present in $R$
SQL:
SELECT A1, ..., An FROM R
example
- $\pi_{\text{SSN}, \text{name}}(\text{Employee})$
Note that for set-based projection there are no duplicated in the output:
- $\pi_{A, C} \left(
\begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 3 & 4 \ 1 & 2 & 3 & 5
3 & 4 & 5 & 6
5 & 6 & 3 & 4
\end{array} \right) = \begin{array}{ c | c} | A & C
\hline \hline 1 & 3 \ 3 & 5
5 & 3
\end{array}$
Cartesian Product
Sometimes also called “Cross-Product”
$R_1 \times R_2$
- Result of $R_1 \times R_2$ is a new relation
- in which each tuple in $R_1$ concatenated with each tuple in $R_2$
- i.e. it outputs all possible combinations of tuples
- $R_1$ and $R_2$ must have disjoint schema
SQL
SELECT * FROM R1, R2
Example:
- $\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \times \begin{array}{ c | c} | C & D
\hline \hline 2 & 6 \ 3 & 7
4 & 9
\end{array} = \begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 2 & 6
1 & 2 & 3 & 7
1 & 2 & 4 & 9
3 & 4 & 2 & 6
3 & 4 & 3 & 7
3 & 4 & 4 & 9
\end{array} $
Natural Join
Or “Equi-Join”
$R \Join S$
- no requirements for schema for $R$ and $S$
- if they have one or more attributes in common, in the output tuples with same values will be matched
- if they don’t have attributes in common - the result is the same as $R \times S$ (Cartesian Product)
Example:
- $\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \Join \begin{array}{ c | c} | B & D
\hline \hline 2 & 6 \ 3 & 7
4 & 9
\end{array} = \begin{array}{ c | c | c} | A & B & D
\hline \hline 1 & 2 & 6
3 & 4 & 9
\end{array} $ - $\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \Join \begin{array}{ c | c} | C & D
\hline \hline 2 & 6 \ 3 & 7
4 & 9
\end{array} = \begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 2 & 6
1 & 2 & 3 & 7
1 & 2 & 4 & 9
3 & 4 & 2 & 6
3 & 4 & 3 & 7
3 & 4 & 4 & 9
\end{array} $
Theta Join
$R_1 \Join_{\theta} R_2$
- A join that involves some predicate $\theta$
- For all combinations of tuples from $R_1 \Join_{\theta} R_2$, a tuple is output if $\theta$ holds for the combination
- essentially is the same as Cartesian Product plus Selection
Examples
- $R_1 \Join_{\theta} R_2 = \sigma_{\Theta}{R_1 \times R_2}$
- $\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \Join_{B = C} \begin{array}{ c | c} | C & D
\hline \hline 2 & 6 \ 3 & 7
4 & 9
\end{array} = \begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 2 & 6
3 & 4 & 4 & 9
\end{array} $
Anti-Join
$R \AntiJoin S$
- if for a tuple in $R$ there is a match in $S$, we do not output this tuple from $R$
- $R \AntiJoin S \equiv R - (R \Join S)$
- The result is only tuples from $R$ (the resulting schema is also the same as in $R$)
Difference between $R \AntiJoin S$ and $R - S$ (Difference):
- for Difference $R - S$ both $R$ and $S$ need to have the same schema
- for $R \AntiJoin S$ - any schema
- if $R$ and $S$ have the same schema, then $R \AntiJoin S \equiv R - S$
Example
- $\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
5 & 6
6 & 7
\end{array} \AntiJoin \begin{array}{ c | c | c } | B & C & D
\hline \hline 2 & 5 & 6 \ 6 & 4 & 2
\end{array} = \begin{array}{ c | c } | A & B
\hline \hline 3 & 4
6 & 7
\end{array} $
Renaming
$\rho_{\text{prefix}}(R)$
- takes all attributes of $R$ and
- produces a new relation with $\text{prefix.}$ appended to all of them
- so the resulting relation is the same, but with changed schema
SQL
SELECT * FROM Relation R
Example
- $\rho_T \left(
\begin{array}{ c | c} | A & B
\hline \hline 1 & 2 \ 3 & 4
\end{array} \right) = \begin{array}{ c | c} | T.A & T.B
\hline \hline 1 & 2 \ 3 & 4
\end{array}$
Examples
find all hospitals within 5 ms of a school
SELECT DISTINCT h.name
FROM Hospital h, School s
WHERE distance(h.location, s.location) < 5
$\pi_\text{name} ( \rho_{\text{h}}(\text{Hospital}) \Join_{\text{h.location = s.location}} \rho_{\text{s}}(\text{School}) )$
Outer Joins
…
Semi Joins
$R \ltimes S = \pi_{R.*}(R \Join S)$
Extended RA
- Adds additional operations to Transitional RA
- Allows Bag semantics for operations
Sets vs Bags
- '’Set’’ of tuples: no duplicates allowed
- '’Bag’’ of tuples: there can be duplicates
- In theory set semantics is usually assumed
- But in implementation - bag semantics
Duplicates
- Practically we don’t care about duplicates
- We remove them only when required (duplicate elimination: $d$)
RA has two semantics:
- set semantics = traditional RA
- bag semantics = extended RA
All set-based operations are straightforwardly extended to bags
Bag-Based Intersection, Difference, Union
Intersection
- If the same tuple occurs twice in one relation,
- It must also occur twice in the second relation
- Then the result will also contain 2 tuples
Same idea with Difference and Union
Grouping
$\gamma_{\text{grouping_attribute}, \ \text{func}(A) \ \to \ \text{name}}(R)$
- unary relation that takes $R$ as input
- first parameter ($\text{grouping_attribute}$) is attribute on which $R$ will be grouped
- function $\text{func}$ is applied to each group, and the result is written to attribute $\text{name}$
-
’'’NB’’’: all other (non-mentioned) attributes are not output to the result Example - $\gamma_{A, \ \text{min}(B) \ \to \ D} \left(
\begin{array}{ c | c | c } | A & B & C
\hline \hline 1 & 2 & a
1 & 3 & b
2 & 3 & c
2 & 4 & a
2 & 5 & d
\end{array} \right) = \begin{array}{ c | c | c } | A & D
\hline \hline 1 & 2
2 & 3
\end{array} $
Projection
- same as Set-Based Projection, but we don’t need to eliminate duplicates
- hence it’s more efficient
In Extended RA we also can allow renaming in projection
- $\pi_{A, C \to D} \left(
\begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 2 & 6
1 & 2 & 2 & 7
1 & 2 & 2 & 9
3 & 4 & 3 & 6
3 & 4 & 3 & 7
3 & 4 & 3 & 9
\end{array} \right) = \begin{array}{ c | c} | A & D
\hline \hline 1 & 2
1 & 2
1 & 2
3 & 3
3 & 3
3 & 3
\end{array} $ (bag semantics) - $\pi_{A, C \to D} \left(
\begin{array}{ c | c | c | c} | A & B & C & D
\hline \hline 1 & 2 & 2 & 6
1 & 2 & 2 & 7
1 & 2 & 2 & 9
3 & 4 & 3 & 6
3 & 4 & 3 & 7
3 & 4 & 3 & 9
\end{array} \right) = \begin{array}{ c | c} | A & D
\hline \hline 1 & 2
3 & 3
\end{array} $ (set semantics) - $C$ is renamed to $D$
Translating SQL to RA
Sources
- Database Systems Architecture (ULB)
- Introduction to Data Science (coursera)
- Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom