ML Wiki

$\newcommand{\AntiJoin}{ \ \bar{\Join} \ }$

Relational Algebra

• Relational Algebra repesents the operations on relations for Relational Databases
• Relational Algebra is algebra that consists of operations for constructing new relations from given relations
• (it's closed, i.e. each operation always produces another relation)
• RA is not used as a query language, but usually SQL is translated to it in RDBMS

Relational Algebra Operators

• usual set operations (union, intersection, difference)
• operations that remove some parts of the relation
selections eliminate tuples (rows)
projections eliminate attributes (columns)
• operations to combine tuples of two relations: Cartesian product, joins, etc
• renaming

Binary

• Union U
• Intersection
• Difference

Unary

• Projection
• Join

Extended RA

• bags semantic
• duplicate elimination: $d$
• Grouping and aggregation: $\gamma$
• Sorting: $t$

Set-Based Union

$R_1 \cup R_2$

• Both $R_1$ and $R_2$ must have the same schema
• And the result is a relation with the same schema
• The result contains elements that are in $R_1$ or in $R_2$

SQL:

SELECT * FROM R1
UNION
SELECT * FROM R2


Example:

$\begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_2 & b_1 \\  \end{array} \cup \begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_3 & b_3 \\  \end{array} = \begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_2 & b_1 \\ a_3 & b_3 \\  \end{array}$

Set-Based Difference

$R_1 - R_2$

• Both $R_1$ and $R_2$ must have the same schema
• And the result is a relation with the same schema
• The result contains elements that are in $R_1$ but not in $R_2$

SELECT * FROM R1
EXCEPT
SELECT * FROM R2


Example:

• $\begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_2 & b_1 \\  \end{array} \cup \begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_3 & b_3 \\  \end{array} = \begin{array}{ c | c }  A & B \\ \hline \hline a_2 & b_1 \\  \end{array}$

Set-Based Intersection

$R_1 \cap R_2$

• Both $R_1$ and $R_2$ must have the same schema
• And the result is a relation with the same schema
• The result contains elements that are in $R_1$ and in $R_2$

SQL

SELECT * FROM R1
INTERSECT
SELECT * FROM R2


Example

$\begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_2 & b_1 \\  \end{array} \cup \begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\ a_3 & b_3 \\  \end{array} = \begin{array}{ c | c }  A & B \\ \hline \hline a_1 & b_1 \\  \end{array}$

Can be expressed via Union and Difference

• $R_1 \cap R_2 = R_1 - (R_1 - R_2)$
• $R_1 \cap R_2 = R_1 \Join R_2$

Selection

A binary operator

• Takes a relation $R$
• Outputs all tuples of $R$ that satisfy a certain condition $\theta$
• Attributes that take part in $\theta$ must be present in the relation $R$

in SQL:

SELECT ... FROM R WHERE {condition}

For example

• $\sigma_\theta(\text{Relation})$
• $\sigma_{A \geqslant 3} \left ( \begin{array}{ c | c }  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \right) = \begin{array}{ c | c }  A & B \\ \hline \hline 3 & 4 \\  \end{array}$

A condition can be anything

• salary > 40000 (inequality)
• name = "Smith" (equation)
• etc

Set-Based Projection

$\pi_{A_1, ..., A_n}(R)$

A binary operator

• Takes a relation $R$
• Outputs only specified attributes $A_1, ..., A_n$ of $R$
• All $A_1, ..., A_n$ must be present in $R$

SQL:

SELECT A1, ..., An FROM R

example

• $\pi_{\text{SSN}, \text{name}}(\text{Employee})$

Note that for set-based projection there are no duplicated in the output:

• $\pi_{A, C} \left( \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 5 \\ 3 & 4 & 5 & 6 \\ 5 & 6 & 3 & 4 \\  \end{array} \right) = \begin{array}{ c | c}  A & C \\ \hline \hline 1 & 3 \\ 3 & 5 \\ 5 & 3 \\  \end{array}$

Cartesian Product

Sometimes also called "Cross-Product"

$R_1 \times R_2$

• Result of $R_1 \times R_2$ is a new relation
• in which each tuple in $R_1$ concatenated with each tuple in $R_2$
• i.e. it outputs all possible combinations of tuples
• $R_1$ and $R_2$ must have disjoint schema

SQL

SELECT * FROM R1, R2

Example:

• $\begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \times \begin{array}{ c | c}  C & D \\ \hline \hline 2 & 6 \\ 3 & 7 \\ 4 & 9 \\  \end{array} = \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 2 & 6 \\ 1 & 2 & 3 & 7 \\ 1 & 2 & 4 & 9 \\ 3 & 4 & 2 & 6 \\ 3 & 4 & 3 & 7 \\ 3 & 4 & 4 & 9 \\  \end{array}$

Natural Join

Or "Equi-Join"

$R \Join S$

• no requirements for schema for $R$ and $S$
• if they have one or more attributes in common, in the output tuples with same values will be matched
• if they don't have attributes in common - the result is the same as $R \times S$ (Cartesian Product)

Example:

• $\begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \Join \begin{array}{ c | c}  B & D \\ \hline \hline 2 & 6 \\ 3 & 7 \\ 4 & 9 \\  \end{array} = \begin{array}{ c | c | c}  A & B & D \\ \hline \hline 1 & 2 & 6 \\ 3 & 4 & 9 \\  \end{array}$

• $\begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \Join \begin{array}{ c | c}  C & D \\ \hline \hline 2 & 6 \\ 3 & 7 \\ 4 & 9 \\  \end{array} = \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 2 & 6 \\ 1 & 2 & 3 & 7 \\ 1 & 2 & 4 & 9 \\ 3 & 4 & 2 & 6 \\ 3 & 4 & 3 & 7 \\ 3 & 4 & 4 & 9 \\  \end{array}$

Theta Join

$R_1 \Join_{\theta} R_2$

• A join that involves some predicate $\theta$
• For all combinations of tuples from $R_1 \Join_{\theta} R_2$, a tuple is output if $\theta$ holds for the combination
• essentially is the same as Cartesian Product plus Selection

Examples

• $R_1 \Join_{\theta} R_2 = \sigma_{\Theta}{R_1 \times R_2}$
• $\begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \Join_{B = C} \begin{array}{ c | c}  C & D \\ \hline \hline 2 & 6 \\ 3 & 7 \\ 4 & 9 \\  \end{array} = \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 2 & 6 \\ 3 & 4 & 4 & 9 \\  \end{array}$

Anti-Join

$R \AntiJoin S$

• if for a tuple in $R$ there is a match in $S$, we do not output this tuple from $R$
• $R \AntiJoin S \equiv R - (R \Join S)$
• The result is only tuples from $R$ (the resulting schema is also the same as in $R$)

Difference between $R \AntiJoin S$ and $R - S$ (Difference):

• for Difference $R - S$ both $R$ and $S$ need to have the same schema
• for $R \AntiJoin S$ - any schema
• if $R$ and $S$ have the same schema, then $R \AntiJoin S \equiv R - S$

Example

• $\begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 6 & 7 \\  \end{array} \AntiJoin \begin{array}{ c | c | c }  B & C & D \\ \hline \hline 2 & 5 & 6 \\ 6 & 4 & 2 \\  \end{array} = \begin{array}{ c | c }  A & B \\ \hline \hline 3 & 4 \\ 6 & 7 \\  \end{array}$

Renaming

$\rho_{\text{prefix}}(R)$

• takes all attributes of $R$ and
• produces a new relation with $\text{prefix.}$ appended to all of them
• so the resulting relation is the same, but with changed schema

SQL

SELECT * FROM Relation R

Example

• $\rho_T \left( \begin{array}{ c | c}  A & B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array} \right) = \begin{array}{ c | c}  T.A & T.B \\ \hline \hline 1 & 2 \\ 3 & 4 \\  \end{array}$

Examples

find all hospitals within 5 ms of a school

SELECT DISTINCT h.name
FROM Hospital h, School s
WHERE distance(h.location, s.location) < 5


$\pi_\text{name} ( \rho_{\text{h}}(\text{Hospital}) \Join_{\text{h.location = s.location}} \rho_{\text{s}}(\text{School}) )$

...

Semi Joins

$R \ltimes S = \pi_{R.*}(R \Join S)$

Extended RA

• Allows Bag semantics for operations

Sets vs Bags

• Set of tuples: no duplicates allowed
• Bag of tuples: there can be duplicates
• In theory set semantics is usually assumed
• But in implementation - bag semantics

Duplicates

• Practically we don't care about duplicates
• We remove them only when required (duplicate elimination: $d$)

RA has two semantics:

• set semantics = traditional RA
• bag semantics = extended RA

All set-based operations are straightforwardly extended to bags

Bag-Based Intersection, Difference, Union

Intersection

• If the same tuple occurs twice in one relation,
• It must also occur twice in the second relation
• Then the result will also contain 2 tuples

Same idea with Difference and Union

Grouping

$\gamma_{\text{grouping_attribute}, \ \text{func}(A) \ \to \ \text{name}}(R)$

• unary relation that takes $R$ as input
• first parameter ($\text{grouping_attribute}$) is attribute on which $R$ will be grouped
• function $\text{func}$ is applied to each group, and the result is written to attribute $\text{name}$
• NB: all other (non-mentioned) attributes are not output to the result!

Example

• $\gamma_{A, \ \text{min}(B) \ \to \ D} \left( \begin{array}{ c | c | c }  A & B & C \\ \hline \hline 1 & 2 & a \\ 1 & 3 & b \\ 2 & 3 & c \\ 2 & 4 & a \\ 2 & 5 & d \\  \end{array} \right) = \begin{array}{ c | c | c }  A & D \\ \hline \hline 1 & 2 \\ 2 & 3 \\  \end{array}$

Projection

In Extended RA we also can allow renaming in projection

• $\pi_{A, C \to D} \left( \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 2 & 6 \\ 1 & 2 & 2 & 7 \\ 1 & 2 & 2 & 9 \\ 3 & 4 & 3 & 6 \\ 3 & 4 & 3 & 7 \\ 3 & 4 & 3 & 9 \\  \end{array} \right) = \begin{array}{ c | c}  A & D \\ \hline \hline 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 3 & 3 \\ 3 & 3 \\ 3 & 3 \\  \end{array}$ (bag semantics)

• $\pi_{A, C \to D} \left( \begin{array}{ c | c | c | c}  A & B & C & D \\ \hline \hline 1 & 2 & 2 & 6 \\ 1 & 2 & 2 & 7 \\ 1 & 2 & 2 & 9 \\ 3 & 4 & 3 & 6 \\ 3 & 4 & 3 & 7 \\ 3 & 4 & 3 & 9 \\  \end{array} \right) = \begin{array}{ c | c}  A & D \\ \hline \hline 1 & 2 \\ 3 & 3 \\  \end{array}$ (set semantics)

• $C$ is renamed to $D$

Translating SQL to RA

Main Article: Translating SQL to Relational Algebra