Sequenced Queries
Temporal Databases provide mechanisms to store and manipulate time-varying information. But how to query it?
Valid Time Tables
A valid time table is a table with
- attributes that specify a period when the information in these tables were valid
- in SQL92 it’s usually modeled with 2 date attributes
from
andto
For example,
Employee(SSN, FirstName, LastName, BirthDate)
Position(PCN, JobTitle)
Incumbents(SSN, PCN, FromDate, ToDate)
Salary(SSN, Amount, FromDate, ToDate)
Incumbents
andSalary
are valid-time tables
Example
Consider this table:
Incumbents | SSN | PCN | FromDate | ToDate | 111223333 | 900225 | 1996-01-01 | 1996-06-01 || 111223333 | 900225 | 1996-06-01 | 1996-08-01 || 111223333 | 900225 | 1996-08-01 | 1996-10-01 || 111223333 | 900225 | 1996-10-01 | 3000-01-01 || 111223333 | 900225 | 1997-01-01 | 3000-01-01 | This is a valid time table:
- it has attributes from and to that show when this information is valid
- special date
3000-01-01
is used to show “currently valid” (i.e. till now) - Closed-open periods are typically used:
- it makes it easier to merge in SQL: just do $T_1.\text{to} = T_2.\text{from}$
Type of Queries
On Valid-Time tables we can query for:
- current state: now
- time-sliced: valid at some point of time
- sequenced: applied to each point in time
- non-sequenced: applied to all points in time, completely ignoring temporal aspects
Temporal Keys and Constraints
Consider this table again:
Incumbents | SSN | PCN | FromDate | ToDate | 111223333 | 900225 | 1996-01-01 | 1996-06-01 || 111223333 | 900225 | 1996-06-01 | 1996-08-01 | What are keys that we can use to uniquely identify a record in this table?
- candidate keys:
(SSN, PCN, FromDate)
(SSN, PCN, ToDate)
(SSN, PCN, FromDate, ToDate)
But we want to enforce the following constraint:
- An employee can have only one position at a point of time
- none of these keys can enforce such a constraint
Sequenced Primary Key
This can only be enforced with triggers
| |```sql CREATE TRIGGER Seq_Primary_Key ON Incumbents FOR INSERT, UPDATE AS IF EXISTS (SELECT * FROM Incumbents AS I1 – (1) no overlap WHERE 1 < (SELECT COUNT(I2.SSN) FROM Incumbents AS I2 WHERE I1.SSN = I2.SSN AND I1.PCN = I2.PCN – <= the key AND I1.FromDate < I2.ToDate AND I2.FromDate < I1.ToDate)) OR EXISTS – no NULL (SELECT * FROM Incumbents AS I WHERE I.SSN IS NULL OR I.PCN IS NULL) BEGIN RAISERROR(‘Violation of sequenced primary key constraint’,1,2) rollback transaction END
| |We look for intervals with the same key
<br>
that overlap with each over
<br>
<img src="https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/adb/td-sequenced-key.png" alt="Image">
<br>
also we don't allow <code>NULL</code>s
### Other Constraints
And there are other constraints that we want to enforce:
- No Duplicates
- Referential Integrity (introducing sequenced foreign keys)
- The history must be contiguous (no gaps)
## Queries
### Coalescing
Consider this table:
- <code>Employee(Name, Salary, Title, BirthDate, FromDate, ToDate)</code>
- each time the salary or title change, the time of the change is captured and new record with this change is added
- so we can see the entire history of changes
| Name | Salary | Title | BirthDate | FromDate | ToDate | John | 60.000 | Assistant | 9/9/60 | 1/1/95 | 1/6/95 || John | 70.000 | Assistant | 9/9/60 | 1/6/95 | 1/10/95 || John | 70.000 | Lecturer | 9/9/60 | 1/10/95 | 1/2/96 || John | 70.000 | Professor | 9/9/60 | 1/2/96 | 1/1/97 |
We see how the salary grows and how John changes his title
Now consider these queries:
- show John's current salary
- show the history of John's salary changes
sequenced queries
The first query is easy:
```sql
SELECT Salary
FROM Employee
WHERE Name = 'John' AND
FromDate <= now() AND
now() <= ToDate
But it’s not easy to do the second one
Main idea:
- find intervals that overlap or adjacent and merge them
- if we have 5 periods
- (title A, salary X), (title B, salary X), (title B, salary Y), (title C, salary Y), (title D, salary Y)
- we want to extract only periods with equal salary and get (salary X) and (salary Y)
Imperative Version
Idea:
- create a temporary table and modify the timestaps of intervals there
- for period $T_1$ find a period $T_2$ for which
- they have the same salary
- they are adjacent or overlap
- $T_1$ starts before $T_2$ starts
- $T_2$ starts when $T_1$ has not finished yet
- $T_2$ finishes after $T_1$ finishes
- if there exists such $T_2$ then update the finish time of $T_1$ to $T_2$
CREATE TABLE Temp(Salary, FromDate, ToDate) AS
SELECT Salary, FromDate, ToDate FROM Employee
WHERE Name = 'John'
repeat
UPDATE TEMP T1
SET (T1.ToDate) =
(SELECT MAX(T2.ToDate)
FROM TEMP AS T2
WHERE T1.Salary = T2.Salary
AND T1.FromDate < T2.FromDate
AND T1.ToDate >= T2.FromDate
AND T1.ToDate < T2.ToDate) WHERE EXISTS
(SELECT *
FROM TEMP AS T2
WHERE T1.Salary = T2.Salary
AND T1.FromDate < T2.FromDate
AND T1.ToDate >= T2.FromDate
AND T1.ToDate < T2.ToDate)
until no tuples updated
This is how it looks like after each pass:
- we see that it there are intervals that are no longer needed
- so we remove intervals that are not maximal
Removal of non-maximal intervals
- for interval $T_1$ find interval $T_2$ such that
- it has the same salary
- $T_2$ includes $T_1$:
- $T_2$ start after $T_1$ and
- $T_2$ finish earlier than $T_1$
- td-coalecsing-imperative-remove-min.png
- if there exists such $T_2$ then remove $T_1$
DELETE FROM Temp T1
WHERE EXISTS (
SELECT * FROM Temp AS T2
WHERE T1.Salary = T2.Salary AND
((T1.FromDate > T2.FromDate AND T1.ToDate <= T2.ToDate) OR
(T1.FromDate >= T2.FromDate AND T1.ToDate < T2.ToDate))
Declarative Version in SQL
Algorithm:
Let $E$ be a database with John’s history
- for salary $x$ we need to find $f, l \in E$ such that
- $f$ is the first period with salary $x$ and $l$ last period with $x$
- and the interval between $f$ and $l$ is continuous:
- for all $t \in E$ with salary $x$ that inside $[f, l)$
- there’s $t_1 \in E$ that starts before $t$ and finishes before $t$ finishes
- to ensure that $f,l$ are indeed the bounds of the interval we try to find $t_2 \in E$ such that
- it either starts before $f$ or finishes after $l$
- if we find such $t_2$ - $[f, l)$ is not maximal and we need to try another pair of $f,l$
- More formally this looks like this:
- find $f, l \in E$ such that
- $f.\text{from} < l.\text{to} \land f.\text{salary} = l.\text{salary} \land$ :: $\forall t \in E$ :: $t.\text{salary} = f.\text{salary} \land f.\text{from} < t.\text{from} \land t.\text{from} < l.\text{to}: $ ::: $\exists t_1 \in E: t_1.\text{salary} = f.\text{salary} \land $ :::: $t_1.\text{from} < t.\text{from} \land f.\text{from} \leqslant t1.\text{to}$ :: $\land$ :: $\lnot \exists t_2 \in E $ ::: $t_1.\text{salary} = f.\text{salary} \land$ ::: $\Big[ (t_2.\text{from} < f.\text{from} \land f.\text{from} \leqslant t_2.\text{to}) \lor $ ::: $\ (t_2.\text{from} \leqslant l.\text{to} \land l.\text{to} < t_2.\text{to}) \Big]$
Let’s see how it looks like graphically
find $f, l \in E$ such that | $f.\text{from} < l.\text{to} \land$ | ![]() |
$f.\text{salary} = l.\text{salary} \land $ | $\forall t \in E$ | $\ \ t.\text{salary} = f.\text{salary} \land$ | $\ \ f.\text{from} < t.\text{from} \land t.\text{from} < l.\text{to} \land$ | ![]() |
$\ \ \exists t_1 \in E:$ | for all such $t$ there exists $t_1$ | $\ \ \ \ t_1.\text{salary} = f.\text{salary} \land$ | $\ \ \ \ t_1.\text{from} < t.\text{from} \land f.\text{from} \leqslant t1.\text{to}$ | ![]() |
$\land$ | $\lnot \exists t_2 \in E $ | $\ \ t_1.\text{salary} = f.\text{salary} \land$ | $\ \ \Big[ (t_2.\text{from} < f.\text{from} \land f.\text{from} \leqslant t_2.\text{to}) \lor \ \ \ \ (t_2.\text{from} \leqslant l.\text{to} \land l.\text{to} < t_2.\text{to}) \Big]$ | ![]() |
But there’s no $\forall$ operator in SQL:
- so need to replace the condition $\forall t … \exists t_1$ onto $\not \exists t … \not \exists t_1$
Here’s a SQL version:
CREATE VIEW Temp(Salary, FromDate, ToDate) AS
SELECT Salary, FromDate, ToDate
FROM Employee
WHERE Name = 'John';
SELECT DISTINCT F.Salary, F.FromDate, L.ToDate
FROM Temp AS F, Temp AS L
WHERE F.FromDate < L.ToDate
AND F.Salary = L.Salary
-- SUBQUERY 1: ensure continuity
AND NOT EXISTS
(SELECT * FROM Temp AS T
WHERE T.Salary = F.Salary
AND F.FromDate < T.FromDate
AND T.FromDate < L.ToDate
AND NOT EXISTS
(SELECT * FROM Temp AS T1
WHERE T1.Salary = F.Salary
AND T1.FromDate < T.FromDate
AND T.FromDate <= T1.ToDate))
-- SUBQUERY 2: make sure it's maximal
AND NOT EXISTS
(SELECT * FROM Temp AS T2
WHERE T2.Salary = F.Salary
AND ((T2.FromDate < F.FromDate AND F.FromDate <= T2.ToDate)
OR (T2.FromDate <= L.ToDate AND L.ToDate < T2.ToDate)));
Temporal Join
Alternative solution:
- the schema before had two independent attributed we want to capture
- can’t we reorganize the schema - split the information about changes in salary and title?
So we split and have this schema:
Employee(Name, BirthDate)
EmployeeSal(Name, Salary, FromDate, ToDate)
EmployeeTitle(Name, Title, FromDate, ToDate)
Showing the history of changes in John’s salary is easy now:
SELECT Salary, FromDate, ToDate
FROM EmployeeSal
WHERE Name = 'John'
However how we now obtain a table of salary and title intervals?
- we need a temporal join
EmployeeSal | Name | Salary | FromDate | ToDate | John | 60.000 | 1/1/95 | 1/6/95 || John | 70.000 | 1/6/95 | 1/1/97 |
EmployeeTitle | Name | Title | FromDate | ToDate | John | Assistant | 1/1/95 | 1/10/95 || John | Lecturer | 1/10/95 | 1/2/96 || John | Professor | 1/2/96 | 1/1/97 |
EmployeeSal $\Join$ EmployeeTitle | Name | Salary | Title | FromDate | ToDate | John | 60.000 | Assistant | 1/1/95 | 1/6/95 || John | 70.000 | Assistant | 1/6/95 | 1/10/95 || John | 70.000 | Lecturer | 1/10/95 | 1/2/96 || John | 70.000 | Professor | 1/2/96 | 1/1/97 |
The idea:
- to find overlapping intervals and join on them
- there are 4 possible ways in which intervals can overlap
- so we need to try all of them
| |```sql – (1) SELECT S.Name, Salary, Title, S.FromDate, S.ToDate FROM EmployeeSal S, EmployeeTitle T WHERE S.Name = T.Name AND T.FromDate <= S.FromDate AND S.ToDate <= T.ToDate UNION ALL – (2) SELECT S.Name, Salary, Title, S.FromDate, T.ToDate FROM EmployeeSal S, EmployeeTitle T WHERE S.Name = T.Name AND S.FromDate > T.FromDate AND T.ToDate < S.ToDate AND S.FromDate < T.ToDate UNION ALL – (3) SELECT S.Name, Salary, Title, T.FromDate, S.ToDate FROM EmployeeSal S, EmployeeTitle T WHERE S.Name = T.Name AND T.FromDate > S.FromDate AND S.ToDate < T.ToDate AND T.FromDate < S.ToDate UNION ALL – (4) SELECT S.Name, Salary, Title, T.FromDate, T.ToDate FROM EmployeeSal S, EmployeeTitle T WHERE S.Name = T.Name AND T.FromDate >= S.FromDate AND T.ToDate <= S.ToDate
| |<img src="https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/adb/td-temporal-join-idea.png" alt="Image">
### Aggregation Functions
Suppose we want to apply some aggregation function to each period in the valid time table
Example:
- compute the maximal salary for each period
- <img src="https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/adb/td-aggregations.png" alt="Image">
This is done in 3 steps:
- compute all possible periods
- compute the maximal salary for these periods
- coalesce the periods with the same salary
Step 1:
```sql
-- all moments of time when there was some change
CREATE VIEW SalChanges(Day) as
SELECT DISTINCT FromDate
FROM Salary
UNION
SELECT DISTINCT ToDate
FROM Salary;
-- now we construct periods from these moments
CREATE VIEW SalPeriods(FromDate, ToDate) as
SELECT P1.Day, P2.Day
FROM SalChanges P1, SalChanges P2
WHERE P1.Day < P2.Day
AND NOT EXISTS
(SELECT * FROM SalChanges P3
WHERE P1.Day < P3.Day
AND P3.Day < P2.Day);
Step 2:
CREATE VIEW TempMax(MaxSalary, FromDate, ToDate) as
SELECT MAX(E.Amount), I.FromDate, I.ToDate
FROM Salary E, SalPeriods I
WHERE E.FromDate <= I.FromDate
AND I.ToDate <= E.ToDate
GROUP BY I.FromDate, I.ToDate;
Step 3
- coalesce these intervals
’'’Count’’’ is a little bit different
- we need to identify gaps in the history and put there zeros
Step 2 for count
CREATE VIEW TempCount(NbEmp, FromDate, ToDate) as
SELECT COUNT(*), P.FromDate, P.ToDate
FROM Salary S, SalPeriods P
WHERE S.FromDate <= P.FromDate
AND P.ToDate <= S.ToDate
GROUP BY P.FromDate, P.ToDate
UNION ALL
SELECT 0, P.FromDate, P.ToDate
FROM SalPeriods P
WHERE NOT EXISTS
(SELECT * FROM Salary S
WHERE S.FromDate <= P.FromDate
AND P.ToDate <= S.ToDate);
Links
- Exercises from the ADB course: [https://www.dropbox.com/s/es8a430fhgg55ow/202-temporal_exercices.pdf]