ML Wiki

Inverse-Rules Algorithm

This is an approach for query rewriting used in LAV Mediation

Overview

Steps:

• constructing inverse rules from a set of LAV mappings
• unfolding a global query to obtain local queries

Important thing:

• 1st step of the algorithm is independent from queries
• it only needs to consider a set of LAV mappings

One LAV Mapping is replaced by several GAV Mappings

• one for each atom in the body of the rule
• but need to keep bindings between different occurrences of the same existential variables in the body
• done by using First Order Logic function - Skolem Function

Illustraction by Example

Here we will use the following example to illustrate the algorithm

Global query:

• $Q(x) \leftarrow U(y,z), R(x,z), T(z,y), R(y',x)$

And two LAV mappings:

• $V_1(u,v) \subseteq T(w,u), U(v,w), R(v,u)$
• $V_2(u,v,v') \subseteq T(w,u), U(v,w), R(v',w)$

Skolem Function

Logical Intuition

For $V_1$, FOL meaning is

• $\forall \ u, v \Big[ V_1(u, v) \Rightarrow \exists \ w \ : \ T(w, u) \land U(v, w) \land R(v, u) \Big]$
• suppose a tuple $(a, b)$ belongs to the data source that backs $V_1$
• so we have a fact $V_1(a, b)$
• from this fact $V_1(a, b)$ can infer that $R(b, a)$
• $V_1(a, b) \Rightarrow R(b, a)$
• (all conjuncts have to be true for a statement to be true, so it means the last conjuncts holds true)
• so $b$ can be an answer to a global query Q(x) = $R(x, y)$

But we can infer other things as well

• e.g. $V_1(a, b) \Rightarrow \exists \ d_1 \ : \ T(d_1, a) \land U(b, d_1)$
• where $d_1$ is some constant
• we don't know its value, but we know it exists (since it's existentially qualified) and
• it depends on constants $a$ and $b$
• so we can denote this dependency as $d_1 = f_1(a, b)$
• the symbol $f_1(u, v)$ is a Skolem Function of arity 2
• $f_1(u, v)$ denotes that there exists some constant that depends on values of $u$ and $v$
• given two distinct Skolem terms, e.g. $f_1(1, 2)$ and $f_1(2, v_3)$ we never can say if they belong to the same constant or not

Creating Inverse Rules

So we can create these GAV mappings and their FOL translations

Example 1

Inverse Rules of the 1st LAV mapping:

• $V_1(u,v) \subseteq T(w,u), U(v,w), R(v,u)$

Rule Mapping FOL
$\text{IN}_{1,1}$ $V_1(u,v) \subseteq T \big(f_1(u,v), u \big)$ $\forall \ u, v \Big[ V_1(u,v) \Rightarrow T \big( f_1(u, v), u \big) \Big]$
$\text{IN}_{1,2}$ $V_1(u,v) \subseteq U \big(v, f_1(u,v) \big)$ $\forall \ u, v \Big[ V_1(u,v) \Rightarrow U \big(v, f_1(u, v) \big) \Big]$
$\text{IN}_{1,3}$ $V_1(u,v) \subseteq R \big(v, u \big)$ $\forall \ u, v \Big[ V_1(u,v) \Rightarrow R \big(v, u \big) \Big]$

Algorithm

Algorithm for creating the inverse rules

• for each LAV mapping $M_i$ with $n_i$ atoms
• create $n_i$ GAV mappings (i.e. inverse rules)
• for each existential variable introduce a Skolem function of all distinguished variables

Example 2

For example, for the second mapping the result is

• mapping: $V_2(u,v,v') \subseteq T(w,u), U(v,w), R(v',w)$
• $\text{IN}_{2,1} \ : \ V_2(u,v,v') \subseteq T \big(f_2(u,v,v'), u \big)$
• $\text{IN}_{2,2} \ : \ V_2(u,v,v') \subseteq U \big(v, f_2(u,v,v') \big)$
• $\text{IN}_{2,3} \ : \ V_2(u,v,v') \subseteq R \big(v', f_2(u,v,v') \big)$

So, for each existentially qualified variable we define some Skolem function

• this is a function of all distinguished variables
• for $V_2$ we define $w = f_2(u,v,v')$

Step 2: Query Unfolding

Query Unfolding

• The process of unfolding query is different from GAV Mediation
• Reason = Skolem terms

GAV vs Inverse-Rules

Before (in GAV Mediation)

• for each query atom $G_i(x_1, ..., x_m)$
• match with some GAV mapping atom of the form $G_i(z_1, ..., z_m)$
• with mapping $\forall i: \ x_i \mapsto z_i$
• here each atom is unfolded in isolation
• but here it doesn't work: we have Skolen Functions

Now:

• need unification of atoms with functions
• more complex than simple unfolding
• may require substitution of some variables with function terms - Skolem terms

Unification Algorithm

Unification

• let $\sigma$ be an empty set of substitutions
• let $PR$ be a partial rewriting of $Q$, $PR \leftarrow Q$
• For each atom $G_i(...) \in Q$ (in turn)
• unify $G_i \in Q$ with some atom $V_j \subseteq G_i$ from the set of rules
• backtrack when we have a Skolem term in $V_j$
• $\sigma_i$ - the substitution that made this unification possible
• in $PR$, replace $G_i$ with the head $V_j$ of $G_i$
• apply obtained $\sigma_i$ to the partial rewriting $PR$: $PR \leftarrow \sigma_i(PR)$
• keep all mappings from $\sigma_i$: $\sigma \leftarrow \sigma \cup \sigma_i$
• let $R$ be a rewriting of $Q$: $R \leftarrow PR$
• return $R$

Example

Consider this query

• $Q(x) \leftarrow \underbrace{U(y,z)}_\text{(1)}, \underbrace{R(x,z)}_\text{(2)}, \underbrace{T(z,y)}_\text{(3)}, \underbrace{R(y',x)}_\text{(4)}$

Inverse Rules:

• $\text{IN}_{1,1} \ : \ V_1(u,v) \subseteq T \big(f_1(u,v), u \big)$
• $\text{IN}_{1,2} \ : \ V_1(u,v) \subseteq U \big(v, f_1(u,v) \big)$
• $\text{IN}_{1,3} \ : \ V_1(u,v) \subseteq R \big(v, u \big)$
• $\text{IN}_{2,1} \ : \ V_2(u,v,v') \subseteq T \big(f_2(u,v,v'), u \big)$
• $\text{IN}_{2,2} \ : \ V_2(u,v,v') \subseteq U \big(v, f_2(u,v,v') \big)$
• $\text{IN}_{2,3} \ : \ V_2(u,v,v') \subseteq R \big(v', f_2(u,v,v') \big)$

Now try to unify each atom of $Q$ with some atom from the rules

• we use most general unifier (mgu)

Atom 1: $U(y, z)$

• $U(y, z)$ can be unified with $U \big(v, f_1(u,v) \big)$ (from $\text{IN}_{1,2}$)
• the mgu is a substitution $\sigma = \{ y \mapsto v_1, v \mapsto v_1, z \mapsto f_1(v_2,v_1), u \mapsto v_2 \}$
• $v_1$ and $v_2$ are new fresh variables to avoid conflicts
• $y$ and $v$ map to the same variable $v_1$ because they are at the same position
• substitution $\sigma$ is called a unifier of two expressions and $U \big(v, f_1(u,v) \big)$
• it's a unifier because applying $\sigma$ results in identical expressions:
• $\sigma \Big( U(y, z) \Big) \equiv \sigma \Big( U \big(v, f_1(u,v) \big) \Big) \equiv U \big(v_1, f_1(v_2, v_1) \big)$
• apply $\sigma$ to the rest of atoms in $Q$ and unfold the first query atom of $Q$
• the head of $U \big(v, f_1(u,v) \big)$ is $V_1(u,v)$
• because of the substitution $v \mapsto v_1, u \mapsto v_2$ it becomes $V_1 \big( v_2, v_1 \big)$
• the remaining atoms are $R(x,z), T(z,y), R(y',x)$, and by substitution we obtain
• $R \big(x, z \big) \to R \big(x, f_1(v_2,v_1) \big)$
• $T \big(z, y \big) \to T \big(f_1(v_2,v_1), v_1 \big)$
• $R \big(y',x \big) \to R \big(y', x \big)$
• so we have:
• $PR_1(x) \leftarrow V_1 \big( v_2, v_1 \big), R \big(x, f_1(v_2,v_1) \big), T \big(f_1(v_2,v_1),v_1 \big), R \big(y',x \big).$

Atom 2: $R(x, z) \to R \big(x, f_1(v_2,v_1) \big)$

• (taking into account the substitutions made in previous iterations)
• $R(x,z)$ can be unified with $\ V_1(u,v) \subseteq R \big(v, u \big)$ ($\text{IN}_{1,3}$)
• unfolding of atom $(2)$ gives us the following partial rewriting
• $PR_2(x) \leftarrow V_1 \big( v_2, v_1 \big), V_1 \big( f_1(v_2,v_1), x \big), T \big(f_1(v_2,v_1), v1 \big), R \big(y', x \big)$
• note that now we have a Skolem term in the matched head
• so it's useless to continue unfolding
• there's no way to match $V_1 \big( f_1(v_2,v_1), x \big)$ with any fact in our data source
• because we don't know what is the constant resulting from $f_1(v_2,v_1)$, we know only that it exists
• so, we backtrack to atom 1

Atom 1: $U(y, z)$

• now try to unfold it using $\text{IN}_{2,2}$
• with substitution
• $\sigma_1^{(2)} = \{ y \mapsto v_1, v \mapsto v_1, z \mapsto f_2(v_2,v_1,v_3), u \mapsto v_2, v' \mapsto v3 \}$
• using this $\sigma_1^{(2)}$ we obtain the following partial rewriting
• $PR'_1(x) \leftarrow V_2 \big(v_2,v_1,v_3 \big), R \big(x,f_2(v_2,v_1,v_3) \big), T \big(f_2(v_2,v_1,v_3),v_1 \big), R \big(y',x \big)$

Atom 2: $R \big(x, z \big) \to R \big(x, f_2(v_2,v_1,v_3) \big)$

• try to use $\text{IN}_{2,3}$
• with substitution
• $\sigma_2^{(2)} = \{ v' \mapsto x, v_3 \mapsto x, u \mapsto v_2, v \mapsto v_1 \}$
• $PR'_2(x) \leftarrow V_2 \big(v_2,v_1,v_3 \big), \underbrace{V_2 \big(v_2,v_1,x \big)}_{\color{blue}{\text{redundant}}}, T \big(f_2(v_2,v_1,v_3),v_1 \big), R \big(y',x \big)$
• $PR'_2(x) \leftarrow V_2 \big(v_2,v_1,v_3 \big), T \big(f_2(v_2,v_1,v_3),v_1 \big), R \big(y',x \big)$

Atom 3: $T(z,y) \to T \big(f_2(u,v,v'),u \big)$

• need to match with $T \big( f_2(v_2,v_1,x), v_1 \big)$ from $\text{IN}_{2,3}$
• substitution $\{ v_2 \mapsto v_3, u \mapsto v_3, v_1 \mapsto v_3, v \mapsto v_3, v' \mapsto x \}$
• so we have this partial rewriting
• $PR'_3(x) \leftarrow V_2(v_2,v_1,x), V_2(v_3,v_3,x), R(y',x)$
• first atom is redundant, so have $PR'_3(x) \leftarrow V_2(v_3,v_3,x), R(y',x)$

Atom 4: $R(y',x)$

• match with $R \big(v, u \big)$ from $\text{IN}_{1,3}$
• and have $RP'_4(x) \leftarrow V_2(v_3,v_3,x), V_1(y', x)$

So, final rewriting

• $R_1(x) \leftarrow V_2(v_3,v_3,x), V_1(y', x)$