GAV Mediation
There are two main approached for Mediating in Data Integration
- GAV Mediation - defining global relations in terms of local
- LAV Mediation - defining local relations in terms of global
GAV - Global-as-View Mediation
GAV Mapping
A GAV mapping is an expression of the form
- $R(x_1, ..., x_n) \supseteq Q(x_1, ..., x_n)$
- where $Q(x_1, ..., x_n)$ is a Conjunctive Query of the same arity as $R$
- since $Q(x_1, ..., x_n) \leftarrow A_1(...), \ ..., \ A_k(...)$, can rewrite as $R(x_1, ..., x_n) \supseteq A_1(...), \ ..., \ A_k(...)$
- so a mapping is some query over some source relations, also called a view
FOL Semantics of this mapping
- $\forall \ x_1, ..., x_n \ \exists \ y_1, ..., y_m \ : \ A_1(...), \ ..., \ A_k(...) \Rightarrow R(x_1, ..., x_n)$
- $x_1, ..., x_n$ - distinguished variables,
- $y_1, ..., y_n$ - existential variables
GAV Mapping Example
Data sources:
- S1.Catalogue(nomUniv, programme). - programs in French universities
- S2.Erasmus(student, course, univ). - European Erasmus students
- S3.CampusFr(student, program, university). - foreign students in France
- S4.Mundus(program, course). - international master programs
Global Schema:
- MasterStudent(studentName),
- University(uniName),
- MasterProgram(title),
- MasterCourse(code),
- EnrolledIn(studentName,title),
- RegisteredTo(studentName, uniName).
The GAV mapping for the global schema is the following
- MasterStudent(N) $\supseteq$ S2.Erasmus(N, C, U), S4.Mundus(P, C)
- MasterStudent(N) $\supseteq$ S3.CampusFr(N, P, U), S4.Mundus(P, C)
- University(U) $\supseteq$ S1.Catalogue(U, P)
- University(U) $\supseteq$ S2.Erasmus(N, C, U)
- University(U) $\supseteq$ S3.CampusFr(N, P, U)
- MasterProgram(T) $\supseteq$ S4.Mundus(T, C)
- MasterCourse(C) $\supseteq$ S4.Mundus(T, C)
- EnrolledIn(N, T) $\supseteq$ S2.Erasmus(N, C, U), S4.Mundus(T, C)
- EnrolledIn(N, T) $\supseteq$ S3.CampusFr(N, T, U), S4.Mundus(T, C)
- RegisteredTo(N, U) $\supseteq$ S3.CampusFr(N, T, U)
- left side: global; right side: local
Query Answering
To evaluate a query
- for answering some query against the global schema, need to find the relevant data sources
- then we issue queries for each data source and combine the result
GAV Unfolding (informal)
- for each atom $A_i(...)$ of the query
- if this atom can be matched to a head of some mapping $R_j(...)$
- replace the atom $A_i(...)$ by the body of the mapping $R_j(...)$
Illustration
Illustration by example
- Consider this query:
- $Q(x) \leftarrow \underbrace{\text{RegistersTo}(s, x)}_\text{(1)}, \underbrace{\text{MasterStudent}(s)}_\text{(2)}$
- for $\text{(1)}$, one mapping can be found, for $\text{(2)}$ - two mappings
- so we can have two unfoldings:
- $Q_1(x) \leftarrow S_3.\text{CampusFr}(s,v_1,x), S_2.\text{Erasmus}(s,v_2,v_3), S_4.\text{Mundus}(v_4,v_2)$
- $Q_2(x) \leftarrow S_3.\text{CampusFr}(s,v_5,x), S_3.\text{CampusFr}(s,v_6,v_7), S_4.\text{Mundus}(v_6,v_8)$
- note that $Q_2$ can be simplified (by removing a redundant join)
- so we have the following two rewritings:
- $R_1(x) \leftarrow S_3.\text{CampusFr}(s,v_1,x), S_2.\text{Erasmus}(s,v_2,v_3), S_4.\text{Mundus}(v_4,v_2)$
- $R_2(x) \leftarrow S_3.\text{CampusFr}(s,v_6,v_7), S_4.\text{Mundus}(v_6,v_8)$
- the final result: $R_1(x) \cup R_2(x)$
GAV Unfolding
def: GAV Query unfolding (or GAV rewriting)
- let $Q(\vec{x}) \leftarrow G_1(\vec{z}_1), \ ..., \ G_n(\vec{z}_n)$ be a query over global schema
- $\forall \ G_i \ \exists$ GAV mapping $G_i \supseteq q_i(\vec{x}_i, \vec{y}_i)$
- where in $q_i(\vec{x}, \vec{y})$: $\vec{x}$ - distinguished variables, $\vec{y}$ - existential
- an unfolding of $Q(\vec{x})$ is a query $U$ that is obtained by
- replacing each conjunct $G_i(\vec{z}_i)$ by $q_i \big( \Psi_i(\vec{x}, \vec{y}) \big)$
- $\Psi_i(\vec{x}, \vec{y})$ maps
- variables $\vec{x}$ of $q_i$ to $\vec{z}$ and
- existential variables $\vec{y}$ to some new variables (needed to avoid naming conflicts - and therefore unnecessary constraints)
Simplification
- each unfolding then simplified (redundant joins/conjuncts are removed)
- and obtain rewritings
Main Limitations of GAV Mediation
- Adding and removing data sources is costly
- it may require revising all the mappings
- for Web, servers may come and go
- so another approach is needed
- thus, for Semantic Web, LAV Mediation is more preferred
See Also
Source
- Web Data Management book [1]