

Latest revision as of 15:47, 23 November 2015
GAV Mediation
There are two main approaches to mediation in Data Integration:
 GAV Mediation: defining global relations in terms of local ones
 LAV Mediation: defining local relations in terms of global ones
GAV: Global-as-View Mediation
GAV Mapping
A GAV mapping is an expression of the form
 $R(x_1, ..., x_n) \supseteq Q(x_1, ..., x_n)$
 where $Q(x_1, ..., x_n)$ is a Conjunctive Query of the same arity as $R$
 since $Q(x_1, ..., x_n) \leftarrow A_1(...), \ ..., \ A_k(...)$, the mapping can be rewritten as $R(x_1, ..., x_n) \supseteq A_1(...), \ ..., \ A_k(...)$
 so a mapping defines a global relation by a query over source relations; such a query is also called a view
FOL Semantics of this mapping
 $\forall \ x_1, ..., x_n \ \big( \exists \ y_1, ..., y_m \ (A_1(...) \land \ ... \ \land A_k(...)) \Rightarrow R(x_1, ..., x_n) \big)$
 $x_1, ..., x_n$: distinguished variables (they appear in the head $R$)
 $y_1, ..., y_m$: existential variables (they appear only in the body)
GAV Mapping Example
Data sources:
 S1.Catalogue(nomUniv, programme): programs in French universities
 S2.Erasmus(student, course, univ): European Erasmus students
 S3.CampusFr(student, program, university): foreign students in France
 S4.Mundus(program, course): international master programs
Global Schema:
 MasterStudent(studentName),
 University(uniName),
 MasterProgram(title),
 MasterCourse(code),
 EnrolledIn(studentName,title),
 RegisteredTo(studentName, uniName).
The GAV mappings for the global schema are the following:
 MasterStudent(N) $\supseteq$ S2.Erasmus(N, C, U), S4.Mundus(P, C)
 MasterStudent(N) $\supseteq$ S3.CampusFr(N, P, U), S4.Mundus(P, C)
 University(U) $\supseteq$ S1.Catalogue(U, P)
 University(U) $\supseteq$ S2.Erasmus(N, C, U)
 University(U) $\supseteq$ S3.CampusFr(N, P, U)
 MasterProgram(T) $\supseteq$ S4.Mundus(T, C)
 MasterCourse(C) $\supseteq$ S4.Mundus(T, C)
 EnrolledIn(N, T) $\supseteq$ S2.Erasmus(N, C, U), S4.Mundus(T, C)
 EnrolledIn(N, T) $\supseteq$ S3.CampusFr(N, T, U), S4.Mundus(T, C)
 RegisteredTo(N, U) $\supseteq$ S3.CampusFr(N, T, U)
 left side: relations of the global schema; right side: queries over the local sources
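As a worked instance of the FOL semantics given earlier, two of the mappings above can be read as follows. This is only a transcription of the general formula onto the example, with the variable names taken from the mappings themselves:
 $\forall \ N \ \big( \exists \ C, U, P \ (\text{S2.Erasmus}(N, C, U) \land \text{S4.Mundus}(P, C)) \Rightarrow \text{MasterStudent}(N) \big)$
 $\forall \ N, U \ \big( \exists \ T \ \text{S3.CampusFr}(N, T, U) \Rightarrow \text{RegisteredTo}(N, U) \big)$
In the first formula $N$ is the only distinguished variable, while $C$, $U$ and $P$ are existential; in the second, $N$ and $U$ are distinguished and $T$ is existential.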
Query Answering
To evaluate a query
 to answer a query posed against the global schema, we first need to find the relevant data sources
 then we issue a query to each of these data sources and combine the results
GAV Unfolding (informal)
 for each atom $A_i(...)$ of the query
 if this atom can be matched to the head $R_j(...)$ of some mapping
 replace the atom $A_i(...)$ by the body of that mapping (if several mappings match, each choice yields a separate unfolding)
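The informal procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the atom encoding (a relation name plus a tuple of variable names), the function name `unfold` and the fresh-variable scheme `v1, v2, ...` are all chosen here for illustration, and terms are assumed to be variables only (no constants, no repeated variables in a mapping head). Only the MasterStudent and RegisteredTo mappings of the example are listed.

```python
from itertools import count, product

# An atom is (relation_name, tuple_of_variable_names).
# A GAV mapping is (head_atom, list_of_body_atoms).
MAPPINGS = [
    (("MasterStudent", ("N",)),
     [("S2.Erasmus", ("N", "C", "U")), ("S4.Mundus", ("P", "C"))]),
    (("MasterStudent", ("N",)),
     [("S3.CampusFr", ("N", "P", "U")), ("S4.Mundus", ("P", "C"))]),
    (("RegisteredTo", ("N", "U")),
     [("S3.CampusFr", ("N", "T", "U"))]),
]


def unfold(query_body, mappings):
    """Return all unfoldings of a conjunctive query body (a list of atoms)."""
    fresh = count(1)  # shared counter, so fresh variables never clash

    def expand(atom, mapping):
        (_, head_vars), body = mapping
        # Distinguished variables take the arguments of the query atom.
        subst = dict(zip(head_vars, atom[1]))
        expanded = []
        for rel, args in body:
            new_args = []
            for var in args:
                if var not in subst:  # existential variable: rename it
                    subst[var] = f"v{next(fresh)}"
                new_args.append(subst[var])
            expanded.append((rel, tuple(new_args)))
        return expanded

    # For every query atom, collect the mappings whose head matches it.
    choices = [
        [m for m in mappings
         if m[0][0] == atom[0] and len(m[0][1]) == len(atom[1])]
        for atom in query_body
    ]
    unfoldings = []
    # One unfolding per combination of matching mappings.
    for combo in product(*choices):
        body = []
        for atom, mapping in zip(query_body, combo):
            body.extend(expand(atom, mapping))
        unfoldings.append(body)
    return unfoldings
```

For a query body such as `[("RegisteredTo", ("s", "x")), ("MasterStudent", ("s",))]`, `unfold` returns two unfoldings, one per MasterStudent mapping, because RegisteredTo has a single mapping. If some atom matches no mapping head, the list of unfoldings is empty.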
Illustration
Illustration by example
 Consider this query:
 $Q(x) \leftarrow \underbrace{\text{RegisteredTo}(s, x)}_\text{(1)}, \underbrace{\text{MasterStudent}(s)}_\text{(2)}$
 for atom $\text{(1)}$ one mapping can be found; for atom $\text{(2)}$, two mappings
 so we can have two unfoldings:
 $Q_1(x) \leftarrow S_3.\text{CampusFr}(s,v_1,x), S_2.\text{Erasmus}(s,v_2,v_3), S_4.\text{Mundus}(v_4,v_2)$
 $Q_2(x) \leftarrow S_3.\text{CampusFr}(s,v_5,x), S_3.\text{CampusFr}(s,v_6,v_7), S_4.\text{Mundus}(v_6,v_8)$
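 note that $Q_2$ can be simplified: its two CampusFr atoms can be merged by unifying $v_5$ with $v_6$ and $v_7$ with $x$, which removes a redundant join (this assumes that each student has a single CampusFr entry)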
 so we have the following two rewritings:
 $R_1(x) \leftarrow S_3.\text{CampusFr}(s,v_1,x), S_2.\text{Erasmus}(s,v_2,v_3), S_4.\text{Mundus}(v_4,v_2)$
 $R_2(x) \leftarrow S_3.\text{CampusFr}(s,v_6,x), S_4.\text{Mundus}(v_6,v_8)$
 the final result is the union $R_1(x) \cup R_2(x)$
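To make the final union concrete, the two rewritings can be evaluated over small source instances. The sketch below is illustrative only: the tuples in `SOURCES` are invented toy data, `evaluate` is a hypothetical naive nested-loop evaluator for conjunctive queries, and $R_2$ is taken in its simplified form with $x$ in the third position of CampusFr.

```python
# Toy source instances, invented purely for illustration.
SOURCES = {
    "S2.Erasmus": {("alice", "c1", "Paris-Sud")},
    "S3.CampusFr": {("alice", "p9", "Paris-Sud"), ("bob", "p1", "Lyon")},
    "S4.Mundus": {("p1", "c1"), ("p2", "c2")},
}

# The two rewritings, encoded as lists of (relation, variables) atoms.
R1 = [("S3.CampusFr", ("s", "v1", "x")),
      ("S2.Erasmus", ("s", "v2", "v3")),
      ("S4.Mundus", ("v4", "v2"))]
R2 = [("S3.CampusFr", ("s", "v6", "x")),
      ("S4.Mundus", ("v6", "v8"))]


def evaluate(head_vars, body, sources):
    """Evaluate a conjunctive query with naive nested-loop joins."""
    bindings = [{}]  # partial variable assignments built atom by atom
    for rel, args in body:
        extended = []
        for binding in bindings:
            for row in sources[rel]:
                candidate = dict(binding)
                # A row is compatible if it agrees with all bound variables.
                if all(candidate.setdefault(var, value) == value
                       for var, value in zip(args, row)):
                    extended.append(candidate)
        bindings = extended
    # Project the surviving assignments onto the head variables.
    return {tuple(b[v] for v in head_vars) for b in bindings}


# Each rewriting is evaluated over the sources; the answers are unioned.
answer = evaluate(("x",), R1, SOURCES) | evaluate(("x",), R2, SOURCES)
print(answer)
```

On this toy data, $R_1$ contributes Paris-Sud (alice is both a CampusFr and an Erasmus student following a Mundus course) and $R_2$ contributes Lyon (bob is enrolled through CampusFr in a Mundus program). A real mediator would send each rewriting to the sources rather than load their data locally.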
GAV Unfolding
def: GAV Query unfolding (or GAV rewriting)
 let $Q(\vec{x}) \leftarrow G_1(\vec{z}_1), \ ..., \ G_n(\vec{z}_n)$ be a query over the global schema
 suppose that for every $G_i$ there exists a GAV mapping $G_i(\vec{x}_i) \supseteq q_i(\vec{x}_i, \vec{y}_i)$
 where in $q_i(\vec{x}_i, \vec{y}_i)$: $\vec{x}_i$ are the distinguished variables, $\vec{y}_i$ the existential ones
 an unfolding of $Q(\vec{x})$ is a query $U$ that is obtained by
 replacing each conjunct $G_i(\vec{z}_i)$ by $q_i \big( \Psi_i(\vec{x}_i, \vec{y}_i) \big)$
 $\Psi_i$ maps
 the distinguished variables $\vec{x}_i$ of $q_i$ to $\vec{z}_i$ and
 the existential variables $\vec{y}_i$ to fresh variables (needed to avoid naming conflicts, which would introduce unintended join constraints)
Simplification
 each unfolding is then simplified (redundant joins/conjuncts are removed)
 the simplified unfoldings are the rewritings; the answer to the query is the union of their results
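One standard way to detect redundant conjuncts is a homomorphism test: an atom can be dropped if the whole body can be mapped into the remaining atoms while keeping the head variables fixed. The sketch below illustrates that idea under assumptions of its own: the names `find_homomorphism` and `minimize` are hypothetical, terms are variables only, and the example query $Q(s, t) \leftarrow \text{EnrolledIn}(s, t), \text{MasterStudent}(s)$, unfolded with the two Erasmus mappings, is made up for illustration.

```python
def find_homomorphism(source_atoms, target_atoms, fixed):
    """Search a variable mapping sending each source atom onto a target atom.

    Variables listed in `fixed` must be mapped to themselves.
    Returns the mapping as a dict, or None if there is none.
    """
    def extend(i, mapping):
        if i == len(source_atoms):
            return mapping
        rel, args = source_atoms[i]
        for t_rel, t_args in target_atoms:
            if t_rel != rel or len(t_args) != len(args):
                continue
            trial = dict(mapping)
            # Bind each variable, rejecting conflicting bindings.
            if all(trial.setdefault(a, t) == t for a, t in zip(args, t_args)):
                result = extend(i + 1, trial)
                if result is not None:
                    return result
        return None

    return extend(0, {v: v for v in fixed})


def minimize(head_vars, body):
    """Drop atoms one at a time while the query stays equivalent."""
    body = list(body)
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            rest = body[:i] + body[i + 1:]
            if rest and find_homomorphism(body, rest, head_vars) is not None:
                body = rest
                changed = True
                break
    return body


# Hypothetical unfolding of Q(s, t) <- EnrolledIn(s, t), MasterStudent(s).
unfolding = [("S2.Erasmus", ("s", "v1", "v2")), ("S4.Mundus", ("t", "v1")),
             ("S2.Erasmus", ("s", "v3", "v4")), ("S4.Mundus", ("v5", "v3"))]
print(minimize(("s", "t"), unfolding))
```

Here the last two atoms are redundant (map $v_3$ to $v_1$, $v_4$ to $v_2$ and $v_5$ to $t$), so only S2.Erasmus(s, v1, v2) and S4.Mundus(t, v1) remain. The check is purely syntactic: it does not use key or other integrity constraints, so simplifications that rely on them are outside its scope.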
Main Limitations of GAV Mediation
 Adding and removing data sources is costly
 it may require revising all the mappings
 on the Web, servers may come and go
 so another approach is needed
 thus, for the Semantic Web, LAV Mediation is usually preferred
See Also
Source
 Web Data Management (book) [http://webdam.inria.fr/Jorge]