Mediator
This is an approach to Data Integration (opposite to Data Warehousing)
- data remains in the data sources (so it’s sometimes called “virtual data integration”)
- also better if you want to access “fresh” data
- but way harder to implement - need to transform data during the query time
- need to use Ontologies for that, no ETLs
Architecture
Global Schema
- start by designing a global (‘‘mediated’’) schema - an unique entry point for all the queries
- need to have semantic mappings between the mediated schema and data sources
Querying:
- user queries the global schema
- based on the mappings, queries are converted to local queries for the data sources
- all queries are executed
- then the results are combined (e.g. using some Ontologies - which is why this approach is useful for OBDA)
Semantic Mapping
Let $S_1, …, S_n$ be ‘‘local schemas’’
- assume that each $S_i$ has only one relation, also denoted $S_i$
- these $S_1, …, S_n$ are ‘‘local relations’’
'’Global schema’’ $G$
- consists of ‘‘global relations’’ $G_1, …, G_m$
Goal of ‘‘Semantic Mappings’’:
- specify ‘‘mappings’’ between ${ S_i }$ and ${ G_i }$ - relationships between local and global schemas
- examples of such relationships:
- $G_1 \equiv S_1$ - equality
- $G_2 \equiv S_1 \cup S_2$
- $G_3 \equiv S_1 \Join S_3$
- so global $G_j$ can be seen as views of local relationships (example of GAV Mediation)
But better to use containment instead of equality
- to be able to express the usage of multiple sources
- example (GAV Mediation)
- $G_3 \supseteq S_1 \Join S_3$
- $G_3 \supseteq \sigma_{A = \text{yes}} ( S_4 )$
- example (LAV Mediation)
- $S_4 \subseteq G_1 \Join G_3$
Notation
- $v(S_1, …, S_n)$ - local view (a view/query built on local schemas)
- $v(G_1, …, G_m)$ - global view (on global schemas)
Mediation Approaches
GAV Mediation - Global-as-View
global is constrained by views of the local relations
GAV are mappings of the following form
- $v_i(S_1, …, S_n) \subseteq G_i$
- or, equivalently, $G_i \supseteq v_i(S_1, …, S_n)$
LAV Mediation - Local-as-View
- contribution of each data source $S_i$ is specified independently of other data source
- typical for Semantic Web based systems
GAV are mappings of the following form
- $S_i \subseteq v_i(G_1, …, G_m)$
Main algorithms for query rewriting in LAV Meditation:
Discussion
- all these algorithms have the same complexity
- but in experiments (from the book) show that Minicon outperforms others
- no algorithm handles additional knowledge (ontologies)
Ontology Based Data Access
- Typically LAV is used along with OBDA
Sources
- Web Data Management book [http://webdam.inria.fr/Jorge]