Latest revision as of 15:43, 23 November 2015
Ontology Based Data Access
For querying ontologies
- typically using SPARQL
- keep data in a db (or a triple store), but access it via ontologies
- as a bonus, have inference capabilities during query answering, since it's based on Logic
- also useful for Data Integration
Query Answering
Difference: Ontologies and traditional Databases
- In DBMS all facts are explicit, but in Semantic Web, there are inferred tuples
- Constraints: an RDBMS rejects data that violates them, while in the Semantic Web additional facts are inferred to satisfy them
Use SPARQL for querying ontologies
Example:
SELECT ?x WHERE {
?x :EnrolledIn ?y .
?z :Leads ?y .
?z rdf:type :Professor .
}
Translation:
- FOL: $Q(x) \equiv \exists \ y, z \ : \ \text{EnrolledIn}(x, y) \land \text{Leads}(z, y) \land \text{Professor}(z)$ (here $x$ is the free answer variable, so it is not quantified)
- CQ: $Q(x) \leftarrow \text{EnrolledIn}(x, y), \text{Leads}(z, y), \text{Professor}(z)$
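To make the conjunctive query concrete, here is a minimal Python sketch that evaluates it as a join over in-memory relations. The facts (`alice`, `cs101`, and so on) are hypothetical and only illustrate how the shared variables $y$ and $z$ act as join conditions; no inference is involved yet.

```python
# Hypothetical toy facts; predicate names mirror the query above.
enrolled_in = {("alice", "cs101"), ("bob", "ma201")}
leads = {("carol", "cs101"), ("dave", "ma201")}
professor = {"carol"}

def answer_q():
    """Q(x) <- EnrolledIn(x, y), Leads(z, y), Professor(z)."""
    return {
        x
        for (x, y) in enrolled_in      # EnrolledIn(x, y)
        for (z, y2) in leads           # Leads(z, y): join on y
        if y == y2 and z in professor  # Professor(z)
    }

print(answer_q())  # {'alice'}
```

Only `alice` is returned: `bob` is enrolled in something led by `dave`, who is not asserted to be a professor.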
Inference Approaches
But since the ontologies are backed by some storage,
- need to make sure that inference happens
- otherwise we only query the asserted facts and miss everything entailed by the TBox of our ontology
Main Approaches
Cached Inference
- inferred triples are stored along with asserted
- risk an explosion of the triple store
- also, change management is important - how to propagate changes and deletes
- for deletes - same inferred tuple can be due to several facts, so need to be careful when deleting
Just-In-Time Inference
- To respond to queries only
- no inferred triples retained
Compromise
- materialize only some of the inferred tuples and compute the rest at query time
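The deletion problem of cached inference can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the TBox is a hypothetical list of subclass rules, and deletes are handled by rebuilding the cache from the asserted facts, which is the naive strategy (a real store would more likely maintain the cache incrementally).

```python
# Hypothetical TBox: (sub, super) subclass rules.
RULES = [
    ("Professor", "AcademicStaff"),
    ("Lecturer", "AcademicStaff"),
    ("AcademicStaff", "Staff"),
]

def materialize(asserted):
    """Forward-chain the rules to a fixpoint; return asserted + inferred facts."""
    store = set(asserted)
    changed = True
    while changed:
        changed = False
        for cls, ind in list(store):
            for sub, sup in RULES:
                if cls == sub and (sup, ind) not in store:
                    store.add((sup, ind))
                    changed = True
    return store

asserted = {("Professor", "ann"), ("Lecturer", "ann")}
store = materialize(asserted)  # 2 asserted + 2 inferred facts

# Delete one asserted fact and rebuild the cache from scratch.
asserted_after = asserted - {("Professor", "ann")}
store_after = materialize(asserted_after)

print(("AcademicStaff", "ann") in store_after)  # True
```

`AcademicStaff(ann)` survives the delete because it is still supported by `Lecturer(ann)`. Simply removing every tuple once derived from the deleted fact would have been wrong.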
Just-In-Time Inference
Query
- (for ABox and TBox, see Description Logic)
- using both ABox (facts - RDF graph) and TBox (rules - Ontology)
- a triple is in an answer set either
- because it's in the ABox
- or it's a consequence of some fact from ABox inferred by the TBox
So, answer set evaluation:
- consists of two phases
- query reformulation (rewriting)
- translate the original query $q$ into a set of queries $Q$
- reasoning happens here: Only TBox is accessed
- algorithm for rewriting: #Perfect Rewriting
- query execution
- for each $q_i \in \{ q \} \cup Q$
- execute $q_i$ against the ABox
- (simply evaluating $q$ will give us an incomplete result)
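The two phases can be illustrated with a deliberately small Python sketch restricted to atomic queries and subclass axioms. The dictionary encoding of the TBox and the individuals `bob` and `eve` are assumptions made for the example. Note that `reformulate` touches only the TBox and `execute` touches only the ABox.

```python
# Hypothetical TBox as "concept -> direct sub-concepts".
TBOX = {
    "Student": ["PhDStudent"],
    "AcademicStaff": ["Professor", "Lecturer"],
    "Lecturer": ["PhDStudent"],
}
# Hypothetical ABox: asserted (concept, individual) facts.
ABOX = {("Student", "bob"), ("PhDStudent", "eve")}

def reformulate(concept):
    """Phase 1 (TBox only): the concept plus every concept it subsumes."""
    seen, todo = set(), [concept]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(TBOX.get(c, []))
    return seen

def execute(concept):
    """Phase 2 (ABox only): plain lookup of asserted facts."""
    return {ind for (c, ind) in ABOX if c == concept}

def answer(concept):
    """Union of the answers of all reformulated queries."""
    result = set()
    for q_i in reformulate(concept):
        result |= execute(q_i)
    return result

print(sorted(execute("Student")))  # ['bob']
print(sorted(answer("Student")))   # ['bob', 'eve']
```

Evaluating the original query alone misses `eve`, who is a student only by inference from `PhDStudent(eve)`.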
Example
Consider this query $Q(x)$:
- $Q(x) \leftarrow \text{TeachesIn}(x, y), \text{RegisteredIn}(z, y), \text{Student}(z)$
Set of rules (FOL notation)
- $\text{AcademicStaff} (X) \Rightarrow \text{Staff} (X)$
- $\text{Professor}(X) \Rightarrow \text{AcademicStaff} (X)$
- $\text{Lecturer}(X) \Rightarrow \text{AcademicStaff} (X)$
- $\text{PhDStudent}(X) \Rightarrow \text{Lecturer}(X)$
- $\text{PhDStudent}(X) \Rightarrow \text{Student}(X)$
- $\text{TeachesIn}(X, Y) \Rightarrow \text{AcademicStaff}(X)$
- $\text{TeachesIn}(X, Y) \Rightarrow \text{Course}(Y)$
- $\text{ResponsibleOf} (X, Y) \Rightarrow \text{Professor}(X)$
- $\text{ResponsibleOf} (X, Y) \Rightarrow \text{Course}(Y)$
- $\text{TeachesTo}(X, Y) \Rightarrow \text{AcademicStaff} (X)$
- $\text{TeachesTo}(X, Y) \Rightarrow \text{Student}(Y)$
- $\text{Leads}(X, Y) \Rightarrow \text{AdministrativeStaff} (X)$
- $\text{Leads}(X, Y) \Rightarrow \text{Dept}(Y)$
- $\text{RegisteredIn}(X, Y) \Rightarrow \text{Student}(X)$
- $\text{RegisteredIn}(X, Y) \Rightarrow \text{Course}(Y)$
- $\text{ResponsibleOf}(X, Y) \Rightarrow \text{TeachesIn}(X, Y)$
- $\text{Professor}(X) \Rightarrow \exists Y \ : \ \text{TeachesIn}(X, Y)$
- $\text{Course}(X) \Rightarrow \exists Y \ : \ \text{RegisteredIn}(Y, X)$
- $\text{Student}(X) \Rightarrow \lnot \text{Staff} (X)$
The following are reformulations of $Q(x)$
- $q_{1}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(z,y), \text{Student}(z)$
- $q_{2}(x) \leftarrow \text{TeachesIn}(x,y), \text{RegisteredIn}(z,y), \text{PhDStudent}(z)$
- $q_{3}(x) \leftarrow \text{TeachesIn}(x,y), \text{RegisteredIn}(z,y), \text{TeachesTo}(\_,z)$
- $q_{4}(x) \leftarrow \text{TeachesIn}(x,y), \text{RegisteredIn}(\_,y)$
- $q_{5}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(z,y), \text{PhDStudent}(z)$
- $q_{6}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(z,y), \text{TeachesTo}(\_,z)$
- $q_{7}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(\_,y)$
- $q_{8}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(z,y), \text{PhDStudent}(z)$
- $q_{9}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(z,y), \text{TeachesTo}(\_,z)$
- $q_{10}(x) \leftarrow \text{ResponsibleOf}(x,y), \text{RegisteredIn}(\_,y)$
- $q_{11}(x) \leftarrow \text{TeachesIn}(x,y), \text{Course}(y)$
- $q_{12}(x) \leftarrow \text{TeachesIn}(x,\_)$
- $q_{13}(x) \leftarrow \text{ResponsibleOf}(x,\_)$
- $q_{14}(x) \leftarrow \text{Professor}(x)$
And the result is
- union of all queries:
- $q^*(x) \leftarrow q(x) \cup q_1(x) \cup ... \cup q_{14}(x)$
Algorithm
Evaluating a query
- given a (union of) CQs $q$ and a DL ontology $O = \langle T, A \rangle$
- compute the perfect rewriting of $q$ over $T$
- evaluate over $A$
Computing the Perfect Rewriting
- start from $q$
- iteratively get $q'$ and collect a union of queries $\text{PR}$
- rewrite an atom of the current query $q'$ using an applicable inclusion from $T$
- unify two atoms of $q'$ to obtain a more specific CQ that can be expanded further
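The atom-rewriting step can be sketched in Python as follows. This is a simplified illustration, not the PerfectRef algorithm from the book: the unification (reduce) step is omitted, so in general it yields only a subset of the perfect rewriting. The tuple encoding of inclusions (`"C"` for a concept, `"E"` for $\exists R$, `"Einv"` for $\exists R^-$, `"R"` for a role) and all function names are assumptions made for this example.

```python
ANON = "_"  # marker for an unbound (anonymous) variable

# Hypothetical encoding of positive inclusions as (lhs, rhs) pairs.
TBOX = [
    (("R", "ResponsibleOf"), ("R", "TeachesIn")),  # ResponsibleOf(X,Y) => TeachesIn(X,Y)
    (("C", "Professor"), ("E", "TeachesIn")),      # Professor(X) => exists Y: TeachesIn(X,Y)
    (("E", "ResponsibleOf"), ("C", "Professor")),  # ResponsibleOf(X,Y) => Professor(X)
]

def normalize(head, atoms):
    """Replace variables that occur only once and are not the head by ANON."""
    counts = {}
    for _, args in atoms:
        for v in args:
            counts[v] = counts.get(v, 0) + 1
    def norm(v):
        return v if v == head or (v != ANON and counts[v] > 1) else ANON
    return frozenset((p, tuple(norm(v) for v in args)) for p, args in atoms)

def build(side, x):
    """Atom stating that x belongs to the basic concept `side`."""
    kind, name = side
    if kind == "C":
        return (name, (x,))
    if kind == "E":
        return (name, (x, ANON))
    return (name, (ANON, x))  # "Einv"

def rewrite_atom(atom, lhs, rhs):
    """Apply the inclusion lhs => rhs backwards to `atom`; None if not applicable."""
    pred, args = atom
    kind, name = rhs
    if name != pred:
        return None
    if kind == "R" and len(args) == 2:
        return (lhs[1], args)
    if kind == "C" and len(args) == 1:
        return build(lhs, args[0])
    if kind == "E" and len(args) == 2 and args[1] == ANON:
        return build(lhs, args[0])
    if kind == "Einv" and len(args) == 2 and args[0] == ANON:
        return build(lhs, args[1])
    return None

def rewrite_atoms_only(head, body, tbox):
    """Collect all CQs reachable by atom rewriting (no unification step)."""
    start = normalize(head, body)
    result, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for atom in q:
            for lhs, rhs in tbox:
                new_atom = rewrite_atom(atom, lhs, rhs)
                if new_atom is None:
                    continue
                q_new = normalize(head, (q - {atom}) | {new_atom})
                if q_new not in result:
                    result.add(q_new)
                    todo.append(q_new)
    return result

rewritings = rewrite_atoms_only("x", [("TeachesIn", ("x", "y"))], TBOX)
print(len(rewritings))  # 3
```

For the query $q(x) \leftarrow \text{TeachesIn}(x, y)$ the sketch produces three queries: the original with $y$ anonymized, $\text{ResponsibleOf}(x, \_)$ and $\text{Professor}(x)$. The existential rule applies only because the second argument of `TeachesIn` is unbound.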
Reference:
- Web Data Management book, section 9.4
- "Answering queries through DL-LITE ontologies"
- 9.4.3 Answer set evaluation
- PerfectRef algorithm - page 170
OBDA Architecture
There are 3 main components
- Ontology - unified conceptual view of managed information
- Data Sources - external, possibly heterogeneous
- Mappings - map data from DS to ontology
Formalization
An OBDA system is $O = \langle T, S, M \rangle$ where
- $T$ - is a DL TBox
- $S$ - (federated) database that represents the sources
- $M$ - mapping assertions
- each of the form $\Phi(\vec{x}) \mapsto \Psi(\vec{x})$
- $\Phi(\vec{x})$ - FOL query over $S$, returns facts - values for $\vec{x}$
- $\Psi(\vec{x})$ - FOL query over $T$
- so the mappings in $M$ translate queries over $S$ into queries over $T$
Mappings
Mappings set $M$
- $M$ is crucial in OBDA
- it encodes how to use data from $S$ to populate elements of $T$
Mappings:
- each mapping $m \in M$ of the form $m: \Phi(\vec{x}) \mapsto \Psi(\vec{x})$
- $\Phi(\vec{x})$ - FOL query over $S$, returns facts - values for $\vec{x}$
- $\Psi(\vec{x})$ - FOL query over $T$
- so the mappings in $M$ translate queries over $S$ into queries over $T$
Virtual Data Layer (VDL) - virtual ABox
- $S$ and $M$ define a VDL $V = M(S)$
- so, queries are answered using $T$ and $V$
- but we don't materialize data in $V$ - it's virtual
- and information in $T$ and $M$ is used to translate queries over $T$ into queries over $S$
- queries over $V$ are answered in the same way as over an ABox: reformulation first, then execution
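A small Python sketch with the standard `sqlite3` module shows the idea of a virtual ABox. The source schema (`staff`, `teaching`), the data and the `MAPPINGS` dictionary are all hypothetical. Each mapping pairs an ontology predicate ($\Psi$) with a SQL query over the source ($\Phi$), and an ontology query is answered by unfolding it into SQL over $S$ instead of materializing $V$.

```python
import sqlite3

# Hypothetical source database S.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff(id INTEGER, name TEXT, rank TEXT);
    CREATE TABLE teaching(staff_id INTEGER, course TEXT);
    INSERT INTO staff VALUES (1, 'ann', 'prof'), (2, 'bob', 'lecturer');
    INSERT INTO teaching VALUES (1, 'db'), (2, 'logic');
""")

# Hypothetical mappings M: ontology predicate <- SQL query over S.
MAPPINGS = {
    "Professor": "SELECT name AS x FROM staff WHERE rank = 'prof'",
    "TeachesIn": (
        "SELECT s.name AS x, t.course AS y "
        "FROM staff s JOIN teaching t ON t.staff_id = s.id"
    ),
}

def virtual_facts(predicate):
    """Facts of one predicate in V = M(S), computed on demand, never stored."""
    return set(conn.execute(MAPPINGS[predicate]).fetchall())

# Unfold q(x, y) <- TeachesIn(x, y), Professor(x) into one SQL query over S.
unfolded = (
    f"SELECT ti.x, ti.y FROM ({MAPPINGS['TeachesIn']}) AS ti "
    f"JOIN ({MAPPINGS['Professor']}) AS p ON p.x = ti.x"
)

print(virtual_facts("Professor"))         # {('ann',)}
print(conn.execute(unfolded).fetchall())  # [('ann', 'db')]
```

The join on the shared variable $x$ becomes a SQL join between the two mapped subqueries, so the source database does all the work.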
OBDA in Practice
ONTOP
ONTOP: http://ontop.inf.unibz.it
ONTOP:
- Translates SPARQL to SQL
- Can work as a SPARQL endpoint
- Quest [1] is a component that does the translation
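Source
- Web Data Management (book)
- XML and Web Technologies (UFRT)
- Semantic Web for the Working Ontologist (book)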