Data integration

  • suppose we have a distributed database across many servers
  • each row is some entity, a column represents some property of this entity, and the cell contains a value described by this property
  • inside a cell we can refer to another entity, and the meaning of the relationship is described by the name of the column
  • so each cell of this database can be seen as a triple (row, column, value)
    • row = resource/subject
    • column = predicate
    • value = object
  • since the database is distributed, how do we know whether a resource on one server is the same as a resource on another?
    • describe resources with a global ID: a URI (uniform resource identifier)
  • this is the main idea of RDF
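The cell-to-triple mapping above can be sketched in a few lines of Python. This is only an illustration: the base URI `http://example.org/`, the table contents, and the helper `cells_to_triples` are all made up for the example and are not part of RDF itself.

```python
# Hypothetical base URI and table contents, invented for this sketch.
BASE = "http://example.org/"

table = {
    "person1": {"name": "Alice", "knows": "person2"},
    "person2": {"name": "Bob"},
}

def cells_to_triples(rows, base=BASE):
    """Turn every (row, column, value) cell into a (subject, predicate, object) triple."""
    triples = []
    for row_id, columns in rows.items():
        for column, value in columns.items():
            # A value that names another row is a reference to that entity,
            # so it becomes a URI; anything else stays a plain literal.
            obj = base + value if value in rows else value
            triples.append((base + row_id, base + column, obj))
    return triples

triples = cells_to_triples(table)
for triple in triples:
    print(triple)
```

Because subjects and predicates are global URIs rather than local row IDs, two servers that use the same URI are talking about the same resource.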


RDF - resource description framework, a way to represent knowledge for the Semantic Web

  • knowledge representation based on triples $\langle \text{subject}, \ \text{predicate}, \ \text{object} \rangle$
  • the triples can form a graph
    • nodes - resources
    • edges - predicates
    • both represented with URIs

Description Logic

  • there's a strong link between RDF and logic
  • a set of RDF triples can be interpreted as a conjunction of positive literals

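As a small illustration of this reading, each triple $\langle s, p, o \rangle$ can be viewed as a binary atom $p(s, o)$. Taking the two triples $\langle \text{doc.html}, \text{isWrittenBy}, \text{fabien} \rangle$ and $\langle \text{doc.html}, \text{about}, \text{music} \rangle$ as an assumed example, the set would correspond to the formula

$$\text{isWrittenBy}(\text{doc.html}, \text{fabien}) \ \land \ \text{about}(\text{doc.html}, \text{music})$$

There is no negation and no disjunction: every triple contributes one positive atom to the conjunction.
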

one word can have several meanings

  • e.g. Washington - state, city, person
  • how to tell them apart?
  • use namespaces

namespaces are typically URIs (like in XML)

Default namespaces in RDF

  • xsd: for primitive XML types
  • rdf: for default things in rdf
  • rdfs: for RDFS
  • owl: for OWL
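
A prefix is just shorthand for a namespace URI, so a prefixed name expands to a full URI by concatenation. The sketch below uses the standard W3C namespace URIs for the four prefixes above; the `geo:` and `people:` namespaces and the `expand` helper are hypothetical and only serve the "Washington" example.

```python
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # Hypothetical namespaces, invented for this sketch.
    "geo": "http://example.org/geo#",
    "people": "http://example.org/people#",
}

def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'rdf:type' into a full URI."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local

print(expand("rdf:type"))
# The same local name in two namespaces yields two distinct resources.
print(expand("geo:Washington"))
print(expand("people:Washington"))
```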


Example 1

  • suppose we have these statements
    • doc.html is written by Fabien
    • doc.html is about music
  • so we have these triples
    • doc.html isWrittenBy fabien
    • doc.html about music
  • it can be represented by the following graph
    • rdf-ex1.png
    • every edge in this graph is an RDF triple
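
The correspondence between triples and graph edges can be sketched as an adjacency list. This is a minimal illustration using the short names from the example instead of full URIs; `build_graph` is a hypothetical helper, not part of any RDF library.

```python
from collections import defaultdict

triples = [
    ("doc.html", "isWrittenBy", "fabien"),
    ("doc.html", "about", "music"),
]

def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

graph = build_graph(triples)
for predicate, obj in graph["doc.html"]:
    # One labelled edge per triple.
    print(f"doc.html --{predicate}--> {obj}")
```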

Example 2: Modeling with RDF

Types and Properties

The rdf:type predicate provides a basic typing system

Blank Nodes

RDF allows resources to have no id at all

  • Sometimes we know that something exists
  • And even know something about it
  • but don't know its identity

For example,

  • we know that Shakespeare had a mistress, but we don't know who she was
  • and that she was the source of inspiration for one of his works
  • we can try to model this as follows
"unknown" rdf:type bio:Woman
"unknown" bio:livedIn geo:England
lit:Sonnet79 lit:hasInspiration "unknown"

We should interpret it as

  • there exists a woman who lived in England and is the source of inspiration for "Sonnet 79"
  • so blank nodes are interpreted as existentially quantified variables

In Turtle it's

  • lit:Sonnet79 lit:hasInspiration [a bio:Woman; bio:livedIn geo:England] .
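
The bracket shorthand stands for the same three triples as before, with the "unknown" placeholder replaced by one shared blank node. A minimal sketch of that expansion, assuming labels of the form `_:b0` (the `_:` prefix is how blank nodes are written in N-Triples and Turtle; the helpers `fresh_blank_node` and `is_blank` are hypothetical):

```python
import itertools

_counter = itertools.count()

def fresh_blank_node():
    """Return a new blank node label; it is only meaningful inside this graph."""
    return f"_:b{next(_counter)}"

def is_blank(term):
    return term.startswith("_:")

woman = fresh_blank_node()
triples = [
    (woman, "rdf:type", "bio:Woman"),
    (woman, "bio:livedIn", "geo:England"),
    ("lit:Sonnet79", "lit:hasInspiration", woman),
]

for s, p, o in triples:
    print(s, p, o, ".")
```

The label carries no identity outside the graph: renaming `_:b0` consistently to any other unused label leaves the meaning unchanged, which is what the existential reading requires.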

Semantic Web

RDF is a basis for the Semantic Web

  • RDFS is a schema language for RDF that allows some basic inference
  • RDFS-Plus is an extension of RDFS and a subset of OWL
  • OWL - Web Ontology Language

All of them use RDF to express the language constructs


  • SPARQL is used for querying RDF graphs

RDF Serialization

The default is plain triples, which are neither compact nor user-friendly

  • rdf-tripples.png
  • need different representation

There are several:

See Also