XML Schema

XML Schema (XSD) is a more expressive alternative to DTD for validating XML documents:

  • a schema is itself an XML document
  • it can express complex types
  • it supports namespaces

Types

Predefined Primitives

XML Schema predefines many primitive types, for example:

Type            Example of Value
string          any Unicode string
boolean         true, false, 1, 0
decimal         3.1415
float           6.02214199E23
double          42E970
dateTime        2004-09-26T16:29:00-05:00
time            16:29:00-05:00
date            2004-09-26
hexBinary       48656c6c6f0a
base64Binary    SGVsbG8K
anyURI          http://0agr.ru/wiki/
QName           rcp:recipe, recipe

A list type holds a whitespace-separated sequence of values of its item type:

<simpleType name="integerList">
  <list itemType="integer"/>
</simpleType>
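
As a rough sketch of how such a list type might be used, the schema below repeats the definition with the xs: prefix (the style of the Solar System example) and declares an element of that type. The element name scores is a hypothetical choice for illustration, not part of the original.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Same list type as above, written with the xs: prefix -->
  <xs:simpleType name="integerList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>

  <!-- Hypothetical element whose content is a whitespace-separated list of integers -->
  <xs:element name="scores" type="integerList"/>

</xs:schema>

<!-- A conforming instance would look like: <scores>10 20 30</scores> -->
```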

Simple Types: Constraints

We can also add constraints to simple types using so-called facets:

  • length
  • minLength
  • maxLength
  • pattern
  • enumeration
  • whiteSpace

  • maxInclusive
  • maxExclusive
  • minInclusive
  • minExclusive
  • totalDigits
  • fractionDigits
<simpleType name="score_from_0_to_100">
  <restriction base="integer">
    <minInclusive value="0"/>
    <maxInclusive value="100"/>
  </restriction>
</simpleType>

<simpleType name="percentage">
  <restriction base="string">
    <pattern value="([0-9]|[1-9][0-9]|100)%"/>
  </restriction>
</simpleType>

As the percentage type shows, the pattern facet lets us constrain values with regular expressions.
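
The other facets are used in the same way, inside a restriction. A minimal sketch, assuming two hypothetical type names (planet_kind and short_code are illustrative only; the values telluric and gaseous are borrowed from the Solar System example):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- enumeration: only the listed values are allowed -->
  <xs:simpleType name="planet_kind">
    <xs:restriction base="xs:string">
      <xs:enumeration value="telluric"/>
      <xs:enumeration value="gaseous"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- minLength / maxLength: a string of 2 to 5 characters -->
  <xs:simpleType name="short_code">
    <xs:restriction base="xs:string">
      <xs:minLength value="2"/>
      <xs:maxLength value="5"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```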

Derived Simple Types

We can derive a simple type as the union of two (or more) simple types:

<simpleType name="boolean_or_decimal">
  <union>
    <simpleType>
      <restriction base="boolean"/>
    </simpleType>
    <simpleType>
      <restriction base="decimal"/>
    </simpleType>
  </union>
</simpleType>

There are also built-in derived types, for example:

  • normalizedString
  • unsignedLong
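
The built-in derived types are themselves restrictions of other types. As an illustrative sketch, the two hypothetical types below (the my_ names are assumptions made for this example) approximate how normalizedString and unsignedLong are derived:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type mirroring normalizedString: tabs and line breaks become spaces -->
  <xs:simpleType name="my_normalized_string">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="replace"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type mirroring unsignedLong: 0 .. 2^64 - 1 -->
  <xs:simpleType name="my_unsigned_long">
    <xs:restriction base="xs:nonNegativeInteger">
      <xs:maxInclusive value="18446744073709551615"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```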

Complex Types

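Elements:

  • declared with a name and a type: <element name="..." type="..."/>
  • or as a reference to an already defined element: <element ref="..."/>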

Content models work like regular expressions over the sequence of child elements:

Regular expression    XML Schema
Concatenation         <sequence> ... </sequence>
Union                 <choice> ... </choice>
All                   <all> ... </all>
Element wildcard      <any/>
?                     minOccurs="0" maxOccurs="1"
+                     minOccurs="1" maxOccurs="unbounded"
*                     minOccurs="0" maxOccurs="unbounded"
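
To make the correspondence concrete, here is a small sketch of a content model that reads roughly as title, author+, (isbn | issn)?, note*. All element names are hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="publication">
    <xs:complexType>
      <!-- sequence = concatenation: children must appear in this order -->
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- "+" : one or more authors (minOccurs defaults to 1) -->
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <!-- "?" applied to a union: at most one of isbn or issn -->
        <xs:choice minOccurs="0">
          <xs:element name="isbn" type="xs:string"/>
          <xs:element name="issn" type="xs:string"/>
        </xs:choice>
        <!-- "*" : any number of notes -->
        <xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```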

Attributes

  • declared much like elements: <attribute name="..." type="..."/>
  • can also refer to already defined attributes: <attribute ref="..."/>
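
A minimal sketch of attribute declarations, assuming hypothetical names (book, lang, id, draft): one attribute is declared globally and referenced, the others are declared in place, with use and default showing how to make an attribute required or give it a fallback value.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Globally declared attribute, so that it can be referenced by name -->
  <xs:attribute name="lang" type="xs:language"/>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <!-- Attributes are listed after the content model -->
      <xs:attribute ref="lang"/>
      <xs:attribute name="id" type="xs:ID" use="required"/>
      <xs:attribute name="draft" type="xs:boolean" default="false"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```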

Extensions

Suppose we want an integer value that also carries an attribute

  • we can extend integer via simpleContent and add the attribute
<complexType name="category"> 
  <simpleContent> 
    <extension base="integer"> 
      <attribute ref="r:class"/>
    </extension> 
  </simpleContent> 
</complexType>

In the same way we can extend the new type once more

  • and add another attribute
<complexType name="extended_category"> 
  <simpleContent> 
    <extension base="n:category"> 
      <attribute ref="r:kind"/> 
    </extension> 
  </simpleContent> 
</complexType>

We can also restrict an existing type

  • a restriction narrows the base type instead of adding to it: here the value is limited to three digits and the inherited class attribute becomes required
<complexType name="restricted_category"> 
  <simpleContent> 
    <restriction base="n:category"> 
      <totalDigits value="3"/> 
      <attribute ref="r:class" use="required"/> 
    </restriction> 
  </simpleContent> 
</complexType>

Local vs Global Declaration

Usually we define an element at the top level of the schema and reference it where needed

  • this is called a "global declaration"

However, it is also possible to declare elements locally, inside an anonymous type (a "local declaration"):

<element name="card"> 
  <complexType> 
    <sequence> 
      <element name="name" type="string"/> 
      ... 
    </sequence> 
  </complexType> 
</element>

Note that local declarations can cause namespace problems

  • we usually expect locally declared elements to belong to the target namespace, but by default they do not
  • they are "unqualified" by default (no namespace is associated with them)
  • such elements are scoped to the element they are declared in
  • to place them in the target namespace, set elementFormDefault="qualified" on the schema element
  • see also: http://www.w3.org/TR/xmlschema-0/#NS
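
The effect of elementFormDefault can be sketched with a small schema. The namespace URI http://example.org/cards and the sample value are assumptions made for this illustration; the trailing comment shows how instances would differ under each setting.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/cards"
           elementFormDefault="qualified">

  <xs:element name="card">
    <xs:complexType>
      <xs:sequence>
        <!-- Local element: in the target namespace only because of elementFormDefault="qualified" -->
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

<!-- With "qualified", both elements are in the namespace:
       <card xmlns="http://example.org/cards"><name>John Doe</name></card>
     With the default "unqualified", name would have to be in no namespace:
       <c:card xmlns:c="http://example.org/cards"><name>John Doe</name></c:card> -->
```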

Examples

Solar System

Consider the following XML:

<solar_system>
  <star>
    <name>Sun</name>
    <spectral_type>G2</spectral_type>
    <age unit="billions years">5</age>
  </star>
  <planet type="telluric">
    <name>Earth</name>
    <distance unit="km">149600000</distance>
    <mass unit="kg">5.98e24</mass>
    <diameter unit="km">12756</diameter>
    <satellite number="1"/>
  </planet>
  <planet ring="yes" type="gaseous">
    <name>Saturn</name>
    <distance unit="UA">5.2</distance>
    <mass unit="Earth mass">95</mass>
    <diameter unit="Earth diameter">9.4</diameter>
    <satellite number="18"/>
  </planet>
  <planet ring="yes" type="gaseous">
    <name>Uranus</name>
    <distance unit="UA">19.2</distance>
    <mass unit="Earth mass">14.5</mass>
    <diameter unit="Earth diameter">4</diameter>
    <satellite number="15"/>
  </planet>
</solar_system>

Here’s the XML schema for it

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" 
  targetNamespace="http://foo.fr/solar_system" xmlns:s="http://foo.fr/solar_system">

  <xs:element name="solar_system">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="s:star"/>
        <xs:element maxOccurs="unbounded" ref="s:planet"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="star">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:NCName"/>
        <xs:element name="spectral_type" type="xs:NCName"/>
        <xs:element name="age" type="s:measure_type"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="planet">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:NCName"/>
        <xs:element name="distance" type="s:measure_type"/>
        <xs:element name="mass" type="s:measure_type"/>
        <xs:element name="diameter" type="s:measure_type"/>
        <xs:element ref="s:satellite"/>
      </xs:sequence>
      <xs:attribute name="ring" type="xs:NCName"/>
      <xs:attribute name="type" use="required" type="xs:NCName"/>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="measure_type">
    <xs:simpleContent>
      <xs:extension base="xs:float">
        <!-- xs:string rather than xs:NCName: unit values such as "Earth mass" contain spaces -->
        <xs:attribute name="unit" use="required" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="satellite">
    <xs:complexType>
      <xs:attribute name="number" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  
</xs:schema>
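
Because this schema declares a targetNamespace with elementFormDefault="qualified", a validator will expect the instance elements to be in that namespace. A minimal sketch of how the root element might declare it, reusing the namespace URI from the schema (the children are elided here and stay as in the instance document):

```xml
<!-- The default namespace matches the schema's targetNamespace -->
<solar_system xmlns="http://foo.fr/solar_system">
  <!-- star and planet children exactly as in the instance document -->
</solar_system>
```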

Note the usage of xs:NCName

  • it is like string, but the value must be a valid XML name without a colon, so whitespace is not allowed
  • see more here: http://stackoverflow.com/questions/1631396/what-is-an-xsncname-type-and-when-should-it-be-used
