DTD

DTD, or Document Type Definition, is a language for defining schemas for XML documents.

  • it is used to validate the content of XML documents
  • it uses regular expressions for validation
  • it is an integral part of the XML specification

Regular Expression

The following regular expression defines the valid content of the table element in XHTML:

  • caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )
  • the shortest words that match it are a single tbody or a single tr
  • tbody tr does not match, because tbody+ and tr+ are alternatives
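
The expression can be tried out as an ordinary regular expression over the sequence of child tag names. Below is a minimal Python sketch. It assumes every tag name is followed by one space, so that col cannot match a prefix of colgroup. The names TABLE_CONTENT and is_valid_table_content are made up for this illustration.

```python
import re

# caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )
# Every tag name is followed by one space, so "col " never matches "colgroup ".
TABLE_CONTENT = re.compile(
    r"(?:caption )?"
    r"(?:(?:col )*|(?:colgroup )*)"
    r"(?:thead )?"
    r"(?:tfoot )?"
    r"(?:(?:tbody )+|(?:tr )+)"
)


def is_valid_table_content(children):
    """Return True if the list of child tag names matches the expression."""
    word = "".join(name + " " for name in children)
    return TABLE_CONTENT.fullmatch(word) is not None


print(is_valid_table_content(["tbody"]))                    # True
print(is_valid_table_content(["tr", "tr"]))                 # True
print(is_valid_table_content(["caption", "col", "tbody"]))  # True
print(is_valid_table_content(["tbody", "tr"]))              # False
print(is_valid_table_content([]))                           # False
```

The last two calls fail because tbody+ and tr+ are alternatives and at least one of them has to be present.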

Schema Definition

Declaration

To declare a schema

  • add <!DOCTYPE {name} SYSTEM "http://{uri}.dtd"> to the beginning of an XML document, right after the XML declaration
  • {name} is the name of the root element

    <?xml version="1.1"?>
    <!DOCTYPE collection SYSTEM "http://foo.fr/example.dtd">
    <collection>
      ...
    </collection>
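
What a parser sees in such a declaration can be inspected with Python's standard xml.dom.minidom, which reads the DOCTYPE but does not validate the document against the DTD. This is only a sketch: the document reuses the example URL from above and declares XML version 1.0, an assumption made here for simplicity.

```python
from xml.dom import minidom

# Same shape as the declaration above; version 1.0 is an assumption of this sketch.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE collection SYSTEM "http://foo.fr/example.dtd">
<collection>
</collection>
"""

doc = minidom.parseString(DOCUMENT)

# The DOCTYPE names the root element and points to the external DTD.
print(doc.doctype.name)             # collection
print(doc.doctype.systemId)         # http://foo.fr/example.dtd
print(doc.documentElement.tagName)  # collection
```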

ELEMENT

The declaration for elements (the tags in our XML documents) has the following form:

  • <!ELEMENT {name} {content_model}>
  • name - the name of a tag from the document
  • content_model - EMPTY, ANY, or one of the following:
      • (#PCDATA) - "parsed character data": the content is text only, with no child elements
      • a regular expression over the tag names

Example:

  • <!ELEMENT table (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))> - a comma (,) expresses concatenation, a pipe (|) expresses union
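
To see how a parser reads the comma and the pipe, the declaration can be placed in an internal DTD subset and handed to Python's standard xml.parsers.expat module, which reports each ELEMENT declaration through ElementDeclHandler. This is a sketch only: the surrounding document and the helper describe are made up for this illustration, and expat parses the declaration without validating the document against it.

```python
import xml.parsers.expat as expat

# The declaration from above, wrapped in a tiny document (illustrative only).
DOCUMENT = """<!DOCTYPE table [
<!ELEMENT table (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))>
]>
<table><tr/></table>
"""

QUANTIFIER = {
    expat.model.XML_CQUANT_NONE: "",
    expat.model.XML_CQUANT_OPT: "?",
    expat.model.XML_CQUANT_REP: "*",
    expat.model.XML_CQUANT_PLUS: "+",
}
GROUP = {
    expat.model.XML_CTYPE_SEQ: "concatenation",
    expat.model.XML_CTYPE_CHOICE: "union",
}


def describe(model, indent=0):
    """Return indented lines describing an expat content model tuple."""
    kind, quant, name, children = model
    pad = "  " * indent
    if kind == expat.model.XML_CTYPE_NAME:
        return [pad + name + QUANTIFIER[quant]]
    lines = [pad + GROUP.get(kind, "other") + QUANTIFIER[quant]]
    for child in children:
        lines.extend(describe(child, indent + 1))
    return lines


declarations = {}


def on_element_decl(name, model):
    # Called once for every <!ELEMENT ...> declaration.
    declarations[name] = model


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOCUMENT, True)

print("\n".join(describe(declarations["table"])))
```

The outer comma-separated group is reported as a concatenation, and the two groups written with a pipe, (col*|colgroup*) and (tbody+|tr+), are reported as unions nested inside it.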

ATTLIST

We also need to declare the attributes of elements.

  • syntax: <!ATTLIST {tag} {attribute} {type} {#REQUIRED|#IMPLIED}>

```xml
<!ATTLIST input maxlength CDATA #REQUIRED
                tabindex  CDATA #IMPLIED>
```

  • #IMPLIED = optional, #REQUIRED = not optional
  • CDATA: any value

```xml
<!ATTLIST p align (left|center|right|justify) #IMPLIED>
```

  • here we enumerate the possible values of the attribute
  • for optional attributes we can also specify a default value

```xml
<!ATTLIST form method (get|post) "get">
```

  • the default value of this attribute is <code>get</code>
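
The three kinds of attribute declaration can be compared by letting a parser report them. The sketch below uses AttlistDeclHandler from Python's standard xml.parsers.expat module. The wrapper document around the declarations is an assumption made for this illustration.

```python
import xml.parsers.expat as expat

# The ATTLIST declarations from above inside an internal DTD subset.
DOCUMENT = """<!DOCTYPE form [
<!ATTLIST input maxlength CDATA #REQUIRED
                tabindex  CDATA #IMPLIED>
<!ATTLIST p align (left|center|right|justify) #IMPLIED>
<!ATTLIST form method (get|post) "get">
]>
<form/>
"""

attributes = []


def on_attlist_decl(element, attribute, att_type, default, required):
    # Called once per declared attribute, even when one ATTLIST declares several.
    attributes.append((element, attribute, att_type, default, bool(required)))


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOCUMENT, True)

for element, attribute, att_type, default, required in attributes:
    print(element, attribute, att_type, "default:", default, "required:", required)
```

Each attribute is reported separately: maxlength is the only required one, and method is the only one that carries a default value.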

## Example
### Solar System
Consider the following XML:

```xml
<solar_system>
  <star>
    <name>Sun</name>
    <spectral_type>G2</spectral_type>
    <age unit="billions years">5</age>
  </star>
  <planet type="telluric">
    <name>Earth</name>
    <distance unit="km">149600000</distance>
    <mass unit="kg">5.98e24</mass>
    <diameter unit="km">12756</diameter>
    <satellite number="1"/>
  </planet>
  <planet ring="yes" type="gaseous">
    <name>Saturn</name>
    <distance unit="UA">5.2</distance>
    <mass unit="Earth mass">95</mass>
    <diameter unit="Earth diameter">9.4</diameter>
    <satellite number="18"/>
  </planet>
  <planet ring="yes" type="gaseous">
    <name>Uranus</name>
    <distance unit="UA">19.2</distance>
    <mass unit="Earth mass">14.5</mass>
    <diameter unit="Earth diameter">4</diameter>
    <satellite number="15"/>
  </planet>
</solar_system>
```

The DTD schema for this example is:

```xml
<!ELEMENT solar_system (star,planet+)>

<!ELEMENT star (name,spectral_type,age)>

<!ELEMENT name (#PCDATA)>
<!ELEMENT spectral_type (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ATTLIST age unit CDATA #REQUIRED>

<!ELEMENT planet (name,distance,mass,diameter,satellite?)>
<!ATTLIST planet ring CDATA #IMPLIED>
<!ATTLIST planet type CDATA #REQUIRED>

<!ELEMENT distance (#PCDATA)>
<!ATTLIST distance unit CDATA #REQUIRED>

<!ELEMENT mass (#PCDATA)>
<!ATTLIST mass unit CDATA #REQUIRED>

<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit CDATA #REQUIRED>

<!ELEMENT satellite EMPTY>
<!ATTLIST satellite number CDATA #REQUIRED>
```
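
The element part of this schema can be checked by hand with a few lines of Python. The sketch below is not a real DTD validator: it only checks the order of child elements, ignores attributes and text content, and the names CONTENT_MODELS, to_regex and violations are made up for this illustration. Each content model is translated into a regular expression over the child tag names, each followed by one space.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied from the ELEMENT declarations of the DTD above.
CONTENT_MODELS = {
    "solar_system": "(star,planet+)",
    "star": "(name,spectral_type,age)",
    "planet": "(name,distance,mass,diameter,satellite?)",
    "name": "(#PCDATA)",
    "spectral_type": "(#PCDATA)",
    "age": "(#PCDATA)",
    "distance": "(#PCDATA)",
    "mass": "(#PCDATA)",
    "diameter": "(#PCDATA)",
    "satellite": "EMPTY",
}


def to_regex(model):
    """Translate a content model into a regex over space-terminated child names."""
    if model in ("EMPTY", "(#PCDATA)"):
        return re.compile("")  # no child elements allowed (text is not checked)

    def translate(match):
        token = match.group(0)
        if token == "(":
            return "(?:"          # plain grouping
        if token == ",":
            return ""             # comma = concatenation
        return f"(?:{token} )"    # tag name followed by one space

    body = re.sub(r"\s+", "", model)
    return re.compile(re.sub(r"\(|,|[A-Za-z_][\w.-]*", translate, body))


def violations(xml_text):
    """Return messages for elements whose children break their content model."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        model = CONTENT_MODELS.get(element.tag)
        children = "".join(child.tag + " " for child in element)
        if model is None:
            problems.append(f"{element.tag}: not declared")
        elif to_regex(model).fullmatch(children) is None:
            problems.append(f"{element.tag}: unexpected children '{children.strip()}'")
    return problems


GOOD = """<solar_system>
  <star><name>Sun</name><spectral_type>G2</spectral_type>
        <age unit="billions years">5</age></star>
  <planet type="telluric"><name>Earth</name><distance unit="km">149600000</distance>
          <mass unit="kg">5.98e24</mass><diameter unit="km">12756</diameter>
          <satellite number="1"/></planet>
</solar_system>"""

BAD = """<solar_system>
  <planet type="telluric"><name>Earth</name></planet>
</solar_system>"""

print(violations(GOOD))  # []
print(violations(BAD))
# ["solar_system: unexpected children 'planet'", "planet: unexpected children 'name'"]
```

Because CONTENT_MODELS is keyed by tag name, each name can have only one content model, exactly as in a DTD.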
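Note the limitation:
- we cannot have two different <code>name</code> elements
- i.e. there can be only one definition per element name

For example, a DTD cannot give <code>name</code> a <code>language</code> attribute inside <code>planet</code> while forbidding it inside <code>star</code>: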

```xml
<planet>
  <name language="English">Earth</name>
</planet>

<star>
  <name>Sun</name>
</star>
```

Limitations

  • Specification of attribute values is too limited
  • Element and attribute declarations are context insensitive
  • Character data cannot be combined with the regular expression content model
  • It does not itself use an XML syntax
  • No support for namespaces

XML Schema

  • more expressive
  • uses XML syntax itself
  • supports namespaces
  • many other things

Sources

  • http://en.wikipedia.org/wiki/Document_type_definition