## DTD
DTD, or Document Type Definition, is a language for defining schemas for XML documents
- it is used to validate the content of XML documents
- it uses regular expressions over element names to describe valid content
- it is an integral part of the XML specification
### Regular Expression
The following defines the valid content of the `table` element in XHTML:
- `caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )`
- with this expression, the shortest words that match it are a single `tbody` or a single `tr`, since every other part is optional
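
The expression above can be checked mechanically by treating the sequence of child tag names as a word. A minimal Python sketch, assuming each name is followed by a space so that names behave as atomic symbols (`TABLE_MODEL` and `matches_table_model` are illustrative names, not part of any library):

```python
import re

# Content model of <table>, written as a regex over child tag names.
# Every tag name is followed by a space, so "col " can never match
# the beginning of "colgroup ".
TABLE_MODEL = re.compile(
    r"(caption )?"
    r"((col )*|(colgroup )*)"
    r"(thead )?"
    r"(tfoot )?"
    r"((tbody )+|(tr )+)"
)

def matches_table_model(children):
    """Return True if the sequence of child tag names satisfies the model."""
    word = "".join(name + " " for name in children)
    return TABLE_MODEL.fullmatch(word) is not None
```

With this sketch, `matches_table_model(["tr"])` and `matches_table_model(["tbody"])` both hold, while `["tbody", "tr"]` is rejected because the last group is a choice between `tbody+` and `tr+`, not a sequence.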
## Schema Definition
### Declaration
To declare the schema of a document:
- add `<!DOCTYPE {name} SYSTEM "http://{uri}.dtd">` to the beginning of the XML document

```xml
<?xml version="1.1"?>
<!DOCTYPE collection SYSTEM "http://foo.fr/example.dtd">
…
</collection>
```
### ELEMENT
The declaration for elements (the tags in our XML documents) is the following:
- `<!ELEMENT {name} {content_mode}>`, where
  - `name` is the name of some tag from the document
  - `content_mode` is one of
    - `EMPTY` (no content) or `ANY` (any content)
    - `#PCDATA`: "parsed character data", only one text element is allowed in the content
    - some regular expression over the tag names
Example:
- `<!ELEMENT table (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))>`
- comma (`,`) is used to express concatenation, pipe (`|`) to express union
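
Since `,` and `|` map directly onto regex concatenation and alternation, an element content model can be translated into an ordinary regular expression. The sketch below shows one possible translation in Python. `content_model_to_regex` and `children_match` are hypothetical helpers, the name pattern is a simplification of real XML names, and the sketch assumes a pure element content model (no `#PCDATA`, `EMPTY` or `ANY`).

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")  # simplified pattern for element names

def content_model_to_regex(model):
    """Translate a DTD element content model into a compiled regex.

    The regex runs over a string of child tag names, each followed by a space.
    """
    compact = re.sub(r"\s+", "", model)  # drop whitespace between tokens
    # Wrap each name in a group so that ?, * and + apply to the whole name.
    grouped = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # "," means concatenation, which a regex expresses by juxtaposition.
    return re.compile(grouped.replace(",", ""))

def children_match(model, children):
    """Return True if the list of child tag names is a word of the model."""
    word = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(word) is not None

TABLE = "(caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))"
```

For instance, `children_match(TABLE, ["caption", "tbody"])` holds, while `children_match(TABLE, ["tbody", "tr"])` does not.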
### ATTLIST
We also need to declare the attributes of elements
- syntax: `<!ATTLIST {tag} {attribute} {type} {#REQUIRED|#IMPLIED}>`

```xml
<!ATTLIST input maxlength CDATA #REQUIRED
                tabindex CDATA #IMPLIED>
```
- `#IMPLIED` = optional, `#REQUIRED` = not optional
- `CDATA`: any value

```xml
<!ATTLIST p align (left|center|right|justify) #IMPLIED>
```
- here we enumerate the possible values of the attribute
- instead of `#REQUIRED` or `#IMPLIED`, an attribute can also be given a default value

```xml
<!ATTLIST form method (get|post) "get">
```
- the default value of this attribute is `get`
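
To make the meaning of `#REQUIRED`, `#IMPLIED`, enumerated values and defaults concrete, here is a rough Python sketch that applies the three declarations above to a single element. The `ATTLIST` dictionary and `check_attributes` are hypothetical and written by hand for illustration. The sketch does not read DTD syntax, it only mimics what a validating parser would conclude.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the ATTLIST declarations above.
# "values" is the set of allowed values, or None for CDATA (any value).
ATTLIST = {
    "input": {
        "maxlength": {"values": None, "required": True, "default": None},
        "tabindex": {"values": None, "required": False, "default": None},
    },
    "p": {
        "align": {"values": {"left", "center", "right", "justify"},
                  "required": False, "default": None},
    },
    "form": {
        "method": {"values": {"get", "post"}, "required": False, "default": "get"},
    },
}

def check_attributes(element):
    """Return (effective_attributes, errors) for a single element."""
    declared = ATTLIST.get(element.tag, {})
    effective = dict(element.attrib)
    errors = []
    for name, value in element.attrib.items():
        spec = declared.get(name)
        if spec is None:
            errors.append(f"attribute {name!r} is not declared")
        elif spec["values"] is not None and value not in spec["values"]:
            errors.append(f"attribute {name!r} has invalid value {value!r}")
    for name, spec in declared.items():
        if name in effective:
            continue
        if spec["required"]:
            errors.append(f"required attribute {name!r} is missing")
        elif spec["default"] is not None:
            effective[name] = spec["default"]  # fill in the declared default
    return effective, errors
```

Calling `check_attributes(ET.fromstring("<form/>"))` fills in `method="get"` and reports no errors, whereas `<input tabindex="1"/>` is reported as missing its required `maxlength`.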
## Example
### Solar System
Consider the following XML:
```xml
<solar_system>
<star>
<name>Sun</name>
<spectral_type>G2</spectral_type>
<age unit="billions years">5</age>
</star>
<planet type="telluric">
<name>Earth</name>
<distance unit="km">149600000</distance>
<mass unit="kg">5.98e24</mass>
<diameter unit="km">12756</diameter>
<satellite number="1"/>
</planet>
<planet ring="yes" type="gaseous">
<name>Saturn</name>
<distance unit="UA">5.2</distance>
<mass unit="Earth mass">95</mass>
<diameter unit="Earth diameter">9.4</diameter>
<satellite number="18"/>
</planet>
<planet ring="yes" type="gaseous">
<name>Uranus</name>
<distance unit="UA">19.2</distance>
<mass unit="Earth mass">14.5</mass>
<diameter unit="Earth diameter">4</diameter>
<satellite number="15"/>
</planet>
</solar_system>
```

The DTD schema for this example is:

```xml
<!ELEMENT solar_system (star,planet+)>

<!ELEMENT star (name,spectral_type,age)>

<!ELEMENT name (#PCDATA)>
<!ELEMENT spectral_type (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ATTLIST age unit CDATA #REQUIRED>

<!ELEMENT planet (name,distance,mass,diameter,satellite?)>
<!ATTLIST planet ring CDATA #IMPLIED>
<!ATTLIST planet type CDATA #REQUIRED>

<!ELEMENT distance (#PCDATA)>
<!ATTLIST distance unit CDATA #REQUIRED>

<!ELEMENT mass (#PCDATA)>
<!ATTLIST mass unit CDATA #REQUIRED>

<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit CDATA #REQUIRED>

<!ELEMENT satellite EMPTY>
<!ATTLIST satellite number CDATA #REQUIRED>
```
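
As an illustration of what validating against this schema involves, the following Python sketch walks a document and checks each element's children and required attributes. It is not a real DTD validator: the `CONTENT` and `REQUIRED` tables are translated by hand from the declarations, `validate` is a hypothetical helper, and undeclared elements, undeclared attributes and stray text are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the solar system DTD, hand-translated into regexes
# over child tag names (each name followed by a space). Elements not listed
# here are (#PCDATA) or EMPTY, so they may not contain child elements.
CONTENT = {
    "solar_system": r"star (planet )+",
    "star": r"name spectral_type age ",
    "planet": r"name distance mass diameter (satellite )?",
}

# Attributes declared #REQUIRED in the DTD.
REQUIRED = {
    "age": ["unit"],
    "planet": ["type"],
    "distance": ["unit"],
    "mass": ["unit"],
    "diameter": ["unit"],
    "satellite": ["number"],
}

def validate(xml_text):
    """Return a list of violations found in the document (empty if none)."""
    errors = []
    for element in ET.fromstring(xml_text).iter():
        word = "".join(child.tag + " " for child in element)
        if re.fullmatch(CONTENT.get(element.tag, ""), word) is None:
            errors.append(f"<{element.tag}>: invalid children [{word.strip()}]")
        for attribute in REQUIRED.get(element.tag, []):
            if attribute not in element.attrib:
                errors.append(f"<{element.tag}>: missing attribute {attribute!r}")
    return errors
```

Under these assumptions the solar system document above yields an empty list, and removing the `type` attribute from a `planet` yields a single "missing attribute" entry.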
Note the limitation:
- we cannot have two different declarations for the `name` element
- i.e. there can be only one definition per element name

For example, a DTD cannot express that `name` inside `planet` has a `language` attribute while `name` inside `star` has none:
```xml
<planet>
<name language="English">Earth</name>
</planet>
<star>
<name>Sun</name>
</star>
```

## Limitations
- Specification of attribute values is too limited
- Element and attribute declarations are context insensitive
- Character data cannot be combined with the regular expression content model
- It does not itself use an XML syntax
- No support for namespaces
## XML Schema
- is more expressive
- is itself written in XML
- supports namespaces
- many other things