XML Schemas

Introduction

We now know how to author an XML document.

Let's learn how to define the structure of an XML document.

A DTD (Document Type Definition) is a mechanism used to describe the structure of a document.

A DTD lays down the rules for what an XML document must look like: which elements may appear, in what order, and with which attributes. Using an XML document together with a DTD is sometimes called document modeling.

 

Element Type Declarations

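Use the <!ELEMENT> declaration. Its content specification is either a keyword (EMPTY or ANY) or a content model in parentheses:

<!ELEMENT elementName contentSpec>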

Example:

<!ELEMENT Name (FirstName, MiddleName, LastName)>

An element of type Name must contain three subelements in the order of FirstName, MiddleName and LastName.

-----

<!ELEMENT Name (FirstName, MiddleName?, LastName)>

MiddleName is optional: the ? after its name means zero or one occurrence.
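
As a small illustration, the document below satisfies this declaration. The DOCTYPE wrapper and the (#PCDATA) declarations for the three child elements are assumptions added only to make the sketch self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE Name [
  <!ELEMENT Name (FirstName, MiddleName?, LastName)>
  <!-- Assumed for this sketch: the children hold plain text. -->
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT MiddleName (#PCDATA)>
  <!ELEMENT LastName (#PCDATA)>
]>
<!-- Valid: MiddleName is omitted. Adding it between FirstName and
     LastName is also valid; placing it anywhere else is not. -->
<Name>
  <FirstName>Richard</FirstName>
  <LastName>Sinn</LastName>
</Name>
```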

-----

<!ELEMENT language (English | Chinese)>

An element of type language contains either a single English element or a single Chinese element.

-----

<!ELEMENT people (male | female)+>

A people element contains one or more male or female elements, in any order. Example:

<people>
<male> ... </male>
<male> ... </male>
<female> ... </female>
</people>

-----

<!ELEMENT printer (deskjet | laser)*>

A printer element contains zero or more deskjet or laser elements, in any order.
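
A possible set of instances is sketched below. The office wrapper element and the EMPTY declarations for deskjet and laser are assumptions made only so that the example forms one complete document.

```xml
<?xml version="1.0"?>
<!DOCTYPE office [
  <!-- office is a hypothetical wrapper so that several printer elements fit in one document. -->
  <!ELEMENT office (printer+)>
  <!ELEMENT printer (deskjet | laser)*>
  <!-- Assumed for this sketch: both printer kinds are empty elements. -->
  <!ELEMENT deskjet EMPTY>
  <!ELEMENT laser EMPTY>
]>
<office>
  <printer></printer>                               <!-- zero children -->
  <printer><laser/></printer>                       <!-- one child -->
  <printer><deskjet/><laser/><deskjet/></printer>   <!-- several, in any order -->
</office>
```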

-----

<!ELEMENT Fax EMPTY>

A Fax element has no content:

<Fax/>

-----

<!ELEMENT text (#PCDATA | picture)*>

#PCDATA - parsed character data. A text element contains text and picture elements mixed in any order (mixed content).
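
A minimal sketch of an instance with mixed content follows. The declaration of picture as an empty element with a src attribute, and the file names, are assumptions for illustration only.

```xml
<?xml version="1.0"?>
<!DOCTYPE text [
  <!ELEMENT text (#PCDATA | picture)*>
  <!-- Assumed for this sketch: picture is empty and names its file in a src attribute. -->
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src CDATA #REQUIRED>
]>
<text>The floor plan <picture src="plan.png"/> appears before the photo <picture src="photo.png"/> of the building.</text>
```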

-----

<!element printer (laser | deskjet)>

Error: the keyword must be upper case (ELEMENT). XML is case sensitive.

 

-----

<!ELEMENT address (street, aptNum?, city, state, zip, country?)>

aptNum and country are optional elements.

 

-----

<!ELEMENT name (#PCDATA | firstName | LastName)*>

Components of mixed content must always be separated by |. #PCDATA must come first, and when element names are listed the group must end with *.

 

-----

 

Occurrence Indicators

+ * ?

Connectors

, |

Terms and Special Characters

#PCDATA - parsed character data: text that the parser scans for markup and entity references. It is usually used for leaf elements (elements with no child elements).

EMPTY - indicates that an element has no content: no text and no child elements.

Regular expression: + (One or more of a kind)

Regular expression: * (Zero or more of a kind)

Regular expression: ? (Optional element)
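
The occurrence indicators and connectors can be combined in a single content model. The sketch below uses hypothetical element names (order, customer, item, note, cash, card) that do not come from the profile application.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- All element names here are hypothetical. -->
  <!-- customer once, then one or more item, then an optional note,
       then exactly one of cash or card. -->
  <!ELEMENT order (customer, item+, note?, (cash | card))>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT cash EMPTY>
  <!ELEMENT card EMPTY>
]>
<order>
  <customer>Richard Sinn</customer>
  <item>Toner</item>
  <item>Paper</item>
  <card/>
</order>
```

Leaving out every item, or supplying both cash and card, would make the document invalid.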

 

Attribute List Declarations

<!ATTLIST Address type (HOUSE|APT) "APT">

An element of type Address has an attribute named type whose value is either HOUSE or APT. APT is the default value.

-----

<!ATTLIST owner type (STUDENT|PROFESSIONAL) "STUDENT">

Element owner has attribute type with default value STUDENT.
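
The effect of a default value can be sketched as follows. The owners wrapper element and the (#PCDATA) content of owner are assumptions made to keep the example small.

```xml
<?xml version="1.0"?>
<!DOCTYPE owners [
  <!-- owners is a hypothetical wrapper element for this sketch. -->
  <!ELEMENT owners (owner+)>
  <!ELEMENT owner (#PCDATA)>
  <!ATTLIST owner type (STUDENT|PROFESSIONAL) "STUDENT">
]>
<owners>
  <owner>Reported with type="STUDENT" (the default)</owner>
  <owner type="PROFESSIONAL">Reported with type="PROFESSIONAL"</owner>
  <!-- <owner type="RETIRED"> would be invalid: the value is not in the list. -->
</owners>
```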

-----

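<!ATTLIST owner age CDATA #REQUIRED>

Element owner has an attribute age whose value is any character string. A value must always be supplied.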

 

-----

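<!ATTLIST MiddleName init (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z) #IMPLIED>

Element MiddleName has an optional attribute init whose value is a single upper-case letter. There is no default value.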

 

-----

<!ATTLIST workplace location CDATA #FIXED "Cupertino">

Element workplace has an attribute location whose value is always Cupertino.
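
A short sketch of a fixed attribute in use. The (#PCDATA) content model for workplace is an assumption, since the original does not declare the element.

```xml
<?xml version="1.0"?>
<!DOCTYPE workplace [
  <!ELEMENT workplace (#PCDATA)>   <!-- content model assumed for this sketch -->
  <!ATTLIST workplace location CDATA #FIXED "Cupertino">
]>
<!-- Valid: location is omitted, so the parser supplies "Cupertino".
     location="Cupertino" is also valid; location="San Jose" is not. -->
<workplace>OpenLoop Computing</workplace>
```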

 

Terms and Special Characters

CDATA - any string of characters. A literal < or & cannot appear in the value; write &lt; or &amp; instead.

ID - an identifier: a name that is unique within the document.

IDREF - a reference to the ID value of another element in the same document.

IDREFS - a list of IDREF values separated by spaces.

ENTITY - the name of an unparsed external entity (for example, an image file) declared in the DTD.

ENTITIES - a list of ENTITY names separated by spaces.

NMTOKEN - a name token: a single word with no spaces.

NMTOKENS - a list of NMTOKEN values separated by spaces.

#REQUIRED - A value must be supplied.

#IMPLIED - The attribute is optional. If a value is not supplied, the XML application decides what to do.

#FIXED - The value is fixed to the one declared. Supplying a different value raises an error.
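
The ID, IDREF, IDREFS and NMTOKEN types can be sketched together. All element and attribute names below (staff, person, id, manager, peers, badge) are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!-- All names here are hypothetical. -->
  <!ELEMENT staff (person+)>
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
            id      ID      #REQUIRED
            manager IDREF   #IMPLIED
            peers   IDREFS  #IMPLIED
            badge   NMTOKEN #IMPLIED>
]>
<staff>
  <person id="p1" badge="A-100">Ann</person>
  <person id="p2" manager="p1" peers="p1 p3">Bob</person>
  <person id="p3" manager="p1" peers="p2">Carol</person>
</staff>
```

A validating parser should reject this document if two person elements shared the same id, or if manager or peers named an id that does not exist.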

 

Entity Declarations

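An entity is used like a C macro: a short name that stands for replacement text.

<!ENTITY rsinn "Richard Sinn">

The entity is called rsinn. When it is referenced in an XML document, the parser inserts the replacement text "Richard Sinn". A reference starts with & and ends with ;. Example:
<author>
&rsinn;
</author>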
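
Put into a complete document, the declaration and the reference might look like the sketch below. The (#PCDATA) content model for author is an assumption.

```xml
<?xml version="1.0"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>   <!-- content model assumed for this sketch -->
  <!ENTITY rsinn "Richard Sinn">
]>
<!-- The application sees the text "Richard Sinn" inside author. -->
<author>&rsinn;</author>
```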

-----

<!ENTITY chapter2 SYSTEM "http://www.openloop.com/xml/chapter2.xml">

An external entity: the replacement text is read from the resource named after SYSTEM.

<toc>
&chapter2;
</toc>

 

Internal and External DTD Subsets

Internal - DTD inserted into the document itself

 

<!DOCTYPE address [

<!ELEMENT address (street, aptNum?, city, state, zip, country?)>

<!ATTLIST address primary (yes | no) "yes">

<!ELEMENT street (#PCDATA)>

<!ELEMENT aptNum (#PCDATA)>

<!ELEMENT city (#PCDATA)>

<!ELEMENT state (#PCDATA)>

<!ELEMENT zip (#PCDATA)>

<!ELEMENT country (#PCDATA)>

]>
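
For illustration, a complete document using this internal subset could look as follows. The address values are borrowed from the profile example later in this section.

```xml
<?xml version="1.0"?>
<!DOCTYPE address [
  <!ELEMENT address (street, aptNum?, city, state, zip, country?)>
  <!ATTLIST address primary (yes | no) "yes">
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT aptNum (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
]>
<!-- aptNum and country are omitted; primary defaults to "yes". -->
<address>
  <street>555 Bailey Avenue</street>
  <city>San Jose</city>
  <state>CA</state>
  <zip>95141</zip>
</address>
```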

 

External - DTD is stored in a separate file and referenced from the document.

 

<!DOCTYPE address-format SYSTEM "http://www.openloop.com/dtd/address-format.dtd">

 

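Terms and Special Characters

SYSTEM - system identifier, a Uniform Resource Identifier (URI) pointing to the DTD. URI is a superset of URL.

PUBLIC - public identifier, a location-independent name for a DTD, written by convention as a formal public identifier following the rules of ISO 9070. In XML it must be followed by a system identifier.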
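
A well-known use of PUBLIC is the XHTML 1.0 Strict document type declaration, shown here only as an illustration; it is not part of the profile application. The public identifier comes first, followed by the system identifier.

```xml
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Profile</title></head>
  <body><p>Richard Sinn</p></body>
</html>
```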

 

 

Putting It All Together

sinn.xml

<?xml version="1.0"?>

<!DOCTYPE profile SYSTEM "profile.dtd">

<profile>
<owner type = "STUDENT" age = "20">
	<Name>
		<FirstName>Richard</FirstName>
		<MiddleName init = "P">Pong Nam</MiddleName>
		<LastName>Sinn</LastName>
	</Name>

	<Phone>
		<Home>(000)000-0000</Home>
		<Work>(000)000-0000</Work>
		<Fax/>
		<Pager/>
		<Cell/>
	</Phone>

	<Address type = "HOUSE">
		<StreetAddr>555 Bailey Avenue</StreetAddr>
		<City>San Jose</City>
		<State>Ca</State>
		<ZipCode>95141</ZipCode>
	</Address>

	<Email>
		<ul>
			<li>sinn@us.ibm.com</li>
			<li>sinn@mathcs.sjsu.edu</li>
			<li>webmaster@openloop.com</li>
		</ul>
	</Email>

	<Education>
		<Institution>
			<GraduationDate>1998</GraduationDate>
			<schoolName>University of Minnesota-Twin Cities</schoolName>
			<degree type = "MS" major =  "CS" gpa = "3.97"/>
		</Institution>

		<Institution>
			<GraduationDate>1994</GraduationDate>
			<schoolName>University of Wisconsin-Madison</schoolName>
			<degree type = "BS" major =  "CS" gpa = "3.80"/>
		</Institution>
	</Education>

	<techSkills>
		<Languages>Java</Languages>		
		<Languages>C++</Languages>
		<Languages>C</Languages>
		<Languages>JavaScript</Languages>
		<Languages>XML</Languages>
		<Languages>HTML</Languages>
		<Languages>SQL</Languages>

		<System>Windows</System>				
	</techSkills>

</owner>
</profile>

 

profile.dtd

<!-- ====================================================
     Document Type Definition for the Profile Application
     ==================================================== -->

<!-- A profile document contains one or more owners -->
<!ELEMENT profile (owner)+>

<!-- An owner contains these six sections in this sequence -->
<!ELEMENT owner (Name, Phone, Address, Email, Education, techSkills)>

<!-- ===================================================
     Every owner is either a STUDENT or a PROFESSIONAL.
     This is indicated by its type attribute.
     If a value is not supplied for this attribute,
     it defaults to STUDENT.
     =================================================== -->
<!ATTLIST owner type (STUDENT|PROFESSIONAL) "STUDENT">

<!-- Every owner must also have an age attribute. -->
<!ATTLIST owner age CDATA #REQUIRED>

<!ELEMENT FirstName ANY>
<!ELEMENT LastName ANY>

<!ELEMENT Name (FirstName, MiddleName, LastName)>
<!ELEMENT MiddleName ANY>
<!ATTLIST MiddleName init (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z) #IMPLIED>

<!ELEMENT Home ANY>
<!ELEMENT Work ANY>
<!ELEMENT Fax ANY>
<!ELEMENT Pager ANY>
<!ELEMENT Cell ANY>

<!ELEMENT Phone (Home, Work, Fax, Pager, Cell)>

<!ELEMENT StreetAddr ANY>
<!ELEMENT City ANY>
<!ELEMENT State ANY>
<!ELEMENT ZipCode ANY>

<!ELEMENT Address (StreetAddr, City, State, ZipCode)>
<!ATTLIST Address type (HOUSE|APT) "APT">

<!ELEMENT Email (ul)+>

<!ELEMENT li ANY>
<!ELEMENT ul (li)+>

<!ELEMENT Education (Institution)+>

<!ELEMENT GraduationDate ANY>
<!ELEMENT schoolName ANY>
<!ELEMENT degree ANY>

<!ELEMENT Institution (GraduationDate, schoolName, degree)> 

<!ATTLIST degree 
            type (BS|MS|PhD) "BS"
            major (CS|Math|Other) "CS"
            gpa CDATA #REQUIRED>

<!ELEMENT System ANY>
<!ELEMENT Languages ANY>
<!ELEMENT techSkills (System|Languages)+>

 

Validation of XML and DTD documents

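xmlint in action. The first run below reports an element that is used but not declared in the DTD; the second run reports no errors and prints only the file name.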

C:\sinn\book\xml\programs\resume>xmlint  sinn.xml
sinn.xml
        The element 'FirstName' is used but not declared in the DTD/Schema.
        URL: file:///C:/sinn/book/xml/programs/resume/sinn.xml
        Line 00008:   <FirstName>Richard</FirstName>
        Pos  00014: -------------^

C:\sinn\book\xml\programs\resume>xmlint  sinn.xml
sinn.xml

 

Modeling XML document

 

Copyright 1996-2001 OpenLoop Computing. All rights reserved.