XML by example
by Richard Sinn
While humans communicate verbally, companies communicate with documents.
In a traditional setting, companies have application forms, memos,
accounts receivable documents, purchase orders, and so on, to
communicate within the organization or outside to other companies.
In today's e-business world, most companies have Web sites and they
use HTML documents as their main communication vehicle for their
customers and business partners.
Documents are traditionally in a human-readable or "what you see is
what you get" format to enable communication between humans. Some
of the most popular document editing tools produce formatted
documents. While formatted documents look and print extremely well
on paper, they often are not suitable for Web publishing. Most Web
documents, such as HTML, are in ASCII text format. For comparison,
Figure 1 shows a document named Profile00 in Word format and Figure
2 shows it in XML format.
In need of a structured document - XML
Businesses need a form of document that can be understood by
humans and computers. In addition, the document must contain enough
added information to enable the understanding of its underlying
structure as well as the meaning of the data (metadata). XML was
created to do that. In general, an XML document consists of the
following three parts:
- Structure: Structure is the document type and the
organization of its elements, for example, memos, application
forms, resumes, etc. A set of rules is in place to enforce what
kind of elements it contains, in what order they occur, and what
additional attributes of elements are allowed.
- Presentation: This is the way information is presented
to the reader on the Web, on a piece of paper or via voice
synthesis. Whether a block of text is in bold or italic, which
fonts to use, and so on are also specified here.
- Data content: The informational data contained in a
document.
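To make the three-way split concrete, here is a minimal sketch in Python. The memo document, its element names and the two rendering functions are hypothetical and are not part of the article's profile example. The same data content is rendered in two presentations without touching the XML itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical memo: the tags carry structure, the text carries data content.
MEMO_XML = """<memo>
  <to>Sales team</to>
  <subject>Quarterly review</subject>
  <body>Meet on Friday at 10 a.m.</body>
</memo>"""


def render_text(memo):
    """Presentation 1: plain text, one line per element."""
    return "\n".join(f"{child.tag.upper()}: {child.text}" for child in memo)


def render_html(memo):
    """Presentation 2: HTML, with the subject shown in bold."""
    return (
        f"<p>To: {escape(memo.findtext('to'))}</p>"
        f"<p><b>{escape(memo.findtext('subject'))}</b></p>"
        f"<p>{escape(memo.findtext('body'))}</p>"
    )


memo = ET.fromstring(MEMO_XML)
print(render_text(memo))
print(render_html(memo))
```

Because the presentation lives in the rendering code rather than in the document, a new output form (voice synthesis, for instance) would only need another function.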
This article presents an example to show how XML divides
structure, presentation and data content. Before that, let's take a
brief look at the history of XML as well as some well-known
applications.
History and applications of XML
XML has been in development since the 1960s through its parent
called SGML. SGML was set as an international standard in 1986 as
the way for structured document publishing. Although SGML contains a
lot of useful concepts and abilities for performing complex
publishing, it found little application beyond serving as a
difficult technology for high-end systems used by
corporations with deep pockets. In the mid-1990s, an SGML
application called HTML emerged as the main publishing method for
large-scale electronic documents on the World Wide Web. In 1996 a
working group in the World Wide Web Consortium started developing
XML as a streamlined version of SGML. In a way, XML is a "cleaned
up" version of SGML, retaining its very powerful structured concept
but removing portions that are very complex and have limited
application. In other words, XML is a streamlined version of SGML
designed for transmission of structured data over the Web.
To provide better customer service, most financial
institutions provide online banking for their customers.
Users can purchase goods online with their credit card, download
their credit card statements and pay their bills without any concern
about how data are represented in different financial transactions by
different institutions. Online banking like this can be done by any
OFX-compliant application. OFX stands for Open Financial Exchange,
which is an XML application developed jointly by Microsoft, Intuit,
and Checkfree. (For more information, go to http://www.ofx.net/ofx/). An OFX
transaction in XML using Microsoft Money might look like the
following:
<RequestStatement>
  <BankAccount>
    <BankID>888</BankID>
    <AccountID>9394</AccountID>
    <AccountType>CHECKING</AccountType>
  </BankAccount>
</RequestStatement>
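As a rough illustration of how an application might read such a request, the sketch below parses the fragment with Python's standard xml.etree.ElementTree module. The helper read_account is hypothetical, and the element names are simply those of the example above. This is not an OFX library and it does not implement the OFX specification.

```python
import xml.etree.ElementTree as ET

# The request fragment from the article, reproduced as a string.
REQUEST = """<RequestStatement>
  <BankAccount>
    <BankID>888</BankID>
    <AccountID>9394</AccountID>
    <AccountType>CHECKING</AccountType>
  </BankAccount>
</RequestStatement>"""


def read_account(xml_text):
    """Return the BankAccount fields as a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    account = root.find("BankAccount")
    return {field.tag: field.text for field in account}


account = read_account(REQUEST)
print(account)
```

Because the field names travel with the data, the receiving program does not need to know the order or position of the values in advance.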
Another common XML application initiative is Channel Definition
Format (CDF). It is an XML application that enables the timely
delivery of business-critical information. Users of "Channels" locate
and register channel information of interest to them and their
business. After registration, any changes to the selected
information appear automatically, so users do not have to revisit
the site and download again. CDFs are used in Windows Active Channel,
Active Desktop and Microsoft Software Update.
Where to start
There is a lot of diverse XML information out on the Web. One of
the most popular places to get started is IBM developerWorks at http://www.ibm.com/developer/.
The developerWorks Web site is a good source of information on the
latest technology. Under the XML Zone, articles on different XML
topics, including XML tutorials, development tools and sample code,
are available to download.
XML example
Figure 3 shows how XML data is developed. The first component is
an XML document that contains character data marked
up with XML tags. Next, an XML document optionally can be associated
with a set of rules known as Document Type Definition (DTD). The DTD
specifies rules such as ordering of elements, default values, and so
on. The third component is the XML Parser that checks the XML
document against the DTD and then splits the document up into markup
regions and character-data regions. After processing with the XML
parser, the data now is in a structured format and can be processed
by any XML application.
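The split into markup regions and character-data regions can be sketched with Python's standard xml.parsers.expat module. The function split_regions and the short sample string are assumptions made for illustration. This sketch does not check the document against a DTD, so it covers only the splitting step described above.

```python
import xml.parsers.expat

# A short fragment taken from the style of the article's profile example.
SAMPLE = "<Name><FirstName>Richard</FirstName><LastName>Sinn</LastName></Name>"


def split_regions(xml_text):
    """Return a list of ("markup" | "data", text) pairs in document order."""
    regions = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of character data in one piece
    parser.StartElementHandler = (
        lambda name, attrs: regions.append(("markup", f"<{name}>"))
    )
    parser.EndElementHandler = (
        lambda name: regions.append(("markup", f"</{name}>"))
    )
    parser.CharacterDataHandler = lambda data: regions.append(("data", data))
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return regions


regions = split_regions(SAMPLE)
for kind, text in regions:
    print(kind, text)
```

An application built on top of a parser typically works with events or a tree like this rather than with the raw text.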
Let's make a personal profile as our first XML example. XML
documents can be edited by any text editor such as Notepad in
Windows or vi in UNIX. However, if you are using a plain text
editor, you have to manually type in all the tags. There are some
XML-specific editors, like the one shown in Figure 4, that help
eliminate the need to type the tags manually. However, most of the
editors today are not as functionally rich as their everyday word
processor counterparts.
<?xml version="1.0"?>
<!DOCTYPE profile SYSTEM "profile.dtd">
<profile>
  <owner type = "STUDENT" age = "20">
    <Name>
      <FirstName>Richard</FirstName>
      <MiddleName init = "P">Pong Nam</MiddleName>
      <LastName>Sinn</LastName>
    </Name>
    <Phone>
      <Home>(000)000-0000</Home>
      <Work>(000)000-0000</Work>
      <Fax/>
      <Pager/>
      <Cell/>
    </Phone>
    <Address type = "HOUSE">
      <StreetAddr>555 Bailey Avenue</StreetAddr>
      <City>San Jose</City>
      <State>Ca</State>
      <ZipCode>95141</ZipCode>
    </Address>
    <Email>
      <ul>
        <li>sinn@us.ibm.com</li>
        <li>sinn@mathcs.sjsu.edu</li>
        <li>webmaster@openloop.com</li>
      </ul>
    </Email>
    <Education>
      <Institution>
        <GraduationDate>1998</GraduationDate>
        <schoolName>University of Minnesota-Twin Cities</schoolName>
        <degree type = "MS" major = "CS" gpa = "3.97"/>
      </Institution>
      <Institution>
        <GraduationDate>1994</GraduationDate>
        <schoolName>University of Wisconsin-Madison</schoolName>
        <degree type = "BS" major = "CS" gpa = "3.90"/>
      </Institution>
    </Education>
    <techSkills>
      <Languages>Java</Languages>
      <Languages>C++</Languages>
      <Languages>C</Languages>
      <Languages>JavaScript</Languages>
      <Languages>XML</Languages>
      <Languages>HTML</Languages>
      <Languages>SQL</Languages>
      <System>Windows</System>
    </techSkills>
  </owner>
</profile>
In the above example, the XML declaration <?xml version="1.0"?>
indicates to the parser that we are using standard
XML version 1.0. The second line indicates that we are using
profile.dtd as our Document Type Definition. The current XML
document is checked against the rules stated in profile.dtd. The
example also shows how start- and end-tags are used to contain
content data. In a well-formed XML document, every start-tag has a
matching end-tag, and an element with no content can be written as a
single empty-element tag ending in "/>". (For example, the start-tag
<profile> is ended with </profile>.) Element names are case
sensitive, so the names in the document must match the DTD exactly.
In the following code sample, the element Address has an
attribute called "type" that is set to the value "HOUSE." Address
also contains four sub-elements in this order: StreetAddr, City,
State, ZipCode.
<Address type = "HOUSE">
  <StreetAddr>555 Bailey Avenue</StreetAddr>
  <City>San Jose</City>
  <State>Ca</State>
  <ZipCode>95141</ZipCode>
</Address>
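For readers who want to see how a program would get at the attribute and the ordered sub-elements, the following is a small sketch using Python's standard xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The Address element from the article, reproduced as a string.
ADDRESS = """<Address type = "HOUSE">
  <StreetAddr>555 Bailey Avenue</StreetAddr>
  <City>San Jose</City>
  <State>Ca</State>
  <ZipCode>95141</ZipCode>
</Address>"""

address = ET.fromstring(ADDRESS)

# Attributes are looked up by name on the element itself.
address_type = address.get("type")

# Iterating over an element yields its sub-elements in document order.
child_names = [child.tag for child in address]
city = address.findtext("City")

print(address_type, child_names, city)
```

Attributes describe the element as a whole, while sub-elements hold the individual pieces of data content.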
Document type definition (DTD) example
In order to ensure authors follow certain rules when writing an
XML document, a DTD is used. The following is the profile DTD used
in our example.
<!-- Document Type Definition for the Profile Application -->
<!-- A profile document contains one or more owners -->
<!ELEMENT profile (owner)+>
<!-- An owner contains these six sections in this sequence -->
<!ELEMENT owner (Name, Phone, Address, Email, Education, techSkills)>
<!-- Every owner is either a STUDENT or PROFESSIONAL.
This is indicated by its type attribute.
If a value is not supplied for this attribute,
it defaults to STUDENT -->
<!ATTLIST owner type (STUDENT|PROFESSIONAL) "STUDENT">
<!-- Every owner must also have an age attribute. -->
<!ATTLIST owner age CDATA #REQUIRED>
<!ELEMENT FirstName ANY>
<!ELEMENT LastName ANY>
<!ELEMENT Name (FirstName, MiddleName, LastName)>
<!ELEMENT MiddleName ANY>
<!ATTLIST MiddleName init
(A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z) #IMPLIED>
<!ELEMENT Home ANY>
<!ELEMENT Work ANY>
<!ELEMENT Fax ANY>
<!ELEMENT Pager ANY>
<!ELEMENT Cell ANY>
<!ELEMENT Phone (Home, Work, Fax, Pager, Cell)>
<!ELEMENT StreetAddr ANY>
<!ELEMENT City ANY>
<!ELEMENT State ANY>
<!ELEMENT ZipCode ANY>
<!ELEMENT Address (StreetAddr, City, State, ZipCode)>
<!ATTLIST Address type (HOUSE|APT) "APT">
<!ELEMENT Email (ul)+>
<!ELEMENT li ANY>
<!ELEMENT ul (li)+>
<!ELEMENT Education (Institution)+>
<!ELEMENT GraduationDate ANY>
<!ELEMENT schoolName ANY>
<!ELEMENT degree ANY>
<!ELEMENT Institution (GraduationDate, schoolName, degree)>
<!ATTLIST degree
type (BS|MS|PhD) "BS"
major (CS|Math|Other) "CS"
gpa CDATA #REQUIRED>
<!ELEMENT System ANY>
<!ELEMENT Languages ANY>
<!ELEMENT techSkills (System|Languages)+>
Most of the rules are documented with comments. Let's take a
closer look at the rules regarding Address.
<!ELEMENT Address (StreetAddr, City, State, ZipCode)>
<!ATTLIST Address type (HOUSE|APT) "APT">
The first line states that an element of type Address contains
four sub-elements. The Address must have a StreetAddr element, then
City, State and finally ZipCode. The second line states that an
element of type Address has an attribute called "type" that is
either HOUSE or APT. The default value for attribute type is
APT.
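The default-value rule can be observed with a short Python sketch. It rests on some assumptions. Python's standard xml.etree.ElementTree parser is non-validating and does not load an external file such as profile.dtd, so the Address rules are copied here into an internal DTD subset. With the declarations in the internal subset, the underlying expat parser is expected to supply the declared default for an attribute that the document omits. The content model (the order of the sub-elements) is not enforced by this parser, so the sketch only reads the order back.

```python
import xml.etree.ElementTree as ET

# The Address rules from the article, placed in an internal DTD subset
# (an assumption of this sketch; the article keeps them in profile.dtd).
DOC = """<?xml version="1.0"?>
<!DOCTYPE Address [
  <!ELEMENT Address (StreetAddr, City, State, ZipCode)>
  <!ATTLIST Address type (HOUSE|APT) "APT">
  <!ELEMENT StreetAddr ANY>
  <!ELEMENT City ANY>
  <!ELEMENT State ANY>
  <!ELEMENT ZipCode ANY>
]>
<Address>
  <StreetAddr>555 Bailey Avenue</StreetAddr>
  <City>San Jose</City>
  <State>Ca</State>
  <ZipCode>95141</ZipCode>
</Address>"""

address = ET.fromstring(DOC)

# No type attribute appears in the document, so any value seen here
# comes from the ATTLIST default.
default_type = address.get("type")
child_order = [child.tag for child in address]

print(default_type, child_order)
```

A validating parser would go further and reject an Address whose sub-elements are missing or out of order.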
Checking out XML document
There are many XML parsers available. If you visit ,
there are more than 10 free parsers available for download in the
XML section. In this article, I used a Microsoft command line
validation tool called XMLINT.EXE. It is an updated version of the
XMLINT command line tool that shipped in the Internet Explorer 4
SDK. The tool checks whether a given XML file is well formed. It
also uses the XML DOM to check that the document is valid according
to the DTD.
Figure 5 shows two error messages caused by a missing MiddleName
element and a missing </Name> tag. When the document is correct, no
error messages are returned by the parser.
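A comparable well-formedness check can be sketched in Python. This is not XMLINT and does not reproduce its messages. The function check_well_formed and the two sample strings are hypothetical, and the sketch checks only well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# A correct fragment, and one where the </Name> end-tag is missing.
GOOD = "<Name><FirstName>Richard</FirstName><LastName>Sinn</LastName></Name>"
BAD = "<owner><Name><FirstName>Richard</FirstName></owner>"

good_result = check_well_formed(GOOD)
bad_result = check_well_formed(BAD)

print(good_result)
print(bad_result)
```

As with the command line tool, a correct document produces no error, while a broken one reports where the parser gave up.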
Viewing your XML document
You can view your XML document with any graphical user interface
(GUI). Before the release of the Microsoft Internet Explorer 5.0 Web
browser, the only way of viewing an XML document was by using a Java
applet like the one shown in Figure 6. With Internet Explorer 5.0,
you can view your XML document natively in a browser, as shown in
Figure 7. Clicking the "+" sign expands the XML section details, as
shown in Figure 8.
Conclusion
The information technology industry is full of buzzwords such as
groupware, directory system, Internet, intranets and extranets. Most
of these technologies have been hyped to death with very little
thought about what the Internet was designed for: improving how
information and resources are shared. XML can help you organize your
information and resources. It is the future of e-business
communication and Web publishing. I hope this article gives you a
quick introduction to where you can download useful XML-related tools
and helps give you a jump start on learning XML.
Author
Richard Sinn is a Staff Software Engineer at the IBM Santa
Teresa Laboratory in San Jose, California. He is also a lecturer at San
Jose State University and a freelance writer for different magazines
and journals. He can be reached via e-mail at webmaster@openloop.com or
at his Web site at http://www.openloop.com/.
This document is maintained by devcon@us.ibm.com.