Goals
After today's session, the successful learner will be able to:
- understand the relationship between SGML, HTML, and XML
- identify and use the parts of an XML document
- construct a well-formed XML document
- parse an XML document
Reality check!
class makeup
the buzz
- xml as wundermarkup
- xml as a replacement for (or component of) EDI (Electronic Data Interchange)
- xml as component of web services: (java/windows context) http carrying SOAP/XML-RPC messages, described by WSDL, listed in UDDI registries.
- problems: standards? security/authentication? micropayments?
XML, SGML, and HTML
- SGML (Standard Generalized Markup Language) - metalanguage for system-independent text processing. Grew out of IBM's GML (late 1960s); international standard (ISO 8879) in 1986.
- SGML application: HTML (HyperText Markup Language) - content-related but used primarily for presentation.
- SGML subset: XML (Extensible Markup Language) - extensible, object-oriented, easily parsed, distributed, international, content-oriented [p.12|12]
Current uses: b2b transfer of data, interapplication communication, "data islands" in non-XML documents, real life example
Inherent challenges
- most xml does not have a human end-user; there is no concept of a display device, or display markup. We simulate an application that would use the xml content.
- most xml is not directly written by humans; it is usually generated by an application
- XML requires you to think abstractly and in an organized/hierarchical manner. Related disciplines: database design, taxonomies.
- cannot install software locally
- vocabulary designed by you: XML predefines no element names (only names beginning with "xml" are reserved)
- intelligence is in the document: if it is not well-formed, the parser stops with an error instead of guessing (no HTML-style error recovery)
- structured/strict! cf HTML
- How do we write xml? BYOTE (bring your own text editor)
- How do we see xml? Processors/parsers
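A minimal sketch of "the parser falls over", assuming Python 3 and its standard-library xml.etree.ElementTree module (our choice for illustration; the processors used in class may differ). The helper check_well_formed and the sample strings are hypothetical. Each sample except the last would likely be tolerated by a browser as HTML, but an XML parser rejects it outright:

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message).

    check_well_formed is a hypothetical helper for this class, not a library API.
    """
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

samples = {
    # Sloppy, HTML-style markup: every one of these is a fatal XML error.
    "unclosed tag": "<p>first paragraph<p>second paragraph</p>",
    "wrong case": "<Note>remember the milk</note>",
    "bad nesting": "<b><i>bold italic</b></i>",
    "unquoted attribute": "<img src=cat.gif/>",
    # The same ideas written strictly: one root, closed tags, quoted values.
    "well-formed": '<doc><p>first</p><p>second</p><img src="cat.gif"/></doc>',
}

for label, text in samples.items():
    ok, message = check_well_formed(text)
    print(f"{label}: {'OK' if ok else 'ERROR ' + message}")
```

Only the last sample prints OK; the exact wording of the error messages comes from the underlying expat library and may vary between Python versions.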
Goofy first example
The structure of an XML document
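- xml declaration <?xml version="1.0" standalone="yes"?> (standalone takes either "yes" or "no"; if present, the declaration must be the very first thing in the document). What symbols are syntactical? [p.28|47]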
- elements (made up of tag[s]) - contain character data or other elements (nested). Case sensitive! Close those tags! aka "information objects" [p.30|28]
- root element: exactly one per document, enclosing all the others
- either an open/close tag pair or a single empty-element tag
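- name them yourselves: element names match [a-zA-Z_][a-zA-Z0-9._-]* (ASCII subset; the full rule also allows non-ASCII letters, plus the colon, which is set aside for namespaces). Case sensitive: look forward to namespaces.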
a word on style: someElement, SomeElement, some_element, some.element
- nesting: strict (close the inner element before the outer one: <a><b></b></a>, never <a><b></a></b>); indent them [p.32|32]
another word on style and indentation.
- attributes [p.31|38] - name="value" pairs, always in the opening tag, values always quoted (single or double quotes). More to follow in DTD section.
- comments <!-- ... --> - like html, but they cannot be nested, cannot contain "--", and cannot appear inside a tag [p.43|42-43]
- predefined entity references for the five special characters: &amp; &apos; &gt; &lt; &quot; (not to be confused with numeric character references such as &#60;) [p.45|423]
- CDATA sections <![CDATA[ ... ]]> hold chunks of character data that the parser does not treat as markup (handy for text full of < and &)
- empty-element tags: <emptytag/>
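Putting the parts above together: a small, hypothetical petstore document parsed with Python's standard-library xml.etree.ElementTree (an assumption; any XML parser would do). It shows that the parser hands back entity references and CDATA content as ordinary text, and that ElementTree drops comments by default:

```python
import xml.etree.ElementTree as ET

# A hypothetical petstore document using each part listed above.
# The XML declaration must be the very first thing, so no leading newline.
PETSTORE = """<?xml version="1.0" standalone="yes"?>
<!-- comments look like HTML comments and may not contain a double hyphen -->
<petstore>
  <pet species="cat" id="p1">
    <name>Tom &amp; Jerry's &quot;friend&quot;</name>
    <price currency="USD">25.00</price>
    <notes><![CDATA[Treats <b> and & here are plain text, not markup.]]></notes>
    <vaccinated/>
  </pet>
</petstore>"""

root = ET.fromstring(PETSTORE)  # raises ET.ParseError if not well-formed
pet = root.find("pet")

print(root.tag)                  # the single root element
print(pet.attrib)                # attributes come from the opening tag
print(pet.find("name").text)     # entity references are already replaced
print(pet.find("notes").text)    # CDATA content arrives as ordinary text
# Compare with None: an element with no children is falsy in ElementTree.
print(pet.find("vaccinated") is not None)
```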
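The element-naming rule can be tried out directly. This sketch, assuming Python's xml.etree.ElementTree, compares an ASCII-only regular expression for names with what the parser actually accepts; SIMPLE_NAME and parser_accepts are hypothetical names, and the regex is only an approximation of the full rule in the XML 1.0 specification:

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the XML naming rule (hypothetical helper;
# the real rule also allows many non-ASCII letters).
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def parser_accepts(name):
    """True if an empty element with this name parses as well-formed XML."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

candidates = ["someElement", "some_element", "some.element", "some-element",
              "2ndElement", "-item", "my element"]

for name in candidates:
    print(repr(name), bool(SIMPLE_NAME.match(name)), parser_accepts(name))
```

The first four names pass both checks; the last three fail both (a name cannot start with a digit or hyphen, and a space ends the name, so "my element" reads as an element "my" with a broken attribute).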
Design considerations
- topic
- audience (some b2b app? a business partner? a web browser?)
- atomic or combined information - e.g. phone numbers, names
- attributes v. a more detailed structure - atts severely limit the ability to transform and search data. Good for useful but unnecessary information, and for empty elements. Tokens are an exception.
- meaningful element names (like var names)
- DOM compliance
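A sketch of the "atomic or combined" and "attributes v. structure" trade-offs, using hypothetical customer data and Python's xml.etree.ElementTree (an assumption for illustration). Both designs answer "which family names have area code 212?", but the combined, attribute-only design forces every consumer to pick strings apart itself:

```python
import xml.etree.ElementTree as ET

# Two hypothetical designs for the same customer data.
COMBINED = """<customers>
  <customer name="Ada Lovelace" phone="(212) 555-0142"/>
  <customer name="Alan Turing" phone="(415) 555-0199"/>
</customers>"""

ATOMIC = """<customers>
  <customer>
    <name><given>Ada</given><family>Lovelace</family></name>
    <phone><area>212</area><number>555-0142</number></phone>
  </customer>
  <customer>
    <name><given>Alan</given><family>Turing</family></name>
    <phone><area>415</area><number>555-0199</number></phone>
  </customer>
</customers>"""

def family_names_in_area_combined(xml_text, area):
    # Combined design: fragile string surgery that depends on formatting.
    root = ET.fromstring(xml_text)
    return [c.get("name").split()[-1]
            for c in root.iter("customer")
            if c.get("phone").startswith(f"({area})")]

def family_names_in_area_atomic(xml_text, area):
    # Atomic design: the structure already separates the pieces.
    root = ET.fromstring(xml_text)
    return [c.findtext("name/family")
            for c in root.iter("customer")
            if c.findtext("phone/area") == area]

print(family_names_in_area_combined(COMBINED, "212"))
print(family_names_in_area_atomic(ATOMIC, "212"))
```

Both calls print ['Lovelace'], but only the atomic version keeps working if, say, phone numbers start arriving as 212-555-0142 or a family name contains a space.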
Our first xml document
[class project: write, parse, debug. Simple internal DTD?]
declaring an entity nn|425
- 'entity as macro' or constant.
- Entities can be defined in the document thusly:
<!DOCTYPE entity.example [
<!ENTITY foo "hey, this is expanded from foo">
]>
Note that the name in the DOCTYPE declaration ("entity.example") must match the root element of the document! The declarations inside the square brackets form the internal subset.
- example
- try it out on your own petstore document
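To watch the "entity as macro" idea at work, here is the entity.example declaration above wrapped in a complete document and parsed with Python's xml.etree.ElementTree (an assumption for illustration; the <greeting> elements are hypothetical). The parser substitutes the replacement text wherever &foo; appears. ElementTree does not validate, so it would not complain if the DOCTYPE name and the root element disagreed.

```python
import xml.etree.ElementTree as ET

# The internal-subset entity from the example above, in a complete document.
DOC = """<!DOCTYPE entity.example [
<!ENTITY foo "hey, this is expanded from foo">
]>
<entity.example>
  <greeting>&foo;</greeting>
  <greeting>Twice: &foo;!</greeting>
</entity.example>"""

root = ET.fromstring(DOC)
for greeting in root.iter("greeting"):
    # The parser has already substituted the entity text; no "&foo;" remains.
    print(greeting.text)
```

This prints the expanded sentence twice, the second time wrapped as "Twice: ...!".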
internal DTDs nn|330-332
- the entity declaration above was placed inside an (admittedly minimal and incomplete) "internal DTD".
- a more complete internal dtd example. What's wrong with this picture?
- Then, check validity: freeware X* Validator, or the Microsoft validator if available.
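Python's standard library has no validating parser, so it cannot replace the validators above. As a hedged sketch, the hypothetical helper doctype_matches_root checks just one validity constraint, that the DOCTYPE name equals the root element's name, using xml.dom.minidom:

```python
from xml.dom import minidom

def doctype_matches_root(xml_text):
    """Check one validity constraint: DOCTYPE name == root element name.

    Hypothetical helper; minidom does not validate, so everything else a DTD
    declares (allowed children, required attributes, ...) goes unchecked.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return False  # no DOCTYPE at all, so nothing to validate against
    return doc.doctype.name == doc.documentElement.tagName

GOOD = '<!DOCTYPE petstore [<!ELEMENT petstore EMPTY>]><petstore/>'
BAD = '<!DOCTYPE pets [<!ELEMENT pets EMPTY>]><petstore/>'

print(doctype_matches_root(GOOD))
print(doctype_matches_root(BAD))
```

It prints True, then False. Both documents are well-formed, which is why a non-validating parser accepts the second one without complaint.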
Homework
Leaf through the upcoming sections on
- DTD (Document Type Definitions): pp97-108|320
- entities: p151-155|423-425
http://www.mousetrap.net/syllabus/xml/day1.html
$Id: day1.orb,v 1.8 2002/04/28 17:20:28 mouse Exp $
Remember, your login is based on your machine's hostname, not on any other number.
~/[initials] refers to the subdirectory under your homedir, named after your initials. Everything except for .dotfiles will be stored in your ~/[initials] directory.