squeak!
Credits: This site powered by the vi text editor, apache webserver, perl scripting, and Debian linux.

XML - Day 1

Admin

Texas state survey and Roll.

Goals

After today's session, the successful learner will be able to:
  • understand the relationship between SGML, HTML, and XML
  • identify and use the parts of an XML document
  • construct a well-formed XML document
  • parse an XML document

Reality check!

class makeup

the buzz

  • xml as wundermarkup
  • xml as replacement component for EDI
  • xml as component of web services (java/windows context): HTTP carrying SOAP or XML-RPC messages, described by WSDL, listed in UDDI registries.
  • problems: standards? security/authentication? micropayments?

XML, SGML, and HTML

  • SGML (Standard Generalized Markup Language) - metalanguage for system-independent text processing. Grew out of IBM's GML work in the 1960s; became an international standard (ISO 8879) in 1986.
    • SGML application: HTML (HyperText Markup Language) - content-related but used primarily for presentation.
    • SGML subset: XML (Extensible Markup Language) - a simplified profile of SGML rather than an application of it; extensible, object-oriented, easily parsed, distributed, international, content-oriented [p.12|12]

Current uses: b2b transfer of data, interapplication communication, "data islands" in non-XML documents, real life example
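
A rough sketch of interapplication communication, using XML-RPC from the Python standard library. One program generates the XML and another reads it back; nobody types the markup by hand. The method name "add" and its arguments are made up for illustration.

```python
import xmlrpc.client

# An application builds the request; no human writes this XML.
# "add" is a hypothetical remote method name.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# The receiving application parses the XML back into native values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Note that the XML here is only a carrier: both ends work with ordinary values and never display the markup.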

Inherent challenges

  • most xml does not have a human end-user; there is no concept of a display device, or display markup. We simulate an application that would use the xml content.
  • most xml is not directly written by humans; it is usually generated by an application
  • XML requires you to think abstractly and in an organized/hierarchical manner. Related disciplines: database design, taxonomies.
  • cannot install software locally

  • designed by you (a dozen reserved elements)
  • intelligence is in the document, not the parser - the parser falls over at the first error instead of guessing
  • structured/strict! cf HTML
  • How do we write xml? BYOTE (bring your own text editor)
  • How do we see xml? Processors/parsers
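
A minimal sketch of "structured/strict! cf HTML", assuming Python's standard xml.etree.ElementTree as the parser. The helper is_well_formed and both sample strings are made up for illustration. The first string is the kind of markup a browser would tolerate; an XML parser refuses it.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# HTML-style sloppiness: overlapping tags and an unclosed <br>.
sloppy = "<note><b><i>squeak</b></i><br></note>"
# Same content, every tag closed and strictly nested.
strict = "<note><b><i>squeak</i></b><br/></note>"

print(is_well_formed(sloppy))
print(is_well_formed(strict))
```

The parser reports only the first problem it hits, so debugging a document is usually a fix-and-reparse loop.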

Goofy first example

The structure of an XML document

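  • xml declaration <?xml version="1.0" standalone="yes"?> (standalone is either "yes" or "no"; if present, the declaration must be the very first thing in the file). What symbols are syntactical? [p.28|47]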
  • elements (made up of tag[s]) - contain character data or other elements (nested). Case sensitive! Close those tags! aka "information objects" [p.30|28]
    • root element
    • open/closed pairs or empty
    • name them yourselves: naming elements: [a-zA-Z_][a-zA-Z0-9._-]* (ASCII subset; names may not begin with "xml" in any case). Case sensitive. The colon is reserved: look forward to namespaces.
      a word on style: someElement, SomeElement, some_element, some.element
    • nesting: strict, indent them [p.32|32]
      another word on style and indentation.
    • attributes [p.31|38] - always in the opening tag. More to follow in DTD section.
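  • comments - like html: <!-- ... -->. Cannot be nested, cannot contain "--", and cannot appear inside a tag [p.43|42-43]
  • predefined entity references: &amp; &apos; &gt; &lt; &quot; (not the same thing as numeric character references such as &#38;) [p.45|423]
  • CDATA sections (<![CDATA[ ... ]]>) hold chunks of character data that the parser does not scan for markup
  • empty elements: <emptytags/>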
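
The parts listed above can be seen together in one small document. This is a sketch only: the petstore vocabulary is invented, and Python's standard xml.etree.ElementTree stands in for "an application that would use the xml content".

```python
import xml.etree.ElementTree as ET

# A hypothetical document; every element and attribute name is made up.
DOC = """\
<?xml version="1.0" standalone="yes"?>
<!-- a comment: dropped by the default parser -->
<petstore>
  <pet species="cat">
    <name>Tom &amp; Co.</name>
    <note><![CDATA[weight < 5kg & friendly]]></note>
    <vaccinated/>
  </pet>
</petstore>
"""

root = ET.fromstring(DOC)
pet = root.find("pet")

print(root.tag)                     # the root element
print(pet.get("species"))           # attribute from the opening tag
print(pet.find("name").text)        # &amp; arrives as a plain "&"
print(pet.find("note").text)        # CDATA content, not scanned for markup
print(pet.find("vaccinated").text)  # an empty element has no text: None
print(root.find("Pet"))             # None, because names are case sensitive
```

Try breaking it one piece at a time (drop a closing tag, replace &amp; with a bare &, change </pet> to </Pet>) and reparse to see which errors the parser reports.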

Design considerations

  • topic
  • audience (some b2b app? a business partner? a web browser?)
  • atomic or combined information - e.g. phone numbers, names
  • attributes v. a more detailed structure - attributes severely limit the ability to transform and search data. Good for supplementary (useful but not essential) information, and for empty elements. Tokens are an exception.
  • meaningful element names (like var names)
  • DOM compliance
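
A hedged illustration of "atomic or combined" and "attributes v. a more detailed structure". The contact records, the phone format, and both helper functions are assumptions made up for this sketch. The same phone number is modeled two ways, and we ask each for the area code.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the same phone number modeled two ways.
COMBINED = '<contact name="Ada Lovelace" phone="512-555-0100"/>'
ATOMIC = """\
<contact>
  <name><given>Ada</given><family>Lovelace</family></name>
  <phone type="work">
    <area>512</area><exchange>555</exchange><line>0100</line>
  </phone>
</contact>
"""

def area_code_combined(xml_text):
    # The attribute is one opaque string, so the application must
    # know the formatting convention and split the value itself.
    phone = ET.fromstring(xml_text).get("phone")
    return phone.split("-")[0]

def area_code_atomic(xml_text):
    # Each piece is its own element, so a path expression is enough.
    return ET.fromstring(xml_text).findtext("phone/area")

print(area_code_combined(COMBINED))
print(area_code_atomic(ATOMIC))
```

The atomic form is wordier, but the structure travels with the data; the combined form quietly depends on every producer formatting the number the same way.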

Our first xml document

[class project: write, parse, debug. Simple internal DTD?]

declaring an entity nn|425

  1. 'entity as macro' or constant.
  2. Entities can be defined in the document thusly:
    <!DOCTYPE entity.example [
    	<!ENTITY foo "hey, this is expanded from foo">
    ]>
    
    Note that the document type name in the DOCTYPE declaration ("entity.example") must match the root element of the document! The declarations between [ and ] are the internal subset.
  3. example
  4. try it out on your own petstore document
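
The entity declaration above can be tried out as follows. A sketch, assuming Python's standard xml.etree.ElementTree; the greeting elements are made up, and the entity foo is the one from the notes. The parser expands each &foo; reference before the application sees the text.

```python
import xml.etree.ElementTree as ET

DOC = """\
<?xml version="1.0"?>
<!DOCTYPE entity.example [
    <!ENTITY foo "hey, this is expanded from foo">
]>
<entity.example>
  <greeting>&foo;</greeting>
  <greeting>again: &foo;</greeting>
</entity.example>
"""

root = ET.fromstring(DOC)
texts = [greeting.text for greeting in root.findall("greeting")]
for text in texts:
    print(text)

# Referencing an entity that was never declared is an error.
try:
    ET.fromstring("<entity.example>&bar;</entity.example>")
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("parse error:", err)
```

Change the replacement text in the one ENTITY declaration and every reference picks it up, which is the "macro or constant" idea.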

internal DTDs nn|330-332

  • the entity declaration above was placed inside an (admittedly minimal and incomplete) "internal DTD".
  • a more complete internal dtd example. What's wrong with this picture?
  • Then, check validity: freeware X* Validator or Msoft validator if available.
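
Python's standard library parses but does not validate against a DTD, so the sketch below is not a substitute for a real validator. It only checks one validity rule, that the DOCTYPE name matches the root element, using the expat parser. The function doctype_and_root and both sample documents are made up for illustration.

```python
import xml.parsers.expat

def doctype_and_root(xml_text):
    """Return (DOCTYPE name, root element name) for a well-formed document."""
    found = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = name

    def on_start(name, attrs):
        if found["root"] is None:  # the first start tag is the root
            found["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found["doctype"], found["root"]

GOOD = """\
<?xml version="1.0"?>
<!DOCTYPE petstore [
  <!ELEMENT petstore (pet+)>
  <!ELEMENT pet (#PCDATA)>
]>
<petstore><pet>cat</pet></petstore>
"""
# Well-formed, but the DOCTYPE names a different root: not valid.
BAD = GOOD.replace("<!DOCTYPE petstore", "<!DOCTYPE petshop")

for doc in (GOOD, BAD):
    doctype, root = doctype_and_root(doc)
    print(doctype, root, "match" if doctype == root else "MISMATCH")
```

Both documents parse without complaint, which is the point: well-formed and valid are different tests, and only a validating parser checks the second.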

Homework

Leaf through the upcoming sections on
  • DTD (Document Type Definitions): pp97-108|320
  • entities: p151-155|423-425


http://www.mousetrap.net/syllabus/xml/day1.html
$Id: day1.orb,v 1.8 2002/04/28 17:20:28 mouse Exp $

Remember, your login is based on your machine's hostname, not on any other number.
~/[initials] refers to the subdirectory under your homedir, named after your initials. Everything except for .dotfiles will be stored in your ~/[initials] directory.


© 1995-2001 jason carr
Distributed under the terms of the GNU Free Documentation License.