squeak!

XML - Day 2

Admin

Texas state survey and Roll.
Print out today's syllabus

POST

  1. Are these element names legal? If not, why not?
    • HospitalName
    • hosPitalAddress
    • hospital_director
    • _hospitalWard
    • number of beds
    • 9th_floor
    • Floor_7
  2. What information goes on the first line of an xml document?
  3. What is the markup for a comment?
  4. Start a new file in today's directory called day2post.xml. Write a well-formed xml document of 5 lines or fewer (no need to be creative). How do you know it's well-formed?
  5. Leave the parsed xml file on screen to prove you've done it.

Goals

After today's session, the successful learner will be able to:
  • make an external DTD
  • use the DTD to learn more about elements and attributes
  • validate an xml document against a DTD
  • define and use a namespace
  • understand the role of an xml schema

Preliminaries

external DTDs nn|332+

  • external DTDs are reusable
  • a simple example of an xml document with an external dtd
  • external DTDs can be on the same SYSTEM or be PUBLICally available.
  • Declaring elements (nn|339):
    • PCDATA: Text not contained in a child element. nn|340-41
    • general format: <!ELEMENT name content> (p.57|338)
    • content = content-model | EMPTY | ANY.
    • Child elements under ANY still must be declared... (works in progress)
    • a content model is like a "regex" (regular expression)
    • Order (aka "Sequences" nn|341): (foo, bar, baz) element foo THEN element bar THEN element baz
    • Alternation (aka "choice" nn|344): (foo | bar | baz) element foo OR element bar OR element baz
    • Grouping ( (foo|bar),(dag|nabbit) ) nn|341
    • Repetition (aka "Occurrence indicators","cardinality" nn|345):
      • ? = 0 or 1 instance of the element
      • + = 1 or more instances of the element
      • * = 0 or more instances of the element
      (foo+, bar?, baz*) at least one instance of element foo THEN one or no instances of element bar, THEN any number (or no) instances of element baz.
    • mixed-content-models nn|340,342: (#PCDATA | foo | bar | baz)* parseable text mixed with elements foo, bar, and baz, in any order and any number of times.
      Note that mixed-content elements are restricted to this content form.
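    • can combine internal and external -- the internal subset is read first, so its entity and attribute declarations override the external ones (an element may still be declared only once)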
    • Ambiguity example: pp60-61|??
    • difficult to specify exact counts beyond the occurrence indicators (?, +, *)
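    The content-model operators above can be sketched in one small external DTD. This is only an illustration: the file name hospital.dtd and every element name in it are hypothetical (loosely borrowed from the POST questions), not taken from the course examples.

    ```xml
    <!-- hospital.dtd: hypothetical external DTD; all names are illustrative -->

    <!-- sequence: name THEN address THEN one or more ward elements -->
    <!ELEMENT hospital (name, address, ward+)>
    <!ELEMENT name     (#PCDATA)>
    <!ELEMENT address  (#PCDATA)>

    <!-- occurrence indicators: an optional nickname, then any number of beds -->
    <!ELEMENT ward     (nickname?, bed*)>
    <!ELEMENT nickname (#PCDATA)>

    <!-- alternation: a bed holds either a patient or a vacant marker -->
    <!ELEMENT bed      (patient | vacant)>
    <!ELEMENT vacant   EMPTY>

    <!-- mixed content: text and emph elements in any order, any number of times -->
    <!ELEMENT patient  (#PCDATA | emph)*>
    <!ELEMENT emph     (#PCDATA)>
    ```

    A document in the same directory would point at this file with <!DOCTYPE hospital SYSTEM "hospital.dtd"> placed after the xml declaration and before the root element.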
  • Hands-on:
    • make an external DTD for the previous .xml document
    • validate it - add a constraint - break it - change the DTD or xml document
    • here are some constraints to work with
      1. elements in a given order
      2. elements from alternation
      3. element using +
      4. element using ?
      5. element using *
      6. element declaration with mixed content (tricky!)
    • a working example with copy of the DTD
  • Declaring attributes nn|349-51:
    • general format: <!ATTLIST element.name attribute.definitions> Why is the element's name specified?
    • Attribute defaults nn|356-7
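      • "default text" - the value used when the document omits the attribute
      • #REQUIRED - the document must supply a value; no default text is allowed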
      • #FIXED "text" Must be this exactly!
      • #IMPLIED - left up to the receiving app
    • Attribute types
      • strings -
        • CDATA: any character data; entity references such as &amp; are still expanded
      • tokenized - restrict the kind of content
        • ID unique identifier for an element
        • IDREF must be a reference to an existing ID
        • ENTITY/ENTITIES
        • NMTOKEN/NMTOKENS more strict: [a-zA-Z0-9._:-]
      • enumerated -
        • <!ATTLIST element.name attribute.name (option1 | option2 | etc) "option2">
    • Consider this xml fragment:
      <frog color="green"> la la mr. frog</frog>

      and this dtd fragment:
      <!ELEMENT frog (#PCDATA)>
      <!ATTLIST frog color CDATA #REQUIRED>
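    As a rough sketch of how the attribute types and defaults listed above fit together, here is one hypothetical attribute list for the frog element. The pond element and the id, mate, species, and mood attributes are assumptions added for illustration; only frog and color come from the fragment above.

    ```xml
    <!-- hypothetical DTD fragment; pond, id, mate, species, and mood are illustrative -->
    <!ELEMENT pond (frog+)>
    <!ELEMENT frog (#PCDATA)>
    <!ATTLIST frog
        id      ID                #REQUIRED
        mate    IDREF             #IMPLIED
        color   CDATA             "spotted"
        species NMTOKEN           #FIXED "rana"
        mood    (happy | grumpy)  "happy">
    ```

    Under this sketch, <frog id="f1" color="green"> should validate and keep the color green, while a frog with no color attribute would be reported as "spotted". A frog with mate="f9" should fail validation unless some element in the document carries id="f9".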

  • Hands-on:
    1. add an ID to one of your elements. Validate.
    2. add an IDREF to another element. Validate. Verify it breaks if the IDREF does not reference an existing ID.
  • a working example with DTD
  • Parameter entities - DTD only
  • DTD conditionals - pp 136-7.
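A minimal sketch of the last two items, assuming an external DTD (parameter-entity references inside declarations and conditional sections are only allowed there, not in the internal subset). The entity and element names are hypothetical.

```xml
<!-- hypothetical external DTD; entity and element names are illustrative -->

<!-- parameter entity: declared and referenced with %, usable only inside the DTD -->
<!ENTITY % inline "#PCDATA | emph | code">
<!ELEMENT para  (%inline;)*>
<!ELEMENT title (%inline;)*>
<!ELEMENT emph  (#PCDATA)>
<!ELEMENT code  (#PCDATA)>

<!-- conditional section: a block of declarations switched on or off by an entity -->
<!ENTITY % draft "IGNORE">
<![%draft;[
  <!ELEMENT note (#PCDATA)>
]]>
```

Because the internal subset is read first, a document could declare <!ENTITY % draft "INCLUDE"> in its own DOCTYPE to turn the note declaration on without editing the shared file.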

Namespace in xml p.139|293-301

"Namespace collisions" occur when two objects have the same name. The xml author can control this with use of name prefixes.
  • likelihood of collision in an extensible language
  • re-use of subdocuments
  • namespace as pure abstraction
  • namespace:name format (c.f. packages or classes)
  • problem of unique prefixes: the solution is to rely on a URI
    • URL: uniform resource locator, familiar to you
    • URN: uniform resource name, a persistent name that need not point to a location
    • declare the namespace in your root element (inheritance):
      <mySpace:myRootElement xmlns:mySpace="http://www.myDomain.com/unique"> (explicit namespace definition, allows multiples and mixed default/explicit) example nn|298-299
      or just <myRootElement xmlns="http://www.myDomain.com/unique"> (singular default namespace, no prefix) example nn|300
      cancel the default namespace with <myChildElement xmlns=""> local to that element and children
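The prefixed, default, and cancelled forms can be seen side by side in one small document. This is a sketch only: the URIs and element names are made up for illustration and nothing needs to exist at those addresses.

```xml
<?xml version="1.0"?>
<!-- hypothetical document; URIs and element names are illustrative only -->
<inv:inventory xmlns:inv="http://www.myDomain.com/inventory"
               xmlns:med="http://www.myDomain.com/medical">
  <!-- both vocabularies define "table"; the prefixes keep them apart -->
  <inv:table>oak, four legs</inv:table>
  <med:table>dosage by body weight</med:table>

  <!-- a default namespace applies to this element and its unprefixed children -->
  <notes xmlns="http://www.myDomain.com/notes">
    <item>in the notes namespace</item>
    <!-- xmlns="" cancels the default for this subtree -->
    <scratch xmlns="">in no namespace</scratch>
  </notes>
</inv:inventory>
```

The prefix itself carries no meaning; two documents that bind different prefixes to the same URI are using the same namespace.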

Schemas

  • DTD is an SGML legacy tool (EBNF grammar) - XML Schema is XML: "schema valid"
  • written in XML
  • element content validation; nn|385-386.
  • able to reuse sections: ElementTypes
  • derived datatypes
  • associating a document with a schema nn|395
  • a partial example
  • a simple xml document with its schema, written in the Microsoft schema dialect. Note the backwards tree construction!
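As a rough illustration, and assuming the "Microsoft schema" and "ElementTypes" mentioned above refer to the XML-Data Reduced (XDR) dialect, a schema for a tiny document might look like the sketch below. The file name pondSchema.xml and the pond, frog, and color names are hypothetical.

```xml
<?xml version="1.0"?>
<!-- pondSchema.xml: hypothetical XDR schema; element names are illustrative -->
<Schema name="pondSchema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- leaves are declared first here, so the tree reads "backwards" -->
  <AttributeType name="color" dt:type="string" required="yes"/>
  <ElementType name="frog" content="textOnly">
    <attribute type="color"/>
  </ElementType>

  <!-- the parent comes last and refers back to the ElementType above -->
  <ElementType name="pond" content="eltOnly" order="seq">
    <element type="frog" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```

With this dialect a document is usually tied to its schema through the root element's namespace, for example <pond xmlns="x-schema:pondSchema.xml">. Unlike a DTD, the schema is itself well-formed xml and can be parsed with the same tools as the documents it describes.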

Reading

  • CSS - p377-394|61-75
  • XSL - 453-477|133
  • XSLT nn|133-140


http://www.mousetrap.net/syllabus/xml/day2.html
$Id: day2.orb,v 1.11 2002/04/28 17:51:33 mouse Exp $

Remember, your login is based on your machine's hostname, not on any other number.
~/[initials] refers to the subdirectory under your homedir, named after your initials. Everything except for .dotfiles will be stored in your ~/[initials] directory.


© 1995-2001 jason carr
Distributed under the terms of the GNU Free Documentation License.