Dear Learner About Inheritable XML Architectures,
                   Architectural Forms (AFs),
                   Meta-DTDs, 
                   and all that kind of stuff.

If you're looking for an explanation of the syntax of PI-based
"Base Architecture Declarations", search below for the string:

   Introduction: How to declare a base architecture

I regret that the following material doesn't hang together well, and
it's definitely in the wrong order.  It would have to be rewritten
completely to put it in a "right" order, I think.  But, even so, it
should be orders of magnitude more understandable than the ISO
standard that standardizes all this stuff.  The standard has no
examples!  (Even an example with mistakes, and these examples do have
mistakes, is better than no example.  Here's a cheery thought, though:
You can consider it an exercise for you to fix the errors!)

The materials below were designed to help people understand why AFs
are important to the future of XML as a business-to-business
information interchange standard.  Therefore, I tried to use XML
syntax, rather than SGML's less restricted syntax, although I wasn't
completely consistent about it.  In terms of AFs, the biggest impact
is that I consistently used processing instructions to declare
architectures (per ISO JTC1/SC34's N1957 amendment to 10744:1997)
rather than the NOTATION declarations normally used in SGML.

Have fun!

Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com
http://www.techno.com

By the way, I do assert ownership of this material, such as it is,
including copyright.  You have my permission to download it, read it,
use it in your work, and to prepare derivative work from it.

Copyright (C) 1997-2000 Steven R. Newcomb

******************************************************************************

Here's an eye-opening AF-learning mental exercise: 

  In any XML or SGML element, in any XML or SGML document instance,
  the generic identifier (i.e., the element type name) is the value of
  the one-and-only nameless attribute.  This attribute is #REQUIRED
  to have an explicit value in all element start tags.  In other
  words, the generic identifier is an attribute value like any other,
  except for some additional privileges, and the fact that there's no
  name for the attribute that it's the value of.

If that little goodie sank in properly, the rest is easy:  

  Other generic identifiers, defined in many (other) DTDs, can be the
  values of other attributes.  We call such attributes "architectural
  form name attributes".  Their values are "architectural form names".
  The "architectural forms" to which such names refer are <!ELEMENT
  type definitions in those other DTDs; the architectural form names
  are the corresponding element type names in those other DTDs.  The
  DTDs that are used as such vocabulary resources are called
  "meta-DTDs".

Many claim that generic identifiers are not attribute values, nor can
they ever be considered attribute values.  This is incomprehensible to
me.  If the generic identifier is not an attribute of an element, what
is it?  It certainly isn't content.  Maybe these people just need to
look up the word "attribute" in the dictionary.  If elements were
people, we would call their generic identifiers "family names" or
"surnames" or "last names", and we would call their unique identifiers
"given names" or "first names".  So a surname is just as much an
attribute as a given name is, right?  An element that says it conforms
to an architectural form is like an Asian American who has an English
name in addition to his actual family name and actual given names.
The architectural form name is a name for use in a certain processing
context.

The syntax of architectural forms does not constrain the generic
identifier (the element type name) in any way; indeed, the generic
identifier is pretty much ignored, for purposes of architectural
processing.  As far as architectural processing is concerned, the main
purpose of the generic identifier is to provide a hook for markup
minimization.  The generic identifier is relegated to a role in which
it serves as a kind of macro call: it brings in the default values of
all the attributes declared in the DTD, if any, for that element type,
as we'll see shortly.

********************************************************************************

Each architecture is referenced by means of an attribute name, and the
value of that attribute is the name of the element type within the
architecture to which the element is claiming both syntactic
conformance and semantic equivalence.  In other words, considering
the Dublin Core (DC) as an architecture, a <Creator> could be
identified as follows:

   <foo DC="Creator">...</foo>
  
A single element can claim conformance with more than one
architecture:

   <foo DC="Creator" LCCC="Author">...</foo>

The following is a digression (but nonetheless a significant
digression) about markup minimization: the above looks pretty verbose,
and, given the reasonable expectation of decentralized control over
architectures, and the increasing need for documents to be
useful in a variety of contexts, verbosity may get a lot worse.  For
example:
  
   <foo DC="Creator" LCCC="Author" DEA="Officer" NAWCAD="TextAuth"
   USGS="Surveyor" Ford="ietmAuthor" Paramount="Creator">...</foo>
  
We can completely conquer this verbosity by using a DTD to cause all
the architectural form attributes to be present and to have the
necessary values by default, for all instances of the element type
"foo":

  <!ELEMENT foo - - ( whatever )>
  <!ATTLIST foo
     DC         NAME   "Creator"
     LCCC       NAME   "Author"
     DEA        NAME   "Officer"
     NAWCAD     NAME   "TextAuth"
     USGS       NAME   "Surveyor"
     Ford       NAME   "ietmAuthor"
     Paramount  NAME   "Creator"
  >

Now the same element instance can be expressed as:

  <foo>...</foo>

and still be processed in terms of all those different architectures
in exactly the same way, because all the architectural form attributes
are still implicitly present, and they will be reported by the parser
as if they were explicit.
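
The defaulting can be watched in action with a short Python sketch.
It is only an illustration under some assumptions: the declarations
are rewritten in XML syntax (NMTOKEN instead of SGML's NAME, no
tag-omission indicators), they sit in the internal subset so that
Python's expat-based parser reads them, and only three of the seven
attributes are shown.

```python
import xml.etree.ElementTree as ET

# Assumption: an XML-syntax equivalent of the SGML declarations above,
# placed in the internal subset so a non-validating parser sees it.
DOC = """<!DOCTYPE foo [
  <!ELEMENT foo (#PCDATA)>
  <!ATTLIST foo
     DC      NMTOKEN  "Creator"
     LCCC    NMTOKEN  "Author"
     NAWCAD  NMTOKEN  "TextAuth"
  >
]>
<foo>...</foo>"""

root = ET.fromstring(DOC)

# The start tag carries no attributes, yet the parser reports the
# defaulted ones as if they had been written out explicitly.
forms = dict(root.attrib)
for name in sorted(forms):
    print(name, "=", forms[name])
```

The element instance stays a bare <foo>; the architectural form
attributes arrive through the attribute list declaration alone.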

   (Note: XML documents that do not have DTDs cannot take advantage of
   this technique, but they can still take full advantage of the
   architectural form paradigm.  The only difference is that such
   documents must specify, in each element instance, all the
   architectural form attributes needed to process that element in
   terms of all the desired architectures.  As we have just seen,
   doing without a DTD can make documents that use architectural forms
   extremely verbose.  It's exactly like the question of whether

   (a) to store a PostScript document with fonts that describe each
       glyph's curve set, and then reference the glyphs whenever they
       are to be used, or

   (b) to store each glyph as an explicit set of curves.  

   If the document contains only a dozen characters, it may be more
   sensible not to include the font(s) from which they were selected,
   and simply to be explicit about the curves that make up each glyph.
   If the document contains many characters, a huge efficiency
   advantage is gained by including the font and referencing the
   glyphs in the font by means of the characters.  Similarly, if we
   include a DTD with our document, we can, in effect, reference any
   number of attributes and their default values simply by uttering an
   element's generic identifier (<foo>, in the example above).  If we
   have a lot of elements in our document, using a DTD offers a big
   efficiency advantage.  But it's not strictly necessary to use a
   DTD.

   It should also be noted that it's not strictly necessary to include
   a DTD with every document, even if you wish to use one.  It's only
   necessary that the recipient of your document also have a copy of
   the same DTD (or something with equivalent ability to drive the
   parsing process) that you intend the document to be used with.
   Again, it's exactly like the situation with fonts in PostScript:
   you don't have to include the font in a PostScript document if you
   know that the recipient's printer has that font already inside it
   (or can load it).)

It is also not always necessary to be explicit, even in a DTD, about
all the architectures to which an element conforms, if one
architectural form is already a subtype of another.  For example, we
can take advantage of the fact that the "TextAuth" architectural form
(remember that "architectural form" == "element type") is declared in
the NAWCAD architecture as a subtype of the "Creator" architectural
form in the "DC" architecture:

    Assuming that in the NAWCAD architecture's DTD:

    <!ELEMENT TextAuth - - (whatever)>
    <!ATTLIST TextAuth
       DC  NAME  #FIXED  "Creator"
    >

...then every NAWCAD <TextAuth> is by definition also a DC <Creator>.

********************************************************************
********************************************************************
**  In the architectural form paradigm, the rule is: An instance  **
**  of an element that claims conformance to any architectural    **
**  form may not violate any of the constraints on the            **
**  architectural form to which it presumably conforms.           **
********************************************************************
********************************************************************

   * The rule applies to an element's *context*, in that no element
     can appear where its architectural context (the architectural
     forms of its surrounding elements) would not allow it to appear.

   * The rule applies to an element's *content,* in that no
     architectural elements can appear inside it unless those
     architectural elements are permitted by the architecture.

     (Note: the above are two aspects of the same idea: that an
     element's content must be consistent with all of the
     architectural forms to which it declares conformance.  If you
     have guessed by now that the document element must always conform
     to the architectural forms of the document elements of all the
     architectures used in the document, you guessed correctly.)

   * The rule also applies to the element's *attributes,* in that any
     attributes that are required by the architecture must be present
     in the element instance, and if they are not required and not
     present, they are assumed to have their architecturally-defined
     default values and/or #IMPLIED effects on applications of that
     architecture.  If there are attributes present that do not appear
     in the architecture, they are ignored.  The presence of such
     non-architecturally-defined attributes is regarded as implying
     additional constraints, but not as violating any existing
     constraints.  No architecture has the authority to prevent
     additional, non-architectural attributes from appearing on
     elements.  From each architecture's perspective, the attributes
     that are present but not defined by the given architecture are
     invisible.

   * Finally, the rule applies to any *other constraints* on element
     content and attributes, even if they cannot necessarily be
     detected by a generic parser.  These are detectable by any
     validating semantic processor engine for that architecture.  For
     example, the HyTime varlink architecture (from which XLink was
     derived) does not allow the number of anchors to exceed 2 unless
     the "manyanch" option is supported and is specified with no value
     or a value greater than "2".  No generic parser can check the
     conformance of an element to this constraint, but a validating
     XLink or varlink architecture processing engine can.  When we
     consider the boundless variety of architectures, we must admit
     that there is probably a boundless variety of such constraints,
     and the best way to handle them is to relegate all
     architecture-specific constraint checking to a re-usable engine
     for that architecture.

Since the above NAWCAD DTD fragment constrains all NAWCAD <TextAuth>
elements to conform to all the constraints and requirements of DC
<Creator> elements, it is therefore unnecessary to mention the "DC"
architectural form attribute in the <foo> element, because it is
already there!  By definition, a subtype always conforms to the
constraints and requirements of its supertype(s).
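
The subtype reasoning can be sketched in a few lines of Python.  This
is a toy model, not anything defined by the standard: the table
META_DTD_FIXED and the function all_forms are hypothetical names, and
the table simply records the #FIXED architectural form attribute from
the NAWCAD fragment above.

```python
# Hypothetical table: for each (architecture, form), the #FIXED
# architectural form attributes declared for it in its meta-DTD.
META_DTD_FIXED = {
    ("NAWCAD", "TextAuth"): {"DC": "Creator"},
}


def all_forms(explicit):
    """Return every architecture -> form mapping an element conforms to.

    `explicit` holds the architectural form attributes present on the
    element itself; derived forms are added by following the #FIXED
    declarations transitively.  Explicit values are never overridden.
    """
    result = dict(explicit)
    pending = list(explicit.items())
    while pending:
        arch, form = pending.pop()
        derived = META_DTD_FIXED.get((arch, form), {})
        for d_arch, d_form in derived.items():
            if d_arch not in result:
                result[d_arch] = d_form
                pending.append((d_arch, d_form))
    return result


# <foo NAWCAD="TextAuth"> says nothing about DC, yet it is a DC Creator.
foo_forms = all_forms({"NAWCAD": "TextAuth"})
print(foo_forms)
```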

In a NAWCAD-oriented application, <foo>'s "NAWCAD" attribute means not
only that our <foo> element can be extracted into a valid NAWCAD
document as a valid <TextAuth> element, but also that it can be
extracted into a valid DC document as a valid <Creator> element.  (In
the jargon of the SGML Extended Facilities, we say that the output of
the parser, conceptually speaking, includes a "grove" -- a parse tree
-- for each of the architectures used by the document.  There is no
requirement that any application actually produce groves; groves are a
concept developed to explain, in abstract terms, the effects of
parsing, processing, and component addressing.)

********************************************************************************

"What do we do about the content of an element whose semantic is
borrowed from one architecture, when its content's semantics are
borrowed from one or more other architectures?"  In the architectural
forms paradigm, this question becomes, "What is the containing
element's role in the contained elements' architecture, and/or what
are the contained elements' roles, if any, in the containing element's
architecture?"  In the architectural forms paradigm, any element
instance can play several distinct and unambiguous roles in as many
distinct architectures, so it becomes possible for the contained
elements not only to have NAWCAD-defined semantics, but also DC
semantics, too.  In fact, all the elements can have a role to play in
every architecture, provided that when, conceptually speaking, each
architectural instance is extracted from the document, it meets the
structural and semantic constraints imposed by its architecture.

There is more than one way to handle the puzzle, but first, let's see
what happens if we don't take advantage of any of the special
facilities of architectural forms.  In the following example:

  <auth DC="Creator">
    <authInfo RDF="Description">
      <persName IBMPerson="Name">Bob Schloss</persName>
      <email IBMPerson="Email">schloss@watson.ibm.com</email>
    </authInfo>
  </auth>

the <persName> and <email> elements are not architectural with respect
to the RDF architecture.  From the RDF architecture's perspective,
therefore, the <authInfo> element looks like this:

    <Description>Bob Schlossschloss@watson.ibm.com</Description>

In other words, the markup of the contained non-architectural elements
has been deleted altogether, leaving Bob Schloss with a very strange
surname, indeed.
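
For readers who think better in code, here is a rough Python sketch of
that view.  It is not the algorithm of the standard, only a model of
the behavior just described: elements carrying the architectural form
attribute are renamed to their form, and all other markup disappears
while its data stays.  The function names are made up for this
illustration, attributes are not carried over, the starting element is
assumed to be architectural, and the sample is written without white
space between elements (which an SGML parser with a DTD would not
treat as data in element content anyway).

```python
import xml.etree.ElementTree as ET


def _append_text(target, text):
    """Add character data at the current end of target's content."""
    if not text:
        return
    if len(target):
        last = target[-1]
        last.tail = (last.tail or "") + text
    else:
        target.text = (target.text or "") + text


def _copy_into(source, target, form_att):
    """Copy source's content into target as one architecture sees it."""
    _append_text(target, source.text)
    for child in source:
        form = child.get(form_att)
        if form is None:
            # Non-architectural: the markup vanishes, the content stays.
            _copy_into(child, target, form_att)
        else:
            _copy_into(child, ET.SubElement(target, form), form_att)
        _append_text(target, child.tail)


def architectural_instance(elem, form_att):
    """Build the view of elem from the architecture named by form_att."""
    root = ET.Element(elem.get(form_att))
    _copy_into(elem, root, form_att)
    return root


doc = ET.fromstring(
    '<authInfo RDF="Description">'
    '<persName IBMPerson="Name">Bob Schloss</persName>'
    '<email IBMPerson="Email">schloss@watson.ibm.com</email>'
    '</authInfo>'
)
rdf_view = ET.tostring(architectural_instance(doc, "RDF"), encoding="unicode")
print(rdf_view)
```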

   (Digression: Why does it work that way?  It's because, in the case
   of mixed content (which is not the situation in our puzzle
   example), the deletion of non-architectural markup still leaves the
   data in pretty good shape.  For example:

    <authInfo RDF="Description">Bob Schloss's e-mail address is
    <email>schloss@watson.ibm.com</email>, but you can also use
    <email>rschloss@us.ibm.com</email>.</authInfo>

   becomes, from RDF's perspective:

    <Description>Bob Schloss's e-mail address is
    schloss@watson.ibm.com, but you can also use
    rschloss@us.ibm.com.</Description>

   To handle cases other than mixed content, there is no one algorithm
   that can be automatically applied in such a way as to give
   universally acceptable results.  In any case, no such algorithms
   are built into the SGML Extended Facilities.)

Probably the best way to handle the puzzle of how to make the
<Description> element get back a simple string is *not* to give it
a simple string, but instead to make the contained elements meaningful
in RDF terms, as well as IBMPerson terms.  For example:

 <auth DC="Creator">
   <authInfo RDF="Description">
     <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
     <email IBMPerson="Email" RDF="PersonEmail">schloss@watson.ibm.com</email>
   </authInfo>
 </auth>

Note that in the above example, I've taken the liberty of equipping
the RDF architecture with the architectural forms <PersonName> and
<PersonEmail>.  Obviously, very few people will have the authority to
do any such thing, so I'm assuming that the creators of RDF
anticipated this particular need and provided these architectural
forms, and all I needed to do was reference them.  I can do that
without affecting the usefulness of my references to the <Name> and
<Email> forms of the IBMPerson architecture; again, in the
architectural forms paradigm, any element instance can conform
explicitly to architectural forms in more than one architecture.

Now let's imagine that the RDF architecture provides a <PersonName>
architectural form, but not a <PersonEmail> form.  We're still ok,
because now, from an RDF architectural perspective:

      <authInfo RDF="Description">
        <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
        <email IBMPerson="Email">schloss@watson.ibm.com</email>
      </authInfo>
  
becomes:

    <Description><PersonName>Bob Schloss</PersonName>
    schloss@watson.ibm.com</Description>

... and this leaves our RDF engine in a position to at least
distinguish between some well-understood data and some raw data, in
mixed content.  At the very least, the boundary between the data
contents of the two contained elements has been preserved.

Now let's imagine that there is neither a <PersonName> nor a
<PersonEmail> in the RDF architecture, and that the string

    Bob Schlossschloss@watson.ibm.com

is unacceptably Delphic as the content of an RDF <Description>.  What
can we do?

One way to handle the problem is to ignore, from an RDF perspective,
the data content of all but one of the contained elements.  For
this, we must turn to one of the deeper facilities of the AFDR: the
"ArcIgnD" (architecture ignore data) architectural control attribute,
which allows us to prevent the data content of an element (i.e., the
data consisting of all of its leaves in the parse tree) from being
considered to be part of the document, from the perspective of any
particular architecture.  If, for example, we wanted to ignore the
<persName> element's content for all purposes of RDF processing, we
could say:

  <authInfo RDF="Description">
    <persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>
    <email IBMPerson="Email">schloss@watson.ibm.com</email>
  </authInfo>

From an RDF perspective, the above looks like this:

  <Description>schloss@watson.ibm.com</Description>

To explain the above example, the following is a digression about
"architecture control attributes", and how they are being used in the
above example.

The names of all "architectural control attributes" used to control
architectural processing in any document instance are declared in
certain special processing instructions (see "Some references" below).
There is one processing instruction per architecture.  Each such
processing instruction identifies the architecture, and provides,
among other things, the names of the architectural control attributes
whose values will control the architectural processing of each
element.  The most basic attribute is the "Architectural Form
Attribute", examples of which have appeared in most of the above
examples (as the "DC", "RDF" and "IBMPerson" attributes).  We have
been assuming, in the above examples, that in our document, the RDF
architecture's architectural form attribute's name is "RDF".
However, it could have been any XML name.  Similarly, we have been
assuming that the Dublin Core architecture's architectural form
attribute name is "DC", and the IBMPerson architecture's is
"IBMPerson".

Another architectural processing attribute that can be declared in the
same processing instruction is the "Architecture Ignore Data"
attribute.  In our above example, we are assuming that for the RDF
architecture, in this document, the name of the "Architecture Ignore
Data" attribute has been declared in the relevant processing
instruction to be "RDFIgDat".  In the above example, the value
"ArcIgnD" is an ISO-defined string that means "data is always
ignored."

   (Note: The other possibilities are:

    "nArcIgnD", which means that data is not ignored, and it is an
                error if data occurs where the architecture does not
                allow it, and

    "cArcIgnD", which means data is conditionally ignored (data will
                be ignored only when it occurs where the architecture
                does not allow it).)
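
The effect of the "ArcIgnD" value can be modelled with a small Python
sketch.  It is a simplification under stated assumptions: the function
name data_seen_by is invented here, only the "ArcIgnD" value is
handled (the other two values depend on the architecture's content
models, which this sketch does not have), and a descendant is not
allowed to switch data recognition back on.

```python
import xml.etree.ElementTree as ET


def data_seen_by(elem, ignore_att):
    """Concatenate the character data one architecture sees in elem.

    `ignore_att` is the name declared for the "architecture ignore
    data" attribute (e.g. "RDFIgDat" in the example above).
    """
    if elem.get(ignore_att) == "ArcIgnD":
        # Simplification: the whole subtree's data is dropped.
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(data_seen_by(child, ignore_att))
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring(
    '<authInfo RDF="Description">'
    '<persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>'
    '<email IBMPerson="Email">schloss@watson.ibm.com</email>'
    '</authInfo>'
)
rdf_data = data_seen_by(doc, "RDFIgDat")
print(rdf_data)
```

An architecture that declared a different attribute name would see
all of the data, because from its perspective the "RDFIgDat"
attribute is just another non-architectural attribute.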

If all this seems rather complex, please remember that there is no
requirement that anyone use the "architecture ignore data" attribute,
but it's nice that it's there when it's really needed and nothing less
will do.

There are other architecture control attributes, and there are still
other things that can be declared in the processing instructions that
define architecture control attributes.

(Here ends the digression about architectural control attributes.)


********************************************************************************
Some references
********************************************************************************

Architectural Forms / (Multiple) Inheritance ("Architectural Form
   Definition Requirements" or "AFDR"):
   http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.3.html
   This standard is being amended to provide for XML's use of
   architectural forms by means of processing instructions (which XML
   supports) instead of #NOTATION attributes (which XML does not
   support).  See http://www.ornl.gov/sgml/wg8/document/1957.htm for
   the details of this amendment.

See http://www.hytime.org for more papers and references.

********************************************************************************
Acknowledgement
********************************************************************************

As the reader may have guessed, this material would not have been
possible without the patient substantive help of Robert J. "Bob"
Schloss of IBM's Thomas J. Watson Research Center.


********************************************************************************


From the perspective of any architecture, non-architectural markup 
simply disappears.


********************************************************************************

Within the processing instruction that declares the names of the
architecture control attributes (etc.) for each architecture, the
architecture itself (or, anyway, its DTD) is referenced by means of
the public identifier for that architecture.  The public identifier
will normally be a formal public identifier (FPI).  Here are some
examples of formal public identifiers for existing architectures:

-//GCA//DTD GCAPAPER.DTD 19980204 Vers 4.0//EN
-//SGML Open//DTD Exchange Table Model 19960430//EN
-//W3C//DTD HTML 3.2//EN

You've probably seen the FPIs for the HTML DTDs, buried in the
following DOCTYPE declarations, on many occasions:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

As in the case of ISBNs, FPIs have registered authorities like W3C,
IBM, etc.  As I understand it, the way in which each authority uses
the third field is pretty much its own business; each authority is
itself responsible for maintaining the registry for the public
identifiers for which it is the authority.

How do you turn an FPI into a file name or (for example) a URL?  The
standards don't say; it's a system-specific thing.  SGML Open,
recently renamed "OASIS", has for some years now promulgated a
standard syntax for tables that perform this mapping, called "SGML
Open Catalogs".  Here's an entry from one in my notebook, used when
I'm running Linux:

PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "/windows/bin/SP/pubtext/ISO/ISOlat1"

 ...and here's an entry for the same entity in the catalog that
I use when I'm running Windows:

PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "c:\bin\SP\pubtext\ISO\ISOlat1"

The system address of an entity to which an FPI resolves can
alternatively be a URL, URN, or URI.
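
The mapping itself is easy to picture.  The following Python sketch
reads PUBLIC entries like the one above into a lookup table.  It is
deliberately minimal and makes assumptions: the names parse_catalog
and LINUX_CATALOG are invented for the illustration, and only PUBLIC
entries with double-quoted literals are recognized (real catalogs
have other entry types and details that are ignored here).

```python
import re

# One PUBLIC entry: keyword, quoted public identifier, quoted system id.
_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def parse_catalog(text):
    """Return a dict mapping public identifiers to system identifiers."""
    return {public_id: system_id
            for public_id, system_id in _ENTRY.findall(text)}


# The Linux entry quoted above, as it would appear in a catalog file.
LINUX_CATALOG = (
    'PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" '
    '"/windows/bin/SP/pubtext/ISO/ISOlat1"\n'
)

catalog = parse_catalog(LINUX_CATALOG)
system_id = catalog.get("ISO 8879:1986//ENTITIES Added Latin 1//EN")
print(system_id)
```

Swapping in the Windows catalog changes only the table, not the
documents that reference the public identifier.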


********************************************************************************


Introduction: How to declare a base architecture in an XML document.
--------------------------------------------------------------------

The following is a heavily commented, full-featured architecture
declaration.  It declares that HTML-in-XML is used as a base
architecture, in all possible gory detail, so that we're in a position
to take advantage of the full power of the architectural forms
paradigm.  Don't be alarmed, though; see the next section for typical
declarations.  The defaulting rules allow us to be very brief in
normal situations.

Note: according to XML's production for processing instructions (PIs),
there is no such thing as a "PI attribute", per se.  However, the
XML-relevant amendment to ISO 10744:1997 (see
http://www.ornl.gov/sgml/wg4/document/1957.htm) provides for the
attribute name/value -like syntax used below.  There is nothing wrong
with that; the syntax of PIs is application-specific, and the
application here is standards-based generic architecture processing.
Neither XML nor n1957 provides for comment delimiters inside a PI,
either; in order to make the following explanation as clear as
possible, I have chosen to pretend that they do (borrowing SGML's
"--" delimiters), so the explanatory text is distinguishable from the
exemplary text.

<?IS10744 arch 

   name="HTML"
         -- The name of this architecture, regarded as a notation.
            Required.  This name will be used by default as:

            * The name of the entity in which the architecture's DTD
              can be found.
 
            * The name of the "architectural form name" architecture
              control attribute (aka the "ArcForm" attribute).

            * The generic identifier of the document element. --


   public-id="-//W3C//ARCHDEF HTML-in-XML 0.000001//EN"
         -- The public identifier of the architecture definition
            document.  This is where all the "other constraints" and
            processing requirements to be fulfilled by validating
            implementations (probably reusable engines) of this
            architecture are documented.  Optional.  --


   dtd-public-id="-//W3C//DTD HTML-in-XML 0.000001//EN"
         -- The public identifier of the formal, machine processable,
            validating-parser-configuring DTD for this architecture.
            Optional.  If no value is provided for this attribute or
            for the dtd-system-id attribute, the value of the "name"
            attribute is taken to be the name of the entity in which
            the DTD is found.  One way or another, there must be a DTD
            for the architecture.  --


   dtd-system-id="/etc/architectures/HTML-in-XML.dtd"
         -- The local system address of the storage object containing
            the architecture's DTD.  Optional.  This information may
            be provided by means of a catalog, or by any other
            system-specific means.  The storage object may be
            addressed as a URL, etc. --


   form-att="html"
         -- The name of the "architectural form name" architecture
            control attribute.  The value of this PI attribute is the
            name of the common attribute whose value is the
            architectural form name (if any) to which each element in
            this document conforms.  In other words, in this case, if
            any element has an attribute whose name is "html", the
            value of that attribute is the name of the HTML
            architectural form (e.g., "DL", "DT", "DD") to which
            that element conforms.  Optional.  If not specified, the
            value of the "name" attribute (the first attribute in this
            list) is used.  --


   renamer-att="htmlNames"
         -- This is the name of the "architectural attribute renamer"
            architectural control attribute.  The value of this
            attribute remaps the names of architectural attributes to
            other names.  (The data content of the element can also be
            remapped by means of this attribute.)  This attribute is
            nice because it allows people to use architectures without
            having the architectures dictate the names of attributes.
            This attribute is essential when an element is a client of
            two architectural forms of two different architectures,
            and the two architectural forms use the same attribute
            name, or content, for different purposes.  Optional. --


   suppressor-att="suppressHtml"
         -- This is the name of the "architecture suppressor"
            architecture control attribute.  It is used to suppress
            and restore architectural recognition for the descendants
            of the element on which it appears.  It has three possible
            values: 

              sArcAll    : Completely and irrevocably suppress all
                           architectural processing of descendants.

              sArcForm   : Don't recognize the architectural forms of
                           descendants, except for elements that have
                           an explicit value for the architecture
                           suppressor attribute.  Continue to
                           recognize common attributes of this
                           architecture, if any.

              sArcNone   : Recognize the architectural forms of
                           descendants.

              (no value) : Inherit the state of the parent element.

            Optional.  The default is to recognize all architectural
            form attributes. --


   ignore-data-att="htmlIgnoreData"
         -- This is the name of the "architecture ignore data"
            attribute.  It controls whether the ultimate data content
            of the element will be regarded as part of the document,
            from the perspective of this architecture.  

            The possible values of the architecture ignore data
            attribute are:

              ArcIgnD  : Data is always ignored.

              nArcIgnD : Data is not ignored, and it is an error if
                         data occurs where the architecture does not
                         allow it.

              cArcIgnD : Data is conditionally ignored (data will be
                         ignored only when it occurs where the
                         architecture does not allow it.)

            Optional.  The default value is taken to be cArcIgnD.  --


   doc-elem-form="html"
         -- This is the name of the architectural form which will be
            regarded as the root element (the "document element") for
            purposes of architectural processing.  In other words, the
            value of this PI attribute has the same effect as the
            first name specified in a DOCTYPE declaration:

                         <!DOCTYPE HTML ...
                                   ^^^^

            Optional.  If it's not explicit, the name of the
            architectural form to be regarded as the document element
            will be taken to be the same as the value of the "name" PI
            attribute (the first attribute described in this list). --


   bridge-form="htmlBridge"
         -- This is the name of the default architectural form.  It
            must be declared as an element type in the architecture.
            The purpose of this form name is to allow elements to be
            regarded as architectural whenever necessary, and when
            they don't conform to any other architectural form.  (It
            might be "necessary" because any element that has a unique
            identifier should be regarded as architectural in all the
            base architectures, in order to guarantee that, from the
            perspective of any architecture, references to those
            unique identifiers will be valid.)  In all cases where an
            element is promoted to de facto architectural status, its
            generic identifier, from the perspective of the
            architecture, will be the value specified for this
            "bridge-form" PI "attribute".  Optional.  If not specified,
            no defaulting of architectural forms will be done. --


   auto="ArcAuto"
        -- If the value is "ArcAuto", no arcform attribute is needed
           on elements whose generic identifiers are already the same
           as their architectural forms.  In other words, the generic
           identifier will be regarded as the name of the
           architectural form iff it is the same as an architectural
           form name in this architecture.  The only other possible
           value is "nArcAuto", which prevents such automatic mapping
           from occurring during processing.  Optional.  The default
           value is effectively "ArcAuto". --


  -- options="" 
          The names of additional, architecture-specific PI
          "attributes" in this list that can be used to provide
          architecture-specific parameters to engines for this
          architecture.  Optional.  Default: no such attributes. --

?>




Example 1: A document that uses two base architectures.
-------------------------------------------------------

<?IS10744 arch name="HTML">
<?IS10744 arch name="MathML">
<FOO>
  <DL>
    <DT>Mass-energy conversion equation</DT>

    <DD><VAR>E</VAR><EQUAL/><VAR>m</VAR>
        <ELIDABLEMULTIPLICATION/><POWER><VAR>c</VAR>
        <EXP>2</EXP></POWER>
    </DD>
  </DL>
</FOO>



Architectural Markup Minimization.  
----------------------------------

In Example 1, simply by virtue of the fact that <DL>,
<DT>, and <DD> happen to have the same generic identifiers as their
corresponding architectural forms in the HTML architecture, they
declare their conformance to those architectural forms.  Similarly,
<VAR>, <EQUAL>, <ELIDABLEMULTIPLICATION>, <POWER> and <EXP> are
recognized as conforming to their identically-named architectural
forms in the MathML architecture.

In the case of the document element in the above example, it is
automatically regarded as claiming conformance to the document element
form declared for each of the two architectures.  (From the
perspective of the HTML architecture, the document form is <HTML>, and
from the perspective of the MathML architecture, it is <MathML>.)




Mixed-architecture markup
-------------------------


From the perspective of the HTML architecture, Example 1 looks like
this:

<HTML>
  <DL>
    <DT>Mass-energy conversion equation</DT>
    <DD>Emc2</DD>
  </DL>
</HTML>

Here we have lost some important markup boundaries, but this loss is
not critical for purposes of validating the document against the HTML
architecture.  Were it the case that #PCDATA ("Emc2") were not
permitted in <DD> elements, it would be automatically ignored (remember
that the architecture ignore data attribute defaults to "cArcIgnD").



From the perspective of the MathML architecture, Example 1 looks
like this:

<MathML>
<VAR>E</VAR><EQUAL/><VAR>m</VAR>
<ELIDABLEMULTIPLICATION/><POWER><VAR>c</VAR>
<EXP>2</EXP></POWER>
</MathML>

There is no problem here, as long as <VAR>, <EQUAL>, etc.  are
permitted in the content of <MathML>.  I've made the assumption that
#PCDATA is not permitted in the content of <MathML> (which is pretty
likely, I think), so the string "Mass-energy conversion equation" was
automatically ignored.
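
The two views above can be mimicked with a few lines of Python.  This
is only a sketch, not a real architecture engine.  The two
dictionaries are made-up stand-ins for the HTML and MathML meta-DTDs
(all they record is whether #PCDATA is allowed in each form),
automatic name mapping is assumed to be on, the default "cArcIgnD"
behavior is assumed, and whitespace around data is simply thrown away.

```python
import xml.etree.ElementTree as ET

EXAMPLE_1 = """<FOO>
  <DL>
    <DT>Mass-energy conversion equation</DT>
    <DD><VAR>E</VAR><EQUAL/><VAR>m</VAR>
        <ELIDABLEMULTIPLICATION/><POWER><VAR>c</VAR>
        <EXP>2</EXP></POWER>
    </DD>
  </DL>
</FOO>"""

# Hypothetical stand-ins for the meta-DTDs: form name -> is #PCDATA allowed?
HTML_FORMS = {"HTML": False, "DL": False, "DT": True, "DD": True}
MATHML_FORMS = {"MathML": False, "VAR": True, "EQUAL": False,
                "ELIDABLEMULTIPLICATION": False, "POWER": False, "EXP": True}


def arch_view(doc, doc_elem_form, forms):
    """Build the document as seen from one architecture."""

    def add_data(out, text):
        text = (text or "").strip()
        if not text or not forms[out.tag]:
            return  # cArcIgnD: data is dropped where the form does not allow it
        if len(out):  # data after a child element lives in that child's tail
            out[-1].tail = (out[-1].tail or "") + text
        else:
            out.text = (out.text or "") + text

    def walk(elem, out):
        add_data(out, elem.text)
        for child in elem:
            if child.tag in forms:  # automatic mapping: GI equals a form name
                walk(child, ET.SubElement(out, child.tag))
            else:                   # non-architectural: markup vanishes, data stays
                walk(child, out)
            add_data(out, child.tail)

    view = ET.Element(doc_elem_form)  # document element maps to the document form
    walk(doc, view)
    return view


doc = ET.fromstring(EXAMPLE_1)
html_view = arch_view(doc, "HTML", HTML_FORMS)
mathml_view = arch_view(doc, "MathML", MATHML_FORMS)
print(ET.tostring(html_view, encoding="unicode"))
print([e.tag for e in mathml_view.iter()])
```

The same walk serves both architectures; only the table of forms
changes, which is the whole point of having a generic parsing process.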


Conclusion: If your available processing capabilities only include one
of the two architecture engines that are necessary for a full
understanding of the document, the failure is as graceful as it can
possibly be, and the failure modes are as controllable as they can be
by the author of the document.  This is greatly-to-be-desired
goodness, so say I.


There is a third, implicit architecture in effect, in which Example 1
looks like, well, like Example 1.  In other words, the document can be
viewed through the implicit architecture of the original document,
where that architecture uses a combination of other architectures,
along with its own vocabulary of element types (in this case, "FOO"),
to express its information content.

Which of the two engines do I call when I have both engines, and a
single element conforms to architectural forms in both architectures?
I think there is no single answer to this question; it's up to
applications to make such decisions.  One thing seems clear: the APIs
of engines are going to need some serious engineering attention if
they are going to work together in arbitrary ways.  (I think full
arbitrariness is going to come eventually, but slowly, and not for a
few years yet.)  Engines are going to need to call each other, maybe
sometimes in funny ways; I don't know.  Here's an example (the <A>) to
cogitate about:

    <DD><VAR>E</VAR><EQUAL/><A HREF="..." MathML="VAR">m</A>
        <ELIDABLEMULTIPLICATION/><POWER><VAR>c</VAR>
        <EXP>2</EXP></POWER>
    </DD>

In simpler, near-term cases, in which architectures shift back and
forth during traversals of the element hierarchy, engines are going to
have to delegate and re-delegate to each other.  For example, suppose
we wanted to make an HTML hotspot out of the letter "m":

    <DD><VAR>E</VAR><EQUAL/><VAR><A HREF="...">m</A></VAR>
        <ELIDABLEMULTIPLICATION/><POWER><VAR>c</VAR>
        <EXP>2</EXP></POWER>
    </DD>

One scenario for rendering the above example is that the HTML engine
is rendering the <DD>, and it delegates rendition to the MathML engine
when the <VAR> is encountered, only to have the rendition of the
letter "m" re-delegated to it by the MathML engine.

Another scenario is that the application makes all renditional
assignments a priori, and it does not authorize engines to descend the
hierarchy into non-architectural areas.

A third scenario is that the HTML engine merely informs the MathML
engine that whatever else it's doing with the "m", it should also
color it purple.

With architectural forms, the really hard semantic processing problems
are still there; it's just that they are not surrounded by certain
unnecessary minefields.


********************************************************************************

Why can't people understand the SGML Extended Facilities as written
and as standardized by the ISO?

ISO standards are very hard to understand because they describe very
technical things in an abstruse techno-legal vocabulary and
reduced-redundancy style.  In short, despite having great things to
say, even the deathless prose of the HyTime standard tends to be
unreadable and, quite frankly, to suck as informative literature.
(I'm a co-editor of it; may God have mercy on us.)

********************************************************************************

Is the IS10744 PI part of XML 1.0 ?

No, it's a draft amendment to ISO/IEC 10744:1997 (aka HyTime), found
in JTC1/WG4 document n1957.  The PI is an alternative syntax designed
to allow architectural forms to be used with XML 1.0.  (The regular
ISO standard syntax uses SGML NOTATION declarations with #NOTATION
attributes, which XML 1.0 doesn't support.)  It's been approved at the
WG4 level for six months or so, it's not controversial, and so it's
sure to pass the final rubberstamp process.  So, it's a member of the
SGML family of standards, which includes XML under the name "WebSGML".
If W3C chooses not to recommend such use of XML 1.0, it only hurts W3C
and the people who trust W3C's advice.  Many people will use AFs
anyway, just not nearly as many as should.  PIs, DTDs, and their
applications are all user-definable in XML, so AFs are certainly
not illegal in any sense.

However, my take on it is that if W3C adopted the AF paradigm but
wanted to call the PI "dogbert", what we in WG4 would do would be to
amend the standard again and make "dogbert" an alternative identifier
for "IS10744".  We're pretty relaxed about naming and re-naming the
baby; after 13 years of this, our egos are kind of worn down.  We're
pretty relentless about bringing up baby the best way we know how,
though.

********************************************************************************

If you say:

        <?IS10744 arch name="RDF">
 
 ...then there must also be an entity declaration that gives the
system address of the meta-DTD:

        <!ENTITY RDF SYSTEM "http://www.w3.org/RDF/rdf.dtd">

 ...or the Formal Public Identifier (FPI) of the meta-DTD:

        <!ENTITY RDF PUBLIC "-//W3C//DTD Resource Description Framework (RDF) 1.0//EN">

 ...or both:

   <!ENTITY RDF 
     PUBLIC "-//W3C//DTD Resource Description Framework (RDF) 1.0//EN"
     SYSTEM "http://www.w3.org/RDF/rdf.dtd"
   >

However, there is no need for any such <!ENTITY declaration if you put
the necessary information in the PI, using either the system address
of the meta-DTD:

   <?IS10744:arch name="RDF" dtd-system-id="http://www.w3.org/RDF/rdf.dtd">

 ... or its FPI:

   <?IS10744:arch name="RDF" dtd-public-id="-//W3C//DTD Resource Description Framework (RDF) 1.0//EN">

 ... or both:

   <?IS10744:arch name="RDF"
     dtd-system-id="http://www.w3.org/RDF/rdf.dtd" 
     dtd-public-id="-//W3C//DTD Resource Description Framework (RDF) 1.0//EN"
   >
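
Here is a rough Python sketch of how a receiving system might find the
meta-DTD from either source.  The function names and the "entities"
table (entity name -> (public id, system id)) are mine, invented for
illustration.  Letting the PI win when both the PI and an entity
declaration are present is also my assumption; nothing above says
which one should take precedence.

```python
import re


def parse_arch_pi(pi_data):
    """Split PI data such as 'name="RDF" dtd-system-id="..."' into a dict."""
    return dict(re.findall(r'([\w.:-]+)\s*=\s*"([^"]*)"', pi_data))


def locate_meta_dtd(pi_attrs, entities):
    """Return (public_id, system_id) for the architecture's meta-DTD."""
    public_id = pi_attrs.get("dtd-public-id")
    system_id = pi_attrs.get("dtd-system-id")
    if public_id or system_id:
        return public_id, system_id
    # Nothing in the PI: fall back on the entity named after the architecture.
    return entities.get(pi_attrs["name"], (None, None))


attrs = parse_arch_pi('name="RDF" dtd-system-id="http://www.w3.org/RDF/rdf.dtd"')
print(locate_meta_dtd(attrs, {}))

declared = {"RDF": ("-//W3C//DTD Resource Description Framework (RDF) 1.0//EN", None)}
print(locate_meta_dtd({"name": "RDF"}, declared))
```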

Presumably, the FPI alone should give the receiving system enough
information to find out what is meant by <TITLE> in the case of the
Dublin Core architecture.

There are two possibilities:

(1) If the DC architecture is already a base architecture of the RDF
    architecture, and RDF has a form which is based on DC's <TITLE>
    form, there is no need even to declare or worry about DC in RDF
    document instances; DC's form is already present and accounted for
    in the RDF architecture, so uttering the name of the RDF
    architectural form that is a subtype of DC's <TITLE> is
    sufficient.  (And there is no need for

       DC="TITLE"

    , either, because it's already understood to be there.  In other
    words, DC is already the name of an attribute of the corresponding
    RDF element type, whose value is #FIXED at "TITLE".  Repeating it
    in the document instance is OK, but it's totally redundant to do
    so.)

(2) If the DC architecture is *not* a base architecture of the RDF
    architecture, then

      DC="TITLE"

    is sufficient, because the PI that declares the DC architecture
    references the DC meta-DTD, and presumably this reference will
    allow the receiving application to understand what is meant by a
    Dublin Core <TITLE>.

Digression about FPIs and Web addresses: Webheads generally don't
recognize the distinction between FPIs and web addresses.  Typically
they are just not cognizant of the probability that information that
is on the Web may also have a life (or possible lives) in other
contexts.  In order to allow Webheads to understand what architectures
are all about, my considered opinion is that it's better just to tell
them about system storage identifiers, and just not even mention
public identifiers.  For these guys, the system is always the Web,
and nothing else exists, so there's no reason to use FPIs, and their
purpose is inexplicable.

Digression about meta-DTDs and architecture definition documents
(ADDs): In order to perform any syntactic validation on an XML
document, there must be a DTD and/or one or more meta-DTDs.  However,
even though a document can be shown to conform to one or more DTDs,
such validity provides no warranty of meaningfulness.  That's why
there's a way to point to a document (probably written in some natural
language) that explains the semantics of the architecture: the
"Architecture Definition Document" (ADD).  Ideally, the PI should
reference the ADD as well as the meta-DTD.  That's what the
"public-id" PI "attribute" is for.  Aside from one of our bank
customers and one aerospace customer, I know of no one who bothers to
reference the ADD explicitly in their documents.  The ADD is something
that only a programmer would need, and only when writing an
application that conforms to an architecture.  The application, once
written, would have no use for any ADD.  I think ADDs are something we
don't need to mention to the Webheads, lest they worry that there's
some reason their applications need to access the ADD.  (Anyway, at
least in ISO 10744:1997, there's no "system-id" attribute, so you
can't specify the Web address of an ADD; you can only specify its
public identifier.)  For me it's a quasi-religious issue: I prefer to
make a point of making my information as self-describing as possible,
and referencing ADDs is a very sound and easy way to greatly enhance
its potential usefulness, should it someday arrive in the in-box of
someone living in the Lesser Magellanic Cloud, for example.

********************************************************************************

We don't generally allow a single element to conform to two
architectural forms of the same architecture.  No element can be both
a DC <TITLE> and a DC <CREATOR>.  There is a trick for getting around
this problem when necessary, which is to cause the architecture to
subtype itself.  But only the creator of the architecture can arrange
for that to happen, and it's not a trick for beginners.  Ordinary
people who use architectures created by others just don't get to
create valid elements that are subtypes of two architectural forms in
the same architecture.  It's a Reportable Architecture Error (RAE) to
do that.

********************************************************************************

The AF paradigm allows us to use an attribute value in place of
content.

In order to show how this is done, I need to make some assumptions
about the RDF and DC architectures.  This is because the syntax used
to accomplish the above requires remapping element content to
attribute values, and without knowing where things are, I can't
remap them to new places.

Let's assume that the DC architecture's meta-DTD defines a <CREATOR>
element type:

<!-- Here is the DC architecture. -->
<!ELEMENT DC - - (CREATOR*)>
<!ELEMENT CREATOR - - (#PCDATA)>


 ...and that the RDF architecture's meta-DTD defines a <DESCRIPTION>
 element type:

<!-- Here is the RDF architecture. -->
<!ELEMENT RDF - - (DESCRIPTION*)>
<!ELEMENT DESCRIPTION - - (#PCDATA)>


We can use the renaming feature of the AF paradigm to cause
an RDF engine to regard the value of the "RDFDesc" attribute
as the content of the <DESCRIPTION>, and we can use the
same facility to cause a DC engine to regard the value of
the DCCreator attribute as the content of the <CREATOR>:

<!-- Here is our document, in verbose form. -->
<?IS10744:arch name="RDF" renamer-att="RDFRenamer">
<?IS10744:arch name="DC" renamer-att="DCRenamer">
<mydocument>
  <mydesc RDF="DESCRIPTION" DC="CREATOR" 
          RDFRenamer="#ARCCONT RDFDesc"
          DCRenamer="#ARCCONT DCCreator"
          RDFDesc="This doc tells all about pumpkins."
          DCCreator="Ora Lassila"
  />
</mydocument>


<!-- Same document, using a DTD to reduce verbosity. -->
<!DOCTYPE mydocument [
  <!ELEMENT mydocument - - ( mydesc)>
  <!ELEMENT mydesc - - EMPTY>
  <!ATTLIST mydesc
      RDF        NAME  #FIXED "DESCRIPTION"
      DC         NAME  #FIXED "CREATOR"
      RDFRenamer CDATA #FIXED "#ARCCONT RDFDesc"
      DCRenamer  CDATA #FIXED "#ARCCONT DCCreator"
      RDFDesc    CDATA #IMPLIED
      DCCreator  CDATA #IMPLIED
  >
]>

<?IS10744:arch name="RDF" renamer-att="RDFRenamer">
<?IS10744:arch name="DC" renamer-att="DCRenamer">
<mydocument>
  <mydesc RDFDesc="This doc tells all about pumpkins."
          DCCreator="Ora Lassila"/>
</mydocument>

The grove created by the parser that becomes input to the
RDF engine will correspond to:

<RDF>
<DESCRIPTION>This doc tells all about pumpkins.</DESCRIPTION>
</RDF>


The grove created by the parser that becomes input to the DC engine
will correspond to:

<DC>
<CREATOR>Ora Lassila</CREATOR>
</DC>



********************************************************************************


Documents should be processed on the basis of what groves turn out to
be extractable from them, rather than on the basis of what
architectures they declare explicitly.  In
my mind, this application design rule illustrates a strength of the AF
paradigm, not a weakness.  It means that documents are always as
interpretable as they can be, given local semantic processing
resources, regardless of who created the instances, or who created
their meta-DTDs, or what other meta-DTDs those meta-DTDs were derived
from, or how they were (validly) mixed together.  Thus, AFs allow
complete decentralization of architectural authority, while providing
for perfect reusability of all architectures.

********************************************************************************

The main reason to allow remapping of inherited content and attributes
is to allow us to use two architectural forms from two different
architectures to define the same element at the same time.  Let's say
architecture A is for books and architecture B is also for books, and
we want to create an architecture C such that documents created using
architecture C can automatically be understood in terms of A or B or
both.

    <!-- arch A -->
    <!ELEMENT A - - ( #PCDATA) -- This is where the author's name
                                  goes. -->
    <!ATTLIST A
        book  CDATA  #REQUIRED  -- This poor little attribute's value
                                   is the whole book! -->



    <!-- arch B -->
    <!ELEMENT B - - EMPTY -- This is a pretty stupid architecture. -->
    <!ATTLIST B
       author CDATA  #REQUIRED 
                          -- This is where the author's name goes --

       stuff  CDATA  #REQUIRED
                          -- This is where the book goes. -->


    <!-- arch C, where we harmonize A and B -->
    <?IS10744:arch name="A" renamer-att="Arenamer" ... ?>
    <?IS10744:arch name="B" renamer-att="Brenamer" ... ?>
    <!ELEMENT BOOK - - ( #PCDATA) -- here is where the book is -->
    <!ATTLIST BOOK
        authname  CDATA #REQUIRED  -- author's name here --
        A         NAME  #FIXED  "A"
        B         NAME  #FIXED  "B"
        Arenamer  CDATA #FIXED "#ARCCONT authname book #CONTENT"
        Brenamer  CDATA #FIXED "author authname stuff #CONTENT"
    >

So here's an instance of a document that uses the C architecture:

    <?IS10744:arch name="C" ... ?>
    <BOOK authname="Margaret Mitchell">There was this plantation
    called Tara...</BOOK>

From the perspective of architecture A, the above instance looks like:

    <A book="There was this plantation called Tara...">Margaret
    Mitchell</A>

From the perspective of architecture B, the above instance looks like:

    <B author="Margaret Mitchell" stuff="There was this plantation
    called Tara..."/>

So, as you can see, one reason why we do this remapping is so we can
create new documents that conform to multiple incompatible existing
architectures.
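
For the curious, here is a minimal Python sketch of what the two
renamer attributes ask an engine to do with the <BOOK> instance.  It
assumes the renamer value is a flat list of pairs (architectural name
first, client name second) and handles only the #ARCCONT and #CONTENT
keywords used above; the function name "apply_renamer" is made up, and
the renamer strings are passed in by hand because a real parser would
get them from the #FIXED attributes in C's meta-DTD.

```python
import xml.etree.ElementTree as ET

INSTANCE = """<BOOK authname="Margaret Mitchell">There was this plantation
called Tara...</BOOK>"""

# In real life these values come from the #FIXED attributes in C's meta-DTD.
A_RENAMER = "#ARCCONT authname book #CONTENT"
B_RENAMER = "author authname stuff #CONTENT"


def apply_renamer(elem, form, renamer):
    """Return elem as seen through one base architecture."""
    tokens = renamer.split()
    view = ET.Element(form)
    # Tokens come in pairs: the architectural name, then where to find its value.
    for arch_name, client_name in zip(tokens[0::2], tokens[1::2]):
        if client_name == "#CONTENT":
            # Take the client element's data content, with whitespace normalized.
            value = " ".join("".join(elem.itertext()).split())
        else:
            value = elem.get(client_name)
        if value is None:
            continue
        if arch_name == "#ARCCONT":
            view.text = value          # becomes the architectural content
        else:
            view.set(arch_name, value)  # becomes an architectural attribute
    return view


book = ET.fromstring(INSTANCE)
as_a = apply_renamer(book, "A", A_RENAMER)
as_b = apply_renamer(book, "B", B_RENAMER)
print(ET.tostring(as_a, encoding="unicode"))
print(ET.tostring(as_b, encoding="unicode"))
```

One client element, two renamer strings, two quite different
architectural elements: nothing in the instance itself had to change.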

********************************************************************************

> From: Andrew Layman <andrewl@microsoft.com>
> 
> How does one go about using Architectures to solve the following problem.
> 
> Suppose in version one of my documents, I have instances that look like
> 
> <Book>Gone With the Wind</Book>
> 
> In version 2, I have instances that look like
> 
> <Book>
>  <Title>Gone With the Wind</Title>
>  <Author>
>  <Person>
>  <Firstname>Margaret</Firstname>
>  <Lastname>Mitchell</Lastname>
>  </Person>
>  </Author>
> </Book>
> 
> How do I write my architectures so that the V2 instance is mapped to
> the V1 architecture?

Andrew --

You've asked a good question.  I think it has a good answer.  In order
to explain this, I have to define the V1 and V2 architectures, and
turn your example fragments into complete documents.  Then I'll
discuss what problems arise, and what to do about them.


*************************
** The V1 Architecture **
*************************
<!-- the V1 architecture -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>


*************************
** The V2 Architecture **
*************************
<!-- the V2 architecture -->
<?IS10744:arch 
   name="V1" 
   dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
        <!-- note: auto name mapping is on, so elements of the above type
             will be regarded as conforming to the V1 <Book> architectural
             form -->
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>


*****************
** Instance I1 **
*****************
<!-- instance #I1 -->
<Mydoc>
    <Book>Gone With the Wind</Book>
</Mydoc>


*****************
** Instance I2 **
*****************
<!-- instance #I2 -->
<Mydoc>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author>
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</Mydoc>


***************************
** Parsing I1 against V1 **
***************************
If we parse I1 against V1, we get a grove that, if it were
re-expressed in XML, would look like this:

<V1>
    <Book>Gone With the Wind</Book>
</V1>

I.e., no problem.  (And no surprise.)  Note that the
document element has automatically become the document
element of the architecture.


***************************
** Parsing I2 against V2 **
***************************
If we parse I2 against V2, we get:

<V2>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author>
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</V2>

I.e., again, no problem.  (And, again, no surprise.)


***************************
** Parsing I2 against V1 **
***************************
If we parse I2 against V1, taking no other measures, we get:
<V1>
    <Book>Gone With the WindMargaretMitchell</Book>
</V1>

Clearly, this is a mess, but it illustrates the principle that, by
default, markup that does not belong in a given architecture simply
disappears, from the perspective of that architecture.  What to do
about the mess, though?

It's reasonable to assume that the person who writes the V2
architecture intends for V2 documents to be usable with V1 browsers
(or other applications equipped with V1 engines).  In other words, we
want the title of the book to become the content of the <Book>
element, as was the case in the V1 architecture, and we want Margaret
Mitchell's name to disappear, since the V1 architecture made no
provision for an author's name.  This can be done as follows:

<!-- the V2 architecture, as amended -->
<?IS10744:arch 
   name="V1" 
   dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
   ignore-data-att="V1IgnoreData"
>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ATTLIST Author
    V1IgnoreData  CDATA  "ArcIgnD"
>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>

Note that we have declared that the name of the "Architecture Ignore
Data Attribute" for the V1 architecture is "V1IgnoreData".  When this
attribute appears on an element instance, its value controls whether
the ultimate data content of the element will be regarded as part of
the document, from the perspective of this architecture.  We have also
declared, above, that the V1IgnoreData attribute has a default value
of "ArcIgnD" on instances of the <Author> element.  This means that,
from the perspective of the V1 architecture, the data content of
the <Author> element, and the data contents of all of the elements
that it contains, will be ignored (will disappear).

  Digression: The possible values of any "architecture ignore data
              attribute" are:

              ArcIgnD  : Data is always ignored.

              nArcIgnD : Data is not ignored, and it is an error if
                         data occurs where the architecture does not
                         allow it.

              cArcIgnD : Data is conditionally ignored (data will be
                         ignored only when it occurs where the
                         architecture does not allow it.)

              The default value is taken to be cArcIgnD.


***********************************************************
** Parsing I2 against V1 via the amended V2 architecture **
***********************************************************
If we parse I2 against V1 via the amended V2 architecture, we get:

<V1>
    <Book>Gone With the Wind</Book>
</V1>


Q.E.D., right?
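
If you'd like to see the mechanics, here is a small Python sketch of
I2 being viewed through V1, with and without the "ArcIgnD" default on
<Author>.  It is an illustration under assumptions, not a real engine:
V1_FORMS is a hand-made stand-in for the V1 meta-DTD, "att_defaults"
stands in for the attribute defaults that the V2 meta-DTD would
supply, and any state other than "ArcIgnD" is treated as conditional
ignoring.

```python
import xml.etree.ElementTree as ET

I2 = """<Mydoc>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author>
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</Mydoc>"""

V1_FORMS = {"V1": False, "Book": True}  # form name -> is #PCDATA allowed?


def v1_view(doc, att_defaults):
    """View doc through V1.  att_defaults maps a generic identifier to the
    default V1IgnoreData value declared for it (hypothetical stand-in)."""

    def add_data(out, text, mode):
        text = (text or "").strip()
        if not text or mode == "ArcIgnD" or not V1_FORMS[out.tag]:
            return
        if len(out):  # data after a child element lives in that child's tail
            out[-1].tail = (out[-1].tail or "") + text
        else:
            out.text = (out.text or "") + text

    def walk(elem, out, inherited):
        # Explicit attribute first, then the declared default, then the parent's state.
        mode = elem.get("V1IgnoreData") or att_defaults.get(elem.tag) or inherited
        add_data(out, elem.text, mode)
        for child in elem:
            if child.tag in V1_FORMS:   # automatic name mapping
                walk(child, ET.SubElement(out, child.tag), mode)
            else:                       # markup disappears, data may survive
                walk(child, out, mode)
            add_data(out, child.tail, mode)

    view = ET.Element("V1")
    walk(doc, view, "cArcIgnD")
    return view


doc = ET.fromstring(I2)
print(ET.tostring(v1_view(doc, {}), encoding="unicode"))
print(ET.tostring(v1_view(doc, {"Author": "ArcIgnD"}), encoding="unicode"))
```

Because the state is handed down from parent to child, setting it once
on <Author> is enough to silence <Person>, <Firstname> and <Lastname>
as well.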

A possible source of confusion is the fact that the V2 architecture
specifies that the V1 architecture is a "base" architecture with
respect to the V2 architecture, which is "derived" from (or is a
"client" of) V1.  The V2 architecture does not specify itself in any
way; it *is* the V2 architecture.  At the risk of belaboring the
obvious, let me say that the only things that specify the V2
architecture are:

* document instances that are clients of the V2 architecture, and

* architectures that regard the V2 architecture as a base
  architecture.

The structure of such V2 client instances and architectures is
defined by the V1 architecture through the lens (so to speak) of the
V2 architecture.




***************************
** Parsing I1 against V2 **
***************************
If we parse I1 against V2, taking no other measures, we get:

<V2>
    <Book></Book>
</V2>

What happened to the title of the book?  It disappeared because the
default value of the architecture ignore data attribute is "cArcIgnD",
which means that when data is not allowed in the content of an
element, it will be ignored.  The V2 architecture does not permit
#PCDATA in the content of <Book> elements, so the data "Gone With the
Wind" disappeared automatically.  If we don't want the data to be
ignored, we can force it to appear by declaring an architecture ignore
data attribute for V2 (call it "V2IgnoreData") and setting it to
"nArcIgnD".  However, making the data appear where it's not allowed to
appear will create a parsing validation error, so, if we really need
to use the same meta-DTD for both V1 and V2 documents (we don't), this
solution is not so good.
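
The choice between the three ignore-data values boils down to a very
small decision table.  Here it is as a Python sketch; the function
name and the use of ValueError to stand for a validation error are my
own inventions, not anything the standard prescribes.

```python
def architectural_data(data, mode, pcdata_allowed):
    """Return the data an architecture engine would see for one run of text."""
    if mode == "ArcIgnD":
        return ""                 # always ignored
    if pcdata_allowed:
        return data               # allowed here, so both other modes keep it
    if mode == "cArcIgnD":
        return ""                 # not allowed here: quietly ignored
    if mode == "nArcIgnD":
        raise ValueError(
            "data %r occurs where the architecture does not allow it" % data)
    raise ValueError("unknown ignore-data value: %r" % mode)


# I1 against V2: <Book> has the content model (Title?, Author?), so no #PCDATA.
print(repr(architectural_data("Gone With the Wind", "cArcIgnD", False)))
try:
    architectural_data("Gone With the Wind", "nArcIgnD", False)
except ValueError as err:
    print("validation error:", err)
```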

If we must use the same meta-DTD for both older V1 documents and newer
V2 documents, in order to maintain the upward compatibility of older
V1 documents it would be best, when creating the V2 architecture, to
anticipate this problem as follows:

(1) Allow #PCDATA in the content of V2 <Book> elements, in addition to
    the <Title> and <Author> elements, and

(2) Provide instructions to V2 application developers (in the V2
    Architecture Definition Document [ADD]) indicating that V2
    application engines must expect #PCDATA in <Book> instances, and
    that they must treat such data content as if it were in a V2
    <Title> element.  The ADD might also advise that V2 systems should
    not create documents that put #PCDATA in the content of <Book>
    elements, even though it's allowed there, and that book titles
    should always appear in <Title> elements.


*******************************************************************************

But how can we do all this without a meta-DTD of any kind?

Well, first, a caveat: you can't check an instance for conformance to
a model unless you have both the instance and the model.  So
validation of instances by means of a general-purpose parser is not
possible unless you have a meta-DTD.  

And a second caveat: you can't create an application with an
information-interchange feature unless you have a model for the
information to be interchanged.  So, at some level, there's no such
thing as an architecture without some sort of model, somewhere.

Even if there's no meta-DTD available, however, you can still enjoy
essentially all of the virtues of AFs, assuming you have an engine
capable of recognizing the architectural forms that pertain to it, and
capable of performing the processing required by those architectural
forms.  (Such an engine would probably incorporate at least some of
the logic necessary to validate the forms that it recognizes, in any
case.)  The only really noticeable disadvantage of not having the
meta-DTD handy is that you don't get the markup minimization you can
get from DTDs and meta-DTDs.  This disadvantage would not affect our
instance #I1 at all:

<!-- instance #I1; no change -->
<?IS10744:arch name="V1">
<Mydoc>
    <Book>Gone With the Wind</Book>
</Mydoc>

But it would affect instance #I2 to the extent that we'd have to make
the use of the "Architecture Ignore Data Attribute" explicit in order
for #I2 to be usefully parsable against Architecture V1:

<!-- instance #I2 without meta-DTDs -->
<?IS10744:arch 
   name="V1" 
   public-id="-//Andrew Layman//ADD Andrew Layman's V1 Architecture Definition Document//EN"
   ignore-data-att="V1IgnoreData"
>
<?IS10744:arch
   name="V2"
   public-id="-//Andrew Layman//ADD Andrew Layman's V2 Architecture Definition Document//EN"
>
<Mydoc>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author V1IgnoreData="ArcIgnD">
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</Mydoc>

Note: Just for fun, I used the "public-id" pseudo-attribute to give
the formal public identifiers of the Architecture Definition Documents
(ADDs) of the V1 and V2 architectures.  These documents are not
meta-DTDs (although they may include meta-DTDs) and they are not
directly machine-processable; they are just explanations of the
architectures, probably written in some natural language (these are
declared to be in English: "//EN").  The purpose of declaring them is
merely to disambiguate the architectures we're declaring from any
others that might be called "V1" or "V2".

Final note #1: With AFs, even when we mix many kinds of semantics and
vocabularies into our documents, we can still have the ability to
verify, simply and directly, that any newly created document that uses
an architecture will be reliably processable by any application of
that architecture.  By the same token, anyone creating an application
of that architecture will not face an indefinitely-long list of
possible configurations of the information.

Final, final note: AFs are an elegant general solution to the problem
of recognizing, processing, and mixing all of the semantic facilities
of XML into arbitrary XML documents, including both RDF and XLink, to
name two, with minimal or no cost to the flexibility of other document
architectures.  They also have the effect of giving people other than
the W3C the ability to create similar, but totally arbitrary
metastructures of arbitrary complexity, and to use them for reliable
and robust information interchange.  I remain utterly and passionately
convinced that it's MUCH better to have one, strong, general way of
mixing common semantic constructs into structured documents, than to
have several dissimilar ways of doing so.

***

If (as is the case in our example) one architecture is derived from
another, we will need to use the derived architecture as a map to
understand our clients in terms of the base architecture.  

If, on the other hand, our client instance directly specifies two base
architectures, then either of the corresponding meta-DTDs can be used
directly against the instance, thus meeting the requirement "to be
able to use a V1 meta-DTD, without modification, against a V2
instance".  (See new example below: "Two Directly Specified Base
Architectures".)

Either way, all the relevant meta-DTDs collectively drive a standard
generic parsing process, so we don't need to use any
architecture-specific software to create architecture-specific parse
trees.  (If we did, the economics of AFs would make no sense.)  I'm
trying to show that AFs provide a way to view older documents
through newer architectures, and newer documents through older
architectures, gracefully, simply, and reliably, without too much
ugliness, syntax, or software.  Let me explain architectural parsing
in a step-by-step fashion.

Here's the final version of the V2 client (again):

<!-- instance #I2 -->
<?IS10744:arch name="V2"  ... ?>
<Mydoc>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author>
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</Mydoc>

Note that it's a V2 client, and not a V1 client.  If it were a V1
client, it would say so.  Indeed, there is no direct evidence here
that V1 has anything to do with this document.  (It could declare any
number of base architectures, of course.  See "Two Directly Specified
Base Architectures" below.)

However, a standard, architectures-aware parser will gain access
to the V2 architecture's meta-DTD...

<!-- the V2 architecture, as amended -->
<?IS10744:arch 
   name="V1" 
   dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
   ignore-data-att="V1IgnoreData"
?>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ATTLIST Author
    V1IgnoreData  CDATA  "ArcIgnD"
>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>

...at which time the parser discovers that the V2 architecture is a
client of the V1 architecture.  So, the parser gets the V1
architecture's meta-DTD...

<!-- the V1 architecture -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>

...which is not a client of any other architecture.  Thus, the parser
knows that there are three (conceptual) groves that are extractable
from this document: a grove of the instance itself, a grove from the
perspective of the V2 architecture, and a grove from the perspective
of the V1 architecture.
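
The discovery process just described can be sketched in a few lines
of Python.  This is only a toy model, not anything the standard
prescribes: the BASES table is a hypothetical stand-in for actually
fetching each meta-DTD and reading its base architecture
declarations, and the function name is my own invention.

```python
# Toy model of how a parser discovers which groves are extractable.
# ASSUMPTION: BASES stands in for fetching and reading real meta-DTDs.
BASES = {
    "V2": ["V1"],  # the V2 meta-DTD declares V1 as a base architecture
    "V1": [],      # the V1 meta-DTD declares no base architecture
}


def extractable_architectures(declared, bases=BASES):
    """Return every architecture reachable from those the instance declares."""
    found = []
    pending = list(declared)
    while pending:
        name = pending.pop(0)
        if name in found:
            continue  # already visited; also guards against cycles
        found.append(name)
        # A meta-DTD may itself be a client of further architectures.
        pending.extend(bases.get(name, []))
    return found


# Instance I2 declares only V2, yet three groves turn out to be available.
groves = ["instance"] + extractable_architectures(["V2"])
print(groves)  # ['instance', 'V2', 'V1']
```

The instance never mentions V1; the V1 grove becomes available only
because the walk follows the declaration found inside the V2
meta-DTD.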

Suppose we are running an application that only knows how to deal with
V1 client instances, and has never heard of the V2 architecture.  How
can it know that it can process V2 client instances?  The answer is
that it can't, unless the application was originally designed to be
brought online if and only if a V1 grove appears as a result of
parsing a document.  It follows that documents should be processed on
the basis of what groves turn out to be extractable from them, rather
than on the basis of what architectures they declare explicitly.  In
my mind, this application design rule illustrates a strength of the AF
paradigm, not a weakness.  It means that documents are always as
interpretable as they can be, given local semantic processing
resources, regardless of who created the instances, or who created
their meta-DTDs, or what other meta-DTDs those meta-DTDs were derived
from, or how they were (validly) mixed together.  Thus, AFs allow
complete decentralization of architectural authority, while providing
for perfect reusability of all architectures.  (There are other
features of AFs, which we haven't talked about yet, that permit
syntactic conflicts between architectures to be resolved in client
architectures and client instances.)
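
Here is a minimal sketch of that application design rule, again in
Python.  Everything in it is hypothetical (the HANDLERS registry, the
decorator, and the use of plain strings to stand in for groves); the
point is only that the application is selected by the groves that
come out of the parse, never by the architectures the instance
happens to declare.

```python
# Toy dispatcher: applications are brought online by extracted groves.
# ASSUMPTION: groves are represented as plain strings for illustration.
HANDLERS = {}


def handles(architecture):
    """Register a function as the application for one architecture."""
    def register(func):
        HANDLERS[architecture] = func
        return func
    return register


@handles("V1")
def v1_application(grove):
    # This application knows V1 only; it has never heard of V2.
    return "V1 application processed: " + grove


def dispatch(groves):
    """groves maps architecture name -> grove extracted from a document."""
    return [
        HANDLERS[name](grove)
        for name, grove in groves.items()
        if name in HANDLERS
    ]


# A document that declared only V2 still yields a V1 grove, so the
# V1 application runs; the V2 grove simply finds no local handler.
results = dispatch({
    "V2": "<V2>...</V2>",
    "V1": "<V1><Book>Gone With the Wind</Book></V1>",
})
print(results)
```

A document that yields no V1 grove produces an empty result list, so
the V1 application is never bothered with it.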

Digression: It is important that W3C embrace the ISO AF paradigm (or
            something mighty similar) sooner rather than later.  The
            longer we wait, the more of today's information will not
            be able to participate in tomorrow's mainstream, in which
            many constantly-evolving systems of semantic markup will
            routinely appear within single documents.  How long do we
            want to exclude Web documents from the mainstream of
            civilization's lifeblood, and civilization's lifeblood
            from the Web?  And how do we explain the reason for the
            delay?


*************************************************
** "Two Directly Specified Base Architectures" **
*************************************************

OK, enough soapboxing.  Here's another way to accomplish the same
goal.  The difference is that in the example below, the instance is a
direct client of both the V1 and V2B architectures, and the V2B
architecture is not a client of the V1 architecture.  I also renamed
the instance's element types, just to demonstrate that the AF paradigm
does not tread on the application's generic identifier namespace.
Doing so means giving up the automatic name mapping feature, so the
client instance is much more verbose: each element must name its
architectural form explicitly in an attribute.


*************************
** The V1 Architecture **
*************************
<!-- the V1 architecture, same as always -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>


**************************
** The V2B Architecture **
**************************
<!-- the V2B architecture. No base architecture. -->
<!ELEMENT V2B - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>


******************
** Instance I2B **
******************
<!-- instance #I2B. Two base architectures. -->
<?IS10744:arch name="V1" ... ?>
<?IS10744:arch name="V2B" ... ?>
<Mydoc>
    <MyBook V2B="Book">
     <MyTitle V1="Book" V2B="Title">Gone With the Wind</MyTitle>
     <MyAuthor V2B="Author">
     <MyPerson V2B="Person">
     <MyFirstname V2B="Firstname">Margaret</MyFirstname>
     <MyLastname V2B="Lastname">Mitchell</MyLastname>
     </MyPerson>
     </MyAuthor>
    </MyBook>
</Mydoc>


Parsing I2B against V1, directly, without reference to V2B, we get:

<V1>
    <Book>Gone With the Wind</Book>
</V1>

At this point, you may ask, "What happened to 'Margaret Mitchell'?",
since no ignore-data-att attribute appears in the above example.
It turns out that we don't need one.  In the V1 view, that PCDATA
would have had to appear after the <Book> element...

<V1>
    <Book>Gone With the Wind</Book>MargaretMitchell
</V1>

...and the V1 architecture doesn't allow PCDATA there.  Because the
default effective value of the ignore-data-att is "cArcIgnD"
("conditional architecture ignore data"), data that appears where it's
not allowed is ignored.  So, I guess we can truthfully say that
Margaret Mitchell is "Gone with the Ignored Data".



Parsing I2B against V2B, we get:

<V2B>
    <Book>
     <Title>Gone With the Wind</Title>
     <Author>
     <Person>
     <Firstname>Margaret</Firstname>
     <Lastname>Mitchell</Lastname>
     </Person>
     </Author>
    </Book>
</V2B>

(i.e., no surprises)
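
If you'd like to see both parses of I2B happen mechanically, here is
a rough Python sketch.  It is a toy, not a conforming architectural
engine, and it rests on several assumptions of mine: the architecture
name is passed in as an argument instead of being read from the
IS10744:arch declarations (which are left out of the string), the
PCDATA_ALLOWED table is a hand-made summary of the two meta-DTDs, and
the only ignore-data behavior modeled is the conditional default
(data is dropped wherever the architecture doesn't allow it).

```python
import xml.etree.ElementTree as ET

I2B = """<Mydoc>
    <MyBook V2B="Book">
     <MyTitle V1="Book" V2B="Title">Gone With the Wind</MyTitle>
     <MyAuthor V2B="Author">
     <MyPerson V2B="Person">
     <MyFirstname V2B="Firstname">Margaret</MyFirstname>
     <MyLastname V2B="Lastname">Mitchell</MyLastname>
     </MyPerson>
     </MyAuthor>
    </MyBook>
</Mydoc>"""

# ASSUMPTION: hand-made summary of which element types in each
# meta-DTD have (#PCDATA) content.
PCDATA_ALLOWED = {
    "V1": {"Book"},
    "V2B": {"Title", "Firstname", "Lastname"},
}


def architectural_view(instance, arch):
    """Return the instance as seen from one architecture (toy model)."""
    allowed = PCDATA_ALLOWED[arch]
    out_root = ET.Element(arch)  # document element maps to the architecture

    def add_data(target, text):
        # Conditional ignoring: keep data only where the architecture
        # allows PCDATA; otherwise drop it silently.
        if not text or target.tag not in allowed:
            return
        if len(target):
            target[-1].tail = (target[-1].tail or "") + text
        else:
            target.text = (target.text or "") + text

    def walk(node, target):
        add_data(target, node.text)
        for child in node:
            form = child.get(arch)  # e.g. the value of V1="Book"
            if form is not None:
                walk(child, ET.SubElement(target, form))
            else:
                # Not an element of this architecture: its content is
                # seen in the context of the enclosing architectural element.
                walk(child, target)
            add_data(target, child.tail)

    walk(ET.fromstring(instance), out_root)
    return ET.tostring(out_root, encoding="unicode")


print(architectural_view(I2B, "V1"))
# <V1><Book>Gone With the Wind</Book></V1>
print(architectural_view(I2B, "V2B"))
```

In the V1 view, "Margaret" and "Mitchell" arrive in the context of
the <V1> element, which doesn't allow data, so they are dropped.  In
the V2B view every element carries a V2B attribute, and the whole
structure comes through (without the indentation shown above).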

I think the above example would be more realistic if we weren't
thinking in terms of V2B being a revision of V1.  If they were
completely independent architectures, then the above would make more
sense and be more dramatic.  Anyway, this example meets the criterion
of being able to parse the instance directly from the V1 meta-DTD,
without involving the V2B meta-DTD at all.

Since V2 is a revision of V1, we would normally not want to require
users of the V2 architecture to mark up their documents in terms of
the V1 architecture as well as the V2 architecture; we would expect
users to declare V2 and we would expect V1 software to be able to
comprehend V2 documents to the same extent that it could comprehend V1
documents, as shown in the original example in which V2 is derived
from V1.
