HyTime Engine Peer-Peer Protocol

HEP Cats Jam Java in the Digital Library

Neill A. Kipp

August 16, 1997

Copyright 1997, Neill A. Kipp

Slides for this presentation available at http://etd.vt.edu/hep/.

Abstract

In a computer network, monolithic server architectures have a single point of access and can therefore behave deterministically. Unfortunately, lone servers are prone to heavy load and present a single point of failure. I propose the HyTime Engine Peer-peer Protocol (HEP) to facilitate communication between peer-peer HyTime Engine document servers in a distributed document delivery system. In the HEP model, HEP servers will collect locally authored documents and make them available on the network through HEP connections. For requests and responses, HEP servers will communicate using HyTime as the language for interchange.

Herein, I demonstrate the design and implementation of a HEP system in the context of the Networked Digital Library for Theses and Dissertations, noting particularly the applications of architectural forms for the Digital Library as well as the HyTime architectural forms that become commands in the HyTime Engine Peer-peer Protocol: HEP.

Electronic Thesis and Dissertation Submission

Virginia Tech now requires its graduate students to submit their theses and dissertations electronically. This process is beneficial to the students, their faculty, the Graduate School, and the University Library: it saves time and money and, by placing these Electronic Theses and Dissertations (ETDs) on the Web, improves service to scholars worldwide.

What began as a regional effort to grow a networked digital library of theses and dissertations blossomed quickly into a national one, and is now an international effort to connect scholars with knowledge on the frontiers of modern science, engineering, art, and the humanities. Finally, due to a grant from the Department of Education, the Virginia Tech ETD Initiative has become the international Networked Digital Library of Theses and Dissertations (NDLTD).

Indeed, each ETD is a hyperdocument. Our least complex submission is a node of plain text with its accompanying metadata node. Our most complex have been fully-connected, authored hypermedia documents, with hundreds of nodes, whose total footprint is hundreds of megabytes!

The challenges we face when we implement the NDLTD are considerable. First, each participating library must prepare to receive hundreds of additional hyperdocuments per semester. Next, they must provide search and retrieval services for these documents. They must implement intranet restrictions for documents that await publication in journals, and they must implement hard security for documents that await patent review.

We have proposed a solution based on international standards that implements the NDLTD as an open system of trusted servers. To this end, we have:

- Implemented and tested an SGML DTD for ETD submission (ETD-ML), as well as its accompanying documentation [Kipp, 1996],
- Designed an ETD workflow system for the Graduate School and Library using SGML documents and Perl scripts (to be implemented in Fall 1997 and shared with all NDLTD members), and
- Prototyped a browsing-oriented hyperdocument delivery system as an application of HyTime: the HyTime Engine Peer-peer Protocol (HEP).

Challenges

Three main challenges face our project team. First, five hundred documents per school per semester will be arriving in the NDLTD. Second, digital library technology can be extended to support a browsing interface, rich with metaphor, to implement cascading tables of contents, cards, pages, and even "places" [Kirchenbaum, 1997, ETD in progress]. Foremost, our potential user base is diverse, remote, and financially challenged.

The interface to the NDLTD, therefore, for the common user and for potential authors, must be easy to use, have low adoption hurdles, and be useful for browsing the distributed ETD collection. It must be free, it must be friendly, and it must be fun. For the administrator, the digital library system must be portable across architectures, impose little implementation burden, be configurable to serve any electronic collection, and implement a distributed digital library.

First Eureka. As all good software is written under a concise paradigm, we chose to implement our digital library using the barest of physical library metaphors: READ the BOOK. In digital library terms, this maps to BROWSE the HYPERTEXT. And in the terms of the object programming paradigm, this maps to SEND MESSAGE to OBJECT. In the digital library, to do anything to an object, the user sends a message to that object. (Note: SGML documents are not programs, of course. In our implementation, the HEP server intercepts messages to documents and assigns them to be processed by the appropriate delegate.)
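The message-to-document paradigm can be pictured as a small dispatcher. The Python below is only a sketch under our own assumptions, not the HEP server's code: the class HepServer, its register and send methods, and the browse delegate are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# A delegate receives the document text and the message parameters.
Delegate = Callable[[str, dict], str]


class HepServer:
    """Hypothetical sketch: intercepts messages addressed to documents."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}    # document name -> content
        self.delegates: Dict[str, Delegate] = {}  # message name -> handler

    def register(self, fn: str, delegate: Delegate) -> None:
        """Assign a delegate to process messages named fn."""
        self.delegates[fn] = delegate

    def send(self, target: str, fn: str, **params) -> str:
        """Deliver message fn to the target document via its delegate."""
        if target not in self.documents:
            raise KeyError(f"no such document: {target}")
        if fn not in self.delegates:
            raise ValueError(f"no delegate for message: {fn}")
        return self.delegates[fn](self.documents[target], params)


def browse(document: str, params: dict) -> str:
    """Simplest possible delegate: hand back the document unchanged."""
    return document
```

The document itself never executes anything; the server looks up the delegate for the message name and applies it to the stored document on the document's behalf.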

In keeping with this paradigm, and our requirements for a no-pain solution and freely distributable implementation, we chose to use SGML for documents and HyTime architectures for any hypermedia, and the standard array of multimedia types. To this end, we crafted a HyTime DTD for the documents that tie the collection together. Furthermore, we chose to use a HyTime encoding for the messages we send to documents. For these reasons, HyTime Engine software must be present when sending and receiving messages, and therefore we call the method of interchange the HyTime Engine Peer-peer Protocol (HEP).

Engineering compromise. Because we want the lowest possible adoption hurdles, we opted to use the installed base of client-side HTML browsers instead of inventing, porting, distributing, and maintaining software for our own browsers. For this reason, we implemented most of the HEP under the Common Gateway Interface (CGI) of the ubiquitous HTTP. Indeed, where applicable, we have implemented HEP as a layer over HTTP.

[Note: In a Web-year or so, when text-only connections have receded past two standard deviations from the mean, then we plan to offer Java-based delivery, have HEP servers that cooperate with HTTP servers, and avoid CGI altogether. Having Java in place also means that we can implement and test a variety of hypermedia interfaces, including FCS support.]

In CGI, users send URLs and receive HTML pages. In our paradigm, however, we want users to send HyTime microdocuments and receive HyTime documents. Unfortunately, not even a small SGML document will fit inside a URL. How, then, do we squeeze an entire HyTime document into the space of an ordinary URL?

Second Eureka. Fortunately, HyTime is a semantic encoding. We have retained its semantics while reconstructing its syntax for the purpose of layering HEP over HTTP. Using this alternate encoding (and the associated functions in our HEP server, hytime2url and url2hytime), we can transmit HyTime microdocuments as URLs.

The design is simple. We have implemented a HEP server as a CGI script (available through ubiquitous HTTP). When the client sends a message to the document, the HEP server intercedes and formats HTML for the user to browse. Furthermore, the HTML that the client receives may contain potential messages (encoded as regular hyperlinks) to other HEP servers.

The output of the HEP server to the HEP client is obtained by a trivial translation of the more semantic-oriented, server-side SGML document type (hep.dtd) into display-oriented HTML. Certain relevant constructs deserve careful attention: ilink, fcs, and location ladders.

Cascading Tables of Contents (ilink)

Hierarchical decomposition is a time-honored method for information organization. Indeed, it is the backbone of organization for ftp and gopher, pre-Web applications. Most Web pages are related collections of links to other Web pages.

The HyTime independent link (ilink) construct gives the architecture for the construction of a table of contents link (toclink) and gives authors the ability to encode its semantics.

Our HEP implementation uses authored ilink information to provide automatic grounding (a usability requirement), even when the target of the ilink is served by a different HEP server than the one on which the ilink itself resides. The automatic grounding is derived from the anchor roles of the ilink element and appears as a row of buttons under the main document space on the user's browser. This automatic feature (based on the ilink architecture) localizes the encoding that creates the connection and thereby minimizes the possibility that any of these links will become stale (404).

Cascading TOCs in the HEP will implement the browsing connections of ETDs across the NDLTD. Furthermore, cascading TOCs, as a HEP construct, can implement the navigation semantics of authored topic maps from the CApH [Biezunski and Newcomb, ongoing].

HEP supports arbitrary independent link (ilink) and contextual link (clink) elements.

Example ILINK

Below is an example encoding of an ilink with its corresponding formatting.

   <toclink
     id = index
     linkends = "titles authors majors"
     anchrole = "Index Titles Authors Majors"
   >
     <dl>
       <dt><endterm>Titles</endterm>
       <dd>Browse by Title here!

       <dt><endterm>Authors</endterm>
       <dd>Browse for all the Authors here!

       <dt><endterm>Major</endterm>
       <dd>Browse by Major subject here!
     </dl>
   </toclink>

This would be formatted as:

Titles: Browse by Title here!
Authors: Browse for all the Authors here!
Major: Browse by Major subject here!
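As a rough illustration of how the endterms could become hotspots in the delivered HTML, the Python sketch below renders a toclink as a definition list of hyperlinks. It is not the HEP formatter. We assume that each linkend pairs, in order, with one endterm entry, and the function name format_toclink and the urls mapping from linkend IDs to addresses are hypothetical.

```python
from html import escape


def format_toclink(linkends, entries, urls):
    """Render (endterm, description) pairs as an HTML definition list.

    linkends[i] is assumed to be the anchor for entries[i]; urls maps
    each linkend ID to a (hypothetical) address for that anchor.
    """
    if len(linkends) != len(entries):
        raise ValueError("each endterm needs exactly one linkend")
    rows = []
    for linkend, (term, description) in zip(linkends, entries):
        href = escape(urls[linkend], quote=True)
        # The endterm text becomes the hotspot of an ordinary hyperlink.
        rows.append(f'<dt><a href="{href}">{escape(term)}</a></dt>')
        rows.append(f"<dd>{escape(description)}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"
```

Called with the three linkends and entries from the example above, this yields one linked term and one description per table-of-contents entry.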

Finite Coordinate Spaces

The HEP server supports FCS translation into HTML provided the following are true: the FCS must be two-dimensional, be encoded in virtual space units, and have monospaced extlists (i.e., the second marker of each pair must be 1). For added flair, we allow a background color attribute on any event in any schedule. We map the FCS schedules on the server and translate the FCS into an HTML table for client delivery. Judging by the ease of development of the FCS formatter prototype, a Java applet implementation of an FCS formatter will be much less complex than writing a flowing hypertext widget.

Please see the prototype (soon to be running on http://etd.vt.edu/hep/) for examples of formatted FCSs.
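The restrictions above make the translation to a table nearly mechanical, as the following Python sketch suggests. It is an illustration under our own assumptions, not the prototype's code: the function name fcs_to_table, the dictionary layout of an event, the 1-based positions, and the choice of the first axis as table columns are all hypothetical.

```python
from html import escape


def fcs_to_table(events, columns, rows):
    """Lay out unit-extent events of a two-dimensional FCS as an HTML table.

    Each event is a dict with 'extlist' = (x, x_len, y, y_len), 'text',
    and an optional 'bgcolor'. Positions are assumed to be 1-based.
    """
    grid = {}
    for event in events:
        x, x_len, y, y_len = event["extlist"]
        # Monospaced extlists: the second marker of each pair must be 1.
        if x_len != 1 or y_len != 1:
            raise ValueError("only unit extents are supported")
        if not (1 <= x <= columns and 1 <= y <= rows):
            raise ValueError("event lies outside the coordinate space")
        grid[(x, y)] = event

    lines = ["<table>"]
    for y in range(1, rows + 1):
        cells = []
        for x in range(1, columns + 1):
            event = grid.get((x, y))
            if event is None:
                cells.append("<td></td>")  # nothing scheduled here
                continue
            color = event.get("bgcolor")
            attr = f' bgcolor="{escape(color, quote=True)}"' if color else ""
            cells.append(f"<td{attr}>{escape(event['text'])}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Because every event occupies exactly one cell, no row or column spanning is needed; relaxing the unit-extent restriction would require spans and overlap handling.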

Location Ladders

The example that follows is a HEP transaction involving a pathological location ladder as the target of an independent link (ilink).

Suppose that in our collection we have a document that explores Entomology and Etymology in Early Science Fiction, and that we want to request the

third word of the
first grandchild of the
younger sibling of the
named element in the
remote document.

In our message-to-document paradigm, the remote document is the target of the message, and the remaining commands become parameters in the message itself.

Location Ladder Implementation

<!ENTITY ebugs SYSTEM "bugs.hep" NDATA hep>
....

<link goesto=3rdword>Note how Heinlein has used the term `grok' in
this story about bugs.</link>

<nameloc id=remote>
<nmlist docorsub=ebugs>
buglist
</nmlist>
</nameloc>

<relloc id=nextsib locsrc=remote relation=ysib>
<marklist>
1 1 
</marklist>
</relloc>

<treeloc id=1stgrand  locsrc=nextsib>
<marklist>1 1 1</marklist>
</treeloc>

<dataloc id=3rdword  locsrc=1stgrand quantum=norm>
<marklist>
3 1
</marklist>
</dataloc>
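To make the steps of the ladder concrete, the Python sketch below resolves the same four rungs against a small tree. It reflects a simplified reading of the location address semantics, not a HyTime engine: we assume the first treeloc marker selects the root of the location source, that a younger sibling is a following sibling, and that a "norm" quantum is a whitespace-delimited word. The function name resolve_ladder and the sample document are hypothetical, and XML stands in for SGML because the Python standard library has no SGML parser.

```python
import xml.etree.ElementTree as ET


def resolve_ladder(root, name, ysib, tree_path, word_start, word_count=1):
    """Resolve nameloc -> relloc (ysib) -> treeloc -> dataloc, in that order."""
    # Map each element to its parent so that siblings can be found.
    parents = {child: parent for parent in root.iter() for child in parent}

    # nameloc: the element whose id attribute equals the given name.
    node = next((el for el in root.iter() if el.get("id") == name), None)
    if node is None:
        raise LookupError(f"no element with id {name!r}")

    # relloc relation=ysib: the ysib-th following sibling.
    if node not in parents:
        raise LookupError("the document element has no siblings")
    siblings = list(parents[node])
    position = siblings.index(node) + ysib
    if position >= len(siblings):
        raise LookupError("no such younger sibling")
    node = siblings[position]

    # treeloc: the first marker is the subtree root, the rest pick children.
    if not tree_path or tree_path[0] != 1:
        raise ValueError("a tree path must start at the root (1)")
    for marker in tree_path[1:]:
        children = list(node)
        if marker > len(children):
            raise LookupError("tree path runs past the last child")
        node = children[marker - 1]

    # dataloc quantum=norm: count whitespace-delimited words, 1-based.
    words = "".join(node.itertext()).split()
    first = word_start - 1
    if first + word_count > len(words):
        raise LookupError("not enough words in the located element")
    return " ".join(words[first:first + word_count])


# Hypothetical stand-in for the remote document.
SAMPLE = """<story>
  <list id="buglist"><item>ants</item></list>
  <section><para><s>one two three four</s></para></section>
</story>"""

print(resolve_ladder(ET.fromstring(SAMPLE), "buglist", 1, [1, 1, 1], 3))  # prints: three
```

Each rung takes the result of the previous rung as its location source, which is exactly the role the locsrc attributes play in the markup above.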

Dataloc

<dataloc id=3rdword  locsrc=1stgrand quantum=norm>
<marklist>
3 1
</marklist>
</dataloc>

as a URL is

d=norm_3_1

Treeloc

<treeloc id=1stgrand  locsrc=nextsib>
<marklist>1 1 1</marklist>
</treeloc>

as a URL is

t=1_1_1

Relloc

<relloc id=nextsib locsrc=remote relation=ysib>
<marklist>
1 1 
</marklist>
</relloc>

as a URL is

r=ysib_1_1

Nameloc

<nameloc id=remote>
<nmlist docorsub=ebugs>
buglist
</nmlist>
</nameloc>

as a URL is

n=buglist

Location Ladder Packing

The same location ladder, connected and delimited, becomes the following URL:

bugs.hep?n=buglist&r=ysib_1_1&t=1_1_1&d=norm_3_1
Decoded:
- entity (bugs.hep, the target of message)
- named location (buglist)
- relation location (younger sibling)
- tree location (first grandchild)
- data location (3rd word)
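The packing shown above can be sketched in a few lines of Python. The function names hytime2url and url2hytime are the ones mentioned earlier, but the bodies here are our own illustrative assumption rather than the HEP server's code; the ladder is represented as a hypothetical list of (key, fields) pairs, and no escaping of reserved characters is attempted.

```python
def hytime2url(entity, ladder):
    """Pack a location ladder into a URL query string.

    ladder is a list of (key, fields) pairs in ladder order, e.g.
    ("r", ["ysib", 1, 1]); fields are joined with underscores.
    """
    pairs = []
    for key, fields in ladder:
        value = "_".join(str(field) for field in fields)
        pairs.append(f"{key}={value}")
    return entity + "?" + "&".join(pairs)


def url2hytime(url):
    """Unpack a URL produced by hytime2url; all fields come back as strings."""
    entity, _, query = url.partition("?")
    if not query:
        return entity, []
    ladder = []
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        ladder.append((key, value.split("_")))
    return entity, ladder


ladder = [
    ("n", ["buglist"]),      # named location
    ("r", ["ysib", 1, 1]),   # relation location
    ("t", [1, 1, 1]),        # tree location
    ("d", ["norm", 3, 1]),   # data location
]
print(hytime2url("bugs.hep", ladder))
# prints: bugs.hep?n=buglist&r=ysib_1_1&t=1_1_1&d=norm_3_1
```

One limitation of this sketch is that an element name containing an underscore or an ampersand would be ambiguous after packing; a fuller implementation would need an escaping rule for such names.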

This allows all read-only client-server HEP traffic to layer on top of the well-established protocol, HTTP. As a result, we need not write and distribute a special HEP browser; any HTML browser will do.

Peer-peer communication

HEP servers need not rely on HTTP for their interchange. For example, in the case of distributed anchor notification, the following HEP message may be sent to a remote document to notify it that it has been linked and to give it the unique identifier of the link itself (for return traffic). The HEP server on the remote system intercedes and applies the message to the target document (in this case, "etd/1997/fall/index.hep").

    <!DOCTYPE hep SYSTEM "hep.dtd" [
      <!NOTATION hep SYSTEM >
      <!ENTITY sourcedoc 
          SYSTEM "hep://nkipp.async.vt.edu/hep/paper.hep" 
          NDATA hep>
    ]>
    <hep>
    <hepmessage fn=append target="etd/1997/fall/index.hep">
    <heplink goesto="source">
    HyTime Engine Peer-peer Protocol
    </heplink>
    <nameloc id=source>
    <nmlist docorsub=sourcedoc>
    </nmlist>
    </nameloc>
    </hepmessage>
    </hep>

The HEP server inserts the contents of the hepmessage element into "index.ank," which is an automatic subdocument of the target "etd/1997/fall/index.hep."
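The server-side handling of such an append message might look like the Python sketch below. It is a guess at the shape of the operation, not the HEP server's code: we assume the anchor subdocument sits beside its target and differs only in its ".ank" extension, and the names anchor_subdocument and append_message, as well as the in-memory store, are hypothetical.

```python
from pathlib import PurePosixPath


def anchor_subdocument(target):
    """Name of the anchor subdocument for a target such as 'dir/index.hep'.

    Assumption: same directory and base name, with the extension '.ank'.
    """
    return str(PurePosixPath(target).with_suffix(".ank"))


def append_message(store, target, content):
    """Append the contents of a hepmessage to the target's anchor subdocument.

    store is a dict standing in for the server's storage; it maps each
    subdocument name to the list of message contents received so far.
    """
    store.setdefault(anchor_subdocument(target), []).append(content)


store = {}
append_message(store, "etd/1997/fall/index.hep", "<heplink goesto=source>...</heplink>")
print(sorted(store))  # prints: ['etd/1997/fall/index.ank']
```

Keeping the received links in a separate subdocument means the target document itself is never rewritten when a remote server announces a new link.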

In this way: (1) IDs in the subdocument do not pollute the document namespace, and (2) data in the subdocument does not confuse the regular HEP communication with the client application. The application asks for the anchor information through the HEP API, much in the same way that HTTP requests can subsequently collect inline GIF and JPEG images (with include-me-style hyperlinks).

The application must request the "linked-by" information from the HEP subdocument. It may format any links it receives and merge them into the main page when it delivers to the user, or it may provide "linked-by" information only upon request (footer link, margin anchors, even popup boxes [in Java implementations]).

Note that the resolution of linking in this version is on the quantum of "document" (the nmlist is empty). In the future, we hope to narrow the resolution of the linked-by information to the quantum of "SGML element."

Look to the Future and a Summary

The future of HEP is to implement a Java applet as a browser plug-in. The applet can connect to the HEP server directly, with no need for HTTP and URL encodings. In turn, this will allow a far better FCS implementation. HEP servers of the future will accept HEP connections and can track flow statistics and cross-system browser behavior. They will keep server-side state information to help optimize the connection, particularly so that they may implement security for electronic commerce.

Herein, we have discussed the HyTime Engine Peer-peer Protocol (HEP). Our paradigm is that the user sends a message to the document. Our ilinks are formatted so that the endterms are the hotspots and the anchor roles provide the necessary grounding for users so that they do not become lost in hyperspace. We have crafted the necessary translation to squeeze HyTime microdocuments into URLs. Finally, we have shown how HEP servers can chat by exchanging HyTime documents to help implement the distributed digital library, particularly for the Networked Digital Library of Theses and Dissertations.

Biography

Mr. Kipp leads the software team for Virginia Tech's Electronic Thesis and Dissertation (ETD) Initiative and its Networked Digital Library of Theses and Dissertations (NDLTD) Project while he seeks a Doctor of Philosophy degree in Computer Science.

Under the direction of Professor Edward A. Fox, Kipp's studies concentrate on electronic publishing, hypermedia, digital libraries, information retrieval technology, and human-computer interaction. While a student at Tech, Kipp presented an expert-track paper at the SGML'96 Conference and appears with Fox in D-Lib magazine. He has several articles in TAG: The SGML Newsletter covering HyTime, SGML, and the electronic publishing industry.

Kipp's current research includes the continuing development of the Electronic Thesis and Dissertation Markup Language (ETD-ML), the Slides Markup Language (SliML), and the HyTime Engine Peer-peer Protocol (HEP); all are applications of SGML and HyTime.

Prior to moving to Blacksburg in 1995, Kipp served as Vice President of Software Development at TechnoTeacher, Inc., in Rochester, New York. He was a member of the HyTime committee, directed development of TechnoTeacher's "HyMinder" HyTime Engine product, and served as HyTime consultant to the U.S. Navy's Metafile for Interactive Documents (MID) project. Kipp also presented papers at the first two International HyTime Conferences. With Steven Newcomb, President of TechnoTeacher, Kipp appears in the November 1991 Communications of the ACM.

In addition to having both Master's and Bachelor's Degrees in Computer Science from Florida State University, Kipp worked for the FSU Center for Music Research on the Standard Music Description Language (SMDL), the scheduling module of which became HyTime. Kipp's Master's project implemented a prototype processor of SMDL, translating a single-source music encoding into the visual and aural domains.

Kipp's personal interests include performing jazz saxophone, writing and reading modern novels, and gourmet cooking.

These slides were formatted by tag2html on
Sat Aug 16 14:19:19 EDT 1997,
using the Slides Markup Language (SliML) developed by Neill A. Kipp.