Healthcare Informatics Standards:

An Electronic Health Record Developer's Perspective

(working paper)


Jason P. Williams, MS

Clinical Informatics Analyst

Oceania, Incorporated

Palo Alto, California




Changing healthcare informatics models require a rethinking of fundamental information management practices. As emphasis shifts from management of individual data elements to the management of clinical documents that preserve the clinical narrative, new standards-based methodologies must be developed. As an electronic health record vendor, Oceania is committed to standards-based healthcare informatics solutions in order to preserve quality healthcare. Standards in healthcare informatics include vocabulary and language standards, information technology standards, and information or data content representation standards. This paper will describe Oceania's experience working with a major healthcare informatics standard for exchanging clinical information between systems, Health Level Seven (HL7). HL7 has shown interest in SGML within the last two years, having created a special interest group for SGML to investigate possible uses for SGML in healthcare. That group has served as a springboard for other activities, most notably the creation of the Kona Architecture, a HyTime architectural form for the exchange of healthcare documents encoded in SGML. This paper discusses the learning process of Oceania's DTD development efforts and provides perspective on a real-world application of architectural forms. It is hoped that sharing these experiences will enrich the communication between standards development organizations and the creators of standards-based solutions in healthcare informatics.




Changing healthcare informatics models require a rethinking of fundamental information management practices. As emphasis shifts from management of individual data elements to the management of clinical documents that preserve the clinical narrative, new standards-based methodologies must be developed. As an electronic health record developer, Oceania is committed to standards-based healthcare informatics solutions in order to preserve quality healthcare. Standards in healthcare informatics include vocabulary and language standards, information technology standards, and information or data content representation standards. This paper will describe the experiences of working within this standards-based framework by Oceania, Incorporated, an electronic health record developer.



Changing Healthcare Informatics Models


There are two different approaches to building a lifetime electronic medical record for a patient. One approach is to save and store only the abstracted, parsed, and elemental bits of data that apply to a patient. In this approach, the context in which the data was generated and the potential granular descriptors of that data may be less important than the fact that the data itself was generated. Laboratory studies, patient problems, procedures, medications, and hospitalizations can all live in medical records as independent data elements without any construct other than a temporal relationship, and an astute observer can synthesize a story or framework that fits all of those data elements together.


A second approach to building an EHR requires a shift in thinking to a document-centric model that is based on the patient chart. Instead of approaching the patient record as a large collection of randomly connected data elements, one can choose to use the patient chart as the basic building block of the system. Within the chart, there are several smaller units of information, such as laboratory data and clinical notes (see Figure 1. EHR Document Model). Much of this information is text based, especially the clinical notes, and may be managed as documents. The document model allows for the full description of the data, the context of the data, and it can faithfully reproduce the "legal text" representation of the data for any given point in time.




Figure 1. EHR Document Model. The EHR may be considered a collection of documents, many of which are text-based.


Only very recently has technology evolved to the point at which it is possible to create a fully integrated EHR based on a document model. The Standard Generalized Markup Language (ISO 8879:1986, SGML) promises to offer solutions for managing this text-based information because it is a standard that is explicitly designed to provide mechanisms for encoding the contents of a virtually endless array of documents, from the very simple to the very complex (Lincoln). Applying this capability to the longitudinal patient health record can enable the EHR to preserve the narrative of the central transaction in medicine--the documented encounter between the caregiver and the patient. The preservation of this information in an open fashion can foster the progress of healthcare informatics, enabling better access not only at the patient level but also at the aggregate level for information processing needs such as understanding trends in disease and disease management.


Healthcare Informatics Standards Arena


As an EHR developer, Oceania has several reasons to be interested in standards development and in using standards in the products the company creates. The primary reason is the benefit to healthcare informatics itself and to the quality care of patients. More than ever before, patient care demands easy communication of information between multiple partners, and the only way to facilitate this exchange is to develop standards-based solutions. These solutions should not only offer easy information exchange and retrieval at the present time, but they should also offer methodologies for long-term preservation and access that resist technology obsolescence.


From a business perspective, Oceania believes that standards-based solutions will enable us to closely focus on our core competency, electronic health record software, while at the same time ensuring that our product will work well with other systems. These other systems may be legacy EHR systems within an organization or they may be systems that are ancillary to the EHR such as an appointment and registration system or a diagnostic imaging system. Those who create healthcare informatics solutions must be aware of standards development in many different areas, forging them together in order to create systems that will address as many constituent groups as possible.

A critical component of any EHR is its use of vocabulary and language. Clinicians use a rich set of vocabularies to describe medical concepts, and these often differ across medical specialties. In order to exchange information, it is imperative that some vocabulary and language standards exist. Two such standards are the Systematized Nomenclature of Human and Veterinary Medicine (SNOMED) and the World Health Organization's International Classification of Diseases (ICD). SNOMED, owned and maintained by the College of American Pathologists and the American Veterinary Medical Association, divides medical concepts into eleven modules, or axes. Within each module, concepts are hierarchically arranged and assigned a code. Concepts (or their codes) from the various modules may be grouped together to form a complete medical concept. These codes may be used in an EHR to facilitate the indexing, retrieval, and exchange of information.


ICD-9 (ICD, Ninth Revision) codes are primarily used in the United States to facilitate billing and other medical claims. Nursing and other specialty communities use standard language and vocabularies, and there are other standard vocabularies for classifying biomedical literature. In addition to language and vocabulary standards, Oceania also bases its products on content and data representation standards such as SGML and HL7. Finally, we must merge these standardization efforts with other information technology standards such as CORBA and COM.


Health Level Seven (HL7)


The HL7 standard defines methods for the exchange of "clinical, financial, and administrative data among healthcare oriented computer systems." The standard is based on the OSI Reference Model, and it is conceived as the seventh, or application, level of that model. The HL7 organization, created in 1987, became an American National Standards Institute (ANSI) accredited standards organization in 1994. Its scope and audience have become international, and the organization has nearly tripled in size during the last three years. HL7 is based on the concept of a "trigger event," a real-world event that creates the need for patient data to be exchanged between systems (HL7 Standard).


HL7 is a messaging syntax that defines the messages different systems will send in order to communicate with each other. The standard specifies types of messages that correlate to various functions found in a clinical setting. For example, the message type ADT (Admit/Discharge/Transfer) is used to communicate admissions data about patients. Based on the ASN.1 standard for a messaging syntax, the messages themselves are composed of segments that are in turn composed of fields. The definition of each type of message specifies which segments it contains and in which order they will occur. To build on the previous example, the message definition for an ADT message will specify that it may contain certain segments, such as a PID (Patient Identification) segment. The order of the segments and fields is specified, as are rules for the repetition or optionality of a segment or field. At the data level, the standard has defined several data types, such as address, telephone number, patient name, and coded entry that may be used in any message where they are needed (HL7 Standard).
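
The idea that a message definition fixes which segments may appear, in which order, and whether each is optional or repeatable can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table below (ADT_SKETCH) is a simplified, hypothetical stand-in and not the actual ADT definition from the HL7 standard, and the names SegmentRule and conforms are invented for this sketch.

```python
from typing import List, NamedTuple


class SegmentRule(NamedTuple):
    seg_id: str        # three-letter segment code, e.g. "PID"
    required: bool     # must the segment appear at least once?
    repeatable: bool   # may the segment appear more than once in a row?


# Hypothetical, simplified rule table for illustration only.
# It is NOT the ADT message definition published in the HL7 standard.
ADT_SKETCH = [
    SegmentRule("MSH", required=True, repeatable=False),   # message header
    SegmentRule("PID", required=True, repeatable=False),   # patient identification
    SegmentRule("NK1", required=False, repeatable=True),   # next of kin
    SegmentRule("PV1", required=True, repeatable=False),   # patient visit
]


def conforms(segment_ids: List[str], rules: List[SegmentRule]) -> bool:
    """Check that segment_ids follows the order, optionality, and
    repetition constraints expressed by rules."""
    pos = 0
    for rule in rules:
        count = 0
        # Consume every consecutive occurrence of this rule's segment.
        while pos < len(segment_ids) and segment_ids[pos] == rule.seg_id:
            count += 1
            pos += 1
            if not rule.repeatable:
                break
        if rule.required and count == 0:
            return False
    # Anything left over is out of order, repeated illegally, or unknown.
    return pos == len(segment_ids)


if __name__ == "__main__":
    print(conforms(["MSH", "PID", "NK1", "NK1", "PV1"], ADT_SKETCH))
    print(conforms(["MSH", "PV1", "PID"], ADT_SKETCH))
```

A table-driven check of this kind keeps the message definition as data, so supporting another message type would mean adding another rule list rather than more code.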


The format of HL7 messages conforms to HL7-specific encoding rules. Briefly, the beginning of a segment is marked by its three-letter code, such as MSH (the message header) or PID (patient identification); the fields within a segment are separated by the vertical bar. The MSH segment accompanies every HL7 message, communicating information such as the version of the HL7 standard being used, the type of message being sent, and the encoding characters used as separators. Each segment is terminated with a carriage return; there is no special termination character to mark the end of the message. See Figure 2, Example HL7 Message.









PID|||PATID1234^5^M11||JONES^WILLIAM^A^III||19610615|M||C|1200 N ELM STREET^^GREENSBORO^NC^27401-1020|GL|(919)379-1212|(919)271-3434 ||S||PATID12345001^2^M10|123456789|987654^NC|<cr>






Patient William A. Jones, III was admitted on July 18, 1988 at 11:23 a.m. by doctor Sidney J. Lebauer (#004777) for surgery (SUR). He has been assigned to room 2012, bed 01 on nursing unit 2000.


The message was sent from system ADT1 at the MCM site to system LABADT, also at the MCM site, on the same date as the admission took place, but three minutes after the admit.


Figure 2. Example HL7 Message, presented in the HL7 Standard.
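
The encoding rules illustrated in Figure 2 can be made concrete with a minimal Python sketch that splits the PID segment into fields and components. The function names (parse_segment, get_field, parse_message) are invented for this illustration, and the sketch assumes the default separators shown in the figure (vertical bar between fields, caret between components, carriage return after each segment). It deliberately ignores repetition and escape characters and the special field numbering of the MSH segment, so it is not a complete HL7 parser.

```python
FIELD_SEPARATOR = "|"
COMPONENT_SEPARATOR = "^"
SEGMENT_TERMINATOR = "\r"

# The PID segment from Figure 2, with <cr> written as a real carriage return.
PID_SEGMENT = (
    "PID|||PATID1234^5^M11||JONES^WILLIAM^A^III||19610615|M||C|"
    "1200 N ELM STREET^^GREENSBORO^NC^27401-1020|GL|(919)379-1212|"
    "(919)271-3434 ||S||PATID12345001^2^M10|123456789|987654^NC|\r"
)


def parse_segment(segment):
    """Split one segment into its three-letter id and a list of fields,
    where each field is a list of components."""
    fields = segment.rstrip(SEGMENT_TERMINATOR).split(FIELD_SEPARATOR)
    return {
        "id": fields[0],
        "fields": [f.split(COMPONENT_SEPARATOR) for f in fields[1:]],
    }


def get_field(parsed, position):
    """Return the components of the field at a 1-based position."""
    return parsed["fields"][position - 1]


def parse_message(message):
    """Split a message into segments at each carriage return."""
    return [parse_segment(s) for s in message.split(SEGMENT_TERMINATOR) if s]


if __name__ == "__main__":
    pid = parse_segment(PID_SEGMENT)
    print(pid["id"])          # segment code
    print(get_field(pid, 5))  # patient name components
```

Even this small sketch shows why the format suits atomic data: every value is addressed purely by its position within a segment.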



The advent of the HL7 standard has been very beneficial to the healthcare community. It has enabled communications between systems in a standards-based way where there was not one before. HL7 has been internationally accepted and implemented, though it is used most extensively in the United States. HL7 works best when used to transfer messages containing very atomic data. Like a relational database system, it is very capable of managing information such as patient identification numbers and matching that with lists of prescribed drugs. An extremely valuable aspect of HL7 that should never be overlooked is the amount of work that has been done to define the components of many clinical communications scenarios. For example, the definition for the Patient Visit segment very richly defines the many grains of information needed to completely communicate patient demographics information.


HL7 does not, however, support a document model for healthcare because its flat structure makes it exceedingly difficult to communicate text-based documents. In addition, individual implementations of the HL7 standard may diverge widely from each other, hindering information exchange at the institutional level. This is embodied in the HL7 Z segment, a segment that users of the standard may define for their own purposes. As HL7 leaves many areas of communication undefined, quite a bit of information exchange takes place in Z segments. Unlike SGML, HL7 has also taken as its scope the task of defining a standardized message syntax as well as standardizing the content of the messages. Interestingly enough, one of the largest communities of interest for the use of SGML in healthcare informatics exists as the SGML Special Interest Group of the HL7 standards organization. This group's efforts include a set of formal design principles for using SGML for clinical documentation. The group is dedicated to understanding the possible relationship between the HL7 standard and SGML. Debate in the group and on its listserver ranges from using the SGML standard to encode HL7 messages to using the HL7 protocol to send valid SGML documents conforming to any DTD. A joint approach, drawing on the strengths of both HL7 and SGML, may offer the most promising solutions to the healthcare informatics community.



The Benefits of using SGML in Healthcare Informatics


The benefits of using SGML and its associated technologies for healthcare informatics center around four inter-related themes: information exchange, system and platform independence, information retrieval and reporting, and long-term access and preservation. The ability to exchange information between the various parts of the healthcare industry is more crucial than ever before for several reasons. The first of these is simply that patients are increasingly mobile. In order to treat patients, it is important that their medical records be able to follow them to provide their medical history. This is especially important in emergency situations, when immediate access to information may greatly enhance the chances of the patient receiving proper care. In an emergency situation, for example, it would be imperative for a care provider to have knowledge of a patient's drug allergies before administering any drug therapies.


In addition to provider-to-provider exchange, there are significant amounts of information exchanged between provider and payer organizations. Claims for payment are submitted to the payer organization, either a private insurance company or a government agency, usually in the form of ICD-9 codes. The generation of the codes is often a separate process from creating the clinical documentation, creating yet more paperwork or electronic records for both organizations. It is hoped that the use of SGML will enable the creation of clinical documents that may be exchanged as needed between these two organizations in order to communicate the claims information. An additional need for information exchange between these two parties arises when a claim is challenged or must otherwise be verified. An EHR utilizing SGML should, by design, be capable of producing documents that could be attached to claims as they are, thereby significantly simplifying the claims attachment process. The ability to electronically exchange claims and claims attachment information is even more imperative with the passage of the Health Insurance Portability and Accountability Act (HIPAA) earlier this year. This act mandates that the Health Care Financing Administration (HCFA), the US federal government agency responsible for processing Medicare and Medicaid claims, receive all claims and claims attachments electronically and in a standardized form.


It is possible to consider information exchange scenarios based upon two models: exchange within the entities of one umbrella institution (intra-institutional exchange), such as a health maintenance organization (HMO); and exchange between institutions themselves (extra-institutional exchange). The system and platform independence offered by SGML should greatly facilitate both modes of exchange. The information systems environment within many large medical systems is comprised of many best-of-breed systems that may or may not be coupled together. For example, there may be one system for appointments and facilities scheduling, another for recording patient data, another for storing and accessing radiology images, and yet another for cataloging pharmacy inventories. Additionally, medical monitoring devices produce an array of information that may or may not be stored as part of the EHR. HL7 provides mechanisms for exchanging information between systems in such an environment, but HL7 does not address the management of data at the individual system level. The system and platform independence offered by SGML should offer solutions for the exchange and management of data within institutions as well as offer a common model for exchange between institutions.


Closely related to information exchange is information retrieval and reporting. In patient care it is necessary to retrieve and report on relevant parts of the clinical record. This produces a patient-centric view of the medical record that would most likely be used by an individual clinician. A document model for healthcare will enable the retention of a greater degree of information and its context, but only if there are satisfactory methods for retrieving the information. SGML will allow the creation of a document-based EHR that will preserve the full text of the document with its rich semantic structures, context, and narrative, significantly expanding the amount of information for retrieval and re-use. Additionally, the ability to retrieve individual document elements made possible by the use of SGML will allow for optimized retrieval based on the patient-centric view and the population view. Paired with the use of vocabulary and other language standards, the use of SGML should allow the creation of rich data repositories suitable for analysis and reporting.


Finally, it is the goal of the healthcare informatics community to preserve the exchange and retrieval functions offered by SGML over time. From a patient care perspective, it is necessary when using an EHR to ensure that the data will be in a format that will survive changes in technology over the lifetime of the patient. Aside from that, the benefits to medical research are immense if clinical documentation can be preserved in a machine-readable form for long periods of time. Such a possibility would give researchers the ability to study long-term trends in disease management and would also make real the possibility of retrospective medical discoveries in the treatment and understanding of diseases such as AIDS or cancer (Morris 1997).


The Oceania Electronic Health Record: WAVE™


Oceania's WAVE is an electronic health record (EHR) product, one of the functions of which is to allow clinicians to create and access clinical notes. Based on the conventional clinician work practices, WAVE allows the physician to create one of a number of different types of entries into the record, such as an entry describing the results of a typical physical exam or an entry describing a particular symptom the patient may be experiencing.


Currently, primary access to the document contents is dependent upon indexes to the documents created in relational database tables. When the clinician signs the document, certain document elements are abstracted to relational tables, or "posting tables." The information in the posting tables generates a summary view of the contents of the medical record; this information may also be accessed for the purposes of providing a population-centric view for analysis purposes. In addition to abstracting elements in the posting tables, the entire contents of the documents are saved to the repository. Oceania has been able to provide clients with complete access to the entirety of the documents' contents, but this has been done in a non-standardized manner that required clients to be able to parse the documents in the format in which they were generated by the WAVE application.
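
The split between posting tables and the full-document repository can be sketched with Python's built-in sqlite3 module. Everything in this sketch is an assumption made for illustration: the table names (repository, posting), the column names, the function names, and the sample note are hypothetical and do not describe the actual WAVE schema.

```python
import sqlite3


def make_store():
    """Create an in-memory store with a full-text repository and a posting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repository (doc_id TEXT PRIMARY KEY, full_text TEXT)")
    conn.execute("CREATE TABLE posting (doc_id TEXT, element TEXT, value TEXT)")
    return conn


def sign_document(conn, doc_id, full_text, elements):
    """On signing, save the entire document and abstract selected
    (element, value) pairs into the posting table."""
    conn.execute(
        "INSERT INTO repository (doc_id, full_text) VALUES (?, ?)",
        (doc_id, full_text),
    )
    conn.executemany(
        "INSERT INTO posting (doc_id, element, value) VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in elements],
    )
    conn.commit()


def summary(conn, element):
    """Summary view: every posted value for one element across all documents."""
    rows = conn.execute(
        "SELECT doc_id, value FROM posting WHERE element = ? ORDER BY doc_id",
        (element,),
    )
    return rows.fetchall()


def full_text(conn, doc_id):
    """Retrieve the complete document as it was saved to the repository."""
    row = conn.execute(
        "SELECT full_text FROM repository WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_store()
    sign_document(
        conn,
        "doc1",
        "Patient reports mild headache.",  # hypothetical note text
        [("problem", "headache"), ("severity", "mild")],
    )
    print(summary(conn, "problem"))
    print(full_text(conn, "doc1"))
```

The sketch makes the trade-off visible: the posting table answers summary and population queries quickly, but only for the elements someone chose to abstract, while everything else stays locked in the stored document text.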


The most important aspect of the WAVE application is its use of structured data. WAVE operates on the underlying metaphor of a document-based clinical health record. Each new entry into the patient's health record is considered to be a document. WAVE documents have an underlying structure, being composed of a number of sections, each of which reflects a semantic role within the document. Thus, a patient's entire health record is considered to be a series of structured documents. The contents of the documents created by the clinicians are based upon Oceania's Clinical Content Knowledge Base (CCKB), a collection of refined medical terminology organized into meaningful classifications and hierarchies created and maintained by clinicians. Clinicians may enter data into the document by selecting an existing document template or by using WAVE browsers and dialog boxes, all of which enable the captured information to be structured. Information may also be inserted into the record as free text.


When one term is selected using the browser, child terms belonging to the original parent term are generated from the Clinical Content Knowledge Base and are displayed to the user in the column to the right of the original term. Each term that a clinician selects is also associated with a syntactic role, which may be a subject, a property, or a value. The subject is a clinical concept which may be described by one or more property-value pairs. For example, the subject headache could be further described by the property-value pair severity-mild. The use of the subject-property-value model generates a structured patient record whose level of granularity extends to the individual words which comprise the patient data.
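
The subject-property-value model can be sketched as a small Python data structure. The class and method names (Subject, describe, tokens, render) are hypothetical and chosen only for this illustration; they are not taken from the WAVE application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Subject:
    """A clinical concept described by zero or more property-value pairs."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)

    def describe(self, prop: str, value: str) -> "Subject":
        """Attach one property-value pair and return self for chaining."""
        self.properties[prop] = value
        return self

    def tokens(self) -> List[Tuple[str, str]]:
        """List every word together with its syntactic role."""
        result = [(self.name, "subject")]
        for prop, value in self.properties.items():
            result.append((prop, "property"))
            result.append((value, "value"))
        return result

    def render(self) -> str:
        """Produce a simple readable form of the structured entry."""
        if not self.properties:
            return self.name
        pairs = ", ".join(f"{p}: {v}" for p, v in self.properties.items())
        return f"{self.name} ({pairs})"


if __name__ == "__main__":
    headache = Subject("headache").describe("severity", "mild")
    print(headache.tokens())
    print(headache.render())
```

Because each word carries a role, the same entry can be rendered as text for the clinician or walked token by token by software.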


Oceania and SGML


Oceania is developing SGML solutions for several reasons. One of these is for representing the data within the application itself. It is hoped that by representing the information within the application in a standards-based way, we will be able to build efficiencies into our production process. Additionally, Oceania envisions using SGML as a method for allowing the application to interact with an editor and vocabulary browser in the future for the generation of the clinical notes themselves. Another aspect of WAVE is its capability to deliver information to clinicians that is outside of the patient record, such as clinical practice guidelines or drug information. The most important reason for Oceania to build SGML-based solutions is so that Oceania clients may reap SGML's benefits.


Early DTD Efforts


Most of Oceania's SGML activities have centered around the creation of a Document Type Definition (DTD) for the documents produced by the WAVE EHR. The first steps of DTD creation centered around mapping the content from the documents unchanged. This process relied on earlier analysis that had taken place to create the structure of the WAVE documents themselves, as based on the vocabulary usage in the CCKB. For example, it was determined that a clinical "problem" may have modifiers such as location, severity, or time duration. The initial DTD organized the document into sections, each of which was composed of many different, specific sentence types. Within each sentence, each word is also encoded according to its syntactic role in the sentence.


The following example, the element declaration for a problem, is taken from a draft DTD:


<!ELEMENT problem - - (problem-name, (description | evaluation-by | status | location | onset | episode | resolved | denied | previous-eval | previous-medications | current-medications | current-control | e-m-code | e-m--cpt-office | billing-code | secondary-billing-code | comment | trend | cpt-primary | cpt-secondary | most-recent-occurrence | other)*)>


As we worked with this DTD, it became apparent that even though it represented each component of the document exactly as it was created using the CCKB, some important aspects had been overlooked. The most important of these is that the DTD did not satisfactorily address information retrieval and exchange needs. We determined that this was a consequence of relying solely upon a vocabulary that had been optimized to allow clinicians to create clinical content using a browsing model. It became clear that if the DTD were to be successfully optimized for retrieval and exchange purposes, many things would have to be changed. In order to make these changes in a systematic way, Oceania has developed a set of design questions to guide the DTD creation process.


Oceania Design Questions for DTD Creation


  1. At what level of granularity will we encode documents? The WAVE EHR encodes every word, as was previously discussed in this paper. This is primarily for use within the browsing interface of the application. When clinicians retrieve a document, they may interact with the text of the document and the browsing interface. Encoding the way in which each word was entered into the document does have its purposes, but it does not inherently address retrieval and exchange needs. A major question in terms of granularity is whether we want to group tokens into larger units in order to provide a more pre-coordinated approach to retrieval, or to encode the individual tokens and provide a more post-coordinated approach. Consider two possibilities for encoding the clinical documentation for a fracture of the patient's left arm:


    <problem>Fracture arm left.</problem>
  2. How closely does information retrieval relate to information exchange, and how should that be reflected in a DTD? This question is somewhat related to the first design question regarding granularity. If every token is coded for the sake of retrieval, exchange could be compromised: exchange parties will either have to agree to a lossy up-conversion process (in terms of the granularity of the encoded data, not the data itself), or they will have to reach consensus on the use of standard element names.
  3. Which things should be elements, and which things should be attributes? Are there multiple answers to this question based on how codes and vocabulary will be used? It would be entirely possible to represent a complete medical concept with an empty element, using attributes to capture each dimension of the concept; it is also possible to let each concept dimension be represented by its own element. Attributes also provide a powerful and expressive mechanism for using controlled vocabulary. Consider the documentation for having found a fracture in a patient's left arm:

    <finding code1=123 code2=456 problem=fracture location=arm laterality=left>

    The opposite approach is also entirely feasible:

    <finding code1=123 code2=456><problem code1=35 code2=32>fracture</problem><location code1=76>arm</location

    The first approach may facilitate retrieval by grouping concepts into larger groups, especially for submitting claims using ICD-9 codes. The second approach allows for a more granular use of codes, and may be more useful for research purposes using a coding scheme such as SNOMED.
  4. What is the best way to represent negation in clinical documentation? A significant part of a patient's chart may consist of negations. This occurs when a clinician checks for a given condition and documents its absence. For example, if a clinician asks a patient whether he or she drinks and the patient denies drinking, the medical record would read "Alcohol use denied."
  5. How specific should DTD elements be? The Oceania EHR, as previously mentioned, encodes sections, sentences, and individual words. These elements are currently encoded by their specific identifying names. For example, there are several different kinds of sentences, such as sentences to describe a problem, medication, or procedure. Each sentence may contain certain types of individual words, depending on the sentence type: a problem sentence will contain a problem name, and an adverse reaction sentence will contain a drug name.

    It is yet to be determined whether the DTD should contain a declaration for every kind of sentence possible (e.g., <!ELEMENT problem-sentence . . .) or whether a more generic model should be used (e.g., <!ELEMENT sentence . . .) employing attributes as modifiers (e.g., <!ATTLIST sentence type CDATA #IMPLIED>). Oceania is also considering an even more generic approach based on the division (div0, div1, div2 . . .) model found in the Text Encoding Initiative DTD. The first approach may facilitate retrieval, but at the same time it will complicate DTD maintenance.
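
The element-versus-attribute trade-off raised in design question 3 can be explored with a short Python sketch. Several assumptions apply. The examples are rewritten as well-formed XML (quoted attribute values, explicit end tags) only because the Python standard library ships an XML parser rather than an SGML one. The laterality element in the second form is an assumed completion, since the example in the text breaks off before that point. The function names are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Attribute style: one empty element, one attribute per concept dimension.
ATTRIBUTE_STYLE = (
    '<finding code1="123" code2="456" '
    'problem="fracture" location="arm" laterality="left"/>'
)

# Element style: one child element per concept dimension.
# The <laterality> child is an assumption made for this sketch.
ELEMENT_STYLE = (
    '<finding code1="123" code2="456">'
    '<problem code1="35" code2="32">fracture</problem>'
    '<location code1="76">arm</location>'
    '<laterality>left</laterality>'
    '</finding>'
)

DIMENSIONS = ("problem", "location", "laterality")


def dimensions_from_attributes(xml_text):
    """Read each concept dimension from an attribute of <finding>."""
    finding = ET.fromstring(xml_text)
    return {name: finding.get(name) for name in DIMENSIONS}


def dimensions_from_elements(xml_text):
    """Read each concept dimension from a child element of <finding>."""
    finding = ET.fromstring(xml_text)
    return {child.tag: child.text for child in finding}


def codes_from_elements(xml_text):
    """Only the element style can carry a separate code per dimension."""
    finding = ET.fromstring(xml_text)
    return {child.tag: child.get("code1") for child in finding}


if __name__ == "__main__":
    print(dimensions_from_attributes(ATTRIBUTE_STYLE))
    print(dimensions_from_elements(ELEMENT_STYLE))
    print(codes_from_elements(ELEMENT_STYLE))
```

Both encodings yield the same three dimensions, but only the element style leaves room for a code on each dimension, which is the finer granularity that a scheme such as SNOMED could exploit.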



Exploring these and other design questions should allow a healthcare informatics vendor to build a methodology for building a DTD that addresses as many constituent needs as possible. The design team consists of people from multiple functional areas, including but not limited to physicians with interests in information retrieval and analysis as well as engineers with interests in product integration. Multiple needs, including those within the company and those of its clients, suggest the need for two or more DTDs and the appropriate mechanisms for mapping between them.


Oceania and SGML Architectural Forms


This paper has discussed the many benefits that the use of SGML will provide to the healthcare informatics community. These benefits may not be fully realized, however, if ways to standardize implementations of SGML are not found. While it may be possible to standardize a DTD (or portions of one) for healthcare documents, this is not a likely development. Vendors and clients alike should be able to develop highly customized DTDs for their specific purposes. The use of an SGML architecture seems to be at least a partial solution to these issues, allowing agreements to be made at a level of abstraction higher than the DTD. The use of an architecture should facilitate both intra-institutional exchange and extra-institutional exchange. From a developer perspective, Oceania is also interested in the use of SGML architectural forms for several internal purposes. One of these is to facilitate DTD development, especially if multiple DTDs will be developed for various purposes. One plan under consideration is to develop an architecture for Oceania documents; the individual DTDs will then be derived from the architecture, creating the necessary threading between the DTDs.


Oceania's experience with architectural forms began with the meeting that produced the Kona Architecture proposal in July 1997. Since that time, Oceania has worked with its existing DTDs to evaluate and understand the practical application of architectural forms. We find that the SOAP format of the Kona Architecture has not worked well with our DTD in some instances because we are unable to definitively map some of our sections to the SOAP structures. At one point in time, the WAVE application limited the data items clinicians could use in the different document sections, but this practice was discontinued. Now, the names of sections serve primarily as guidelines; the contents of the sections are not as tightly enforced. The result of this is that it may not be possible to definitively map these structures into their appropriate architectural form. For example, a vitals section containing a blood pressure measurement would map naturally to the objective structure of the Kona Architecture; however, within WAVE a clinician could enter information into the vitals section that may not necessarily be objective information, such as a comment that one of the patient's parents suffered from hypertension. If this document were normalized against the Kona Architecture for exchange purposes, the architectural element objective would not necessarily be a correct descriptor for the information.
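
The mapping problem described above can be illustrated with a deliberately simplified Python sketch. It treats normalization as renaming section elements according to a lookup table, which is an assumption made for brevity: real architectural processing is driven by declarations in the DTD, not by a dictionary. Only the vitals-to-objective pairing comes from the discussion above; the document content, the family-history section name, and the function name are hypothetical, and XML syntax is used only because the Python standard library has no SGML parser.

```python
import xml.etree.ElementTree as ET

# Section name -> architectural element. Only this pairing is taken from
# the text; a real mapping would be declared in the DTD itself.
SECTION_TO_FORM = {"vitals": "objective"}


def normalize(doc_xml, mapping):
    """Rename each top-level section to its architectural element.

    Returns the normalized document and the list of section names
    that had no architectural element to map to.
    """
    root = ET.fromstring(doc_xml)
    unmapped = []
    for section in list(root):
        if section.tag in mapping:
            section.tag = mapping[section.tag]
        else:
            unmapped.append(section.tag)
    return ET.tostring(root, encoding="unicode"), unmapped


if __name__ == "__main__":
    # Hypothetical note: the second vitals entry is not objective information.
    note = (
        "<note>"
        "<vitals>Blood pressure recorded. Father had hypertension.</vitals>"
        "<family-history>None recorded.</family-history>"
        "</note>"
    )
    normalized, unmapped = normalize(note, SECTION_TO_FORM)
    print(normalized)
    print(unmapped)
```

The renaming is purely mechanical: the comment about the patient's father ends up labeled objective along with the measurement, which is exactly the mislabeling that becomes possible once section contents are no longer enforced.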


A second issue to consider pertains to the level of granularity at which Oceania encodes its documents. If the practice of encoding each word is continued, then the Kona Architecture will need to be much more granular in order to accommodate exchange needs at that level. Currently, Oceania has mapped most of its individual words to either the code or the mention architectural element, but this is an overuse of those elements. A solution to this problem is either to work with the appropriate parties to extend the architecture or to create a version of the DTD designed specifically for use with the Kona Architecture. Despite these minor shortcomings, the use of architectural forms should offer healthcare informatics vendors the ability to offer customized SGML products to individual clients while at the same time providing mechanisms for information exchange within the larger community of interest.


In conclusion, the use of SGML in concert with other standards should bring about significant advances in the field of healthcare informatics. SGML will undoubtedly continue to be recognized by the community, as it offers so many benefits to the industry. Perhaps more important, discussions of SGML, especially within the HL7 SGML SIG, are reaching a much broader audience within healthcare informatics than ever before. The presence of the SGML SIG within the HL7 group, alongside the excellent work that HL7 has done during the last decade, suggests that the most likely scenario is a convergence between the two standards, using the best that both have to offer. The use of SGML architectural forms and other components of the SGML family of standards should help to encourage the widespread use of SGML by providing needed functionality such as the ability to do SGML-to-SGML transformations using DSSSL.


Eventually, Oceania envisions a healthcare informatics environment in which the information in the patient electronic health record is easily retrieved and exchanged by the appropriate parties; the information will be created by tools intuitive to clinician work practices, and it will resist technology obsolescence, becoming a valuable research asset for the healthcare community. The only way to achieve these goals is to create an environment in which healthcare informatics vendors work in conjunction with provider and other organizations to create standards-based solutions.


Selected References


Health Level Seven Standard. The full text of the standard, other information.


Health Level Seven SGML SIG. Working papers, references, HL7-SGML mail group, and other information.


Kona Architecture Proposal to the HL7 SGML SIG.


Lincoln, Thomas L, Daniel J Essin, Robert Anderson, Willis H Hare (1994). The Introduction of a New Document Processing Paradigm into Health Care Computing: A CAIT Whitepaper. Santa Monica, California: Rand Corporation. [Available at the HL7 SGML SIG website.]



Morris, Jonathan A, Rachael Sokolowski, John E Mattison, David Riley (1997). Standard Generalized Markup Language (SGML) in Healthcare. Accepted for panel discussion at the Healthcare Information Management Systems Society (HIMSS) 1998 Conference in Orlando, Florida.