Rawlins EC Consulting




ebXML Core Components

The master data dictionary?

The Core Components work was perhaps the only thing that ebXML did that was truly unique and original. In my opinion, it is also the work that ultimately has the most promise. Unfortunately, it is also the work that was, and still is, the least developed and most troubled. In this article I'll describe at a high level the key Core Components concepts, the advantages and limitations of the approach, the troubled development history, and prospects for the future.


Like much of the ebXML and UN/CEFACT business process work, the Core Components work is targeted more toward developers of XML document standards than users of the standards. It is intended to give those developers a common base library of data items that can be used to construct XML schemas. When UN/CEFACT's TMWG presented the proposal for a Core Components team at the first ebXML meeting, the work was mostly thought about in terms of a library of common business objects. These would be building blocks that could be plugged into the appropriate places in a UML model of a business process. This process would ultimately yield schemas for XML messages. However, as the work evolved it took a different direction. What took it in a new direction was the knowledge, reinforced by years of experience in EDI, that a piece of information might mean different things or have different component parts in different business contexts. For example, "patient" and "carrier" are both examples of a "party" object, but in the U.S. a patient uses a Social Security Number as an identifier while a shipping carrier might use a SCAC code. Carried even further, the overall structure of an invoice in grocery and in farm equipment manufacturing might be the same, but the pieces of information that describe a line item are quite different.


The concepts can perhaps be best expressed if we think about them in terms of class hierarchies. Core Components are not objects in the strictest OO sense of the term, since they only have attributes and not behaviors. However, the class model analogy is still useful. A conventional OO way of thinking about, for example, "party", would be to have an abstract, generic party as the parent class, with child classes for each area, or "context", in which they are used. This is shown in Figure 1 for "party" and Figure 2 for "product". These context dependent child classes are, in ebXML terminology, "business information entities", or BIEs. The May 2001 CC technical reports included a preliminary set of "context drivers" that are applied to Core Components to create BIEs. These context drivers form a framework of several categories that are used to define different business contexts. They include things such as business process, geographical region, industry, and official constraints (such as legal or regulatory). In these diagrams we show three very broad context categories: health care, motor carrier freight, and retail grocery (reading left to right).
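The class model analogy can be sketched in a few lines of Python. This is only an illustration of the analogy, not anything taken from the CC technical reports: the class and field names are hypothetical, and only the patient and carrier identifiers come from the example given earlier.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """Generic, context independent party (the abstract parent)."""
    name: str


@dataclass
class Patient(Party):
    """Party as used in a health care context (hypothetical BIE)."""
    social_security_number: str


@dataclass
class Carrier(Party):
    """Party as used in a motor carrier freight context (hypothetical BIE)."""
    scac_code: str


# Like Core Components, these classes carry attributes only, no behavior.
patient = Patient(name="Jane Doe", social_security_number="000-00-0000")
carrier = Carrier(name="Example Freight Lines", scac_code="EXFL")
```

Both objects are parties, but each carries only the identifier that makes sense in its own context.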



Figure 1 - Party


Figure 2 - Product


What is unique about the CC way of doing things, as shown in these figures, is that we recognize that these "contexts" apply not just to one class hierarchy but to several. To illustrate the power of this idea, we take a very simplified document, such as an invoice, that might be exchanged as one step of a generic payments business process. It is made up of a number of generic components that must appear in the invoice regardless of context. For this simplified example we only show party and product, but a real invoice would of course have several more. The class structures for the generic components that make up the invoice are included under them and then turned 90 degrees on their depth (or z axis) to form a three dimensional model. Each context becomes a different plane in the model. This is shown in Figure 3.



Figure 3 - Context


These concepts allow us to model a generic business process, with generic documents exchanged at specific points in the process. The generic documents can identify placeholders for certain required information. For example, an invoice might have buyer, payee, payment information, and one or more line items. When the "context rules" are applied to these generic placeholders, fully described objects are created with all of the detailed information that is appropriate for the context. This produces a full document description that is tailored specifically for the context.
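As a rough sketch of the idea, a generic document can be treated as a list of placeholders, and the context rules as a table that says what each placeholder expands to in a given context. Everything here is an assumption made for illustration: the rule table, the function name `assemble`, and the product fields are hypothetical and do not reflect the actual syntax of the ebXML context rules.

```python
# A generic invoice names only its placeholders, independent of context.
GENERIC_INVOICE = ["party", "product"]

# Hypothetical context rules: (placeholder, context) -> detailed data items.
CONTEXT_RULES = {
    ("party", "health care"): ["name", "social_security_number"],
    ("party", "motor carrier freight"): ["name", "scac_code"],
    ("product", "health care"): ["description", "procedure_code"],
    ("product", "motor carrier freight"): ["description", "weight"],
}


def assemble(document, context):
    """Expand every generic placeholder into its context specific items."""
    return {
        placeholder: CONTEXT_RULES[(placeholder, context)]
        for placeholder in document
    }


freight_invoice = assemble(GENERIC_INVOICE, "motor carrier freight")
health_invoice = assemble(GENERIC_INVOICE, "health care")
```

The same generic invoice yields two different full document descriptions, one per context plane.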


The CC approach is superior to the approach used in traditional EDI. In that arena, standard messages are defined using the "kitchen sink" approach. Every data item that anyone would ever want to use in a message is included in the standard. Furthermore, while EDI definitions necessarily must have at least some focus on business semantics, there is a deliberate emphasis on generic, re-usable structures which tends to strip a lot of specific semantics out of the standards. This forces individual organizations to write implementation guidelines that specify how they are going to use the standards for their own "context". These guidelines specify several types of things to complete both the semantics and syntax of the EDI messages that they use. They specify the subset of data elements they are going to use from the full set that the standard supports. They specify code values for qualifier elements that complete the semantics of generic structures. They may in text narrative provide further semantic qualification to a data element that is otherwise used in compliance with the standard. Finally, they may specify using data elements to convey data that doesn't fit the standard, usually because there isn't a convenient place for it in the standard.


In contrast, the CC approach emphasized business semantics over other design considerations, and recognized the importance of context. CCs with context fully applied allow production of a set of BIEs that are already tailored for a specific business usage. This set of BIEs is not only sufficient to convey the semantics of the business information, but also fully specified in terms of semantics. These advantages of the CC approach could greatly reduce the need for implementation guides. Further, if BIEs are linked to the fields used internally in business applications, then mappings to and from data exchanged with trading partners would only have to be done once. In current EDI practice, although most EDI management systems enable a good deal of reuse between mappings, a significant part of the mapping for each new message for each trading partner has to be created from scratch.


The CC work of May of 2001 also included methodologies for discovering and analyzing core components, naming conventions based on ISO 11179, and an initial catalog of core components. An interesting feature of the CC work is that the naming conventions apply only to the formal names of the core components, and not to how they are used in any particular syntax such as XML. The names might only appear, for example, in an ebXML Registry. Actual business messages in XML or any other syntax might or might not use the proper names. For example, they might use the equivalent of the names, only in a language other than "Oxford" English. What makes the CCs usable across a wide variety of such implementations is the fact that each CC is assigned a unique identifier, or UID. The designers envisioned that naming of XML elements would be irrelevant so long as each semantically identical data item (each CC used in a context) would have an attribute of the UID. UIDs would then enable documents in different XML business standards, such as OAGIS or RosettaNet, or even different spoken languages, to be processed by applications using a single interface, or transformed automatically to a different XML business standard.
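The UID idea can be illustrated with a small sketch. It assumes the UID is carried in an attribute named `uid` and uses made-up UID values and element names; none of these come from the CC documents.

```python
import xml.etree.ElementTree as ET

# Two hypothetical invoices with different element names (and languages)
# that tag semantically identical items with the same made-up UIDs.
ENGLISH = (
    '<Invoice>'
    '<BuyerName uid="X000001">Acme</BuyerName>'
    '<TotalAmount uid="X000002">100.00</TotalAmount>'
    '</Invoice>'
)
FRENCH = (
    '<Facture>'
    '<NomAcheteur uid="X000001">Acme</NomAcheteur>'
    '<MontantTotal uid="X000002">100.00</MontantTotal>'
    '</Facture>'
)


def values_by_uid(xml_text, attr="uid"):
    """Collect element text keyed by UID, ignoring element names entirely."""
    root = ET.fromstring(xml_text)
    return {
        element.get(attr): (element.text or "").strip()
        for element in root.iter()
        if element.get(attr) is not None
    }


english_values = values_by_uid(ENGLISH)
french_values = values_by_uid(FRENCH)
```

An application written against the UIDs sees the same data in both documents, which is the single interface the designers had in mind.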


As powerful and unique as the CC approach is, it is not without a few problems and potential implementation complexities. The root cause of many of these complexities is that the CC designers again took the "kitchen sink" approach in defining the aggregate core components, i.e., those that are made up of two or more "basic" core components or other aggregates. For example, "party" includes in it every item of data that might be relevant to a party in any business context. This leads to several problems relating both to development and maintenance of the catalog, and to usage of the catalog in creating BIEs.


In developing the catalog, the maintainers have a strong interest in harmonization, i.e., in abstracting the information needs of several constituencies into as few data items as possible. Harmonization is a difficult and time consuming exercise. The essential problem is a common one in OO analysis: determining when two things are the same and when they are different. The CC work provides insufficient guidance for answering this question. As such, the process is too often likely to be inconsistent, subjective, and influenced by politics in the standards process.


In addition, there is a conflict between harmonization and commonality of semantics desired by the CC catalog maintainers, and the specificity of semantics desired by users. This is related to issues of timeliness and version control of the CC catalog. Although not set in stone as of May 2001, the predominant thinking among the CC designers is that UN/CEFACT would be responsible for the main CC catalog. Groups using that catalog to develop their own XML business standards would be encouraged to create BIEs from the CCs. But they would not be compliant with the catalog if they added components to an existing CC aggregate. To be compliant, CEFACT would first have to add the components to the aggregate in the catalog. Then, other groups could use it in their BIEs. This makes everyone dependent on CEFACT's decisions about harmonization and duplication, and on their timetable. CEFACT could reject a request for an addition, or not respond in a timely fashion. In either case, groups needing to add components to aggregate CCs might not be inclined to wait on CEFACT. They might instead go ahead and add what they need directly to their XML schemas, and therefore reduce compliant use of the catalog. Beyond this, when CEFACT does issue a new version of the CC catalog, the groups that use it will need to review and decide whether or not to use any new components. Realistically, if the CC catalogs ever become stable it is likely to take several years. In the meantime, groups using the catalogs will be building their XML schemas on shifting foundations. In addition, there are likely to be problems in determining semantic equivalencies between different XML standards, for example between a RosettaNet purchase order and an OAGIS purchase order, if they are based on different versions of the CEFACT catalog.


The kitchen sink approach also introduces some complexity in XML syntax for schema designers. Aggregate CCs are extended into contexts by removing or masking child elements rather than adding them, which is the route usually taken in OO class models. This might be fine for semantic concepts stored in dictionaries, but it doesn't work as well with W3C XML Schema. Schemas are more amenable to extending types by adding child elements than they are to removing them (we create a new type by extending an existing type and adding elements or attributes).


From strictly an analysis perspective, a cleaner and simpler way to do all of this would be the classic OO approach of having minimal information in a common parent class, and adding more information as we extend into derived classes. Aggregates in the CC catalog would contain only the elements used in common by all BIEs that are based on them. BIEs would add basic core components as needed in the context. This approach would require discipline to ensure that the common elements of derived BIEs were indeed common. This approach would also give groups using the CC catalog more flexibility in creating BIEs, and make them less dependent on CEFACT since they could create or extend BIEs by adding components on their own. This would be more prone to chaos than the current CC approach, but I believe that pragmatism demands a reasonable tolerance of disorder in order to gain responsiveness to user needs.
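The difference between the two styles can be sketched with plain sets of data item names. The item names and the functions `restrict` and `extend` are hypothetical, chosen only to contrast masking a kitchen sink aggregate with extending a minimal one.

```python
# Hypothetical data items; the real catalog entries are not reproduced here.
KITCHEN_SINK_PARTY = frozenset(
    {"name", "address", "social_security_number", "scac_code"}
)
MINIMAL_PARTY = frozenset({"name", "address"})


def restrict(aggregate, keep):
    """CC style: derive a BIE by masking items out of a full aggregate."""
    missing = set(keep) - aggregate
    if missing:
        # Anything not already in the catalog aggregate cannot be used
        # until the catalog maintainer adds it.
        raise ValueError(f"not in the catalog aggregate: {sorted(missing)}")
    return frozenset(keep)


def extend(aggregate, extra):
    """Classic OO style: derive a BIE by adding items to a minimal parent."""
    return aggregate | frozenset(extra)


carrier_by_restriction = restrict(
    KITCHEN_SINK_PARTY, {"name", "address", "scac_code"}
)
carrier_by_extension = extend(MINIMAL_PARTY, {"scac_code"})
```

Both routes can arrive at the same BIE, but only the second lets a group add an item the catalog does not yet contain, at the cost of the extra disorder described above.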


Another complexity is in the utility of the CC UIDs. Much has been said in certain circles about how UIDs will allow automatic transformation between XML document formats, and automatic loading of data based on examining UIDs instead of element names. There are two main reasons why these kinds of statements are overselling the technology. The first is that what matters in business applications is not the context independent CCs, but the context dependent BIEs. For UIDs to be useful for these types of applications, they must be those that are assigned to the BIEs and not the CCs. This means that in addition to a catalog of CCs, a fully populated catalog of BIEs covering every needed context will be required. One could perhaps use generic CC UIDs and apply the values of the context drivers at each appropriate point, but this would require that applications or XSLT transformations consider the values of five or more context drivers in addition to the CC UIDs. The simpler scheme would be to just use the BIEs with context already applied. The other complexity is that for UIDs to be usable in instance documents, the current state of technology dictates that they must be represented as attributes of XML elements. Since attributes can't themselves have attributes, this means that all of the business data in an XML message must be represented as elements and not as attributes. Some XML standards that already use a mix of elements and attributes for business data will have difficulty in using UIDs.
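To make the first point concrete, here is a sketch of the two lookup schemes an application might use to map incoming data to an internal field. The UID values, field names, and driver values are made up, and only the four context driver categories named earlier are modeled; a real implementation might need five or more.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """Values of the context drivers (only four are modeled here)."""
    business_process: str
    region: str
    industry: str
    official_constraints: str


# Scheme 1: generic CC UID plus every context driver value.
FIELD_BY_CC_AND_CONTEXT = {
    ("CC0001", Context("payment", "US", "health care", "none")): "patient_ssn",
    ("CC0001", Context("payment", "US", "motor carrier freight", "none")): "carrier_scac",
}

# Scheme 2: BIE UID with the context already applied.
FIELD_BY_BIE = {
    "BIE0001": "patient_ssn",
    "BIE0002": "carrier_scac",
}


def resolve_generic(cc_uid, context):
    """The application must know the full context to interpret a CC UID."""
    return FIELD_BY_CC_AND_CONTEXT[(cc_uid, context)]


def resolve_bie(bie_uid):
    """A BIE UID alone identifies the context dependent data item."""
    return FIELD_BY_BIE[bie_uid]
```

The second scheme is simpler for applications, but it only works if a BIE catalog exists for every context in which the data is exchanged.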


The problems with CC UIDs, and the fact that it is the BIEs that are really relevant in messages and not the CCs, lead into the other major potential limitation of the CC approach. This is the first time that any group has tried to use a formally defined context mechanism to identify business semantics. We really don't know yet whether or not it is going to work. It's neither a trivial nor an easy task for one person to devise an ontology of business semantics that contains all of the concepts that are likely to be needed to exchange business documents. However, with enough time and effort it could probably be done. It is quite a bit more difficult for a committee of people to come up with such an ontology and agree on it. However, to this basic ontology the CC approach adds an ontology of context categories, and further divides and categorizes the primary Core Component ontology using the context ontology.


The complexities of context lead me to the opinion that it may end up only being useful as a convenience in creating BIEs and not as a formal driver. "Document Assembly and Context Rules" presents a detailed, syntax based methodology for applying context rules to CCs. I tend to think that the concepts and approach discussed in it, like much of the business process, model driven architecture approach, push the limits of what can pragmatically be done in the near future. The art of software engineering just is not yet mature enough to support such ambitious functionality feasibly and reliably. The extreme edge of the CC proposals deals with creating XML schemas "on the fly" at run time, using formal context driver values and CC UIDs. This idea has been discussed in a couple of the recent books on ebXML, though it was only hinted at in some of the ebXML documents. This is one idea that to me borders on being Rube Goldberg at best, and science fiction at worst. Even assuming that we could be precise enough in CC analysis and application of context driver values to deterministically create BIEs and context dependent schemas, I fail to see the benefits of the approach when integrating the business applications of different organizations. The semantic information for the XML schemas is static; it is essentially known before the instance XML documents are created. Having applications create schemas dynamically at run time reduces the utility of schema validation. How is an SME application going to do schema validation if the schema is only assembled at run time? The big trading partner who mandates the use of the schema would only assemble it when they need to use it. The only option, then, is for the SME's systems to do the same thing. This only adds more complexity and cost to ebXML compliant software, pushing it even further out of the reach of SMEs. The only place where dynamic, run time schema creation might make sense is in web forms based applications where a human could supply any missing context or semantic information. I don't see it making much sense in any other environment.


As I said at the beginning of this article, when the ebXML Work Group was dissolved in May 2001, the CC work was the least developed of all of the project teams' work. There were several reasons for this, but most people with whom I've discussed it agree on two primary reasons. The first reason is that the concepts were new, and it took quite a while to develop them. The team just wasn't finished when the clock ran out. This is one of the reasons that the "technical report" document type was created in addition to the original ebXML vision of technical "specifications". The second reason that most people agree on is organizational politics. Rather than spending time exclusively on productive work, the CC team was often forced to explain and defend its work and was subjected to reorganizations. Politics are, of course, part of every organization, but the CC project team seems to have suffered disproportionately from ebXML politics. The end result was that many of the CC concepts and methodologies were not fully defined in the final May 2001 documents. Even though there was an initial catalog of core components, there wasn't a single, normative, consistent set of definitions for the key CC concepts and terms. The net effect is that any CC catalog is still preliminary and will probably have to be redone to be made consistent with those definitions when they are finally developed and approved.


UN/CEFACT's Electronic Business Transitional Work Group, or ebTWG, took responsibility for development of the CC specifications starting in October 2001. They have finally developed a single document that specifies with normative definitions the key CC terms, concepts, and methodologies. At the time that I write this, a final draft of that specification is out for public review. Although I believe that there are still problems with that draft, it is a vast improvement over the May 2001 documents and moves the CC work significantly nearer to completion. As I noted earlier, development of a CC catalog without such an approved specification is still a preliminary exercise. I do note, however, that CEFACT has continued developing the CC catalog.


In closing, perhaps the best way that I can summarize my view of the ebXML Core Component work is to say that it is the most inventive and experimental part of the great ebXML experiment. Some experiments succeed, and others fail. Only time will tell whether or not this one yields the cure for cancer or is just another mess to be cleaned up off of the laboratory floor. I do believe, though, that the basic concepts will be progressed in some form in the future, even if the ebXML CC work doesn't survive.


April 16, 2002

© Michael C. Rawlins