Two Good Uses for XML

During the past seven years, XML (eXtensible Markup Language) has gone from “What is it?” to “We’re doing it.” Surveys show that most enterprise IT departments have some XML-related projects underway, and many companies use XML in mission-critical applications. As a result, the volume of XML data created, transferred and stored is growing daily.

In the early days of XML, there was considerable confidence that the World Wide Web Consortium (W3C) and other organizations would quickly define a core set of XML infrastructure specifications to cover XML schema definitions, linking specifications, messaging, and querying. There was also a widely shared assumption that industries would converge on a manageable number of standard schemas by which they would do business using XML.

Since that time, unfortunately, XML technologies have grown exponentially to include thousands of daunting pages of generalized infrastructure specifications and even more on top of those to cover industry-specific applications of XML. Outside of a few truly core specifications — such as XML itself, XML Namespaces, XPath/XSLT and SOAP/WSDL — it is difficult to discern true standards that are fully supported in an interoperable manner.

Drowning in “Standards”

It is not clear whether these specifications are proliferating in response to consumer pull or vendor push. But one thing is clear: instead of following a relatively small number of existing, proven standards, vendors are rushing to create a large number of new, unproven specifications that are loosely called “standards.”

We now have literally dozens of core infrastructure specifications from a number of overlapping organizations: the World Wide Web Consortium (W3C), the Web Services Interoperability Organization (WS-I), the Organization for the Advancement of Structured Information Standards (OASIS), and even the International Organization for Standardization (ISO).

To make matters worse, as ambiguities and errors in these specifications are discovered, discussed and corrected, the revised specifications grow even more complex. The resulting muddle and confusion threaten to slow adoption of XML by end-users, and to obscure XML’s true value to business.

Standardization works best when it enshrines best practices supported by a core group of focused organizations. Therefore, one can only hope that market forces and the process of technological evolution will soon converge on widely useful vertical industry schemas and kill off many of the redundant and overly complex infrastructure specifications.

Even as that winnowing process is in progress, however, companies can take advantage of two evolving areas where XML can make a particularly valuable contribution: information aggregation and semantic integration. Neither requires industry consensus on authoritative schemas or complex technologies; both rely on tools that can be applied today.

Information Aggregation

XML’s pervasiveness and universality allow individuals to combine information from disparate sources and enterprise applications in a process called information aggregation. Today most enterprise applications can import and export data in XML, and several technologies have emerged that can process and transform XML without having to understand the content in detail.

SOAP, for example, provides a format for wrapping XML messages with specialized information about how the body of the message may be routed, encrypted and authenticated. In addition, specifications such as XPath, XSLT and XQuery allow XML data to be located in a message stream or database, associated with related messages or documents, and then transformed into another format. By combining these technologies, a new generation of infrastructure products, known as XML-powered “hubs” or “buses,” has emerged. These products allow information to be exchanged among applications that were designed in isolation, but can be configured to operate in unison.
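As a rough illustration of the locate, associate and transform steps described above, the following Python sketch joins two XML exports and emits a combined report. The element names, attribute names and sample data are invented for this example, and the standard library's ElementTree module supports only a limited subset of XPath; a real hub or bus would more likely use full XPath, XSLT or XQuery and would wrap the messages in SOAP.

```python
import xml.etree.ElementTree as ET

# Hypothetical exports from two applications that were designed in isolation.
CRM_EXPORT = """<customers>
  <customer id="c1"><name>Acme Corp</name><city>Berlin</city></customer>
  <customer id="c2"><name>Globex</name><city>Boston</city></customer>
</customers>"""

BILLING_EXPORT = """<invoices>
  <invoice customerRef="c1" amount="120.50"/>
  <invoice customerRef="c1" amount="79.50"/>
  <invoice customerRef="c2" amount="300.00"/>
</invoices>"""


def aggregate(crm_xml, billing_xml):
    """Join customers to their invoices and build a combined report element."""
    crm = ET.fromstring(crm_xml)
    billing = ET.fromstring(billing_xml)
    report = ET.Element("customerReport")

    for customer in crm.findall("./customer"):
        customer_id = customer.get("id")
        # Locate the related records with an XPath attribute predicate.
        invoices = billing.findall(f"./invoice[@customerRef='{customer_id}']")
        total = sum(float(invoice.get("amount")) for invoice in invoices)
        # Transform the joined data into a new, simpler format.
        ET.SubElement(report, "entry", {
            "customer": customer.findtext("name"),
            "invoiceCount": str(len(invoices)),
            "total": f"{total:.2f}",
        })
    return report


report = aggregate(CRM_EXPORT, BILLING_EXPORT)
print(ET.tostring(report, encoding="unicode"))
```

Note that the code never needs to know what a customer or an invoice means to the business. It only follows the structure of the markup, which is what makes this kind of aggregation cheap to set up.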

Semantic Integration

A second opportunity for organizations to derive significant value from XML is called semantic integration. In this case, companies take advantage of the labels associated with each XML information item to process the information in a way that is sensitive to the meaning — not just the structure — of the data. But there is little agreement on exactly how to do this.

The W3C has invested heavily in a suite of specifications, called the Semantic Web, that allow a person to describe the meaning of specific terms in a specific namespace and to perform logical inferences to map the content of a document. One of the most promising of these specifications is the Web Ontology Language (OWL), which provides a mechanism to turn informal business conventions into rigorous “ontologies.” Although the Semantic Web approach has generated considerable excitement in academia, and generous governmental support in both Europe and America, it has received mixed reviews from industry vendors and end-users.

Many other approaches to the semantic integration problem are being tried, including heuristic text mining techniques and the use of basic XPath, XSLT and XQuery technologies to build informal taxonomies. It seems likely that some combination of XQuery’s ability to look at the structure of XML markup, OWL’s ability to rigorously define what the markup means, and text-mining tools to process the content within the tags will take semantic integration out of the lab and into the workplace in the next few years.
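The informal-taxonomy idea can be sketched in a few lines of Python. In this example a small lookup table maps namespace-qualified element names from two suppliers onto shared concepts, so that both documents can be compared by meaning rather than by tag name. The namespaces, element names and data are all hypothetical, and a lookup table is far weaker than the logical inference that OWL aims to provide; it is meant only to show the principle.

```python
import xml.etree.ElementTree as ET

# Hypothetical informal taxonomy: namespace-qualified label -> shared concept.
TAXONOMY = {
    "{urn:example:supplier-a}itemName": "product",
    "{urn:example:supplier-a}unitCost": "price",
    "{urn:example:supplier-b}title": "product",
    "{urn:example:supplier-b}price": "price",
}

DOC_A = (
    '<a:item xmlns:a="urn:example:supplier-a">'
    "<a:itemName>Widget</a:itemName><a:unitCost>9.99</a:unitCost>"
    "</a:item>"
)
DOC_B = (
    '<b:offer xmlns:b="urn:example:supplier-b">'
    "<b:title>Widget</b:title><b:price>8.75</b:price>"
    "</b:offer>"
)


def extract_concepts(xml_text):
    """Return a dict of shared concepts found in one document."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for element in root.iter():
        # ElementTree exposes qualified names as "{namespace}local".
        concept = TAXONOMY.get(element.tag)
        if concept is not None:
            concepts[concept] = (element.text or "").strip()
    return concepts


records = [extract_concepts(doc) for doc in (DOC_A, DOC_B)]
# Both suppliers can now be compared on the shared "price" concept.
cheapest = min(records, key=lambda record: float(record["price"]))
print(cheapest)
```

Keying the table on the namespace as well as the local name matters here: both suppliers could use an element called price with different meanings, and the namespace is what keeps the two apart.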

As is underscored by its role in information aggregation and semantic integration, one of XML’s greatest advantages is that it does not compete with alternative technologies so much as transcend them. XML creates a common basis for unifying diverse platforms, applications and communities without displacing what really works. Therefore, the most successful users of XML technology will be those who exploit the power of the simple principles at its core — unhindered by reams of so-called standards — and can apply them in a reliable and efficient way.

Michael Champion is a senior technologist at Software AG, Europe’s largest and most established systems software provider. He has been a software developer for 20 years, and has had extensive involvement with the W3C, including co-chairing the Web Services Architecture Working Group. His participation on the W3C’s Document Object Model (DOM) Working Group from 1997 to 2003 included work as an editor of the core XML portion of the DOM Level 1 Recommendation. Champion has authored numerous articles and is a frequent speaker at industry events.