Will Office 2003 Lead to Lock-in?

Microsoft Office 2003 entered beta earlier this week, giving many customers their first look at what Microsoft hopes will rewrite the office productivity landscape with a new ecosystem of collaborative functionality based on XML. But will organizations have to buy into an entirely Microsoft architecture to tap it?

That’s the contention of Gary Edwards, a Web application design consultant and OpenOffice.org’s representative on the OASIS Open Office XML Format Technical Committee.

Edwards said that the Office 2003 beta’s handling of the XML file format means that firms will not be able to tap the rich collaborative features of Office 2003 without resorting to proprietary Microsoft file formats. And to truly unlock its collaborative potential, firms will have to standardize on the Windows XP operating system (Office 2003 won’t run on Windows 9x), as well as Windows Server 2003, SharePoint Server, Exchange Server, etc. As for the file formats, he called Office 2003’s XML “crippled,” because it strips documents of all presentation and formatting information when saving them in the XML file format. It does not do this when saving files in Microsoft’s proprietary file formats.

“Although it’s still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers,” Edwards said. “Reports are that when saving to XML, [Office 2003] strips out the presentation and formatting information, leaving near raw content. It appears, at least from the non-enterprise systems user’s perspective, that all the really cool collaborative advantages are based on saving files in the XP proprietary format. Which means that “all” the users in the collaborative effort must be on the XP platform, using XP Office, connecting through XP servers. What kind of universal connectivity and exchange is that? XP users won’t even be able to collaborate equally with the 200 million Win9x users. Not unless they upgrade.”

However, Ronald Schmelzer, founder and senior analyst of XML research firm ZapThink, noted that Microsoft’s approach aligns more closely with a core tenet of XML theory: the separation of process and data.

“The idea is for XML not to specify how the information should be processed, but rather leave that task to XSL templates and other post-XML processing steps,” he said. “XML is supposed to be a presentation-neutral format.”
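The separation Schmelzer describes can be sketched in a few lines. The example below is only an illustration: the `invoice` document and its element names are hypothetical, and two plain Python functions stand in for the XSL templates he mentions. The point is that one presentation-neutral document can feed more than one rendering.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation-neutral document: data only, no fonts or layout.
INVOICE_XML = """
<invoice>
  <customer>Acme Corp</customer>
  <total currency="USD">1250.00</total>
</invoice>
"""


def render_text(doc):
    """One 'template': a plain-text summary line."""
    total = doc.find("total")
    return f"{doc.findtext('customer')}: {total.text} {total.get('currency')}"


def render_html(doc):
    """A second 'template': the same data as an HTML fragment."""
    total = doc.find("total")
    return (f"<p><b>{doc.findtext('customer')}</b> owes "
            f"{total.get('currency')} {total.text}</p>")


doc = ET.fromstring(INVOICE_XML)
print(render_text(doc))
print(render_html(doc))
```

Neither rendering is stored in the XML itself, which is the property Schmelzer calls presentation-neutral.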

Still, Schmelzer said the picture becomes trickier when integration extends beyond the enterprise itself.

“I think when it goes beyond intra-business integration to cross-industry and inter-organization integration, the question will be how much of the data they exchange do they want loaded with presentational and operational functionality and how much do they want to leave to the individual implementation of the company?” he said. “This is really not an answerable question — because it depends on the scenario. The problem with standards is that there are so many of them. The resolution here is to look at how companies and industries will adopt XML in their verticals and then determine which aspects of that should be embodied in standards and which should be embodied in products. Experience shows that companies and industries can hardly agree on the data, let alone the representation, so erring on the side of “less” in the XML body makes more sense.”

Microsoft chose not to respond to questions about presentation and formatting in its XML versus its proprietary file format, simply noting that the native file format for InfoPath (the application for creating XML forms in Office 2003) is .xml.

“The native file format for InfoPath forms is .xml, which makes it easy for companies to integrate InfoPath forms into their existing business processes — one of the key advantages of this product,” a Microsoft spokesman said.

The spokesman added, “InfoPath creates fully standards-supported XML data. We can’t speak to what the competition does, but yes, if they support XML data, then they can leverage XML created in InfoPath with no other work involved. Where back-end databases are concerned, if that database supports XML Web services, then they can automatically accept InfoPath data as well.”

The Application-File Format Model

Edwards’ point depends upon some understanding of how XML is pressuring the traditional “application-file format” model. The traditional standalone application-file format model allows user customization via an application programming interface (API) set by the application provider. Users can alter the application to their needs only through the API, with access to the API and the file format structure determined by the vendor’s “permission” policy. But next generation XML-enabled applications could lead to a drastic power shift by putting much of the control traditionally reserved for the vendor into the hands of the user.

“Next generation XML-enabled desktop applications will need to march to the beat of a different drummer,” Edwards said. “The tightly bonded application-file format model is being replaced by a loosely-coupled three part model comprised of the application, XML standard schema templates, and XML standard file formats.”

Playing a central role is the XML standard file format, which is natively portable between proprietary applications. Application providers can use the standard format to easily configure their applications to import and export conforming compound documents.

“Today’s clumsy import/export process has great difficulty accurately mapping content, presentation and formatting components,” Edwards said. “And forget about anything having to do with intelligent or live document files that might contain business logic, routing, processing, transaction and user interface instructions.

“Perhaps the most important factor relating to standard XML file formats is that of human readable tags and standard processing techniques. With a proprietary file format, users had to either get special permission from the application vendor, or reverse engineer the binary format in order to work with the files in ways that met their specific needs (if those needs went beyond what the app vendor offered).”

By contrast, Edwards said, an XML standard file format allows users to construct scripting machines and transformation procedures without vendor permission. With a community of developers creating tools, machines and advanced transformation procedures based on that standard format, he said, power would shift from the application vendor to the owner of the information.

“Some people think that XML technologies are a gnarly swarm of human readable standards seriously lacking in the performance efficiencies of traditional binary files,” Edwards said. “The whole point, however, is to empower users by giving them direct access to an open file format so that they can mine, re-use and re-purpose information any way they can think of. Plus, the standardization of the file formats and related XML transformation technologies means that powerful machines can be constructed to service advanced content management and collaboration needs without having to beg the application vendor for permission or future enhancements.”
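As a rough sketch of the kind of direct access Edwards has in mind, the snippet below mines a document outline from an open XML file using only Python's standard library. The `report`, `section` and `para` element names are hypothetical and do not correspond to any real office file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical open-format document with human-readable tags.
REPORT_XML = """
<report>
  <section title="Sales">
    <para>Q1 revenue rose.</para>
    <para>Q2 was flat.</para>
  </section>
  <section title="Staffing">
    <para>Two hires.</para>
  </section>
</report>
"""


def outline(xml_text):
    """Return (section title, paragraph count) pairs for re-use elsewhere."""
    root = ET.fromstring(xml_text)
    return [(s.get("title"), len(s.findall("para")))
            for s in root.iter("section")]


print(outline(REPORT_XML))
```

Nothing here requires a vendor API or a reverse-engineered binary layout; the tags themselves are the interface.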

Schema Templates

XML schema templates are another important part of the puzzle, and they take over much of what has been the domain of the API. A schema template contains business process and processing intelligence instructions, and can contain instructions on how an application should present the user interface and where to access network components like Web services, data stores and Java computation. The difference? Schema templates are created by users, such as vertical industry consortiums, rather than by application vendors.

“Anywhere there is a defined business process, transaction process, or collaborative effort, there will probably be a shared schema template defining that process,” Edwards said. “In particular, vertical industries and global business trading partners are perfecting schema templates of all sorts, in efforts to streamline the transaction, exchange and interaction between disparate information systems.”

He added, “Using XML, businesses can describe a transaction process in terms that are machine readable and actionable. XML information conforming to open standards can easily be transformed or translated from a shared business process, back and forth, into the many disparate information systems, enabling these legacy systems and data silos to directly participate and interact at the point of transaction.

“Prior to this evolution, the only way to effectively interact and exchange information was to standardize on a specific platform, using specific applications (including exact version synchronization), and specific file formats. Literally everyone had to agree on the same proprietary stack, top to bottom.”

But XML schemas can eliminate that need by allowing organizations to keep legacy systems while still connecting and collaborating with anyone and everyone.
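To make the idea of translating a shared document into a legacy system's own format more concrete, here is a minimal sketch. It assumes a hypothetical `order` document agreed on by trading partners and a hypothetical legacy system that accepts only flat CSV rows; neither comes from any real industry schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical order document following a shared (made-up) schema.
ORDER_XML = """
<order id="A-100">
  <buyer>Northwind</buyer>
  <item sku="X1" qty="3"/>
  <item sku="Y2" qty="1"/>
</order>
"""


def to_legacy_csv(xml_text):
    """Flatten the shared XML order into one CSV row per line item."""
    order = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for item in order.findall("item"):
        writer.writerow([order.get("id"), order.findtext("buyer"),
                         item.get("sku"), item.get("qty")])
    return out.getvalue()


print(to_legacy_csv(ORDER_XML), end="")
```

Each partner could write a small adapter like this for its own systems, so no one has to replace a legacy application in order to take part in the exchange.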

Pulling it Together

The essential point of all this, according to Edwards, is that the schema template actually determines the structure of the file format. For this to work, he said, the application must be able to pass on that schema template intact, which it cannot do if the file is stripped of presentation and formatting information.

Schmelzer, however, said this may be a bit of an over-reaction, given that most collaboration and integration still happens inside the enterprise.

“XML and Web services use, especially for content-driven applications, is still very much limited to basic use of XML as a data-exchange mechanism between systems — primarily for internal integration approaches,” he said. “When dealing with exchanging information internally, what is most important is not to bundle all collaborative features into making for a huge, cumbersome XML file that only certain applications can process, but rather to strip out all the presentation layer features and focus on just the data to be exchanged. In this case, I don’t see how Microsoft is violating that. You can choose to save a document with all the rich presentation data left in, if you choose (and that data will only be processable by Office applications), or you can choose to save the XML with just the data in it. I don’t see how that cripples anything.”
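What "saving just the data" means in practice can be illustrated with a small sketch. The markup below is hypothetical and is not Office 2003's actual XML; it only shows the general effect both men are describing, in which formatting attributes and inline styling are dropped and the text content survives.

```python
import xml.etree.ElementTree as ET

# Hypothetical "rich" document that mixes content with presentation markup.
RICH_XML = (
    '<doc>'
    '<p style="font: 12pt Arial"><b>Total:</b> 1250.00</p>'
    '<p style="color: red">Due in 30 days</p>'
    '</doc>'
)


def strip_presentation(xml_text):
    """Rebuild the document keeping only paragraph text, no styling."""
    rich = ET.fromstring(xml_text)
    plain = ET.Element("doc")
    for p in rich.findall("p"):
        # itertext() walks the paragraph and its inline children in order.
        ET.SubElement(plain, "p").text = "".join(p.itertext())
    return ET.tostring(plain, encoding="unicode")


print(strip_presentation(RICH_XML))
```

Whether that output counts as "crippled" or as properly presentation-neutral is exactly the disagreement between Edwards and Schmelzer.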