2007-01-07

Office Suites: OpenDocument and OpenXML

2006 January Work-in-Progress (WIP)

Due to the increased number of comments on this subject, I figured it's time to blog a full history and comparison of these two alleged standards.

Overview:
  1. XML: Not an interoperability standard
  2. A brief history of Sun StarOffice (and OpenOffice.org)
  3. A brief history of Microsoft Office
  4. Rich Text Format (RTF): The useless standard
  5. Intellectual Property (IP) considerations
  6. Cross-platform history
  7. Document compatibility and longevity
  8. My history and preferences

1. XML: Not an interoperability standard

Probably the biggest myth to dispel from the get-go before talking of any standards is that XML is a standard for content and style like HTML or SGML. XML is very much not. I'm not going to spend time on defining all these acronyms and explaining further (other than in the following on XML) as I assume that if you are reading this you have heard of them (except maybe SGML), know what HTML is and know that XML (even if you don't understand it) is what most everyone is using today.

In a nutshell, XML was the answer the World Wide Web Consortium (W3C) gave to vendors who constantly complained about the slowness of its standardization process. XML is essentially a set of standards for creating vendor standards for their implementations. XML has nothing to do with interoperability. It merely says how you should format your tags, how you should document your definitions, how you will define your schema, etc... Most "complete XML" documents are those that have a base "content" block (typically its own file), with one or more style blocks (or possibly style templates), another block for modifications to the schema/templates, and then any base support definitions, templates, schema, etc... (which may or may not be included).

Now there is no "master" or "root" way to "interpret" all of these support specifics and allow one specific XML implementation to be read by another XML solution, at least not automagically in software. And there never will be, period. And that's just the technical reality, before we tackle source code availability, intellectual property (IP) considerations and countless other, non-technical details.
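Geek Note: A minimal sketch of the point above, in Python. Both vocabularies here (`doc`/`para` and `document`/`p`/`run`) are made up for illustration and do not correspond to any real office format. A generic XML parser happily accepts both documents, but code written against one vendor's tags finds nothing useful in the other's.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vendor vocabularies describing the same bold paragraph.
VENDOR_A = '<doc><para style="bold">Hello</para></doc>'
VENDOR_B = '<document><p><run weight="700">Hello</run></p></document>'


def bold_text_vendor_a(xml_text):
    """Return bold paragraph text, assuming vendor A's (made-up) schema."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("para") if p.get("style") == "bold"]


# Both documents are well-formed XML, so a generic parser accepts both.
root_a = ET.fromstring(VENDOR_A)
root_b = ET.fromstring(VENDOR_B)

# Only code written against a specific vocabulary can interpret it.
found_in_a = bold_text_vendor_a(VENDOR_A)
found_in_b = bold_text_vendor_a(VENDOR_B)

print(found_in_a)
print(found_in_b)
```

Being "XML" gets you a common parser and nothing more. Knowing what the tags mean still takes vendor-specific knowledge, which is exactly where the published definitions (or the lack of them) come in.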

2. A brief history of Sun StarOffice (and OpenOffice.org)
  • StarView class library
  • StarOffice 3.0
  • StarOffice 4.x and 5.x
  • OpenOffice XML (OpenOffice 1.x and StarOffice 6/7)
  • OpenDocument XML (OpenOffice 2.x and StarOffice 8+)

StarView class library

I'm going to skip over the history of German StarDivision, which was founded in 1986, and begin with the cross-platform StarView generation. In the early '90s, StarDivision developed a cross-platform class library aka "toolkit" for C++. In non-geek-speak: a bunch of programmer code that lets you write software for any computer without having to worry about the differences between the computers. At the time, there was a lot of uncertainty about where Microsoft was going with IBM on OS/2, as well as the "Chicago rumors" inside of Microsoft (which became Windows 95, and took Windows NT's place as the desktop OS), as well as an existing UNIX need for a cross-platform library. I'm sure that, given the success of the first integrated office suite, ApplixWare on UNIX (Solaris, among others), StarDivision decided to take its existing product offerings and do similarly.

Geek Note: StarView was built for DOS, OS/2, Win16, [true] Win32 (as Windows NT 3.1 had just come out, over 2 years before Windows 95), MacOS (pre-v8/Carbon) and Sun Solaris 2 (OpenView) and became commercially available for license by late 1994, after the release of StarOffice 3.0.

StarOffice 3.0

StarOffice 3.0 was the first, integrated, cross-platform office version (sans MacOS) that first appeared in 1994 from StarDivision. It was available on OS/2 (32-bit), Windows (16-bit, but also ran on Windows NT) and Solaris (32-bit Sun UNIX). All dialogs were the same across all applications, and offered a true, integrated suite. Its StarView library was both a blessing and a curse. While it resulted in almost total transparency across platforms, it was a memory hog, as the entire office suite with all its features was resident. It often required a whopping (at the time) 16 or even 24+MB of memory to run well on OS/2 or Windows (and even more for Solaris or MacOS).

One of StarOffice 3.0's greatest strengths was its support for a wide variety of documents, including new Internet export features. Not only did it import WordPerfect and Lotus 1-2-3 files, as well as Microsoft Word and Microsoft Excel files, but it also handled Lotus Ami Pro (making it one of the few word processors to attempt import of a "Desktop Publishing" (DTP)-like format), ClarisWorks (especially since a MacOS port came out later) and a few other systems. And they worked surprisingly well, at least for well-defined documentation formats (i.e., non-Microsoft ones). One of the most original aspects of the StarOffice 3.0 suite was its elementary export of HTML, something that Microsoft would not offer for almost another 3 years.

Geek Note: Not all import filters for other formats were available on all platforms. E.g., the Ami Pro import filters were only offered on OS/2 and Windows, and the ClarisWorks import filters were initially only offered on MacOS.

The StarView toolkit supported standard UNIX systems (as well as OS/2-derived ones like 16-bit and 32-bit Windows) and utilized the Sun OpenView and, later, Motif toolkits for Solaris. GNU systems, such as Linux, are typically very Solaris-like/compatible (SunOS was the original GNU platform before Linux), so StarOffice 3.0 became available for Linux later (and was a standard "port" for version 4.0+ onward).

StarOffice 4.x and 5.x

StarOffice 4.0-4.2 was little changed from StarOffice 3.0, except for the addition of full Win32 (32-bit Windows) to the StarView toolkit so it ran natively on Windows 95 and NT (I would argue better than Microsoft Office in the case of the latter, since it was designed for multi-user systems like UNIX). PowerPoint support was added to StarImpress (and Impress' features really made PowerPoint look unpolished, especially for Internet and other capabilities) and the Linux port became a standard offering.

StarOffice 5.0-5.2 switched away from the StarView toolkit to a completely new C++ class library. With MacOS 8+ switching to Carbon, MacOS support was dropped, as was OS/2. It was completely Windows 9x/NT aware, as well as native to Linux and Solaris. Font support on these platforms was greatly improved, especially on UNIX platforms with TrueType support. Although still written in C++, Java became a staple for scripting and extensibility in the suite.

Geek Note: I don't know what StarOffice 5.0-5.2 used; details are sketchy. From what I've surmised, it was STL C++ with some Boost (at least they became Boost) class libraries, and then some sort of cross-platform MFC (Windows) and Motif (UNIX) toolkit (and it wasn't WINELIB, it was very native).

In 1999, with over 40 thousand employees, Sun Microsystems tired of supporting Windows desktops and bought StarDivision outright for under $100M (the deal was originally reported/estimated to be around $250M, and may have been over $100M after other considerations/purchases). Sun decided to begin releasing all the source code (programmer code) it could under the Lesser GPL (LGPL) license, which allows any commercial company to integrate the source code into their office suite, as long as changes to just the Sun-provided source code portion are returned.

Geek Note: The LGPL allows dynamic linking with any other source code, even proprietary. Any source code to which Sun was unable to secure the rights for LGPL release, Sun has either largely replaced with new LGPL developments or, where it sees value and compatibility is not affected (e.g., clipart, additional fonts, additional import/export filters, etc...), kept in StarOffice. This has proved so successful an approach that Sun is using the same for Java, with all GPL-licensable source code going in the OpenJDK, which replaces and provides for a superset of the Java SE (Standard Edition) and only those components not GPL-licensable going in the Java EE (Enterprise Edition) starting with version [1.]6.

OpenOffice XML (OpenOffice 1.x and StarOffice 6/7)

The release of Sun's source code resulted in OpenOffice.org (OOo) version 1.0 (and StarOffice 6.0). At the heart of OOo was a new document standard, OpenOffice XML. StarOffice had always maintained a strong document definition and excellent backward compatibility over its almost decade-long existence, and OpenOffice XML was the complete "clean-up" of those definitions, styles, etc... into a set of standard, and complete, XML. OpenOffice's formats (.sx_) are ZIP compressed archives with files for the document's content, style and related schema/definitions. All OpenOffice XML base definitions, schema and templates are openly available and no additional logic in the suite itself is required except for processing and rendering. Being licensed LGPL, any other tool can directly utilize its processing and rendering outside of the GUI (graphical user interface) -- such as production programs on a factory floor that aren't PCs.

Geek Note: The specification covers not just the use of XML formatting, but DTDs, schema, exact dimensions on formatting (very critical), standard templates, etc... The PKZIP 2.x format is an archive of individually compressed files using DEFLATE, an LZ77-based method that is also used by gzip.
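Geek Note: To show how little tooling the "ZIP archive of XML files" design needs, here is a rough Python sketch using only the standard library. It builds a tiny stand-in package in memory and reads it back. The member names `content.xml` and `styles.xml` follow the layout described above, but the XML inside them is a simplified placeholder I made up, not the real OpenOffice XML schema, and the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_package():
    """Build a tiny stand-in package: a ZIP with content and style files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Placeholder XML, NOT the real OpenOffice XML vocabulary.
        archive.writestr(
            "content.xml",
            "<content><p>Hello from the factory floor</p></content>",
        )
        archive.writestr(
            "styles.xml",
            "<styles><style name='Default'/></styles>",
        )
    return buffer.getvalue()


def read_paragraphs(package_bytes):
    """List the archive members and pull paragraph text from content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        names = archive.namelist()
        root = ET.fromstring(archive.read("content.xml"))
    return names, [p.text for p in root.iter("p")]


package = build_sample_package()
names, paragraphs = read_paragraphs(package)
print(names)
print(paragraphs)
```

No office suite and no GUI are involved. Any program that can unzip a file and parse XML can get at the content, which is the whole point of publishing the definitions.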

The OpenOffice XML release included a submission of the full specification to the non-profit Organization for the Advancement of Structured Information Standards (OASIS). OASIS is the de facto standards organization to which over 600 sponsoring companies and vendors submit all XML standards in the Internet generation. The sponsors of the OpenOffice XML submission included not only Adobe, Corel (WordPerfect) and Sun, but also the US' largest exporter and industry documentation leader, Boeing. There are now many lesser-known, 3rd party office suites on the market that utilize the LGPL OpenOffice.org codebase from Sun.

OpenOffice.org 1.1 (and StarOffice 7) was the revision that included the official, OASIS-ratified standard of OpenOffice XML (which OpenOffice 1.0 is compatible with as well, with an update) and various other improvements. One improvement was the complete segmentation of the suite into individual components (reversing the design of all prior, "load everything memory hog" approaches) that now results in a much leaner suite. It also included many improvements in Microsoft Office compatibility, some as a direct result of Sun's broad cross-licensing agreement with Microsoft. Sun holds all copyright on OpenOffice.org, which allows them to license to third parties under a non-LGPL license (this is of great note). It requires all contributors to the OpenOffice.org project to sign a non-exclusive contributor's agreement. Sun pays the salaries of the majority of full-time OpenOffice.org developers.

Geek Note: For those Linux "advocates" fixated on the recent Novell-Microsoft cross-licensing agreement of 2006, it is important to note that there will be no fork of OpenOffice.org by Novell. Not only is Sun helping with the new OpenXML support, not only is all OpenOffice.org code under Sun's copyright, and not only does Sun's "agreement" with Microsoft, which allegedly "taints" OpenOffice.org, predate Novell's by a good margin, but this alleged "new issue" doesn't start with OpenXML, as Sun has been working on pre-OpenXML Microsoft Office document import/export support from Microsoft "gold books" for over 3 years now (which affects OpenOffice.org 1.1+). There is a lesser-known, well-speculated reason for this (see the next section on the history of Microsoft Office).

OpenDocument XML (OpenOffice 2.x and StarOffice 8+)

OpenDocument XML is the result of a full, year-long review by OASIS and its members (including newer sponsors like IBM) of the OpenOffice XML standard for extensibility and compatibility with many office approaches, including Microsoft Office. OpenDocument XML has about 100 changes from OpenOffice XML, and is now considered the "going forward" version, for which all major office suites like Corel PerfectOffice/WordPerfect, IBM Lotus SmartSuite and even Microsoft Office (despite statements to the contrary) are adopting at least full export/import support.

OpenOffice.org 2.x and StarOffice 8+ support OpenDocument XML, and read-only support has been added in OpenOffice.org 1.1.5+. Most importantly, Sun has signed over all Intellectual Property (IP) rights to the trust of OASIS, which will hold them (as with many other submissions) under a right for all to use freely.

Probably the greatest additional change in OpenOffice 2.x is the introduction of a truly open database front-end framework. OpenOffice.org always had open data integration support, but it lacked a true front-end that wasn't tied to the legacy Adabas D (included in all versions of StarOffice from 8 back to pre-open source). The only component not offered in the suite is a personal information manager (PIM), although it does integrate with Mozilla browser (Firefox), mail (Thunderbird), calendaring (Sunbird) and other components.

No native MacOS X (Cocoa/Aqua) version is offered, but there are both X11 (UNIX X-Window, as MacOS X is a Darwin-based UNIX system and offers networking with X11 graphical compatibility) and several other options. A Cocoa/Aqua port has been a highly desired option, especially since full, cross-platform document compatibility is offered by the OpenOffice.org base (unlike Microsoft Office for Mac, see following sections).

3. A brief history of Microsoft Office
  • Microsoft Office 4.x
  • Microsoft Office 7 (95)
  • Microsoft Office 8, 9 and 10 (97, 2000 and XP, including 98, 2001 and X/2002-Mac)
  • Microsoft Office 11 (2003 and 2004-Mac)
  • OpenXML (Microsoft Office 12/2007)

Microsoft Office 4.x

After the success of Claris and AppleWorks on Mac, Microsoft actually released its first "Office Suite" on Mac in 1990. It wasn't an integrated office suite, but just a collection of existing programs for the Mac platform. ApplixWare on Solaris and Windows, and several other, emerging suites including Lotus SmartSuite 3.x, were also available for Windows. So on the Windows 3.0 front, Microsoft cobbled together its Word, Excel and PowerPoint products into Office 3.0 (aka 92). The programs were separate and did not offer OLE (Object Linking and Embedding).

It really wasn't until Office for NT 4.2 in 1994 (Word 6.0, Excel 5.0 and PowerPoint 4.0) that Microsoft offered a fully integrated office suite where different programs could link to each other (Office 4.0 was to a lesser extent, using Word and its simple applets, as it had issues with the older versions of Excel 4.0 and PowerPoint 3.0), although menus, key combinations, context sensitive menus/help and other integration were lacking (and would be until much later, although acquired products like Visio still don't have them). The "NT" name was a misnomer, as the programs weren't 32-bit (except on Alpha, although its document compatibility was another matter), and required Windows NT 3.51 "Daytona" for "Chicago" (what would become Windows 95) compatibility.

Geek Note: The modern Windows application use of the registry begins largely with Office for NT 4.2.

The first commonly used Microsoft Office version was 4.3, with the same product versions as the NT 4.2 release. Thanks to Joachim Kempin, a brilliant strategist out of Microsoft Germany, Microsoft Office 4.x went from less than 10% PC OEM share to over 90% OEM share. Kempin invented "bundling" and "rebates," and turned Germany's largest PC vendor, Vobis, from a 0% Microsoft shop to a 100% Microsoft shop overnight. By not only giving away software at a substantial loss (e.g., pay $11/unit for MS-DOS and get Windows plus Office 4.3 for free), but also paying vendors "rebates" of the exact licensing costs for competitors' products so vendors would not ship them, Microsoft gained an overwhelming share of the market within 2 years (by 1996). This was despite the fact that Lotus 1-2-3 and Corel WordPerfect individually outsold Microsoft Word and Office sales combined on the retail shelf several times over.

Geek Note: Although no other software (let alone hardware) company has accomplished similar, 90% of the PC market is typically driven by tier-1 and tier-2 PC OEMs. 3dfx graphics cards outsold nVidia on the consumer retail shelf, yet nVidia infiltrated the PC OEM market and grabbed a 90% share within no time.

The answer was simple: an assumed "free lunch." Most companies were looking to upgrade their aged DOS or Windows software, typically Lotus 1-2-3 3.1 and WordPerfect 5.1/5.2 for Windows (or earlier, possibly still DOS), and the new cost of Lotus 1-2-3 4.0 (or SmartSuite) or WordPerfect 6.0 (or PerfectOffice) was easily $700+ per system. PCs were now costing under $1,500 new, and they came with Microsoft Office (which the company got for free). Even when some, largely tier-1, OEMs licensed Lotus SmartSuite 3.1 or 4.0, Microsoft would often pay the $70-120/license in a "rebate" back to the OEM not to ship. And if they didn't agree, such as IBM who owned Lotus, Microsoft would jack the price of DOS and Windows up over $100/system (typically 4x its $25/system

The common attitude is that Lotus and WordPerfect were "too slow" with their Windows software, and that's why Microsoft "won." That's not true at all. AmiPro (acquired by Lotus) was the first native Windows what-you-see-is-what-you-get (WYSIWYG) documentation program (it's actually a DTP, not a word processor), and Lotus 1-2-3 3.1 and, later, 4.0 (which had some of Lotus Improv's capabilities) were available before their Excel equivalents. Corel also offered Borland's Quattro Pro in its PerfectSuite, which was considered more native to Windows (built with Borland's advanced Windows components and framework) than Microsoft Excel at the time.

It was attacking the OEM, bundling and rebates. By 1996, with over 90% of the market, Microsoft started to charge for Microsoft Office -- essentially the Office 7 version. At first it was $50/license. But within another year, it raised the cost to over $200/license for just the base version.

The MacOS 4.2.x versions had documents largely incompatible with newer Microsoft Office 4.x for Windows, as with prior releases. You could often send a document from Windows to MacOS, but not vice-versa. The deeper reasons for this are documented in a following section on cross-platform compatibility.

Microsoft Office 7 (95)

The first "Chicago" (Windows 95) 32-bit native release of Office was version 7, with all products (Excel, PowerPoint, Access, etc...) incremented in version to match Microsoft Word, now version 7. Microsoft Schedule+ was introduced as a personal information manager (PIM) to combat Novell's Groupwise offering (among others), using the legacy Microsoft Mail API (MAPI) interface in Windows NT systems, also added when installed on Windows 95.

Office 7 (95) formats were partially compatible with Office 6 (4.x) documents, to a point. Microsoft offered upward compatibility, but did not offer good backward compatibility. The technical reasons for this are largely documented in the cross-platform section.

Despite the "Designed for Windows NT" logo program, 0% of Microsoft's own products -- including Microsoft Office -- passed the requirements for the logo certification. Office 95 was not multiple user aware and had various security issues when run under Windows NT, including most things refusing to run unless the user had "Administrator Privileges" (the "Power User" group/privilege came about as a direct result of Office 95 incompatibilities). Office 95 was the first office suite to be built on Visual Studio 5.0, required the Internet Explorer libraries (a purposeful attempt to make Internet Explorer a required portion of the operating system), and integrated at the heart of Windows (even Windows NT, bypassing its security protections). This is where most of the security vulnerabilities come from -- especially on Windows NT, which has security privilege levels that DOS-based (yes, it's still in there -- version 7.x, and most system calls are 16-bit DOS 20) Windows 95 does not.

Geek Note: Yes, MS-DOS 7.x is still the kernel of Windows 95/98/Me (95/A = MS-DOS 7.0, 95B+ is 7.1 with FAT32 support), including all major system calls (16-bit DOS 20-3Fh functions) that are "augmented" by the still "386Enhanced" memory model VXDs and other virtual drivers and libraries. Caldera successfully sued Microsoft (settled out-of-court) for illegal product bundling of MS-DOS 7 + Windows 4 (Windows 9x) in violation of the 1995 DOJ Decree, and would testify later in the 1999 trial. In fact, Windows Me attempted to remove various support of these 16-bit functions to force developers to stop using them, but even Microsoft's own products took issue, hence why Windows Me is the least compatible version as most will note.

Microsoft Office 8, 9 and 10 (97, 2000 and XP, including 98, 2001 and X/2002-Mac)

To Do

Microsoft Office 11 (2003 and 2004-Mac)

To Do

OpenXML (Microsoft Office 12/2007)

To Do

4. Rich Text Format (RTF): The useless standard

To Do: Not even used by Microsoft, let alone everyone uses a "DOC import filter" on RTF because it's always got embedded DOC from Microsoft Word. Possibly should be #2 instead of #4?

5. Intellectual Property (IP) considerations

To Do: Sun gave theirs to OASIS, Microsoft promises not to sue open source but reserves all rights.

6. Cross-platform history

To Do: Including why Microsoft has issues (data-alignment-based) between versions, let alone MacOS X, and why they _had_to_ switch to an XML format (even if it only encapsulates binary data)

7. Document compatibility and longevity

To Do: Not only my "real world" experiences from Office 7/95 through 11/2003 (read: can I please re-format again, I love doing that over and over Microsoft! NOT!), but also my _personal_ experience coming from Lotus 1-2-3 4.0 and AmiPro 3.1 to StarOffice 3.0 on Windows in early 1995, and the fact that I can _still_ read my documents!

8. My history and preferences

To Do: I'll talk about LaTeX to OpenDocument/MathML conversion options (since I use LaTeX for professional documentation), the IEEE Article and Book templates, and the equivalent DOC 6.0a (Office 4.x) template that I've had to hack and hack and re-hack over the years, etc...

NOTE: I'll probably end up "reorg'ing" this again in the end, but this is a "good start."

2 comments:

brian ashe said...

Good start so far. FWIW, MS wasn't *just* evil, they actually did some smart things along the way. Neat article on Excel's strategy, from someone who was there (at MS, on the Excel team) in the early 1990s: here
