Office XML: The long view

For many years I have tried, and mostly failed, to get people to appreciate the value of structured information. Sure, I’ve connected with the chattering classes who Twitter, blog, and read TechMeme, but I’ve only been preaching to the choir. Inside our echo chambers we grok XML, tagging, syndication, and information architecture. Out in the real world, though, most people aren’t hopping on that cluetrain, and that’s almost as true today as it was a decade ago.

Of course I’m not alone in my quest. Tim Berners-Lee has also tried, and mostly failed, to evangelize the power of structured information. The gating factor always was, and still is, data entry. You can go a long, long way with unstructured information, as Google has brilliantly shown. In late 2002 Sergey Brin told me:

Look, putting angle brackets around things is not a technology, by itself. I’d rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand.

That’s a great way to make progress, but we’re not in an either/or situation here. There’s also huge progress still to be made by enabling (not forcing) people to write in ways that computers can understand more deeply and effectively.

Jean Paoli saw an opportunity to do something about that on a large scale. It was also late 2002 when I first started talking to him about the injection of XML capabilities into Office. I evangelized that stuff long before I became a Microsoft evangelist, because I believed then, and still believe today, that it’s a crucial enabler for a world facing challenges that are infinitely compounded by almost universally crummy information management.

In the flurry of commentary surrounding yesterday’s approval of Office Open XML as an ISO standard, I haven’t seen anyone thank Jean and his team for having the vision to transform Office in this important way, and the constancy of purpose to make it real. Well, I’ll say it. Thanks!


4 thoughts on “Office XML: The long view”

  1. I must say that I’m very disappointed that the ISO fast-track procedure was used to force this standard through ISO. A standard of this size should benefit from peer review, and OOXML didn’t have enough of it.

    My own actions to request a re-vote on OOXML approval were blocked by Microsoft partners who just abstained from voting. So, Microsoft shouldn’t really be proud of this.

  2. It is fantastic that MSOffice natively writes XML. It’s not pretty XML, but it is a step in the right direction — arguably a small one, as little or no effort has been taken to normalise bad data, so to an extent it is shovel-ware. Doesn’t seem to be worthy of a standard as many people have noted, and the games played to get it through ISO… well yuck.

    If anyone in there is listening — the tide is shifting.

  3. Not going to argue about the standards process, or Microsoft.

    I agree with you on structured information, but to me the critical point with XML (and other markup languages, I go back to GML for text markup on mainframes) is transformation. XML is good – but XSLT is what makes it great in my opinion. If data I want is in XML (or a subset, such as well-structured HTML) then I can use XSLT to transform it into whatever _other_ format I need. Similarly with other forms of markup or data identification – in fact, I developed XSLT which transforms an iCalendar (RFC 2445) file into XML – but I could have used any other programming language because the elements within iCalendar, the calendar components and properties, are marked up with identifiers. The markup makes these kinds of transformations possible. In theory (I haven’t tried it) a properly-coded XSLT could transform OfficeXML into ODF and _back_. That makes the two formats more-or-less functionally equivalent: I can extract information from the document using either set of markup.

    The only issue I’ve seen with Office XML (and this was in Office 2003, may have changed) is that they one-for-one translate the “control codes” in an Office Document into markup, rather than going the extra mile and building two things: one the structured document, the other a stylesheet document with the document style information.
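
To illustrate the point in comment 3 that identified components and properties make transformation possible in any language, here is a minimal Python sketch rather than the commenter's XSLT, which is not shown here. It turns a simplified iCalendar text into XML. The function name `ical_to_xml`, the `icalendar` root element, and the sample event are all assumptions made up for this illustration. It also ignores line folding, property parameters, and value escaping, all of which a real RFC 2445 parser would need to handle.

```python
import xml.etree.ElementTree as ET


def ical_to_xml(text):
    """Convert simplified iCalendar text into an ElementTree element.

    Hypothetical helper for illustration only: no line unfolding,
    property parameters are dropped, values are copied verbatim.
    """
    root = ET.Element("icalendar")  # assumed wrapper element name
    stack = [root]                  # currently open components
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        head, value = line.split(":", 1)
        # Drop any ";PARAM=..." suffix and use the bare property name.
        name = head.split(";", 1)[0].lower()
        if name == "begin":
            # BEGIN:VEVENT opens a nested component element.
            stack.append(ET.SubElement(stack[-1], value.lower()))
        elif name == "end":
            if len(stack) > 1:
                stack.pop()
        else:
            # Any other line is a property of the open component.
            ET.SubElement(stack[-1], name).text = value
    return root


# Made-up sample data, not taken from any real calendar.
SAMPLE = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY:Review meeting
DTSTART:20080402T090000Z
END:VEVENT
END:VCALENDAR"""

if __name__ == "__main__":
    print(ET.tostring(ical_to_xml(SAMPLE), encoding="unicode"))
```

Because each calendar component and property carries its own identifier, the mapping to elements is mechanical. That is the property the comment credits with making such transformations possible.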
