A conversation with Brian Jones about Office and XML

In 2002 and throughout 2003 I wrote a flurry of InfoWorld articles about the XML features that were being infused into Office, including:

I was excited by the opportunities I saw then, and I still am today, though it has been a slower burn than I’d hoped. In today’s podcast with Brian Jones we discuss what those opportunities are, what’s changed (or hasn’t) over the past few years, and what remains to be done.

This podcast isn’t about the Office Open XML (OOXML) vs. Open Document Format (ODF) controversy that’s been such a hot topic lately. Instead it tackles a broad theme: how, in general, do we unite documents and data on the desktop and on the web?

An object lesson in surface area visibility

A comment from Mark Middleton perfectly illustrates the point I was making the other day about visualizing your published surface area. I started this blog in December, and ever since I’ve been running with a robots.txt file that reads:

User-agent: *
Disallow: /

In other words, no search engine crawlers allowed. Of course that’s not what I intended. I’d simply assumed that the default setting was to allow rather than to block crawlers, and it never occurred to me to check. In retrospect it makes sense. If you’re running a free service like WordPress.com, you might want to restrict crawling to only the blogs whose authors explicitly request it.
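That two-line file is easy to misread, so it's worth knowing you can check what a robots.txt actually permits programmatically. A minimal sketch using Python's standard library, with the file above inlined (the blog path is a made-up example):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt that WordPress.com was serving for this blog:
robots_txt = """User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Every crawler is blocked from every path:
print(rp.can_fetch("Googlebot", "/2006/05/some-post/"))  # False
print(rp.can_fetch("*", "/"))                            # False
```

Running this kind of check against your own published surface area, rather than assuming the defaults, is exactly the sort of routine inspection that would have caught the problem in December.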

WordPress.com’s policy notwithstanding, the real issue here is that these complex information membranes we’re extruding into cyberspace are really hard to see and coherently manage.

For the record, the relevant setting in WordPress.com is Options -> Privacy -> Blog visibility -> I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com. Interestingly, although I’ve made that change, it’s not yet reflected in the robots.txt file. I wonder how long that’ll take.

A conversation with Ed Vielmetti and John Blyberg about superpatrons and superlibrarians

Last fall, in Ann Arbor, Michigan, I gave a talk entitled Superpatrons and Superlibrarians. Joining me for this week’s podcast are the two guys who inspired that talk. The superpatron is Ed Vielmetti, an old Internet hand who likes to mash up the services provided by the Ann Arbor District Library. That’s possible because superlibrarian John Blyberg, who works at the AADL, has reconfigured his library’s online catalog system, adding RSS feeds and a full-blown API he calls PatREST.

I’ve written from time to time about Eric von Hippel’s notion of user innovation toolkits and the synergistic relationship between users and developers that can develop around such toolkits. What Ed Vielmetti and John Blyberg are doing with Ann Arbor District Library is a great example of how that relationship can work.

Update: I meant to call out some of the excellent work that John’s been doing lately. This catalog record is an example of an Amazon-like recommendation feature: “Users who checked out this item also checked out these library items…” Nice!

You’ve also gotta love the experimental card catalog images.

Who can see which parts of my published surface area?

To describe the various projections of ourselves into cyberspace, I use the following metaphor: we’re cells, and we’re growing the surface area of our cellular membranes. Every time I write a blog item, or post a Flickr photo, or tag a resource in del.icio.us, I enlarge the surface area of that membrane. I do it for two reasons. First, because I want influence to flow from me to the world. Second, because I want influence to flow the other way too. I’m soliciting feedback and interaction.

I monitor that feedback using an array of sensors that works surprisingly well. All of the parts of my public membrane can be instrumented with RSS feeds. By tuning into those feeds, I know — fairly immediately and comprehensively — who has touched which parts of my exposed surface area.
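Those RSS sensors are easy to tap programmatically. Here's a minimal sketch using only Python's standard library; the comment feed is hypothetical and inlined for illustration, where a real monitor would fetch each feed URL on its list and diff against what it saw last time:

```python
import xml.etree.ElementTree as ET

# A hypothetical comment feed, inlined for illustration.
rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Comments on: An object lesson in surface area visibility</title>
  <item><title>Mark Middleton on robots.txt</title>
        <link>http://example.com/comment-1</link></item>
</channel></rss>"""

channel = ET.fromstring(rss).find("channel")

# Each (title, link) pair is one touch on the exposed membrane:
touches = [(i.findtext("title"), i.findtext("link"))
           for i in channel.findall("item")]
print(touches)
```

Loop that over every feed you've instrumented and you have the sensor array described above.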

What I can’t do very easily, though, is visualize that entire complex surface. If somebody reacts to something I published years ago on some site I’ve forgotten about, I’m reminded that part of my surface area extends to that site. But that’s only reactive; there’s no proactive way to review the totality of my published corpus. That’d be handy.

It’d be even handier for the parts of my membrane that aren’t fully public. The lack of such a capability, in those cases, is what makes security so hard for people to manage.

For example, I’m still working through the implications of the calendar cross-publishing arrangement I’ve set up for myself. Consider my Outlook calendar. It’s shared within the company by virtue of a default policy that I can view, or modify, by right-clicking the calendar in Outlook and selecting Properties -> Permissions. But it’s also shared with my family by way of a private URL that I created on my WebDAV server and transmitted out-of-band. I can see that private URL by right-clicking and selecting Publish to Internet -> Change Publishing Options, but there’s no indication there of who I gave the private URL to.

This example is just one manifestation of a general problem that cuts across all systems and applications that enable people to selectively expose surface area. There’s no unified way to see and explore that surface area. You have to make a mental inventory of all of the bits that you’ve exposed, individually review the scope of visibility for each bit, and then synthesize a view of what’s visible from a variety of perspectives. Humans are lousy at this kind of thing and computers are good at it, but we haven’t figured out how to enlist the computers to help us do it.

We can at least dream about how a surface-area viewer would work. You’d point it at a blob representing you, and zoom in to resolve the various bits of exposed surface area. There’d be a viewer-impersonation knob that would start at Everyone but could be spun to any of the groups or individuals to which you’ve granted permissions using any of your diverse permission-granting services. You’d spin that knob from one setting to another, and fly around exploring who can cross various parts of the blob’s membrane and who can’t.

I know this would be impractical for all sorts of reasons. But I still want it.

High-tech PR in the age of blogs, part 4

Yesterday I published the second installment of my new Microsoft Conversations podcast series. It’s a conversation with Marty Collins, senior marketing manager with the solution architecture group responsible for msdn.microsoft.com/architecture and skyscrapr.net. She wanted to interview me about the relationship between blogs and technical marketing, and I wanted to hear her thoughts on the same subject, so we wound up interviewing each other.

I gave Marty my take on how professionals — not only in the field of software, but also much more broadly — can and should use blogs to communicate their public agendas. And in response to her questions about how marketers can appropriately reach out to bloggers, I referred to the three-part series on high-tech PR in the age of blogs that I wrote back in 2002 and 2003. My bottom-line advice was and is: if you want to attract bloggers’ attention, point them to other bloggers who are authentic and credible. Three years ago that seemed like an exotic approach, but times have changed and it seems quite natural today.

Here’s an indication of how much times have changed. The folks that Marty markets to are solution architects, many of whom blog. On their blogs they raise questions, discuss options, and air concerns that intersect with her marketing agenda. What if her team of architects were able to monitor those conversations, and parachute in to respond where appropriate? That’s her plan. I think it’s radical and will provoke controversy, but it’s ultimately clueful.

You could of course monitor those conversations using the existing suite of awareness tools: search, link aggregation, tag aggregation. But a new breed of power tools is emerging, and she’ll be using the ones provided by Visible Technologies.

Let’s think this through. I write a blog entry about problems with calendar sharing, as I did yesterday. It mentions products from Microsoft, Google, and Apple. Those companies watch my blog, and respond by injecting advice as comments on the blog entry. Am I shocked at this unwanted intrusion into my blog? Or am I grateful to receive useful information that I was lacking? Both outcomes can (and will) occur. Some marketers will abuse these emerging brand-awareness technologies, but there will also be right ways to use them.

Would-be abusers of this method will need to confront a couple of realities. First, if the comments you try to inject don’t add value to the conversation, most bloggers will just deflect them as spam. Second, if for some reason they can’t or don’t, your comments will become part of a public record that’s easily discoverable and will undermine the reputation you were trying to enhance.

The incentives, and the checks and balances, are in place to do this right. I’ll be fascinated to watch this evolve.

Update: Although I’ve presented Marty’s plan as a form of marketing, which it is, I neglected to add that she also sees the underlying technique as a form of customer service. Today, for example, an A-list blogger like Dave Winer need only mention a problem with a product or service, and many voices — including maybe the provider of that product or service — will chime in to help. What if that ability to draw a response were democratized? What if any blogger could simply mention a problem with a Dell computer, and have a Dell support person notice and chime in right there on the blog with a solution? Very cool idea.

Calendar cross-publishing concepts

A growing number of people will be keeping their work calendars in Microsoft Outlook and their family calendars elsewhere, say on Google. How can they cross-publish their work and family calendars? It’s possible, but the conceptual hurdles are formidable. Here’s what I found when I did the experiment.

I started by subscribing a work calendar in Outlook 2007 to a family calendar on Google. From one perspective it was easy and just worked. From another perspective, I marvel at the amount of tacit knowledge required for me to have that experience. You start here:

The Manage calendars link in Google Calendar leads to the second tab of this four-tab screen:

The Shared: Edit settings link goes to the second tab of a two-tab screen:

This tab is for sharing within the Google realm, but that’s not the scenario I’m going for. It shouldn’t be necessary to sign into Google to see a family calendar hosted there, it should be possible to subscribe directly from Outlook 2007. And you can, but this is the wrong place to do it. You have to switch over to the first tab:

The private URL is what we’re looking for. And in particular, the iCal flavor of the private URL. That’s what other calendar programs, including Outlook, can latch onto to subscribe to this calendar. The URL that Google produces starts with http:// and, when you plug it into Outlook 2007, bingo, there’s the family calendar nicely merged in with the work calendar. Easy, but only if you’ve mastered a whole bunch of concepts.
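Under the hood, what Outlook latches onto at that URL is just an iCalendar (.ics) file served over HTTP. Here's a sketch, in Python, of pulling events out of one; the event is made up, and a real client would also need to handle line folding, time zones, and recurrence rules:

```python
# Minimal iCalendar event extraction -- a sketch only. Real feeds need
# line unfolding, TZID handling, and RRULE expansion (see RFC 5545).
ics = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY:Soccer practice
DTSTART:20060515T220000Z
END:VEVENT
END:VCALENDAR"""

def events(text):
    out, current = [], None
    for line in text.splitlines():
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            out.append(current)
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key] = value
    return out

print(events(ics))  # [{'SUMMARY': 'Soccer practice', 'DTSTART': '20060515T220000Z'}]
```

That the format is this simple at its core is what makes cross-publishing between Outlook and Google possible at all; the hard part, as we'll see, is everything wrapped around it.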

Now let’s go the other way. How can a family member using Google Calendar subscribe directly to a work-related Outlook calendar? Let’s start by right-clicking a work calendar in Outlook and selecting Publish to Internet:

Hmm. Office Online or WebDAV server? Let’s start with the former. Now the choices are:

This seems analogous to the Google Calendar situation, but only partly. The invited-users option is indeed analogous to its Google counterpart; those invited users will need Windows Live IDs in order to use the sharing URL that’s produced. But the other choice doesn’t produce a private URL, it produces one that’s discoverable in Office Online. (Or, I guess, will be when calendar search is implemented there.) So while the sharing URLs produced in both cases look like private URLs — each includes the kind of random character string that often functions as a password embedded in the URL — in the second case you’d be wrong to treat it as one.

What about WebDAV? Now we’re in geek territory. You have to know what a WebDAV server is, and have one at your disposal. (I do, but I’m an outlying data point.) When you publish your Outlook calendar to WebDAV and then try to subscribe from Google Calendar, you’ll fail if the calendar is secured with HTTP basic authentication. (However, Apple iCal will succeed in this case.) If you instead allow anonymous access to the WebDAV-hosted calendar it’ll work in Google Calendar, but only if you alter the sharing URL produced by Outlook, changing webcal:// to http://.

There’s just a whole lot going on here. Even though I’m familiar with all this stuff, it was a real struggle for me to understand what works (and why), as well as what doesn’t (and why not).

The other day I heard a phrase I like a lot: concept count. It refers to the number of distinct concepts people must hold in their heads in order to do things, and it’s closely related to my recent item on conceptual barriers. Let’s step back and inventory some of the concepts involved in this calendar cross-publishing scenario.

Subscription. There’s a fundamental distinction between static and dynamic exchange of calendar information. On the static side, we use words like import and export and snapshot. On the dynamic side, we use words like subscription and feed. Most people have direct experience with the former, but few as yet with the latter.

Subscription URL. If you do have direct experience of subscription, you’ll know that a subscription URL is a magical thing. It doesn’t point to a thing, it points to a process that always yields new things. But if you haven’t yet experienced live subscription, there’s a conceptual barrier. The difference between one kind of URL and the other is abstract. As in a related example about click-to-load versus right-click-to-save, the URL itself is an overloaded construct that people have to disambiguate.

Private URL. Lots of services nowadays use this technique. Yes, it’s only security by obscurity. Anyone who knows the URL has access, so it’s a weak form of protection, similar to but more convenient than a shared password. Shared passwords have their uses, and so do private URLs. But how many people realize that private URLs are analogous to shared passwords, can evaluate the security tradeoffs, and can recognize a private URL when they see one that’s been produced by Google Calendar or Amazon S3 or another service? The URL does not wear its intended purpose on its sleeve. Google and Amazon and others are beginning to define expectations that private URLs are unguessable but, as we see with the public calendar URLs at Office Online, a URL that looks unguessable may in fact be discoverable through search.
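The "unguessable" quality of a private URL rests entirely on the entropy of a token embedded in it, which is exactly what makes it a shared password in disguise. A sketch of how a service might mint one (the domain and path are hypothetical):

```python
import secrets

# A private URL is a capability: anyone holding it gets access.
# Its strength is entirely in the entropy of the embedded token.
token = secrets.token_urlsafe(16)  # ~128 bits; infeasible to guess
private_url = f"https://example.com/calendars/private/{token}/basic.ics"
print(private_url)
```

Note what the token does and doesn't buy you: it defeats guessing, but not disclosure. Once the URL leaks — forwarded in email, logged by a proxy, or indexed by a crawler — the "private" calendar is public.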

iCal. For ages we’ve had Internet calendaring software that uses protocols and formats associated with the word iCal, but the concept mostly hasn’t sunk in yet. Here’s one reason why:

Various iCal URI schemes including webcal:, webcals:, http:, https:. In the course of this experiment I encountered examples of all of these. The concept that https means secure http may be fairly well understood at this point, though I’m not convinced of that. But I’m sure that webcal: isn’t yet, and that’s hardly surprising. In principle an alternate URI scheme is just the sort of cue that would help people understand the difference between a generic resource and a calendar resource. In practice it hasn’t worked out that way. Partly that’s because there are so few alternate URI schemes in use that people haven’t internalized what they can signify. And partly it’s because the webcal: scheme has been very haphazardly implemented. Some sharing services emit webcal: (or webcals:) URIs that must be transposed to http: or https: in order to work, and vice versa. If SSL is involved, there’s another conceptual mapping between webcals: and https:.
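The transposition itself is entirely mechanical, which makes its absence from the tools all the more frustrating. A sketch of the mapping, assuming the common convention that webcal: denotes plain HTTP and webcals: denotes HTTP over SSL:

```python
# Map calendar-subscription URI schemes to the HTTP schemes that
# general-purpose clients (like Google Calendar) can actually fetch.
SCHEME_MAP = {"webcal://": "http://", "webcals://": "https://"}

def to_http(url):
    for webcal, http in SCHEME_MAP.items():
        if url.startswith(webcal):
            return http + url[len(webcal):]
    return url  # already http:/https:, pass through unchanged

print(to_http("webcal://example.com/cal/work.ics"))
# http://example.com/cal/work.ics
```

A dozen lines, yet today every user performs this translation by hand, after first discovering that it's needed.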

Authentication. A calendar-sharing URL may or may not require authentication. If it does, you may or may not be able to tell, by looking, what kind is needed. If the calendar URL comes from Google or Office Online, it makes sense that you’d use a Google account or a Windows Live ID. But what if it comes from a WebDAV server? There may or may not be a straightforward way to know (or find out) what kind of credentials that server will require.
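In the WebDAV case, HTTP basic authentication is just a header the client must know to send; a client that doesn't (as Google Calendar apparently didn't) simply gets a 401 back. A sketch of what the credential actually looks like on the wire (the user name and password are placeholders):

```python
import base64

# HTTP basic auth: the client base64-encodes "user:password" and sends
# it in an Authorization header. That's encoding, not encryption, so
# it's only sensible over SSL.
user, password = "jon", "secret"   # placeholders
credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
auth_header = f"Authorization: Basic {credentials}"
print(auth_header)  # Authorization: Basic am9uOnNlY3JldA==
```

The conceptual burden on the user is that none of this is visible: a URL that requires this header looks identical to one that doesn't, and nothing in the URL says which credentials, if any, will satisfy the server behind it.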

Scope of visibility. I think this is the toughest concept of all. Blogs and video sites notwithstanding, sharing anything at all on the web will be a new thing for most people. Thinking through what will be visible to whom is non-trivial. On the corporate side, Outlook 2007 distinguishes between calendar sharing and Internet publishing. Each defines the bedrock concepts of user, group, and world in different ways. On the personal side, there are yet other ways to construe these fundamental scopes of visibility.

All this only scratches the surface. We could elaborate a whole lot more of these conceptual underpinnings. Bottom line: support for standards is necessary but not sufficient. Even when products comply with standards like iCal, people struggle mightily to use those products interoperably. It’s the conceptual barriers that get in their way. It’s really hard to figure out how a concept expressed in one system maps to the same (or a similar) concept in another system. To make that easier, technology providers will have to agree on more than just protocols and file formats. We’ll also have to work together to minimize conceptual clutter and normalize core concepts.

The persistent blogosphere

In response to last Friday’s podcast with Tony Hammond about publishing for posterity, David Magda wrote to point out that our main topic of discussion — the DOI (digital object identifier) system — is one implementation of the CNRI (Corporation for National Research Initiatives) Handle System but there are others, including DSpace. I wondered whether this class of software might work its way into the realm of mainstream blogging. David responded:

A weblog (or web pages in general) is simply a collection of text, links, pictures. This is no different than any other document / object / entity that DSpace would handle. It’d simply be another type of CMS IMHO. I think this would be a really good project to implement for an undergrad thesis, or perhaps as part of a master’s thesis.

However as neat as all this is, I don’t think it would be implemented soon: or at least not in mainstream software. Few people will care whether their MySpace page survives over the aeons (and many people don’t want their kids to know what they did twenty years in the past).

But some of us do, and more of us will. The other day, for example, my daughter walked into my office while I was in the middle of a purge. Among the items destined for the recycling bin was a pile of InfoWorld magazines.

She: You’re throwing all these out?

Me: No, I’m keeping a few of my favorites. But as for the rest, I don’t have the space, and anyway it’s all on the web.

She: Don’t you want your grandkids to be able to see what you did?

Heh. She had me there. A pile of magazines sitting on a shelf is almost certainly a more reliable long-term archive than a website running on any current content management system.

Here’s another example. Back in 2002 I cited an essay by Ray Ozzie that appeared on what was then his blog, at ozzie.net. But if you follow the link I cited today, you’ll land on the home page of the latest incarnation of Ray’s blog. The original essay is still available, but to find it you have to do something like this:

My Blog v1 & v2 -> stories -> Why?

So OK, the web rots, get over it, we should all accept that, right?

Well, libraries and academic publishers don’t accept that. Nothing lasts forever, but they’re building content management systems that are far more durable and resilient than any of the current blogging systems.

Conventional wisdom says that it wouldn’t make sense to make blogging systems similarly durable and resilient, for two reasons. First, because the investment would be too costly. Second, because blogs aren’t meant to last anyway, they’re just throwaway content.

The first point is well taken. As Tony Hammond points out in our podcast, the cost isn’t just software. Even when that’s free, infrastructure and governance are costly.

But I violently disagree with the second point. Just because most blog entries aren’t written for posterity doesn’t mean that many can’t be or shouldn’t be. My view is that blogs are becoming our resumes, our digital portfolios, our public identities. We’re already forced to think long-term about the consequences of what we put into those public portfolios because, though no real persistence infrastructure exists, stuff does tend to hang around. And if it’s going to be remembered, it should be remembered properly.

So a logical next step, and a business opportunity for someone, is to provide real persistence. This service likely won’t emerge in the context of enterprise blogging, because enterprises nowadays are more focused on the flip side of document retention: forgetting rather than remembering. Instead it’s a service that individuals will pay for, to ensure that the public record they write will persist across a series of employers and content management systems.