A conversation with Barry Ribbeck about digital identity in higher education

I met Barry Ribbeck, who’s Director of Systems Architecture and Infrastructure at Rice University, a few years ago at a Dartmouth conference on the deployment of public key infrastructure (PKI) in higher education. I attended that conference several times as an observer, and wrote a couple of InfoWorld columns about it. For today’s podcast I invited Barry to reflect on what’s been happening with token-based authentication, PKI, and identity federation in the realm of higher education.

Near the beginning of our conversation I mentioned that people are spooked by the Real ID initiative, and Barry offered a great perspective. We already have a national — indeed, international — federation of machine-readable identity documents. It’s called the ATM network, and we all use it routinely.

For years, people like Barry Ribbeck have been working toward the same kind of ubiquitous deployment of smartcards and digital certificates. It’s been slow going, and still is, but these folks have a long-term vision and the patience and determination to make it real.

That word “administrator”: I do not think it means what you think it means

As I go back and forth between Vista and OS X, I’ve been trying to map out the similarities and differences of their respective security models.

On both systems you can be either the administrator or a standard user, but you are never the fully-privileged root, or superuser.

When you want to change a secure setting (like the firewall), or install an application, you have to temporarily elevate your privileges.

On a default OS X system, the administrator can write a secure file or alter a secure setting without being prompted. A standard user who tries to do these things is prompted for an administrator’s name and password.

On my OS X system, as administrator, I’m prompted for name/password even to change a secure system setting, because I’ve checked the Require password to unlock each secure system preference option. Because I’d forgotten that I’d done that, the Apple ads dinging Vista for its chatty security prompts initially made no sense. From my perspective OS X was chattier than Vista.

On a default Vista system, the administrator and standard users are both prompted, but in different ways. For the administrator it’s a click-through dialog, for the standard user it requires (as on OS X) an admin’s name and password.

On my Vista system I’d prefer to mimic the OS X behavior and require a full name/password challenge. I believe that’s possible using the Local Security Policy editor but in my case, since my system is part of a managed domain, I might not be able to make that change myself.

Another thing that initially made no sense to me was that the account on my freshly-installed Vista system came up as an administrator, not as a standard user. That’s because I’d made a faulty conceptual mapping between XP and Vista. On XP, you can try to implement the old Unix best practice of creating and mostly running as a standard user, reserving the root account for occasional privilege elevation. That strategy rarely works, though, and I had initially thought that Vista’s User Account Control (UAC) system was a way to remove the obstacles that prevent it from working.

In fact Vista’s model is less like Unix or XP, where root and administrator mean basically the same thing, and more like OS X where they mean different things.

It would be extremely helpful to me, and I’m sure to many others, to see a comparative chart of exactly what those meanings are. If someone can point to one, that’d be great, because there’s been some confusing semantic drift. That word ‘administrator’: I do not think it means what you think it means.

Despite the separation of root and administrator, the old best practice of relinquishing the administrative account remains available on both OS X and Vista. Given that it’s not the default on either system this is mostly an academic question, but does anyone think that it should still be a best practice? If so, why?

One interesting data point comes from a recent interview in which Charles Torre speaks with UAC gurus Jon Schwartz and Chris Corio. (If, like me, you don’t have 65 minutes of viewing time but do have 65 minutes of listening time, you can find just the audio here.) Towards the end of the interview Jon Schwartz mentions that he considered, and rejected, the idea of setting up his parents’ machine so they’d only be able to log in to standard user accounts.

Because I’ve lived through the evolution of all this stuff, I still feel a twinge of guilt for running as administrator on both OS X and Vista. But most people never knew why that might be a problem, and now it’s water under the bridge — with one huge exception. There are hordes of people on XP today who will be there for years to come. So while it’s difficult to use standard accounts routinely on XP, anything that can be done to make that strategy more viable will be a huge benefit to everyone.

The People and Information Monitor

I’m in the midst of installing a big honking piece of software over a cable modem connection, so it’s taking a while. In order to explore exactly what is taking a while, I’ve been checking out Vista’s Reliability and Performance Monitor. Under the covers it’s still good old perfmon.exe, a system monitoring tool that had me at hello, way back whenever I first saw it, in whatever version of NT that was, maybe the first one.

The name change in Vista refers to a new feature which is mentioned in my podcast with Partha Sundaram about software instrumentation. In addition to all the low-level counters for disk, memory, CPU, and network activity, there’s a view that summarizes the stability of your system and correlates it with application-level events. Last week, for example, I was using a beta third-party application that crashed a half-dozen times. That made the system’s overall reliability dip down in the summary view, and the details reveal why.

In Vista, the default views of the traditional low-level counters are more comprehensive than in XP or in Server 2003. Everything’s correlated to the processes that are running, and the files they’re reading or writing. You could learn a whole lot about the internals of Vista by just leaving perfmon running on your second monitor while going about your business.

There was a time when I would have found that mesmerizing. Part of me still does. But most of me cares more about the people I’m communicating with, and the information I’m producing and consuming. And of course the vast majority of people who use personal computers care only about those things.

So as I watch the Reliability and Performance Monitor x-ray the guts of my system, I’m imagining what it would be like to have an equally capable People and Information Monitor to x-ray my activities in the infosphere.

It sort of exists, but in a fragmentary way. The Recent Changes view in Vista’s desktop search, which I mentioned the other day, is a step in that direction. It can see into multiple local data silos — the file system, email, calendars.

Then there are all my silos in the cloud: my blog, various other online services. To the extent these offer RSS feeds I can begin to aggregate them. But there’s no way to really correlate my interactions with people and information across those services, never mind across the desktop/cloud chasm.

Such a thing is conceivable, though. A desktop operating system could monitor the union of local events and network events, could correlate the names and addresses of people and items of information, and could offer visualization and analysis in the realm of people and information rather than CPUs and netcards.
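To make the correlation idea concrete, here’s a minimal sketch in Python. Everything in it — the event records, the field names, the addresses — is invented purely for illustration; the point is just that two silos of events, once merged on a shared “who” key, yield a people-centric summary rather than a CPU-centric one.

```python
from collections import Counter
from datetime import datetime

# Two hypothetical event silos; every field name here is invented
# for illustration, not drawn from any real API.
local_events = [
    {"when": datetime(2007, 3, 1, 9, 5), "who": "alice@example.com", "what": "email read"},
    {"when": datetime(2007, 3, 1, 9, 40), "who": "bob@example.com", "what": "email sent"},
]
cloud_events = [
    {"when": datetime(2007, 3, 1, 10, 2), "who": "alice@example.com", "what": "blog comment"},
]

# The correlation step: merge both streams into one timeline, then
# summarize interactions per person instead of per process.
timeline = sorted(local_events + cloud_events, key=lambda e: e["when"])
interactions = Counter(e["who"] for e in timeline)
print(interactions.most_common())
# [('alice@example.com', 2), ('bob@example.com', 1)]
```

A real People and Information Monitor would of course ingest live events rather than literals, but the shape of the computation — merge, key by person, summarize — would be the same.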

Maybe one person in a hundred, or in a thousand, will ever appreciate a sexy low-level Reliability and Performance Monitor. But a People and Information Monitor? Everybody needs one of those.

Sharing knowledge on the web

Joe Gregorio posted a gem the other day. It’s a little tutorial on how to model a common operation on the web — validating zipcodes — using the principles of the REST architectural style. Along the way, almost certainly without intending to, he taught me some things about the Python programming language that I hadn’t known.

Joe’s example uses two features of Python — memory-mapped files and array bisection — to speed up the search for a zipcode in a sorted file of zipcodes. But you don’t need to know anything about REST or Python to appreciate the aspect of Joe’s example I want to highlight here, which is that when we narrate our work on the web, we may convey more value than we know or intend.
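For readers who, like me, hadn’t met these two Python features before: Joe’s post has the real code, but here’s a rough sketch of the combination, a memory-mapped file searched with array bisection. The class name, the fixed-width record layout, and the demo data are my assumptions, not Joe’s.

```python
import bisect
import mmap

RECORD = 6  # a five-digit zipcode plus a newline: fixed-width records

class ZipcodeFile:
    """Binary search over a memory-mapped, sorted file of zipcodes."""

    def __init__(self, path):
        self.f = open(path, "rb")
        self.map = mmap.mmap(self.f.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self.map) // RECORD

    def __getitem__(self, i):
        # Treat the mapped file as a read-only list of records.
        return self.map[i * RECORD : i * RECORD + 5].decode()

    def __contains__(self, zipcode):
        # bisect works on anything with __getitem__ and __len__,
        # so it can search the file without reading it all in.
        i = bisect.bisect_left(self, zipcode)
        return i < len(self) and self[i] == zipcode

# Demo with a tiny sorted file standing in for the real zipcode data.
import tempfile
path = tempfile.NamedTemporaryFile(delete=False).name
with open(path, "wb") as f:
    f.write(b"03431\n10001\n90210\n")

zips = ZipcodeFile(path)
print("10001" in zips)  # True
print("12345" in zips)  # False
```

The operating system pages in only the records the binary search touches, which is why the approach stays fast even when the file is large.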

The purpose of Joe’s posting was to show how to apply a recipe for RESTful design, and it accomplishes that nicely. In doing so, Joe is helping to articulate principles that are widely practiced but not always well understood. By reflecting on his knowledge of those principles, by writing them down, and by sharing that writing, Joe makes that knowledge available to the rest of us.

Along the way, other useful things happen. In the dialectic that emerges in the comments section, Richard Searle proposes — and Joe agrees — that the word originally chosen to invoke the validator, lookup, is too verb-like. The recipe calls for nouns, and so the word becomes zipcode instead.

Why did Joe choose lookup initially? Knowledge is imperfect. When we externalize what we know, we can observe and discuss and correct those imperfections. That’s one of the subtle benefits that flow from externalizing knowledge in public performance.
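The lookup-versus-zipcode distinction is easy to sketch. In the noun-oriented style the zipcode itself is the resource, and GET either finds it or doesn’t. This toy handler is my own illustration of the principle, not Joe’s code; the URI shape and the in-memory dataset are assumptions.

```python
# A stand-in for the full dataset of valid zipcodes.
ZIPCODES = {"03431", "10001", "90210"}

def handle_get(path):
    """Handle GET on a noun-style resource: /zipcode/<zip>."""
    prefix = "/zipcode/"
    if not path.startswith(prefix):
        return 404
    # The resource either exists (200) or it doesn't (404). There is
    # no verb-like /lookup endpoint and no custom status envelope:
    # HTTP's own vocabulary carries the answer.
    return 200 if path[len(prefix):] in ZIPCODES else 404

print(handle_get("/zipcode/10001"))  # 200
print(handle_get("/zipcode/99999"))  # 404
```

Validation falls out for free: a client asking “is 99999 a zipcode?” just does a GET and reads the status code.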

Another is the one I’ve already mentioned. Although I doubt Joe meant to teach me about memory-mapped files and array bisection in Python, he did anyway, as a happy side effect.

When the blogosphere works this way, as it often does, it exemplifies the best qualities of professional discourse. I wish I could show more people how this works. But it’s hard to abstract away from the knowledge domain of this example — RESTful design and Python programming — to general principles that can apply in any knowledge domain.

In the technical blogosphere, we have an almost perfect confluence of factors. Almost everything related to the work of software development — both products (source code) and processes (specifications, conversations) — is a text document that can flow easily and naturally on the web. And our examples are often self-reflexive — we use the web to illustrate work that is about the web itself.

This way of externalizing knowledge in public performance doesn’t translate so easily to other domains, at least not yet. I think that’ll change, though, as all work products and work processes tend toward digital representations. And I think that rich media will play a huge role in that change. Programming is fundamentally a textual craft, as are others, but many are not. If you’re a builder or a firefighter or a pilot, the most effective medium in which to publicly perform your knowledge won’t be text, it’ll be video.

Suppose you’re a builder, firefighter, or pilot who wants to share (and clarify) your knowledge of green construction, rescue operations, or cockpit instrumentation. It’s admittedly a stretch to imagine that, just as Joe Gregorio posted a textual blog entry in order to share his knowledge of RESTful design, you’ll post a video in order to share your knowledge in these areas. But I hope you will imagine it.

Carl Malamud to Brian Lamb: “You should not treat the U.S. Congress like Disney would treat Mickey Mouse”

When I posted a video clip of Hillary Clinton’s talk at the Keene High School, which I’d TiVo’d from our community access cable station, I wasn’t entirely sure it was OK to do that. But when I asked Lee Perkins, who runs Cheshire TV, he said absolutely, go for it.

The following week I was puzzled by a New York Times primer on which C-Span videos can, or cannot, be excerpted and reposted. Apparently only the “5 to 15 percent” of C-Span’s programming that’s from the House and Senate floor is considered to be in the public domain. Here was C-Span VP and general counsel Bruce Collins’ explanation:

“What I think a lot of people don’t understand — C-Span is a business, just like CNN is,” Mr. Collins said. “If we don’t have a revenue stream, we wouldn’t have six crews ready to cover Congressional hearings.”

I wondered about that, but lacked context. Now Carl Malamud has provided the missing context. In a stunning letter to C-Span’s president and CEO Brian Lamb, which includes the above quote, Malamud points out that C-Span is supported not only by the revenues it earns as a nonprofit business, but also by “considerable public largesse.” Taxpayers, Malamud argues, are footing the bill for much of the facilities, wiring, and equipment that enable C-Span’s camera crews to do their work.

Malamud concludes:

I thus write to you today with a specific request and a notice:

  1. Your inventory shows 6,251 videos of congressional hearings for sale in the C-SPAN store at an average price of $169.50, for a total retail value of approximately $1,059,544. I am offering today to purchase this collection of discs from you for the purpose of ripping and posting on the Internet in a nonproprietary format for reuse by anybody. I understand your store would take a while to process such an order and am willing to place it in stages.
  2. I have purchased Disc 192720-1 from the C-SPAN store, ripped more than one minute of video from the disc, and used it for the creation of a news and satirical commentary of compelling public interest and then posted the resulting work at the Internet Archive. I did not ask C-SPAN for a license and I assert fair use of this material.

Mr. Lamb, C-SPAN has been a pioneer in promoting a more open government. You created a grand bargain with the Cable Industry and the U.S. Congress. When I created the first radio station on the Internet and was asked why I did so as a non-profit instead of going for the gold like many of my colleagues, my reply has always been that I was inspired by your example.

Your grand bargain has served the American people and the C-SPAN organization well. Holding congressional hearings hostage is not in keeping with your charter, and it is not in keeping with the spirit of that grand bargain you made with the American people. Please re-release this material back into the public domain where it came from so that it will continue to make our public civic life richer.

Wow.

A conversation with Terry Swack about design, green construction, and the business of sustainability

This week’s podcast is a conversation with Terry Swack. She’s a graphic designer, Internet strategist, and serial entrepreneur. In recent years she has focused on helping businesses use the Internet to respond to the growing demand for environmentally sustainable products and services.

One of her projects is Green Building Blocks, a directory of “green” design and building professionals. She’s about to launch Clean Culture, a “customer experience strategy firm” that will help companies explain how they’re advancing the cause of sustainability.

How do you get from graphic design to green construction and clean energy? By following your intuitions, and always learning and doing new things. In the end, everything’s connected.

A letter to the editor about Real ID

Yesterday my local newspaper ran an editorial entitled Death to Real ID. That link will turn into a pumpkin in five days, but here’s the intro:

Although the Bush administration today is announcing possible delays in the Real ID program, it’s beginning to look as if New Hampshire could play a role in killing the thing outright. That would be a welcome development.

Real ID, passed by Congress in 2005, is designed to turn state drivers licenses into “electronically readable” national identity cards. As the law now stands, beginning on May 11, 2008, Americans will be required to show the cards before they board airplanes, open bank accounts, collect Social Security payments or receive almost any other government service.

And of course it won’t be long before every huckster and propane salesman in the country will be demanding to examine your Real ID card along with your Social Security number before doing business with you.

Now I rather enjoy New Hampshire’s “Live Free or Die” state motto, and I’m not an uncritical supporter of Real ID. But in the US as a whole, and in New Hampshire in particular, it’s hard even to have a discussion about digital identity, and I think that’s a shame.

My letter to the editor, below, does not argue for Real ID. It’s just an effort to avoid foreclosing all discussion on the subject of digital identity. Is it effective? What other arguments would help?

To the editor:

At a moment in history when the President of the United States is asserting that the government has the right to intercept phone calls and emails without a warrant, it’s a good idea to raise the totalitarianism alert level from orange to red. But Real ID isn’t a black and white, or green and red, issue. The Sentinel’s March 1 red flag (“Death to Real ID”) fails to address, or even acknowledge, the complex and evolving story of digital identity.

Real ID, we’re told, “is designed to turn state drivers licenses into ‘electronically readable’ national identity cards” that we’re required to show before boarding planes or accessing bank accounts.

That’s true.

Today, by contrast, our drivers licenses are electronically unreadable national identity cards that we’re required to show in all the same circumstances.

That’s better how?

It’s fascinating to compare our national stance on identity cards, epitomized by New Hampshire’s state motto, with that of other countries. Last fall, at the 40th International Council for Information Technology in Government Administration, I met the guy who runs Belgium’s national ID card program. Belgians are receiving these cards at the rate of 10,000 a month, and will all have them by 2009.

There’s also a youth version of the eID. When Belgian children turn 12, they’ll receive a smartcard and a reader from the government. Americans would regard this program as an Orwellian intrusion. For Belgians, it’s a way to help protect kids without compromising their privacy.

One of the first uses of the youth eIDs will be to prove age to age-restricted web sites. There’s no technical requirement to disclose identity, and a strong cultural preference not to. Kids will need only prove (by knowing the card’s PIN) that they are citizens, and prove (by selectively disclosing their birth date) that they meet the age requirement.

Selective disclosure is one of the privacy-enhancing features that electronic ID cards, unlike regular cards, can offer. When you show your drivers license at the liquor store, for example, all the clerk really needs to know is your birth date. An electronic card can be configured to disclose only that fact, and none of your other personal information.

Phil Windley, who was CIO of Utah and is the author of a leading book on the subject of digital identity, said this in an interview with me last year:

“If you talk to people from a number of countries in Europe, they would just laugh at the idea that we don’t have a national ID. But they would be scared to death of the fact that we don’t have strong privacy laws.”

The issues surrounding digital identity are complex and subtle, but they’re not going away. When the Sentinel reduces those issues to “totalitarianism” and “police-state claptrap” it does readers a disservice.

Creating persistent search folders in Vista

I’ve been noodling around with search folders in Vista. The one that shows up by default in the shell’s Favorite Links panel, entitled Recently Changed, is of particular interest. Just like the Recent Changes page in a wiki, it’s a nice way to monitor activity in a dynamic system that’s always accreting new stuff.

Good news: The Recently Changed folder is governed by an XML file, in the Searches subdirectory of the home directory, called Recently Changed.search-ms:

<?xml version="1.0"?>
<persistedQuery version="1.0">

<viewInfo viewMode="details" iconSize="16">
  <sortList>
    <sort viewField="System.DateModified" direction="descending"/>
  </sortList>
</viewInfo>

<query>
  <conditions>
    <condition type="leafCondition" valuetype="System.StructuredQueryType.DateTime" 
    property="System.DateModified" operator="imp" 
    value="R00UUUUUUUUZZXD-30NU" propertyType="wstr" />
  </conditions>

<kindList>
  <kind name="document"/>
  <kind name="picture"/>
  <kind name="music"/>
  <kind name="movie"/>
  <kind name="video"/>
  <kind name="note"/>
  <kind name="journal"/>
  <kind name="email"/>
</kindList>

<subQueries>
  <subQuery knownSearch="{4f800859-0bd6-4e63-bbdc-38d3b616ca48}"/>
</subQueries>

</query>
</persistedQuery>

Bad news: I can’t figure out how to write my own queries. value="R00UUUUUUUUZZXD-30NU"? What’s up with that? I guess this relates in some way to the advanced query syntax for Windows desktop search. But I can’t find any examples that look like this:

<conditions>
  <condition type="leafCondition" valuetype="System.StructuredQueryType.DateTime" 
  property="System.DateModified" operator="gt" value="date:yesterday" />
</conditions>

There must be documentation for this somewhere, but at the moment there are very few hits in any of the search engines for the query: vista persistedquery. That’s a shame. I know that advanced search doesn’t appeal to the masses, but it sure does appeal to me.

A conversation with Partha Sundaram about software instrumentation

I’m deeply fascinated by software instrumentation in all its varieties. When Scott Dart told me that he had hard data on how many people are using the tagging features in Photo Gallery, I wanted to know how. The answer is SQM, which is pronounced “squim” and which expands to Software Quality Metrics.

According to Partha Sundaram, my guest for today’s podcast, SQM was formerly used on a per-application basis, but is now, in Vista, also a piece of core infrastructure that can be used to analyze how the operating system itself is being used in the field. He reviews the current use of SQM in Vista, and some future goals for the technology.

One of those goals is to make it more obvious, to customers who have given consent to the anonymized and aggregated collection of their data, what’s been learned from that data, and how that knowledge has been used to improve the software.

I wondered if SQM might also be a way for people to monitor and analyze their own use of Windows-based software, perhaps by collecting and sharing more information than Microsoft’s privacy policy would otherwise allow. The example that particularly interests me is tag management. On del.icio.us, for example, I could in principle review the evolution of my own tag vocabulary — when new tags appeared, when tags were renamed — and could (again, in principle) allow that data to be pooled with other people’s data for aggregate analysis.

That scenario is outside SQM’s scope, though, Partha says, and would require a style of data collection that’s way more granular than what SQM is designed for. You could potentially use SQM to find out how often tags are renamed — in itself an interesting question — but not what the tags were changed from or to.

Privacy advocates will probably be relieved to know that. And indeed the whole idea of user-defined instrumentation might seem rather esoteric. But I’ll argue that it really isn’t. In my talk with Mary Czerwinski, for example, Mary noted that another internal logging tool showed her she’d been spending almost two-thirds of her time in email; she resolved to change that, and she succeeded. For the many people who subscribe to the Getting Things Done methodology, it would be a boon to be able to ask and answer questions about personal habits of communication and information management.

To that end, you’d want a system-wide framework with which to define meaningful events and analyze them. Of course you couldn’t just watch events rattling around within Windows. You’d also want to insert a probe into your network connection so that you could watch, and correlate, events traveling across HTTP, SMTP, and other Internet connections.

Given privacy concerns, this whole notion would be a tough sell to say the least. But you can’t improve what you can’t measure. If we want to make software better, we’ll need more and better software instrumentation.

The digital darkroom revealed


Today while editing a podcast I stopped to record a bit of the on-screen action. I’ve written before about the audio editing techniques used by the NPR pros to make conversations sound clear and intelligible. I use the same methods on my podcasts, and I’ve been meaning to show how it’s done. Today’s two-and-a-half-minute screencast gives you a good idea how it works.

In this short example, I’m talking to Partha Sundaram about something called SQM (pronounced ‘squim’). In the original version we both talk over each other a bit, and I repeat myself. In the final version each voice stands alone and the needless repetition is gone.

You don’t need fancy editing software to do this. Although I’m using Audition in this demo, I’ve done the same kind of thing quite often in Audacity.

You do, however, need to put the voices onto separate channels. When it comes to telephone recording, I am a disciple of Doug Kaye and I use the gadget he recommends, the Telos One, to split the caller and callee onto left and right stereo channels. At $600 the Telos box clearly isn’t for everyone, though, so I’d be interested to hear about a more accessible way to achieve channel separation.

As I mention in the screencast, it’s tedious to do this kind of editing. But it can go pretty fast once you get the hang of it. Since I review my podcasts anyway before publishing them, I’ve decided it’s worth the trouble to make them as clean and intelligible as I can quickly manage. Just like the pros do.

Or do they? I was driving home with my son last night, listening to Fresh Air — a great episode in which Terry Gross interviews Ira Glass about the new TV version of This American Life — and we were both struck by the absence of internal editing. When my son heard this bit — an extreme but not atypical example of the kind of verbal redundancy we heard throughout the show — he burst out laughing. I just found it puzzling. Is internal editing done only for certain shows and not for others? What rules govern when it is or isn’t done?

Two-way public media

By the time Dave Winer asked me to listen to his talk at the recent Public Media conference, it was too late — I’d already heard it on a drive to the airport Saturday morning. It is an inspired (and inspiring) discussion of what can and should happen at the intersection of podcasting, public media, and democracy.

Of the many themes that resonate powerfully with me, the one I want to focus on here is the idea of two-way public media. That can mean a few different things. One of the meanings Dave highlights is that the device which downloads and plays podcasts can also record and upload them, without being tethered to a computer.

Another meaning is that listeners become programmers, or rather, deejays, or better yet, webjays. Back in 2004 I became fascinated with Lucas Gonze’s webjay.org, which hasn’t changed much since then, or since its Jan 2006 acquisition by Yahoo.

In a December 2004 podcast, one of my first — and one of only a few that’s in the style of a story rather than an interview — I imagined how Napster-like collaborative recommendation would play out in a post-Napster world, thanks to the proliferation of free and legally-shareable audio, and to services like Webjay that would encourage and assist with annotation and remixing.

Things haven’t really turned out that way, at least not yet. Webjay remains a boutique offering. The dominant Internet audio applications — iTunes and Windows Media Player — optimize for consumption of commercial audio, not for production of free and legally-shareable audio. And by production I don’t mean just the ability to record and upload. What matters as much or more is the ability to annotate, curate, and share.

In the textual blogosphere we’re all webjays. When we find good stuff, it’s natural and straightforward to collect it, comment on it, and stream our remixed versions of it back to the blogosphere.

It’s nothing like that in the realm of Internet audio. Like others, I’m disappointed that Windows Media Player 11 does not support podcatching — I shouldn’t need to install iTunes on a Windows box to acquire that capability — but that’s a tangential point. More importantly, neither iTunes nor WMP11 invites me to annotate, curate, and share.

That’s largely because, as Dave correctly notes in his talk, the design center for these Internet audio platforms is music, and in particular, commercial music. There’s a very different kind of Internet audio, the kind that NPR mostly is, that ITConversations is, that my Friday podcast series is: commentary and discourse.

What could the mainstream platforms do to support this kind of Internet audio? They could offer “blog this” features. They could make quotation of audio as easy and natural as quotation of text. They could embrace and popularize the Webjay idea of shareable, remixable playlists.

For most people, of course, playlists are about music. When I made my open source audio podcast back in 2004 I was inspired by my favorite webjay, Oddio Katya, whose wonderful monthly Tunes in Overplay playlists have happily resurfaced after a long hiatus. A couple of years ago I imagined that, by now, there would be a flock of webjays like Katya, and that they would be helping me tap into the growing reservoir of free and legally-shareable music.

But iTunes and WMP11 aren’t designed to inspire future Oddio Katyas to curate and share what is freely available. And while I wish things were otherwise, I can understand why they’re not. The music business and the technology business are intertwined, and it is in neither party’s interest to facilitate the kinds of alternatives that the Internet invites and enables.

As Dave points out in his talk, though, there’s no such conflict of interest in the realm of public discourse. So there’s no reason why iTunes and Windows Media Player shouldn’t make it easy for me to create a playlist of his talk, and one of my own public radio commentaries, and a PopTech lecture from ITConversations, and a Long Now talk, and a LibriVox reading, and a Berkeley lecture, and then share that playlist on the web — where I can watch it morph as other people mix in related material that I wouldn’t have discovered on my own.

It’s possible today to treat public-interest audio and video as two-way media in this sense, but you have to be a bit of a propeller-head. I’d love to see that capability democratized, and I’d love to see my team take the lead in making it happen.

A conversation with Steve Vinoski about services, the enterprise, and the web

From an undisclosed location somewhere on the east coast, middleware maven Steve Vinoski joins me for this week’s Friday podcast. Earlier this month Steve announced that he was leaving IONA to join a stealth-mode startup. He can’t discuss his new job yet, but I took this opportunity to ask him to review his long career working with distributed systems and reflect on lessons learned.

It’s a timely conversation because Steve was originally slated to represent IONA at next week’s W3C Workshop on Web of Services for Enterprise Computing. I’d hoped to attend that conference as well, but all attendees have to present position papers, and I’d be the wrong guy to represent Microsoft’s position. So instead I asked Steve what he would have said there, and I chimed in with some things that I would have said, and we both had a lot of fun.

Screencasting tips

Yesterday’s screencast turned out to be a nice example of how the screencasting medium can communicate what otherwise cannot be explained easily, if at all. Here’s the kind of reaction you hope a screencast will elicit:

I checked out the Photo Gallery earlier, but didn’t see the added value. Now I do.

It’s hard to quantify the impact of a timely and well-produced screencast, but my gut tells me that Simon Willison’s outstanding effort, How to use OpenID, has more than a little to do with the momentum now building around OpenID.

I’ve written before about how to make screencasts that communicate effectively, and I’ll be updating those observations from time to time because it’s an evolving story.

One of my goals is to help folks inside Microsoft use this medium more effectively. Another is to help everyone else do so, because there’s a major obstacle in the way of my vision of the future of software and networks: Much of the value and capability of this stuff is unappreciated by most people.

In trying to understand why, I’ve settled on what I call the “ape with a termite stick” argument. If you’ve heard it before, skip ahead. If not, it goes like this. People learn to use tools by watching how other people use them, and imitating what they see. Observation is the key. Suppose apes had language, and the discoverer of the termite stick could explain to the tribe:

“So, you find a stick about yea long, and strip off the bark so it’s sticky, and poke it into the hole, and presto, it comes up bristling with yummy termites.”

Some of the other apes might get it, but most of them wouldn’t. On the other hand, any ape who could observe this technique would get it immediately, and never forget it.

Given all the network connectivity that we have nowadays, it’s perhaps surprising — but nevertheless true — that we have few opportunities to directly observe how proficient users of software tools do what they do. Screencasts are the best way I’ve found to make such tool use observable, and thus learnable.

Enough theory. When you get down to brass tacks and try to capture those “aha” moments, it’s easier said than done for a bunch of reasons. In the case of this particular screencast, I just want to point out three things.

Focus.

I always ask presenters to size the application window (or windows) to something like 800 by 600. That’s partly to minimize the quantity of video that has to be delivered, which continues to matter because broadband isn’t yet where it needs to be. But equally, it’s a way to focus on the real action. In the case of the Photo Gallery screencast, for example, I cropped away the window chrome because nothing was going on there. It’s a subtle and subliminal thing but, when you eliminate the uninteresting and uninformative, the interesting and informative aspects of what remains will emerge more clearly.

With some screencasting tools, including the one I mostly use, Camtasia, it’s also possible to pan and zoom in order to focus even more precisely. I haven’t used that feature yet, because I’m usually pressed for time and the basic kinds of editing that I do are already time-consuming. But I do want to add this technique to my repertoire, and use it in selective and appropriate ways.

Editing.

Editing is crucial. The raw capture for yesterday’s screencast was 30 minutes. It included some false starts, some extraneous material, and a fair bit of verbal stuttering on the part of both Scott and myself. When we finished the capture, I wasn’t sure we even had anything that would be usable. But as I trimmed away the clutter, a reasonably clear storyline emerged.

Even the 14-minute version will, of course, be too long for many people. One solution would be to divide the material into chapters. But since none of those would work well standalone, a better solution might be to make an elevator-pitch version that tells the same story in just 3 to 5 minutes. I’d want that version to complement the 14-minute version, though, not replace it.

Interactivity.

Almost all the screencasts that I’ve seen, and many that I’ve made, are solo efforts. But I also love to do interview-style screencasts, and the Photo Gallery screencast is an example of that genre. When it works well, as I think it did in this case, the interaction between the interviewer and the presenter can help the presenter — who in some ways knows the subject too well — recognize what’s not obvious to viewers and adapt accordingly.

As an aside, I should mention that although we made this screencast remotely — Scott was in Redmond and I was in my home office in New Hampshire — we used a technique that was new for me. Normally I record screens projected to my computer using a screensharing application. In this case, because of all the images in the presentation, that didn’t work well. The projection couldn’t keep up. So I had Scott record his screen on his end, while I recorded the audio on my end. It worked great. I was able to follow the visual action well enough on my end, Scott captured a high-quality video which he later posted for me to download, and it was straightforward for me to marry up his video track with my audio track.

Show, don’t tell.

The “aha” moment, if there is one, speaks for itself. When the ape can see that termite stick bristling with termites, there is no need for someone to say: “This is a really cool benefit.” It’s just obvious.

In our session, Scott was actually quite restrained. But there were a few places where he made editorial comments like “this is really convenient” or “this is a great benefit”. I took them out. If I could give only one piece of advice to technical marketers everywhere, it would be this: Show me, don’t tell me.

Tagging and foldering in Photo Gallery

In this 14-minute screencast I interview Scott Dart, who blogs here, about how tagging works in Vista’s Photo Gallery. I wanted to look over Scott’s shoulder, rather than do this myself with my own photos, because Scott’s been managing a lot of photos in this app for a long time, and he’s in a position to reflect on the evolution of his tag vocabulary.

The metadata storage strategies discussed here lately are just plumbing. What you see in this screencast is the payoff: An application that will be, for many people, the first experience of a style of personal information management that relies on tagging and search as much as, or more than, on folders and navigation.

Conventional wisdom was that people could never be bothered to invest effort in tagging their stuff. What del.icio.us and then Flickr and then a host of other web applications showed is that people will invest that effort if the activation threshold is low and the reward is immediate. On the web, the rewards are both personal (I can more easily find my photos) and broadly social (I can interact not only with friends and family but with like-minded photographers everywhere). On the desktop, the rewards will mainly be personal and more narrowly social (friends and family), though if photos can bring their tags with them when they travel to the cloud, the broader social rewards become available too.

One of the fascinating threads in this screencast is the interplay between foldering and tagging. In principle you don’t need a folder hierarchy rooted in the file system, and doing away with it entirely would reduce the concept count. In practice that’s not yet possible, if only because cameras don’t produce endless streams of uniquely-identified files. When DSCF0004.JPG rolls around again, you have to put it into a different file-system folder than the last time.

It’s too bad, really, because those file-system folders serve little other purpose. They’re conceptual clutter that obstructs your view of tagging, and of tag-oriented search and navigation, which is where all the action really wants to be.
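The collision problem sketched above is easy to make concrete. Here’s a minimal, hypothetical importer — my own illustration, not how Photo Gallery actually works — that files camera photos into per-date folders and renames on collision:

```python
import os, shutil, time

def import_photo(src, library_root):
    # File the photo under a per-date folder, renaming on collision.
    # Purely illustrative; not Photo Gallery's actual import logic.
    day = time.strftime('%Y-%m-%d', time.localtime(os.path.getmtime(src)))
    folder = os.path.join(library_root, day)
    if not os.path.isdir(folder):
        os.makedirs(folder)
    base, ext = os.path.splitext(os.path.basename(src))
    dest = os.path.join(folder, base + ext)
    n = 1
    while os.path.exists(dest):  # DSCF0004.JPG has rolled around again
        dest = os.path.join(folder, '%s-%d%s' % (base, n, ext))
        n += 1
    shutil.copy2(src, dest)
    return dest
```

Every scheme like this one spends folder structure to buy uniqueness, which is exactly the conceptual clutter at issue.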

A further complication is that, unlike most of the popular tag systems on the web, tagging in Photo Gallery is hierarchical. You don’t have to use it that way; you could keep a flat list of tags. But the system invites you to nest your tags in a way that seems folderish yet has a magical property: the same thing — not a copy of the thing — can be in two or more places at once.

It’ll be fascinating to observe what people make of this. For example, that magical same-thing-in-two-places property may seem less magical to the majority of folks who don’t know what I know about directory structures on disks. I experience cognitive dissonance when I see a “real” file-system hierarchy and a “virtual” tag hierarchy living in the same navigational tree. But somebody who never had a strong sense of the difference between those two modes might not be bothered at all.

Are people actually using tags to organize and search for their photos? According to Scott, data from the opt-in software quality metrics (SQM) feature — which relays anonymized usage data to product teams for analysis — says that they are.

How private tag vocabularies develop, and what happens when they intersect with the web, are two processes that I’d love to be able to study over time. That raises an interesting question. Can I access that SQM usage data myself? Could groups of willing participants pool their data and do independent analyses of it? It’s our data, there’s no reason why not. Does anyone know how?

Who’s got the tag? Database truth versus file truth, part 3

I’ve recently been exploring the implications of the following mantra:

The truth is in the file.

In this context it refers to a strategy for managing metadata (e.g., tags) primarily in digital files (e.g., JPEG images, Word documents) and only secondarily in a database derived from those files.

Commenting on an entry that explores how Vista uses this technique for photo tags, Brian Dorsey throws down a warning flag:

Many applications are guilty of changing JPEGs [ed: RAW files, not JPEGs, are the issue; see below] behind the scenes and there is nothing forcing them to do it in compatible ways. Here is a recent example with Vista.

A cautionary tale, indeed. This is the kind of subject that doesn’t necessarily yield right and wrong answers. But we can at least put the various options on the table and discuss them.

There is an interesting comparison to be made, for example, between OS X and Vista. While researching this topic I found this Lifehacker article on a feature of OS X that I had completely missed. You can tag a file in the Get Info dialog, and when you do, the file will be instantly findable (by that tag) in Spotlight.

My purpose here is not to discuss or debate the OS X and Vista interfaces for tagging files and searching for tagged files. I do however want to explore the implications of two different strategies: “the truth is in the file” versus “the truth is in the database”.

In Vista, if I tag yellowflower.jpg with iris, that tag lives primarily in the file yellowflower.jpg and secondarily in a database. An advantage is that if I transfer that file to another operating system, or to a cloud-based service like Flickr, the effort I’ve invested in tagging that file is (or anyway can be) preserved. A disadvantage, as Brian points out, is that when different applications try to manage data that’s living inside JPEG files, my investment in tagging can be lost.

Conversely, if I tag yellowflower.jpg with iris in OS X, yellowflower.jpg is untouched; the tag lives only in Spotlight’s database. If I transfer the file elsewhere, my investment in tagging is lost. But on my own system, my tags are less vulnerable to corruption.

Arguably these are both valid strategies. The Vista way optimizes for cross-system interoperability and collaboration, while the OS X way optimizes for single-system consistency. Of course as always we’d really like to have the best of both worlds. Can we?
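To make the tradeoff concrete, here’s a toy model of the two strategies. It’s my own illustration, not how either operating system is implemented: a “file” is just a dictionary, and the “database” is a catalog keyed by file identity.

```python
import itertools

class FileTruthStore:
    """Tags ride inside the file itself, so copies carry them along."""
    def copy(self, f):
        return {'data': f['data'], 'tags': set(f['tags'])}

class DatabaseTruthStore:
    """Tags live in a catalog keyed by file identity; copies start blank."""
    def __init__(self):
        self.catalog = {}
        self.ids = itertools.count()
    def new_file(self, data):
        return {'id': next(self.ids), 'data': data}
    def tag(self, f, tag):
        self.catalog.setdefault(f['id'], set()).add(tag)
    def copy(self, f):
        # Only the bytes travel; the catalog knows nothing of the copy.
        return {'id': next(self.ids), 'data': f['data']}
    def tags_for(self, f):
        return self.catalog.get(f['id'], set())
```

Copy a tagged file out of the database-truth store and its tags stay behind; copy one out of the file-truth store and they travel. That’s the whole argument in a dozen lines.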

It’s a tough problem. Vista tries to help with consistency by offering APIs in the .NET Framework for manipulating photo metadata. But those APIs don’t yet cover all the image formats, and even if they did, there’s nothing to prevent developers from going around them and writing straight to the files.

For its part, OS X offers APIs for querying the Spotlight database. So an application that wanted to marry up images and their metadata could do so, but there’s no guarantee that a backup application or a Flickr uploader would do so.

It’s an interesting conundrum. Because I am mindful of the lively discussion over at Scoble’s place about what matters to people in the real world, though, I don’t want to leave this in the realm of technical arcana. There are real risks and benefits associated with each of these strategies. And while it’s true that people want things to Just Work, that means different things to different people.

If you’re an avid Flickr user, if you invest effort tagging photos in OS X, and if that effort is lost when you upload to Flickr, then OS X did not Just Work for you. Conversely if you don’t care about online photo sharing, if you invest effort tagging photos in Vista, and then another application corrupts your tags, then Vista did not Just Work for you.

I think many people would understand that explanation. In principle, both operating systems could frame the issue in exactly those terms, and could even offer a choice of strategy based on your preferred workstyle. In practice that’s problematic because people don’t really want choice, they want things to Just Work, and they’d like technology to divine what Just Work means to them, which it can’t. It’s also problematic because framing the choice requires a frank assessment of both risks and benefits, and no vendor wants to talk about risks.

I guess that in the end, both systems are going to have to bite the bullet and figure out how to Just Work for everybody.

Blogging from Word 2007, crossing the chasm

The other day I wrote:

…as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML.

Operating in that mode for years has given me a deep understanding of how documents, and collections of documents, are also databases. It has led me to imagine and prototype a way of working with documents that’s deeply informed by that duality. But none of this is apparent to most people and, if it requires them to write semantic CSS tags in XHTML using emacs, it never will become apparent.

So it’s time to cross the chasm and find out how to make these effects happen for people in editors that they actually use. Here’s how I’m writing this entry:

This is the display you get when you connect Word 2007 to a blog publishing system, in my case WordPress, and when you use the technique shown in this screencast to minimize the ribbon.

Here’s a summary of the tradeoffs between my homegrown approach and the Word-to-WordPress system I’m using here:

My homegrown approach

Pros:

  • Can use any text editor
  • Source is inherently web-ready
  • Easy to create and use new semantic features
  • Low barrier to XML processing

Cons:

  • Only for geeks

Word 2007

Pros:

  • A powerful editor that anyone can use

Cons:

  • Source is not inherently web-ready
  • Harder to create and use new semantic features
  • Higher barrier to XML processing

These are two extreme ends of a continuum, to be sure, but there aren’t many points in between. For example, I claim that if I substitute OpenOffice Writer for Word 2007 in the above chart, nothing changes. So I’m going to try to find a middle ground between the extremes.

To that end, I’m developing some Python code to help me wrangle Word’s default .docx format, which is a zip file containing the document in WordML and a bunch of other stuff. At the end of this entry you can see what I’ve got so far. I’m using this code to explore what kind of XML I can inject programmatically into a Word 2007 document, what kind comes back after a round trip through the application, how that XML relates to the HTML that gets published to WordPress, and which of these representations will be the canonical one that I’ll want to store and process.

So far my conclusion is that none of these representations will be the canonical one, and that I’ll need to find (or more likely create) a transform to and from the canonical representation where I’ll store and process all my stuff. We’ll see how it goes.

Meanwhile here’s one immediately useful result. The tagDocx method shown below parallels the picture-tagging example I showed last week. Here, too, the truth is in the file. When you use the Vista explorer to tag a Word 2007 file, the tag gets shoved into one of the XML subdocuments stored inside the document. But any application can read and write the tag. Watch.

Before:

Run this code:

$ python
>>> import wordxml
>>> wordxml.tagDocx('Blogging from Word2007.docx', 'word2007 blogging tagging')

After:

Here’s why this might matter to me. In my current workflow, I manage my blog entries in an XML database (really just a file). I extract the tags from that XML and inject them into del.icio.us. That enables great things to happen. I can explore my own stuff in a tag-oriented way. And I can exploit the social dimension of del.icio.us to see how my stuff relates to other people’s stuff.

But in del.icio.us the truth is not in the file, it’s in a database that asserts things about the file — its location on the web, its tags. If I revise my tag vocabulary in del.icio.us, the new vocabulary will be out of synch with what’s in my XML archive. So I have to do those revisions in my archive. I can, and I do, but it’s all programmatic work, there’s no user interface to assist me.
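For what it’s worth, the programmatic work isn’t much work. Here’s a hedged sketch of a vocabulary revision over such an archive — the <tag> element name is my assumption about the format, not the actual schema:

```python
import re

def renameTag(archive_xml, old, new):
    # Rewrite one tag name throughout an XML archive of blog entries.
    # A global search-and-replace stands in for the missing UI.
    return re.sub('<tag>%s</tag>' % re.escape(old),
                  '<tag>%s</tag>' % new,
                  archive_xml)
```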

What I’m discovering about Vista and the Office apps is that they offer a nice combination of programmatic and user interfaces for doing these kinds of things. This blog entry uses three photos, for example. It’s easy for me to assign them the same tags I’m assigning this entry. If I do, I can interactively search for both the entry and the photos in the Vista shell. And I can build an alternate interface that runs that same search on the web and correlates results to published blog entries.

That’s still not the endgame. At heart I’m a citizen of the cloud, and I don’t want any dependencies on local applications or local storage. Clearly Vista and Office entail such dependencies. But they can also cooperate with the cloud and, over time, will do so in deeper and more sophisticated ways. It’s my ambition to do everything I can to improve that cooperation.

Note: There will be formatting problems in this HTML rendering which, for now, painful though it is, I am not going to try to fix by hacking around in the WordPress editor. There are a lot of moving parts here: Word, WordPress, the editor embedded in WordPress (which itself has a raw mode, a visual mode, and a secret/advanced visual mode). I haven’t sorted all this out yet, and I’m not sure I can. (Formatting source code. Why is that always the toothache?)

Anyway, if you want to follow along, I’ve posted the original .docx version of this file here.

Here’s wordxml.py which was imported in the above example. Note that this is CPython, not IronPython. That’s because I’m relying here on CPython’s zipfile module, which in turn relies on a compiled DLL.

import zipfile, re

def readDocx(docx):
    inarc = zipfile.ZipFile(docx, 'r')
    names = inarc.namelist()
    dict = {}
    for name in names:
        dict[name] = inarc.read(name)
    inarc.close()
    print dict.keys()
    return dict

def readDocumentFromDocx(docx):
    arc = zipfile.ZipFile(docx, 'r')
    s = arc.read('word/document.xml')
    f = open('document.xml', 'w')
    f.write(s)
    f.close()
    return s

def updateDocumentInDocx(docx, doc):
    dict = readDocx(docx)
    archive = zipfile.ZipFile(docx, 'w')
    for name in dict.keys():
        if name == 'word/document.xml':
            dict[name] = doc
        archive.writestr(name, dict[name])
    archive.close()

def tagDocx(docx, tags):
    dict = readDocx(docx)
    archive = zipfile.ZipFile(docx, 'w')
    for name in dict.keys():
        if name == 'docProps/core.xml':
            dict[name] = re.sub('<cp:keywords>(.*)</cp:keywords>',
                                '<cp:keywords>%s</cp:keywords>' % tags,
                                dict[name])
        archive.writestr(name, dict[name])
    archive.close()
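Going the other way is just as easy. Here’s a companion function — my own addition in the same spirit as tagDocx, assuming the keywords element sits on one line as tagDocx leaves it — that reads the tags back out:

```python
import zipfile, re

def readTagsFromDocx(docx):
    # Pull the cp:keywords element back out of docProps/core.xml.
    arc = zipfile.ZipFile(docx, 'r')
    core = arc.read('docProps/core.xml').decode('utf-8')
    arc.close()
    m = re.search('<cp:keywords>(.*?)</cp:keywords>', core)
    if m:
        return m.group(1)
    return None
```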

A conversation with Dan Chudnov about OpenURL, context-sensitive linking, and digital archiving

Today’s podcast with Dan Chudnov is a sequel to my earlier podcast with Tony Hammond about the Nature Publishing Group’s use of digital object identifiers. I invited Dan to discuss related topics including the OpenURL standard for context-sensitive linking.

I’m not the only one who’s had a hard time understanding how these technologies relate to one another and to the web. See, for example, Dorothea Salo’s rant I hate library standards, also Dan’s own recent essay Rethinking OpenURL.

I have ventured into this confusing landscape because I think that the issues that libraries and academic publishers are wrestling with — persistent long-term storage, permanent URLs, reliable citation indexing and analysis — are ones that will matter to many businesses and individuals. As we project our corporate, professional, and personal identities onto the web, we’ll start to see that the long-term stability of those projections is valuable and worth paying for.

Recently, for example, Dave Winer — who’s been exploring Amazon’s S3 — wrote:

I have an idea of making a proposal to Amazon to pay it a onetime fee for hosting the content for perpetuity, that way I can remove a concern for my heirs, and feel that my writing may survive me, something I’d like to assure.

Beyond long-term storage of bits, there’s a whole cluster of related services that we’re coming to depend on, but that flow from relationships that are transient. When I moved this blog from infoworld.com to wordpress.com, for example, InfoWorld very graciously redirected the RSS feed, but another organization might not have done so. I could have finessed that issue by using FeedBurner, but I wasn’t — and honestly, still am not — ready to make a long-term bet on that service.

For most people today, digital archiving and web publishing services are provided to you by your school, by your employer, or — increasingly — by some entity on the web. When your life circumstances change, it’s often necessary or desirable to change your provider, but it’s rarely easy to do that, and almost never possible to do it without loss of continuity.

There are no absolute guarantees, of course, but a relatively strong assurance of continuity is something that more and more folks will be ready to pay for. Amazon is on the short list of organizations in a position to make such assurances. So, obviously, is Microsoft. Will Microsoft’s existing and future online services move in that direction? I hope so. Among other things, it’s a business model that doesn’t depend on advertising, and that would be a refreshing change.

XMP and microformats revisited

Yesterday I exercised poetic license when I suggested that Adobe’s Extensible Metadata Platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:

Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

Danny Ayers: Like Mike I don’t really understand Jon’s references to microformats – I first assumed he meant XMP could be replaced with a uF.

Actually, I’m serious about this. If I step back and ask myself what are the essential qualities of a microformat, it’s a short list:

  1. A small chunk of machine-readable metadata,
  2. embedded in a document.

Mike notes:

XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.

Yes, I understand. And as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML. And I’ve been applying this technique since long before I ever heard the term microformats — my own term was originally microcontent.
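The mechanics behind that search are nothing exotic. Here’s a sketch of the idea — the class name and the regex are my own illustration, not the actual tooling:

```python
import re

def fragmentsByClass(xhtml, cls):
    # Find the contents of elements marked with a semantic CSS class.
    # Naive by design: assumes well-formed, non-nested markup.
    pattern = r'<\w+[^>]*class="%s"[^>]*>(.*?)</\w+>' % re.escape(cls)
    return re.findall(pattern, xhtml, re.DOTALL)
```

Run over an archive of entries, fragmentsByClass(archive, 'mikeLinksvayer') would surface every quote so marked.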

But some things that have mattered to me in my ivory tower, like “colocation of metadata with human-visible HTML,” matter to almost nobody else. In the real world, people have been waiting — still are waiting — for widespread deployment of the tools that will enable them to embed chunks of metadata in documents, work with that metadata in-place, and exchange it.

We’ll get there, I hope and pray. But when we finally do, how different are these two scenarios, really?

  1. I use an interactive editor to create the chunk of metadata I embed in a blog posting.
  2. I use an interactive editor to create the chunk of metadata I embed in a photo.

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!

In both of these cases, there either will or won’t be enough use of these formats to kickstart the kind of virtuous cycle where production of the formats gets reasonably well normalized. In the ivory tower we pretend that the formats matter above all, and we argue endlessly about them. Personally I’d rather see what I’d consider to be a simpler and cleaner XMP. Others will doubtless argue that XMP doesn’t go far enough in its embrace of semantic web standards. But when we have that argument we are missing the point. What matters is use. This method of embedding metadata in photos is going to be used a whole lot, and in ways that are very like how I’ve been imagining microformats would be used.

PS: Per this comment, Scott Dart informs me that PNG (and, to a lesser extent, GIF) can embed arbitrary metadata, but that support for those embeddings regrettably didn’t make the cut in .NET Framework 3.0.

Truth, files, microformats, and XMP

In 2005 I noted the following two definitions of truth:

1. WinFS architect Quentin Clark: “We [i.e. the WinFS database] are the truth.”

2. Indigo architect Don Box: “Message, oh Message / The Truth Is On The Wire / There Is Nothing Else”

Today I’m adding a third definition:

3. Scott Dart, program manager for the Vista Photo Gallery: “The truth is in the file.”

What Scott means is that although image metadata is cached in a database, so that Photo Gallery can search and organize quickly, the canonical location for metadata, including tags, is the file itself. As a result, when you use Photo Gallery to tag your images, you’re making an investment in the image files themselves. If you copy those files to another machine, or upload them to the Net, the tags will travel with those image files. Other applications will be able to make them visible and editable, and those edits can flow back to your local store if you transfer the files back.

That’s huge. It’s also, of course, a bit more complicated. As Scott explains, there are different flavors of metadata: EXIF, IPTC, and the new favorite, XMP. And not all image formats can embed image metadata. In fact many popular formats can’t, including PNG, GIF, and BMP. [Update: Incorrect, see next rock.] But JPG can, and it’s a wonderful thing to behold.

For example, I selected a picture of a yellow flower in Photo Gallery and tagged it with flower. Here’s the XML that showed up inside yellowflower.jpg:

<xmp:xmpmeta xmlns:xmp="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:subject>
<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li>flower</rdf:li></rdf:Bag>
</dc:subject>
</rdf:Description>
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:LastKeywordXMP>
  <rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:li>flower</rdf:li>
  </rdf:Bag>
  </MicrosoftPhoto:LastKeywordXMP>
</rdf:Description>
<rdf:Description xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:Rating>1</MicrosoftPhoto:Rating>
 </rdf:Description>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/">
  <xmp:Rating>1</xmp:Rating>
</rdf:Description>
</rdf:RDF>
</xmp:xmpmeta>

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.
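And because the packet is just XML, you don’t need the .NET APIs to read it. A sketch using CPython’s xml.dom.minidom — trusting the literal prefixes shown above rather than doing real namespace processing — pulls out the dc:subject keywords:

```python
from xml.dom import minidom

def xmpSubjects(xmp):
    # Return the dc:subject keywords from an XMP packet string.
    tags = []
    doc = minidom.parseString(xmp)
    for subject in doc.getElementsByTagName('dc:subject'):
        for li in subject.getElementsByTagName('rdf:li'):
            tags.append(li.firstChild.data)
    return tags
```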

Perhaps we’ve been looking in the wrong places for the first microformat to achieve liftoff. Many of us hoped hCalendar would, but it’s hard to argue that it has. I suppose that’s partly because even though we have a variety of online event services that produce the hCalendar format, there just aren’t that many people publishing and annotating that many events.

There are already a lot of people saving, publishing, and annotating photos. And the tagging interface in Vista’s Photo Gallery, which is really sweet, is about to recruit a whole lot more.

There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata. In the example above, the tag flower was assigned interactively in Photo Gallery. Here’s an IronPython script to read that tag, and change it to iris.

import clr
clr.AddReferenceByPartialName("PresentationCore")
from System.IO import FileStream, FileMode, FileAccess, FileShare
from System.Windows.Media.Imaging import JpegBitmapDecoder, \
  BitmapCreateOptions, BitmapCacheOption


def ReadFirstTag(jpg):
  f = FileStream(jpg,FileMode.Open)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  metadata = frame.Metadata
  f.Close()
  return metadata.GetQuery("/xmp/dc:subject/{int=0}")


def WriteFirstTag(jpg,tag):
  f = FileStream(jpg,FileMode.Open, FileAccess.ReadWrite, 
    FileShare.ReadWrite)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  writer = frame.CreateInPlaceBitmapMetadataWriter()
  try:
    writer.SetQuery("/xmp/dc:subject/{int=0}",tag)
    writer.TrySave()
  except:
    print "cannot save metadata"
  f.Close()
  writer.GetQuery("/xmp/dc:subject/{int=0}")

print ReadFirstTag('yellowflower.jpg') 
WriteFirstTag('yellowflower.jpg','iris')
print ReadFirstTag('yellowflower.jpg')

The output of this script is:

flower
iris

And when you revisit the photo in Photo Gallery, the tag has indeed changed from flower to iris. Very cool.

Adaptive user interfaces for focused attention

The goal of the search strategy I outlined the other day was to find Mary Czerwinski, a Microsoft researcher, and interview her for a podcast. I did find her, and the resulting podcast is here. We had a great time talking about ways that adaptive user interfaces can leverage spatial and temporal memory, about ambient awareness of team activity, and about the proper role of interruptions in the modern work environment.


In the course of the conversation I mentioned WriteRoom and the notion of a distraction-free desktop. Lately I find myself powerfully attracted to Zen simplicity, and I wondered how that impulse might square with the new Office ribbon. It’s a great improvement over the conventional menu systems, but I wondered if there were a quick and easy way to suppress the ribbon when you want to achieve the WriteRoom effect.

It turns out that there are several ways to do that, and I documented them in this short screencast.

Now that I’ve learned how to use the ribbon selectively, there’s one piece of unfinished business. In Vista as in Windows XP, you can hide the desktop icons by right-clicking the desktop and choosing View->Show Desktop Icons. But in order to really incorporate this feature into your workflow you’d like to have it on a hotkey, like WindowsKey+M, which instantly minimizes all open windows.

Jeff Ullmann had written to me a while ago with a solution based on the Windows Scripting Host, but the registry layout that it depends on is different in Vista. So, how can you make a clean-desktop hotkey in Vista? I’ve seen the question asked in various places but as yet have found no answers. If you’ve got the recipe I’d love to see it.

Annotate the web, then rewire it

In an essay last week about Yahoo Pipes, Tim O’Reilly said he was inspired, back in 1997, by a talk at the first Perl conference in which I had “expressed a vision of web sites as data sources that could be re-used, and of a new programming paradigm that took the whole internet as its platform.” Someone asked in the comments whether that idea hadn’t instead been put forward in Andrew Schulman’s talk. It turns out that neither Tim nor I can remember exactly what Andrew and I said, but I hope we both touched on this idea because it’s a big one that underlies the whole web services movement and much else besides.

Later on in that comment thread, Tim cites an email message from me in which I try to reconstruct what may have happened. One of the artifacts I dug up was this 1996 BYTE column (cleaner version here). That’s when the lightbulb clicked on for me, and I saw very clearly that the web was a collection of components that I’d be able to wire together.

Of course all I was doing was drawing attention to what the creators of the web had intended and made possible. In my recent interview with Roy Fielding, for example, we talked about his early work on libwww-perl, the library that made websites into playthings for Perl programmers. Wiring the web was very much part of the original vision. The idea just needed some champions to broaden its appeal. That’s the role that I, among others, have played.

From that perspective, then, what of Yahoo Pipes? It delights me! Much more importantly, I think it could ultimately appeal to non-technical folks, but there are some conceptual barriers to overcome. The concept of “wiring the web” is one of those, but not the first one. The dominant way in which most people will “program” the web is by writing metadata, not code, and we’ll need an interface as friendly and powerful as Pipes to help them do that.

That last sentence won’t make any sense to the average non-technical person, but the example I gave yesterday might. A by-product of this presidential election cycle will be massive quantities of online video. We should expect to be able to reach into the various repositories and assemble coherent views by issue and by candidate, and Yahoo Pipes would be a great way to do that. But not until and unless the video has been sliced and diced and tagged appropriately so as to yield to structured search.

It’s the slicing and dicing and tagging, not the rewiring, that’s the real bottleneck. I talked last week about factoring group formation out of the various social networks into a common infrastructure. We need to do the same for tagging. How do I know whether to tag my contribution as HillaryClinton and NewHampshire and manufacturing or Hillary Clinton and NH and manufacturing? Where’s the immediate feedback that shows me, across tag-oriented services including YouTube and Blip, how my contribution does or doesn’t align with others, and how I might adjust my tag vocabulary to improve that alignment?
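There’s no standard answer to that question yet, but the flavor of the problem is easy to sketch. Here’s a toy tag canonicalizer — the synonym table is purely hypothetical, hard-coded where a real service would mine cross-site tag co-occurrence to learn which variants mean the same thing:

```python
import re

# Hypothetical abbreviation table; a real alignment service would
# learn these mappings from how tags co-occur across sites.
SYNONYMS = {'nh': 'new hampshire'}

def canonicalize(tag):
    # Split CamelCase ("HillaryClinton" -> "Hillary Clinton"),
    # then lowercase and apply the synonym table.
    spaced = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', tag)
    lowered = spaced.lower().strip()
    return SYNONYMS.get(lowered, lowered)
```

The hard part, of course, isn’t the string-munging — it’s surfacing the feedback loop that tells me my vocabulary does or doesn’t line up with everyone else’s.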

When I tag a video snippet with the name of a politician (“Hillary Clinton”) and a topic (“manufacturing”) I clearly envision a future query in which these slots are filled with the same values or different ones (“Barack Obama”, “energy”). And I clearly envision the kinds of richly-annotated topical remixes that such queries will enable. But such outcomes are not obvious to most people. We need to figure out how to make them obvious.

Retail politics in New Hampshire circa 2007

Hillary Clinton kicked off her campaign this weekend in New Hampshire, and spoke today at the high school in Keene, where I live. Seeing candidates up close and personal is one of the perks of life in small-town New Hampshire, but today it didn’t pan out for me. I arrived early but still couldn’t get into the cafeteria where the event was held. I could have watched the video feed that was piped into the auditorium for a spillover crowd, but instead I went home and watched on the local cable channel.

Here’s a question-and-answer exchange that I captured and put up on Blip.tv:

The question was: “How can government revive and support U.S. manufacturing?” The five-part answer runs almost six-and-a-half minutes. That’s way more time than is ever allotted in the official debates we so obsessively scrutinize.

Retail politics is a wonderful thing, and I wish I’d been there in person. Not everyone who lives in Keene got in, though, and few who live outside Keene did. But those of us connected to the local cable network got to see and hear a whole lot more than the snippets that will air on regular TV. The same will be true in other local communities. Collectively over the course of the various campaigns we’ll see and hear a lot and, in principle, we will be able to collaboratively make sense of it.

By the time the 2008 election rolls around, we ought to be in a position to assemble and review catalogs of these kinds of detailed responses, tagged by candidate and by issue. If you care about manufacturing, you ought to be able to mix yourself a 2-hour show that includes the most informative discourse on the topic from all the candidates. And you should be able to review commentary, from experts who aren’t necessarily the usual TV suspects, that adds value to that discourse.

In practice there’s a fly in the ointment. Are we allowed to republish and categorize this material, as I’ve done here, to provide fodder for decentralized discussion and analysis?

I’m going to check with the guy who runs our local cable channel tomorrow and if there’s a problem I’ll take that video down. But I hope there won’t be a problem. What’s more, I hope that he and his counterparts in other communities will take the issue off the table by choosing appropriate Creative Commons-style licenses for this kind of public-interest material, whether it airs on local cable channels or streams to the Net or both.

A conversation with Antonio Rodriguez about Tabblo, photo albums, and social networks

My guest for this week’s podcast is Antonio Rodriguez, founder of Tabblo, a photo site that’s used to create online photo albums that can be transformed into a variety of print formats.

Among the topics of discussion were:

  • How photo albums tell stories about key events in people’s lives
  • Strategies for archival storage of images
  • Strategies for organizing collections of images
  • The relationship between photo applications that live on the desktop and applications that live in the cloud
  • Whether people share their photos online, and if so, with whom
  • What Tabblo’s layout engine does, and how it might be extended
  • Automatic geotagging

We also revisited a topic we’d discussed earlier in the week, on a panel at the MIT Enterprise Forum. The question, also explored here, is: How might certain features of social networks, notably group formation, be factored out of individual sites and made available in a more federated way?

My first IronPython application

Back in 2004 I wrote a little Python-based web application to do XML-style search of my blog entries. It was a laboratory in which I studied structured search, microformats, in-memory data, and lightweight web applications.

Today I converted that application to IronPython. My purpose is to explore what can be done with the combination of IronPython and the .NET Framework.

I’ve reconstituted the original CPython-based searcher here:

services.jonudell.net:8000/?

The new IronPython-based searcher is here:

services.jonudell.net:8001/?

They look just the same, but you can tell which is which by looking in the browser’s title bar. One says CPython, the other IronPython.

Both are running on a Windows Server 2003 box — the same one, actually, that’s been running the CPython version for the past few years.

The code’s mostly the same too, except for the infrastructure parts. The CPython version uses the simple BaseHTTPServer that comes with Python, and it uses libxml2 for XML parsing and libxslt for XSLT transformation. The IronPython version, instead, uses the .NET Framework’s built-in webserver (System.Net.HttpListener) and XML facilities (System.Xml).

It’s pretty much an apples-to-apples comparison, as far as these things go. Neither version is, or pretends to be, robust or scalable. Both are using bare-bones HTTP implementations in single-threaded mode, which is a technique that I find perfectly appropriate for lots of handy little services and applications that are used lightly.
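For readers who haven’t built one of these, a minimal CPython sketch of that pattern looks like this (in Python 3’s http.server spelling; the handler logic is a stand-in, not the actual searcher):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The real searcher would parse the query string and run the
        # XML search here; this stand-in just echoes the request path.
        body = ('you asked for %s' % self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    # handle_request() in a loop: one request at a time, no threads --
    # fine for a lightly-used personal service.
    server = HTTPServer(('localhost', port), SearchHandler)
    while True:
        server.handle_request()
```

The IronPython version swaps HTTPServer for System.Net.HttpListener, but the shape of the loop is the same.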

The two versions seem to perform about the same on most queries as well, though the IronPython version is way faster when you use the box labeled “all paragraphs containing phrase”.

So what’s the point of this exercise? It demonstrates an unusual approach to using .NET, one that bridges between two very different cultures. In the open source realm, an enormous amount of work gets done in dynamic languages that leverage components, or modules, or libraries, to do the heavy lifting in areas like HTTP and XML. But it’s a big challenge to integrate Python with, say, libxml2, and it’s that same challenge all over again when you want to connect PHP or Ruby to libxml2.

Meanwhile, in the realm of Microsoft-oriented development, most of the work is being done in statically-typed languages. These languages also rely on components, or modules, or libraries to do the heavy lifting. But they can more effectively share the common heavy-lifting capability that’s in the .NET Framework.

The best of both worlds, I believe, is dynamic languages fully connected to common infrastructure. I’m not alone in thinking that, and the Python/.NET combo is not the only way to get there. Sean McGrath has said:

Jython, lest you do not know of it, is the most compelling weapon the Java platform has for its survival into the 21st century. [2004]

Today’s experiment confirms my hunch that IronPython will be at least as compelling, and will open up the .NET Framework to lots of folks for whom the traditional methods of access aren’t appealing.

There was one fly in the ointment. I had wanted to host this IronPython application on the Windows Communication Foundation (WCF) which would provide a much more robust engine than System.Net’s HttpListener. And at first it looked like it would work. But WCF service contracts require the use of a .NET feature called attributes. It turns out there isn’t yet a way to represent those in IronPython. If someone has figured out an intermediary that enables IronPython to implement RESTful WCF services, I’d love to see how that’s done.

Search strategies, part 2

Our web search strategies are largely unconscious. Back in December I dredged one up to take a look at it, and resolved to do that again from time to time. Today’s challenge was to find this article on infomania that I read about a week ago and neglected to bookmark. More specifically, I needed to recall the name Mary Czerwinski, a Microsoft researcher mentioned in the story, because I want to interview her for a podcast.

The multi-step strategy that got me there is subtle, and independent of any particular search engine. Here were the givens:

  1. I thought I’d seen the story on SeattlePI.com.
  2. I thought the researcher was female, and was an organizer of the event that was the subject of the story.
  3. I thought I’d recognize her name if I saw it.
  4. I thought that the word “attention” would appear frequently in the story.

I started with these queries:

“microsoft research” conference on attention

“microsoft research” seminar on interruption

This would have nailed it:

“microsoft research” workshop on infomania

But of course I didn’t recall that it was a workshop rather than a seminar or conference, and the word infomania hadn’t sunk in when I read the article.

Next I tried this:

“microsoft research” “continuous partial attention”

This leads, in any search engine, to Linda Stone, which I knew was a blind alley. I’ve read and heard Linda Stone on the subject of continuous partial attention; I know she’s no longer at Microsoft, and she wasn’t the female researcher in the story. But I figured this query would get me in the neighborhood, that the nimbus of documents surrounding her name would shake something loose. It didn’t.

Next I broadened to:

“microsoft research” attention

This leads, in any search engine, to Eric Horvitz. Note that although Eric Horvitz’s name does appear in the story I was looking for, the word “attention” does not appear in the story.

I wish I could be more precise about what happened next, but the general idea was to explore documents surrounding Eric Horvitz that would contain the name of a female researcher which, when I saw it, would ring a bell. In a couple of clicks I saw the name “Mary Czerwinski” and it did ring a bell. So my final search at SeattlePI.com was for Mary Czerwinski, and the target story was the first hit.

In retrospect I could’ve searched SeattlePI for Eric Horvitz and found the target story as the second hit. I can’t say exactly why I didn’t, but I suspect it’s because I thought exploring the document cluster around Eric Horvitz would be useful for other reasons than to locate Mary.

We perform these kinds of searches every day without thinking much about them, but there’s an amazing amount of stuff going on under the hood. Consider, for example, the aspect of this strategy that involves switching from general search engines to SeattlePI’s search engine. If I was right about the source of the article, that would be a winning strategy because the target would tend to pop up readily in SeattlePI’s engine. If I was wrong, though, it would be a complete waste of time. Some part of my brain calculated that tradeoff. A successful search strategy involves a bunch of those kinds of calculations. How could we surface them from unconsciousness, study them, and optimize them?

Critical mass and social network fatigue

At the MIT Enterprise Forum tomorrow in Boston, I’ll be moderating a panel with three social software entrepreneurs on the topic of getting to critical mass. I want to ask the panelists about overcoming the friction involved in joining and learning to use their services.

Years ago at BYTE Magazine my friend Ben Smith, who was a Unix greybeard even then (now he’s a Unix whitebeard), made a memorable comment that’s always stuck with me. We were in the midst of evaluating a batch of LAN email products. “One of these days,” Ben said in, I think, 1991, “everyone’s going to look up from their little islands of LAN email and see this giant mothership hovering overhead called the Internet.”

Increasingly I’ve begun to feel the same way about the various social networks. How many networks can one person join? How many different identities can one person sanely manage? How many different tagging or photo-uploading or friending protocols can one person deal with?

Recently Gary McGraw echoed Ben Smith’s 1991 observation. “People keep asking me to join the LinkedIn network,” he said, “but I’m already part of a network, it’s called the Internet.”

Now of course LinkedIn offers protocols and features that the open Net doesn’t, at least not yet, and the same is true for all the specialized overlays that we call social networks. But there’s a ton of duplication in those layered protocols and features. If we can’t factor out a bunch of the duplication, I think social network fatigue becomes the major hurdle standing in the way of reaching critical mass.

I’m sure everyone will agree that sign-in protocols should be extracted and made common. What else can and should be refactored? What can’t and shouldn’t?

A conversation with Brian Jones about Office and XML

In 2002 and throughout 2003 I wrote a flurry of InfoWorld articles about the XML features that were being infused into Office, including:

I was excited by the opportunities I saw then, and I still am today, though it has been a slower burn than I’d hoped. In today’s podcast with Brian Jones we discuss what those opportunities are, what’s changed (or hasn’t) over the past few years, and what remains to be done.

This podcast isn’t about the Office Open XML (OOXML) vs. Open Document Format (ODF) controversy that’s been such a hot topic lately. Instead it tackles a broad theme: how, in general, do we unite documents and data on the desktop and on the web?

An object lesson in surface area visibility

A comment from Mark Middleton perfectly illustrates the point I was making the other day about visualizing your published surface area. I started this blog in December, and ever since I’ve been running with a robots.txt file that reads:

User-agent: *
Disallow: /

In other words, no search engine crawlers allowed. Of course that’s not what I intended. I’d simply assumed that the default setting was to allow rather than to block crawlers, and it never occurred to me to check. In retrospect it makes sense. If you’re running a free service like WordPress.com, you might want to restrict crawling to only the blogs whose authors explicitly request it.
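You can verify what a robots.txt policy actually permits with Python’s standard robotparser module — here it’s fed the two lines above directly, so the check runs offline against any URL you care about (the URLs below are illustrative):

```python
try:
    from urllib.robotparser import RobotFileParser  # Python 3
except ImportError:
    from robotparser import RobotFileParser         # Python 2

def crawling_allowed(robots_lines, url, agent='*'):
    # Feed robots.txt lines straight to the parser instead of
    # fetching them, so a policy can be checked before deploying it.
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)
```

Run against the file above, it confirms that every crawler is shut out of every page.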

WordPress.com’s policy notwithstanding, the real issue here is that these complex information membranes we’re extruding into cyberspace are really hard to see and coherently manage.

For the record, the relevant setting in WordPress.com is Options -> Privacy -> Blog visibility -> I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com. Interestingly, although I’ve made that change, it’s not yet reflected in the robots.txt file. I wonder how long that’ll take?

A conversation with Ed Vielmetti and John Blyberg about superpatrons and superlibrarians

Last fall, in Ann Arbor, Michigan, I gave a talk entitled Superpatrons and Superlibrarians. Joining me for this week’s podcast are the two guys who inspired that talk. The superpatron is Ed Vielmetti, an old Internet hand who likes to mash up the services provided by the Ann Arbor District Library. That’s possible because superlibrarian John Blyberg, who works at the AADL, has reconfigured his library’s online catalog system, adding RSS feeds and a full-blown API he calls PatREST.

I’ve written from time to time about Eric von Hippel’s notion of user innovation toolkits and the synergistic relationship between users and developers that can develop around such toolkits. What Ed Vielmetti and John Blyberg are doing with the Ann Arbor District Library is a great example of how that relationship can work.

Update: I meant to call out some of the excellent work that John’s been doing lately. This catalog record is an example of an Amazon-like recommendation feature: “Users who checked out this item also checked out these library items…” Nice!

You’ve also gotta love the experimental card catalog images.

Who can see which parts of my published surface area?

To describe the various projections of ourselves into cyberspace, I use the following metaphor: we’re cells, and we’re growing the surface area of our cellular membranes. Every time I write a blog item, or post a Flickr photo, or tag a resource in del.icio.us, I enlarge the surface area of that membrane. I do it for two reasons. First, because I want influence to flow from me to the world. Second, because I want influence to flow the other way too. I’m soliciting feedback and interaction.

I monitor that feedback using an array of sensors that works surprisingly well. All of the parts of my public membrane can be instrumented with RSS feeds. By tuning into those feeds, I know — fairly immediately and comprehensively — who has touched which parts of my exposed surface area.
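The sensor itself needn’t be fancy. A sketch of the idea — pull (title, date) pairs from an RSS 2.0 feed with nothing but the standard library; a real setup would run inside an aggregator and remember which items it had already seen:

```python
import xml.etree.ElementTree as ET

def feed_items(rss_text):
    # Extract (title, pubDate) pairs from an RSS 2.0 document; a real
    # aggregator would also dedupe against previously-seen items.
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter('item'):
        title = item.findtext('title', '')
        date = item.findtext('pubDate', '')
        items.append((title, date))
    return items
```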

What I can’t do very easily, though, is visualize that entire complex surface. If somebody reacts to something I published years ago on some site I’ve forgotten about, I’m reminded that part of my surface area extends to that site. But it’s only a reactive thing, there’s no proactive way to review the totality of my published corpus. That’d be handy.

It’d be even handier for the parts of my membrane that aren’t fully public. The lack of such a capability, in those cases, is what makes security so hard for people to manage.

For example, I’m still working through the implications of the calendar cross-publishing arrangement I’ve set up for myself. Consider my Outlook calendar. It’s shared within the company by virtue of a default policy that I can view, or modify, by right-clicking the calendar in Outlook and selecting Properties -> Permissions. But it’s also shared with my family by way of a private URL that I created on my WebDAV server and transmitted out-of-band. I can see that private URL by right-clicking and selecting Publish to Internet -> Change Publishing Options, but there’s no indication there of who I gave the private URL to.

This example is just one manifestation of a general problem that cuts across all systems and applications that enable people to selectively expose surface area. There’s no unified way to see and explore that surface area. You have to make a mental inventory of all of the bits that you’ve exposed, you have to individually review the scope of visibility for each bit, and then you have to synthesize a view of what’s visible from a variety of perspectives. Humans are lousy at this kind of thing, computers are good at it, but we haven’t figured out how to enlist the computers to help us do it.

We can at least dream about how a surface-area viewer would work. You’d point it at a blob representing you, and zoom in to resolve the various bits of exposed surface area. There’d be a viewer-impersonation knob that would start at Everyone but could be spun to any of the groups or individuals to which you’ve granted permissions using any of your diverse permission-granting services. You’d spin that knob from one setting to another, and fly around exploring who can cross various parts of the blob’s membrane and who can’t.

I know this would be impractical for all sorts of reasons. But I still want it.