Truth, files, microformats, and XMP

In 2005 I noted the following two definitions of truth:

1. WinFS architect Quentin Clark: “We [i.e. the WinFS database] are the truth.”

2. Indigo architect Don Box: “Message, oh Message / The Truth Is On The Wire / There Is Nothing Else”

Today I’m adding a third definition:

3. Scott Dart, program manager for the Vista Photo Gallery: “The truth is in the file.”

What Scott means is that although image metadata is cached in a database, so that Photo Gallery can search and organize quickly, the canonical location for metadata, including tags, is the file itself. As a result, when you use Photo Gallery to tag your images, you’re making an investment in the image files themselves. If you copy those files to another machine, or upload them to the Net, the tags will travel with those image files. Other applications will be able to make them visible and editable, and those edits can flow back to your local store if you transfer the files back.

That’s huge. It’s also, of course, a bit more complicated. As Scott explains, there are different flavors of metadata: EXIF, IPTC, and the new favorite, XMP. And not all image formats can embed image metadata. In fact many popular formats can’t, including PNG, GIF, and BMP. [Update: Incorrect, see next rock.] But JPG can, and it’s a wonderful thing to behold.
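
By the way, it's easy to check whether a given JPEG already carries an XMP packet. The packet lives in an APP1 segment whose payload begins with the XMP namespace URI, so a crude substring scan will usually find it. Here's a minimal CPython sketch (the filename is the example used below):

def has_xmp(jpg):
  # crude but effective: look for the namespace URI that introduces
  # an embedded XMP packet in a JPEG's APP1 segment
  data = open(jpg, 'rb').read()
  return 'http://ns.adobe.com/xap/1.0/' in data

print has_xmp('yellowflower.jpg')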

For example, I selected a picture of a yellow flower in Photo Gallery and tagged it with flower. Here’s the XML that showed up inside yellowflower.jpg:

<xmp:xmpmeta xmlns:xmp="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:subject>
<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li>flower</rdf:li></rdf:Bag>
</dc:subject>
</rdf:Description>
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:LastKeywordXMP>
  <rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:li>flower</rdf:li>
  </rdf:Bag>
  </MicrosoftPhoto:LastKeywordXMP>
</rdf:Description>
<rdf:Description xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:Rating>1</MicrosoftPhoto:Rating>
 </rdf:Description>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/">
  <xmp:Rating>1</xmp:Rating>
</rdf:Description>
</rdf:RDF>
</xmp:xmpmeta>

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.
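
Mish-mash or not, the packet is just XML, so you don't need WPF to get at it. Here's a CPython sketch that digs the packet out of the file and lists the dc:subject keywords, using nothing but the standard library (it assumes the whole file fits comfortably in memory):

import re
from xml.dom import minidom

def xmp_keywords(jpg):
  # extract the raw XMP packet and parse it as XML
  data = open(jpg, 'rb').read()
  m = re.search(r'<x(?:mp)?:xmpmeta.*?</x(?:mp)?:xmpmeta>', data, re.DOTALL)
  if m is None:
    return []
  doc = minidom.parseString(m.group(0))
  dc  = 'http://purl.org/dc/elements/1.1/'
  rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  # collect every rdf:li inside every dc:subject bag
  tags = []
  for subject in doc.getElementsByTagNameNS(dc, 'subject'):
    for li in subject.getElementsByTagNameNS(rdf, 'li'):
      tags.append(li.firstChild.data)
  return tags

print xmp_keywords('yellowflower.jpg')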

Perhaps we’ve been looking in the wrong places for the first microformat to achieve liftoff. Many of us hoped hCalendar would, but it’s hard to argue that it has. I suppose that’s partly because even though we have a variety of online event services that produce the hCalendar format, there just aren’t that many people publishing and annotating that many events.

There are already a lot of people saving, publishing, and annotating photos. And the tagging interface in Vista’s Photo Gallery, which is really sweet, is about to recruit a whole lot more.

There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata. In the example above, the tag flower was assigned interactively in Photo Gallery. Here’s an IronPython script to read that tag and change it to iris.

import clr
clr.AddReferenceByPartialName("PresentationCore")
from System.IO import FileStream, FileMode, FileAccess, FileShare
from System.Windows.Media.Imaging import JpegBitmapDecoder, \
  BitmapCreateOptions, BitmapCacheOption


def ReadFirstTag(jpg):
  # decode the JPEG without touching its pixels, then read the first
  # element of the XMP dc:subject bag -- that's the first tag
  f = FileStream(jpg, FileMode.Open)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  tag = frame.Metadata.GetQuery("/xmp/dc:subject/{int=0}")
  f.Close()
  return tag


def WriteFirstTag(jpg,tag):
  # reopen the JPEG read/write and overwrite the first dc:subject entry
  f = FileStream(jpg, FileMode.Open, FileAccess.ReadWrite, 
    FileShare.ReadWrite)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  writer = frame.CreateInPlaceBitmapMetadataWriter()
  writer.SetQuery("/xmp/dc:subject/{int=0}", tag)
  # TrySave returns False if the new value won't fit in the file's
  # existing metadata padding, rather than raising an exception
  if not writer.TrySave():
    print "cannot save metadata"
  f.Close()

print ReadFirstTag('yellowflower.jpg') 
WriteFirstTag('yellowflower.jpg','iris')
print ReadFirstTag('yellowflower.jpg')

The output of this script is:

flower
iris

And when you revisit the photo in Photo Gallery, the tag has indeed changed from flower to iris. Very cool.

Adaptive user interfaces for focused attention

The goal of the search strategy I outlined the other day was to find Mary Czerwinski, a Microsoft researcher, and interview her for a podcast. I did find her, and the resulting podcast is here. We had a great time talking about ways that adaptive user interfaces can leverage spatial and temporal memory, about ambient awareness of team activity, and about the proper role of interruptions in the modern work environment.


In the course of the conversation I mentioned WriteRoom and the notion of a distraction-free desktop. Lately I find myself powerfully attracted to Zen simplicity, and I wondered how that impulse might square with the new Office ribbon. It’s a great improvement over the conventional menu system, but I wondered whether there was a quick and easy way to suppress the ribbon when you want to achieve the WriteRoom effect.

It turns out that there are several ways to do that, and I documented them in this short screencast.

Now that I’ve learned how to use the ribbon selectively, there’s one piece of unfinished business. In Vista, as in Windows XP, you can hide the desktop icons by right-clicking the desktop and choosing View->Show Desktop Icons. But in order to really incorporate this feature into your workflow you’d like to have it on a hotkey, like WindowsKey+M, which instantly minimizes all open windows.

Jeff Ullmann had written to me a while ago with a solution based on the Windows Scripting Host, but the registry layout that it depends on is different in Vista. So, how can you make a clean-desktop hotkey in Vista? I’ve seen the question asked in various places but as yet have found no answers. If you’ve got the recipe I’d love to see it.

Annotate the web, then rewire it

In an essay last week about Yahoo Pipes, Tim O’Reilly said he was inspired, back in 1997, by a talk at the first Perl conference in which I had “expressed a vision of web sites as data sources that could be re-used, and of a new programming paradigm that took the whole internet as its platform.” Someone asked in the comments whether that idea hadn’t instead been put forward in Andrew Schulman’s talk. It turns out that neither Tim nor I can remember exactly what Andrew and I said, but I hope we both touched on this idea because it’s a big one that underlies the whole web services movement and much else besides.

Later on in that comment thread, Tim cites an email message from me in which I try to reconstruct what may have happened. One of the artifacts I dug up was this 1996 BYTE column (cleaner version here). That’s when the lightbulb clicked on for me, and I saw very clearly that the web was a collection of components that I’d be able to wire together.

Of course all I was doing was drawing attention to what the creators of the web had intended and made possible. In my recent interview with Roy Fielding, for example, we talked about his early work on libwww-perl, the library that made websites into playthings for Perl programmers. Wiring the web was very much part of the original vision. The idea just needed some champions to broaden its appeal. That’s the role that I, among others, have played.

From that perspective, then, what of Yahoo Pipes? It delights me! Much more importantly, I think it could ultimately appeal to non-technical folks, but there are some conceptual barriers to overcome. The concept of “wiring the web” is one of those, but not the first one. The dominant way in which most people will “program” the web is by writing metadata, not code, and we’ll need an interface as friendly and powerful as Pipes to help them do that.

That last sentence won’t make any sense to the average non-technical person, but the example I gave yesterday might. A by-product of this presidential election cycle will be massive quantities of online video. We should expect to be able to reach into the various repositories and assemble coherent views by issue and by candidate, and Yahoo Pipes would be a great way to do that. But not until and unless the video has been sliced and diced and tagged appropriately so as to yield to structured search.

It’s the slicing and dicing and tagging, not the rewiring, that’s the real bottleneck. I talked last week about factoring group formation out of the various social networks into a common infrastructure. We need to do the same for tagging. How do I know whether to tag my contribution as HillaryClinton and NewHampshire and manufacturing or Hillary Clinton and NH and manufacturing? Where’s the immediate feedback that shows me, across tag-oriented services including YouTube and Blip, how my contribution does or doesn’t align with others, and how I might adjust my tag vocabulary to improve that alignment?
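
To make the question concrete, here's a toy sketch of the kind of feedback loop I have in mind. The counts are invented, standing in for what tag-oriented services could report in aggregate; the point is that simple normalization catches some variants (HillaryClinton vs. Hillary Clinton) but not others (NH vs. New Hampshire), which is exactly why we need shared infrastructure for this:

import re

# invented counts, standing in for aggregate data from tagging services
community_counts = {
  'Hillary Clinton' : 1200, 'HillaryClinton' : 90,
  'New Hampshire'   : 800,  'NH'             : 350,
  'manufacturing'   : 400,
}

def normalize(tag):
  # split CamelCase, fold case, collapse whitespace
  tag = re.sub(r'([a-z])([A-Z])', r'\1 \2', tag)
  return ' '.join(tag.lower().split())

def suggest(tag):
  # report the community's most popular spelling of this tag
  key = normalize(tag)
  variants = [(count, t) for (t, count) in community_counts.items()
              if normalize(t) == key]
  if not variants:
    return tag
  variants.sort()
  return variants[-1][1]

print suggest('HillaryClinton')   # -> Hillary Clinton
print suggest('NH')               # -> NH (normalization alone can't bridge to New Hampshire)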

When I tag a video snippet with the name of a politician (“Hillary Clinton”) and a topic (“manufacturing”) I clearly envision a future query in which these slots are filled with the same values or different ones (“Barack Obama”, “energy”). And I clearly envision the kinds of richly-annotated topical remixes that such queries will enable. But such outcomes are not obvious to most people. We need to figure out how to make them obvious.

Retail politics in New Hampshire circa 2007

Hillary Clinton kicked off her campaign this weekend in New Hampshire, and spoke today at the high school in Keene, where I live. Seeing candidates up close and personal is one of the perks of life in small-town New Hampshire, but today it didn’t pan out for me. I arrived early but still couldn’t get into the cafeteria where the event was held. I could have watched the video feed that was piped into the auditorium for a spillover crowd, but instead I went home and watched on the local cable channel.

Here’s a question-and-answer exchange that I captured and put up on Blip.tv:

The question was: “How can government revive and support U.S. manufacturing?” The five-part answer runs almost six-and-a-half minutes. That’s way more time than is ever allotted in the official debates we so obsessively scrutinize.

Retail politics is a wonderful thing, and I wish I’d been there in person. Not everyone who lives in Keene got in, though, and few who live outside Keene did. But those of us connected to the local cable network got to see and hear a whole lot more than the snippets that will air on regular TV. The same will be true in other local communities. Collectively over the course of the various campaigns we’ll see and hear a lot and, in principle, we will be able to collaboratively make sense of it.

By the time the 2008 election rolls around, we ought to be in a position to assemble and review catalogs of these kinds of detailed responses, tagged by candidate and by issue. If you care about manufacturing, you ought to be able to mix yourself a 2-hour show that includes the most informative discourse on the topic from all the candidates. And you should be able to review commentary, from experts who aren’t necessarily the usual TV suspects, that adds value to that discourse.
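
Here's a sketch of the kind of query I have in mind, run over invented clip records; a real version would pull these from tagged video services like Blip and YouTube:

# invented clip records: (candidate, issue, minutes, url)
clips = [
  ('Hillary Clinton', 'manufacturing', 6.5, 'http://example.org/clip1'),
  ('Barack Obama',    'energy',        4.0, 'http://example.org/clip2'),
  # ... one record per tagged question-and-answer exchange
]

def build_show(issue, max_minutes=120):
  # gather clips tagged with the issue until the time budget is spent
  show, total = [], 0
  for (candidate, clip_issue, minutes, url) in clips:
    if clip_issue == issue and total + minutes <= max_minutes:
      show.append((candidate, url))
      total = total + minutes
  return show

for (candidate, url) in build_show('manufacturing'):
  print candidate, url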

In practice there’s a fly in the ointment. Are we allowed to republish and categorize this material, as I’ve done here, to provide fodder for decentralized discussion and analysis?

I’m going to check with the guy who runs our local cable channel tomorrow and if there’s a problem I’ll take that video down. But I hope there won’t be a problem. What’s more, I hope that he and his counterparts in other communities will take the issue off the table by choosing appropriate Creative Commons-style licenses for this kind of public-interest material, whether it airs on local cable channels or streams to the Net or both.

A conversation with Antonio Rodriguez about Tabblo, photo albums, and social networks

My guest for this week’s podcast is Antonio Rodriguez, founder of Tabblo, a photo site that’s used to create online photo albums that can be transformed into a variety of print formats.

Among the topics of discussion were:

  • How photo albums tell stories about key events in people’s lives
  • Strategies for archival storage of images
  • Strategies for organizing collections of images
  • The relationship between photo applications that live on the desktop and applications that live in the cloud
  • Whether people share their photos online, and if so, with whom
  • What Tabblo’s layout engine does, and how it might be extended
  • Automatic geotagging

We also revisited a topic we’d discussed earlier in the week, on a panel at the MIT Enterprise Forum. The question, also explored here, is: How might certain features of social networks, notably group formation, be factored out of individual sites and made available in a more federated way?

My first IronPython application

Back in 2004 I wrote a little Python-based web application to do XML-style search of my blog entries. It was a laboratory in which I studied structured search, microformats, in-memory data, and lightweight web applications.

Today I converted that application to IronPython. My purpose is to explore what can be done with the combination of IronPython and the .NET Framework.

I’ve reconstituted the original CPython-based searcher here:

services.jonudell.net:8000/?

The new IronPython-based searcher is here:

services.jonudell.net:8001/?

They look just the same, but you can tell which is which by looking in the browser’s title bar. One says CPython, the other IronPython.

Both are running on a Windows Server 2003 box — the same one, actually, that’s been running the CPython version for the past few years.

The code’s mostly the same too, except for the infrastructure parts. The CPython version uses the simple BaseHTTPServer that comes with Python, and it uses libxml2 for XML parsing and libxslt for XSLT transformation. The IronPython version, instead, uses the .NET Framework’s built-in webserver (System.Net.HttpListener) and XML facilities (System.Xml).

It’s pretty much an apples-to-apples comparison, as far as these things go. Neither version is, or pretends to be, robust or scalable. Both are using bare-bones HTTP implementations in single-threaded mode, which is a technique that I find perfectly appropriate for lots of handy little services and applications that are used lightly.
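
For the curious, the heart of the IronPython version is just a loop around HttpListener. Here's a stripped-down sketch of that pattern, not the searcher itself, and the prefix is a placeholder:

import clr
clr.AddReference("System")
from System.Net import HttpListener
from System.Text import Encoding

listener = HttpListener()
listener.Prefixes.Add("http://localhost:8001/")
listener.Start()
while True:
  context = listener.GetContext()   # blocks until a request arrives
  query = context.Request.Url.Query
  # the real searcher parses the query and runs it against in-memory XML
  body = Encoding.UTF8.GetBytes("<html><body>you asked for: %s</body></html>" % query)
  context.Response.ContentType = "text/html"
  context.Response.OutputStream.Write(body, 0, body.Length)
  context.Response.Close()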

The two versions seem to perform about the same on most queries as well, though the IronPython version is way faster when you use the box labeled “all paragraphs containing phrase”.

So what’s the point of this exercise? It demonstrates an unusual approach to using .NET, one that bridges between two very different cultures. In the open source realm, an enormous amount of work gets done in dynamic languages that leverage components, or modules, or libraries, to do the heavy lifting in areas like HTTP and XML. But it’s a big challenge to integrate Python with, say, libxml2, and it’s that same challenge all over again when you want to connect PHP or Ruby to libxml2.

Meanwhile, in the realm of Microsoft-oriented development, most of the work is being done in statically-typed languages. These languages also rely on components, or modules, or libraries to do the heavy lifting. But they can more effectively share the common heavy-lifting capability that’s in the .NET Framework.

The best of both worlds, I believe, is dynamic languages fully connected to common infrastructure. I’m not alone in thinking that, and the Python/.NET combo is not the only way to get there. Sean McGrath has said:

Jython, lest you do not know of it, is the most compelling weapon the Java platform has for its survival into the 21st century. [2004]

Today’s experiment confirms my hunch that IronPython will be at least as compelling, and will open up the .NET Framework to lots of folks for whom the traditional methods of access aren’t appealing.

There was one fly in the ointment. I had wanted to host this IronPython application on the Windows Communication Foundation (WCF), which would provide a much more robust engine than System.Net’s HttpListener. And at first it looked like it would work. But WCF service contracts require the use of a .NET feature called attributes, and it turns out there isn’t yet a way to represent those in IronPython. If someone has figured out an intermediary that enables IronPython to implement RESTful WCF services, I’d love to see how that’s done.

Search strategies, part 2

Our web search strategies are largely unconscious. Back in December I dredged one up to take a look at it, and resolved to do that again from time to time. Today’s challenge was to find this article on infomania that I read about a week ago and neglected to bookmark. More specifically, I needed to recall the name Mary Czerwinski, a Microsoft researcher mentioned in the story, because I want to interview her for a podcast.

The multi-step strategy that got me there is subtle, and independent of any particular search engine. Here were the givens:

  1. I thought I’d seen the story on SeattlePI.com.
  2. I thought the researcher was female, and was an organizer of the event that was the subject of the story.
  3. I thought I’d recognize her name if I saw it.
  4. I thought that the word “attention” would appear frequently in the story.

I started with these queries:

“microsoft research” conference on attention

“microsoft research” seminar on interruption

This would have nailed it:

“microsoft research” workshop on infomania

But of course I didn’t recall that it was a workshop rather than a seminar or conference, and the word infomania hadn’t sunk in when I read the article.

Next I tried this:

“microsoft research” “continuous partial attention”

This leads, in any search engine, to Linda Stone, which I knew was a blind alley. I’ve read and heard Linda Stone on the subject of continuous partial attention, and I know she’s no longer at Microsoft and wasn’t the female researcher in the story. But I figured this query would get me in the neighborhood, that the nimbus of documents surrounding her name would shake something loose. It didn’t.

Next I broadened to:

“microsoft research” attention

This leads, in any search engine, to Eric Horvitz. Note that although Eric Horvitz’s name does appear in the story I was looking for, the word “attention” does not appear in the story.

I wish I could be more precise about what happened next, but the general idea was to explore documents surrounding Eric Horvitz that would contain the name of a female researcher which, when I saw it, would ring a bell. In a couple of clicks I saw the name “Mary Czerwinski” and it did ring a bell. So my final search at SeattlePI.com was for Mary Czerwinski, and the target story was the first hit.

In retrospect I could’ve searched SeattlePI for Eric Horvitz and found the target story as the second hit. I can’t say exactly why I didn’t, but I suspect it’s because I thought exploring the document cluster around Eric Horvitz would be useful for reasons other than locating Mary.

We perform these kinds of searches every day without thinking much about them, but there’s an amazing amount of stuff going on under the hood. Consider, for example, the aspect of this strategy that involves switching from general search engines to SeattlePI’s search engine. If I was right about the source of the article, that would be a winning strategy because the target would tend to pop up readily in SeattlePI’s engine. If I was wrong, though, it would be a complete waste of time. Some part of my brain calculated that tradeoff. A successful search strategy involves a bunch of those kinds of calculations. How could we surface them from unconsciousness, study them, and optimize them?