A conversation with Antonio Rodriguez about Tabblo, photo albums, and social networks

My guest for this week’s podcast is Antonio Rodriguez, founder of Tabblo, a photo site that’s used to create online photo albums that can be transformed into a variety of print formats.

Among the topics of discussion were:

  • How photo albums tell stories about key events in people’s lives
  • Strategies for archival storage of images
  • Strategies for organizing collections of images
  • The relationship between photo applications that live on the desktop and applications that live in the cloud
  • Whether people share their photos online, and if so, with whom
  • What Tabblo’s layout engine does, and how it might be extended
  • Automatic geotagging

We also revisited a topic we’d discussed earlier in the week, on a panel at the MIT Enterprise Forum. The question, also explored here, is: How might certain features of social networks, notably group formation, be factored out of individual sites and made available in a more federated way?

My first IronPython application

Back in 2004 I wrote a little Python-based web application to do XML-style search of my blog entries. It was a laboratory in which I studied structured search, microformats, in-memory data, and lightweight web applications.

Today I converted that application to IronPython. My purpose is to explore what can be done with the combination of IronPython and the .NET Framework.

I’ve reconstituted the original CPython-based searcher here:

services.jonudell.net:8000/?

The new IronPython-based searcher is here:

services.jonudell.net:8001/?

They look just the same, but you can tell which is which by looking in the browser’s title bar. One says CPython, the other IronPython.

Both are running on a Windows Server 2003 box — the same one, actually, that’s been running the CPython version for the past few years.

The code’s mostly the same too, except for the infrastructure parts. The CPython version uses the simple BaseHTTPServer that comes with Python, and it uses libxml2 for XML parsing and libxslt for XSLT transformation. The IronPython version, instead, uses the .NET Framework’s built-in webserver (System.Net.HttpListener) and XML facilities (System.Xml).

It’s pretty much an apples-to-apples comparison, as far as these things go. Neither version is, or pretends to be, robust or scalable. Both are using bare-bones HTTP implementations in single-threaded mode, which is a technique that I find perfectly appropriate for lots of handy little services and applications that are used lightly.
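For anyone who hasn’t seen how little code a bare-bones, single-threaded HTTP service needs, here is a minimal sketch. It is not the searcher’s actual code. It uses Python 3’s http.server module, the successor to the BaseHTTPServer module mentioned above, and the query parameter name q, the SearchHandler class, and the make_server helper are all assumptions made for illustration.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


class SearchHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny HTML page (hypothetical handler)."""

    def do_GET(self):
        # "q" is an assumed parameter name; the real application's
        # parameters are not shown in the post.
        query = parse_qs(urlparse(self.path).query)
        phrase = query.get("q", [""])[0]
        page = (
            "<html><head><title>CPython</title></head>"
            "<body>You searched for: %s</body></html>" % html.escape(phrase)
        )
        body = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(port=8000):
    # HTTPServer handles one request at a time: no threads, no forking.
    return HTTPServer(("127.0.0.1", port), SearchHandler)
```

Calling make_server().serve_forever() would start the service on port 8000. Because requests are handled strictly one after another, a slow query blocks everyone else, which is tolerable only for lightly used services.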

The two versions seem to perform about the same on most queries as well, though the IronPython version is way faster when you use the box labeled “all paragraphs containing phrase”.
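To give a feel for what a query like “all paragraphs containing phrase” involves, here is a rough sketch using Python’s standard xml.etree.ElementTree rather than libxml2 or System.Xml. The entries/item/title/body/p layout and the paragraphs_containing function are assumptions for illustration; the real archive’s structure and the original query code aren’t shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the blog archive.
SAMPLE = """<entries>
  <item date="2004-01-05"><title>Structured search</title>
    <body><p>XPath makes structured search possible.</p><p>Another paragraph.</p></body>
  </item>
  <item date="2004-02-10"><title>Microformats</title>
    <body><p>Lightweight web applications and structured search go together.</p></body>
  </item>
</entries>"""


def paragraphs_containing(root, phrase):
    """Return (entry title, paragraph text) for each <p> containing phrase."""
    phrase = phrase.lower()
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for p in item.iter("p"):
            # itertext() also collects text nested inside inline markup.
            text = "".join(p.itertext())
            if phrase in text.lower():
                hits.append((title, text))
    return hits


# Parse once and keep the tree in memory; every query walks the same tree.
root = ET.fromstring(SAMPLE)
results = paragraphs_containing(root, "structured search")
```

This kind of query touches the text of every paragraph in the collection, so its speed depends heavily on how quickly the underlying XML library can walk the tree.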

So what’s the point of this exercise? It demonstrates an unusual approach to using .NET, one that bridges between two very different cultures. In the open source realm, an enormous amount of work gets done in dynamic languages that leverage components, or modules, or libraries, to do the heavy lifting in areas like HTTP and XML. But it’s a big challenge to integrate Python with, say, libxml2, and it’s that same challenge all over again when you want to connect PHP or Ruby to libxml2.

Meanwhile, in the realm of Microsoft-oriented development, most of the work is being done in statically-typed languages. These languages also rely on components, or modules, or libraries to do the heavy lifting. But they can more effectively share the common heavy-lifting capability that’s in the .NET Framework.

The best of both worlds, I believe, is dynamic languages fully connected to common infrastructure. I’m not alone in thinking that, and the Python/.NET combo is not the only way to get there. Sean McGrath has said:

Jython, lest you do not know of it, is the most compelling weapon the Java platform has for its survival into the 21st century. [2004]

Today’s experiment confirms my hunch that IronPython will be at least as compelling, and will open up the .NET Framework to lots of folks for whom the traditional methods of access aren’t appealing.

There was one fly in the ointment. I had wanted to host this IronPython application on the Windows Communication Foundation (WCF), which would provide a much more robust engine than System.Net’s HttpListener. And at first it looked like it would work. But WCF service contracts require the use of a .NET feature called attributes. It turns out there isn’t yet a way to represent those in IronPython. If someone has figured out an intermediary that enables IronPython to implement RESTful WCF services, I’d love to see how that’s done.

Search strategies, part 2

Our web search strategies are largely unconscious. Back in December I dredged one up to take a look at it, and resolved to do that again from time to time. Today’s challenge was to find this article on infomania that I read about a week ago and neglected to bookmark. More specifically, I needed to recall the name Mary Czerwinski, a Microsoft researcher mentioned in the story, because I want to interview her for a podcast.

The multi-step strategy that got me there is subtle, and independent of any particular search engine. Here were the givens:

  1. I thought I’d seen the story on SeattlePI.com.
  2. I thought the researcher was female, and was an organizer of the event that was the subject of the story.
  3. I thought I’d recognize her name if I saw it.
  4. I thought that the word “attention” would appear frequently in the story.

I started with these queries:

“microsoft research” conference on attention

“microsoft research” seminar on interruption

This would have nailed it:

“microsoft research” workshop on infomania

But of course I didn’t recall that it was a workshop rather than a seminar or conference, and the word infomania hadn’t sunk in when I read the article.

Next I tried this:

“microsoft research” “continuous partial attention”

This leads, in any search engine, to Linda Stone, which I knew was a blind alley. I’ve read and heard Linda Stone on the subject of continuous partial attention, and I know she’s no longer at Microsoft and wasn’t the female researcher in the story. But I figured this query would get me in the neighborhood, that the nimbus of documents surrounding her name would shake something loose. It didn’t.

Next I broadened to:

“microsoft research” attention

This leads, in any search engine, to Eric Horvitz. Note that although Eric Horvitz’s name does appear in the story I was looking for, the word “attention” does not appear in the story.

I wish I could be more precise about what happened next, but the general idea was to explore documents surrounding Eric Horvitz that would contain the name of a female researcher which, when I saw it, would ring a bell. In a couple of clicks I saw the name “Mary Czerwinski” and it did ring a bell. So my final search at SeattlePI.com was for Mary Czerwinski, and the target story was the first hit.

In retrospect I could’ve searched SeattlePI for Eric Horvitz and found the target story as the second hit. I can’t say exactly why I didn’t, but I suspect it’s because I thought exploring the document cluster around Eric Horvitz would be useful for reasons other than locating Mary.

We perform these kinds of searches every day without thinking much about them, but there’s an amazing amount of stuff going on under the hood. Consider, for example, the aspect of this strategy that involves switching from general search engines to SeattlePI’s search engine. If I was right about the source of the article, that would be a winning strategy because the target would tend to pop up readily in SeattlePI’s engine. If I was wrong, though, it would be a complete waste of time. Some part of my brain calculated that tradeoff. A successful search strategy involves a bunch of those kinds of calculations. How could we surface them from unconsciousness, study them, and optimize them?

Critical mass and social network fatigue

At the MIT Enterprise Forum tomorrow in Boston, I’ll be moderating a panel with three social software entrepreneurs on the topic of getting to critical mass. I want to ask the panelists about overcoming the friction involved in joining and learning to use their services.

Years ago at BYTE Magazine my friend Ben Smith, who was a Unix greybeard even then (now he’s a Unix whitebeard), made a memorable comment that’s always stuck with me. We were in the midst of evaluating a batch of LAN email products. “One of these days,” Ben said in, I think, 1991, “everyone’s going to look up from their little islands of LAN email and see this giant mothership hovering overhead called the Internet.”

Increasingly I’ve begun to feel the same way about the various social networks. How many networks can one person join? How many different identities can one person sanely manage? How many different tagging or photo-uploading or friending protocols can one person deal with?

Recently Gary McGraw echoed Ben Smith’s 1991 observation. “People keep asking me to join the LinkedIn network,” he said, “but I’m already part of a network, it’s called the Internet.”

Now of course LinkedIn offers protocols and features that the open Net doesn’t, at least not yet, and the same is true for all the specialized overlays that we call social networks. But there’s a ton of duplication in those layered protocols and features. If we can’t factor out a bunch of the duplication, I think social network fatigue becomes the major hurdle standing in the way of reaching critical mass.

I’m sure everyone will agree that sign-in protocols should be extracted and made common. What else can and should be refactored? What can’t and shouldn’t?

A conversation with Brian Jones about Office and XML

In 2002 and throughout 2003 I wrote a flurry of InfoWorld articles about the XML features that were being infused into Office, including:

I was excited by the opportunities I saw then, and I still am today, though it has been a slower burn than I’d hoped. In today’s podcast with Brian Jones we discuss what those opportunities are, what’s changed (or hasn’t) over the past few years, and what remains to be done.

This podcast isn’t about the Office Open XML (OOXML) vs. Open Document Format (ODF) controversy that’s been such a hot topic lately. Instead it tackles a broad theme: how, in general, do we unite documents and data on the desktop and on the web?

An object lesson in surface area visibility

A comment from Mark Middleton perfectly illustrates the point I was making the other day about visualizing your published surface area. I started this blog in December, and ever since I’ve been running with a robots.txt file that reads:

User-agent: *
Disallow: /
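One way to check what a robots.txt file really permits is to ask a parser instead of assuming. The sketch below uses Python’s standard urllib.robotparser; the crawl_allowed helper and the example.com URL are placeholders introduced here, not part of any WordPress.com tooling.

```python
from urllib.robotparser import RobotFileParser

# The file quoted above: every crawler is barred from every path.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

# For contrast, an empty Disallow value permits everything.
ALLOW_ALL = """User-agent: *
Disallow:
"""


def crawl_allowed(robots_txt, url, agent="*"):
    """Report whether the given user agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder URL, not the blog's real address.
url = "http://example.com/2007/01/some-post/"
blocked_result = crawl_allowed(BLOCK_ALL, url, agent="Googlebot")
open_result = crawl_allowed(ALLOW_ALL, url, agent="Googlebot")
```

With the first file, crawl_allowed reports that nothing on the site may be fetched. With the second, everything is open.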

In other words, no search engine crawlers allowed. Of course that’s not what I intended. I’d simply assumed that the default setting was to allow rather than to block crawlers, and it never occurred to me to check. In retrospect it makes sense. If you’re running a free service like WordPress.com, you might want to restrict crawling to only the blogs whose authors explicitly request it.

WordPress.com’s policy notwithstanding, the real issue here is that these complex information membranes we’re extruding into cyberspace are really hard to see and coherently manage.

For the record, the relevant setting in WordPress.com is Options -> Privacy -> Blog visibility -> I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com. Interestingly, although I’ve made that change, it’s not yet reflected in the robots.txt file. I wonder how long that’ll take?

A conversation with Ed Vielmetti and John Blyberg about superpatrons and superlibrarians

Last fall, in Ann Arbor, Michigan, I gave a talk entitled Superpatrons and Superlibrarians. Joining me for this week’s podcast are the two guys who inspired that talk. The superpatron is Ed Vielmetti, an old Internet hand who likes to mash up the services provided by the Ann Arbor District Library. That’s possible because superlibrarian John Blyberg, who works at the AADL, has reconfigured his library’s online catalog system, adding RSS feeds and a full-blown API he calls PatREST.

I’ve written from time to time about Eric von Hippel’s notion of user innovation toolkits and the synergistic relationship between users and developers that can develop around such toolkits. What Ed Vielmetti and John Blyberg are doing with Ann Arbor District Library is a great example of how that relationship can work.

Update: I meant to call out some of the excellent work that John’s been doing lately. This catalog record is an example of an Amazon-like recommendation feature: “Users who checked out this item also checked out these library items…” Nice!

You’ve also gotta love the experimental card catalog images.