Talking with Cathy Marshall about tags, digital archiving, and lifestreams

My guest for this week’s Innovators show is Cathy Marshall, a Senior Researcher in Microsoft’s Silicon Valley Lab. She’s long been intrigued by personal information management — and nowadays, also by its social dimension.

We kicked off the conversation with a discussion of her recent paper Do Tags Work?. (See also her slides from a talk about the project.) This was a clever study in which she collected a bunch of Flickr photos of people spinning on the bull’s balls in Milan. Notice how that fulltext query effectively retrieves a pile of images, taken by different people, of the same curious custom:

If you are passing through the Galleria Vittorio Emanuele II, you should spin around on the testicles of the bull mosaic found in the centre. Legend has it that this will bring you good luck!

Now try this query, which uses the same terms but looks at tags instead of the free text (title, description) associated with the photos. It finds nothing.
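To see why the same terms can succeed against free text and fail against tags, here is a toy sketch in Python. The photo records are invented for illustration (they are not real Flickr data), and the matching rules are simplified assumptions: a case-insensitive substring match over title and description for the full-text query, and an exact tag match for the tag query. Flickr's actual search behaves differently in its details.

```python
# Hypothetical photo records, invented to illustrate the point (not real Flickr data).
PHOTOS = [
    {"title": "Spinning on the bull in the Galleria",
     "description": "Grinding a heel into the mosaic bull for luck, Milan",
     "tags": ["milan", "italy", "vacation2008"]},
    {"title": "Good luck spin",
     "description": "Galleria Vittorio Emanuele II: spin on the bull's balls",
     "tags": ["galleria", "travel"]},
    {"title": "Duomo at dusk",
     "description": "Cathedral square, Milan",
     "tags": ["duomo", "milan", "bull"]},
]


def fulltext_search(photos, terms):
    """Return photos whose title or description contains every term
    (case-insensitive substring match, so 'spin' also matches 'Spinning')."""
    hits = []
    for p in photos:
        text = f"{p['title']} {p['description']}".lower()
        if all(t.lower() in text for t in terms):
            hits.append(p)
    return hits


def tag_search(photos, terms):
    """Return photos carrying every term as an exact tag (case-insensitive)."""
    hits = []
    for p in photos:
        tags = {t.lower() for t in p["tags"]}
        if all(t.lower() in tags for t in terms):
            hits.append(p)
    return hits


query = ["bull", "spin"]
print([p["title"] for p in fulltext_search(PHOTOS, query)])
# ['Spinning on the bull in the Galleria', 'Good luck spin']
print([p["title"] for p in tag_search(PHOTOS, query)])
# []
```

The photographers described the custom in their titles and captions, but each tagged the photo from a personal vantage point (a trip, a city, a landmark), so nobody's tags share the query's vocabulary.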

Cathy concludes that while many people think tags are effective hooks for information retrieval, they really aren’t.

Of course, those of us who attend conferences where the first order of business is to announce a tag know that tags can be a very effective way to aggregate all the blog postings, tweets, and photos associated with an event. Folksonomies that aren’t intended to converge don’t. Those that are meant to converge do, quite dramatically, which is why I’ve long been obsessed with intentional tagging as an enabler of loosely coupled collaboration.

In the second half of the conversation, we discussed personal digital archiving, curation, benign neglect, and lifestreams. Cathy tells a lot of stories about the ways in which people do, and also don’t, take care of their digital stuff. She observes, for example, that when people lose the contents of a computer, they react initially with horror, but then often feel a sense of relief. It turns out a lot of what was there wasn’t really needed. The burden of culling through it is lifted, and the guilt associated with not doing that culling goes away.

(I laughed harder than I have in a long time when Cathy described rental storage units as “garbage cans you pay for, and then when you realize you no longer care about the stuff in them, you stop paying for.”)

We ended by agreeing that the hardest thing about introducing a hosted lifebits service ecosystem will be the conceptual model. For psychological reasons, people will want to think in terms of monolithic containers that keep stuff in one place, and monolithic services that do everything related to that stuff. For architectural reasons, though, we’ll want to federate storage, and also decouple classes of service — so that storage, for example, is orthogonal to access control and authorization, which is orthogonal to social interaction.
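As a rough illustration of that architectural point, here is a minimal Python sketch in which federated storage and access control are separate services that a front end merely composes. Every class and method name (`InMemoryStore`, `FederatedStore`, `AccessPolicy`, `LifebitsService`) is hypothetical and invented for this example; the social-interaction layer is left out for brevity.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal storage interface: put and get bytes by key."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one storage provider (hypothetical)."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class FederatedStore:
    """Routes each key to a backend chosen by its first path segment,
    so photos and mail can live with different providers."""
    def __init__(self, backends: dict[str, BlobStore]) -> None:
        self._backends = backends

    def _backend(self, key: str) -> BlobStore:
        return self._backends[key.split("/", 1)[0]]

    def put(self, key: str, data: bytes) -> None:
        self._backend(key).put(key, data)

    def get(self, key: str) -> bytes:
        return self._backend(key).get(key)


class AccessPolicy:
    """Authorization kept independent of storage: who may read which key."""
    def __init__(self) -> None:
        self._readers: dict[str, set[str]] = {}

    def grant(self, key: str, user: str) -> None:
        self._readers.setdefault(key, set()).add(user)

    def can_read(self, key: str, user: str) -> bool:
        return user in self._readers.get(key, set())


class LifebitsService:
    """Composes independent services rather than owning storage itself."""
    def __init__(self, store: BlobStore, policy: AccessPolicy) -> None:
        self.store = store
        self.policy = policy

    def read(self, key: str, user: str) -> bytes:
        if not self.policy.can_read(key, user):
            raise PermissionError(f"{user} may not read {key}")
        return self.store.get(key)


# Usage: two storage backends behind one federated store, one separate policy.
store = FederatedStore({"photos": InMemoryStore(), "mail": InMemoryStore()})
policy = AccessPolicy()
svc = LifebitsService(store, policy)
store.put("photos/milan.jpg", b"jpeg bytes")
policy.grant("photos/milan.jpg", "alice")
print(svc.read("photos/milan.jpg", "alice"))  # b'jpeg bytes'
```

Because `LifebitsService` only depends on the two interfaces, either one could be swapped (a different storage provider, a different authorization scheme) without touching the other, which is the decoupling the conceptual model has to make palatable to users who would rather picture one big box.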