Talking with Cathy Marshall about tags, digital archiving, and lifestreams

My guest for this week’s Innovators show is Cathy Marshall, a Senior Researcher in Microsoft’s Silicon Valley Lab. She’s long been intrigued by personal information management — and nowadays, also by its social dimension.

We kicked off the conversation with a discussion of her recent paper Do Tags Work?. (See also her slides from a talk about the project.) This was a clever study in which she collected a bunch of Flickr photos of people spinning on the bull’s balls in Milan. Notice how that fulltext query effectively retrieves a pile of images, taken by different people, of the same curious custom:

If you are passing through the Galleria Vittorio Emanuele II, you should spin around on the testicles of the bull mosaic found in the centre. Legend has it that this will bring you good luck!

Now try this query, which uses the same terms but looks at tags instead of the free text (title, description) associated with the photos. It finds nothing.

Cathy concludes that while many people think tags are effective hooks for information retrieval, they really aren’t.
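
To make the gap between the two queries concrete, here is a minimal Python sketch. It is not how Flickr's search works. The photo records, the `normalize` helper, and its crude possessive and plural stripping are all assumptions invented for illustration. It contrasts a free-text match, an exact tag match, and a tag match that applies the same normalization to both the query and the tags.

```python
import re

# Hypothetical photo records: titles, descriptions, and tags are invented.
PHOTOS = [
    {"title": "Spinning on the bull's balls",
     "description": "Good luck ritual in Milan",
     "tags": ["milan", "bull", "galleria"]},
    {"title": "Galleria Vittorio Emanuele II",
     "description": "People spinning on the bull mosaic in Milan",
     "tags": ["milan", "italy"]},
    {"title": "Duomo at dusk",
     "description": "Milan cathedral",
     "tags": ["milan", "duomo"]},
]


def normalize(term):
    """Lowercase, drop a possessive 's, then a trailing plural s.

    This is a crude stand-in for real stemming, assumed for the example.
    """
    term = term.lower().replace("\u2019", "'")
    term = re.sub(r"'s$", "", term)
    if len(term) > 3 and term.endswith("s"):
        term = term[:-1]
    return term


def tokens(text):
    """Split text into a set of normalized words."""
    return {normalize(t) for t in re.findall(r"[\w'\u2019]+", text)}


def fulltext_search(photos, query):
    """Match when every query term appears in the title or description."""
    wanted = tokens(query)
    return [p for p in photos
            if wanted <= tokens(p["title"] + " " + p["description"])]


def exact_tag_search(photos, query):
    """Match when every raw query term is literally one of the tags."""
    wanted = set(query.lower().split())
    return [p for p in photos if wanted <= set(p["tags"])]


def normalized_tag_search(photos, query):
    """Like exact_tag_search, but normalize both query terms and tags."""
    wanted = tokens(query)
    return [p for p in photos
            if wanted <= {normalize(t) for t in p["tags"]}]


if __name__ == "__main__":
    query = "milan bull's"
    print(len(fulltext_search(PHOTOS, query)))
    print(len(exact_tag_search(PHOTOS, query)))
    print(len(normalized_tag_search(PHOTOS, query)))
```

On this toy data the free-text search finds two photos, the exact tag search finds none, and the normalized tag search finds one. Normalization recovers some matches, but a photo that was never tagged with a term stays invisible to any tag query.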

Of course, those of us who attend conferences where the first order of business is to announce a tag know that tags can be a very effective way to aggregate all the blog postings, tweets, and photos associated with an event. Folksonomies that aren’t intended to converge don’t. Those that are meant to converge do, quite dramatically, which is why I’ve long been obsessed with intentional tagging as an enabler of loosely-coupled collaboration.

In the second half of the conversation we discussed personal digital archiving, curation, benign neglect, and lifestreams. Cathy tells a lot of stories about the ways in which people do, and also don’t, take care of their digital stuff. She observes, for example, that when people lose the contents of a computer, they react initially with horror, but then often feel a sense of relief. It turns out a lot of what was there wasn’t really needed. The burden of culling through it is lifted, and the guilt associated with not doing that culling goes away.

(I laughed harder than I have in a long time when Cathy described rental storage units as “garbage cans you pay for, and then when you realize you no longer care about the stuff in them, you stop paying for.”)

We ended by agreeing that the hardest thing about introducing a hosted lifebits service ecosystem will be the conceptual model. For psychological reasons, people will want to think in terms of monolithic containers that keep stuff in one place, and monolithic services that do everything related to that stuff. For architectural reasons, though, we’ll want to federate storage, and also decouple classes of service — so that storage, for example, is orthogonal to access control and authorization, which is orthogonal to social interaction.

6 thoughts on “Talking with Cathy Marshall about tags, digital archiving, and lifestreams”

  1. Interesting. I haven’t listened yet, but my guess is that Cathy has a bit of an ingrained preference against tags. I say that b/c when looking at the tag-based flickr query that returned no results, I took 10 seconds to ponder it. I went back to the fulltext one, and noted that several of the photos ARE tagged; I saw “milan”, and “balls”, and “bull”. So I changed “bull’s” to “bull” in the tag-based query, and got multiple results:

    Now, her point is still relevant; the tags still don’t get you as many images. But arguably, the failure of “bull’s” vs. “bulls” is just coding…’tag query’ has to understand the same issues of possessives, pluralizations, etc. that text queries have been tackling for years now.

    And the tags do give you some bonuses, including the ability to play with flickr’s “clusters”, machine-tagged metadata (like location), etc. So there is goodness in both places, IMO. I get her point, but there are multiple points of view.

  2. Cathy’s search on tags failed for two reasons: a) nobody ever uses the tag “bull’s” and b) you cannot find images by tag if they are not tagged. So, instead of concluding that tags don’t work, a simpler conclusion is that people do not tag as often as they write descriptions.

  3. > nobody ever gives a tag “bull’s”

    Yeah, but that was my (poor) example. Her findings are more nuanced. For example, that people almost never use verbs — like ‘spinning’ — as tags, but do use them in descriptions.

  4. I loved this conversation. (Your interview style is awesome) Especially when you discussed movie/film/cinema. I don’t think I’ve ever seen a film or cinema! but I love the fact that others might see life that way.

    But, in your tags example, I did find that people don’t like apostrophes (or maybe they’re not supported in Flickr tags), but searching for ‘Milan bull balls’ does work.
