Recently I began keeping track of interesting public data sources using the del.icio.us tag judell/publicdata, and invited others to do the same using their own del.icio.us accounts. That method sets up an interesting pattern of collaboration whereby all contributions flow up to the global bucket, tag/publicdata, but individual contributors can curate subsets of that collection according to their own interests.
A nice example of that pattern emerged when the Many Eyes folks showed up at manyeyes/publicdata. Their contributions flowed up to the global bucket, and thence to the RSS feed I’m watching, which is how I found out about this excellent survey of a variety of public data sources. It was done for a class at the University of Maryland, and it very helpfully characterizes data sources along a number of axes, including searchability, browsability, interaction, and formats.
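For readers who think better in code, here is a minimal sketch of that flow-up pattern. It is only a toy model, not how del.icio.us is implemented: the `POSTS` list, the `bucket` function, and the example.org URLs are placeholders I made up for illustration. Only the account names and the publicdata tag come from the story above.

```python
# Toy model of tagged bookmarks: (user, url, tags).
# The URLs are placeholders, not real bookmarks.
POSTS = [
    ("judell", "http://example.org/a", {"publicdata"}),
    ("manyeyes", "http://example.org/b", {"publicdata"}),
    ("manyeyes", "http://example.org/c", {"visualization"}),
]


def bucket(posts, tag, user=None):
    """Return URLs tagged `tag`.

    With user=None this is the global bucket (like tag/publicdata);
    with a user it is that person's curated subset (like manyeyes/publicdata).
    """
    return [
        url
        for who, url, tags in posts
        if tag in tags and (user is None or who == user)
    ]


print(bucket(POSTS, "publicdata"))                   # everyone's contributions
print(bucket(POSTS, "publicdata", user="manyeyes"))  # one contributor's subset
```

The point of the sketch is that nobody has to do anything extra to contribute to the global bucket. Tagging in your own account is the contribution.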
All this is quite straightforward and unsurprising to anyone who’s familiar with social bookmarking — which is to say, still quite unfamiliar to most people today.
So there’s not much chance that the next maneuver I’m going to describe will resonate with the general population, but I want to describe it anyway because those of us who think about these things ought to be thinking about how to make it more discoverable.
Several years ago, in a screencast entitled Language evolution in del.icio.us, I posited that tag vocabularies could evolve in the same way that natural languages do. In the realm of natural language, we coin new words all the time. When we hear a new word that we like, we adopt it — or, perhaps, adapt it. The punchline of the screencast was that this is how the grassroots semantic web will form. There are just two requirements: We need to be able to speak, and we need to be able to hear others speak.
Speaking, in the realm of tag vocabularies, means writing tags, and sometimes creating new ones. Hearing means reading tags, and observing how they’re applied to resources and by whom.
If you land on a page that you haven’t yet bookmarked, you can use the del.icio.us posting bookmarklet to show you (as recommended tags) which other tags have been assigned to that URL.
I tend to rely on a more sensitive organ of hearing: a bookmarklet that I call dc, for del.icio.us conversation. I use it all the time. Suppose, for example, I’d found that University of Maryland page through some means of referral other than del.icio.us. I’d have reflexively clicked the dc bookmarklet to produce this report, which shows who else has bookmarked that page, and how it has been described.
In this case there’s not much to see. The URL was bookmarked once in Feb 07, by elzzup, to the tags data and class, and again in Jul 07, by manyeyes, to the tag publicdata.
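To make the shape of that report concrete, here is a rough sketch of the same two bookmarks as data. This is not the dc bookmarklet's code. The `Bookmark` record and the `conversation` and `first_use` helpers are hypothetical names of my own, and I am assuming that "Feb 07" and "Jul 07" mean months of 2007.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    user: str
    month: str   # year-month string; assumes "Feb 07" means 2007-02
    tags: tuple


# The two bookmarks described above, entered by hand.
HISTORY = [
    Bookmark("manyeyes", "2007-07", ("publicdata",)),
    Bookmark("elzzup", "2007-02", ("data", "class")),
]


def conversation(history):
    """Return one report line per bookmark, oldest first."""
    ordered = sorted(history, key=lambda b: b.month)
    return [f"{b.month} {b.user}: {' '.join(b.tags)}" for b in ordered]


def first_use(history, tag):
    """Return the user who first applied `tag` to this URL, or None."""
    for b in sorted(history, key=lambda b: b.month):
        if tag in b.tags:
            return b.user
    return None


for line in conversation(HISTORY):
    print(line)
# 2007-02 elzzup: data class
# 2007-07 manyeyes: publicdata
```

Sorting by date is what turns a list of bookmarks into a conversation: you can see which descriptions came first and who introduced each new term.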
This view is interesting for a couple of reasons that I don’t think are widely appreciated. First, it shows a progression from general ways of describing the resource to a more particular way. Note, by the way, that the proposed refinement of data to publicdata is not visible when you launch the bookmarking form, which recommends only class and publicdata. Note also that the introduction of publicdata is really a hack. It would arguably be better to rely on the individual tags public and data. But that would make it necessary to query for the conjunction, and that connection is too fragile. So publicdata also suggests something about how to form tags — that is, by making these conjunctions explicit.
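Here is one way to read that fragility, sketched in code. Everything in it is hypothetical: the users, the example.org URLs, and the `by_conjunction` and `by_tag` helpers are invented for illustration, and a real tag service may handle these queries differently.

```python
# Hypothetical bookmarks: (user, url, tags). None of this is real data.
BOOKMARKS = [
    ("alice", "http://example.org/census", {"public", "data"}),
    ("bob", "http://example.org/census", {"publicdata"}),
    ("carol", "http://example.org/transit", {"public", "datasets"}),
    ("dave", "http://example.org/park-hours", {"public"}),
    ("erin", "http://example.org/benchmarks", {"data"}),
]


def by_conjunction(bookmarks, *tags):
    """URLs where a single bookmark carries every one of `tags`."""
    wanted = set(tags)
    return sorted({url for _, url, t in bookmarks if wanted <= t})


def by_tag(bookmarks, tag):
    """URLs where some bookmark carries `tag`."""
    return sorted({url for _, url, t in bookmarks if tag in t})


# The conjunction matches only when one person applied both exact tags.
print(by_conjunction(BOOKMARKS, "public", "data"))
# ['http://example.org/census']

# Either tag alone pulls in things that are not public data at all.
print(by_tag(BOOKMARKS, "public"))
# ['http://example.org/census', 'http://example.org/park-hours', 'http://example.org/transit']

# The compound tag names the concept directly.
print(by_tag(BOOKMARKS, "publicdata"))
# ['http://example.org/census']
```

In this toy data the conjunction misses carol's bookmark because she wrote datasets instead of data, while public alone sweeps in dave's park hours. A single agreed-upon compound tag sidesteps both problems, at the cost of being a convention people have to hear about first.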
Second, it shows who has proposed publicdata — namely, manyeyes, an identity that may be recognized, and that if recognized will add weight to the proposed usage of the tag.
These are subtle effects. For most people, they’re too subtle to matter at all. But I’m reminded that there’s important work yet to be done to render these effects in ways that make it easier for everyone to hear (and visualize) linguistic evolution in the tag domain, so that people can participate more actively and more naturally in that evolution.