Collaborative curation as a service

This week my ongoing fascination with Delicious as a user-programmable database took a new turn. Earlier, I showed how I’m using Delicious to enable collaborative curation of the set of feeds that drives an aggregation of community calendars.

The service I’m building in this ongoing series has so far collected calendars only for a single community — mine. But the idea is to scale out so that folks in other communities can use it for their own collections of calendars.

As I refactored the code this week to prepare for that scale-out, I thought about how to manage the configuration data for multiple instances of the aggregator. This is a classic problem, there are a million ways to solve it, and I thought I’d seen them all. But then I had a wacky idea. If I’m already using Delicious to enable community stakeholders to curate the sets of feeds they want to aggregate, why not also use Delicious to enable them to manage the configuration metadata for instances of the aggregator?

Here’s a way to do that. Consider this URL:

http://delicious.com/elmcity/metadata

It looks like a URL, but it doesn’t actually point to anything; click it and you’ll see that for yourself. So it’s really a URN (Uniform Resource Name) rather than a URL (Uniform Resource Locator).

But even though it doesn’t point to anything, it can still be bookmarked. The owner of the elmcity account on Delicious can click Save a Bookmark and put http://delicious.com/elmcity/metadata into the URL field.

Now you can attach stuff to the bookmark. In this example the title of the bookmark is metadata, and the tags are these strings:

tz=Eastern
title=events+in+and+around+keene
img=http://elmcity.info/media/keene-night-360.jpg
css=http://elmcity.info/css/elmcity.css
contact=judell@mv.com
where=keene+nh
template=http://elmcity.info/media/tmpl/events.tmpl

These strings are, implicitly, name=value pairs. The service that reads this configuration data from Delicious can easily make them into explicit names and values. But how does it find them? By looking up the metadata URL, like so:

delicious.com/url/view?url=http://delicious.com/elmcity/metadata

That request redirects to the special Delicious URL that uniquely identifies the bookmark:

delicious.com/url/9ee9d2e51e4f36d4d49207e1675b3cbb

Of course the service doesn’t want to dig the name=value pairs out of that web page. So instead it reads the page’s RSS feed:

feeds.delicious.com/v2/rss/url/9ee9d2e51e4f36d4d49207e1675b3cbb
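
The chain of URLs above can be sketched as plain string handling. This is only an illustration: the helper names LookupUrl and FeedUrlFromLocation are made up for this sketch, no HTTP request is made, and the redirect target is simply the value quoted above.

```csharp
using System;

// Build the lookup URL for a bookmarked address (here, the metadata URN).
static string LookupUrl(string bookmarkedUrl) =>
    "http://delicious.com/url/view?url=" + bookmarkedUrl;

// Turn the redirect target (the Location header) into the RSS feed URL.
static string FeedUrlFromLocation(string location)
{
    const string prefix = "http://delicious.com/url/";
    if (!location.StartsWith(prefix, StringComparison.Ordinal))
        throw new ArgumentException("Unexpected redirect target: " + location);
    var urlId = location.Substring(prefix.Length).TrimEnd('/');
    return "http://feeds.delicious.com/v2/rss/url/" + urlId;
}

var lookup = LookupUrl("http://delicious.com/elmcity/metadata");
var feed = FeedUrlFromLocation(
    "http://delicious.com/url/9ee9d2e51e4f36d4d49207e1675b3cbb");
Console.WriteLine(lookup);
Console.WriteLine(feed);
```

In a real fetcher the Location value would come from the HTTP response to the lookup request, made with redirects disabled.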

To prove that it works, check out this prototype version of the elmcity calendar. That page was built by an Azure service that reads configuration data from the bookmarked URN, and interpolates the name=value pairs into the template specified in the metadata.
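
As a rough sketch of that last step, the tags can be split into names and values and then substituted into a template. The {{name}} placeholder syntax and the helper names ParseTags and FillTemplate are assumptions made for illustration; the real template format isn’t shown in this post.

```csharp
using System;
using System.Collections.Generic;

// Split each "name=value" tag at the first '=' and decode '+' as a space.
static Dictionary<string, string> ParseTags(IEnumerable<string> rawTags)
{
    var result = new Dictionary<string, string>();
    foreach (var tag in rawTags)
    {
        var eq = tag.IndexOf('=');
        if (eq <= 0 || eq == tag.Length - 1)
            continue; // not a name=value tag (for example, "trusted")
        result[tag.Substring(0, eq)] = tag.Substring(eq + 1).Replace('+', ' ');
    }
    return result;
}

// Replace each {{name}} placeholder with its value.
// The {{name}} syntax is hypothetical, chosen only for this sketch.
static string FillTemplate(string template, Dictionary<string, string> values)
{
    foreach (var pair in values)
        template = template.Replace("{{" + pair.Key + "}}", pair.Value);
    return template;
}

var tags = new[] { "tz=Eastern", "title=events+in+and+around+keene",
                   "where=keene+nh", "trusted" };
var settings = ParseTags(tags);
var page = FillTemplate("<h1>{{title}}</h1><p>{{where}} ({{tz}})</p>", settings);
Console.WriteLine(page);
// prints <h1>events in and around keene</h1><p>keene nh (Eastern)</p>
```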

Is this crazy? Here are some reasons why I think not.

First, I’m embracing one of a programmer’s greatest virtues: laziness. Why write a bunch of database and user-interface logic just to enable folks to manage a few small collections of name=value pairs? Delicious has already done that work, and done it much better than I could.

Second, the configuration data lives out in the open where stakeholders can see it, touch it, and collaboratively manage it. There are all kinds of ways Delicious can help those folks do that. For example, anyone who cares about this collection of data can subscribe to its feed and receive notifications when anything changes.

Third, it’s easy to extend this model. For example, part of the workflow will entail one or more stakeholders deciding to trust a feed and put it into production. As you may recall, the service trusts a feed when it’s bookmarked with the tag trusted. Part of that approval process will involve making sure that there are URLs associated with events coming from the feed. Some iCalendar feeds provide them, but many don’t.

So in addition to the configuration that’s needed once for each instance of a community aggregator, there’s a bit of configuration that’s needed once per feed. If a feed doesn’t provide URLs for individual events, you can at least provide a homepage URL for the feed. And this piece of metadata can be managed in the same way. Here’s the bookmark for the Gilsum church. It carries the tag url=http://gilsum.org/church.aspx. As you browse around in a set of trusted feeds, it’s pretty easy to see which ones do and don’t carry those tags, and it’s pretty easy to edit them.
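
Here is one way the per-feed fallback might look, assuming a feed’s tags have already been read into a list. The helper names are hypothetical, and the tag set shown for the Gilsum church bookmark is illustrative rather than copied from the live bookmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Return the value of the first "url=..." tag, or null when the feed has none.
static string HomepageFromTags(IEnumerable<string> feedTags)
{
    const string prefix = "url=";
    var tag = feedTags.FirstOrDefault(
        t => t.StartsWith(prefix, StringComparison.Ordinal));
    return tag == null ? null : tag.Substring(prefix.Length);
}

// Prefer the event's own URL; fall back to the feed's homepage URL.
static string LinkForEvent(string eventUrl, IEnumerable<string> feedTags) =>
    string.IsNullOrEmpty(eventUrl) ? HomepageFromTags(feedTags) : eventUrl;

var churchTags = new[] { "trusted", "url=http://gilsum.org/church.aspx" };
Console.WriteLine(LinkForEvent(null, churchTags));
// prints http://gilsum.org/church.aspx
```

A feed with neither per-event URLs nor a url= tag yields null here, which is the case a stakeholder would want to catch before trusting the feed.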

It all adds up to a ton of value, and to capture it I only had to write the handful of lines of code shown below.

Now I’ll grant this way of doing things won’t work for everybody, so at some point I may need to create an alternative. And since I don’t want to depend on Delicious being always available, I’ll want to cache the results of these queries. But still, it’s amazing that this is possible.


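// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Xml.Linq;
// The Utils.* helpers are defined elsewhere in the service (not shown here).
public Dictionary<string, string> 
  get_delicious_feed_metadata(string metadata_url, string account)
  {
  var dict = new Dictionary<string, string>();
  // Look up the bookmarked URL; Delicious answers with a redirect to /url/<id>.
  var url = string.Format("http://delicious.com/url/view?url={0}", 
    metadata_url);
  var http_response = Utils.FetchUrlNoRedirect(url);
  var location = http_response.headers["Location"];
  var url_id = location.Replace("http://delicious.com/url/", "");
  // Fetch the RSS feed for that bookmark.
  url = string.Format("http://feeds.delicious.com/v2/rss/url/{0}", 
    url_id);
  http_response = Utils.FetchUrl(url);
  var xdoc = Utils.xdoc_from_xml_bytes(http_response.data);
  // Keep only tags applied by the controlling account, so that other users
  // who bookmark the same URL cannot inject settings. The cast to string is
  // null-safe if a category element has no domain attribute.
  string domain = string.Format("http://delicious.com/{0}/", account);
  var categories = from category in xdoc.Descendants("category")
                   where (string)category.Attribute("domain") == domain 
                   select category.Value;
  foreach (var category in categories)
    {
    // Split name=value; a '+' in the value stands for a space.
    var key_value = Utils.RegexFindGroups(category, 
      "^([^=]+)=(.+)");
    if (key_value.Count == 2)
      dict[key_value[0]] = key_value[1].Replace('+', ' ');
    }
  return dict;
  }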
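
The caching mentioned earlier could take many forms. The following is a minimal sketch of one, not part of the service’s code: it keeps results for a fixed time and falls back to a stale copy if a refetch fails. The names GetCached, ttl, and cache, and the ten-minute lifetime, are assumptions chosen for illustration. The current time is passed in so the behavior is easy to exercise.

```csharp
using System;
using System.Collections.Generic;

var ttl = TimeSpan.FromMinutes(10); // hypothetical cache lifetime
var cache = new Dictionary<string,
    (DateTime fetchedAt, Dictionary<string, string> data)>();

// Return cached settings while fresh; otherwise refetch, and fall back to
// the stale copy if the fetch fails (for example, Delicious is unreachable).
Dictionary<string, string> GetCached(
    string key, Func<Dictionary<string, string>> fetch, DateTime now)
{
    var have = cache.TryGetValue(key, out var entry);
    if (have && now - entry.fetchedAt < ttl)
        return entry.data;
    try
    {
        var fresh = fetch();
        cache[key] = (now, fresh);
        return fresh;
    }
    catch (Exception) when (have)
    {
        return entry.data; // serve stale data rather than fail
    }
}

var calls = 0;
Func<Dictionary<string, string>> ok = () =>
{
    calls++;
    return new Dictionary<string, string> { { "tz", "Eastern" } };
};
Func<Dictionary<string, string>> down =
    () => throw new InvalidOperationException("Delicious unavailable");

var t0 = new DateTime(2009, 1, 1, 12, 0, 0, DateTimeKind.Utc);
var first = GetCached("elmcity", ok, t0);                  // fetches
var second = GetCached("elmcity", ok, t0.AddMinutes(5));   // from cache
var third = GetCached("elmcity", down, t0.AddMinutes(30)); // stale fallback
Console.WriteLine(calls); // prints 1
```

In practice the fetch delegate would wrap a call such as get_delicious_feed_metadata, and a first-time failure with nothing cached still surfaces as an exception.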

24 thoughts on “Collaborative curation as a service”

  1. Thanx for the code!

    I really like the idea of social annotation/curation of facts, and experimented a bit with doing such for molecules, using the InChI as a unique identifier:

    http://rdf.openmolecules.net/?InChI=1/CH4/h1H4

    Because I did not get Delicious to play nice back then, I actually used Connotea… but I’m going to check out your code, and see if I can make Delicious work this time too :)

  2. One thing…would you not, at least potentially, want to restrict your categories list to being generated by particular Descendants? (designated by the value of the dc:creator element, probably.)

    I’m thinking (I don’t even want to test this, b/c I don’t want to break anything of yours) that I could, for example, inject categories into your system by bookmarking that same URN and tagging it.

    A quick glance at that code doesn’t appear to filter based on a creator, though that would probably be easy enough, and you may already even be doing it. (Looks like the domain w/in category also would have the relevant info, though you’re parsing all the items to get that).

    Great idea, still! This is just an implementation nit, really; if you were setting the resource w/in a system where an account could control the bookmarking of URNs w/in its own namespace, the problem would go away.

  3. > One thing…would you not, at least
    > potentially, want to restrict your
    > categories list to being generated by
    > particular Descendants?

    This is, and only needs to be, a way to find Delicious tags in a Delicious feed.

    > if you were setting the resource w/in a
    > system where an account could control the
    > bookmarking of URNs w/in its own namespace

    If you go back and look at http://delicious.com/elmcity/metadata you will see that I’m making use of exactly that sort of self-referential strange loop.

  4. > This is, and only needs to be, a way to
    > find Delicious tags in a Delicious feed.

    Hm…right. I’m not 100% sure if you’re telling me you see the issue and have worked around it, or if I’m not explaining well. For clarity, I just added some pretty innocuous tags to the URN’s delicious tags. See:

    http://feeds.delicious.com/v2/rss/url/9ee9d2e51e4f36d4d49207e1675b3cbb

    If I’m reading the code right, those should show up in the categories var above now. I’m not sure if pulling that out via xdoc is ordered or not…it may be that your name=value pairs “trump” mine b/c they were created earlier in time. But that’s the point I was trying to show.

  5. Fascinating. I hadn’t considered that somebody else would bookmark and tag the URN.

    But of course it’s possible, as you’ve shown.

    OK, I get it.

    So I need to alter the metadata fetcher slightly. It was assuming there would be only one item in the RSS feed. Now there are two.

    One has:

    <dc:creator><![CDATA[kkennedy]]></dc:creator>

    The other has:

    <dc:creator><![CDATA[elmcity]]></dc:creator>

    As you suggested, I’ll need to tweak the metadata fetcher to only pay attention to the item whose creator matches the account that’s intended to control each instance of the metadata URN.

    There are many degrees of collaborative freedom here, but this is NOT intended to be one of them!

    Thanks, Ken, for a) discovering that, and b) persistently pointing it out until I finally got it!

  6. No problem, Jon! I thought that I wasn’t quite being clear enough…I just was a little reluctant to stick data into your feed to make the point.

    > There are many degrees of collaborative freedom
    > here, but this is NOT intended to be one of them!

    Agreed! *grin* Sometimes, there ARE lines.

    Nevertheless, this “delicious as database” concept is really powerful. I’m planning on shamelessly er…borrowing it for a similar type of project. It’s similar in some ways to several of the “schema-less document dbs” floating around now (like CouchDB, Amazon SimpleDB, etc.), but has the advantage of APIs and interfaces that are more mature. Also, clever ideas like the /url/view?url=xx request give a great deal of flexibility. They’ve backed into a pretty powerful general purpose tool here, just because of the utility of URNs.

    Thanks so much for your open model for both coding and the design ideas behind the coding. It’s really enlightening and educational.

  7. > It’s similar in some ways to several of
    > the “schema-less document dbs” floating
    > around now (like CouchDB, Amazon SimpleDB,
    > etc.), but has the advantage of APIs and
    > interfaces that are more mature.

    I guess the main advantage I see is, if you can get folks to grok the model and use it, there is no database or UI code to write.

    > Thanks so much for your open model for
    > both coding and the design ideas behind
    > the coding. It’s really enlightening and
    > educational.

    You’re very welcome. BTW I added a WHERE clause to the LINQ query above which fixes the bug you found. Thanks again for pointing that out.

  8. > I guess the main advantage I see is, if you
    > can get folks to grok the model and use it,
    > there is no database or UI code to write.

    Totally. And that’s huge. I’ve finally reached the point where I can cleanly separate projects that are intended to teach me a low-level technology (where I’d want to hack on the db code, for personal development purposes) from those projects where I’m just “Getting Stuff Done”, and I’m perfectly happy to have delicious (or whatever) carry the load for me.

  9. It’s a good idea from my perspective. As always, needs to be balanced against requiring folks to type more than the minimum necessary to get the job done — which, in this case, is to get the feeds collected and flowing.
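
The creator-based filter discussed in comments 2 through 5 can be sketched like this. It is an illustration only, not the fix that actually went into the service (which, per comment 7, filters on the category’s domain attribute). The sample feed is made up, including the tz=Pacific tag, and the helper name TagsFromOwner is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Collect category (tag) values only from items whose dc:creator is the owner.
static string[] TagsFromOwner(XDocument feed, string owner)
{
    XNamespace dc = "http://purl.org/dc/elements/1.1/";
    return feed.Descendants("item")
               .Where(item => (string)item.Element(dc + "creator") == owner)
               .SelectMany(item => item.Elements("category"))
               .Select(category => category.Value)
               .ToArray();
}

// Made-up feed with two items for the same bookmarked URL.
var rss = XDocument.Parse(
@"<rss version=""2.0"" xmlns:dc=""http://purl.org/dc/elements/1.1/"">
  <channel>
    <item>
      <dc:creator><![CDATA[elmcity]]></dc:creator>
      <category domain=""http://delicious.com/elmcity/"">tz=Eastern</category>
    </item>
    <item>
      <dc:creator><![CDATA[kkennedy]]></dc:creator>
      <category domain=""http://delicious.com/kkennedy/"">tz=Pacific</category>
    </item>
  </channel>
</rss>");

var ownerTags = TagsFromOwner(rss, "elmcity");
Console.WriteLine(string.Join(", ", ownerTags));
// prints tz=Eastern
```

Either filter achieves the same goal: only the controlling account’s tags are treated as configuration.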
