Databasing trusted feeds with del.icio.us

In my last entry, I sketched a strategy for maintaining lists of the Eventful and Flickr accounts that I consider trusted sources for the elmcity.info event and photo streams. I didn’t spell out exactly how I plan to maintain those lists, in the Azure rewrite of the service that I’m now doing, but David Hochman read my mind:

It sure would be interesting to syndicate those lists from a trusted del.icio.us feed, leveraging tags as a public data store, and allowing others to trust your trusted lists.

It sure would. And that’s just what I’m doing.

Part One: The User’s View

Here’s the del.icio.us account:

delicious.com/elmcity

Here are the trusted ICS feeds:

elmcity/trusted+ics+feed

Here are the trusted Eventful contributors:

elmcity/trusted+eventful+contributor

Here are the new Eventful contributors — that is, ones I’ve not yet marked as trusted:

elmcity/new+eventful+contributor

This is wildly convenient in several ways. For starters, I get a feed of new Eventful contributors for free:

feeds.delicious.com/v2/rss/elmcity/eventful+new+contributor

Anyone who subscribes to that feed is alerted to the appearance of a previously-unseen contributor of events within 15 miles of Keene. Here’s one:

eventful.com/users/jheslin

Clicking that link reveals that jheslin has created one venue, but so far no events. That’s not enough evidence on which to base a trust/no-trust decision. So what I’d do, in that case, is just delete the del.icio.us bookmark. If the aggregator sees another event from jheslin, he (or she) will show up again in the feed. In that case, if jheslin has created events that look legitimate, I can decide to trust him (or her). How? Trivially, by editing the bookmark and changing the new tag to trusted.

That’s easy enough, but I don’t want to be forever responsible for monitoring this feed and making trust decisions. And thankfully I needn’t be. When I delegate that job to somebody else, I’ll just need to hand over the credentials for the del.icio.us/elmcity account, and explain what it means for an Eventful account to be bookmarked at del.icio.us/elmcity with a new or trusted tag, and how to decide when to promote an Eventful account from new to trusted.

The same technique can apply to other account-based event sources — for example, upcoming.org. It also applies to feed-based sources. I’ve been encouraging event publishers in Keene to create iCalendar feeds. Those feeds have URLs, and to include them in the aggregation, somebody just needs to bookmark them under the elmcity account with the tags trusted and ics and feed. Like this.

Same for new and trusted Flickr accounts that feed the photos page, for blogs that feed the blog directory, and for any other class of resource that might be contributed.
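To make the tag conventions concrete, here is a rough Python sketch of how a set of tags maps onto a feed URL, and of what promoting a bookmark from new to trusted amounts to. The URL pattern is the one shown above; the helper names feed_url and promote are hypothetical and exist only for illustration.

```python
FEED_BASE = "http://feeds.delicious.com/v2/rss"

def feed_url(account, *tags):
    """Build the RSS feed URL for bookmarks that carry all of the given tags."""
    return "{0}/{1}/{2}".format(FEED_BASE, account, "+".join(tags))

def promote(tags):
    """Model the trust decision: swap the 'new' tag for 'trusted'."""
    return ["trusted" if t == "new" else t for t in tags]

new_tags = ["new", "eventful", "contributor"]
print(feed_url("elmcity", *new_tags))
print(feed_url("elmcity", *promote(new_tags)))
```

The only "schema" here is the agreement about which tags mean what, which is why handing the job to someone else is mostly a matter of explaining the convention.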

Part Two: The Developer’s View

Notice that I haven’t had to write any Web forms, any Ajax code, any database CRUD (create/read/update/delete) logic. Del.icio.us, a database with a Web user interface, takes care of all that. Which is fine by me, because life’s too short to write any more CRUD or Web UI than I have to. I’d rather do more interesting things.

By the same token, life’s too short to write more than a few lines of code to drive the CRUD apparatus. As I mentioned last time, I’m writing the core of the Azure event aggregator in C# rather than Python, because IronPython isn’t yet ready for prime time on Azure. I worried that a C# implementation would be too verbose, but I’ve been pleasantly surprised.

Here’s a C# method that reads a del.icio.us RSS feed and returns a dictionary (aka hashtable, aka associative array) of titles and links:

00 const string rssbase = "http://feeds.delicious.com/v2/rss/elmcity";

01 public static Dictionary<string,string> get_delicious_feed(string args)
02  {
03  var dict = new Dictionary<string,string>();
04  string url = String.Format("{0}/{1}", rssbase, args);
05  var response = Utils.FetchUrl(url);
06  var xdoc = Utils.xdoc_from_xml_bytes(response.data);
07  var items = from item in xdoc.Descendants("item")
08  select new { Title = item.Element("title").Value,
09     Link = item.Element("link").Value, };
10  foreach (var item in items)
11    dict[item.Link] = item.Title;
12  return dict;
13  }

The Python equivalent is more concise, but not by much. I am, admittedly, deferring any discussion of the Utils class which I’m using to make the .NET Framework’s HttpWebRequest/HttpWebResponse classes feel more Pythonic to me.
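For comparison, here is roughly what a Python version might look like. It is a sketch rather than the code the service runs: it uses only the standard library, assumes the feed is plain RSS 2.0 with un-namespaced item, title, and link elements, and takes an optional fetch parameter (an assumption added here, not part of the C# method) so the parsing can be exercised without a network call.

```python
import urllib.request
import xml.etree.ElementTree as ET

RSS_BASE = "http://feeds.delicious.com/v2/rss/elmcity"

def parse_feed(xml_bytes):
    """Return {link: title} for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_bytes)
    return {
        item.findtext("link"): item.findtext("title")
        for item in root.iter("item")
    }

def get_delicious_feed(args, fetch=None):
    """Fetch the feed for a '+'-joined tag string and parse it."""
    url = "{0}/{1}".format(RSS_BASE, args)
    if fetch is None:
        # Default fetcher: plain urllib, with no retries or error handling.
        fetch = lambda u: urllib.request.urlopen(u).read()
    return parse_feed(fetch(url))
```

As in the C# method, the dictionary is keyed by link, so a URL bookmarked twice collapses to a single entry.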

Also noteworthy here is the use of the generic collection class, Dictionary (lines 3, 11, 12), instead of the more Pythonic (and Java-like) Hashtable. I’ll also defer discussion of tradeoffs between Dictionary and Hashtable until I’ve learned more about them.

Finally, I’ll defer discussion of the LINQ-to-XML idioms (lines 6-10) until I’ve learned more about the tradeoffs between LINQ-to-XML and the XPath style which I’m more familiar with, and which is more widely available.

For now, I’ll just observe that this C# method is readable, debuggable, and Azure-deployable.

Here are some of the ways the above method will be used in the service:

get_delicious_feed("trusted+feed+ics")
get_delicious_feed("trusted+eventful+contributor")
get_delicious_feed("new+flickr+contributor")

For example, here’s the method that the aggregator uses to check whether or not to include an Eventful event contributed by a given Eventful account:

01 public static bool isTrustedEventfulContributor(string accountname)
02  {
03  var dict = get_delicious_feed("trusted+eventful+contributor");
04  var re = new Regex(@"eventful\.com/users/([^/]+)/created/events");
05  return match_url(dict, re, accountname);
06  }

The regular expression at line 4 matches URLs like this:

eventful.com/users/judell/created/events

If you check the corresponding Eventful page you’ll see why the aggregator posts bookmarks with addresses in this format. That way, the human who’s monitoring the feed can easily click through to eyeball the events created by a new user whose legitimacy needs to be checked.
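To show what the capture group in that pattern pulls out, here is a small Python sketch. The function name eventful_account is hypothetical, and the dots are escaped so that they match only literal dots.

```python
import re

# Group 1 captures the account name between /users/ and /created/events.
EVENTFUL_USER = re.compile(r"eventful\.com/users/([^/]+)/created/events")

def eventful_account(url):
    """Return the account name embedded in the URL, or None if it doesn't match."""
    m = EVENTFUL_USER.search(url)
    return m.group(1) if m else None

print(eventful_account("http://eventful.com/users/judell/created/events"))
```

A bare profile URL such as eventful.com/users/jheslin doesn't match, because the pattern requires the /created/events suffix.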

To see how isTrustedEventfulContributor makes its yes/no determination, we need to unpack the match_url method. Here’s the first version I wrote:

private static bool match_url(Dictionary<string,string> dict, 
  Regex re, string url)
  {
  bool isTrusted = false;
  Match m;
  foreach (string key in dict.Keys)
    {
    m = re.Match(key);
    if (m.Groups[1].Value == url)
      {
      isTrusted = true;
      break;
      }
    }
  return isTrusted;
  }

This worked, but didn’t have the concise, functional, Pythonic feel that I like. So I went back to the drawing board and came up with another version:

private static bool match_url(Dictionary<string,string> dict, 
  Regex re, string url)
  {
  var keys = dict.Keys.ToList();
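  // Groups[1] is the captured account name; Groups[0] would be the whole matched URL.
  var matched = keys.FindAll(x => re.Match(x).Groups[1].Value == url);
  // Trusted if at least one bookmark matches, as in the loop version.
  return matched.Count > 0;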
  } 

This works identically, and it’s much closer to what I’d do in Python: Filter a list using a lambda expression.
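For reference, the Python idiom alluded to here might look something like the following sketch. It assumes the feed is a dictionary keyed by bookmark URL, as returned by the method above, and that the account name lives in the pattern's first capture group. The parameter names are illustrative.

```python
import re

def match_url(feed, pattern, account):
    """True if any bookmarked URL in `feed` names `account` in capture group 1."""
    def account_of(url):
        m = pattern.search(url)
        return m.group(1) if m else None
    # Iterating a dict yields its keys, i.e. the bookmarked URLs.
    matched = list(filter(lambda url: account_of(url) == account, feed))
    return len(matched) > 0

pattern = re.compile(r"eventful\.com/users/([^/]+)/created/events")
feed = {"http://eventful.com/users/judell/created/events": "judell"}
print(match_url(feed, pattern, "judell"))
```

Testing for at least one match, rather than exactly one, keeps the behavior the same as the loop version even if an account somehow ends up bookmarked under two slightly different URLs.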

Part Three: Conclusion

If you’re not a programmer (and in particular, a programmer who would be interested in Azure, or in a comparison between C# and Python), your eyes probably glazed over when you got to part two. That’s fine. There’s still an important takeaway for you. Del.icio.us (and any del.icio.us-like service) is a database! You can use it, without doing any programming, to maintain lists of arbitrary sets of resources that can be queried and edited, with equal ease, by humans and by programs.

Whatever you can identify with a URL is fair game. You can invent your own simple business logic by defining rules for what tags to use, and when and how to change them. You can monitor RSS feeds, in any feedreader, in order to be alerted when monitored items change. You can share or delegate the work by sharing or delegating access to the del.icio.us account. And last but not least, when you need to get a programmer to make use of this database you and your collaborators have built, that person’s job will be drop-dead simple.

17 Comments

  1. Nice discussion of architecture — wish there were more like these that I could point my students at :-). One question, though: at what point do you think it’s useful or essential to include provenance information in a system like this, so that if A gets an item from trusted source B, A can tell that the original source was actually Q?

  2. “One question, though: at what point do you think it’s useful or essential to include provenance information in a system like this, so that if A gets an item from trusted source B, A can tell that the original source was actually Q?”

    It’s essential. That’s why, on the events page, every link points back to its source. A user has as much info on which to evaluate the site administrator’s feed-trust decision as does the decision-maker.

  3. “do you think you could further reduce the amount of code needed if you used something like yahoo pipes?”

    I often do use Pipes for feed-splicing, so maybe yes. OTOH in this case the splicing, fetching, and combining of feeds are the core activity of the service. Makes sense to have 100% control over that activity rather than outsource it.

  4. “use of the generic collection class, Dictionary”

    As a fellow Python fan who has to write way more C# than Python, there’s an even more generic collection type, System.Collections.Generic.Dictionary. Besides being a fine example of Internet Pedantry on my part, the generic dictionary allows you to specify the type of the key and value which saves the boxing and unboxing from string to object and back again, which improves performance on large enough datasets.

    Great idea, btw. I’ve been thinking of moving my error logging over to private Twitter accounts to buy me free SMS & email alerts without having to write any code. An additional benefit: a false sense of security when Twitter’s down for a week.

  5. “there’s an even more generic collection type, System.Collections.Generic.Dictionary”

    Yeah, that’s what I’m using, in some places — including here, actually, but WordPress ate the angle brackets! I’m putting them back now.

    Elsewhere I’m using System.Collections.Hashtable. It’s an opportunity to compare the ways in which declarative typing can help or hinder.

    “An additional benefit: a false sense of security when Twitter’s down for a week.”

    :-)

  6. Here’s a group using Del.icio.us for a similar purpose. This is a great model.

    http://www.propublica.org/article/how-you-can-help-us-flag-great-journalism-now-even-easier-1230

    If you have an account with the bookmarking site Delicious, now all you have to do is tag an article “PPlinks” and we’ll see it right away. (If you don’t already have an account, Delicious is just a handy way to share links. It’s easy to sign up for. And no, we’re not getting any Delicious-y kickbacks.)
