Simile: Semantic web mashups for the rest of us

I was at MIT yesterday to give a talk, and afterward visited with the Simile project team. I’d known a bit about their semantic web efforts, notably Piggy Bank, a Firefox extension that hosts JavaScript-based screenscrapers that extract data from web pages. But there’s been a ton of development since then, and it’s all good.

The best way to describe what I saw yesterday is probably to start with this example. It’s what the Simile team calls an exhibit, which is a web page that performs faceted browsing of a data set. In this case, my example exhibit is actually a mixture of two others: the CSAIL faculty page, and the CCNMTL staff page. If you visit either of those exhibits you’ll find that you can restrict the view by selecting, for example, the group facet. The CSAIL page exposes another facet called position. (The CCNMTL page doesn’t expose its analog to that facet, but if it did, it would be called title.)

View source on either of these pages and you’ll find a very simple chunk of HTML that enumerates and styles the included data elements. You’ll also see references to two JavaScript files. One contains all the AJAX behavior that populates the page with data and drives the interactive experience. The other contains the data, which is a JSON serialization of a simplified form of RDF.
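To make that setup concrete, here is a minimal Python sketch of a data set as a flat list of items, and of what restricting the view by a facet amounts to. The records, the field values, and the `items` wrapper are illustrative assumptions, not the actual CSAIL data or the exact Simile schema.

```python
import json

# Hypothetical data set: a JSON document holding a flat list of items.
# The names and values below are invented for illustration.
DATA = json.loads("""
{
  "items": [
    {"label": "Person A", "group": "Theory",  "position": "Professor"},
    {"label": "Person B", "group": "Systems", "position": "Lecturer"},
    {"label": "Person C", "group": "Theory",  "position": "Lecturer"}
  ]
}
""")


def restrict(items, facet, value):
    """Return only the items whose `facet` field equals `value`."""
    return [item for item in items if item.get(facet) == value]


# Selecting "Theory" in the group facet keeps just the matching items.
theory = restrict(DATA["items"], "group", "Theory")
print([item["label"] for item in theory])
```

Because the page only reads a list like this, swapping in a different list is all it takes to browse a different data set.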

The first thing to notice is that you can rip and replace the data reference. In fact that’s what happened when one of these pages was cloned from the other. The clone then proceeded to rename and specialize its data set, yielding a seemingly incompatible result.

So how did I create a merge of the two? Enter Potluck, a shockingly capable AJAX-style data mixer. I referenced the two data sources, and then combined analogous fields. I merged CSAIL’s position and CCNMTL’s title into a single field called position. Even more interesting is the field called building. The CCNMTL data has a thing called building, but the CSAIL data has nothing really comparable; there’s tower, but that’s not equivalent. In fact, the CSAIL equivalent to CCNMTL’s building is the prefix in the CSAIL office field. In an office value of 32-G606, for example, the building is implicitly building 32.

David Huynh, the original author of Potluck, showed me how to extract that implicit information using a feature called simultaneous editing. Here’s how that works:

The editor groups things into similar columns. You can then adjust an entire column by editing a single entry. Here I’ve inserted “Building” in front of “32” and selected and deleted everything following “32”.
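The net effect of that edit on the office column can be sketched in a few lines of Python. This is not how Potluck implements simultaneous editing; it only reproduces the result, assuming the building number is whatever precedes the first hyphen. The function name is hypothetical, and only 32-G606 comes from the real data.

```python
import re


def building_from_office(office):
    """Derive a building label from an office value such as '32-G606'.

    Assumes the building number is the text before the first hyphen.
    Returns None when the value has no such prefix.
    """
    match = re.match(r"^([^-]+)-", office)
    return f"Building {match.group(1)}" if match else None


# '32-G606' is the example from the post; the other values are invented.
offices = ["32-G606", "32-D512", "G606"]
print([building_from_office(office) for office in offices])
```

Applying one rule to every row is the programmatic analog of editing a single entry and having the whole column follow.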

To define a new column you drag/drop fields. To create the merged position field, I dragged position from CSAIL, and title from CCNMTL, then renamed the combined result as position.
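In code terms, that drag/drop step declares an equivalence between two differently named fields. Here is a rough Python sketch of the idea, with a hypothetical `FIELD_MAP` and invented records; Potluck does this interactively, and nothing here reflects its internals.

```python
# Hypothetical equivalences: (source, original field) -> merged field name.
FIELD_MAP = {
    ("CSAIL", "position"): "position",
    ("CCNMTL", "title"): "position",
}


def merge_sources(sources):
    """Combine items from several sources, renaming equivalent fields.

    Each merged item also records which source it came from in 'origin'.
    Fields without an entry in FIELD_MAP keep their original names.
    """
    merged = []
    for origin, items in sources.items():
        for item in items:
            out = {"origin": origin}
            for field, value in item.items():
                out[FIELD_MAP.get((origin, field), field)] = value
            merged.append(out)
    return merged


# Invented sample records, one per source.
sources = {
    "CSAIL": [{"label": "Person A", "position": "Professor"}],
    "CCNMTL": [{"label": "Person B", "title": "Director"}],
}
merged = merge_sources(sources)
print(merged)
```

Neither source had to rename anything up front; the mapping lives entirely in the mashup.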

To define a facet, you drag/drop the column name to another area of the canvas. For my merged exhibit, I used the facets origin (i.e., CSAIL vs CCNMTL), plus group, position, and building.
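A facet is essentially a tally of the distinct values in one column. As a sketch, assuming merged items shaped like the ones described here (all records below are invented, and "Hypothetical Hall" is a placeholder), the four facets could be computed like this:

```python
from collections import Counter

# Invented merged items carrying the four facet fields from the post.
items = [
    {"label": "Person A", "origin": "CSAIL", "group": "Theory",
     "position": "Professor", "building": "Building 32"},
    {"label": "Person B", "origin": "CSAIL", "group": "Systems",
     "position": "Lecturer", "building": "Building 32"},
    {"label": "Person C", "origin": "CCNMTL", "group": "Design",
     "position": "Director", "building": "Hypothetical Hall"},
]

FACETS = ["origin", "group", "position", "building"]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet field."""
    return Counter(item[facet] for item in items if facet in item)


for facet in FACETS:
    print(facet, dict(facet_counts(items, facet)))
```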

Then I exported the merged data into the same JSON format as the original sources, cloned one of the pages, and referenced the merged data set. From there it was just a bit of tweaking to make the div elements in the HTML page reference the facets that I’d defined.
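The export step is what makes the clone-and-repoint trick work: the merged data lands in the same shape as the sources. A minimal Python sketch, again assuming a top-level `items` list (an assumption about the format, with invented records and a hypothetical function name):

```python
import json

# Invented merged records standing in for the real export.
merged_items = [
    {"label": "Person A", "origin": "CSAIL", "position": "Professor"},
    {"label": "Person B", "origin": "CCNMTL", "position": "Director"},
]


def export_items(items):
    """Serialize items as JSON under an assumed top-level 'items' key."""
    return json.dumps({"items": items}, indent=2, ensure_ascii=False)


exported = export_items(merged_items)
print(exported)
```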

Stunning.

Behind the scenes it’s all RDF, but the point is that nobody needs to know or care about that. And the larger point is that the Simile folks — having spent years fighting ontology wars — have now gone AWOL. The new stance is: Everybody gets to name their fields as they prefer, and mashup tools like Potluck can define equivalences among them. All the original source data, and all the merged data, is available in a common format that translates into grist for the engines in the RDF mill. All the data, and all the interactive behavior associated with the data, is cleanly separated from the presentation.

This is a great bootstrap strategy. When faculty group B sees the cool faceted browser that faculty group A has made, B will want one of its own. It can pretty easily figure out how to adapt its data to the format, perhaps with some help from the Babel translator in order to, say, repurpose a spreadsheet. Everybody gets to scratch their own itches, and the environment makes things easy and fun, but under the covers semantic data is being accumulated.
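To give a feel for how small the spreadsheet step is, here is a Python sketch that turns CSV rows into an item list. It is not Babel and makes no claim about how Babel works; the sheet contents, the function name, and the `items` wrapper are all assumptions for illustration.

```python
import csv
import io
import json

# Invented spreadsheet content, as it might be saved from a staff list.
SHEET = "label,group,title\nPerson A,Design,Director\nPerson B,Video,Producer\n"


def rows_to_items(csv_text):
    """Turn CSV rows into a dict with an assumed top-level 'items' list.

    The header row supplies the field names for every item.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {"items": [dict(row) for row in reader]}


converted = rows_to_items(SHEET)
print(json.dumps(converted, indent=2))
```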

I don’t think that any semantic web skeptic, and I have been one, has ever disputed the value that can emerge when you traverse RDF-style data sets. The question has always been: How will we get people to create those data sets, in ways and for purposes meaningful to them? The Simile team are laser-focused on solving that problem, and from what I can see they’re biting off huge chunks of it with these tools and methods.

I’m not suggesting that ontologies will play no role, but I’ve long believed that we need to evolve toward them from real data that people can create, use interactively, and begin to cross-combine. That’s exactly the approach that Simile is taking. Seeing it in action, and then easily reproducing it myself, totally made my day.

16 thoughts on “Simile: Semantic web mashups for the rest of us”

  1. Piggy Bank and so on are interesting, but I think decent libraries for RDF parsing and querying in languages like Python and Ruby will probably be the sweet spot for mashups. For instance, Python’s rdflib has made it very easy for me to build RDF-based applications – certainly a lot, lot quicker than Jena and ARQ.

    If the W3C can work on a SPARQL equivalent for SQL’s UPDATE and INSERT, even more interesting things can happen.

  2. Thanks Jon, from the entire Simile team, what a great writeup. It is such a delight when other folks “get it.”

  3. Hi Jon,

    Nice write-up of the Simile tools, there’s a lot of great work being done there.

    I wanted to comment on your suggestion that a stance of “Everybody gets to name their fields as they prefer, and mashup tools like Potluck can define equivalences among them” is somehow going “AWOL” from the semantic web approach. To me, that seems to be a key part of the whole effort, that one doesn’t need to fight ontology wars to gain adoption; the fact that some people do fight those wars, or think that’s a necessary step, is a separate issue.

    I’ve been trying to explore some benefits of RDF toolkits for data merging, and tease out some of these issues in some recent blog postings:

    http://www.ldodds.com/blog/archives/000314.html
    http://www.ldodds.com/blog/archives/000316.html

    I totally agree though that no-one (or users anyway) really need care about the RDF underneath all this, it should ultimately fade into the background.