Motivating people to write the semantic web: A conversation with David Huynh about Parallax

On this week’s Innovators show I got together with David Huynh, whose work with MIT’s Project SIMILE wowed me last year. David recently joined Metaweb. His first project there, Parallax, creates a new way to browse Freebase, the structured wiki that also wowed me earlier last year.

What struck me about SIMILE and Freebase was the way in which both projects cut through the fog of semantic web technologies and terminologies and got down to brass tacks: How do you get people to want to contribute structured knowledge? You have to appeal to natural instincts and, as I explored in my writeups of both projects, they do.

In the case of SIMILE, when somebody sees an interactive, data-rich Exhibit, the natural response is: “Cool! How do I get one of those for myself?” The answer is: “Pretty straightforwardly, by cloning the example you see and then massaging your data into an easily-written format.” The hidden agenda is: “You may not know or care, but once your data is in that format, it can federate.”

In the case of Freebase, the reciprocal nature of data relationships creates a kind of social glue. People who contribute to Wikipedia, or Freebase, or to the web in general, hope those contributions will be read and appreciated. The genius of Freebase is that, when I define a relationship between one of my records and one of yours, both come into view. I may notice something missing from yours and add it. You’ll in turn notice my contribution and may reciprocate. As we advance our own interests, we naturally find ourselves advancing others’ too.

Parallax, as you can see in the screencast David has embedded on the project’s home page, is a new way to explore Freebase. In the standard interface, you have to do some digging to trace connections to related sets of information. Or you might even have to drop into the API to do programmatic search. Parallax brings those relationships to the surface. One of the examples in the screencast asks and answers the question:

What were the schools attended by children of Republican presidents?

This boils down to a query that finds a set of Presidents, then a set of Republican Presidents, then a set of children, then a set of schools. With Parallax you perform that query interactively, by following links that surface as you pivot from set to set.
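To make the set-to-set pivoting concrete, here is a minimal sketch in Python. It is not Parallax’s or Freebase’s actual query mechanism; the graph layout, the property names, the helper functions `filter_set` and `pivot`, and all of the data are hypothetical placeholders chosen only to mirror the four steps described above.

```python
# Toy in-memory graph. Every identifier and property name here is a made-up
# placeholder, not Freebase's real schema or data.
GRAPH = {
    "president_a": {"type": "president", "party": "republican",
                    "children": ["child_1", "child_2"]},
    "president_b": {"type": "president", "party": "democratic",
                    "children": ["child_3"]},
    "child_1": {"schools": ["school_x"]},
    "child_2": {"schools": ["school_x", "school_y"]},
    "child_3": {"schools": ["school_z"]},
}


def filter_set(items, prop, value):
    """Narrow a set to the items whose property equals the given value."""
    return {i for i in items if GRAPH.get(i, {}).get(prop) == value}


def pivot(items, prop):
    """Follow one property from every item in a set to the set of related items."""
    related = set()
    for i in items:
        related.update(GRAPH.get(i, {}).get(prop, []))
    return related


presidents = filter_set(GRAPH.keys(), "type", "president")   # a set of presidents
republicans = filter_set(presidents, "party", "republican")  # narrowed by party
children = pivot(republicans, "children")                     # pivot to children
schools = pivot(children, "schools")                          # pivot to schools
print(sorted(schools))
```

Each pivot takes a whole set as input and returns a whole set, which is what lets the browsing proceed one link at a time without writing the combined query up front.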

This is a great way to browse the structured corpus, but how does it motivate people to provide more and better contributions? Here’s one way: By exposing the completeness — or incompleteness — of sets viewed in relationship to one another.

As I was browsing the set of U.S. Presidents, for example, Parallax surfaced the connection Works written. But there wasn’t much there: Jefferson and Adams for the Declaration of Independence, Madison for the Federalist Papers, and a few recent books by Jimmy Carter and Bill Clinton.

So I created an item in Freebase for Richard Nixon’s Six Crises, linked it to Nixon’s record in Freebase, and went back to Parallax. Sure enough, there were Nixon and Six Crises. The set of books written by U.S. Presidents had increased by one. Along the way, a new book record was created in Freebase, and an existing person record was enhanced.
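The effect can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the `coverage` helper are assumptions, and the entries are limited to the examples named above rather than the real contents of Freebase.

```python
# Illustrative data drawn from the examples in this post; the structure is an
# assumption, not how Freebase stores the "Works written" connection.
works_written = {
    "Thomas Jefferson": ["Declaration of Independence"],
    "John Adams": ["Declaration of Independence"],
    "James Madison": ["Federalist Papers"],
    "Richard Nixon": [],
}


def coverage(relation):
    """Split a set into members that have the connection and members that lack it."""
    have = sorted(name for name, works in relation.items() if works)
    missing = sorted(name for name, works in relation.items() if not works)
    return have, missing


before_have, before_missing = coverage(works_written)

# Contributing one record fills a visible gap in the derived set.
works_written["Richard Nixon"].append("Six Crises")

after_have, after_missing = coverage(works_written)
print(before_missing, "->", after_missing)
```

Seeing the "missing" side of the split is the part that invites a contribution.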

As I mentioned to David in our interview, this strikes me as a really powerful way to motivate contributors. In Wikipedia there’s no easy way to observe an implicit set. You can only look at explicit sets, like the lists of Presidential ages and religious affiliations. Somebody might decide to make an analogous list of Presidential books, but that would be much more likely to happen if the partial list that’s already implicitly in Wikipedia could be brought into focus.

When shown partial patterns, people naturally want to complete them. Parallax looks like a great way to tap into that instinctive urge.


17 thoughts on “Motivating people to write the semantic web: A conversation with David Huynh about Parallax”

  1. When you created the entry in Freebase, did it also create a Wikipedia entry? In other words does Freebase create data for Wikipedia or just take from it? Motivation to create will decline if Freebase is one way only IMHO.

  2. I wonder about the corporate applications here.

    As background: 10 years ago, I was working as a consultant for a large semiconductor company implementing Microsoft Site Server 3.0. Site Server, weirdly, addressed two different goals: “build an ASP-driven eCommerce site” and “tag your corporate documents so that they can be searched on an Intranet”. Our customer was using it for the document-tagging capability. It came with a tagging tool, which I trained people up on, so that they could add meta-data to Word documents, PDFs, Visio, etc. The problem, though, was that people would not come into work in the morning and think “hmm, I think I’ll do some content tagging now”, i.e. motivation was the problem.

    Over the years I’ve seen other tools for structuring information, and followed the (often arcane) Semantic Web technologies, but the same motivation problem seemed to be there (i.e. “Why bother tagging this content?” and “Why bother structuring this content?”).

    Search engines then of course stepped into the breach in order to sift through the unstructured content.

    I can see how the geeky recognition motivation can work in the case of global knowledge-bases, using the Wikipedia experience. I myself have spent time updating arcane Wikipedia entries, with the only motivation being the geeky recognition that maybe people will read the entries and find them useful.

    But, I’ve often wondered how the same “geeky recognition” motivation could be harnessed for the categorization, tagging, and metadata for corporate content. I mean, let’s say a company has a folder full of RFP response documents. Is there much geeky recognition to be had from tagging or categorizing those corporate documents? No.

    I can certainly see how Freebase’s relationship-based categorization can work. Over the weekend, I was searching Wikipedia to find out which movies had been filmed on the LA Subway. This was not easy. But, it sounds like with Freebase I could have created a link between “LA Subway” and “Movies” and then clarified the information. I’d feel some geeky recognition from doing this. It also sounds to me like how the brain works, i.e. based on links between information, not just the information itself. I think they are definitely on to something there.

    But, I’d go back to the question “how could this be harnessed for corporate information”. I mean, if a company has a folder of RFP responses and also a collection of documents about how their products are used in Telecoms environments, could they link the two together (so employees could search for “RFP responses to telecoms prospects”, or when you search for “Telecoms” you’d see the option to search down through relevant RFP responses)? How could people be motivated to make all these links? My lesson 10 years ago was that it is all very well to train people on data categorization, metadata, and (now) linking, but how can you motivate people to do this in the corporate setting [where there is no worldwide audience]?

  3. > Motivation to create will decline if
    > Freebase is one way only IMHO.

    Since Freebase entries tend to be composed from smaller atomic parts, none of which would meet Wikipedia’s criteria for inclusion, that’s unlikely to happen.

    It’s a really interesting question as to how stuff /could/ flow back though. Minimally by external linkage, I guess.

  4. > It also sounds to me like how the brain
    > works

    Indeed.

    > how can you motivate people to do this in
    > the corporate setting [where there is no
    > worldwide audience]

    This is the age-old problem for corporate groupware. Part of the answer is to maximize the attention rewards that accrue from self-interested contributions.

    The reciprocity built into Freebase is an intriguing social hack, I’ve argued elsewhere, because in the act of enhancing my stuff I’m exposed to your related stuff which I’m incented to enhance so that it in turn further enhances my stuff. The process leads us naturally to notice our mutual stuff, and can also raise the likelihood that others will notice it too.

  5. Great Interview:

    The irony for me is that RDF and the Semantic Web were supposed to be the machine-understandable Web, as opposed to the HTML human-facing Web. As it turns out, only the humans can make any sense out of the inconsistently encoded and incomplete data of a Data 2.0 mash-up. Humans actually love to navigate through a network of loosely typed links and chance associations.

    The AI singularity of RDF inference seems a distant dream.

    My thoughts on Mark’s comments and the possible emergence of the Semantic Enterprise are here: RDF and Data 2.0

    Guy

  6. Guy,

    AI singularity of the Semantic Web is simply an unfortunate misunderstanding. What is most important to note at this point in the Web’s evolution is the emergence of the relevance of “Linked Data” i.e., structured data that is interlinked using the very essence of the Web (HTTP).

    What is still somewhat difficult to relay to humans is the fact that nice-looking pages don’t increase the number of hours in a day (which will always be 24). The Web is generating information at a rate that far exceeds the processing power of human beings, so the solution ultimately lies in a human-machine relationship that is inherently symbiotic. This is what “Linked Data” facilitates, and in the case of the Parallax example, simply envisage Parallax taking you to the location in a network from where you can optionally beam a query.

    Jon: If you go back to our last podcast, I made a comment about SPARQL and XQuery. SPARQL will find the resources, and XQuery would allow you to get granular with the literal content associated with the discovered resources (if you are so inclined re. drill-down). Parallax is fundamentally unveiling a substrate that is more effectively traversed by agents (constructed and instructed by humans via new generation Linked Data aware solutions) :-)

  7. the freebase/wikipedia issue struck me as well. since a lot of freebase is based off of wikipedia (scraping?), is it ever refreshed? If so, it would make more sense to edit wikipedia and wait for that to flow into freebase, rather than to add it to freebase and have it missing from wikipedia…

  8. > it would make more sense to edit wikipedia
    > and wait for that to flow into freebase

    If both were structured, then yes. But Wikipedia isn’t, so no.

  9. >Parallax is fundamentally unveiling a
    >substrate that is more effectively traversed
    >by agents

    We need to specify what’s meant by effective traversal though. Yes, agents with access to the query layer can query more effectively. But humans with access to a faceted browsing layer can wander more effectively. And it’s the guided wandering that will help motivate them to contribute the stuff the agents can then find.

    > (constructed and instructed by humans
    > via new generation Linked Data aware
    > solutions)

    Ah. Good. We’re back on the same page finally :-)

  10. Kingsley

    I think RDF and Linked Data are great.

    My only point is that humans like URLs, interesting text and pictures, and links to other interesting items. They like to create things and interact. Machines like semantic-free primary keys, strongly typed data and foreign keys to one-to-many relationships.

    There is a tremendous opportunity in giving humans better tools to bridge this gap. However, the idea of intelligent agents and machine learning has done a lot of damage.

    Guy

  11. “What struck me about SIMILE and Freebase was the way in which both projects cut through the fog of semantic web technologies and terminologies and got down to brass tacks: How do you get people to want to contribute structured knowledge? You have to appeal to natural instincts and, as I explored in my writeups of both projects, they do.”

    The real question lost in the fog is about preventing spammers. They have proven incentive.

    jd/adobe
