Like a moth to the Freebase flame

Freebase is aptly named: I am drawn like a moth to its flame. I realize it can be annoying to discuss things that folks can’t try out for themselves, and I can’t (yet) do anything about that, but I hope that a few more observations will be welcome.

The comment attached to my first item about Freebase, by Metaweb’s Chris Maden, provides an enlightening glimpse into how knowledge gardening in a structured wiki like Freebase will differ from its counterpart in an unstructured wiki like Wikipedia. Here’s what Chris had to say about the Freebase record for me, which I had tweaked:

I noted that his place of birth was “Philadelphia,” which was odd; our cities tend to be named with their state included. Sure enough, “Philadelphia” had been created accidentally by some other user as a “location,” and then Jon had reused it. So I:
1) Changed Jon’s place of birth to “Philadelphia, Pennsylvania” (which is a “location” and a “city/town”).
2) Added a type to “Philadelphia”: “duplicate.”
3) Added a property to “Philadelphia”: it is a duplicate of “Philadelphia, Pennsylvania.”
4) Removed the “location” type from “Philadelphia” to keep it from coming up in autocomplete for other location properties.
By marking it as a duplicate, if someone does end up using it, our topic merge tool can find it and its namesake and combine their properties. This will be more heavily automated as we gain confidence in our detection algorithms.
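
Chris's four steps, and the merge that follows, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Topic` class, the `autocomplete` and `merge_duplicates` helpers, the `duplicate_of` property name, and the "Another user" topic are all hypothetical, and none of this is Freebase's actual API or merge algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a topic: a name, a set of types, and properties."""
    name: str
    types: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def autocomplete(topics, prefix, required_type):
    """Suggest names that start with prefix and carry required_type."""
    return sorted(
        t.name for t in topics.values()
        if required_type in t.types and t.name.startswith(prefix)
    )


def merge_duplicates(topics):
    """Fold each topic typed 'duplicate' into its target and repoint references."""
    for dup in [t for t in topics.values() if "duplicate" in t.types]:
        target = topics[dup.properties["duplicate_of"]]
        # Carry over any properties the target does not already have.
        for key, value in dup.properties.items():
            if key != "duplicate_of":
                target.properties.setdefault(key, value)
        # Repoint every property value that still names the duplicate.
        for other in topics.values():
            for key, value in other.properties.items():
                if value == dup.name:
                    other.properties[key] = target.name
        del topics[dup.name]


topics = {
    "Philadelphia": Topic("Philadelphia", {"location"}),
    "Philadelphia, Pennsylvania": Topic(
        "Philadelphia, Pennsylvania", {"location", "city/town"}
    ),
    "Jon": Topic("Jon", {"person"}, {"place_of_birth": "Philadelphia"}),
}

# 1) Point Jon at the canonical city.
topics["Jon"].properties["place_of_birth"] = "Philadelphia, Pennsylvania"
# 2) and 3) Mark the stray topic as a duplicate of the canonical one.
topics["Philadelphia"].types.add("duplicate")
topics["Philadelphia"].properties["duplicate_of"] = "Philadelphia, Pennsylvania"
# 4) Drop the "location" type so it no longer shows up in autocomplete.
topics["Philadelphia"].types.discard("location")

suggestions = autocomplete(topics, "Phil", "location")

# Someone ends up using the duplicate anyway (hypothetical user).
topics["Another user"] = Topic(
    "Another user", {"person"}, {"place_of_birth": "Philadelphia"}
)
merge_duplicates(topics)
```

After the merge, the stray "Philadelphia" topic is gone and anything that referred to it points at "Philadelphia, Pennsylvania" instead.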


Emboldened by this narrative, I created my first user-defined Freebase type. Because the system is so new, there are some quite fundamental things that (so far as I can see) haven’t yet been defined. I wanted to create entries for some of my personal projects, such as LibraryLookup and, so I created a type called Project and added the properties Goal and Collaborators. That enabled me to add entries for my two personal projects, describe their goals, and associate myself with them as a collaborator.

But as I said, it’s the social dimension that’ll kick this whole thing into high gear. When I did a text search in Freebase for the word “project” a bunch of things fell out, including the Helix digital media framework. The original Freebase record, sourced from Wikipedia, was typeless. I promoted it to an instance of Project, and by doing so I’ve invited anybody who visits that record to add a Goal and some Collaborators.

I’m not one of those collaborators, but I have an interest in the project and would like to be able to discover who’s working on it. More broadly, I’d like to be able to answer questions like: “Who among the Helix collaborators is also working on .NET projects?”
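
To make that kind of question concrete, here is a minimal sketch in Python of how it might be answered once the structured data exists. Everything in it is an assumption made for illustration: the collaborator names, the `ExampleDotNetTool` project, and the `platforms` property are invented (the Project type described above has only Goal and Collaborators), and this is plain in-memory Python, not a Freebase query.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Mirrors the user-defined Project type: a Goal and some Collaborators."""
    name: str
    goal: str = ""
    collaborators: set = field(default_factory=set)
    # Hypothetical extra property, needed to say what counts as a ".NET project".
    platforms: set = field(default_factory=set)


def also_working_on(projects, project_name, platform):
    """Collaborators of project_name who also work on another project tagged with platform."""
    by_name = {p.name: p for p in projects}
    team = by_name[project_name].collaborators
    elsewhere = set()
    for p in projects:
        if p.name != project_name and platform in p.platforms:
            elsewhere |= p.collaborators
    return sorted(team & elsewhere)


# Made-up data: none of these collaborators or goals come from real records.
projects = [
    Project("Helix", goal="digital media framework", collaborators={"alice", "bob"}),
    Project("ExampleDotNetTool", collaborators={"bob", "carol"}, platforms={".NET"}),
    Project("LibraryLookup", collaborators={"jon"}),
]

print(also_working_on(projects, "Helix", ".NET"))
```

The query is just a set intersection. The hard part is social rather than technical: people have to fill in the Collaborators property before there is anything to intersect.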

I can’t answer that question now, and I may never be able to in Freebase or its imminent competitor, Radar Networks. But the point is that it cost me very little to declare Helix as a Project (once the type was defined, that is), and that provides an immediate benefit just to me. As with social bookmarking, the act of public annotation is a useful aid to memory and recall.

If my invitation to contribute structured data about Helix is accepted by others, that’d be great. But there too, enlightened self-interest can be the prime mover, as it should be. By leaving their fingerprints on things that they care about, people can shape those things for their own purposes. When those fingerprints lead to mutual discovery and collaboration, that’s icing on the cake.

Of course there are all kinds of things that we care about, and would like to declare to be related. For example, I’ve recently been watching these two trend lines, which chart the relative fortunes of and in the Technorati ranking system:

I’d love to declare once that these two blogs are related to me, then ask Technorati and a bunch of other services to refer to that relationship. Maybe that’ll happen sooner than I thought.


26 thoughts on “Like a moth to the Freebase flame”

  1. Ooh! Evolving schemas and the sociology of annotation. Mon dieu, moths to the flame is the right metaphor.

    Interestingly the Google folk are grappling with the same issue when it comes to Google Base as they watch the evolution of the semi-structured data folks are using. There are a couple of papers (pdfs via Greg Linden) that are incidentally fascinating in this respect. A quote that should give a frisson of data modeling:

    “We are witnessing a kind of database design for the masses”

  2. Jon – exciting to see a mainstream (soon) system that enables: a) association between concepts and b) mashup and extension. Last fall, homeland security research led to a prototype implementation of Semantic MediaWiki extensions to MediaWiki in Washington, DC. SMW enables structured encoding similar to your description of Freebase.

    We’re collecting the dots, now let’s connect the dots. With a practical means to express relationships, vocabulary becomes central. Terms such as ‘Project’, ‘Goal’, ‘Location’, along with consistent usage and meaning, enable not only first order linkage, but extended machine-based inference and discovery.

    At this vantage point, one can see the next step for search engines – augment probabilistic results with deterministic findings (based on semantic encoding using common vocabularies). Do this both for semistructured web content and transactional relational databases. The emergent picture is Search as 21st century replacement for SQL. Cool.

    Although DC systems are secured, I am sponsoring a public site at where open dialog/collaboration on this subject will be possible and encouraged.

  3. I’ve been trying to think of something to do with freebase for months, I’m glad to have an excuse. Reading through this I thought to myself, “Doesn’t Jon really mean Software Project”?

    There are various ways to represent the relationship between type and sub-type. First I just tried to rename Project to Software Project, but the Project type is owned by judell, so I don’t have permission to do this (in fact, it’s not clear to me if Jon can rename the type, surely he can…).

    Next, I thought it would be nice to have Software Project be a subtype of Project. But I’m not sure how to do this, and I gave up and went back to my Real Work.

    This is fascinating stuff, I think one of the biggest gating factors for Metaweb building a thriving community is going to be how well they document (or teach through osmosis) how this sort of correction/elaboration/increase in specificity should happen between users. They’ve done lots of terrific work, but I think there’s more that needs to be done!

  4. Dan Thomas commented –

    “Terms such as ‘Project’, ‘Goal’, ‘Location’, along with consistent usage and meaning, enable not only first order linkage, but extended machine-based inference and discovery”

    This could be a problem, as more and more people start to play. Even one person has trouble keeping the meaning of tagging terms straight over time, and a group will have much more trouble with the semantics.

    This is an area I am quite interested in, and that I started to study by investigating my own browser bookmarks.

    Dan also said “Search as 21st century replacement for SQL”. I think there’s a lot in this notion, especially when you combine it with good quality clustering of results.

    Now to look at those papers referenced by Koranteng Ofosu-Amaah!

  5. Jon has defined these types, so by default they are within Jon’s private “domain” – but the next idea behind the social schema development is that Jon could open up his domain to other specific users who have offered valuable insight into that particular schema. Those folks could work together to refine Jon’s “Project” schema.

    If this schema proves valuable to many users of Freebase, that whole schema (including Goal, Collaborators, and whatnot) could be promoted to be one of the more public schemas, with Jon as one of the moderators of that schema. Jon couldn’t control what topics are ‘typed’/tagged with that schema, but he can define the schema itself.

    And as an aside, there isn’t really a notion of class inheritance in schemas – it’s a pretty flat space, like the way there isn’t really inheritance in most tagging systems.

  6. “If this schema proves valuable to many users of Freebase, that whole schema (including Goal, Collaborators, and whatnot) could be promoted to be one of the more public schemas.”

    Ah. I meant to raise that question. Thanks for anticipating and answering.

  7. based on the permissions issues described here, along with everything else i’m seeing, theyve decided to go the boring route and have a single HEAD state on the graph. i made something virtually identical to the current metaweb in my spare time 6 months ago using jquery, camping and redland, which is i guess why it greets me with a big yawn “you mean i need an account? the datas in their server using their proprietary protocols/formats?”

    they should use their PhDs and money to do something unique. like let anyone edit anything, and use reification on all the statements in the system with digg/pgp style trust/provenance to allow for completely dynamic views of any resource in the system.

    a replacement for rubyonrails allowing metaweb-style editing of model, schema, forms, and instances will show up in a few weeks and who knows, wikipedia might fix their GUI for editing metadata as well…which is a good thing, since some day metaweb might make something exciting..

  8. I just wanted to comment on subtyping/inheritance. As Alec says, there isn’t a notion of inheritance, but we do allow schema to “include” other schemas. So in Jeffery’s example, if he said that Software Project included Project, everything that was typed Software Project would automatically be typed Project as well — thereby getting all the Project properties. This sort of mimics the benefits of inheritance without having the strict requirements that that implies.

  9. “I would have expected your new url to go up and the old to go down”

    It’s like golf, lower scores are better. But I agree that a different visualization might be better.

  10. “how well they document (or teach through osmosis) how this sort of correction/elaboration/increase in specificity should happen between users.”

    You’ve nailed it with “teach through osmosis.” Wikipedia is a great example. Although it’s notionally “easy” to maintain a wiki page, requiring “no knowledge of HTML,” the fact is that Wikipedia editors have mastered all sorts of really complex syntax, which is in turn embedded in all sorts of really complex protocols. Understanding how such a community of practice bootstrapped itself will be crucial for Metaweb or any similar endeavor.

  11. I agree with Dan Thomas and Thomas Passin. You’ve really hit the failure point of most semantic web efforts, and that is the need/desire to maintain consistent world views. Different world views are part of what drives the cultural diversity of the planet. Meaning is not unitary.

  12. More like “ontology design for the masses”… I’m wondering what effect things like Freebase will have on projects like Protege, and what lessons Freebase can learn from things like Protege. Protege uses graphs extensively to represent relationships–would Freebase benefit from an alternate graph-based UI?

    I also found it informative to try translating Protege’s Ontologies 101 tutorial into Freebase. The two are effectively different ways of doing the same thing.

    I’m most excited about the evolution of a query language for Freebase. Any thoughts about this?

  13. Pingback: Nodalities
