On this week’s Interviews with Innovators show I spoke with Jeff Jonas, whose work (and narration of that work on his blog) first captured my interest in 2007.
If you follow Jeff you’ll know what he means when he uses phrases like perpetual analytics, non-obvious relationship awareness, semantic reconciliation, sequence neutrality, and anonymous resolution. If not, and if you’re interested in how we can connect the dots across siloes of data, I recommend that you peruse his blog first and then listen to this interview, which clarifies a couple of points I’d been wondering about.
One of Jeff’s tenets is that new information has to be able to answer old questions, and answer them in near-realtime. On the face of it that seems impossible. How can you compare a newly-ingested fact with every existing fact in a database, and run every imaginable query?
Well of course you can’t, and don’t, visit every record in the database. You consult an index, and the interesting question becomes: What kind of index? In Jeff’s world, it’s an index based on keys that represent entities (people, places, organizations) and “features” (locations, relationships). And these entities are fuzzily defined. I think of them as clouds of associations. So for example the key for Jon Udell would point to items where Jon is misspelled as John. Most systems abhor this kind of variation, but Jeff embraces it, and I find that fascinating.
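To make the index idea concrete, here is a minimal sketch in Python. It is my own illustration, not Jeff's implementation: the `EntityIndex` class, the `(kind, value)` key shape, and the `person:1` style identifiers are all hypothetical. It only shows how a key can point at an entity through several spellings, so that a lookup consults the index instead of visiting every record.

```python
from collections import defaultdict


class EntityIndex:
    """Toy index (hypothetical): feature keys point to sets of entity ids."""

    def __init__(self):
        # (kind, normalized value) -> set of entity ids
        self._key_to_entities = defaultdict(set)

    @staticmethod
    def _normalize(value):
        # Lowercase and collapse runs of whitespace.
        return " ".join(value.lower().split())

    def add_variant(self, entity_id, kind, value):
        """Record that this spelling or feature has been seen for this entity."""
        self._key_to_entities[(kind, self._normalize(value))].add(entity_id)

    def candidates(self, kind, value):
        """Return the entities reachable from this key, without a table scan."""
        key = (kind, self._normalize(value))
        return set(self._key_to_entities.get(key, set()))


index = EntityIndex()
index.add_variant("person:1", "name", "Jon Udell")
index.add_variant("person:1", "name", "John Udell")  # accepted misspelling
index.add_variant("person:2", "name", "John Udell")  # a different, real John Udell

print(index.candidates("name", "jon  udell"))
print(index.candidates("name", "John Udell"))
```

In this sketch the "John Udell" key returns two candidates. Deciding which one a new record belongs to would need other features, which the toy version leaves out.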
Another intriguing idea was reported by Phil Windley in his write-up on Jeff’s ETech talk:
Jeff treats query as data. When a query is made against the context, and gets no response, it’s stored in the database. Later if data shows up that matches the query, you get a match. Treating queries like data makes it so you don’t have to ask every question every day.
Here again, I wondered how you avoid running every query against every new fact. What does it mean for data to “match” a query? Part of the answer, as I understand it, is that both queries and data are indexed semantically, using keys that encompass clouds of associations.
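As I picture it, treating a query as data could look something like the sketch below. This is an assumption-laden toy, not the real system: the `Context` class, its `ask` and `ingest` methods, and the string keys are names I made up, and matching here is plain key equality rather than anything semantic. I have also assumed that a stored query is discharged once it matches, which the original description does not specify.

```python
from collections import defaultdict


class Context:
    """Toy store (hypothetical) where unanswered queries are kept as data."""

    def __init__(self):
        self._facts = defaultdict(list)    # key -> facts seen so far
        self._pending = defaultdict(list)  # key -> askers still waiting

    def ask(self, key, asker):
        """Answer from the index; if nothing matches, remember the question."""
        hits = list(self._facts.get(key, []))
        if not hits:
            self._pending[key].append(asker)
        return hits

    def ingest(self, key, fact):
        """Store a new fact and return (asker, fact) pairs for stored queries."""
        self._facts[key].append(fact)
        # Only queries filed under this same key are consulted, so a new fact
        # never triggers a scan over every query ever asked.
        return [(asker, fact) for asker in self._pending.pop(key, [])]


ctx = Context()
first = ctx.ask("name:jon udell", "analyst-7")         # nothing known yet
alerts = ctx.ingest("name:jon udell", "record-42")     # old question answered
repeat = ctx.ingest("name:jon udell", "record-43")     # query already discharged
print(first, alerts, repeat)
```

The point of the sketch is that a query and a fact meet at a shared key, so the arrival of a fact costs one index lookup no matter how many questions are outstanding.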
Another part of the answer emerged in this interview. You have to be really sure about those associations. If you put a John Udell record into the Jon Udell bucket, you had better be certain that this is a legitimate misspelling in an item that refers to a particular instance of Jon Udell (i.e., me, not this guy), rather than a legitimate reference to one of the John Udells.
Now that I know about this constraint, the whole thing makes more sense.
When I do an ego search (ok, not often but for amusement and to see what no-follows I need to add to push useless self-references off the top page), I am always amused by the “Did you mean to search for ‘orchid’?” and the sidebar ads for florists.
So, in annotating the web with semantic information, we have to do better than that, but it seems like a labor-intensive activity and one that is doomed when the wisdom of the crowd swamps the correct association.
Name confusion seems like an interesting problem in the absence of objects that represent the partitioning. There are a lot of D E Hamilton and Dennis E Hamilton entries in phone books around the country. Some of those listings are for me at different locations at different times. There have even been incorrect listings that were for me. And I receive an occasional call from someone looking for someone with a similar name or who just might be related to me.
We manage with this noise, although sticky identity confusion can be a problem (no-fly lists and misdirected liens and warrants come to mind). I think we cannot expect perfection. But how do we obtain the flexibility needed for corrections and improvement? I wonder how quickly that becomes the question. We’ll have to find out.
the issue here is to get past words and focus on their meaning – or rather their content, context, and intent. it does not matter if it is john or jon as long as we know it is the same person. and the only way that happens is by associating meta-tags and tags that uniquely identify a person regardless of the horrible work we do misspelling their names.
now, that is cool stuff…