On this week’s Interviews with Innovators show I spoke with Jeff Jonas, whose work (and narration of that work on his blog) first captured my interest in 2007.

If you follow Jeff, you’ll know what he means when he uses phrases like perpetual analytics, non-obvious relationship awareness, semantic reconciliation, sequence neutrality, and anonymous resolution. If not, and if you’re interested in how we can connect the dots across silos of data, I recommend that you peruse his blog first and then listen to this interview, which clarifies a couple of points I’d been wondering about.

One of Jeff’s tenets is that new information has to be able to answer old questions, and answer them in near-realtime. On the face of it that seems impossible. How can you compare a newly-ingested fact with every existing fact in a database, and run every imaginable query?

Well, of course you can’t, and don’t, visit every record in the database. You consult an index, and the interesting question becomes: What kind of index? In Jeff’s world, it’s an index based on keys that represent entities (people, places, organizations) and “features” (locations, relationships). And these entities are fuzzily defined. I think of them as clouds of associations. So, for example, the key for Jon Udell would point to items where Jon is misspelled as John. Most systems abhor this kind of variation, but Jeff embraces it, and I find that fascinating.
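
To make that concrete for myself, here is a minimal Python sketch of an index keyed by entity rather than by literal string. It is my own illustration, not Jeff’s implementation: the EntityIndex class, its method names, the "person:jon-udell" key format, and the hand-maintained alias table are all assumptions. Deciding which variants belong under which entity is the hard part, and the sketch does not attempt it.

```python
class EntityIndex:
    """Toy index keyed by entity; name variants resolve to one entity key.

    Hypothetical illustration only. The alias table is maintained by hand
    here; a real system would have to decide which variants belong together.
    """

    def __init__(self):
        self.alias_to_entity = {}  # normalized name variant -> entity key
        self.records = {}          # entity key -> list of records

    @staticmethod
    def _normalize(name):
        # Lowercase and collapse whitespace so trivial differences don't matter.
        return " ".join(name.lower().split())

    def add_alias(self, entity_key, variant):
        """Declare that a name variant refers to the given entity."""
        self.alias_to_entity[self._normalize(variant)] = entity_key

    def ingest(self, name, record):
        """File a record under its entity key; return the key, or None if unresolved."""
        key = self.alias_to_entity.get(self._normalize(name))
        if key is not None:
            self.records.setdefault(key, []).append(record)
        return key

    def lookup(self, name):
        """Return every record filed under the entity this name resolves to."""
        key = self.alias_to_entity.get(self._normalize(name))
        return list(self.records.get(key, [])) if key is not None else []


index = EntityIndex()
index.add_alias("person:jon-udell", "Jon Udell")
index.add_alias("person:jon-udell", "John Udell")  # the misspelling, same entity
index.ingest("John Udell", "item with the misspelled name")
found = index.lookup("Jon Udell")  # includes the item filed under the misspelling
```

The point of the sketch is only that a lookup goes through the entity key, so a search on the correct spelling also reaches items filed under the variant.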

Another intriguing idea was reported by Phil Windley in his write-up on Jeff’s ETech talk:

Jeff treats query as data. When a query is made against the context, and gets no response, it’s stored in the database. Later if data shows up that matches the query, you get a match. Treating queries like data makes it so you don’t have to ask every question every day.

Here again, I wondered how you avoid running every query against every new fact. What does it mean for data to “match” a query? Part of the answer, as I understand it, is that both queries and data are indexed semantically, using keys that encompass clouds of associations.
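
Here is how I picture the mechanics, as a small Python sketch under my own assumptions: the Context class, its method names, and the idea that queries and facts have already been reduced to the same kind of entity key are all hypothetical, not something Jeff or Phil described in this detail.

```python
from collections import defaultdict


class Context:
    """Toy store in which unanswered queries persist alongside the data.

    Hypothetical illustration only. Keys are assumed to be entity keys that
    have already been resolved; the resolution step is not shown.
    """

    def __init__(self):
        self.facts = defaultdict(list)            # key -> facts seen so far
        self.pending_queries = defaultdict(list)  # key -> ids of unanswered queries

    def query(self, query_id, key):
        """Return matching facts; if there are none, remember the query."""
        hits = list(self.facts.get(key, []))
        if not hits:
            self.pending_queries[key].append(query_id)
        return hits

    def ingest(self, key, fact):
        """Store a fact and return the ids of stored queries it answers."""
        self.facts[key].append(fact)
        # One lookup on the fact's key, not a scan of every stored query.
        return self.pending_queries.pop(key, [])


ctx = Context()
first_try = ctx.query("q1", "person:jon-udell")         # no data yet, query is stored
answered = ctx.ingest("person:jon-udell", "new item")   # the new fact answers q1
```

Because the stored query and the incoming fact share a key, each new fact costs one lookup against the pending queries rather than a rerun of every question ever asked.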

Another part of the answer emerged in this interview. You have to be really sure about those associations. If you put a John Udell record into the Jon Udell bucket, you had better be certain that this is a legitimate misspelling in an item that refers to a particular instance of Jon Udell (i.e., me, not this guy), rather than a legitimate reference to one of the John Udells.

Now that I know about this constraint, the whole thing makes more sense.