Data finds data, then people find people

If you plug the quoted phrase “the data finds the data” into any of the search engines, the first hit will be one of several essays on Jeff Jonas’ blog. Other evocative phrases that lead to Jeff’s blog include “perpetual analytics”, “sequence neutrality,” and “persistent context,” but while those will soon resonate once you scratch the surface of Jeff’s work, none is as broadly compelling as “the data finds the data.” As sound bites go, that one’s a keeper.

Jeff Jonas is chief scientist for IBM’s Entity Analytic Solutions. His long career in data surveillance, and recent interest in privacy-respecting data surveillance, has drawn a lot of media attention lately. In the mainstream he’s appeared in Newsweek and on NPR. In the techsphere, Tim O’Reilly blogged about Jeff’s visit to PC Forum, Dan Farber interviewed him at the Web 2.0 conference and Phil Windley wrote a detailed review of his keynote at ETech 2007.

Given our shared interests — including surveillance, analytics, security, privacy, and manufactured serendipity — it’s surprising that I only recently became aware of Jeff’s work. Of course, we’ve been working different ends of the same street. He’s focused on finding bad guys: casino fraudsters, terrorists, and others who collaborate secretly. I’ve focused on helping people who collaborate openly do so more effectively. And yet…these really are two sides of the same coin.

Here’s an example of “the data finds the data” in Jeff’s world, from his article in IEEE Security and Privacy entitled Threat and Fraud Intelligence, Las Vegas Style. You have two records that refer to the same person, but you don’t know that they do. Then a third record appears which relates to each of the first two, and which establishes that all three refer to the same person. The first two pieces of data find one another, through the agency of a third piece of data.
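To make the three-record case concrete, here is a minimal sketch in Python. It is not Jeff's actual algorithm; it assumes a deliberately crude matching rule (two records belong together if they share any identical field value) and uses a union-find structure so that each arriving record can merge groups that were previously separate. The class name `EntityResolver`, the field names, and the sample values are all hypothetical.

```python
from collections import defaultdict


class EntityResolver:
    """Toy resolver: records sharing any (field, value) pair are merged."""

    def __init__(self):
        self.parent = {}  # union-find parent pointers, keyed by record id
        self.index = {}   # (field, value) -> id of a record that carries it

    def _find(self, rid):
        # Walk up to the group's root, shortening the path along the way.
        while self.parent[rid] != rid:
            self.parent[rid] = self.parent[self.parent[rid]]
            rid = self.parent[rid]
        return rid

    def add(self, rid, record):
        """Ingest one record and link it to every earlier record it matches."""
        self.parent[rid] = rid
        for pair in record.items():
            if pair in self.index:
                # Merge the new record's group with the matching record's group.
                root_new = self._find(rid)
                root_old = self._find(self.index[pair])
                self.parent[root_new] = root_old
            else:
                self.index[pair] = rid

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)

    def entities(self):
        groups = defaultdict(set)
        for rid in self.parent:
            groups[self._find(rid)].add(rid)
        return list(groups.values())


resolver = EntityResolver()
resolver.add("r1", {"name": "M. Smith", "phone": "555-0100"})
resolver.add("r2", {"name": "Mark Smyth", "passport": "X123"})
before = resolver.same_entity("r1", "r2")  # the first two records share nothing

# The third record shares a phone with r1 and a passport number with r2.
resolver.add("r3", {"phone": "555-0100", "passport": "X123"})
after = resolver.same_entity("r1", "r2")
```

Nothing ever queries for the link between `r1` and `r2`. It appears as a side effect of ingesting `r3`, which is the sense in which the data finds the data. A real system would need fuzzy matching and a way to undo bad merges, both of which this sketch ignores.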

Here’s an example of “the data finds the data” in my world. On June 17 I bookmarked this item from Mike Caulfield, who is a local friend, the webmaster at Keene State College, and a forward thinker about Net-enabled education. On June 19 I noticed that Jim Groom (a distant acquaintance at the University of Mary Washington and another forward thinker on the same topic) had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.

I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.

But even if that weren’t the vector, the point is that given the overlap between Jim’s published work and Mike’s published work, it’s likely that they would sooner or later have discovered one another. In the realm of personal publishing, thanks to syndication and search, data tends to find data. And when it does, people find each other.

This process of discovery works best, of course, when there’s common data available to the syndication and search engines. When the same things have different URLs or different names, the connections are non-obvious.

For non-obvious connections that don’t want to be found, you need a technology like the one Jeff Jonas sold to IBM. It goes by the name NORA: non-obvious relationship awareness.

For non-obvious connections that do want to be found, though, we can help the process along in a variety of ways. Publishing hyperlinks is one way to expose non-obvious relationships. Publishing key words and phrases is another. So, for example, in reading up on Jeff Jonas’ work, I realized that the privacy-assuring version of NORA, called ANNA, which uses one-way hashes to obscure private information while still enabling matching and discovery, is related to Peter Wayner’s notion of translucent databases (1, 2).

I’m not the first one to make that connection — Noah Campbell noted it last fall — but this item will strengthen it, in a way that may help some data find some other data, and some people find some other people.
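For readers who want to see how one-way hashes can enable matching without exposing the underlying values, here is a minimal sketch in Python. It is not how ANNA or Peter Wayner's translucent databases are actually implemented. It assumes two parties that have agreed on a secret key out of band, and it uses a keyed hash (HMAC-SHA256 from the standard library) over a normalized name. The function name `blind`, the key, the list names, and the sample names are all hypothetical.

```python
import hashlib
import hmac


def blind(value: str, key: bytes) -> str:
    """Return a keyed one-way digest of a normalized identifier."""
    # Normalize case and whitespace so trivially different spellings collide.
    normalized = " ".join(value.lower().split())
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


SHARED_KEY = b"example-key-agreed-out-of-band"  # hypothetical shared secret

# Each party blinds its own list locally and shares only the digests.
casino_list = {blind(n, SHARED_KEY) for n in ["Alice Example", "Bob Sample"]}
agency_list = {blind(n, SHARED_KEY) for n in ["bob  sample", "Carol Placeholder"]}

# Matching happens on digests; the non-matching names are never revealed.
matches = casino_list & agency_list
```

Only the digest for the shared name lands in `matches`, and each party can map it back to its own original record. The protection depends on the key staying secret, because names and similar identifiers are easy to guess and a leaked key would allow a dictionary attack on the digests.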

26 thoughts on “Data finds data, then people find people”

  1. Jon,

    Thanks for the trackbacks, I like it when your data finds my data. An interesting frame on this is that I did see Mike Caulfield’s post in your del.icio.us links and noted it accordingly, but not too long after that he commented on an unrelated post about The Big Lebowski, of all things. This seemingly incongruous intermediary in many ways further reinforces the ideas you and Mike had been talking about which jump started the conversation: “What would e-learning look like if we started from the needs of the student, instead of the institution?” A publishing platform allows someone to write about a whole bunch of things they are interested in, and while Mike and I both share ideas about re-thinking web-based learning along the axis of the student rather than a course, the interstitial social spaces in many ways brought us in contact. How does it change my reaction to Mike, or his to me, when we both know that our perspectives on ed tech (whether divergent or similar) connect our ideas in relationship to shared tastes?

    What makes this technology that much more amazing to me is that after posting a little bit about “el duderino” (a topic which is a far cry from anything “officially” Ed Tech related -or is it now?), Mike and I have a connection around something as social and personal as movies. This is what we do in “real life,” or at least I do: I talk about movies with people I am getting to know because I love them and I will quickly be able to connect with other people around topics I am excited about. Cross-fertilize that with re-imagining educational technology and you have a combination of social relations that make this medium so rich in forging connections that all too many work cultures artificially divorce themselves. It is this interstitial space between knowing and learning not only concepts and ideas, but more importantly people that beautifully captures the truly social nature of the data. For me, the data is not at all divorced from the means by which we shape it with our personalities and our interests, however far afield they may range from the topic -take this comment for example ;)

    Thanks for connecting us through your ever generative ruminations on the intersections of data, ideas, and people -but more importantly for the belief that we can meaningfully harness these connections to trace the anthropomorphic face of data.

  2. Hi Jon,
    Sorry for the use of comments to get your attention :)
    You have a small bug in your RSS feed or rather your old one

    I’m still subscribed to your infoworld feed which is now filling up with trash posts, I’m switching to the new feed but thought I should warn you it seems it is not just picking up your wordpress.com posts but various others as well.

  3. Mystery solved — I was actually over at Jim’s site the day before because I had searched on WordPress as an eportfolio engine, and I think I ended up on something on his site about “slow-blogging”.

    But I’m a Lebowski fan, so I left a comment to that effect when I saw his tribute —

    Then you bookmarked me and the rest is history.

    Point being what you said in the article: the weird thing is that this is not a fragile single vector process, but that such a rich network of multiple pathways exists. It’s a highly redundant system.

    In fact, I wonder if my WordPress eportfolio leanings might have even been informed by the UMW people before I knew who they were, through osmosis.

    Here’s the big question though: do people tend to find people that support them more than those who disagree? In a community of practice, that’s not horrible; trading implementation tips and tricks is important. But do these communities we form have less of a tendency to develop self-doubt?

    I know I’m not the first to ask that, but I’d be interested in some good answers — I’m split, because I feel that when people jump into the conversational stream too soon they can get assimilated very quickly…at the same time we are currently exposed to such a dazzling display of heterodoxies as has ever been available.
