If you plug the quoted phrase “the data finds the data” into any of the search engines, the first hit will be one of several essays on Jeff Jonas’ blog. Other evocative phrases that lead to Jeff’s blog include “perpetual analytics,” “sequence neutrality,” and “persistent context,” and while those will resonate once you scratch the surface of Jeff’s work, none is as broadly compelling as “the data finds the data.” As sound bites go, that one’s a keeper.
Jeff Jonas is chief scientist for IBM’s Entity Analytic Solutions. His long career in data surveillance, and his more recent interest in privacy-respecting data surveillance, have drawn a lot of media attention. In the mainstream he’s appeared in Newsweek and on NPR. In the techsphere, Tim O’Reilly blogged about Jeff’s visit to PC Forum, Dan Farber interviewed him at the Web 2.0 conference, and Phil Windley wrote a detailed review of his keynote at ETech 2007.
Given our shared interests — including surveillance, analytics, security, privacy, and manufactured serendipity — it’s surprising that I only recently became aware of Jeff’s work. Of course, we’ve been working different ends of the same street. He’s focused on finding bad guys: casino fraudsters, terrorists, and others who collaborate secretly. I’ve focused on helping people who collaborate openly do so more effectively. And yet…these really are two sides of the same coin.
Here’s an example of “the data finds the data” in Jeff’s world, from his article in IEEE Security and Privacy entitled Threat and Fraud Intelligence, Las Vegas Style. You have two records that refer to the same person, but you don’t know that they do. Then a third record appears which relates to each of the first two, and which establishes that all three refer to the same person. The first two pieces of data find one another, through the agency of a third piece of data.
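One way to sketch that three-record scenario is a union-find over shared attribute values, assuming that two records can be linked whenever they share an exact value for some identifying attribute. The records, the `phone` and `address` keys, and the exact-match rule are all hypothetical; this illustrates transitive entity resolution in general, not the matching logic of Jeff’s systems.

```python
from collections import defaultdict

# Hypothetical records: A and B share no attribute value; C overlaps with each.
RECORDS = {
    "A": {"name": "Bob Smith", "phone": "555-0101"},
    "B": {"name": "Robert Smyth", "address": "12 Elm St"},
    "C": {"name": "R. Smith", "phone": "555-0101", "address": "12 Elm St"},
}


class UnionFind:
    """Disjoint-set structure that tracks which records refer to one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, keys=("phone", "address")):
    """Group record ids that are linked, directly or transitively, by shared values."""
    uf = UnionFind()
    seen = {}  # (key, value) -> first record id that carried it
    for rid, attrs in records.items():
        uf.find(rid)  # register every record, even one that never links
        for key in keys:
            if key in attrs:
                token = (key, attrs[key])
                if token in seen:
                    uf.union(seen[token], rid)
                else:
                    seen[token] = rid
    groups = defaultdict(set)
    for rid in records:
        groups[uf.find(rid)].add(rid)
    return sorted(sorted(g) for g in groups.values())
```

Resolving only A and B yields two separate entities; once C arrives, all three collapse into one, which is the moment the first two records find each other.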
Here’s an example of “the data finds the data” in my world. On June 17 I bookmarked this item from Mike Caulfield, who is a local friend, the webmaster at Keene State College, and a forward thinker about Net-enabled education. On June 19 I noticed that Jim Groom, a distant acquaintance at the University of Mary Washington and another forward thinker on the same topic, had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.
I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.
But even if that weren’t the vector, the point is that, given the overlap between Jim’s published work and Mike’s published work, they would likely have discovered one another sooner or later. In the realm of personal publishing, thanks to syndication and search, data tends to find data. And when it does, people find each other.
This process of discovery works best, of course, when there’s common data available to the syndication and search engines. When the same things have different URLs or different names, the connections are non-obvious.
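To make the point about different URLs concrete, here is a minimal sketch of URL canonicalization followed by an overlap check between two hypothetical bookmark lists. The normalization rules (lowercasing the host, dropping `www.`, default ports, trailing slashes, fragments, and `utm_*` tracking parameters) are assumptions chosen for illustration; some are not safe for every site, and real syndication and search engines apply their own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Map superficially different URLs for the same item to one string (assumed rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www and bare host serve the same content
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"  # keep only non-default ports
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so parameter order doesn't matter.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if not k.startswith("utm_")]
    query = urlencode(sorted(params))
    return urlunsplit((scheme, host, path, query, ""))  # fragment dropped


def shared_links(feed_a, feed_b):
    """Return canonical URLs that appear in both lists of bookmarked links."""
    return sorted({canonicalize(u) for u in feed_a} & {canonicalize(u) for u in feed_b})
```

Without the canonicalization step, `https://www.Example.org/a/?utm_medium=rss` and `https://example.org/a#top` would never match, and the connection between the two lists would stay non-obvious.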
For non-obvious connections that don’t want to be found, you need a technology like the one Jeff Jonas sold to IBM. It goes by the name NORA: non-obvious relationship awareness.
For non-obvious connections that do want to be found, though, we can help the process along in a variety of ways. Publishing hyperlinks is one way to expose non-obvious relationships. Publishing key words and phrases is another. So, for example, in reading up on Jeff Jonas’ work, I realized that the privacy-assuring version of NORA, called ANNA, which uses one-way hashes to obscure private information while still enabling matching and discovery, is related to Peter Wayner’s notion of translucent databases (1, 2).
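The general idea behind ANNA and translucent databases, matching on one-way hashes instead of raw identifiers, can be sketched with Python’s standard `hmac` and `hashlib` modules. This is an illustrative assumption, not ANNA’s actual design: the key, the name-plus-birthdate fields, and the normalization rule are hypothetical. A keyed hash is used instead of a bare hash because low-entropy values such as names and birth dates can be recovered from an unkeyed digest simply by guessing and hashing candidates.

```python
import hashlib
import hmac

# Hypothetical secret held by the data owners, not by whoever compares digests.
SHARED_KEY = b"example-key-not-for-production"


def normalize(name: str, dob: str) -> str:
    # Assumed normalization: casefold, collapse whitespace, join fields with "|".
    return " ".join(name.casefold().split()) + "|" + dob.strip()


def anonymize(name: str, dob: str) -> str:
    """One-way keyed digest of an identity; the raw values never leave the owner."""
    message = normalize(name, dob).encode("utf-8")
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


def match(digests_a, digests_b):
    """Digests present in both sets: the matcher learns only that they coincide."""
    return set(digests_a) & set(digests_b)
```

Each party hashes its own records with the same key and hands over only digests, so two records about the same person can still find each other without either list being disclosed in the clear.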
I’m not the first one to make that connection — Noah Campbell noted it last fall — but this item will strengthen it, in a way that may help some data find some other data, and some people find some other people.
July 2, 2007 at 1:43 pm
[...] thoughtful piece from Jon Udell on how data finds data. For those of you who know what “dog in the kitchen” means, this is a pretty exciting [...]
July 2, 2007 at 3:17 pm
Jon,
Thanks for the trackbacks; I like it when your data finds my data. An interesting frame on this is that I did see Mike Caulfield’s post in your del.icio.us links and noted it accordingly, but not long after that he commented on an unrelated post of mine about The Big Lebowski, of all things. This seemingly incongruous intermediary in many ways reinforces the ideas you and Mike had been talking about, which jump-started the conversation: “What would e-learning look like if we started from the needs of the student, instead of the institution?” A publishing platform allows someone to write about a whole bunch of things they are interested in, and while Mike and I both share ideas about re-thinking web-based learning along the axis of the student rather than the course, it was in many ways the interstitial social spaces that brought us into contact. How does it change my reaction to Mike, or his to me, when we both know that our perspectives on ed tech (whether divergent or similar) connect our ideas in relationship to shared tastes?
What makes this technology that much more amazing to me is that after I posted a little bit about “el duderino” (a topic that is a far cry from anything “officially” Ed Tech related, or is it now?), Mike and I have a connection around something as social and personal as movies. This is what we do in “real life,” or at least what I do: I talk about movies with people I am getting to know because I love movies, and I can quickly connect with other people around topics I am excited about. Cross-fertilize that with re-imagining educational technology and you have a combination of social relations that makes this medium so rich in forging the kinds of connections from which all too many work cultures artificially divorce themselves. It is this interstitial space between knowing and learning, not only concepts and ideas but, more importantly, people, that beautifully captures the truly social nature of the data. For me, the data is not at all divorced from the means by which we shape it with our personalities and our interests, however far afield they may range from the topic; take this comment, for example ;)
Thanks for connecting us through your ever-generative ruminations on the intersections of data, ideas, and people, but more importantly for the belief that we can meaningfully harness these connections to trace the anthropomorphic face of data.
July 2, 2007 at 7:04 pm
[...] Udell recently blogged about the way in which he ‘connected’ paths with a number of people or introduced acquaintances via blogging / publishing and bookmark sharing. One of the Important [...]
July 3, 2007 at 4:48 am
Hi Jon,
Sorry for the use of comments to get your attention :)
You have a small bug in your RSS feed, or rather in your old one.
I’m still subscribed to your InfoWorld feed, which is now filling up with trash posts. I’m switching to the new feed, but thought I should warn you: it seems to be picking up not just your wordpress.com posts but various others as well.
July 3, 2007 at 3:13 pm
[...] New Primitive impulses developing among the older techies and reconciles it with the beauty of the data finds data world Jon Udell recently discussed on his [...]
July 4, 2007 at 8:59 am
Mystery solved — I was actually over at Jim’s site the day before because I had searched on WordPress as an eportfolio engine, and I think I ended up on something on his site about “slow-blogging”.
But I’m a Lebowski fan, so I left a comment to that effect when I saw his tribute.
Then you bookmarked me and the rest is history.
Point being what you said in the article: the weird thing is that this is not a fragile, single-vector process; rather, a rich network of multiple pathways exists. It’s a highly redundant system.
In fact, I wonder if my WordPress eportfolio leanings might have even been informed by the UMW people before I knew who they were, through osmosis.
Here’s the big question, though: do people tend to find people who support them more than those who disagree? In a community of practice, that’s not horrible; trading implementation tips and tricks is important. But do these communities we form have less of a tendency to develop self-doubt?
I know I’m not the first to ask that, but I’d be interested in some good answers. I’m split, because I feel that when people jump into the conversational stream too soon they can get assimilated very quickly. At the same time, we are currently exposed to as dazzling a display of heterodoxies as has ever been available.
July 5, 2007 at 8:23 pm
[...] to something Jon Udell refers to as “manufactured serendipity”, most recently in Data finds data, then people find people). One key concept our team has been exploring is the troika of resources, tags and people. The idea [...]
July 5, 2007 at 9:44 pm
[...] are words by Jon Udell. Jon writes about work done by Jeff Jonas at IBM on surveillance, security and analytics, and while [...]
July 12, 2007 at 3:09 pm
[...] Jon Udell: Data finds data, then people find people “Here’s an example of ‘the data finds the data’ in Jeff’s world, from his article in IEEE Security and Privacy entitled Threat and Fraud Intelligence, Las Vegas Style [PDF]. You have two records that refer to the same person, but you don’t know that they do. Then a third record appears which relates to each of the first two, and which establishes that all three refer to the same person. The first two pieces of data find one another, through the agency of a third piece of data.” [...]
July 24, 2007 at 12:03 pm
[...] can enable connections, too. Jon goes on to explain: “On June 17 I bookmarked this item from Mike Caulfield… On June 19 I noticed that Jim [...]
September 5, 2007 at 10:28 am
[...] aren’t a panacea, but when we use them to reduce friction and lower activation thresholds, data will find data, and people will find people. To achieve those effects, the essential property of machine readability matters more than its [...]
November 27, 2007 at 9:10 am
[...] it’s all the same to me in one fundamental way. When we push information into shared spaces, data finds data, and people find people, and all sorts of magic [...]
December 19, 2007 at 8:30 pm
[...] The idea seems to be that of connecting people with relevant information. In a way, this reminds me of Jon Udell’s post extending Jeff Jonas‘ theory; data finds data, then people find people. [...]
January 8, 2008 at 10:43 pm
[...] I think Jon Udell would like hearing that. [...]
June 29, 2008 at 2:00 pm
[...] quote by Jon Udell, channeling Jeff Jonas is one that, to me at least, defines what the modern web is all about. Too [...]
July 3, 2008 at 5:12 pm
[...] finds data, then people find people That quote by Jon Udell, channeling Jeff Jonas is one that, to me at least, defines what the modern web is all about. Too [...]
August 12, 2008 at 3:42 pm
Thanks to reader Meryn Stol, I just found this great post of yours, Jon.
A year later, the challenge is still on for even more effective ways for people to connect around data effortlessly.
http://lamarguerite.wordpress.com/2008/08/12/3-more-things-i-learned-about-social-networks/
August 27, 2008 at 9:00 am
[...] get a vibrant bacteriorhodopsin community. This is essentially an extension of the now infamous data finds data, people get people meme that the whole world should latch on [...]
October 30, 2008 at 10:41 pm
[...] rehashing the data finds data, then people find people meme, it would be interesting to have such discovery engines for biological resources. Not [...]
January 5, 2009 at 10:25 am
[...] January 5, 2009 A conversation with Jeff Jonas about connecting dots Posted by Jon Udell under Uncategorized On this week’s Interviews with Innovators show I spoke with Jeff Jonas whose work (and narration of that work on his blog) first captured my interest in 2007. [...]
May 22, 2010 at 3:08 am
Great article, I just ran across it going through Mixx. I’m a bit late though, I mean months late since you submitted it lol.
May 24, 2010 at 3:04 pm
[...] myself or Deepak Singh you will almost certainly have heard the Jeff Jonas/Jon Udell soundbite: ‘Data finds data. Then people find people’. Jonas is referring to data management frameworks and knowledge discovery and Udell is referring [...]
November 7, 2010 at 9:51 pm
[...] with similar content in these broad-use databases, where they are more discoverable and reusable. Data finds data, then people find people. As more scientists use these tools, the more powerful they [...]