A conversation with Ted Okada about the work of Microsoft Humanitarian Systems

The same audio glitch that ruined my interview with Joel Selanikio also affected another interview on the same day. That interview, for my Microsoft Conversations series, was with Ted Okada, who is the director of a small group called Microsoft Humanitarian Systems. So again I’ll have to settle for reporting highlights from the interview along with some quotes I was able to salvage.

Ted came to Microsoft by way of Groove, where he’d been hired to spearhead Groove’s use in the humanitarian sector, which had increasingly come to value the product for several interesting properties: technical resilience in the face of intermittent connectivity, and political resilience in the sense that it creates neutral infrastructure owned by no single agency. When I caught up with Ted, as he was packing for a trip to Afghanistan, he gave this example of the latter:

We’ve been working with an NGO that was using Groove to negotiate between the Tamil Tigers and the Sinhalese government in Sri Lanka. The two parties wouldn’t sit in the same room, but they did agree to use Groove to arbitrate the conflict.

In this appearance on Channel 10, Ted talks about how Groove is uniquely well equipped to support collaboration in disaster relief situations, and he demonstrates a Groove-based solution that enabled five different relief organizations responding to the 2005 Kashmir earthquake to synchronize on the same operational picture.

Ted has also been one of Microsoft’s representatives at Strong Angel, an exercise to simulate disaster response that’s been held three times — in 2000, 2004, and most recently 2006. Strong Angel was the brainchild of U.S. Navy Medical Corps commander Dr. Eric Rasmussen. I asked Ted what it’s like to participate, and he replied:

It’s an odd mixture of the early Interop conferences — where people were trying to get routers from different manufacturers to work together — plus a little bit of Burning Man, a little bit of Foo Camp, and a little bit of the military channel. Officially it’s a demonstration, but it involves all those elements and addresses all kinds of questions. How do you cross the civil/military boundary, particularly when trust is low and the need for collective action is high? How do you make sure all the gear works together? Of course it’s also a venue for some interesting gear, like solar reflective yurts that you might find at Burning Man — and in fact actually were taken to Burning Man.

As John Markoff reported, there were some notable interoperability failures at Strong Angel 3 but also some notable successes. One of the latter involved the use of Simple Sharing Extensions (SSE), an extension of RSS, to synchronize location data between Google Earth and Microsoft Virtual Earth.

I wondered what broader role SSE might play, given that it extends a Groove-like data synchronization capability to a diverse set of applications. It turns out that Ted will be testing a prototype SSE adapter for Microsoft Access on a trip to Kabul next week:

From my perspective as a relief and development person for 20 years, you can’t overestimate the value of simple tools like good old Access. What if Access could relay messages and synchronize via SSE, so that you’ve got persistent statefulness and failover on highly intermittent and jittery networks? Suddenly Access becomes a much more lively player in the edge-based mesh. So now in Afghanistan we’ll actually be using this wonderful everyman’s tool, Access, enlivened with SSE adapters, to help out an NGO partner who’s told us that would really help them share data with the other stakeholders in the reconstruction project they’re working on.
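SSE’s central trick, as I understand it, is to attach a little sync metadata to each item (roughly: a version counter plus a record of who last changed it and when) so that two endpoints reconnecting after an outage can reconcile their copies without a central server. The Python below is a simplified, hypothetical model of that idea, not the SSE specification’s algorithm: the field names are invented, and real SSE also handles deletions and keeps losing versions around as conflicts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """One shared record plus simplified, SSE-inspired sync metadata (hypothetical field names)."""
    id: str
    updates: int   # version counter, incremented on every edit
    when: str      # ISO-8601 UTC timestamp of the latest edit, e.g. "2007-07-20T10:00:00Z"
    by: str        # endpoint that made the latest edit
    payload: str   # the record's content

def winner(a: Item, b: Item) -> Item:
    """Higher version wins; ties fall back to the later timestamp, then to the 'by' string."""
    return max(a, b, key=lambda it: (it.updates, it.when, it.by))

def merge(local: dict, remote: dict) -> dict:
    """Fold a peer's items into ours; applying the same peer feed twice changes nothing."""
    merged = dict(local)
    for item_id, incoming in remote.items():
        current = merged.get(item_id)
        merged[item_id] = incoming if current is None else winner(current, incoming)
    return merged

# Usage: a field laptop and a headquarters server exchange feeds after reconnecting.
field_copy = Item("site-7", 1, "2007-07-20T10:00:00Z", "field-laptop", "lat=34.5")
hq_copy = Item("site-7", 2, "2007-07-21T09:00:00Z", "hq-server", "lat=34.53")
print(merge({"site-7": field_copy}, {"site-7": hq_copy})["site-7"].by)  # hq-server
```

Because each side can apply the other’s feed whenever connectivity returns, and reapplying a feed is harmless, this style of merge tolerates exactly the jittery networks Ted describes.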

Ted has an interesting take on what Microsoft might learn by collaborating with these kinds of partners:

If you make the developer part of an environment that is itself stressed, and build for the extreme case, maybe you can titrate lessons faster and close the loop quicker on accelerated learning. It’s hard to work in a place like Afghanistan. It’s an austere environment and you’re at the mercy of that environment. Very few people know who Microsoft is — or care who we are — and there aren’t many places in the world where that’s true. In some ways, perhaps, immersion in that environment could turn out to be the ultimate sort of extreme programming.

Those were the highlights. It’s painful to have bungled those audio recordings. When I told Phil Windley he said, “I live in fear of that.” Well, the silver lining — for folks who don’t listen to podcasts, at least — is that it forced me to write more about the interviews than I normally do. Tomorrow I’ll record what will be the third in a series of conversations about humanitarian uses of technology, and you can bet I’ll double-check to make sure I’m recording what I think I’m recording!

A conversation with Joel Selanikio about collecting public health data in developing countries

For this week’s ITConversations show I interviewed Dr. Joel Selanikio, co-founder of DataDyne, a non-profit consultancy dedicated to improving the quantity and quality of public health data. DataDyne’s principal tool is EpiSurveyor, a free and open source software product that simplifies the creation of forms for doing field data collection with handheld devices. There’s a Windows-based forms designer, and a runtime for Palm OS-based PDAs which is being ported to Windows Mobile- and Java-based devices.

It was a great interview but, when I opened up the audio file I’d recorded, I was horrified to find that an audio glitch had rendered it unusable. So instead I’ll report here on what Joel told me, and weave in some quotes I was able to salvage.

Our conversation was well-timed because I’d just watched the dustup between Michael Moore and Wolf Blitzer, checked out Moore’s rebuttal of Sanjay Gupta’s report on SiCKO, and tracked down some of the cited sources, including the United Nations Human Development Report. How reliable are these sources, particularly for developing countries? As you might imagine, and as Joel’s experiences confirm, there’s a lot of guessing going on:

It’s amazing how unaware people are of the tenuous nature of our knowledge of, really, anything. One of the things I ask people is: “What’s the population of the United States?” And they’ll say 290 million, or 300 million. But the real answer is: We don’t know. And we check, every 10 years, and we do a pretty good job of checking.

So if you want to know what’s the leading cause of disease in children in rural Africa, what’s the chance that you’ll have any idea what the answer to that question is?

I was a first responder to the tsunami in Southeast Asia. Imagine showing up in a place where the slate has been wiped clean, God just slammed his hand down and flattened everything. The roads are gone, and three or four thousand aid workers are all clustered in the few places they can get to. But of course, you have to come up with an estimate of how many people are dead. So somebody picks a number, and then you hear it on CNN that night. Fifty thousand, a hundred thousand, a hundred and twenty-five thousand, none of those estimates were based on any attempt to really find out.

A friend who works for American Red Cross asked me what I thought they could do. I said the most valuable thing to do with the hundreds of millions of dollars of donations they’d received was to invest in data collection. Normally in a situation like that you do a sample. You go to ten percent of households, and try to extrapolate, and hope that your sample isn’t biased. But I said, with five hundred million dollars, there’s no reason we can’t get the local people to do a census. Go to every refugee area and every household and actually find out — not estimates — but actual numbers, which would be of huge importance for reconstruction.

So of course it didn’t happen.
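To make Joel’s sample-versus-census tradeoff concrete: with a simple random sample of households, the usual expansion estimate multiplies the sample mean by the number of households, and its standard error falls to zero only when the “sample” is every household, that is, a census. This Python sketch assumes simple random sampling and uses made-up counts; real post-disaster surveys typically use cluster designs with more complicated variance formulas, and a census still carries non-sampling errors.

```python
import math

def estimate_total(sample_counts, n_households_total):
    """Expansion estimate of a population total from a simple random sample of households.

    Returns (estimated_total, standard_error), applying the finite population correction.
    """
    n = len(sample_counts)
    if n < 2:
        raise ValueError("need at least two sampled households to estimate variance")
    mean = sum(sample_counts) / n
    variance = sum((x - mean) ** 2 for x in sample_counts) / (n - 1)
    fpc = 1 - n / n_households_total          # 0 when every household is visited
    std_error = n_households_total * math.sqrt(fpc * variance / n)
    return n_households_total * mean, std_error

# Hypothetical: 4 of 40 households sampled (10%), people counted in each.
print(estimate_total([2, 4, 6, 8], 40))       # estimate 200.0, standard error about 49
```

With a 10% sample the estimate carries a wide margin; at 100% coverage the sampling error term disappears, which is the argument for spending donations on a census.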

As a one-time database programmer who went to medical school and then became an epidemiologist, Joel’s acutely aware of the relationship between information technology and epidemiology, a discipline that is, as he reflects on here, profoundly data-driven:

In about 1995, over the course of six weeks, kids start showing up at the university hospital in Port-au-Prince, Haiti. They had different symptoms, but they all died. At about the hundred mark the local docs contacted the World Health Organization who contacted CDC, and a colleague of mine and I went down to Haiti. It’s a high-pressure situation, kids are dying every day. When we got there we began creating a database of responses to questions: what their symptoms had been, what medications they had taken. Within a few days it became apparent that all of the kids who’d died had taken one of several locally-produced Tylenol-like medications. Once we discovered that, they made an announcement and the outbreak ended.

People would ask me, “What magic did you work?” Well, in clinical medicine, the way that we understand things is — if it’s a rash, I look at the rash, I think about it, I look stuff up, but I don’t systematically create a database. For one patient you can juggle the variables in your head. But when you have a population of affected people, you need to collect data and analyze it. That’s the basis of epidemiology.
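The “magic” here is the line list: one row per case, one field per question, then a tally of which exposures the cases share. A toy Python sketch with made-up case records (the field name and medication labels are hypothetical, not data from the Haiti investigation); an actual outbreak study would also compare these shares against unaffected controls before drawing conclusions.

```python
from collections import Counter

def exposure_shares(cases, field="medications"):
    """Share of cases reporting each exposure, highest first.

    Each case counts at most once per exposure, even if it lists the same item twice.
    """
    counts = Counter()
    for case in cases:
        counts.update(set(case[field]))
    n = len(cases)
    return sorted(((exposure, k / n) for exposure, k in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))

# Hypothetical line list of three cases.
cases = [
    {"medications": ["syrup_A", "antibiotic"]},
    {"medications": ["syrup_A"]},
    {"medications": ["syrup_A", "vitamin"]},
]
print(exposure_shares(cases)[0])  # ('syrup_A', 1.0)
```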

Unfortunately our standards and methods for data collection are far lower in the realm of public health than in the realm of business:

Imagine if you were the CEO of Toyota, and your CFO said, well, sales are pretty good, we think, but we’re not sure. He’d flip. And yet that’s how things are with public health. We’ve been making a concerted effort for fifty years to get rid of malaria, but the quality of our statistics is terrible, and we’ve just gotten used to that.

A key reason for that poor quality is that the collection of public health data in developing countries is still mostly a paper-based activity. And while handheld devices are obviously a great alternative, what Joel found is that the software available for creating surveys was way too hard for ordinary folks to use:

If you’re the ministry of health in Kenya and you want to survey a hundred thousand households, and use handhelds to do it, you’ll need some knowledge of programming. If I told you to write down your questions in Microsoft Word you could easily do it, it’s frictionless, but with the commercially available software for creating surveys that run on PDAs, you can’t do it. That software can do all kinds of fancy things, of course, but most of the time the information you need to collect is very simple stuff.

So that’s what EpiSurveyor does, Joel says. It makes the simple stuff simple, so that ordinary folks in developing countries can create surveys without having to hire programmers and consultants.

But how can you have any assurance that the data gathered in these kinds of surveys will be usefully comparable? Are there standard forms and standard schemas? Not really, Joel says. The existing forms are hard to find and reuse, and there’s been little progress toward standardization:

If you went to a UN organization and said, we want to standardize how we collect data about child nutrition, the response would be, let’s have a conference. We’ll have experts get together in Rome, and then in Paris, and decide what are the key questions for any standard child nutrition survey. But it’s hard to achieve unanimity, and there’s a built-in incentive not to because every time you get together it’s a trip to Rome.

Coming at the problem from a grassroots web 2.0 perspective, Joel is working to translate the various forms used by international agencies into EpiSurveyor’s XML format, and to make them available in a shareable repository. The notion is that reuse will occur naturally when it lies along the path of least resistance, and he sees that starting to happen. For example, having trained field workers in both Kenya and Zambia, he discovered after the fact that the Zambian workers had found and reused a Kenyan survey posted on DataDyne’s private project management site.

My Zambian contact said, Joel, I hope this is OK, but I downloaded their form, and opened it up, and made a few changes — basically just the names of provinces — and then I used their form.

Of course it was more than OK, he was delighted. Asking the same questions, in the same ways, is exactly what you want to happen, and yet it rarely does.
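The Zambian reuse story boils down to a small transformation: keep the questions, swap the locally specific choice lists. A minimal Python sketch, assuming a hypothetical XML layout; the element and attribute names below are invented for illustration and are not EpiSurveyor’s actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical form layout, NOT EpiSurveyor's real schema.
KENYA_FORM = """<form name="household-nutrition">
  <question id="province" type="choice">
    <option>Nairobi</option>
    <option>Coast</option>
  </question>
  <question id="child_age_months" type="integer"/>
</form>"""

def localize_choices(form_xml, question_id, new_options):
    """Return a copy of the form with one choice question's options replaced."""
    root = ET.fromstring(form_xml)
    question = root.find(f"question[@id='{question_id}']")
    if question is None:
        raise KeyError(question_id)
    for option in question.findall("option"):
        question.remove(option)
    for label in new_options:
        ET.SubElement(question, "option").text = label
    return ET.tostring(root, encoding="unicode")

# Usage: reuse the Kenyan form in Zambia, changing only the province names.
zambia_form = localize_choices(KENYA_FORM, "province", ["Lusaka", "Copperbelt"])
```

Every other question survives untouched, which is what makes the resulting data comparable across the two countries.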

The forms repository that Joel envisions doesn’t yet exist, but he’s hoping that as DataDyne builds up a reputation around successful deployments of EpiSurveyor, the company will be able to attract the resources and the attention needed to make that happen.

New expectations (and new opportunities) for stewards of public data

Here’s an update for those who’ve been following the story of my quest for local crime statistics[1,2]. This morning I met with the police chief and some other officials. Given that I began asking for this data in late April or early May, and went through four rounds of telephone and then email contact, it shouldn’t have taken so long to convene the meeting. And it would have taken longer had I not engaged my friend Ted Parent, who is a lawyer and a great champion of democracy, to write a letter to the city attorney. The magic incantation in New Hampshire, by the way, is not Freedom of Information Act (FOIA), but rather Right to Know. It wasn’t enough for me to utter those magic words, though. Ted had to do that, in a letter that went on to describe in great detail my reputation, qualifications, and seriousness of purpose. That description is true, but shouldn’t have been necessary, nor should Ted’s services have been.

In any event, we had a productive discussion and will meet again soon to discuss logistics: what’s unavailable and why, what’s available and how to get it. What will likely be available is an update to this data set, which might or might not reveal trends since 2005. That would be of interest locally because, while there’s a strong sense that crime is worse lately, nobody seems to be clear about the details.

But here’s why that might not help. The feds only gather and report on certain categories of crime. Among those not included, the chief told me, are drunk driving incidents, which he’s been seeing a lot more of lately. Another systematic omission: rapes only count as rapes when inflicted on females by males.

Then there’s the fact that state participation in the National Incident-Based Reporting System (NIBRS) — which is apparently the new name for what used to be called UCR (Uniform Crime Reporting) — is voluntary and spotty.

So it’s unclear what questions can even be answered — in local, state, or national context — by the UCR/NIBRS data that the city’s software can and does report to the feds.

But other questions are entirely outside the scope of that dataset. It includes no location information, for example. I was surprised to learn that while the city does of course collect street addresses when entering crime reports into its database, they’re unaware of any straightforward way to get the location data back out in order to visualize geographic patterns. My hunch is that I can help them with that, if I can get hold of a raw export, so that’s something we’re going to explore at our next meeting.
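Even before any geocoding, a raw export with street addresses supports a crude first look at geographic patterns: group incidents by street. A Python sketch, assuming a hypothetical CSV export with `offense` and `street_address` columns (I don’t know what the city’s system actually emits); real addresses would also need normalization (“St” vs. “Street”) and geocoding before they could be mapped.

```python
import csv
import io
import re
from collections import Counter

def incidents_by_street(export_csv):
    """Count incidents per street by stripping the leading house number from each address."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(export_csv)):
        street = re.sub(r"^\s*\d+[A-Za-z]?\s+", "", row["street_address"])
        counts[street.strip().upper()] += 1
    return counts.most_common()

# Hypothetical export rows.
export = "offense,street_address\nburglary,12 Main St\ntheft,40 Main St\nvandalism,3 Court St\n"
print(incidents_by_street(export))  # [('MAIN ST', 2), ('COURT ST', 1)]
```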

This has been an interesting process to observe. Today the assistant city attorney said something that crystallized, for me, an insight about the stewardship of public data. Although the city has so far received very few Right to Know requests, one of them, she said, could have proved very costly in terms of the software and consulting services that would have been needed in order to comply. That insight won’t rewrite the legacy system, but it certainly imposes an important new requirement on its successor.

The folks I met with today aren’t familiar with ChicagoCrime.org or CAPStat, but I didn’t get the impression they’re opposed to the idea of citizen participation in the interpretation of government data. On the contrary, I think they may conclude that deploying systems to enable that participation would be as useful to them as it would be to the public.

We have a long way to go at all levels: local, state, national, international. But expectations are being reset, up and down the line, and I’m hopeful that we’ll get where we need to go.

Data analysis as performance art


Hans Rosling has been justly acclaimed for a couple of TED talks on global health in which he makes mesmerizing use of his (and now Google’s) GapMinder software, which he uses to tell compelling stories with data. The software is very cool, but what really makes the stories come to life is Rosling’s narrative. Data analysis, for him, is a performance art.

I’ve been thinking about this because I’ve been trying to investigate a perceived crime wave in my home town. You’d think it would be straightforward to get hold of the data but, after four months, I’m still trying. Meanwhile, however, I found some historical data at the Bureau of Justice, and I decided to see what I could make of that.

The visualizations shown in today’s screencast were done with Many Eyes, which is another very cool piece of software. But what I realized while making them is that narrated animation is really the secret sauce. Analytical software, whether it’s Excel or GapMinder or Many Eyes or something else, is necessary but not sufficient. The stories that people will understand, and remember, are the ones that have been performed well.

Now I’m no Hans Rosling, and you certainly won’t see me swallow a sword at the end of this screencast — as he amazingly does at the end of this video. But I will be trying to emulate his example when I tell stories with data. And I’m struck, once again, by the way in which screencasting can bring software interaction to life.

The charts used in my screencast could have been made in Excel or in any other charting package. By making them in Many Eyes, I added the important new dimension of social analysis. So you can visit the data sets there, comment on the visualizations, and add your own visualizations. But data analysis as performance art goes beyond the snapshots produced by analytical tools. It lives in the interstitial spaces between the snapshots, traces a narrative arc, shows as it tells.

A conversation with Timo Hannay about the scientific web

As director of web publishing for Nature Publishing Group, Timo Hannay’s projects include: Connotea, a social bookmarking service for scientists; Nature Network, a social network for scientists; and Nature Precedings, a site where researchers can share and discuss work prior to publication.

The social and collaborative aspects of these systems are, of course, inspired by their more general counterparts on the web: del.icio.us, Facebook and LinkedIn, the blogosphere. That’s part of what we discussed in this week’s ITConversations podcast. We also talked about my longstanding concern that scientists, like other academics and indeed most professional people, aren’t directly rewarded for being wired into the web. Timo has some great ideas about how to change that. He notes:

This will sound a bit strange coming from someone who works for a journal publisher, but to date, the way that scientists’ output has been measured has been unduly focused on publications in peer-reviewed journals. That is, and will continue to be, a really important part of it, but it’s not the only thing they do.

Here’s one specific proposal for change — measure, and reward, contributions of data:

Biology in recent years has seen a move from what I would characterize as cottage industry science, where everything from data capture through to analysis to writing the paper happens within one lab among a small group of people, to a much more industrial scale where you have different groups, widely dispersed, perhaps who don’t even know each other, doing the data capture versus the analysis versus writing the paper.

But you can’t just publish a data set. So what tends to happen is that, for a really big important data set — like a new major genome — they’ll publish a paper off the back of it, and do a very quick preliminary analysis. But the real news is not the analysis, it’s the data set. They have to make this fig leaf of analysis in order to justify publishing the paper.

We need to make it possible for people to publish data sets — to put them out there, track what use is made of them by other people, and then eventually gain credit for that.

Excellent suggestion!

More broadly, Timo wants to measure activity in the specialized versions of the blogging, bookmarking, and social networking services that Nature Publishing Group is creating for scientists. He says NPG is working with funding organizations to figure out what kinds of measurement can support a broader system of credit and recognition.

I know it’s hard to nail down this touchy-feely stuff, but it really does matter. Yesterday I found a great quote from E.O. Wilson, in Consilience (which I’ve finally gotten around to reading), that helps explain why:

The creative process is an opaque mix. Perhaps only openly confessional memoirs, still rare to nonexistent, might disclose how scientists actually find their way to a publishable conclusion. In one sense scientific articles are deliberately misleading. Just as a novel is better than the novelist, a scientific report is better than the scientist, having been stripped of all the confusions and ignoble thought that led to its composition. Yet such voluminous and incomprehensible chaff, soon to be forgotten, contains most of the secrets of scientific success.

Narrating the work in openly confessional memoirs can and should be measurable, valuable, credit-worthy.

Show me the data

The emerging discipline of social data analysis and visualization faces two challenges. First, obviously, you need data. Then, more interestingly, you need to figure out ways for people to create, share, and collaboratively refine interpretations of the data. There are a handful of well-known and powerful sources of data. The OECD’s data, for example, drives several of the visualizations at IBM’s Many Eyes site. Where else can you find data for these kinds of tools and services to chew on?

Sources I’ve used and discussed include Washington DC’s CAPStat and the Dartmouth Atlas of Health Care. A number of others are listed in this summary from the session at Foo Camp 07 on liberating government data.

For my own purposes, I’ve decided to keep track of these kinds of public data sources at del.icio.us/judell/publicdata. One of the delightful consequences of doing things that way is that I can pop up a level, to del.icio.us/tag/publicdata, in order to find out what other folks have been storing in the publicdata bucket.

There’s not a whole lot there, yet, but here’s one gem I discovered by way of a link to Gapminder: the United Nations Common Database. From the Gapminder blog on June 7:

UN statistics finally liberated and free of charge!

In a bold move that hopefully will set the standard for all major producers of statistics, UN Statistical Division have made their data accessible and FREE OF CHARGE from May 1 this year. United Nations Common Database (UNCDB) is now available for everyone, with no demand of subscription or user fees on their web-site.

We now look forward to the domino-effect and the liberation of other hidden or locked global statistics from other producers and collectors of data.

Amen. To that end, I invite readers of this blog to contribute these kinds of findings — as you encounter them in your travels — to the publicdata bucket in del.icio.us, to which I’m now subscribed. I’ll in turn curate that list at judell/publicdata, with an eye toward sources that I deem to be noteworthy, conveniently accessible, and likely to yield useful analysis.

A conversation with Pablo Castro about Astoria’s RESTful data services

In the latest episode of my Microsoft Conversations series I talked with Pablo Castro about Astoria, a layer of middleware that makes data readable and writeable by means of a RESTful interface. Even if you don’t know or care about the buzzwords, it’s easy to show what Astoria does and to explain why it’s interesting. One of the sample databases configured to work with the experimental version of Astoria is a subset of the Encarta encyclopedia. You don’t have to be a programmer or grok XML in order to appreciate the following dialogue with the Astoria-enhanced version of Encarta.

What are Encarta’s topic areas? encarta/encarta.rse/Areas
<Area uri="Areas[4]">
<ID>4</ID>
<Name>Life Sciences</Name>
<Articles href="Areas[4]/Articles" />
</Area>

...etc...
The answer comes back in exactly the form shown here. It’s XML, but a very webby kind of XML that’s full of links that I’ve rendered as clickable.
So, what’s the fifth Area? encarta/encarta.rse/Areas[5]
<Area uri="Areas[5]">
<ID>5</ID>
<Name>Sports, Hobbies, and Pets</Name>
<Articles href="Areas[5]/Articles"/>
</Area>
Every link asks a question, and gets an answer that embeds links to ask more questions.
OK, what are the articles in that area? encarta/encarta.rse/Areas[5]/Articles
<Article uri="Articles[761553558]">
<ID>761553558</ID>
<Title>Aaron, Hank</Title>
<Preview>
Aaron, Hank, born in 1934, American baseball player,
nicknamed Hammerin’ Hank, whose 755 home runs broke
the all-time record previously held by ...
</Preview>
<Url>
http://encarta.msn.com/encyclopedia_761553558/Hank_Aaron.html
</Url>
<Area href="Articles[761553558]/Area"/>
<ArticleBody href="Articles[761553558]/ArticleBody"/>
<Notes href="Articles[761553558]/Notes"/>
<RelatedArticles href="Articles[761553558]/RelatedArticles"/>
</Article>

...etc...

A database with Astoria layered on top of it isn’t a web application, but it’s within shouting distance of being one, and you don’t even have to shout very loudly.
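Because every response embeds relative links, a client needs no prior knowledge of the schema to keep asking questions; it just collects the `href` attributes and follows them. A Python sketch that works on a literal response like the ones above, assuming a hypothetical service root (the experimental Encarta endpoint isn’t assumed to be reachable):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE = "http://example.org/encarta/encarta.rse/"  # hypothetical service root

def links(entity_xml, base=BASE):
    """Map each linked element's tag to its href, resolved against the service root."""
    root = ET.fromstring(entity_xml)
    return {el.tag: urljoin(base, el.get("href")) for el in root.iter() if el.get("href")}

# A response shaped like the Areas[5] answer shown above.
AREA = """<Area uri="Areas[5]">
<ID>5</ID>
<Name>Sports, Hobbies, and Pets</Name>
<Articles href="Areas[5]/Articles"/>
</Area>"""

print(links(AREA))  # {'Articles': 'http://example.org/encarta/encarta.rse/Areas[5]/Articles'}
```

Fetching any of those URLs would return another response with more links, which is the sense in which the database is within shouting distance of a web application.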

Pablo’s presentation at MIX is chock full of demos and explanations. Our podcast refers to and complements that presentation.

I’m not even close to being an expert in the underlying data access technologies, including ADO.NET, the Entity Data Model, and LINQ, so parts of the discussion quite frankly went over my head. Nor am I yet familiar with the tooling that’s required to wrap this kind of services layer around a plain data source. But I’m 100% clear that it’s a good idea, and a great example of the style described in RESTful Web Services, a book that Pablo Castro says is “required reading” for members of the Astoria team.

Data finds data, then people find people

If you plug the quoted phrase “the data finds the data” into any of the search engines, the first hit will be one of several essays on Jeff Jonas’ blog. Other evocative phrases that lead to Jeff’s blog include “perpetual analytics,” “sequence neutrality,” and “persistent context.” Those will soon resonate once you scratch the surface of Jeff’s work, but none is as broadly compelling as “the data finds the data.” As sound bites go, that one’s a keeper.

Jeff Jonas is chief scientist for IBM’s Entity Analytic Solutions. His long career in data surveillance, and recent interest in privacy-respecting data surveillance, has drawn a lot of media attention lately. In the mainstream he’s appeared in Newsweek and on NPR. In the techsphere, Tim O’Reilly blogged about Jeff’s visit to PC Forum, Dan Farber interviewed him at the Web 2.0 conference and Phil Windley wrote a detailed review of his keynote at ETech 2007.

Given our shared interests — including surveillance, analytics, security, privacy, and manufactured serendipity — it’s surprising that I only recently became aware of Jeff’s work. Of course, we’ve been working different ends of the same street. He’s focused on finding bad guys: casino fraudsters, terrorists, and others who collaborate secretly. I’ve focused on helping people who collaborate openly do so more effectively. And yet…these really are two sides of the same coin.

Here’s an example of “the data finds the data” in Jeff’s world, from his article in IEEE Security and Privacy entitled Threat and Fraud Intelligence, Las Vegas Style. You have two records that refer to the same person, but you don’t know that they do. Then a third record appears which relates to each of the first two, and which establishes that all three refer to the same person. The first two pieces of data find one another, through the agency of a third piece of data.
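The three-record example is, at heart, incremental entity resolution. A minimal union-find sketch in Python, assuming records are reduced to sets of exact-match attribute values (NORA itself does far more, including fuzzy matching; the record IDs and attributes here are hypothetical). Notice that ingesting the same records in a different order yields the same groupings, loosely echoing what Jeff calls sequence neutrality.

```python
class Resolver:
    """Groups records that share any attribute value, as records arrive."""

    def __init__(self):
        self.parent = {}  # record id -> parent record id (union-find forest)
        self.owner = {}   # attribute value -> first record id that carried it

    def find(self, r):
        self.parent.setdefault(r, r)
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]  # path halving
            r = self.parent[r]
        return r

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def add(self, record_id, attributes):
        """Ingest a record; any attribute value seen before links it to the earlier record."""
        self.find(record_id)
        for value in attributes:
            if value in self.owner:
                self.union(record_id, self.owner[value])
            else:
                self.owner[value] = record_id

# Usage: two unrelated-looking records, then a third that bridges them.
r = Resolver()
r.add("r1", {"phone:555-0100"})
r.add("r2", {"ssn:123-45-6789"})
r.add("r3", {"phone:555-0100", "ssn:123-45-6789"})
print(r.find("r1") == r.find("r2"))  # True
```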

Here’s an example of “the data finds the data” in my world. On June 17 I bookmarked this item from Mike Caulfield, who is a local friend, the webmaster at Keene State College, and a forward thinker about Net-enabled education. On June 19 I noticed that Jim Groom, who is a distant acquaintance at the University of Mary Washington and another forward thinker on the same topic, had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.

I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.

But even if that weren’t the vector, the point is that given the overlap between Jim’s published work and Mike’s published work, it’s likely that they would sooner or later have discovered one another. In the realm of personal publishing, thanks to syndication and search, data tends to find data. And when it does, people find each other.

This process of discovery works best, of course, when there’s common data available to the syndication and search engines. When the same things have different URLs or different names, the connections are non-obvious.
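One mundane way to make such connections more obvious is to normalize identifiers before comparing them. A Python sketch, assuming a few common but not universal rules (lowercase the scheme and host, drop a leading `www.`, a trailing slash, and the fragment); some sites do serve different content at those variants, so treat these rules as heuristics rather than a standard.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical(url):
    """Collapse trivially different spellings of the same URL into one form."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    # Keep the query string (it often matters); drop the fragment (it never reaches the server).
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

print(canonical("HTTP://www.Example.com/blog/"))  # http://example.com/blog
```

Two bookmarks or blog links that differ only in these ways would then collide on the same key, letting the syndication and search machinery see them as the same thing.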

For non-obvious connections that don’t want to be found, you need a technology like the one Jeff Jonas sold to IBM. It goes by the name NORA: non-obvious relationship awareness.

For non-obvious connections that do want to be found, though, we can help the process along in a variety of ways. Publishing hyperlinks is one way to expose non-obvious relationships. Publishing key words and phrases is another. So, for example, in reading up on Jeff Jonas’ work, I realized that the privacy-assuring version of NORA, called ANNA, which uses one-way hashes to obscure private information while still enabling matching and discovery, is related to Peter Wayner’s notion of translucent databases (1, 2).

I’m not the first one to make that connection — Noah Campbell noted it last fall — but this item will strengthen it, in a way that may help some data find some other data, and some people find some other people.
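The idea ANNA and translucent databases share is that two parties can discover they hold the same identifier without revealing the identifier itself: each side applies the same one-way function and compares only the outputs. A Python sketch under stated assumptions: a keyed HMAC, with the key shared by the matching parties, stands in for whatever ANNA actually does, and the normalization rules are hypothetical. The key matters because plain hashes of low-entropy values like phone numbers can be reversed by brute force.

```python
import hashlib
import hmac
import unicodedata

def blind(value, key):
    """Keyed one-way token for a normalized identifier; equal inputs yield equal tokens."""
    normalized = unicodedata.normalize("NFKC", value).strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def matches(tokens_a, tokens_b):
    """Tokens present on both sides; neither party ever sees the other's raw values."""
    return set(tokens_a) & set(tokens_b)

# Usage: two organizations blind their lists with a shared secret, then compare tokens only.
key = b"shared-secret"  # hypothetical; would be agreed out of band
ours = [blind(v, key) for v in ["555-0100", "555-0199"]]
theirs = [blind(v, key) for v in ["555-0199", "555-0142"]]
print(len(matches(ours, theirs)))  # 1
```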

Simon’s laws of local blogging

Dryden, New York is a small town near Ithaca. Four years ago, local resident Simon St. Laurent began chronicling the civic life of the town on a blog called Living in Dryden. In a 2004 profile the Ithaca Journal wrote:

St. Laurent can be seen, notebook and digital camera in tow, at Planning Board and Conservation Advisory Council gatherings, as well as at special meetings on fire departments, speeding and comprehensive plans.

And it asked:

What could motivate this seemingly normal man to submit himself to hours of political talk and legalese?

The answer is that Simon St. Laurent is leading the way to an understanding of how local blogging can reflect and enrich the life of a community. Day by day, and year by year, he’s showing his fellow citizens that political blogging doesn’t have to be bombastic and divisive. It can be a civil dialogue that informs and unites.

I first wrote about Simon’s project more than three years ago. I’ve mentioned it in several talks since then, and this week I interviewed him for my weekly ITConversations show. The show’s not posted yet, and I’ll probably be away from my computer when it is, but check here later today if you’re interested. Personally I think Simon’s project is one of the more important things you’ll never read about on TechMeme. Here are some quotes from the interview that highlight two of Simon’s Laws:

Responsibility is inversely proportional to community size

When you’re doing local stuff, you can’t stay anonymous for long. I think that has a major impact on the tone of things. The content has to be a lot more accurate because people will call you on it. Somehow the level of responsibility increases as the size of community decreases. It really changes the dynamics thoroughly.

Don’t make people spit out their coffee

Dealing with the threshold where people don’t really trust what they read is something I worry about pretty consistently. My usual rule is that nobody should have to spit out their coffee when they’re reading it. I have a neighbor up the hill who’s a conservative Republican, and I count on him to tell me when I’ve gone too far. Having that kind of tight feedback loop makes it possible for me to write things that I know will appeal to a lot of people.

Social network analysis in Facebook

From time to time I like to dabble in social network analysis. Now that Facebook has opened itself up to programmatic access, I thought I’d do some spelunking to see what I could learn. Here are a couple of questions I’d like to answer about the “clubbiness” of tech-company Facebookers:

1. Looking at the tech-company population as a whole, do people socialize within and across corporate networks more than elsewhere?

2. Looking at individual tech companies, which are more or less likely to mingle with other tech companies?

The questions are certainly answerable. Surfing around in Facebook, for example, I can view the profiles of my friends at Microsoft and elsewhere, and find out to what extent they, and their friends, socialize with people in their home corporate networks, with people in other corporate networks, and with people elsewhere. Since Facebook is a web application, the same information is — by definition — available by means of screenscraping, if you want to go to the trouble, which I don’t.
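However the friend data is gathered, answering the first two questions comes down to simple counting. Here's a minimal Python sketch of that counting. It assumes you've already collected friendship pairs and a person-to-network mapping. Both inputs are hypothetical, and nothing here calls the Facebook API.

```python
from collections import Counter

def network_mixing(friendships, network_of):
    """Classify each friendship as within one corporate network, across two, or elsewhere.

    friendships: iterable of (person_a, person_b) pairs, treated as undirected.
    network_of: dict mapping person -> corporate network name (hypothetical data).
    """
    counts = Counter()
    for a, b in friendships:
        na, nb = network_of.get(a), network_of.get(b)
        if na is None or nb is None:
            counts["elsewhere"] += 1  # at least one friend is outside any known network
        elif na == nb:
            counts["within"] += 1
        else:
            counts["across"] += 1
    return counts

def within_share(counts):
    """Question 1: fraction of all friendships that stay inside a single network."""
    total = sum(counts.values())
    return counts["within"] / total if total else 0.0

def cross_company_share(friendships, network_of, company):
    """Question 2: how often a company's people mingle with other known networks.

    Each friendship is counted once per endpoint that belongs to `company`.
    """
    mine = across = 0
    for a, b in friendships:
        for me, other in ((a, b), (b, a)):
            if network_of.get(me) == company:
                mine += 1
                other_net = network_of.get(other)
                if other_net is not None and other_net != company:
                    across += 1
    return across / mine if mine else 0.0
```

Comparing within_share for the tech-company population against a baseline population would address the first question. Ranking companies by cross_company_share would address the second.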

So far as I can see, though, you can’t automate this process using the Facebook API. A Facebook application can enumerate the friends of the logged-in user, but not those friends’ friends. It’s hardly surprising. There’s plenty of risk in allowing that kind of transitive data-mining, and no obvious benefit to Facebook.

I guess the Facebook way of doing this kind of analysis would be to create an application that goes viral, and pools information from the perspective of many different Facebookers. I'm unlikely to do that, but if you're considering it, here are a few points to keep in mind.

First, in order to avoid the server meltdown problem that Marc Andreessen discusses in his analysis of the Facebook platform, it might be interesting to do a desktop application. I hadn’t known such a thing existed, but I wrote a little one today, using the Python bindings to the Facebook API. In this scenario, client-side code invokes the browser to do an interactive login, and then makes API calls into Facebook. The advantage is that if your application gets more popular than you could support with a service in the cloud, it’s no problem, because users download it and run it locally. The disadvantage, of course, is that they have to download it and run it locally. And especially for an application like this one, which intentionally crosses cultural boundaries, you’d have to be prepared to run on any client OS.

Second, it looks as though, in one respect, the Facebook API doesn’t quite work as advertised. My desktop application should at least be able to report how many of my own friends are in the Microsoft network. But while the documentation says I can query for all of my friends’ affiliations, I’m only seeing one affiliation per friend. So if a Microsoft friend’s primary affiliation is the Seattle network, my application doesn’t know that he’s a Microsoft friend. Am I right in regarding that as either a software or documentation bug?
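The undercount is easy to see in miniature. This sketch compares the count you get when each friend's full affiliation list is visible with the count you get when only the first (primary) affiliation comes back. The data shape, a list of affiliations with the primary one first, is an assumption for illustration. It is not the format of an actual Facebook API response.

```python
def count_in_network(friends, network, primary_only=False):
    """Count friends affiliated with `network`.

    friends: dict mapping friend name -> list of affiliations, primary first
             (hypothetical shape, for illustration only).
    primary_only: mimic an API that returns just one affiliation per friend.
    """
    count = 0
    for affiliations in friends.values():
        visible = affiliations[:1] if primary_only else affiliations
        if network in visible:
            count += 1
    return count
```

With a friend whose affiliations are Seattle (primary) and Microsoft, the full-list count includes him and the primary-only count misses him. That matches the behavior described above.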

Accounting for page popularity

Today Lauren Weinstein draws attention to “a fascinating and apparently singular page on Google that you’ve probably never seen.” He’s right, I hadn’t, and apparently not many others have either. The page, http://www.google.com/explanation.html, appears as a sponsored link when you search for the word Jew, and apologizes for the fact that a hate site appears as a highly-ranked result. Although the apology dates back to April 2004, more than three years ago, it has so far attracted fewer citations (currently 50) and bookmarks (currently 26) than some of the blog posts I’ve written since April 2004.

Lauren writes:

The Web, after all, isn’t really computers and routers, fiber and spinning disk arrays, databases and blogs. The Web is people. Our job now is to find the path toward helping make sure that the power of Web search enhances people’s lives while not incidentally creating asymmetric opportunities for seriously damaging innocent lives in the process.

Lauren’s item today points back to a pair of earlier items in which he proposed a dispute resolution mechanism that’s reminiscent of Wikipedia’s:

Question: Would it make sense for search engines, only in carefully limited, delineated, and serious situations, to provide on some search results a “Disputed Page” link to information explaining the dispute in detail, as an available middle ground between complete non-action and total page take downs?

As we see today, that’s already happening in at least this one case. I’m sure it won’t be the only one, and that the kind of mechanism Lauren envisions will emerge.

In parallel, I believe we'll increasingly need and want more and better explanations of all search results. Today, for example, I hold the second and tenth results for the word Jon. As recently as last week I edged out Jon Stewart for the top spot. Why? I have a large Web surface area, it has grown steadily over many years, and it's mostly contained within the link-happy blogosphere.

Five years ago I called this a temporary anomaly, and predicted that a democratization of web presence would adjust the imbalance. It hasn't happened yet, though. Meanwhile, it's reasonable to expect that search engines might begin to provide the kinds of explanations that I've given here. Yes, ranking algorithms are proprietary, but some evidence — about the number of supporting pages, the structure of collections, the nature of supporting link networks — could go a long way toward helping people contextualize search results.
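Real search ranking is proprietary and uses far more than links, so what follows is only a sketch of the classic, published PageRank idea. It shows how a densely interlinked cluster of pages can lift one page above another page with fewer inbound links. The graph is made up. The labels "jon" and "stewart" are toy stand-ins, not real pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    links: dict mapping page -> list of pages it links to.
    Returns a dict of page -> rank; ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page with no outbound links: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical graph: three interlinked blogs that all point at "jon",
# versus a single fan page pointing at "stewart".
links = {
    "blogA": ["jon", "blogB"],
    "blogB": ["jon", "blogC"],
    "blogC": ["jon", "blogA"],
    "jon": ["blogA"],
    "fan": ["stewart"],
    "stewart": [],
}
```

A search engine could surface exactly this kind of evidence, such as which pages contribute rank and how they are linked, without disclosing its actual algorithm.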

Web search can create an asymmetric advantage for all kinds of agendas. In exceptional circumstances where such advantage is exploited to do damage to people, I think Lauren’s right, we’ll need a mechanism to handle those exceptions. But in all cases, whether the agenda is positive or negative, better accounting for the nature of the advantage would be helpful.

A conversation with John Willinsky about public participation in the creation of knowledge

It was a great pleasure to speak with John Willinsky for this week's ITConversations show. In it, we refer to another podcast I mentioned here. As much as I hope people will listen to this week's show, I think it's even more important to hear that other one, which is a talk that Dr. Willinsky gave at the UBC Okanagan Learning Conference last year.

If you’re an educator planning an offsite meeting or workshop, I would strongly recommend that you use that time to do two things:

  1. Listen, together, to John Willinsky’s UBC Okanagan talk.
  2. Discuss it.

Mashing up ITConversations and SIConversations

Although my own weekly podcast appears on the ITConversations channel of the Gigavox network, lately I find myself listening more often to our sister channel, Social Innovation Conversations. And I've started to wonder: Why are these two different channels, for two different audiences? Increasingly I wish I could mash them together. Dean Kamen's recent appearance on Tim Zak's Globeshakers series on SIConversations gives me a sense of what that would be like. Here's his pitch:

Given the enormous rate at which technology is moving forward, almost all the ‘Can this be done?’ questions have essentially been answered by ‘Yes.’ The much tougher question right now isn’t ‘What can we do with technology?’ — it’s ‘What should we do with technology?’ That’s a much harder question involving practical issues, moral issues…the haves and the have-nots, in technology, education, and health care, are diverging.

People who can develop new technologies ought to start thinking, more than they have in the last few decades, about where it’s appropriate to deploy the energy and passion to develop the next level of technology. There are just so many video games that we need, and just so many luxury leisure-time products that we need.

If societies start to recognize that we really do get what we celebrate, and we start celebrating the right things, we’ll see a much more effective use of our available technologies and a much more appropriate and focused set of developments of our future technologies. Instead of focusing on what we can do with technology, we should focus on how to be responsible to each other, to the environment, to the future of this delicate little planet.

I’ve transcribed that quote here for two reasons. First, because I know that relatively few people have the time or inclination to listen to as many podcasts as I lately find myself wanting to do. Second, because I know that people who self-identify as technogeeks are more likely to subscribe to ITConversations than to SIConversations.

As I write this entry I'm en route from one technogeek paradise, the Microsoft campus, to another, the O'Reilly campus. In doing so I'm crossing a bridge between two cultures that are, in some ways, very different. On the Microsoft campus, for example, Windows laptops are ubiquitous and Macintosh laptops are scarce. On the O'Reilly campus it's the reverse.

In other ways these cultures are very alike. In both places, you’ll routinely see people whizzing around on the invention for which Dean Kamen is best known: the Segway. Geeks of all persuasions are early adopters and everyday users of this machine which, for most people, remains an exotic curiosity.

And yet, there wasn’t a single mention of the Segway in Tim Zak’s 45-minute interview with Dean Kamen. What’s top of mind, for Kamen, is US FIRST — the acronym expands to For Inspiration and Recognition of Science and Technology. The goal is to reframe the idea of success which, he says, too many teenagers define unrealistically in terms of sports and entertainment. He wants them to know that success in science and engineering is, for the vast majority, both more achievable and more socially productive. A Wired article in 2000 called the idea far-fetched, but FIRST’s robotics competitions have grown steadily since 1992, and in 2007, Kamen says, the final event packed 70,000 people into Atlanta’s Georgia Dome.

A key ingredient of the program is the mentoring that’s provided by scientists and engineers on loan from corporate sponsors:

These kids really weren’t building robots. They were building relationships with serious adults, they were building an understanding of what’s possible if you put your energy and passion to things that matter.

I’d love to see a mashup of ITConversations and SIConversations that would produce more shows like that.

RESTful Live Contacts for Internet-scale social networking

It’s been an interesting couple of weeks for folks who care about RESTful web services. Dare Obasanjo kicked things off with a couple of items about the Atom Publishing Protocol (APP) and Google’s use of it for its GData project. Tim Bray bristled at Dare’s characterization of APP, and it looked like we were headed for another summer syndication flamefest. (Why do those always happen in June?) When the Web3S protocol — not the Atom Publishing Protocol — was revealed as the proposed mechanism for granting read/write access to the half-billion Hotmail contacts, Messenger buddies, and Spaces friends that comprise the Live Contacts database, I was sure it’d turn into a flamefest.

But it didn’t. And now that things have settled down a bit, I’d like to note two points that may interest the majority of folks who don’t follow the saga of RESTful web services.

First, there's the role of the blogosphere. I've often talked about how the interplay of voices in the technical blogosphere models a style of professional collaboration that I expect will someday prevail more broadly. We see that happening here. Sam Ruby usefully asks whether another protocol proposed by Microsoft, SSE, might play a role in contact synchronization. Tim Bray usefully analyzes the Web3S spec and offers some excellent advice, in particular:

Get yourself a test suite! APP has already been helped by the existence of code from Joe Gregorio, me, and others. Test suites matter way more than specs, in the big picture.

Along with the technical back-and-forth you can see a social process at work as the always-tricky interplay between competition and cooperation gets sorted out. APP folks: “If you had concerns about APP’s capabilities, why didn’t you voice them sooner?” Microsoft folks: “Well, we were worried about seeming to interfere, but yeah, in retrospect, we should have.” Although some of us have started to take this interplay for granted, it’s still quite novel for most people, and it’s a remarkable thing.

The second point is that, technical back-and-forth notwithstanding, the purpose of Web3S is to open up walled-garden social networks. That’s been another — and more broadly inclusive — conversation in the blogosphere recently. Facebook’s ignorance of web reputation is part of the story. Here’s another, the Facebook friend finder:

In order to find people you may know on one or another of the popular webmail systems, you’re invited to lend Facebook your credentials so it can probe your address book on one of those systems. I understand why this happens, but it’s totally the wrong message about security and digital identity to be sending to a large community of young people.

From that perspective, Web3S is just a small part of a big story: opening up Live Contacts so there's no need for this kind of impersonation. In his talk at MIX1 (MP3 here2), Yaron Goland lays out two scenarios. For one-off exchanges, there's the contacts control, which a third-party site can embed in one of its pages so people can selectively relay Live Contacts data into the page. For longer-term relationships with trusted services, people can grant those services permission to read and update their Live Contacts directly, so that social activities elsewhere will be reflected in their own address books.

In both scenarios, you retain control. You never allow a service to use your name and password to impersonate you to another service. That’s a good thing for Facebook, which would rather not have to impersonate people. It’s a good thing for people who (whether they realize it or not) don’t want to be impersonated. It’s a good thing for Microsoft because the enabling platform services will become a business. It’s a good thing for RESTful web services3 because the API is RESTful. And it’s a good thing for everyone who believes that ultimately the Internet is our social network.


1 The RESTful instinct isn’t yet fully developed. Although the API discussed in the talk is RESTful, I had to extricate this URL from the MIX RSS feed because the navigational apparatus at http://sessions.visitmix.com/ doesn’t disclose it on the URL-line or in a permalink. Note to team: Let’s please make it easier for people to point to these things.

2 I had to extract the soundtrack from the video and then republish it. Note to team: Let’s please make it easier for people to hear these things.

3 I’m wondering, though, like Tim Bray, about the introduction of a new HTTP verb. The session blurb was: “Data wants to be free! So come to this technical deep dive to learn how you can POST/GET/PUT/DELETE your way into Windows Live.” There was no mention of UPDATE. Of course the spec was published in order to solicit feedback, so I’ll be interested to see what comes of that.
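At the protocol level, a new verb is just a different method token on the request line. That's part of why it's worth debating: client libraries will usually let you send one, but servers, proxies, and firewalls may not know what to do with it. This sketch uses Python's standard urllib.request to build requests for the four advertised verbs plus UPDATE. The URL is hypothetical, nothing is actually sent, and the body is a placeholder rather than the real Web3S request format.

```python
import urllib.request

# Hypothetical resource URL, for illustration only (not a real Live Contacts endpoint).
CONTACT_URL = "https://example.com/contacts/42"

def build_request(method, body=None):
    """Build, but do not send, an HTTP request with an arbitrary method token."""
    data = body.encode("utf-8") if body is not None else None
    return urllib.request.Request(CONTACT_URL, data=data, method=method)

# Verbs that carry a body get a placeholder payload; the others carry none.
prepared = {
    m: build_request(m, "<contact/>" if m in ("PUT", "UPDATE") else None)
    for m in ("GET", "PUT", "DELETE", "UPDATE")
}
```

Building UPDATE on the client takes no more effort than building PUT. The open question is what happens to that request along the way.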

Sitemaps, segmentation, and streaming

The audio accompaniment for yesterday’s exercise hour was Tom Raftery’s interview with Brad Abrams, group program manager for Silverlight. I mention it for three reasons.

First, it’s a nice comprehensive overview of the history and mission of the Silverlight project. Now that the flurry of MIX announcements is over, this is a good time to step back and reflect on the big picture. As someone who’s been working on the .NET Common Language Runtime since its inception, Brad’s in a good position to paint that picture.

Second, it reminds me of an obvious strategy for podcasts that I’ve somehow managed to ignore: solicit questions ahead of time! Tom Raftery does that routinely. In this case people asked a bunch of great questions, Brad Abrams engaged straightforwardly with them, and the resulting show was much richer and deeper than it otherwise would have been. Given that I was an avid practitioner of this method in my journalism days, it’s crazy that I haven’t carried it forward into my podcasting. Gotta fix that.

Third, one particular segment of the interview really grabbed me. Referring to his talk at MIX (WMV, MP4), Brad discusses a strategy for exposing videos to search engines. The ingredients of the strategy are:

  1. A feature of the ASP.NET “Futures” release — DynamicDataSearchSiteMapProvider — that helps developers dynamically generate sitemaps that provide the breadcrumb trails otherwise unavailable to search engines when they visit dynamically-generated sites.
  2. A data source from which the sitemap provider can extract titles and timecodes for chapters within a video.
  3. A SMIL wrapper that provides closed captioning both to the video and, indirectly, to the web pages that the sitemap points crawlers to.
  4. A streaming server.
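The sitemap ingredient is easy to picture. This Python sketch emits a standard sitemaps.org XML file with one entry per chapter. It is not the ASP.NET DynamicDataSearchSiteMapProvider, whose API isn't shown here. It assumes a hypothetical player page that accepts t (start time in seconds) and chapter query parameters. The sitemap format itself has no title field, so chapter titles travel in the URL here and would also need to appear on the landing page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace

def chapter_sitemap(page_url, chapters):
    """Build a sitemap with one <url> entry per video chapter.

    page_url: hypothetical player page that starts playback at ?t=<seconds>.
    chapters: list of (title, start_seconds) pairs, e.g. from caption data.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for title, start in chapters:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{page_url}?{urlencode({'t': start, 'chapter': title})}"
    return ET.tostring(urlset, encoding="unicode")
```

A crawler that follows these entries lands on pages keyed to specific timecodes. The streaming server then takes over, starting playback at that point.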

As an industry we've gone back and forth on that last point. In the beginning there was Real, which relied primarily on streaming servers rather than standard web servers. The downside was that these were specialized and non-ubiquitous. One of the upsides was that they enabled random access. But then, hardly anybody took advantage of that opportunity. As you can see here, although it's quite feasible to form URLs that point into Real streams, the details are just geeky enough to deter almost everyone.

Then things shifted. Increasingly the media encoders and players conspired to support progressive downloading. In this mode, you only need a standard webserver, serving up static files. The encoders tuck enough extra information into the files so that players can begin playing right away, after only a short buffering delay. It looks like streaming to most people, and a lot of applications and services even call it streaming rather than progressive downloading.

The upside here was that no specialized servers were needed. Any regular webserver would work, so this approach is very blog-friendly. Got an audio or video file? Just upload it to your blog, and bingo, you’re podcasting or videoblogging.

This radically democratized media publishing, and continues to do so. But, although few recognized the tradeoff, there was one. You couldn’t randomly access a static media file.

Or so most of us thought.

As it turned out, that’s not strictly true, at least not for MP3 files. I realized that some players were able to randomly access parts of statically-served MP3 files, found out how, and prototyped a gateway that would enable anyone to form a URL to a timecoded segment from an MP3 file hosted on a remote webserver.

This was an interesting result, but it was even clunkier than the methods already supported by the Real servers and players — and as we’ve seen, hardly anybody ever discovered or used their random-access features. What’s more, my method only worked for MP3 files by virtue of a special property of that format: frames are (usually) independent of one another, so you can reach blindly into the middle of a file, shove bytes at a player, and expect it to find the next frame boundary and start playing. I’m mostly ignorant of the details of video formats but, so far as I can tell, they don’t tend to work that way.
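Here's a minimal Python sketch of the two ingredients. It assumes a constant-bitrate file. First, estimate the byte offset for a timecode, which you could request from an ordinary webserver with an HTTP Range header. Then scan forward for the next frame header. This illustrates the idea only; it is not the code behind that prototype. Variable-bitrate files would need a seek table instead of this arithmetic.

```python
def byte_offset_for_time(seconds, bitrate_kbps, header_bytes=0):
    """Approximate byte offset of a timecode in a constant-bitrate MP3.

    header_bytes: size of any leading ID3v2 tag, which precedes the audio frames.
    """
    return header_bytes + int(seconds * bitrate_kbps * 1000 / 8)

def range_header(seconds, bitrate_kbps, header_bytes=0):
    """HTTP Range header asking for everything from the estimated offset onward."""
    offset = byte_offset_for_time(seconds, bitrate_kbps, header_bytes)
    return {"Range": f"bytes={offset}-"}

def find_frame_sync(data, start=0):
    """Return the index of the next plausible MPEG audio frame header, or -1.

    A frame header begins with 11 set bits: 0xFF followed by a byte whose top
    three bits are set. Real parsers also validate the rest of the header,
    because the same bit pattern can occur by chance inside audio data.
    """
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1
```

At 128 kbps, for example, one minute in is 60 × 128,000 / 8 = 960,000 bytes past the start of the audio. A player handed bytes from there only needs to find the next sync word to start playing.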

Now I wonder if we're heading back to the future. Flash (with FLV) and Silverlight (with WMV) don't require streaming servers on the back end; they can do progressive downloading as well. But in the services era, you're less likely to worry about deploying your own streaming server and more likely to use an instance of one that runs in the cloud. That instance can react to requests for timecoded segments in a more intelligent way than by seeking to byte offsets.

It’s true that we failed, the first time around, to make the formation of those requests easy and obvious to people using media players. But a new generation of players — again, both Flash-based and now Silverlight-based — can be friendlier to that kind of innovation.

An example of what we should expect appears at 59:50 in Brad Abrams’ MIX talk (WMV, MP4). You search, find some title or caption text (thanks to a sitemap), click the link, and begin playing a segment at a timecode.

The hardest part, of course, is the data preparation. On my trip to the UK in January I mentioned the Open University's FlashMeeting system, which does a great job of segmenting captured video on the fly and then making it randomly accessible.

There are already too many triple-S acronyms so I probably shouldn’t do this, but the formula I’m looking for is: Sitemaps + Segmentation + Streaming.

Screencasting for public speakers

While I’m back on the topic of screencasting, I’ve been meaning to mention another important use of the medium. Recently a colleague reported severe trouble trying to present demos that rely on a live connection to the Internet. My solution is a variation of the old joke:

Patient: It hurts when I do that.

Doctor: Don’t do that!

To avoid the pain I use screencasts instead of live demos. There are a variety of reasons for doing so. An obvious one is that it makes you immune to network glitches.

A subtler reason is that it’s hard to show software in use without wasting effort and motion. You reach for the wrong menu item, you fumble while typing. These are perfectly normal and natural behaviors, but they only add dead time to your presentation and therefore, by definition, they detract from it. When you edit out the wasted motion and false starts you create an effect that isn’t quite real — it’s hyperreal — but that’s exactly the effect that you want (or anyway, I want) a presentation to achieve.

Another subtler reason is that video playback gives you more control over timing. It can be hard or even impossible to replay a piece of a demo in response to an audience question. Likewise, it can be hard or impossible to fast-forward a demo if you’re running short on time, or if you’re losing the audience. When you’ve canned your demos as screencasts, you have a lot more flexibility.

Finally, there’s just the peace of mind that comes with only having to keep track of one single media file, as opposed to lots of moving parts. When you are speaking and showing demos, the fewer moving parts, the better.

A long-delayed response to Beth Kanter’s questions about screencasting

As part of my re-exploration of the walled-garden social networks, I’ve accepted the entire batch of LinkedIn invitations that had queued up in my dormant account. One of them was a request from Beth Kanter for advice on screencasting. From my point of view, LinkedIn was superfluous in this case because the same request had already been made (implicitly) in this blog post in which Beth summarizes what she had figured out for herself, and then invites feedback.

Although we should probably not yet simply assume that linking to a blog post will draw the attention of the author of that post, the blogosphere does in fact propagate awareness in that way, and does so with remarkable speed and reliability. So in this case I’d seen Beth’s item before I began receiving requests from her via LinkedIn intermediaries. Because I was boycotting walled-garden social networks at the time, I thought this was a good opportunity to show how, in a case like this, the open Net can obviate the need for a closed network. So I replied to Beth’s blog item in a comment. Or rather, I thought I did. But although I wrote the reply it seems I never managed to post it. Oops. I’m sorry about that, Beth, and I’ll try to make up for it here.

But first, I want to note that your item is a textbook example of how to construct an online query for information. By summarizing what you've already learned, you're helping bring other folks up to speed. At the same time, you're helping me understand where and how I can add value. This custom is just good common sense, of course, but it's one that's honored more often in the breach than in the observance. If I were teaching this kind of thing in grade school, I'd use del.icio.us to keep lists of good examples of netiquette, and I'd put this example on one of those lists.

Now, in the context of the genre that I’ve called the conversational demo, here are your questions and my answers.

Q: How much scripting does he do prior to the interview?

A: None.

Q: Does he “rehearse” with his guest?

A: No. I do, of course, choose topics in which I’m interested, and to which I bring plenty of domain knowledge.

Q: Or does he capture everything and edit?

A: Yes. As with my podcasts, I lean heavily on editing when making these conversational screencasts. The editing happens on two levels: macro and micro. On the macro level, because we (interviewer and interviewee) know that whole scenes can be cut, we don’t need to worry about the performance. If something doesn’t work we can just call it a bad take and try again. We can also plan, on the fly, where to go next, again knowing that such discussion is effectively out of band and will be deleted.

On the micro level, there’s internal editing. The term comes from the audio domain and it applies in the same way here. If I can eliminate ums and you knows and false starts without compromising the video, I do.

Q: What tools does he use to capture these interviews?

A: For video I mostly use Camtasia in conjunction with one or another of the many screen-sharing tools. Since most software demos don't require a high frame rate and don't push lots of screen bits, that's usually OK. However, in this screencast about tagging in Photo Gallery — which, by the way, was edited down from 35 minutes to 14 minutes — the screen-sharing setup couldn't keep pace with all the images. So I had Scott Dart record locally using Windows Media Encoder, and then ship me the resulting WMV file. I was able to follow along in screen-sharing well enough to carry on the conversation, even though what was displayed on my screen would have been useless for production. This is a variation of a technique that's really useful for podcasters who are struggling with expensive and/or poor-quality phone lines. If you can get both parties to record high-quality audio locally, you can use a marginal VoIP setup to converse and then join up the high-quality audio later. I did that here and I'd love to do more shows that way.

For audio I also mostly use Camtasia. Originally I didn't; I used Audacity, because I hadn't figured out how to get Camtasia to record the two channels (caller, callee) from my Telos as a stereo track. Eventually I found that setting. It's in Tools -> Options -> Streams -> Audio Setup.

Because the conversational screencast is a superset of a podcast, you’re dealing with all of the same audio production issues as in podcasting. For me, working remotely, that’s been an ongoing challenge. Telephone recording is just plain hard. Although I’ve been using a Telos for a while, for example, I only recently discovered that I’d been using it incorrectly. On the other hand, VoIP recording is hard too.

Granted, I wasn’t born with an audio chromosome, but then neither were most folks. So, remote audio is going to be a problem for most of us — a problem that, I reckon, somebody is going to make money by solving. At this point I can muddle through fairly well. But if I hadn’t already invested in the Telos I’d be looking really hard at the technique of recording locally on both ends and then joining the results in post-production. It’s not particularly hard to do that, and it’s really nice to simply abolish all the problems associated with the voice channel — whether it’s the public telephone system or the Internet.
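The join can be simple if each side recorded a separate mono WAV file. This sketch uses only Python's standard wave and array modules. It interleaves two 16-bit mono recordings into one stereo file, putting one speaker on each channel the way the Telos does, with an optional offset to line the two up. It assumes matching sample rates and an alignment offset you've already worked out, for example from a clap at the start. It does not handle clock drift between two recorders over a long session.

```python
import array
import sys
import wave

def join_to_stereo(left_path, right_path, out_path, right_offset_frames=0):
    """Combine two mono 16-bit WAV recordings into one stereo WAV file.

    right_offset_frames: frames of silence to prepend to the right channel
    so the two recordings line up.
    """
    with wave.open(left_path, "rb") as lw, wave.open(right_path, "rb") as rw:
        for w in (lw, rw):
            if w.getnchannels() != 1 or w.getsampwidth() != 2:
                raise ValueError("expected mono 16-bit WAV input")
        if lw.getframerate() != rw.getframerate():
            raise ValueError("sample rates differ")
        rate = lw.getframerate()
        left = array.array("h", lw.readframes(lw.getnframes()))
        right = array.array("h", rw.readframes(rw.getnframes()))

    if sys.byteorder == "big":  # WAV samples are little-endian on disk
        left.byteswap()
        right.byteswap()

    # Delay the right channel, then pad both channels to the same length.
    right = array.array("h", [0] * right_offset_frames) + right
    n = max(len(left), len(right))
    left.extend([0] * (n - len(left)))
    right.extend([0] * (n - len(right)))

    # Interleave samples: L, R, L, R, ...
    stereo = array.array("h", [0] * (2 * n))
    stereo[0::2] = left
    stereo[1::2] = right

    if sys.byteorder == "big":
        stereo.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(stereo.tobytes())
```

Keeping each voice on its own channel preserves the freedom to level and edit the two speakers independently in post-production.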

Because it rarely applies to me, I haven’t mentioned the scenario in which both parties are together in the same place looking at the same computer. In that case I’d use whatever capture software was convenient. If the software were on the interviewee’s computer, I’d ask the interviewee to install the free Windows Media Encoder and capture video that way. And I’d probably use a standalone digital audio recorder with a handheld microphone to separately capture audio.

One final point from my recent conversation with Doug Kaye: a lot of people who think they don’t have digital audio recorders overlook the fact that they have camcorders which can perform that function. A related point: if you use a camcorder, it’s tempting to let it do the whole job — that is, screen capture as well as audio capture. Although my Channel 9 colleagues do that all the time, I don’t recommend it. You’d much rather use perfect screen capture than fuzzy camcorder capture. And ideally you’d like to be able to do that without installing any software on the target computer, using a direct-capture device. I’ve never seen one of those, but next week in Redmond I’ll be visiting our new production studio where I’m told we have such a beast. I’m curious to see it in action.

Q: Does he edit in Camtasia?

A: Yes, I do. I’d honestly rather edit in iMovie instead, because I find it to be more elegant and more capable, but it’s a huge hassle to get stuff in and out of iMovie so I usually take the path of least resistance and edit in Camtasia. If you want to do micro-edits in Camtasia, one important tip is to record at a higher frame rate than you will ultimately produce. A screencast is legible at 5 or even fewer frames per second. But if you only capture at that rate, you’ll find that you can’t make intra-frame audio micro-edits. So record at 15 or more frames per second, then produce at a lower rate.
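The frame-rate tip is simple arithmetic. Assume, as the tip implies, that cuts land on video frame boundaries. The finest edit you can make is then one frame's duration, 1000 / fps milliseconds.

```python
def edit_granularity_ms(fps):
    """Duration of one video frame: the finest cut if edits snap to frames."""
    return 1000.0 / fps

for fps in (5, 15, 30):
    print(f"{fps:>2} fps -> cuts every {edit_granularity_ms(fps):.1f} ms")
```

At 5 frames per second a cut can only fall every 200 milliseconds. At 15 it can fall roughly every 67 milliseconds, which leaves much more room to trim a stray "um".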

Q: What are some best practices in terms of production and editing?

A: It’s tempting to jump in and start editing right away, and to be honest I often do. But I think it’s better to just watch the raw recording all the way through, setting markers along the way to annotate the segments that you want to include, discard, or perhaps rearrange. Ultimately you’re trying to tell a story, and those markers will help you visualize the outline of the story.

Sorry this took so long, Beth. I hope it helps.

A conversation with Jeannette Wing about computational thinking

This week’s show on ITConversations explores what Jeannette Wing means by computational thinking. As I noted here, she has coined that evocative phrase to suggest how the intellectual tools of computer science — including abstraction, naming, composition, state machines, refactoring, and separation of concerns — can add up to “a universally applicable attitude and skill set that everyone, not just computer scientists, would be eager to learn and use.”

At Carnegie Mellon, where Dr. Wing is head of the computer science department, this way of thinking pervades many other academic disciplines. But in her view, it’s really as fundamental as reading, writing, and arithmetic, and like those skills it should be taught in grade school. Since that’s not likely to happen anytime soon, I wonder if computer games — which already teach kids certain aspects of computational thinking — could help advance this agenda in a more deliberate way.

How do I know this person? Through the Web!

Like other social applications, Facebook wants to know how you’re connected to people. So it asks: “How do you know this person?” and presents these choices:

The choice I usually want — “Through the Web” — isn’t available. One friend coerced “Met randomly” by adding “The web as a conversation engine” — but that’s an unsatisfactory workaround. There was nothing random about how we met. Given our shared interests and our online expression of them, it was inevitable that we would come into contact.

“Through the Web” should be a first-class answer for “How do you know this person?”

Facebookizing the Web, Webifying Facebook

In preparation for a panel at the MIT Enterprise Forum I summarized my thoughts about walled-garden social networks. It came back around the other day in conversation with Avi Bryant, who — like a great many people — is appreciating the refinements Facebook brings to various protocols that work less smoothly (if at all) on today’s open Net. He reflected our exchange into a blog post where he also asks: “How do we Facebookize the open Web?”

As Avi notes in his posting, I think Gary McGraw is right when he says: “People keep asking me to join the LinkedIn network, but I’m already part of a network, it’s called the Internet.” It’s also obviously true that the walled gardens are petri dishes growing cultures that will merge with or (some believe) recreate the open Web. So although I don’t spend a lot of time in the walled gardens, I do visit them in order to learn about the cultures growing there.

Last night, I noticed that I’m one of four 1974 graduates of my high school who appear in Facebook. That’s fewer than 1% of the 600 grads. This year, 302 grads appear in Facebook. That’s 50% of the graduating class. Here’s a nice picture of the bimodal distribution over time:

In 2012, what percentage of that year’s class do we think will be participating in Facebook?

While we ponder our answers, I’d like to digress for a moment on the data supporting the chart, and on the chart itself.

Although I thought there were about 600 students in my graduating class, I couldn’t remember for sure. And I had no idea how the student population might have changed over the years. How can you check these facts? Here’s the Google query I ran:

“cheltenham high school” “student population” 100..5000

The third term exploits a powerful but little-known and rarely used feature called numeric range search. In this case I picked a lower limit that I was sure was smaller than the population of any one grade, and an upper limit that I was sure was higher than the population of the whole school. The query finds pages that contain both exact phrases (quoting phrases is another underused search feature) plus a number in the given range. From the result set, I culled two documents.
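Google’s matching is far more sophisticated, but the logic of a query like this one can be sketched in a few lines of Python. This sketch assumes plain-text pages and treats only unbroken runs of digits as numbers; the function name `matches` and the page snippets are made up for illustration.

```python
import re


def matches(page: str, phrases: list[str], low: int, high: int) -> bool:
    """True if the page contains every exact phrase plus a number in [low, high]."""
    text = page.lower()
    if not all(phrase.lower() in text for phrase in phrases):
        return False
    # Simplification: only bare digit runs count as numbers ("1,706" would not).
    numbers = (int(n) for n in re.findall(r"\d+", text))
    return any(low <= n <= high for n in numbers)


# Hypothetical page snippets, not real search results.
pages = [
    "Cheltenham High School has a student population of 1706.",
    "Cheltenham High School alumni news and reunion photos.",
    "Cheltenham High School once reported a student population of 42.",
]
phrases = ["cheltenham high school", "student population"]
hits = [p for p in pages if matches(p, phrases, 100, 5000)]
print(hits)
```

The second page fails the phrase test and the third fails the range test, which is exactly the filtering the lower and upper limits were chosen to do.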

First was the curriculum vitae of Joseph Cifelli, which confirmed that in my era “the high school student population was divided into three houses of about 600 students each.” That’s 1800 total, and there were three grades — 10, 11, 12 — so 600 per grade sounds right.

Second was a page on a real estate information site which, though frustratingly undated, I presume to be reasonably current. It gives the student population as 1706. That’s close enough to 1800 for my purposes.
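The estimates above reduce to a little arithmetic. This sketch simply redoes it with the figures quoted in the text; the variable names are mine.

```python
# Figures quoted in the text.
houses, students_per_house, grades = 3, 600, 3
current_population = 1706

per_grade_then = houses * students_per_house / grades  # 600 per grade
per_grade_now = current_population / grades             # about 569 per grade

# Share of each graduating class that appears in Facebook.
share_1974 = 4 / per_grade_then
share_this_year = 302 / per_grade_then
share_this_year_alt = 302 / per_grade_now

print(f"1974: {share_1974:.1%}, this year: {share_this_year:.1%} "
      f"(or {share_this_year_alt:.1%} using 1706 / 3)")
```

Either way, this year’s class comes out at roughly half, versus well under 1% for 1974.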

The chart itself is the first I’ve attempted using Excel 2007. The process feels very different from earlier versions, but I’m no expert in this area so I can’t make a detailed comparison. I do have a question for Excel wizards, though. Following Edward Tufte’s recommendation to subtract ink wherever possible, I was able to get Excel to remove almost all of the unnecessary cruft: grid lines, tick marks. But a bunch of zero-valued data labels still cluttered the long tail of the chart. I wound up taking a screenshot of the chart and then using Paint to remove those zeros, in order to achieve what I think is an admirably clean and spare result. Is there a way to get Excel to do that directly?

Anyway, back to our question: In 2012 will Facebook’s penetration among CHS grads approach totality? Beats me. That’ll depend on the Facebookization of the Web, but also on the Webification of Facebook.

Avi’s post has examples of the former:

  • “a smart feed, which aggregates and filters all of my subscriptions in a holistic way”
  • “an API which allows access both to your blogroll data and to your smart feed…so that if I post and tag a photo of A on my blog, anyone subscribed to A’s blog – even if they have no idea who I am and aren’t subscribed to me – will get an item in their feed about it.”
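Avi’s second bullet amounts to a routing rule: an item tagged with a person fans out to that person’s subscribers, whether or not they follow the poster. Here is a minimal in-memory sketch of that rule, assuming a simple subscription map; `FeedHub` and its methods are hypothetical and don’t correspond to any real service’s API.

```python
from collections import defaultdict


class FeedHub:
    """Toy model: tagging a photo of A puts an item in the feeds of A's subscribers."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(set)  # author -> set of readers
        self.feeds = defaultdict(list)       # reader -> list of feed items

    def subscribe(self, reader: str, author: str) -> None:
        self.subscribers[author].add(reader)

    def post_photo(self, poster: str, photo: str, tagged: list[str]) -> None:
        # Audience: the poster's own subscribers plus the subscribers of everyone tagged.
        audience = set(self.subscribers[poster])
        for person in tagged:
            audience |= self.subscribers[person]
        for reader in audience:
            self.feeds[reader].append(f"{poster} posted {photo} tagging {', '.join(tagged)}")


hub = FeedHub()
hub.subscribe("carol", "A")  # carol follows A, but has never heard of bob
hub.post_photo("bob", "beach.jpg", ["A"])
print(hub.feeds["carol"])
```

Using a set for the audience means a reader who follows both the poster and the tagged person still gets the item only once.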

Here’s an example of the latter. Facebook invites me to manage streams of photos and events. But I have other ways to manage streams of photos (e.g., Flickr) and events (e.g., Eventful). Why not enable me to hook into them?

The process of diffusion can flow in both directions, I guess, and I hope that it will.

Exeter Hospital gets WiFi right

I recently spent a long day waiting for and visiting with someone who had surgery at the Exeter Hospital in Exeter, NH. It’s a wonderful facility that’s got all sorts of things right: pleasant decor, free valet parking, an excellent and inexpensive cafeteria. But for me, it was the public WiFi that made my day. Everywhere I flipped open my laptop — in a physician’s office, in a surgical waiting room, in a patient’s room — there was always a strong signal, and it always Just Worked.

If any of the IT staff at Exeter Hospital are within earshot of this blog: Thanks! That made a huge difference for me. Twelve hours of disconnectedness would have compounded the stress of being there. Instead I had twelve hours of connectedness, I got a lot done while waiting, and was spared the tyranny of Fox News.

Of course I realize, as you folks do, that providing that experience for me is more than a courtesy. It’s a smart business decision. If I had to choose between hospitals, and if yours were only an hour away instead of two, your robust WiFi setup — as opposed to the always-spotty and now apparently nonexistent setup at my local hospital — would weigh heavily in the decision. I notice that you don’t advertise your WiFi capability. You probably should!

Airplanes, cars, sticks and stones: Brian Beckman on the physics of simulation

My wish for Channel 9 to augment the trademark videos with downloadable audio files has been granted. As a result, I was able to listen to Charles Torre’s interview with Brian Beckman during my exercise hour yesterday. Even if you’re not a gamer (I’m not), there’s a good chance you’ll enjoy Brian’s lively explanation of the physics that govern the simulation of planes and cars. I guess I’d been vaguely aware that PC flight simulators predated PC racecar simulators by many years, but I’d never thought about why.

To do a credible simulation of a plane, Brian says, you only need to account for two coordinate systems (the earth’s and the plane’s) and four forces (lift, drag, thrust, gravity). Because you can do table look-up for the values of variables such as lift and drag, and because those tables are small, you don’t need a “fast and fat” computer (fast CPU, big memory) to do the job. That’s why Flight Simulator was possible on the earliest PCs.
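Here is a minimal sketch of the table-lookup idea, assuming the standard lift equation L = ½ρv²SC_L and a small, made-up table of lift coefficient versus angle of attack. The numbers are illustrative rather than data for any real aircraft, and this is not how Flight Simulator itself works.

```python
from bisect import bisect_right

# Hypothetical lift-coefficient table: angle of attack (degrees) -> C_L.
AOA_DEG = [-4.0, 0.0, 4.0, 8.0, 12.0, 16.0]
CL = [-0.1, 0.3, 0.7, 1.1, 1.4, 1.2]  # falls off past the stall angle


def lookup(xs: list[float], ys: list[float], x: float) -> float:
    """Linear interpolation in a small table, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def lift_newtons(aoa_deg: float, speed_ms: float, wing_area_m2: float,
                 air_density: float = 1.225) -> float:
    """Lift = 0.5 * rho * v^2 * S * C_L, with C_L read from the table."""
    return 0.5 * air_density * speed_ms ** 2 * wing_area_m2 * lookup(AOA_DEG, CL, aoa_deg)


print(f"{lift_newtons(6.0, 50.0, 16.0):.0f} N")
```

A drag table would have the same shape. Each lookup costs a few comparisons and one interpolation, which suggests why this style of model asked so little of early hardware.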

Since only some of us fly, but most of us drive, you might have expected automotive simulators to have arrived sooner than the mid-1990s. And they would have, were it not the case that car physics is, counter-intuitively, way more complicated than airplane physics:

If you analogize a tire to a wing — that is, the thing that generates the force — you’ve got four of them. And they’re connected to the car by these complicated linkages. Maybe they’re McPherson struts, maybe they’re independent suspensions, maybe they’re leaf springs. All of these things act differently, so each different car is going to have not just different shades of physics, but completely different kinds of physics, completely different equations.

It goes on from there: more coordinate frames to account for, complex dynamics of steering, acceleration, and braking. It’s fascinating to learn why accurate simulation of cars only recently became possible on PCs. Along the way, you’ll be reminded about the properties of the safety envelope that ordinary drivers seldom push but that race drivers always do.

The interview concludes with another counter-intuitive observation. Although the ability to crunch complex nonlinear equations at 30 or 60 frames per second is what makes these simulations possible today, Brian thinks a radically different approach is the way of the future. The Rigs of Rods system shown in that video, created by Pierre-Michel Ricordel, uses a technique that Brian calls “sticks and stones”:

Stones, which have mass, and sticks between them which are little flexible things we can model with very simple physics: harmonic oscillators. That’s all you need, just one physics model, a damped harmonic oscillator connecting pairs of stones, and, by gum, you can simulate really amazing things.

The computer is now fast enough to be able to simulate hundreds or thousands of these independent systems simultaneously, every step. We can now dispense with the really hard mathematics. If you’d asked me to bet money that this were even possible, I’d have said no. But that’s because I had this long experience where you just didn’t think about doing thousands of particles iterated in a system like this. Here’s a guy who took a fresh approach, he said let me see what I can do, and sure enough, this is a magnificent system.

We’ve heard it before, we’ll hear it again: a network of many simple parts trumps one big complex monolith. It’s a story that keeps on surprising us, but probably shouldn’t.
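To give a feel for the “sticks and stones” approach, here is a minimal sketch, assuming point masses (stones) joined by damped springs (sticks) in two dimensions and advanced with semi-implicit Euler steps. It is not the code of the system Brian describes; the class names, constants, and integration scheme are my own illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Stone:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    mass: float = 1.0


@dataclass
class Stick:
    a: int               # index of the first stone
    b: int               # index of the second stone
    rest_length: float
    stiffness: float = 50.0
    damping: float = 0.5


def step(stones: list[Stone], sticks: list[Stick], dt: float, gravity: float = -9.81) -> None:
    """Advance every stone by one time step (semi-implicit Euler)."""
    fx = [0.0] * len(stones)
    fy = [s.mass * gravity for s in stones]
    for st in sticks:
        p, q = stones[st.a], stones[st.b]
        dx, dy = q.x - p.x, q.y - p.y
        length = math.hypot(dx, dy) or 1e-9
        ux, uy = dx / length, dy / length  # unit vector from p toward q
        # Damped harmonic oscillator: spring on the stretch, damper on the
        # relative velocity along the stick.
        stretch = length - st.rest_length
        rel_speed = (q.vx - p.vx) * ux + (q.vy - p.vy) * uy
        f = st.stiffness * stretch + st.damping * rel_speed
        fx[st.a] += f * ux
        fy[st.a] += f * uy
        fx[st.b] -= f * ux
        fy[st.b] -= f * uy
    for i, s in enumerate(stones):
        s.vx += fx[i] / s.mass * dt
        s.vy += fy[i] / s.mass * dt
        s.x += s.vx * dt
        s.y += s.vy * dt


stones = [Stone(0.0, 0.0), Stone(2.0, 0.0)]  # stick starts stretched to twice its rest length
sticks = [Stick(0, 1, rest_length=1.0)]
for _ in range(2000):  # 20 simulated seconds
    step(stones, sticks, dt=0.01, gravity=0.0)
gap = math.hypot(stones[1].x - stones[0].x, stones[1].y - stones[0].y)
print(f"distance after 20 s: {gap:.4f}")
```

Every stick obeys the same few lines of physics; a truck or a bridge would just be a longer list of stones and sticks, which is the point Brian makes.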

So, you might wonder how a video with as much visual content as this one — equation-filled slides, Mathematica screens, YouTube videos — fares in audio format. In this case, surprisingly well. It’s amazing how much information voice alone can carry. Later I did review parts of the video, and having listened I knew just which parts I wanted to see. But this exercise confirmed my belief that downshifting from video to audio is a really useful way to give people access to long-form material they might not have time to watch.

Configuration debugging for normal folks

From time to time, Phil Windley and I marvel at the vast reservoirs of tacit knowledge that we bring to the table when working with computers and software. And we ask one another: “What do normal folks do?”

Case in point is this debugging exercise in which Phil discovers, after a round of installation and reconfiguration on his Mac, that “the display was washed out.” Here was his 7-step recovery program:

  1. I noticed that the washed out effect didn’t happen until after the desktop picture had been painted.
  2. I created a clean user account and logged into it. The problem didn’t appear leading me to believe it was a user problem, not a system problem.
  3. I looked at console and system logs to see if there were any error messages. Not much information there on this problem.
  4. I removed all the start-up items. Still no joy.
  5. I compared the activity monitor processes for the account with the problem to those in the account without the problem. No significant differences.
  6. I now suspected something in the preferences. So, I moved the preferences directory to another place. The problem went away confirming that there was something messed up in the preferences.
  7. I used a binary process to put preferences back, logging in and out each time until I found the culprit.

The culprit, Phil reports, was the Universal Access area in Preferences, where a contrast slider had been pushed to the wrong setting.
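Phil’s step 7 is a binary search. Here is a minimal sketch, assuming exactly one culprit and that restoring it alone reproduces the problem; `problem_appears` stands in for a log-out/log-in cycle, and the file names are hypothetical.

```python
from collections.abc import Callable, Sequence


def find_culprit(items: Sequence[str],
                 problem_appears: Callable[[Sequence[str]], bool]) -> str:
    """Binary search for the single item that triggers the problem.

    Each call to problem_appears stands in for one log-out/log-in
    cycle with only the given items restored.
    """
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        candidates = first_half if problem_appears(first_half) else candidates[mid:]
    return candidates[0]


checks = []


def simulated_login(restored: Sequence[str]) -> bool:
    checks.append(len(restored))
    return "universalaccess.plist" in restored


# Hypothetical preference files: 15 harmless ones plus the culprit.
prefs = [f"app{i}.plist" for i in range(15)] + ["universalaccess.plist"]
culprit = find_culprit(prefs, simulated_login)
print(culprit, "found after", len(checks), "login cycles")
```

Sixteen files take four login cycles; a thousand would take about ten, which is what makes the technique worth the tedium.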

Wow. It’s been more than 20 years since my old BYTE pal, Tom Thompson, first showed me that binary search process. Have we made no progress since then?

Actually, we have. In both OS X and Vista, system settings are reasonably well corralled into System Preferences and the Control Panel. What’s more, you can search within those areas. So, as is always true with search, if you know the magic word you can find the right answer. The magic word, in this case, was contrast. Click this image to see a brief screencast of what Phil would have seen if he’d searched System Preferences for that word:

Style points to OS X for the graduated halos around the icons.

Click this image to see a brief screencast of the comparable feature in Vista:

This contextual (and incremental) search is really helpful! It might be even more helpful to people who don’t have twenty years of experience doing the binary process of elimination. But because this search feature is relatively new, I suspect that many people, old hands and relative novices alike, have yet to discover it or develop the habit of using it.

Even so, search depends crucially on the right vocabulary. There’s always that Catch-22: if you could name the thing you’re looking for, you’d already have your answer. What complementary techniques can help?

Yesterday’s item about Koala suggests one. Although that project focuses on recording, playing back, and sharing sequences of actions in the browser, the principle could apply much more broadly. When you’re debugging a system configuration, the key question is ultimately: What changed?

Although Phil (unlike most folks) knew enough to check his logs, they didn’t tell him which system and application settings had recently changed. Logging those changes, and surfacing them in ways that we who make the changes can easily review, would be really helpful.
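Here is a minimal sketch of answering “What changed?”, assuming settings can be captured as flat key/value snapshots before and after a round of reconfiguration. The keys and values are invented; real settings stores are messier.

```python
def settings_diff(before: dict, after: dict) -> dict:
    """Report added, removed, and changed settings between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: (before[k], after[k])
               for k in before.keys() & after.keys() if before[k] != after[k]}
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken before and after a round of reconfiguration.
before = {"display.contrast": 0.0, "dock.autohide": False, "finder.show_hidden": False}
after = {"display.contrast": 0.6, "dock.autohide": False, "menu.clock": True}
print(settings_diff(before, after))
```

A log of diffs like this one, kept over time, is the raw material that the review (or the pooling) described here would need.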

But we’re in the Internet era now, and our desktop systems are connected. Why not exploit that? Imagine a service that trickles the changes you make to your system and application settings up to the cloud. When you go online to research a problem, that data connects you to other people who have made similar changes. There’s already a tremendous amount of user-to-user support happening on the Net. Think how much more effectively it might happen in the context of pooled data.

Even if you don’t want to share your data with anybody else, it can be helpful just to you. Recently, for example, I moved a copy of Adobe Audition from XP to Vista. The settings didn’t come along for the ride, and it was a pain to reconstruct them. If systems and applications routinely reflected settings to a private space in the cloud, migration wouldn’t be such a headache.

Interactive data: The Dartmouth Atlas of Health Care points the way

Today’s New York Times has a story on regional variation in the availability and cost of health care. The story is accompanied by a “multimedia interactive graphic” — that is, a Flash visualization that charts the following variables on a U.S. map:

  • Reimbursements
    • Total
    • Acute care
    • Outpatient
    • Surgical
  • Surgery Rates
    • All
    • Heart bypass
    • Knee replacement
    • Mastectomy
  • Enrollees

For each mapped variable, mousing over the displayed hospital referral regions yields the local, state, and national values for that variable.

It’s nicely done. There’s no question that, as of mid-2007, this is cutting-edge data interactivity for the mainstream. But times are changing fast. The Times sourced this data from the Dartmouth Atlas of Health Care. It took me five minutes to download the surgical data, upload it to Dabble DB, and publish a similar map along with a complete tabular dump.

Of course I cheated by aggregating only to the state level, which Dabble DB can do easily, rather than to the level of hospital referral regions. And I left out the national averages. But still, it’s striking to see what can be accomplished in a few minutes with no programming.
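Rolling region-level data up to states is mostly bookkeeping: weight each region’s rate by its enrollees. This sketch assumes rows of (state, enrollees, rate per 1,000) with hypothetical values; it says nothing about how Dabble DB computes its aggregates.

```python
from collections import defaultdict

# Hypothetical region-level rows: (state, enrollees, procedures per 1,000 enrollees).
regions = [
    ("NH", 10_000, 6.0),
    ("NH", 30_000, 8.0),
    ("MA", 50_000, 5.0),
]


def state_rates(rows):
    """Enrollee-weighted rate per 1,000 for each state."""
    procedures = defaultdict(float)
    enrollees = defaultdict(int)
    for state, people, rate in rows:
        procedures[state] += people * rate / 1000
        enrollees[state] += people
    return {s: 1000 * procedures[s] / enrollees[s] for s in enrollees}


print(state_rates(regions))
```

A plain average of the two New Hampshire rows would give 7.0; weighting by enrollees gives 7.5, because the larger region has the higher rate.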

It’s even more striking to see what you can do directly on the Dartmouth site. Suppose you were thinking about having a knee replacement done in Keene, NH, and you wondered how many of these procedures are done at the Keene hospital, at other hospitals around the state, and in Boston and New York. Here’s the answer:

Knee Replacement per 1,000 Medicare Enrollees
HSA Level Rates (2003)
Area                  Population  Rate  Ratio to Benchmark  Surplus/Deficit
*Keene, NH                 8,047  5.07
Laconia, NH                8,116  8.60                1.70               29
Concord, NH               13,428  8.49                1.68               46
National Average      28,767,985  6.88                1.36           52,049
Lebanon, NH                9,483  6.73                1.33               16
State: New Hampshire     148,431  6.54                1.29              218
Manchester, NH            20,748  6.32                1.25               26
Rochester, NH              5,806  6.25                1.23                7
Nashua, NH                16,706  6.16                1.22               18
Dover, NH                  8,462  5.16                1.02                1
Exeter, NH                10,624  4.96                0.98               -1
Boston, MA                67,651  4.90                0.97              -11
Manhattan, NY            166,112  3.32                0.66             -290
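The last two columns appear to be derived from the first two, with Keene (marked *) as the benchmark: the ratio is an area’s rate divided by Keene’s, and the surplus or deficit is the number of procedures above or below what the area would have at Keene’s rate. That reading is my inference from the numbers, not something the Atlas states here; this sketch checks it against a few rows.

```python
BENCHMARK_RATE = 5.07  # Keene, NH, procedures per 1,000 enrollees


def compare(population: int, rate: float, benchmark: float = BENCHMARK_RATE):
    """Return (ratio to benchmark, surplus/deficit in procedures)."""
    ratio = rate / benchmark
    surplus = (rate - benchmark) * population / 1000
    return round(ratio, 2), round(surplus)


for area, population, rate in [
    ("Laconia, NH", 8_116, 8.60),
    ("Dover, NH", 8_462, 5.16),
    ("Exeter, NH", 10_624, 4.96),
]:
    print(area, *compare(population, rate))
```

A few rows, such as Concord and Nashua, come out one hundredth lower than the published ratio, presumably because the displayed rates are themselves rounded.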

Wow! Hats off to the team at the Dartmouth Atlas of Health Care. This is the sort of thing that will change expectations about what interactive data ought to mean.

A conversation with Tessa Lau about Project Koala

For this week’s ITConversations show I talked with Tessa Lau about Project Koala, “a system for recording, automating, and sharing business processes performed in a web browser.” I’ve been interested in that idea for a long time, and mentioned it most recently in this item on pooling citizens’ collective knowledge about the services of government websites, and about how to make effective use of those services. In a comment on that item, Koranteng Ofosu-Amaah mentioned Project Koala and suggested that I speak with Tessa about it, so I did.

Of course we’ve had macro recorders since the dawn of computing, and Koala is yet another of those. What’s different? Crucially, the ability not only to capture and replay, but also to share, performances of tasks. The descriptions of those tasks are shared on a Wiki, and they’re written in an English-like syntax that’s very close to what you’d write if you were narrating instructions, e.g.: “Enter 94301 into the Search By Zipcode textbox, then click the Continue button.” These instructions can be edited, tagged, searched, and indexed by the URLs embedded in them. In theory that will enable us to pool our experiential knowledge of web applications. In practice, we’ll see. Koala has yet to emerge from IBM’s research lab. But you’ve got to love the idea.
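Koala’s actual grammar isn’t described here, so this is only a sketch of how a narrated instruction like the one above could be turned into structured steps. It assumes just two hypothetical step patterns modeled on the example sentence, and it splits clauses naively on “, then” and on periods.

```python
import re

# Hypothetical step patterns modeled on the example instruction in the text.
PATTERNS = [
    (re.compile(r"enter (?P<value>.+?) into the (?P<target>.+?) textbox", re.I), "enter"),
    (re.compile(r"click the (?P<target>.+?) button", re.I), "click"),
]


def parse_script(script: str) -> list[tuple]:
    """Split a narrated instruction into (action, target, value) steps."""
    steps = []
    # Naive clause splitting: ", then" or a period (so values must not contain periods).
    for clause in re.split(r",\s*then\s+|\.\s*", script):
        clause = clause.strip()
        for pattern, action in PATTERNS:
            m = pattern.fullmatch(clause)
            if m:
                steps.append((action, m.group("target"), m.groupdict().get("value")))
                break
        else:
            if clause:
                raise ValueError(f"unrecognized step: {clause!r}")
    return steps


print(parse_script("Enter 94301 into the Search By Zipcode textbox, then click the Continue button."))
```

Structured steps like these are what would make the shared scripts searchable and indexable by the pages and fields they touch.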

Language lessons

For a while now I’ve been uncomfortable with the words user and content, and with the phrase user-generated content. But although produced or created are almost certainly better generic terms than generated, I’ll admit that I’ve failed to come up with a generic alternative to user or content (a bullshit word, as Doc Searls rightly notes).

The commentary attached to Jimmy Guterman’s recent plea, “Don’t call me a user!”, has convinced me that superior generic alternatives may not exist. Several of the comments there reach the same conclusion that I have: use the generic terms when necessary, but wherever possible, be more specific. The word user connotes a role, but so does member or contributor or participant and, even more specifically, writer or photographer or indexer or webjay.

The latter is a complete neologism, of course, but note the effect that it had at webjay.org. The reflex would have been to say things like “OddioKatya is my favorite Webjay user.” But because the term webjay was so active and so evocative — a DJ for the web — it became natural to simply say “OddioKatya is my favorite webjay.”

Likewise content. Because a more distinctive and evocative term was available — playlist — you’d never think of saying “I love OddioKatya’s content.” Instead you’d say “I love OddioKatya’s playlists.”

While we’re at it, webjays are not generators of playlists; they are curators of them. This isn’t just pedantry. Language governs thought, and when we enrich our language we enrich our individual and shared mental lives. With evocative and precise vocabulary, we can imagine more and accomplish more.

Building conceptual bridges to a new media world

When Ryan Sholin’s manifesto on the future of newspapers appeared the other day, the blogosphere cheered loudly. “Great summary,” said one commenter, “Too bad they’re not listening.”

“They” are the newspaper writers, editors, and journalists — and the J-school teachers — whose attitudes and skills require a major overhaul:

Get over the whole bloggers vs. journalists thing…

…you and Mr. Notebook need to make some new friends, like Mr. Microphone and Mr. Point & Shoot.

Although everything on Ryan’s 10-point list is devastatingly true, it’s important to consider all the reasons why “they’re not listening.” In an item linked from his manifesto, Ryan expounds on the new skillset. The list of desiderata includes:

  1. Can you code a Flash stage for chaptered Soundslides?
  2. Can you edit audio, photos, and video into a compelling multimedia presentation?
  3. Can you manage a community of users?
  4. Can you moderate comments and forums and reader-contributed stories and photos and video?
  5. Can you build a maps mashup that feeds itself with data scraped from public records?
  6. Can you design interactive graphics in Flash?

Most of these things I’m able to do by virtue of long experience in a variety of disciplines and much intensive self-training. A few (items 1 and 6) I’ve never done. Sure, the world would be a better place if more journalists — and indeed more professionals in all fields — possessed these and related skills. It’s precisely for that reason that those of us who do possess them ought to think carefully about how to help others come up to speed. Hitting them over the head with a bewildering list of new skills foreign to them seems more likely to alienate than to inspire. Perhaps we could use our vaunted new media storytelling ability to tell stories that demystify what we’ve learned to do. What we see as stubborn foot-dragging is, in many cases, just fear. It’s a scary chasm to cross. Among our new skills, we should include the ability to build conceptual bridges that help people cross it.

One approach that’s helpful in all situations, but perhaps especially so here, is to teach by analogy. I understand why we focus on the newness of new media. There’s a lot that really is new, and it’s natural to celebrate that. But much remains the same, and it’s important to point that out too.

In my case, for example, Mr. Microphone and Mr. Camcorder were new modes of expression that I adopted only recently. As I discussed in this interview for the IDG, it was (and still is) a challenge to become proficient with these tools. But I probably haven’t said enough about how comfortably my writing and editing skills transfer into these domains. Editing audio and video feels different in some ways, but very familiar in other ways. The literary experience I brought with me into the audiovisual realm counts for a lot.

It’s the same story with interactive and data-rich information display. True, I bring some programming chops to the table that many others would lack, but the core skills are common-sense numeracy and a good sense of composition.

In my interview with Bill Buxton I asked Bill how you can help people adopt unfamiliar user interfaces. He said the same thing I’m saying here: You rely on analogy. By way of example, he proposed a force-sensitive mouse button. The harder you press, the faster it makes things scroll. Nobody’s ever used one of those, but he suggests (and I agree) that everyone could, because everyone drives a car with an accelerator pedal.

If we want newspapers to reinvent themselves, we might want to tone down the paradigm-shift rhetoric and focus more on how the important skills can and will transfer.

WS-JustRight revisited

The audio interview mentioned in my review of the new Leonard Richardson and Sam Ruby book, RESTful Web Services, is now available at ITConversations. This one presented a bit of an audio challenge because Leonard speaks loudly and Sam speaks very softly. In the course of working on this show, ITConversations’ ace audio engineer Paul Figgiani reviewed my recording setup and we discovered that what I thought was a mix-minus configuration actually wasn’t. So thanks to Paul it is now, and things should be sounding better from now on.

This week’s show is timely, given the recent remarks about REST by the Burton Group’s Anne Thomas Manes. Predictably, that led to another round of WS-* naysaying. For what it’s worth, I stand by what I’ve been saying all along, in a variety of places including this InfoWorld cover story: there’s WS-Heavy, there’s WS-Lite, and for every situation there’s a WS-JustRight that may rely on elements of one, the other, or both. The Richardson/Ruby book brings much-needed clarity to the WS-Lite end of the tolerance continuum, and that’s a great thing. But when we celebrate one end of the continuum, why must we deprecate the other? As Mike Champion recently noted:

The WS technologies are taking hold, deep down in the infrastructure, doing the mundane but mission critical work for which they were designed.

There are plenty of nails to be hammered and, thankfully, we’ve got more than one hammer.

Commercial software and social innovation

ITConversations, the channel where my weekly podcast appears, has a sister channel called Social Innovation Conversations, which is an initiative of Stanford’s Center for Social Innovation. Over the weekend, while working around the house and yard, I caught up on my SIConversations listening. One compelling episode was a conversation with Michael Pollan, author of The Omnivore’s Dilemma, and John Mackey, CEO of Whole Foods. The two met initially after Mackey responded to criticism, in Pollan’s book, that Whole Foods is fostering the industrialization of the organic movement. Here’s a review of their joint appearance at Berkeley, which helpfully includes links to the back-and-forth correspondence published prior to the event.

I should probably take it for granted, by now, that issues raised in that book (which I haven’t yet read) are being discussed by the author and one of his subjects. And that those ongoing discussions are available to everybody in written form as well as in audio and video. And that the reactions to those ongoing discussions are similarly available to everybody. But somehow I can’t yet take any of this for granted. It still amazes me every day.

In any event, one of the points Mackey makes in that conversation is that although there are certainly problems associated with large-scale commercial production of organic food, it’s not inherently a bad idea to operate such a business at scale. On the contrary, sustaining a business at scale is part of what will make it possible to sustain a healthier food network.

That’s a common refrain on SIConversations. Most speakers believe, and argue, and in many cases have proved, that the changes they seek (in the realms of food, energy, health, and the environment) not only represent sound business opportunities but, in fact, can only be accomplished with the help of sound business practices. So I was a bit surprised by the following comment from Nic Frances in another episode of SIConversations featuring three social entrepreneurs who “discuss what it takes to unleash the power of business to make the world a better place.” He said:

I looked at Microsoft and Gates, and thought, this man has changed AIDS like nobody else has on the planet. He has brought more money than has ever been brought to the issue. He’s brought the focus of somebody who knows how to grow a business. And he said ‘We’re going to change it.’ But actually if you really wanted to change AIDS or poverty in the world, what you would do is give away Microsoft free as an open platform for people to share information.

That was an oddly discordant note. Why would a social innovator who sees business opportunity everywhere else see none in software? Now of course, software is a very different kind of commodity from food, energy, or healthcare. But it’s also a fundamental enabler of the reorganization of the networks that deliver those commodities. Free software will play a key role in that reorganization, and so will commercial software. I’d love to hear an SIConversations show that explores the business opportunity for commercial software as an enabler of social innovation.