A conversation with Joel Selanikio about collecting public health data in developing countries

For this week’s ITConversations show I interviewed Dr. Joel Selanikio, co-founder of DataDyne, a non-profit consultancy dedicated to improving the quantity and quality of public health data. DataDyne’s principal tool is EpiSurveyor, a free and open source software product that simplifies the creation of forms for doing field data collection with handheld devices. There’s a Windows-based forms designer, and a runtime for Palm OS-based PDAs which is being ported to Windows Mobile- and Java-based devices.

It was a great interview but, when I opened up the audio file I’d recorded, I was horrified to find that an audio glitch had rendered it unusable. So instead I’ll report here on what Joel told me, and weave in some quotes I was able to salvage.

Our conversation was well-timed because I’d just watched the dustup between Michael Moore and Wolf Blitzer, checked out Moore’s rebuttal of Sanjay Gupta’s report on SiCKo, and tracked down some of the cited sources — including the United Nations Human Development Report. How reliable are these sources, particularly for developing countries? As you might imagine, and as Joel’s experiences confirm, there’s a lot of guessing going on:

It’s amazing how unaware people are of the tenuous nature of our knowledge of, really, anything. One of the things I ask people is: “What’s the population of the United States?” And they’ll say 290 million, or 300 million. But the real answer is: We don’t know. And we check, every 10 years, and we do a pretty good job of checking.

So if you want to know what’s the leading cause of disease in children in rural Africa, what’s the chance that you’ll have any idea what the answer to that question is?

I was a first responder to the tsunami in Southeast Asia. Imagine showing up in a place where the slate has been wiped clean, God just slammed his hand down and flattened everything. The roads are gone, and three or four thousand aid workers are all clustered in the few places they can get to. But of course, you have to come up with an estimate of how many people are dead. So somebody picks a number, and then you hear it on CNN that night. Fifty thousand, a hundred thousand, a hundred and twenty-five thousand, none of those estimates were based on any attempt to really find out.

A friend who works for American Red Cross asked me what I thought they could do. I said the most valuable thing to do with the hundreds of millions of dollars of donations they’d received was to invest in data collection. Normally in a situation like that you do a sample. You go to ten percent of households, and try to extrapolate, and hope that your sample isn’t biased. But I said, with five hundred million dollars, there’s no reason we can’t get the local people to do a census. Go to every refugee area and every household and actually find out — not estimates — but actual numbers, which would be of huge importance for reconstruction.

So of course it didn’t happen.
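To make the sample-versus-census contrast concrete, here is a minimal sketch of the extrapolation Joel describes. It assumes a simple random sample of households. The function name and the numbers in the usage line are invented for illustration and are not tsunami data.

```python
import math
import statistics


def estimate_total(sample_counts, population_households):
    """Scale a per-household sample mean up to a population total.

    Returns (estimated_total, standard_error). Assumes a simple random
    sample drawn without replacement; it says nothing about bias.
    """
    n = len(sample_counts)
    if n < 2:
        raise ValueError("need at least two sampled households")
    if n > population_households:
        raise ValueError("sample cannot exceed the population")

    mean = statistics.mean(sample_counts)
    variance = statistics.variance(sample_counts)  # sample variance (n - 1)

    # Finite population correction: shrinks to 0 as the sample becomes a census.
    fpc = 1 - n / population_households
    standard_error = population_households * math.sqrt(fpc * variance / n)
    return population_households * mean, standard_error


if __name__ == "__main__":
    # Hypothetical: people per household in 4 sampled households out of 40.
    total, se = estimate_total([4, 6, 5, 5], 40)
    print(f"estimated total: {total}, standard error: {se:.1f}")
```

The standard error only captures sampling noise. A biased sample, the thing Joel says you "hope" you don't have, stays invisible to this arithmetic. Visiting every household drives the correction term to zero and removes the guesswork entirely.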

As a one-time database programmer who went to medical school and then became an epidemiologist, Joel is acutely aware of the relationship between information technology and epidemiology, a discipline that is, as he reflects here, profoundly data-driven:

In about 1995, over the course of six weeks, kids start showing up at the university hospital in Port-au-Prince, Haiti. They had different symptoms, but they all died. At about the hundred mark the local docs contacted the World Health Organization who contacted CDC, and a colleague of mine and I went down to Haiti. It’s a high-pressure situation, kids are dying every day. When we got there we began creating a database of responses to questions: what their symptoms had been, what medications they had taken. Within a few days it became apparent that all of the kids who’d died had taken one of several locally-produced Tylenol-like medications. Once we discovered that, they made an announcement and the outbreak ended.

People would ask me, “What magic did you work?” Well, in clinical medicine, the way that we understand things is — if it’s a rash, I look at the rash, I think about it, I look stuff up, but I don’t systematically create a database. For one patient you can juggle the variables in your head. But when you have a population of affected people, you need to collect data and analyze it. That’s the basis of epidemiology.
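The kind of tabulation Joel is describing can be sketched in a few lines. This is only an illustration under assumptions: the records, field names, and product names below are invented, and it is not a reconstruction of the database the CDC team actually built.

```python
from collections import Counter

# Invented records for illustration only; not data from the Haiti investigation.
CASES = [
    {"symptoms": {"fever", "vomiting"}, "medications": {"syrup_a", "antibiotic_x"}},
    {"symptoms": {"lethargy"}, "medications": {"syrup_a"}},
    {"symptoms": {"fever"}, "medications": {"syrup_a", "vitamin_y"}},
]


def share_reporting(records, field):
    """Return {value: fraction of records whose `field` set contains value}."""
    counts = Counter()
    for record in records:
        counts.update(record[field])  # sets, so each value counts once per record
    total = len(records)
    return {value: n / total for value, n in counts.items()}


def common_to_all(records, field):
    """Return the values of `field` that appear in every record."""
    shares = share_reporting(records, field)
    return {value for value, share in shares.items() if share == 1.0}


if __name__ == "__main__":
    for value, share in sorted(share_reporting(CASES, "medications").items()):
        print(f"{value}: {share:.0%}")
    print("common to all cases:", common_to_all(CASES, "medications"))
```

With one patient you can hold this in your head. With a hundred, a table of shares per symptom and per medication is what makes a common exposure stand out while the symptoms stay scattered.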

Unfortunately our standards and methods for data collection are far lower in the realm of public health than in the realm of business:

Imagine if you were the CEO of Toyota, and your CFO said, well, sales are pretty good, we think, but we’re not sure. He’d flip. And yet that’s how things are with public health. We’ve been making a concerted effort for fifty years to get rid of malaria, but the quality of our statistics is terrible, and we’ve just gotten used to that.

A key reason for that poor quality is that the collection of public health data in developing countries is still mostly a paper-based activity. And while handheld devices are obviously a great alternative, what Joel found is that the software available for creating surveys was way too hard for ordinary folks to use:

If you’re the ministry of health in Kenya and you want to survey a hundred thousand households, and use handhelds to do it, you’ll need some knowledge of programming. If I told you to write down your questions in Microsoft Word you could easily do it, it’s frictionless, but with the commercially available software for creating surveys that run on PDAs, you can’t do it. That software can do all kinds of fancy things, of course, but most of the time the information you need to collect is very simple stuff.

So that’s what EpiSurveyor does, Joel says. It makes the simple stuff simple, so that ordinary folks in developing countries can create surveys without having to hire programmers and consultants.

But how can you have any assurance that the data gathered in these kinds of surveys will be usefully comparable? Are there standard forms and standard schemas? Not really, Joel says. The existing forms are hard to find and reuse, and there’s been little progress toward standardization:

If you went to a UN organization and said, we want to standardize how we collect data about child nutrition, the response would be, let’s have a conference. We’ll have experts get together in Rome, and then in Paris, and decide what are the key questions for any standard child nutrition survey. But it’s hard to achieve unanimity, and there’s a built-in incentive not to because every time you get together it’s a trip to Rome.

Coming at the problem from a grassroots web 2.0 perspective, Joel’s working to translate the various forms used by international agencies into EpiSurveyor’s XML format, and to make them available in a shareable repository. The notion is that reuse will occur naturally when it lies along the path of least resistance. And he sees that starting to happen. For example, having trained field workers in both Kenya and Zambia, he discovered, after the fact, that the Zambian workers had reused a Kenyan survey they’d found on DataDyne’s private project management site.

My Zambian contact said, Joel, I hope this is OK, but I downloaded their form, and opened it up, and made a few changes — basically just the names of provinces — and then I used their form.

Of course it was more than OK; he was delighted. Asking the same questions, in the same ways, is exactly what you want to happen, and yet it rarely does.
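Here is a rough sketch of what that kind of reuse might look like when a form is just an XML document. The schema below is hypothetical, made up for this example, and is not EpiSurveyor's actual format. The function name `adapt_form` and the question ids are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup; NOT the real EpiSurveyor schema.
KENYA_FORM = """<form name="household_survey">
  <question id="province" type="choice">
    <label>Province</label>
    <option>Nairobi</option>
    <option>Coast</option>
  </question>
  <question id="children_under_5" type="number">
    <label>Children under five in household</label>
  </question>
</form>"""


def adapt_form(xml_text, question_id, new_options):
    """Return a copy of the form with one question's options replaced."""
    root = ET.fromstring(xml_text)
    question = root.find(f"./question[@id='{question_id}']")
    if question is None:
        raise ValueError(f"no question with id {question_id!r}")

    # Drop the old choices, then append the new ones in order.
    for option in question.findall("option"):
        question.remove(option)
    for text in new_options:
        ET.SubElement(question, "option").text = text

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    zambia_form = adapt_form(KENYA_FORM, "province", ["Lusaka", "Copperbelt"])
    print(zambia_form)
```

Everything except the province list comes through untouched, so the two countries end up asking the same questions in the same ways, which is the whole point.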

The forms repository that Joel envisions doesn’t yet exist, but he’s hoping that as DataDyne builds up a reputation around successful deployments of EpiSurveyor, the company will be able to attract the resources and the attention needed to make that happen.


6 thoughts on “A conversation with Joel Selanikio about collecting public health data in developing countries”

  1. Incredible! Too bad the audio was lost, but I’m happy you transcribed it. This stuff makes my IA heart beat faster :)
