This week’s ITConversations show is a chat with Carl Malamud, whose exploits I’ve followed ever since he launched podcasting a decade ahead of schedule with a project called Internet Talk Radio. Since then, Carl has been known mainly for his tireless crusade to release troves of public information to the Net: SEC filings, patents, Congressional video, historical photographs, and most recently, U.S. case law.
One of the questions I wanted to explore with Carl is also raised here by John Montgomery:
Popfly, a mashup tool, depends on three things: data that is simple to access programmatically, interesting, and available under terms that enable users to work with it. As with most software endeavors, you can pick two.
The government has a huge amount of interesting data that’s available under really great terms. Weather? Check out http://www.noaa.gov. Financial information? Start with http://www.sec.gov. Crime statistics? Dig around in http://www.usdoj.gov/. But how much of this is programmatically accessible? Very little, as it turns out.
John mentions the Sunlight Foundation’s efforts to provide an intermediary layer of services that make raw data easier to access and manipulate, and I raised that point with Carl. From his perspective, of course, it all starts with the data, which he is rightly focused on providing. Even though the U.S. is far ahead of many other countries in this regard, there are oceans of important information not yet available even in raw form.
Carl has enormous faith in the Net’s ability to interconnect and enhance these raw sources, and I do too. Here’s a small but significant example. If you view source on 28 Fed.R.Serv.3d 415, you’ll see one of my favorite strategies at work: semantic metadata encoded using CSS style tags. That enables an important kind of programmatic access. Now it’s true that today, Internet search engines don’t support queries that ask for documents where Shelby Reed appears as a plaintiff in an appeal to the U.S. Court of Appeals, Fifth Circuit. Someday, though, that kind of query will be supported, and the latent semantics of this rendering of U.S. case law will emerge.
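To make that kind of programmatic access concrete, here is a minimal sketch in Python, using only the standard library. It assumes the metadata is carried in HTML class attributes. The class names (plaintiff, court) and the sample markup are hypothetical stand-ins I made up for illustration, not the markup actually used on the case law pages.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class attribute is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []                 # per open element: matched class or None
        self.found = defaultdict(list)  # class name -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.wanted), None)
        self.stack.append(match)
        if match:
            self.found[match].append("")  # start a new value for this element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost open element with a wanted class.
        for match in reversed(self.stack):
            if match:
                self.found[match][-1] += data
                break


def extract(html, wanted):
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {name: [v.strip() for v in values]
            for name, values in parser.found.items()}


# Hypothetical markup: the class names here are assumptions, not the real page.
SAMPLE = """
<div class="case">
  <p><span class="plaintiff">Shelby Reed</span>, Plaintiff-Appellant</p>
  <p class="court">U.S. Court of Appeals, Fifth Circuit</p>
</div>
"""

result = extract(SAMPLE, ["plaintiff", "court"])
print(result)
```

A crawler that ran something like this over every case could build the index that would answer the plaintiff-and-court query, without waiting for the search engines to do it.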
These enhanced services don’t necessarily just arise from the grassroots, however. Resource-rich organizations are often in the best position to provide them. One example, we agreed, is the New York Times’ stunningly effective visualization of presidential election debates. Ideally we’d be able to visualize all of the proceedings of Congress in the same way. That’s probably too much to expect of public-interest groups running shoestring operations. But what such groups can do is apply Carl’s favorite technique: Create a few high-profile examples, and then pressure the government into internalizing the process.
For programmatically accessible government (and other) statistics, please take a look at my site, numbrary.com. There are other sites working on this problem as well (e.g. infochimps.org). Swivel and Many Eyes are doing a little of this, but since both depend entirely on user contributions they don’t provide comprehensive source coverage.
-Jason