A conversation with Carl Malamud about access to public information

This week’s ITConversations show is a chat with Carl Malamud, whose exploits I’ve followed ever since he launched podcasting a decade ahead of schedule with a project called Internet Talk Radio. Since then, Carl has mainly been known for his tireless crusade to release troves of public information to the Net: SEC filings, patents, Congressional video, historical photographs, and most recently, U.S. case law.

One of the questions I wanted to explore with Carl is also raised here by John Montgomery:

Popfly, a mashup tool, depends on three things: data that is simple to access programmatically, interesting, and available under terms that enable users to work with it. As with most software endeavors, you can pick two.

The government has a huge amount of interesting data that’s available under really great terms. Weather? Check out http://www.noaa.gov. Financial information? Start with http://www.sec.gov. Crime statistics? Dig around in http://www.usdoj.gov/. But how much of this is programmatically accessible? Very little, as it turns out.

John mentions the Sunlight Foundation’s efforts to provide an intermediary layer of services that make raw data easier to access and manipulate, and I raised that point with Carl. From his perspective, of course, it all starts with the data, which he is rightly focused on providing. Even though the U.S. is far ahead of many other countries in this regard, there are oceans of important information not yet available even in raw form.

Carl has enormous faith in the Net’s ability to interconnect and enhance these raw sources, and I do too. Here’s a small but significant example. If you view source on 28 Fed.R.Serv.3d 415, you’ll see one of my favorite strategies at work: semantic metadata encoded in CSS class names. That enables an important kind of programmatic access. Now it’s true that today, Internet search engines don’t support queries that ask for documents where Shelby Reed appears as a plaintiff in an appeal to the U.S. Court of Appeals, Fifth Circuit. Someday, though, that kind of query will be supported, and the latent semantics of this rendering of U.S. case law will emerge.
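To see the pattern concretely, here’s a sketch in Python. The markup is invented for illustration (view source on the real document for its actual class vocabulary), but it shows how class names turn rendered text into queryable data:

from xml.etree import ElementTree

# A made-up fragment in the spirit of that markup; the real class
# names differ, so treat these as placeholders.
html = '''
<div class="case">
  <span class="court">United States Court of Appeals, Fifth Circuit</span>
  <span class="party plaintiff">Shelby Reed</span>
</div>
'''

doc = ElementTree.fromstring(html)
for span in doc.getiterator("span"):
    if "plaintiff" in span.get("class", "").split():
        print span.text   # -> Shelby Reed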

These enhanced services don’t necessarily just arise from the grassroots, however. Resource-rich organizations are often in the best position to provide them. One example, we agreed, is the New York Times’ stunningly effective visualization of presidential election debates. Ideally we’d be able to visualize all of the proceedings of Congress in the same way. That’s probably too much to expect of public-interest groups running shoestring operations. But what such groups can do is apply Carl’s favorite technique: Create a few high-profile examples, and then pressure the government into internalizing the process.

Perspectives: Understanding CardSpace with Vittorio Bertocci

The second installment of Perspectives is up, with Vittorio Bertocci, author of Understanding Windows CardSpace. This interview was recorded a few months ago, and has been waiting for the Perspectives site to launch. In January I excerpted the part about omnidirectional identity, a difficult phrase that I continue to struggle with. Maybe a better one is Internet persona: the social mask that you project when you self-publish online, and to which reputation attaches. Whatever we call this phenomenon, its Laws of Identity — not only for people, but also for digital objects — are not yet well defined.

Most of the interview, though, concerns the existing “unidirectional” mechanisms supported by CardSpace. I asked Vittorio to relate those mechanisms to precursors like SSL client certificates and Kerberos, and also to the complementary OpenID system. As discussed in my ITConversations podcast with Dick Hardt, the principles that govern this identity machinery are abstract and, until we experience them firsthand, will be hard for most of us to grasp. But Vittorio does a good job of explaining those principles in terms of concrete examples.

A close call: photos lost, then found

While reviewing a white paper by a colleague on the subject of personal digital archives, I realized that I hadn’t followed through on a plan to consolidate a few different caches of digital photos from various digicam and computer eras. So of course, when I went looking, things weren’t exactly the way I remembered. One particular batch was missing, and there were some anxious moments while I booted up dormant computers and mounted shelved disks. In the end I found the missing set, but although I could have sworn they were in three safe places, there was really only one.

In these moments of panic, the need for a lifebits service becomes crystal clear. But the moments pass, and we move on. Most people, most of the time, don’t yet feel the need for that kind of service.

Inevitably that will change. I wonder how, and when?

When the LazyWeb gets too lazy

I’m running a couple of services that make automatic use of Amazon wishlists, and today I noticed that the current version of the API is going away:

503 – Service Unavailable

ECS3 is currently unavailable due to a planned outage in preparation for the complete shutdown of ECS3 on March 31, 2008.

After March 31, 2008, we will no longer accept Amazon ECS 3.0 requests. Please upgrade to the Amazon Associates Web Service (previously called Amazon E-Commerce Web Service 4.0) by then to ensure that you or your customers are not affected by the upcoming deprecation.

Amazon ECS 3.0 deprecation was announced a year ago in February 2007. You can read the original post at http://developer.amazonwebservices.com/connect/ann.jspa?annID=164.

In preparation of the March 31st deprecation, the Amazon ECS 3.0 web service will experience several outages. The complete outage schedule can be viewed at http://developer.amazonwebservices.com/connect/ann.jspa?annID=276.

Please refer to the migration guide for assistance in mapping Amazon ECS 3.0 calls to their Amazon Associates Web Service 4.0 equivalents. You can find the migration guide at http://developer.amazonwebservices.com/connect/entry.jspa?categoryID=12&externalID=627. Please use the Amazon Associates Web Service forum to ask technical questions and share answers with your fellow developers.

We thank you for being part of Amazon’s Developer community and look forward to your continued support.

Like Rich Burridge, I’ll be needing a replacement for PyAmazon, the Python module Mark Pilgrim wrote long ago to simplify use of the original Amazon API.

In our modern world of aggregation, search, and syndication, it’s easy to wait and see what will happen. I went to Bloglines and searched for blog items that — like Rich’s and now mine — point to Amazon’s page about migrating to the new API. And then I subscribed to that search.

In a way, this is too easy. I can imagine a bunch of people camped on that query, watching the clock and waiting for someone else to step up to the plate before March 31. The first time around, when Amazon web services were new and shiny, it was cool to be that person. Now, not so much.

Update: A couple of folks have pointed to PyAWS. As mentioned in Rich Burridge’s blog entry, it doesn’t seem to offer, e.g., a single call to retrieve all items from a wishlist. However, when I reviewed my use of the earlier PyAmazon, in terms of raw interaction with the RESTful API and its XML output, I remembered how simple that interaction was. It’s just as simple in the new Amazon API, just slightly different. Encapsulating what I needed to do required only a few lines of code.
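For the curious, here’s a sketch of that encapsulation in Python. Hedges duly noted: the ListLookup operation and parameter names reflect my reading of the migration guide, so verify them there, and rather than commit to the exact response layout, the code just matches on local tag names.

import urllib
from xml.etree import ElementTree

def wishlist_titles(access_key, list_id):
    # Build a 4.0-style REST request; check the migration guide for
    # the parameters that apply to your case.
    params = urllib.urlencode({
        "Service": "AWSECommerceService",
        "AWSAccessKeyId": access_key,
        "Operation": "ListLookup",
        "ListType": "WishList",
        "ListId": list_id,
        "ResponseGroup": "ListItems",
    })
    url = "http://webservices.amazon.com/onca/xml?" + params
    tree = ElementTree.parse(urllib.urlopen(url))
    # The response is namespaced, so match on local tag names.
    return [el.text for el in tree.getiterator()
            if el.tag.endswith("}Title")]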

Generalizing that encapsulation is much harder. And when you have to repeat that hard work for many different languages, and for many different APIs, the inevitable result is that these per-language API wrappers tend to lag.

That’s one reason I’m looking forward to services built on Astoria (ADO.NET Data Services), or an equivalent normalization layer. I think it can substantially narrow the gap between RESTful APIs and the convenience wrappers we enjoy in various programming languages.

A conversation with Ward Cunningham about visible workings and aboutus.org

This week on ITConversations I have a two-part interview with Ward Cunningham. In part one, we explore his implementation of Brian Marick’s visible workings idea, which combines software testing with business process transparency. This is one of those transformative ideas that will not, at first, seem interesting and important to most people. And maybe it never will. But then again, Ward has a track record. The wiki idea didn’t at first seem interesting and important to most people either, and look what’s happened there. So, you never know. Maybe in 2020 we’ll notice that business software is a lot more reliable and understandable than it used to be, and we’ll look back and say: Ward did it again.

In part two, we discuss Ward’s new wiki-based venture, aboutus.org. It’s a directory that aims to become a sort of extended WHOIS database, where domain name owners — along with anyone who reads the websites attached to those domains — can collaboratively describe the people, companies, and organizations represented by those websites. I like the concept, but I wish it weren’t necessary to sign up in order to update http://aboutus.org/jonudell.net. Instead I’d prefer to describe myself on my own hosted lifebits service, wherever that might be, and then syndicate the information to aboutus.org and elsewhere.

Missing the cluetrain

I wasn’t going to post this humorous anecdote, but Mike Caulfield reminded me that it’s too funny not to share. After musing about a subscription service for running shoes, I walked into my local store, bought a new pair, and invited them to notify me in three months. Hilarity ensued.

He: We’re not really set up to do that.

Me: You could email me.

He: Yeah, but then we’d have to keep some kind of customer database on the computer.

Oh, right. Having a database of customers who’ve invited you to contact them on a regular basis … that’d suck, wouldn’t it?

Perspectives, a new interview series, launches today

Today I’m launching a new Microsoft-oriented interview series called Perspectives. The show will touch on a variety of topics including robotics, digital identity, e-science, and social software. I’ll be speaking mostly with passionate Microsoft innovators, and sometimes also with key partners from academia and industry.

The format is an audio podcast and a blog, where the blog provides a partial (but substantial) text transcription in order to make these conversations accessible to folks who don’t listen to podcasts, and also to expose them to the Net’s ecosystem of search, linking, and aggregation. Where appropriate, I’ll also use screencasts to show software in action.

Perspectives runs on the same publishing platform that supports Channel 10 (for enthusiasts), Channel 8 (for students), TechNet Edge (for IT pros), and VisitMIX (for Web designers and developers). (Channel 9, the original site, will migrate to this platform too.) Perspectives intersects with the interests of all these sites, but it doesn’t really belong in any of them, so we’ve created an independent home for it. Thanks to the EvNet team, especially Duncan Mackenzie, David Shadle, and Jeff Sandquist, for making that happen.

The first episode, with Henrik Nielsen and Tandy Trower, explores the Microsoft Robotics initiative. We discuss why robotics is — as futurist Paul Saffo believes — a Next Big Thing. And Henrik and Tandy explain how the concurrency and decentralized-services infrastructure that supports the robotics platform is broadly relevant in an era of loosely-coupled services.

Ann Arbor’s public library is a beacon of progress

On the Ann Arbor public library’s website you can find a wonderful example of how two local institutions — the library and the police department — can work together to curate an online exhibit. In 2002, history buff and police sergeant Michael Logghe self-published the lavishly illustrated True Crimes and the History of the Ann Arbor Police Department. The library worked with Logghe to produce an online version of the book. And when he visited the library to speak about the book and the online exhibit, his talk was recorded and made available for download (as video or audio-only) from the library’s podcast feed. Nicely done!

In my Remixing the library talk, I said that the two-way web paves the way for this kind of productive teamwork. It’s not a natural reflex, as Cassandra Targett points out:

It’s a shift from being passive recipients of the world’s knowledge to active participants in its creation, a shift that in many ways goes against some of the deepest core principles of what has become library science.

For a profession steeped in the idea that our role is to describe packaged knowledge and then help people find it (and play no role in how they use it once we point the way to it), the idea that we can not only modify some types of packages or even create substantially new ones is quite foreign still.

As I noted in my interview with Adrian Holovaty about EveryBlock, the curatorial collaboration among local governments, newspapers and libraries can encompass more than text, images, audio, and video. Those same institutions can work together to curate data about the operation of government (crime, taxes, maintenance), about social and civic life (event calendars), about the environment (weather, air quality), and more.

Although it’s starting to happen more in the scientific realm, I haven’t yet found a good example of that kind of data-oriented collaboration in the civic realm. But the teamwork shown by Ann Arbor’s police department and public library embodies the spirit that will make it happen.

Linking to excerpts from the MIX keynotes

John Lam asked how to excerpt fragments of Steve Ballmer’s keynote, and the principle of keystroke conservation requires me to answer here. The VisitMIX page for the keynote lists three streams. The links point to .asx files, which are wrappers around references to media files or streams. In this case, the references point to streams, which means that you can excerpt fragments by specifying the starttime and duration parameters.

Here’s the medium-bandwidth .asx file into which I’ve inserted starttime and duration parameters to create a fragment that points to a question and answer about HealthVault.

<asx version="3.0">
  <title>mix08: steve ballmer</title>
  <entry>
    <title>mix08: steve ballmer on healthvault</title>
    <starttime value="52:50.0"/>
    <duration value="1:45"/>
    <copyright>copyright 2008. all rights reserved.</copyright>
    <ref href="mms://istreampl.wmod.llnwd.net/a269/o2/microsoft/300_microsoft_mix_080306.wmv" />
  </entry>
</asx>

I’ve posted the file at http://channel9.msdn.com/media/ballmer-keynote-healthvault.asx. It should play in Windows Media Player, and also in VLC on the Mac or Linux though I can’t check those at the moment.

In general, launching appropriate media players from a web page is a complex process. I’m hoping and expecting that Silverlight, over time, will simplify it, and help make rich media more granularly linkable.

A conversation with Michael Lenczner about community wifi in Montreal

In Montreal this Friday, McGill professor Darin Barney will be giving a version of his talk on citizenship and technology. Here’s an excerpt:

Each of the telegraph, telephone, radio and television was accompanied by its own heroic rhetoric of democratic transformation and reinvigorated civic engagement. None have delivered fully on this promise, but each has been crucial for the maintenance of a system of political and economic power in which most people are systematically distanced from the practice of citizenship most of the time. For the most part, these technologies have been means of anything but citizenship: spectacular entertainment; docile recreation; habituation to the rhythms of capitalist production and consumption; cultural normalization. The internet, as a radically decentralized medium whose capacity for publication and circulation far surpasses that of its broadcast predecessors, has certainly provided the means by which politically-engaged citizens can access and produce politically-charged information that would never have seen the light of day under the regime of the television and newspaper. This information can be an important resource for political judgment. But the Internet also surpasses its predecessors as an integrated medium of enrolment in the depoliticized economy and culture of consumer capitalism. This is why we should be wary of equating more and better access to information and communication technology with enhanced citizenship.

One Montreal resident deeply influenced by Barney’s critique of the Internet as an enabler of citizenship is Michael Lenczner, whom I interviewed for this week’s ITConversations show. Mike is a co-founder of Île Sans Fil, Montreal’s community wireless network. With over 150 access points and nearly 60,000 users, the project is a huge success, all the more so given that municipal wi-fi projects in other cities have failed to materialize. And yet, Mike questions the value of what’s been accomplished. The project’s goal was not merely to light up hotspots in downtown Montreal, but to enhance the “sociality” of the city and elicit more and better civic engagement. He doubts these goals have been achieved, and asks himself hard questions about how technology can be deployed to these ends. When I met Mike recently in Montreal, I said: “It amazes me that you’re asking yourself these questions.” He replied: “It amazes me that others don’t.”

Automation and accessibility in Silverlight and IE8

In this interview at MIX, Mark Rideout explains how Silverlight will use the same UIA (User Interface Automation) mechanisms that make Windows apps (and will make Linux applications) accessible by way of assistive technologies like screenreaders.

If you’re not somebody who needs that kind of assistance, you may not think this matters to you. But as I’ve pointed out in a series of essays, the flip side of accessibility is automation, and that’s something we all need.

For software developers, the automation framework provides the hooks needed to test the interactive behavior of applications.

For users, it provides the hooks needed to record, exchange, and replay software interactions. In The social scripting continuum I showed how IBM’s CoScripter enables people to share their knowledge of how to use web applications. It’s fabulous, but it’s restricted to the domain of simple web apps running in Firefox. IE, Ajax, Flash, Silverlight, and desktop apps are all out of scope.

With an automation/accessibility framework common to browsers, rich runtimes in browsers, and desktop apps, you could in theory enable a common way for people to describe and share their knowledge of how to use software, for any browser, any rich runtime, and any operating system.

We’re not there yet, and we may never get there, but this Silverlight announcement points toward a future that’s worth imagining.

Update: In related news, John Resig notes that IE8 supports the W3C’s ARIA (Accessible Rich Internet Applications), which makes Ajax applications accessible to screenreaders. Here’s a brief guide for the perplexed, myself included, because this stuff is a layer cake.

Native accessibility toolkits, like MSAA (Microsoft Active Accessibility) and ATK (Linux Accessibility Toolkit), are the foundation. The Mozilla implementation of ARIA rests on this layer, as does the IE8 implementation. User Interface Automation (UIA), meanwhile, is part of the .NET Framework. It can be used to automate unmanaged apps like Word, as well as managed apps on the desktop or (now) in Silverlight. How UIA will be realized on Linux is something I don’t know, but would like to find out.

I can’t formulate a unified field theory that joins all these pieces, on various platforms, but I hope one will emerge.

Permalinking the Hard Rock Memorabilia exhibit

The Hard Rock Memorabilia exhibit is a great example of what becomes possible now that Seadragon Deep Zoom is integrated into Silverlight 2. The exhibit includes:

Madonna’s page in her high school yearbook:

Pat Boone’s shoes:

John Lennon’s handwritten lyrics to Imagine:

And there’s much more. When you choose subsets — by artist, decade, type (e.g. clothing, instruments), genre, location — the images retile, and they’re all navigable using Deep Zoom’s extreme zoom and pan capability.

Note that the links above lead directly into the exhibit and focus on the indicated asset. You acquire these from the Share link in the right pane, which exposes URLs of the form:

http://memorabilia.hardrock.com/Default.aspx?AssetId=8191

It’s great to see this permalink feature included. Deep Zoom is going to open up vast spaces for exploration, and in order to explore those spaces together we’ll need shared coordinate systems.

To that end, I’m hoping that future incarnations of this sort of exhibit will expose richer URL namespaces. If I want to show you Madonna’s yearbook in the context of the 1970s, I have to tell you to click Decade, then 1970, then choose the 2nd item in the 3rd row. It’d be great to be able to get you there directly:

memorabilia.hardrock.com/decade/1970/20352

And of course I’d want to locate Madonna for you, among her other classmates, by zooming to the desired view and then tacking those coordinates onto the URL.
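Hypothetically, a link that both selects the asset and restores the viewport might look something like this, where x, y, and zoom are parameters I’ve invented for illustration:

memorabilia.hardrock.com/decade/1970/20352?x=0.42&y=0.18&zoom=12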

If these precise locators are made available, conversations about the views they identify can form on the web. To see why it’s crucial to expose a public namespace, consider the David Rumsey map collection. There you can explore and precisely annotate an extraordinary collection of historical maps. And you can search for those annotations within the Java-based viewer. But when you annotate a feature within a map, it doesn’t — so far as I can tell — produce a shareable URL. If those URLs were available, the collection would be woven into public discourse to a far greater degree than it is.

A couple of years ago, I asked whether rich Internet apps can be web-friendly. One of the responses came from Kevin Lynch at Adobe, who made this example showing how navigation within a Flash exhibit of images can be reflected on the URL-line.

I don’t think it matters much whether you expose the RIA’s state on the URL-line or by means of a permalink. What matters is that you do it, and do it in as granular a way as makes sense for the application.

PS: For extra credit, it’s nice to provide the underlying data for this sort of exhibit. When you’re exploring the Cubism timeline, for example, you can grab the data and mix it as you please.

WebSlices can help popularize feed syndication

With the release of the first public beta of Internet Explorer 8, two new features come to light: Activities and WebSlices. You can see a demo of both in Joshua Allen’s interview with Jane Kim. I think of Activities as next-generation bookmarklets, and also as kissing cousins to the OpenSearch providers that you can add to the browser’s search box.

WebSlices are something else again. They transform pieces of web pages into little feeds that you can subscribe to. For all its power and utility, feed syndication hasn’t yet really sunk into the consciousness of most people. I’m hoping that WebSlices, which are dead simple to create, will help bridge the gap.

Here’s a complete working example of a page with two slices:

<div class="hslice" id="1">  
<p class="entry-title">Slice 1</p>  
<p class="entry-content">This is slice 1.</p>
</div> 

<div class="hslice" id="2">  
<p class="entry-title">Slice 2</p>  
<p class="entry-content">This is slice 2.</p>
</div>

The syntax is based on the hAtom microformat, which in turn is a subset of the Atom feed format. For my purposes here, ’nuff said about that. I’m much more interested in what users will see, do, and understand. Let’s view that page in IE8:

The orange feed icon in the toolbar changes to a (presumably not final) purplish thingy. And when I hover over the second slice, another of those pops up. Both are lit, indicating there’s fresh content.

From either the toolbar or the inline hover, I can subscribe (to just the second slice) like so:

It shows up as a favorite, bolded to indicate fresh content:

From another page, I can peek at the slice’s content by clicking its button:

But when you click Favorites->Feeds, you’ll see it’s also a conventional feed:

I like this for a couple of reasons. First, because it will give microformats a big boost, and propel the data web forward. Second, because it will introduce many more people to the whole idea of subscribing to feeds. There’s a big conceptual barrier there that we haven’t yet brought most people across. I’m hoping that a new way of subscribing to a new kind of feed will also raise awareness about the old ways of subscribing to conventional feeds.

Ward Cunningham’s implementation of Brian Marick’s “Visible Workings”

In Portland last week I visited with Ward Cunningham, whose pragmatic and humane approach to the art of software informs everything he touches: the wiki; object-oriented, agile, and test-driven programming; the Framework for Integrated Test (FIT). (InfoWorld stuff about Ward here, here, and here.) Ward’s living the startup life these days, at aboutus.org, which describes itself as a “socially editable directory of the internet.” Think WHOIS morphed into a Wikipedia where you are not only permitted, but actively encouraged, to write the biography of your company or community.

But that’s not what we mostly talked about. Instead Ward took me behind the scenes at the portal for the Eclipse Foundation. Only members can participate in the workflows accessible through this portal: electing new committers, scheduling project reviews. But it turns out that anybody can explore the portal use cases.

Here’s a simple one: Change Personal Address. This is the part of the system that runs when a member changes facts about his or her address. You can see a test script that exercises this part of the system. You can even run the test script and inspect the results. Try that, and you’ll see that the output interleaves lines of script with renderings of what the user sees: screenshots, emails.

Finally you can swim the test. Here the steps and results are laid out in a table. Time advances as you move down the rows, and there’s a column for every actor in the workflow.

When you hover over an action step or a notification, the corresponding screenshot or email message pops up. This is a great way to visualize a complex email-mediated workflow that can involve many actors, and unfold over many days. But here’s the kicker: the visualization is also available to users, directly from the interface. Here’s the screen that you see when you’re changing your address:

Next to the Save button there’s an Explore link. If you click it, you’ll discover the same swim visualization that anyone, anytime, can explore here. Note the variations, most interestingly the one for the case where the person is a committer, and where the address change either does or does not coincide with a change of employer. If you did change employer, you’re going to get this email informing you that additional paperwork is required:

This isn’t just an innovative approach to software testing and workflow visualization. It’s also a radical statement about business process transparency. For most of us, most of the time, business systems are black boxes whose internal workings we can only discern in the outcomes of our (often painful) interactions with them. But what if you could find out, before pressing the Save button, what’s going on in that black box? And what if your way of finding out wasn’t by reading bogus documentation, but instead by probing the system itself using its own test framework?

It’s a huge idea. In a blog about this project, Ward writes:

The MyFoundation portal, once again, respects the curiosity and intellect of its users by exposing all aspects of the processes it supports. Who asked for this? No one. No one thought to. That doesn’t mean it isn’t needed.

Brian Marick calls this Visible Workings. He identifies a middle ground, between the traditional GUI presentation and the raw source code that produces it. This middle ground makes the application both explanatory and tinkerable. The portal’s swim diagram is our middle ground. We know it makes our work explanatory and look forward to investigating the tinkerable aspects too.

And elsewhere:

Online forms have too much in common with income tax forms. Nobody likes filling out either one. Each is a sea of fields, each field another question, one question after another. It is like being interrogated. Can we make filling out a form more like a conversation than an interrogation? The portal’s explore links suggest a way toward this goal. These links let you ask a question every now and then. You get to ask, “why do you ask?” Wouldn’t it be great if you could always do that?

Amen, brother.

A conversation with Adrian Holovaty about EveryBlock.com

For this week’s ITConversations show, Adrian Holovaty joins me to chat about EveryBlock, a new website that gathers and publishes “address-specific” information such as crime reports, building-code violations, and restaurant inspections.

Acquiring this information isn’t frictionless and raises questions about how this kind of data can be published usefully, as opposed to merely published. EveryBlock also raises broader questions about news gathering and reporting. The project, which is funded by a Knight Foundation grant, has attracted some criticism for not being journalistic in spirit. But Adrian Holovaty suggests that EveryBlock actually redefines news.

The previous criterion for something being covered in the newspaper was that it has to affect a lot of people in the readership. But if the pothole is fixed on your block, it’s news to you, just like what your friends are doing on Facebook is news to you. Instead of a friend feed, we’re making an address feed.

More broadly, as information that used to yield only to investigative shoe-leather starts to flow freely on the Net, journalists will be able to divert energy from data collection to analysis.

I get a little frustrated when the high-falutin’ journalists look at EveryBlock and say ‘How is this journalism? Why do you think this is replacing newspapers?’ Well, this isn’t intended to replace journalism at all, if anything it’ll help you find trends going on in the world.

There’s also an open question as to which social institutions can best organize and curate these sources of information. Governments? Newspapers? Libraries? Self-organizing groups of citizens? I’m really curious to see how it plays out.

Where can I subscribe to a running-shoe-replacement service?

A few years back I realized that my knees and ankles were hurting because I’d put too many miles on my running shoes. No permanent injuries resulted, but a friend who outran his shoes wasn’t so lucky, and he’s got back problems for life.

This is a business opportunity. If you’re a runner, spending $100 every six (or even three) months is infinitely preferable to injury. You’d think that shoe sellers would make it easy to do that, but they don’t. I’d happily authorize regular replacements, but nobody’s ever offered me that option.

Partly I guess this is a failure of service-oriented thinking. My local seller thinks service means taking good care of me when I walk in, and he does. But I think service should also mean helping me manage a lifelong shoe-replacement regimen, and that notion seems not to have sunk in.

Of course planned obsolescence also gets in the way. Once I find a shoe I like, I try to stick with it, but the manufacturers won’t let me. The model I know works well usually isn’t available next time around, so I have to try something different. That’d complicate any kind of subscription service.

I can sort of understand the difference between, say, prescription drugs, which are commodities that I can replace on a subscription basis, and running shoes, which are both fashion items and (supposedly) evolving technologies. But for me, and maybe for a lot of people, what I really want is to regard the running shoe as a commodity I can replace on a subscription basis.

I wonder what else belongs in this category: Products that sellers don’t want to commodify, but that if managed this way would produce recurring revenue and create the opportunity for lifelong service relationships.

A conversation with Valdis Krebs about social network analysis

For this week’s ITConversations show, introduced by special guest introducer Lynne Windley, I got together with Valdis Krebs, who’s been mapping and analyzing social networks since Mark Zuckerberg was in diapers.

I can’t remember how I first got to know Valdis, but this snippet from a 2004 interview — for an InfoWorld cover story on enterprise social software — gives you a sense of what he does and how he thinks:

IW: Social network analysis can reveal that highly connected people are more valuable than the org chart or salary plan suggests. Is this becoming a factor?

VK: Yes. I did a project with an investment bank, and they took into account who was most valuable in getting a deal done, and factored that into the bonus. I’ve had execs inside and outside IBM saying, “If this data is true, then I’m not paying the people who bubbled up to the top what they’re worth.”

IW: Does it cut the other way, too?

VK: We wouldn’t take a job that we knew would lead to a resource action.

IW: Resource action?

VK: Layoff.

Now that everybody in Silicon Valley has become an armchair social network analyst with an opinion about the nature and uses of the “social graph,” I thought it’d be useful to check in with Valdis for a long-range perspective on current trends. Bottom line: He thinks social networks that you have to explicitly join are artificial and ungraphable. But we agreed that these first-generation online social networks are fostering a culture of self-disclosure, and that they may lead to a second generation of more naturalistic systems: bottom-up, ad-hoc, peer-to-peer.

Code4Lib 2008

I’ve interviewed a couple of people who attended and/or spoke at last year’s Code4Lib conference: Art Rhyno and Beth Jefferson. Code4Lib brings together IT-oriented librarians, and library-oriented IT folk, to create what seems like a truly unique event. I’m really looking forward to attending Code4Lib 2008 next week in Portland, where, as an adopted member of this strange tribe, I’ll be giving a talk on Thursday morning.

HealthVault protocols will be released under the Open Specification Promise

Back in October I interviewed Sean Nolan, chief architect for HealthVault. Now he’s launched a blog, and in his latest post he writes:

  • Microsoft will make the complete HealthVault XML interface protocol specification public.
  • With this information, developers will be able to reimplement the HealthVault service and run their own versions of the system.
  • Microsoft will irrevocably promise that we will not make patent claims against you for implementing the specification, subject to the terms of the OSP.

Excellent! My take on HealthVault is that it’s doing the right things in the right ways. This announcement confirms that.

Want to bootstrap the data web? Make batch data entry easier for civilians.

People are trying, once again, to kickstart the music scene here in my town. The other day I received two emails, each containing a schedule for a newly-activated local venue. In the past, I’ve advised folks to add this information to Eventful, which in turn feeds my local aggregator. That hasn’t happened much, and when I sat down and did some of the data entry myself, I could see why. It’s such a drag!

There are really two very different scenarios for managing event data online: one personal, the other public. On the personal front, using services like Evite or Windows Live Events, you’re doing a single event: a meeting, a birthday party. It’s OK to fill in a form field by field.

But for public events, venue operators will typically want to do batch entry. And when you’ve got a schedule of dozens of events, it’s painful to decompose everything into fields and pump them into forms.

Here’s a piece of one of the schedules that was emailed to me:

March 15, 2008 (Saturday) Chris Fitz Band
March 20, 2008 (Thursday) Blues Jam w/ Otis Doncaster
March 22, 2008 (Saturday) Groove Theory

It was quick and easy for the author of that email to write out the schedule in that way. But it was really slow and difficult for me to input the same information to Eventful. Even though venue operators are highly motivated to do it, I can see why they often don’t.

So here’s how I sped things up. I started with a template for a URL that invokes the Eventful API:

http://api.evdb.com/rest/events/new?app_key=XX&venue_id=XX&title=XX&start_time=2008-XX-XX+20:30

Then I made a bunch of copies, and tweaked them like so:

…title=Chris+Fitz&start_time=2008-03-15+20:30
…title=Otis+Doncaster&start_time=2008-03-20+20:30
…title=Groove+Theory&start_time=2008-03-22+20:30

Because all the events start at 8:30, I only need to adjust the title, month, and day for each record. It’s not only way quicker and easier to enter data this way, it’s also quicker and easier to check and correct. When I was done I put the email into one window, the new file into another, and compared. Corrections here are way easier than corrections that require you to navigate to an online database record and edit it in a form.

Finally I inserted the curl command in front of each record, yielding a script that invokes the set of URLs:

curl "http://api.evdb.com/ … title=Chris+Fitz&start_time=2008-03-15+20:30"
curl "http://api.evdb.com/ … title=Otis+Doncaster&start_time=2008-03-20+20:30"
curl "http://api.evdb.com/ … title=Groove+Theory&start_time=2008-03-22+20:30"

I saved this script as eventful.cmd, ran it on the Windows command line, and produced this result.

Now clearly this method is too geeky for a typical venue operator. But an online service like Eventful could smooth out the rough edges. I can easily imagine an unstructured input form that includes a template like the one I’ve shown here, invites people to copy and tweak it, and runs a batch insertion. It would need to let people preview the results before committing them, but that’s doable.

It seems to me that a lot of information systems expect civilians to do per-item data entry, but not batch entry. For that, they provide APIs for geeks to use. But as we see here, these two styles of data entry aren’t necessarily very far apart. And by applying a bit of Wiki-like inferencing to a more English-like script, they could be brought even closer.
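To illustrate how close, here’s a sketch in Python of that kind of inferencing. It assumes the line format from the email above, and the app_key and venue_id placeholders stand in for real credentials, as in the template:

import re
import urllib

API = "http://api.evdb.com/rest/events/new"
BASE = {"app_key": "XX", "venue_id": "XX"}   # placeholders, as above

MONTHS = {"January": 1, "February": 2, "March": 3, "April": 4,
          "May": 5, "June": 6, "July": 7, "August": 8,
          "September": 9, "October": 10, "November": 11, "December": 12}

# Matches lines like: March 15, 2008 (Saturday) Chris Fitz Band
LINE = re.compile(r"(\w+) (\d+), (\d+) \(\w+\) (.+)")

def urls_for(schedule, start="20:30"):
    # Turn an emailed schedule into one API-invoking URL per event.
    for line in schedule.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue   # a real service would flag this line for preview
        month, day, year, title = m.groups()
        date = "%s-%02d-%02d" % (year, MONTHS[month], int(day))
        args = dict(BASE, title=title, start_time="%s %s" % (date, start))
        yield API + "?" + urllib.urlencode(args)

schedule = """March 15, 2008 (Saturday) Chris Fitz Band
March 20, 2008 (Thursday) Blues Jam w/ Otis Doncaster
March 22, 2008 (Saturday) Groove Theory"""

for url in urls_for(schedule):
    print url

A service could hide exactly this loop behind an unstructured input form, show the parsed events for preview, and only then commit the batch.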

The friction of data entry remains the single largest obstacle to bootstrapping the data web. Efforts to overcome that friction, and reduce the distance between what civilians can do with forms and what geeks can do with scripts, could make a huge difference.

Overcoming data friction

This headline from Adrian Holovaty’s blog speaks volumes about the state of online data in 2008: EveryBlock hiring a Python screen-scraping expert. The recently-launched EveryBlock, a generalization of ChicagoCrime.org, extends that model to other cities and to a broader range of data types. I interviewed Adrian this week for an upcoming ITConversations show, and he confirmed that while some structured data sources are available from the first three EveryBlock cities — Chicago, San Francisco, and New York — the bulk of the data comes from scraping web pages.

One day soon, the person who lands that job will find himself or herself having this conversation at a cocktail party:

Friend: So, what do you do in this new job?

Screen Scraper: I write software to extract data from websites.

F: Where does the data come from?

S: It’s in a database. The website’s software reads the database and turns it into web pages.

F: So somebody got paid to write software to turn the database into web pages, and now you’re getting paid to write software that turns those web pages back into a database?

S: Yeah, basically.

F: So if they just gave you the database you’d be out of a job?

S: No. I’d have a much more interesting job. I’d be able to spend more time finding useful patterns in the data, and writing software to enable other people to find useful patterns in the data.

The irony is that I’d be great at that job. For me, web screen-scraping provides the kind of challenge that other people get from, say, solving crossword puzzles. But it’s not the highest and best use of anyone’s time.

Data friction can be intentional or not. When it’s intentional, you might have to file a FOIA request to get the data. But in a lot of cases, the friction is unintentional. The data is public, and intended to be widely seen and used, but isn’t readily reusable.

Consider the following two restaurant inspection records for Bully’s Deli in New York:

1. in the NYC Department of Health website

2. in EveryBlock

It’s the same data, from the same source, but EveryBlock makes better use of it. In the NYC website, you can search by ZIP code and number of violations. In EveryBlock you can search more powerfully, and you can ask and answer questions that matter to you. Maybe you care about shellfish. Have any Manhattan restaurants been cited recently for use of unapproved shellfish? Yes: five since January 21.

What EveryBlock is doing is completely aligned with the interests of the NYC Department of Health. Tax dollars are paying for those restaurant inspections. The information is published in order to make New York a safer and healthier place. It’s great to have this data available in any form, and it’s great to see EveryBlock adding value to it.

Now it’s time to grease the wheels.

Here’s one way that can happen. An enlightened city government can decide to publish this kind of data in a reusable way. I’ve written extensively about Washington DC’s groundbreaking DCStat program, which does exactly that. I can’t wait to see what happens when EveryBlock goes to Washington.

But city governments shouldn’t have to go out of their way to provide web-facing data services and feeds. Databases should natively support them. That’s the idea behind Astoria (ADO.NET Data Services), which is discussed in this interview with Pablo Castro. If the NYC Department of Health had that kind of access layer sitting on top of its database, it wouldn’t put EveryBlock’s screen-scraper out of a job; it would just make that job a whole lot more interesting and effective.
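To make that concrete, here’s the flavor of an Astoria-style query. The service and field names below are invented; what’s real is the uniform URL grammar, with query options like $filter and $orderby, that such a layer exposes on top of a database:

http://data.health.example.gov/Inspections.svc/Violations?$filter=Description eq 'unapproved shellfish'&$orderby=InspectionDate desc

One URL convention, learned once, instead of a per-agency scraping project.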

A conversation with Joel Selanikio about cellphones and SMS in developing countries

For this week’s ITConversations show I interviewed Joel Selanikio — a pediatrician, former CDC epidemiologist, and co-founder of DataDyne, a non-profit consultancy dedicated to improving the quantity and quality of public health data. DataDyne’s EpiSurveyor is:

…a free, open source tool enabling anyone to very easily create a handheld data entry form, collect data on a mobile device, and then transfer the data back to a desktop or laptop for analysis.

I’ve actually interviewed Joel once before, but an audio glitch torpedoed the podcast. I did, however, rescue chunks of that interview which I published as a transcript on my blog.

The launching point for this interview was an article Joel published, at the BBC News site, entitled The invisible computer revolution. Joel wrote:

The question we should be asking ourselves, then, is not “how can we buy, and support, and supply electricity for, a laptop for every schoolteacher” (much less every schoolchild), but rather “what mobile software can we write that would really add value for a schoolteacher (or student, or health worker, or businessperson) and that could run on the computer they already have in their pocket?”

Joel’s point, which was also a central theme of my conversation with Ken Banks, is that SMS is the only pervasive data network in places like sub-Saharan Africa. It can, and should, be pressed into service in ways that don’t occur to those of us swimming in an ocean of high-speed Internet connectivity.

You wouldn’t think that 140-character messages would be a useful way to deliver, say, medical information — at least, I wouldn’t. But then, even for those of us with bandwidth to burn, Twitter is demonstrating all kinds of unexpected uses for SMS.

A publishing system optimized to deliver documents to SMS readers wouldn’t be of interest to those of us who can easily browse the web. But it would be a big deal to billions who can’t.
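Just to make that less abstract, here’s a toy sketch of the core mechanic: chunking a document into SMS-sized, sequence-numbered segments. The 160-character limit is the standard GSM payload; everything else is invented:

def sms_pages(text, limit=160):
    # Reserve 8 characters for a "(3/12) "-style header so readers
    # can reassemble the document in order.
    pages, page = [], ""
    for word in text.split():
        if len(page) + len(word) + 8 > limit:
            pages.append(page.strip())
            page = ""
        page += word + " "
    pages.append(page.strip())
    return ["(%d/%d) %s" % (i + 1, len(pages), p)
            for i, p in enumerate(pages)]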

Popfly and Pipes

On Sunday, in a New York Times story about Popfly, John Markoff wrote:

Because the company chose to design Popfly using a Microsoft Web graphics and animation technology called Silverlight, it will be treated with suspicion by an Internet universe that is increasingly committed to open standards.

Disclaimer: I work for Microsoft, and John Montgomery, who leads the Popfly project, has been a friend since our days together at BYTE. That said, I think this overstates Popfly’s relationship to Silverlight. Although the Popfly designer runs in Silverlight, mashups created in Popfly don’t require it. Most are just made of plain old HTML and JavaScript.

Elsewhere in the article, this quote from Tim O’Reilly appears:

Popfly shows me that Microsoft still thinks this is all about software, rather than about accumulating data via network effects, which to me is the core of Web 2.0. They are using Popfly to push Silverlight, rather than really trying to get into the mashup game.

Silverlight, as I’ve said, isn’t Popfly’s focus. I do agree that Popfly doesn’t operate in the cloud in the same way as, say, Yahoo! Pipes. While the article doesn’t mention Pipes, I often hear Popfly and Pipes mentioned in the same context. Both are mashup creators, but they are architecturally very different — in complementary ways. Because Pipes is a great example of data-oriented network effects, and because I’ve sometimes confused myself about the differences between Popfly and Pipes, it’s helpful to spell them out.

Mashup engine

Popfly’s mashup engine is a hybrid. There’s a service running in the cloud, but your browser can do a lot of work too.

Pipes’ mashup engine lives entirely in the cloud.

Sharing and discovery

Both systems provide a cloud-based environment for sharing and discovering mashups, and the components of mashups.

Inputs

Both systems can mash up data flowing from a variety of services on the web, including those that produce RSS feeds and other kinds of XML outputs.

Outputs

Popfly ends at the glass. The output is an HTML/JavaScript page or widget that renders in your browser. Although the components used to produce that output live in the cloud, the final result ends in your browser and isn’t available for downstream processing in the cloud.

Pipes can keep going. The output is a data feed that may or may not drive a browser-based display. But even when it does, the output is still available for downstream processing in the cloud — for example, as RSS.

Programming

In Popfly, you do basic stuff with a special-purpose visual programming language that runs in the cloud. You do advanced stuff with JavaScript running in the browser.

In Pipes, you only use a special-purpose visual programming language that runs in the cloud.

It gets confusing, even to me, because you can sometimes use both systems to achieve the same result. Consider this FluxnetTowers mashup in Popfly, which maps the locations of a worldwide network of CO2 flux sensors. I just now made a simplified version in Pipes. I’m sure it’s possible to reproduce the annotations shown in the Popfly version. But from my perspective it’d be harder, because Pipes lacks the general-purpose programming available in Popfly thanks to JavaScript.

Suppose you wanted to include that same tower data in a widget on your WordPress blog, though. Here, Pipes would be the choice. WordPress lives in the cloud, and so does Pipes, so you can make Pipes produce a feed that WordPress consumes. But you couldn’t use Popfly in this scenario because a cloud-based service like WordPress can’t access the output of Popfly’s browser-based mashup engine.

Pipes likes to aggregate, transform, and filter data feeds within the cloud, and can produce a few kinds of renderings in your browser. Popfly likes to aggregate, transform, and filter data feeds from the cloud, and can produce arbitrary renderings in your browser. They’re complementary because Popfly can consume and render data feeds coming from Pipes.

We are all watchers now

Reacting to a Washington Post story on crime in Second Life, Gardner Campbell is troubled by calls for increased surveillance in virtual worlds. But while the notion of being watched by the authorities is as creepy in cyberspace as it is in the real world, we pay less attention to another kind of surveillance. Whether I am piloting my avatar through Second Life, or walking around in my hometown, I am myself a watcher who can, increasingly, record what I see. Whether the authorities surveil or not, we’re doing it to one another.

The funniest screencast I ever made was this snarky 3-minute video report on an IBM press conference I attended in Second Life. It’s a side-splitter, really. You should watch it. And yet it makes me slightly uncomfortable. Anyone in Second Life can, at any time, switch on a virtual movie camera and record everything that’s happening. And there’s no indication of that; nobody sees a camera.

As a teenager, I loved taking candid photos with my dad’s 35mm Exacta. At one point he told me you can get a side-looking lens so people won’t know they’re being photographed. At that point I started to think about the aboriginal notion that a photograph can steal a bit of your soul. I’ve been conflicted about candid photography ever since.

Last week I was in the Alewife station on Boston’s Red Line, and saw something I’ve always been curious about. The escalator was completely disassembled for repair. Here’s what the steps look like:

And here’s a worker replacing the rollers on the giant bicycle chain that drives the thing:

As I was taking this shot, one of the workers joked about how I might be a spy for the MBTA, checking up on their work. He was mostly, but I think not entirely, kidding. It was a slightly uncomfortable moment.

Collectively, all of us now wield immense powers of surveillance. Whether the subjects of that surveillance are avatars or real people is beside the point. It isn’t necessarily the authorities who are doing the surveillance. We are doing it to one another. It happens every time somebody is tagged in a photo on Facebook or Flickr. It gets easier all the time.

Is this a good thing or bad thing? A bit of both, I think, hence my inner conflict, and my eternal fascination with David Brin’s The Transparent Society. Who will watch the watchers? The question becomes very different when we are all watchers, recorders, and publishers.

Mythbusting the ‘Google generation’ report

Hugh McGuire recently pointed to a New Scientist blog entry that begins:

A bunch of sources are reporting on a University College London study into how people born after the arrival of the internet – sometimes dubbed the Google generation – handle information. The top line is, they’re not very good at it.

The link points to a press release, entitled Pioneering research shows ‘Google Generation’ is a myth, which summarizes a 35-page report in PDF format. That report in turn summarizes a whole series of “work packages” (more PDF files) identified as the full project documentation.

Let’s trace one of the assertions made in the report, as retransmitted by Information Week:

Also, it’s not true that young people pick up computer skills by trial-and-error. “The popular view that Google Generation teenagers are twiddling away on a new device while their parents are still reading the manual is a complete reversal of reality,” researchers said.

Fascinating. I’d like to know more. How did the researchers arrive at this conclusion? Here’s the piece of the report summary that Information Week sourced:

They pick up computer skills by trial-and-error

Our verdict: This is a complete myth. The popular view that Google generation teenagers are twiddling away on a new device while their parents are still reading the manual is a complete reversal of reality, as Ofcom survey(22) findings confirm.

Ofcom? There’s no link, but footnote 22 says Ibid, referring to footnote 21, which says: Communications Market Report: Converging Communications Markets. Ofcom, August 2007. No link.

Maybe the “work packages” say more about this? In package 2 I found the same claim restated.

The source? Ofcom (2006). No link. Unclear what the superscript 6 means, as the references in this report are not numbered, but they do mention:

Ofcom (2007) Communications Market Report: Converging Communications Markets. Research Document. London, UK: Office for Communications

Ofcom (2006). The Consumer Experience. London, UK: Office for Communications

So I searched for Ofcom (2006), The Consumer Experience, and found, you guessed it, another PDF, the relevant part of which appears to be section 2.4.2: Profile of those who experience difficulties when using technology. But nothing I can find there, or elsewhere in this report, says anything about who is or isn’t likely to learn about technology by reading the manual. And nothing in Ofcom (2007) either.

At this point I have to stop and remind myself what I was looking for in the first place: Evidence that it is a myth that kids learn by doing, and adults by reading the manual. All I have found is a flock of PDF files that mention one another obliquely, and in ways I cannot even correlate. No links. No data.

Now, the message of this highly-touted “Google generation” report, as refracted through the lens of Information Week, is:

Information literacy has not improved with the widening access to technology. Instead, the speed of Web searching means little time is spent evaluating information for relevance, accuracy, or authority.

And that may well be true. But do you see the irony here? The study making this claim was constructed and published in a way that resists all efforts to evaluate its relevance, accuracy, or authority. Which hardly matters, since none of the reporting about the study seems to have made any such effort.

Pioneering research shows ‘Google Generation’ is a myth? So far as I can see, that report says more about the researchers who wrote it, and about the reporters who reacted to it, than it says about any real or imaginary trends.

The anxiety (and celebration) of influence

Larry Lessig’s video in support of Barack Obama is making the rounds in the blogosphere. Scanning the transcript, I found a comment entitled Andrew Sullivan, which reads:

Consider this hypothetical. It’s November 2008. A young Pakistani Muslim is watching television and sees that this man—Barack Hussein Obama—is the new face of America. In one simple image, America’s soft power has been ratcheted up not a notch, but a logarithm. A brown-skinned man whose father was an African, who grew up in Indonesia and Hawaii, who attended a majority-Muslim school as a boy, is now the alleged enemy. If you wanted the crudest but most effective weapon against the demonization of America that fuels Islamist ideology, Obama’s face gets close. It proves them wrong about what America is in ways no words can.

I’ve read that paragraph before. But not in the Lessig transcript. It comes from this Andrew Sullivan article in The Atlantic.

Why append it to the Lessig transcript? I think the anonymous commenter — who, however, chooses to identify himself or herself with the law firm Latham and Watkins — is drawing attention to the similarity between that paragraph and this one which does appear in the Lessig transcript:

So I want you to shut your eyes and imagine what it will seem like to a young man in Iraq or in Iran, who wakes up on January 21st, 2009, and sees the picture of this man as the president of the United States. A man who opposed the war at the beginning, a man who worked his way up from almost nothing, a man who came from a mother and a father of mixed cultures and mixed societies, who came from a broken home to overcome all of that to become the leader in his class, at the Harvard Law Review, and an extraordinary success as a politician. How can they see us when they see us as having chosen this man as our president?

Was Lessig’s paragraph influenced by Sullivan’s, which it’s reasonable to suppose he has read? My guess is that it was. If so, was the influence conscious or unconscious? My guess: unconscious.

This reminded me of Malcolm Gladwell’s 2004 New Yorker article on plagiarism, Something Borrowed, in which he recounts how one of his own New Yorker articles was pretty blatantly plagiarized by Bryony Lavery, the author of a play called Frozen. The incident prompts him to reflect on the nature of influence, and he muses:

When I read the original reviews of “Frozen,” I noticed that time and again critics would use, without attribution, some version of the sentence “The difference between a crime of evil and a crime of illness is the difference between a sin and a symptom.” That’s my phrase, of course. I wrote it. Lavery borrowed it from me, and now the critics were borrowing it from her. The plagiarist was being plagiarized. In this case, there is no “art” defense: nothing new was being done with that line. And this was not “news.” Yet do I really own “sins and symptoms”? There is a quote by Gandhi, it turns out, using the same two words, and I’m sure that if I were to plow through the body of English literature I would find the path littered with crimes of evil and crimes of illness.

Now here’s where it gets really twisty. In Something Borrowed, Gladwell refers to Lessig:

Creative property, Lessig reminds us, has many lives — the newspaper arrives at our door, it becomes part of the archive of human knowledge, then it wraps fish. And, by the time ideas pass into their third and fourth lives, we lose track of where they came from, and we lose control of where they are going.

See also several of Gladwell’s blog entries about a more recent case in which:

Harvard sophomore Kaavya Viswanathan plagiarizes a series of passages from Megan McCafferty’s teen novels “Sloppy Firsts” and “Second Helpings” for her debut novel: “How Opal Mehta Got Kissed, Got Wild, and Got a Life.”

On his blog, Gladwell initially makes the same sort of defense for Viswanathan that he made for Lavery in the New Yorker piece. Then his readers call him out, and he winds up agreeing with them that it was a different case.

But I digress. The real point here is that nowadays, even as ideas pass into their third and fourth lives, we don’t necessarily lose track of where they came from. A couple of years ago, Tim O’Reilly wrote a blog post entitled Act your way into a new way of thinking, which he said was “a fabulous quote from Richard Pascale’s book Delivering Results.” Tim added this postscript:

P.S. Very cool to be able to find the original source for the first quote via Google book search. As it came to me, it was simply labeled “Richard Pascale, Stanford Business School.”

At the time, I commented:

“Very cool to be able to find the original source”

And, to track the meme! From this it does appear Pascale is the original source:

http://books.google.com/books?q=%22act+our+way+into+a+new+way+of+thinking%22

Fascinating to see who cites him and who doesn’t.

Weirdly, I only just now noticed that both Tim and I wrongly attributed Delivering Results to Richard Pascale. In fact, the author is David Ulrich, not Richard Pascale.

But that doesn’t affect my point. Whether or not Lessig’s paragraph was influenced by Sullivan’s, the ways in which we influence one another are becoming more transparent, more traceable.
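Here’s a toy illustration of what I mean, though not anything Tim or I actually ran. The public Google Books volumes API (the unauthenticated v1 endpoint; I’m assuming here that no API key is needed, which is true within quota limits) will tell you which books print a given phrase, and roughly when:

import json
import urllib.parse
import urllib.request

# Trace a quoted phrase through the books that print it, via the public
# Google Books volumes API (v1, unauthenticated, subject to quota limits).
phrase = '"act our way into a new way of thinking"'
url = "https://www.googleapis.com/books/v1/volumes?q=" + urllib.parse.quote(phrase)

with urllib.request.urlopen(url) as response:
    results = json.load(response)

# Print date, title, and authors for each hit: a crude map of the meme's travels.
for item in results.get("items", []):
    info = item["volumeInfo"]
    authors = ", ".join(info.get("authors", ["unknown"]))
    print(info.get("publishedDate", "????"), info.get("title", "untitled"), "by", authors)

Run against the Pascale quote, a sketch like this yields a rough timeline of the phrase’s travels through print: who cites the source, and who doesn’t.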

To complete this twisty excursion, I was looking up The Anxiety of Influence, by Harold Bloom, and found this eponymous blog posting from Lorcan Dempsey, in which he was surprised to find Bloom so prominent in the original WorldCat Identities tag cloud, and in which he cites a Tim O’Reilly post expressing similar surprise:

Who knew that as far as libraries are concerned, Harold Bloom is right up there with Brahms and Chopin. That’s one influential literary critic!

OK, I’ve reached my connection limit for now. But the fact that all these connections are traceable is a wonderful thing.

A conversation with Phil Windley about online reputation

For this week’s ITConversations podcast I asked Phil Windley to review the work he’s done — with several groups of his students — to develop a software framework for managing online reputation. Phil explains:

Reputation is a very personal thing. The way you think about a person we both know in common, and the way I think about that person, is different. We talk about Joe having a reputation, but in fact, Joe doesn’t have a reputation, every single person has a different feeling and way of thinking about Joe. Reputation is your story about me. I don’t control my reputation, I only control some factors that you might or might not use to calculate it. I don’t control all of them, and you may take factors into account that I have no control over.

If we’re going to bring that social system, developed over thousands of years, to the Net, we need to mimic that opportunity as closely as possible. So the idea of our rules language was to allow you to create your own algorithms about how you determine the reputation of something or someone, and to allow me to create a different one.

Of course, if my calculations about Joe and your calculations about Joe refer to the same public, or omnidirectional, digital identity, then they can be merged. And by referring to my digital identity and yours, somebody else will be able to aggregate our calculations about Joe, and propagate them transitively.
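To make that concrete, here’s a toy sketch of my own devising (it isn’t Phil’s actual rules language, and the identifier and factors are invented) in which your rule and mine compute different reputations from the same shared factors, and a third party merges them:

from statistics import mean

# Toy sketch, not Phil Windley's actual framework: publicly declared factors
# about one identity, plus a personal scoring rule per observer.
factors = {
    "joe@example.org": {"transactions": 42, "disputes": 1, "years_active": 6},
}

def your_rule(f):
    # Suppose you discount heavily for any history of disputes.
    return f["transactions"] / (1 + 10 * f["disputes"])

def my_rule(f):
    # Suppose I mostly reward longevity.
    return 2 * f["years_active"] + f["transactions"] / 10

# Because both rules key off the same public identity, a third party can
# aggregate our separate judgments and propagate the merged score onward.
identity = "joe@example.org"
scores = {rule.__name__: rule(factors[identity]) for rule in (your_rule, my_rule)}
print(identity, scores, "merged:", round(mean(scores.values()), 1))

The point isn’t the arithmetic. It’s that reputation stays in the eye of the beholder, right up until the shared identifier makes our separate calculations mergeable.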

That scenario entails both risks and benefits. At the moment, it’s easier for most people to imagine the risks. Phil says:

Offline we all give up information about ourselves all the time, trading privacy for convenience, and we have a pretty good feel for how that information is compartmentalized — not always, and there are obvious problems — but if I tell somebody in one business my name, that won’t mean the business down the street finds out about my transactions. Online, all of those intuitions have been switched around, and we’ve come to believe that giving up as little information as possible is the right thing.

The phrase “giving up information” has a negative connotation. We haven’t yet established norms for “declaring information” in a positive sense, and we have no intuitions about the benefits that doing so might yield. But we may find that by declaring information about ourselves, we can help make the stories that are being told about us — whether we participate in them or not — truer and more useful.

Why oil heat? Because it’s local!

Well, that was a nice change of pace. Back in the land of the “wintry mix” — rain/sleet/snow — the first thing that caught my eye was a full-page ad in the local paper promoting the benefits of oil heat. Sponsored by the Oil Heat Council of New Hampshire, and featuring local icon Fritz Wetherbee wearing his trademark bowtie, the ad is a mosaic of smiling faces with captions like:

“New technology reduced my oil consumption by 25%.”

“Oil heat is safe.”

“It’s local. My oil heat dealer is also my neighbor.”

“Budgeting programs make oil heat affordable.”

“Oil heat made it easier to sell our home.”

I guess the emerging alternatives are being taken seriously. You’ve gotta love the rhetoric. Oil is local? Should’ve put that on the top ten list.

Undisclosed location

For the next 8 or so days I will be at an undisclosed location. The following items will be absent from the scene:

  • ice
  • snow
  • the internet

The following items will be present:

  • sun
  • sea
  • books
  • music
  • rum

A conversation with Stefano Mazzocchi about Cocoon and SIMILE

I’ve written a lot about MIT’s Project SIMILE since I visited the team back in December. In this week’s Interviews with Innovators I talk with Stefano Mazzocchi, the creator of Apache Cocoon, about his work on the SIMILE project. Early in the interview I asked whether he thought he was better known for Cocoon than for SIMILE, and he said:

Different crowds know me for different activities. And rarely do these people talk together. Well, it’s happening more now, but when I started I was one of the few people who could talk to the open source, industrial, XML-ish crowd, and to the academic, RDF, AI-ish crowd. I was kind of in the middle, and both sides didn’t really understand what I was doing with the other crowd.

I can relate to that! I seem to spend a lot of time between different worlds, trying to connect them.

It was a pleasure to finally meet Stefano in person last month, after years of correspondence and cross-blog chatter, and then to record this interview about an approach to the semantic web that feels to me like light at the end of the tunnel.