2008


People are trying, once again, to kickstart the music scene here in my town. The other day I received two emails, each containing a schedule for a newly-activated local venue. In the past, I’ve advised folks to add this information to Eventful, which in turn feeds my local aggregator. That hasn’t happened much, and when I sat down and did some of the data entry myself, I could see why. It’s such a drag!

There are really two very different scenarios for managing event data online: one personal, the other public. On the personal front, using services like Evite or Windows Live Events, you’re doing a single event: a meeting, a birthday party. It’s OK to fill in a form field by field.

But for public events, venue operators will typically want to do batch entry. And when you’ve got a schedule of dozens of events, it’s painful to decompose everything into fields and pump them into forms.

Here’s a piece of one of the schedules that was emailed to me:

March 15, 2008 (Saturday) Chris Fitz Band
March 20, 2008 (Thursday) Blues Jam w/ Otis Doncaster
March 22, 2008 (Saturday) Groove Theory

It was quick and easy for the author of that email to write out the schedule in that way. But it was really slow and difficult for me to input the same information to Eventful. Even though venue operators are highly motivated to do it, I can see why they often don’t.

So here’s how I speeded things up. I started with a template for a URL that invokes the Eventful API:

http://api.evdb.com/rest/events/new?app_key=XX&venue_id=XX&title=XX&start_time=2008-XX-XX+20:30

Then I made a bunch of copies, and tweaked them like so:

…title=Chris+Fitz&start_time=2008-03-15+20:30
…title=Otis+Doncaster&start_time=2008-03-20+20:30
…title=Groove+Theory&start_time=2008-03-22+20:30

Because all the events start at 8:30, I only need to adjust the title, month, and day for each record. It’s not only way quicker and easier to enter data this way, it’s also quicker and easier to check and correct. When I was done I put the email into one window, the new file into another, and compared. Corrections here are way easier than corrections that require you to navigate to an online database record and edit it in a form.

Finally I inserted the curl command in front of each record, yielding a script that invokes the set of URLs:

curl http://api.evdb.com/ … title=Chris+Fitz&start_time=2008-03-15+20:30
curl http://api.evdb.com/ … title=Otis+Doncaster&start_time=2008-03-20+20:30
curl http://api.evdb.com/ … title=Groove+Theory&start_time=2008-03-22+20:30

I saved this script as eventful.cmd, ran it on the Windows command line, and produced this result.
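
The copy-and-tweak step could also be generated rather than typed. Here is a minimal Python sketch, not the method described above: event_url and the events list are hypothetical names, the XX placeholders stand in for a real app_key and venue_id, and each URL is wrapped in double quotes on the assumption that the command shell would otherwise treat & as a special character.

```python
# Hypothetical helper that fills in the URL template shown earlier.
TEMPLATE = (
    "http://api.evdb.com/rest/events/new"
    "?app_key={app_key}&venue_id={venue_id}"
    "&title={title}&start_time=2008-{month:02d}-{day:02d}+20:30"
)


def event_url(title, month, day, app_key="XX", venue_id="XX"):
    """Return the event-creation URL for one show (8:30 PM start assumed)."""
    return TEMPLATE.format(
        app_key=app_key,
        venue_id=venue_id,
        title=title.replace(" ", "+"),  # spaces become '+', as in the records above
        month=month,
        day=day,
    )


# The three records from the schedule: (title, month, day).
events = [("Chris Fitz", 3, 15), ("Otis Doncaster", 3, 20), ("Groove Theory", 3, 22)]

# One curl command per record; the quotes keep '&' away from the shell.
script_lines = ['curl "' + event_url(*event) + '"' for event in events]

if __name__ == "__main__":
    print("\n".join(script_lines))
```

Only the title, month, and day vary per record, which is what makes the list easy to check against the original email.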

Now clearly this method is too geeky for a typical venue operator. But an online service like Eventful could smooth out the rough edges. I can easily imagine an unstructured input form that includes a template like the one I’ve shown here, invites people to copy and tweak it, and runs a batch insertion. It would need to let people preview the results before committing them, but that’s doable.

It seems to me that a lot of information systems expect civilians to do per-item data entry, but not batch entry. For that, they provide APIs for geeks to use. But as we see here, these two styles of data entry aren’t necessarily very far apart. And by applying a bit of Wiki-like inferencing to a more English-like script, they could be brought even closer.
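
To make the idea of a more English-like script concrete, here is a minimal Python sketch that reads lines in the format of the emailed schedule and produces records a person could preview before committing. The function name parse_schedule, the record fields, and the weekday check are all assumptions for illustration, not anything Eventful provides.

```python
import re
from datetime import datetime

# Matches lines like: "March 15, 2008 (Saturday) Chris Fitz Band"
LINE = re.compile(r"^(\w+ \d{1,2}, \d{4}) \((\w+)\) (.+)$")

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def parse_schedule(text):
    """Turn schedule lines into records for preview; flag lines that don't parse."""
    records = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = LINE.match(raw)
        if match is None:
            records.append({"line": raw, "error": "unrecognized format"})
            continue
        date = datetime.strptime(match.group(1), "%B %d, %Y").date()
        records.append({
            "date": date.isoformat(),
            "title": match.group(3),
            # True when the weekday written in the line agrees with the date.
            "weekday_ok": DAYS[date.weekday()] == match.group(2),
        })
    return records


if __name__ == "__main__":
    sample = (
        "March 15, 2008 (Saturday) Chris Fitz Band\n"
        "March 20, 2008 (Thursday) Blues Jam w/ Otis Doncaster\n"
        "March 22, 2008 (Saturday) Groove Theory\n"
    )
    for record in parse_schedule(sample):
        print(record)
```

Lines that fail to parse are kept and flagged rather than dropped, so the preview shows the author exactly what needs fixing.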

The friction of data entry remains the single largest obstacle to bootstrapping the data web. Efforts to overcome that friction, and reduce the distance between what civilians can do with forms and what geeks can do with scripts, could make a huge difference.

This headline from Adrian Holovaty’s blog speaks volumes about the state of online data in 2008: EveryBlock hiring a Python screen-scraping expert. The recently-launched EveryBlock, a generalization of ChicagoCrime.org, extends that model to other cities and to a broader range of data types. I interviewed Adrian this week for an upcoming ITConversations show, and he confirmed that while some structured data sources are available from the first three EveryBlock cities — Chicago, San Francisco, and New York — the bulk of the data comes from scraping web pages.

One day soon, the person who lands that job will find himself or herself having this conversation at a cocktail party:

Friend: So, what do you do in this new job?

Screen Scraper: I write software to extract data from websites.

F: Where does the data come from?

S: It’s in a database. The website’s software reads the database and turns it into web pages.

F: So somebody got paid to write software to turn the database into web pages, and now you’re getting paid to write software that turns those web pages back into a database?

S: Yeah, basically.

F: So if they just gave you the database you’d be out of a job?

S: No. I’d have a much more interesting job. I’d be able to spend more time finding useful patterns in the data, and writing software to enable other people to find useful patterns in the data.

The irony is that I’d be great at that job. For me, web screen-scraping provides the kind of challenge that other people get from, say, solving crossword puzzles. But it’s not the highest and best use of anyone’s time.
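
The round trip in that conversation can be shown in a few lines. This is a minimal sketch using only Python's standard library: render stands in for the website's software, TableScraper stands in for the screen scraper, and the two rows are made-up placeholders rather than real data. Real pages are far messier than this.

```python
import html
from html.parser import HTMLParser


def render(rows):
    """The website's job: turn database rows into an HTML table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


class TableScraper(HTMLParser):
    """The scraper's job: turn the HTML table back into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        # Collect text only while inside a table cell.
        if self._in_cell:
            self._row[-1] += data


if __name__ == "__main__":
    database = [("Restaurant A", "2"), ("Restaurant B & Grill", "0")]  # placeholder rows
    scraper = TableScraper()
    scraper.feed(render(database))
    print(scraper.rows == database)
```

All of the scraper's effort goes into recovering what the database already had.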

Data friction can be intentional or not. When it’s intentional, you might have to file a FOIA request to get it. But in a lot of cases, it’s unintentional. The data is public, and intended to be widely seen and used, but isn’t readily reusable.

Consider the following two restaurant inspection records for Bully’s Deli in New York:

1. in the NYC Department of Health website

2. in EveryBlock

It’s the same data, from the same source, but EveryBlock makes better use of it. In the NYC website, you can search by ZIP code and number of violations. In EveryBlock you can search more powerfully, and you can ask and answer questions that matter to you. Maybe you care about shellfish. Have any Manhattan restaurants been cited recently for use of unapproved shellfish? Yes: five since January 21.

What EveryBlock is doing is completely aligned with the interests of the NYC Department of Health. Tax dollars are paying for those restaurant inspections. The information is published in order to make New York a safer and healthier place. It’s great to have this data available in any form, and it’s great to see EveryBlock adding value to it.

Now it’s time to grease the wheels.

Here’s one way that can happen. An enlightened city government can decide to publish this kind of data in a reusable way. I’ve written extensively about Washington DC’s groundbreaking DCStat program which does exactly that. I can’t wait to see what happens when EveryBlock goes to Washington.

But city governments shouldn’t have to go out of their way to provide web-facing data services and feeds. Databases should natively support them. That’s the idea behind Astoria (ADO.NET Services), which is discussed in this interview with Pablo Castro. If the NYC Department of Health had that kind of access layer sitting on top of its database, it wouldn’t put EveryBlock’s screen-scraper out of a job, it would just make that job a whole lot more interesting and effective.

For this week’s ITConversations show I interviewed Joel Selanikio — a pediatrician, former CDC epidemiologist, and co-founder of DataDyne, a non-profit consultancy dedicated to improving the quantity and quality of public health data. DataDyne’s EpiSurveyor is:

…a free, open source tool enabling anyone to very easily create a handheld data entry form, collect data on a mobile device, and then transfer the data back to a desktop or laptop for analysis.

I’ve actually interviewed Joel once before, but an audio glitch torpedoed the podcast. I did, however, rescue chunks of that interview which I published as a transcript on my blog.

The launching point for this interview was an article Joel published, at the BBC News site, entitled The invisible computer revolution. Joel wrote:

The question we should be asking ourselves, then, is not “how can we buy, and support, and supply electricity for, a laptop for every schoolteacher” (much less every schoolchild), but rather “what mobile software can we write that would really add value for a schoolteacher (or student, or health worker, or businessperson) and that could run on the computer they already have in their pocket?”

Joel’s point, which was also a central theme of my conversation with Ken Banks, is that SMS is the only pervasive data network in places like sub-Saharan Africa. It can, and should, be pressed into service in ways that don’t occur to those of us swimming in an ocean of high-speed Internet connectivity.

You wouldn’t think that 140-character messages would be a useful way to deliver, say, medical information — at least, I wouldn’t. But then, even for those of us with bandwidth to burn, Twitter is demonstrating all kinds of unexpected uses for SMS.

A publishing system optimized to deliver documents to SMS readers wouldn’t be of interest to those of us who can easily browse the web. But it would be a big deal to billions who can’t.
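
As a rough illustration of what such a system would have to do at minimum, here is a Python sketch that splits a document into numbered messages. The function name to_sms_parts and the "(n/m)" prefix are assumptions, and the 140-character limit is simply the figure used above.

```python
import textwrap


def to_sms_parts(text, limit=140):
    """Split text into numbered messages, each at most `limit` characters."""
    # Reserve 8 characters for a "(n/m) " prefix, enough for up to 99 parts.
    prefix_room = 8
    chunks = textwrap.wrap(text, width=limit - prefix_room)
    total = len(chunks)
    if total > 99:
        raise ValueError("document too long for a two-digit part counter")
    return [f"({i}/{total}) {chunk}" for i, chunk in enumerate(chunks, start=1)]


if __name__ == "__main__":
    for part in to_sms_parts("word " * 100):
        print(len(part), part[:30])
```

The numbering lets a reader reassemble the document even if the messages arrive out of order.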

On Sunday, in a New York Times story about Popfly, John Markoff wrote:

Because the company chose to design Popfly using a Microsoft Web graphics and animation technology called Silverlight, it will be treated with suspicion by an Internet universe that is increasingly committed to open standards.

Disclaimer: I work for Microsoft, and John Montgomery, who leads the Popfly project, has been a friend since our days together at BYTE. That said, I think this overstates Popfly’s relationship to Silverlight. Although the Popfly designer runs in Silverlight, mashups created in Popfly don’t require it. Most are just made of plain old HTML and JavaScript.

Elsewhere in the article, this quote from Tim O’Reilly appears:

Popfly shows me that Microsoft still thinks this is all about software, rather than about accumulating data via network effects, which to me is the core of Web 2.0. They are using Popfly to push Silverlight, rather than really trying to get into the mashup game.

Silverlight, as I’ve said, isn’t Popfly’s focus. I do agree that Popfly doesn’t operate in the cloud in the same way as, say, Yahoo! Pipes. While the article doesn’t mention Pipes, I often hear Popfly and Pipes mentioned in the same context. Both are mashup creators, but they are architecturally very different — in complementary ways. Because Pipes is a great example of data-oriented network effects, and because I’ve sometimes confused myself about the differences between Popfly and Pipes, it’s helpful to spell them out.

Mashup engine

Popfly’s mashup engine is a hybrid. There’s a service running in the cloud, but your browser can do a lot of work too.

Pipes’ mashup engine lives entirely in the cloud.

Sharing and discovery

Both systems provide a cloud-based environment for sharing and discovering mashups, and the components of mashups.

Inputs

Both systems can mash up data flowing from a variety of services on the web, including those that produce RSS feeds and other kinds of XML outputs.

Outputs

Popfly ends at the glass. The output is an HTML/JavaScript page or widget that renders in your browser. Although the components used to produce that output live in the cloud, the final result ends in your browser and isn’t available for downstream processing in the cloud.

Pipes can keep going. The output is a data feed that may or may not drive a browser-based display. But even when it does, the output is still available for downstream processing in the cloud — for example, as RSS.

Programming

In Popfly, you do basic stuff with a special-purpose visual programming language that runs in the cloud. You do advanced stuff with JavaScript running in the browser.

In Pipes, you only use a special-purpose visual programming language that runs in the cloud.

It gets confusing, even to me, because you can sometimes use both systems to achieve the same result. Consider this FluxnetTowers mashup in Popfly, which maps the locations of a worldwide network of CO2 flux sensors. I just now made a simplified version in Pipes. I’m sure it’s possible to reproduce the annotations shown in the Popfly version. But from my perspective it’d be harder, because Pipes lacks the general-purpose programming available in Popfly thanks to JavaScript.

Suppose you wanted to include that same tower data in a widget on your WordPress blog, though. Here, Pipes would be the choice. WordPress lives in the cloud, and so does Pipes, so you can make Pipes produce a feed that WordPress consumes. But you couldn’t use Popfly in this scenario because a cloud-based service like WordPress can’t access the output of Popfly’s browser-based mashup engine.

Pipes likes to aggregate, transform, and filter data feeds within the cloud, and can produce a few kinds of renderings in your browser. Popfly likes to aggregate, transform, and filter data feeds from the cloud, and can produce arbitrary renderings in your browser. They’re complementary because Popfly can consume and render data feeds coming from Pipes.
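
The architectural difference can be sketched in a few lines of Python. This is only an analogy, not how either service is implemented: pipeline and render are hypothetical names, and the feed items are placeholders. A stage that returns data can feed another stage, while a stage that returns markup is the end of the line.

```python
import html


def pipeline(*feeds, keyword):
    """Aggregate feeds, filter by keyword, tidy titles; the result is still data."""
    merged = [item for feed in feeds for item in feed]                  # aggregate
    matching = [item for item in merged
                if keyword.lower() in item["title"].lower()]            # filter
    return [{**item, "title": item["title"].strip()} for item in matching]  # transform


def render(items):
    """Turn items into an HTML list; nothing downstream can reuse this as a feed."""
    return "<ul>" + "".join(f"<li>{html.escape(i['title'])}</li>" for i in items) + "</ul>"


if __name__ == "__main__":
    feed_a = [{"title": " Tower one "}]   # placeholder items
    feed_b = [{"title": "Other"}]
    towers = pipeline(feed_a, feed_b, keyword="tower")
    again = pipeline(towers, keyword="one")   # data output can be piped onward
    print(render(again))
```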

Reacting to a Washington Post story on crime in Second Life, Gardner Campbell is troubled by calls for increased surveillance in virtual worlds. But while the notion of being watched by the authorities is as creepy in cyberspace as it is in the real world, we pay less attention to another kind of surveillance. Whether I am piloting my avatar through Second Life, or walking around in my hometown, I am myself a watcher who can, increasingly, record what I see. Whether the authorities surveil or not, we’re doing it to one another.

The funniest screencast I ever made was this snarky 3-minute video report on an IBM press conference I attended in Second Life. It’s a side-splitter, really, you should watch it, and yet it makes me slightly uncomfortable. Anyone in Second Life can, at any time, switch on a virtual movie camera and record everything that’s happening. And there’s no indication of that, nobody sees a camera.

As a teenager, I loved taking candid photos with my dad’s 35mm Exacta. At one point he told me you can get a side-looking lens so people won’t know they’re being photographed. At that point I started to think about the aboriginal notion that a photograph can steal a bit of your soul. I’ve been conflicted about candid photography ever since.

Last week I was in the Alewife station on Boston’s Red Line, and saw something I’ve always been curious about. The escalator was completely disassembled for repair. Here’s what the steps look like:

And here’s a worker replacing the rollers on the giant bicycle chain that drives the thing:

As I was taking this shot, one of the workers joked about how I might be a spy for the MBTA, checking up on their work. He was mostly, but I think not entirely, kidding. It was a slightly uncomfortable moment.

Collectively, all of us now wield immense powers of surveillance. Whether the subjects of that surveillance are avatars or real people is beside the point. It isn’t necessarily the authorities who are doing the surveillance. We are doing it to one another. It happens every time somebody is tagged in a photo on Facebook or Flickr. It gets easier all the time.

Is this a good thing or bad thing? A bit of both, I think, hence my inner conflict, and my eternal fascination with David Brin’s The Transparent Society. Who will watch the watchers? The question becomes very different when we are all watchers, recorders, and publishers.

Hugh McGuire recently pointed to a New Scientist blog entry that begins:

A bunch of sources are reporting on a University College London study into how people born after the arrival of the internet – sometimes dubbed the Google generation – handle information. The top line is, they’re not very good at it.

The link points to a press release, entitled Pioneering research shows ‘Google Generation’ is a myth, which summarizes a 35-page report in PDF format. That report in turn summarizes a whole series of “work packages” (more PDF files) identified as the full project documentation.

Let’s trace one of the assertions made in the report, as retransmitted by Information Week:

Also, it’s not true that young people pick up computer skills by trial-and-error. “The popular view that Google Generation teenagers are twiddling away on a new device while their parents are still reading the manual is a complete reversal of reality,” researchers said.

Fascinating. I’d like to know more. How did the researchers arrive at this conclusion? Here’s the piece of the report summary that Information Week sourced:

They pick up computer skills by trial-and-error

Our verdict: This is a complete myth. The popular view that Google generation teenagers are twiddling away on a new device while their parents are still reading the manual is a complete reversal of reality, as Ofcom survey(22) findings confirm.

Ofcom? There’s no link, but footnote 22 says Ibid, referring to footnote 21, which says: Communications Market Report: Converging Communications Markets. Ofcom, August 2007. No link.

Maybe the “work packages” say more about this? In package 2 I found this:

The source? Ofcom (2006). No link. Unclear what the superscript 6 means, as the references in this report are not numbered, but they do mention:

Ofcom (2007) Communications Market Report: Converging Communications Markets. Research Document. London, UK: Office for Communications

Ofcom (2006). The Consumer Experience. London, UK: Office for Communications

So I searched for Ofcom (2006), The Consumer Experience, and found, you guessed it, another PDF, the relevant part of which appears to be section 2.4.2: Profile of those who experience difficulties when using technology. But nothing I can find there, or elsewhere in this report, says anything about who is or isn’t likely to learn about technology by reading the manual. And nothing in Ofcom(2007) either.

At this point I have to stop and remind myself what I was looking for in the first place: Evidence that it is a myth that kids learn by doing, and adults by reading the manual. All I have found is a flock of PDF files that mention one another obliquely, and in ways I cannot even correlate. No links. No data.

Now, the message of this highly-touted “Google generation” report, as refracted through the lens of Information Week, is:

Information literacy has not improved with the widening access to technology. Instead, the speed of Web searching means little time is spent evaluating information for relevance, accuracy, or authority.

And that may well be true. But do you see the irony here? The study making this claim was constructed and published in a way that resists all efforts to evaluate its relevance, accuracy, or authority. Which hardly matters, since none of the reporting about the study seems to have made any such effort.

Pioneering research shows ‘Google Generation’ is a myth? So far as I can see, that report says more about the researchers who wrote it, and about the reporters who reacted to it, than it says about any real or imaginary trends.

Larry Lessig’s video in support of Barack Obama is making the rounds in the blogosphere. Scanning the transcript I found a comment entitled Andrew Sullivan which reads:

Consider this hypothetical. It’s November 2008. A young Pakistani Muslim is watching television and sees that this man—Barack Hussein Obama—is the new face of America. In one simple image, America’s soft power has been ratcheted up not a notch, but a logarithm. A brown-skinned man whose father was an African, who grew up in Indonesia and Hawaii, who attended a majority-Muslim school as a boy, is now the alleged enemy. If you wanted the crudest but most effective weapon against the demonization of America that fuels Islamist ideology, Obama’s face gets close. It proves them wrong about what America is in ways no words can.

I’ve read that paragraph before. But not in the Lessig transcript. It comes from this Andrew Sullivan article in The Atlantic.

Why append it to the Lessig transcript? I think the anonymous commenter — who, however, chooses to identify himself or herself with the law firm Latham and Watkins — is drawing attention to the similarity between that paragraph and this one which does appear in the Lessig transcript:

So I want you to shut your eyes and imagine what it will seem like to a young man in Iraq or in Iran, who wakes up on January 21st, 2009, and sees the picture of this man as the president of the United States. A man who opposed the war at the beginning, a man who worked his way up from almost nothing, a man who came from a mother and a father of mixed cultures and mixed societies, who came from a broken home to overcome all of that to become the leader in his class, at the Harvard Law Review, and an extraordinary success as a politician. How can they see us when they see us as having chosen this man as our president?

Was Lessig’s paragraph influenced by Sullivan’s, which it’s reasonable to suppose he has read? My guess is that it was. If so, was the influence conscious or unconscious? My guess: unconscious.

This reminded me of Malcolm Gladwell’s 2004 New Yorker article on plagiarism, Something Borrowed, in which he recounts how one of his own New Yorker articles was pretty blatantly plagiarized by Bryony Lavery, the author of a play called Frozen. The incident prompts him to reflect on the nature of influence, and he muses:

When I read the original reviews of “Frozen,” I noticed that time and again critics would use, without attribution, some version of the sentence “The difference between a crime of evil and a crime of illness is the difference between a sin and a symptom.” That’s my phrase, of course. I wrote it. Lavery borrowed it from me, and now the critics were borrowing it from her. The plagiarist was being plagiarized. In this case, there is no “art” defense: nothing new was being done with that line. And this was not “news.” Yet do I really own “sins and symptoms”? There is a quote by Gandhi, it turns out, using the same two words, and I’m sure that if I were to plow through the body of English literature I would find the path littered with crimes of evil and crimes of illness.

Now here’s where it gets really twisty. In Something Borrowed, Gladwell refers to Lessig:

Creative property, Lessig reminds us, has many lives — the newspaper arrives at our door, it becomes part of the archive of human knowledge, then it wraps fish. And, by the time ideas pass into their third and fourth lives, we lose track of where they came from, and we lose control of where they are going.

See also several of Gladwell’s blog entries about a more recent case in which:

Harvard sophomore Kaavya Viswanathan plagiarizes a series of passages from Megan McCafferty’s teen novels “Sloppy Seconds” and “Second Helpings” for her debut novel: “How Opal Mehta Got Kissed, Got Wild, and Got a Life.”

On his blog, Gladwell initially makes the same sort of defense for Viswanathan that he made for Lavery in the New Yorker piece. Then his readers call him out, and he winds up agreeing with them that it was a different case.

But I digress. The real point here is that nowadays, even as ideas pass into their third and fourth lives, we don’t necessarily lose track of where they came from. A couple of years ago, Tim O’Reilly wrote a blog post entitled Act your way into a new way of thinking, which he said was “a fabulous quote from Richard Pascale’s book Delivering Results.” Tim added this postscript:

P.S. Very cool to be able to find the original source for the first quote via Google book search. As it came to me, it was simply labeled “Richard Pascale, Stanford Business School.”

At the time, I commented:

> Very cool to be able to find the original source

And, to track the meme! From this it does appear Pascale is the original source:

http://books.google.com/books?q=%22act+our+way+into+a+new+way+of+thinking%22

Fascinating to see who cites him and who doesn’t.

Weirdly, I only just now noticed that both Tim and I wrongly attributed Delivering Results to Richard Pascale. In fact, the author is David Ulrich, not Richard Pascale.

But that doesn’t affect my point. Whether or not Lessig’s paragraph was influenced by Sullivan’s, the ways in which we influence one another are becoming more transparent, more traceable.

To complete this twisty excursion, I was looking up The Anxiety of Influence, by Harold Bloom, and found this eponymous blog posting from Lorcan Dempsey, in which he was surprised to find Bloom so prominent in the original WorldCat Identities tag cloud, and in which he cites a Tim O’Reilly post expressing similar surprise:

Who knew that as far as libraries are concerned, Harold Bloom is right up there with Brahms and Chopin. That’s one influential literary critic!

OK, I’ve reached my connection limit for now. But the fact that all these connections are traceable is a wonderful thing.

For this week’s ITConversations podcast I asked Phil Windley to review the work he’s done — with several groups of his students — to develop a software framework for managing online reputation. Phil explains:

Reputation is a very personal thing. The way you think about a person we both know in common, and the way I think about that person, is different. We talk about Joe having a reputation, but in fact, Joe doesn’t have a reputation, every single person has a different feeling and way of thinking about Joe. Reputation is your story about me. I don’t control my reputation, I only control some factors that you might or might not use to calculate it. I don’t control all of them, and you may take factors into account that I have no control over.

If we’re going to bring that social system, developed over thousands of years, to the Net, we need to mimic that opportunity as closely as possible. So the idea of our rules language was to allow you to create your own algorithms about how you determine the reputation of something or someone, and to allow me to create a different one.

Of course, if my calculations about Joe and your calculations about Joe refer to the same public, or omnidirectional, digital identity, then they can be merged. And by referring to my digital identity and yours, somebody else will be able to aggregate our calculations about Joe, and propagate them transitively.
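
Here is a minimal Python sketch of that idea, with heavy caveats: it is not Phil's rules language, the facts and scoring rules are invented for illustration, and averaging is just one assumed way to merge calculations that refer to the same identity.

```python
from statistics import mean


def your_rule(facts):
    """Your algorithm: any dispute halves the score."""
    return 1.0 if facts["disputes"] == 0 else 0.5


def my_rule(facts):
    """My algorithm: score grows with years of activity, capped at 1.0."""
    return min(facts["years_active"] / 10, 1.0)


def aggregate(identity, opinions):
    """Average every score that refers to the same public identity."""
    scores = [score for observer, subject, score in opinions if subject == identity]
    return mean(scores)


# Hypothetical facts about one public digital identity.
joe = {"id": "joe@example.org", "years_active": 8, "disputes": 1}

# Two observers, two different algorithms, one shared identifier.
opinions = [
    ("you", joe["id"], your_rule(joe)),
    ("me", joe["id"], my_rule(joe)),
]
combined = aggregate(joe["id"], opinions)

if __name__ == "__main__":
    print(opinions, combined)
```

The shared identifier is what makes the merge possible; without it, the two scores would just be two unrelated numbers.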

That scenario entails both risks and benefits. At the moment, it’s easier for most people to imagine the risks. Phil says:

Offline we all give up information about ourselves all the time, trading privacy for convenience, and we have a pretty good feel for how that information is compartmentalized — not always, and there are obvious problems — but if I tell somebody in one business my name, that won’t mean the business down the street finds out about my transactions. Online, all of those intuitions have been switched around, and we’ve come to believe that giving up as little information as possible is the right thing.

The phrase “giving up information” has a negative connotation. We haven’t yet established norms for “declaring information” in a positive sense, and we have no intuitions about the benefits that doing so might yield. But we may find that by declaring information about ourselves, we can help make the stories that are being told about us, whether we participate in them or not, truer and more useful.


Well, that was a nice change of pace. Back in the land of the “wintry mix” — rain/sleet/snow — the first thing that caught my eye was a full-page ad in the local paper promoting the benefits of oil heat. Sponsored by the Oil Heat Council of New Hampshire, and featuring local icon Fritz Weatherbee wearing his trademark bowtie, the ad is a mosaic of smiling faces with captions like:

“New technology reduced my oil consumption by 25%.”

“Oil heat is safe.”

“It’s local. My oil heat dealer is also my neighbor.”

“Budgeting programs make oil heat affordable.”

“Oil heat made it easier to sell our home.”

I guess the emerging alternatives are being taken seriously. You’ve gotta love the rhetoric. Oil is local? Should’ve put that on the top ten list.

For the next 8 or so days I will be at an undisclosed location. The following items will be absent from the scene:

  • ice
  • snow
  • the internet

The following items will be present:

  • sun
  • sea
  • books
  • music
  • rum

I’ve written a lot about MIT’s Project SIMILE since I visited the team back in December. In this week’s Interviews with Innovators I talk with Stefano Mazzocchi, the creator of Apache Cocoon, about his work on the SIMILE project. Early in the interview I asked whether he thought he was more well-known for Cocoon than for SIMILE, and he said:

Different crowds know me for different activities. And rarely do these people talk together. Well, it’s happening more now, but when I started I was one of the few people who could talk to the open source, industrial, XML-ish crowd, and to the academic, RDF, AI-ish crowd. I was kind of in the middle, and both sides didn’t really understand what I was doing with the other crowd.

I can relate to that! I seem to spend a lot of time between different worlds, trying to connect them.

It was a pleasure to finally meet Stefano for the first time in person last month, after years of correspondence and cross-blog chatter, and then also to record this interview about an approach to the semantic web that feels to me like light at the end of the tunnel.

Investigating the location of the WUMB transmitter, Doc Searls notes that while the Live Maps bird’s eye view is awesome, it’s way too hard to find and share.

Finding: For example, if I plug 42° 15′ 27″N, 71° 01′ 44″W into maps.google.com, I go straight to a real x/y place on a map. Live Maps doesn’t know what to do with it.

That appears to be true. If you know the coordinates of a location, you can find it in Google Maps using any of these formats in the search box:

42.257500, -71.028889
42° 15′ 27″N, 71° 01′ 44″W
+42° 15' 27.00", -71° 1' 44.00"

The Live Maps search box doesn’t like any of these. You could search for Hatherly Road, Quincy MA, and find it that way, but the locator page where Doc probably found the coordinates for the transmitter doesn’t know about that address.

True, most people will search for addresses, not coordinates, but I agree with Doc here: there’s no reason not to support searching by coordinates as well.
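
The three formats above describe the same point, and converting between them is simple arithmetic: degrees plus minutes/60 plus seconds/3600, negated for south and west. A minimal Python sketch, in which dms_to_decimal is a hypothetical helper and the pattern handles only the hemisphere-letter notation shown here:

```python
import re

# Matches e.g. 42° 15′ 27″N (prime marks or plain quotes, hemisphere letter last).
DMS = re.compile(r"(\d+)°\s*(\d+)[′']\s*(\d+(?:\.\d+)?)[″\"]\s*([NSEW])")


def dms_to_decimal(text):
    """Convert one degrees/minutes/seconds coordinate to decimal degrees."""
    match = DMS.search(text)
    if match is None:
        raise ValueError(f"not a recognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in decimal notation.
    return -value if hemisphere in "SW" else value


if __name__ == "__main__":
    lat = dms_to_decimal("42° 15′ 27″N")
    lon = dms_to_decimal("71° 01′ 44″W")
    print(f"{lat:.6f}, {lon:.6f}")
```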

Sharing: I’d show you the Live Maps views, but there’s no way to link to them. Not that I can find, anyway.

Another fair criticism. The workaround is to click Sharing -> Send in email. This launches a mail client with a new message containing the URL:

http://maps.live.com/default.aspx?v=2&cp=r18mqy92dxp9&style=o&lvl=1&tilt=-90&dir=0&alt=-1000&scene=3327738&encType=1

It turns out that you don’t need the default.aspx, and it seems that the minimal working version of that URL is:

http://maps.live.com/?v=2&cp=r18mqy92dxp9&lvl=1&style=o

(The backstory on the URL syntax is here.)

The cp parameter is the map’s center point, and I’m not sure how Live Maps computed the r18mqy92dxp9 in the above URL, but you can also use lat/lon coordinates in decimal form, so a more human-writeable form of the URL is:

http://maps.live.com/?v=2&cp=42.257500~-71.028889&lvl=1&style=o
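
A minimal Python sketch of building that human-writeable form from a pair of coordinates. The name live_maps_url is hypothetical, and the parameters are only the ones visible in the URL above, not a documented API.

```python
def live_maps_url(lat, lon, level=1, style="o"):
    """Build the short Live Maps URL form, with cp given as lat~lon in decimal degrees."""
    return f"http://maps.live.com/?v=2&cp={lat:.6f}~{lon:.6f}&lvl={level}&style={style}"


if __name__ == "__main__":
    print(live_maps_url(42.2575, -71.028889))
```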

So it’s doable. And to Doc’s point about lock-in, I’ve done this whole exercise in Firefox on a Mac, so there’s nothing Windows-specific going on here.

But he’s right. I shouldn’t have to work so hard to find, and link you to, the very cool bird’s eye view of the WUMB transmitter.

In response to this item and its follow-on discussion, Alf Eaton shows how you can, in fact, discover the (open access) scientific commentary surrounding an (open access) scientific article. Outstanding!

Here’s the interactive version of the service. You can feed it a URL, a DOI, or a PubMed id, and it fetches conversations about that item from Postgenomic, PubMed, Connotea, and Scopus.

I took the liberty of converting this service into a bookmarklet which I’ve labeled sc (scientific conversations). It’s the analog to my standard dc bookmarklet (del.icio.us conversations) and bc bookmarklet (bloglines conversations).

WordPress won’t let me post javascript: URLs, so I can’t post the installable versions of these bookmarklets, but here they are in textual form:

sc: javascript:location.href='http://scintilla.nature.com/conversations?uri='+encodeURIComponent(location.href)

dc: javascript:location.href='http://del.icio.us/url?v=2&url='+encodeURIComponent(location.href)

bc: javascript:location.href='http://bloglines.com/search?q=Bcite:'+encodeURIComponent(location.href)

If you make a new bookmarklet, edit its properties, and copy one of these javascript: thingies into the URL or Location box, you’ll be good to go.
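All three bookmarklets do the same thing: append the percent-encoded address of the current page to a service-specific prefix. For anyone who wants to generate the same lookup URLs outside the browser, here is a rough Python equivalent. The prefixes are copied from the bookmarklets above; the function name is made up for illustration, and the safe-character list is my approximation of what JavaScript’s encodeURIComponent leaves unescaped.

```python
from urllib.parse import quote

# URL prefixes copied from the sc, dc, and bc bookmarklets.
SERVICES = {
    "sc": "http://scintilla.nature.com/conversations?uri=",
    "dc": "http://del.icio.us/url?v=2&url=",
    "bc": "http://bloglines.com/search?q=Bcite:",
}


def conversation_urls(page_url):
    """Return one lookup URL per service for the given page address."""
    # Approximates encodeURIComponent, which leaves ! ~ * ' ( ) unescaped.
    encoded = quote(page_url, safe="!~*'()")
    return {name: prefix + encoded for name, prefix in SERVICES.items()}


for name, url in conversation_urls("http://example.org/some/article").items():
    print(name, url)
```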

So, this is great! Now if I’m visiting a PLoS Medicine article I can just click dc, bc, and sc to assess how both the general-interest and scientific communities are reacting to it.

Thanks Alf!

I was in Montreal on Saturday to give a talk at CUSEC 2008, a great conference that’s organized by Canadian software engineering students (and recent grads) who want to congregate, exchange views, and hear from speakers they think will provide useful insight.

I gave the morning keynote, and Jeff Atwood spoke in the afternoon. Our messages dovetailed in an interesting way. Jeff’s title was: “Is writing more important than programming?” He’s a wildly successful blogger, and the influence of his writing has eclipsed the influence of his programming. He admits that this result is “not typical,” but argues that every programmer will reap benefits from narrating the work: influence, collaboration, clarity of thought.

My title was “Hacking the Noosphere”. The themes are shared tools and data, social engineering, language, the semantic web, human augmentation, and Doug Engelbart’s vision of the true purpose of information technology.

Although I’m trying to be more extemporaneous these days, I had a lot I wanted to say, and I wanted to say it carefully, so this turned into one of those talks that I wrote out completely. The upside is that I can make it available to read.

When chatter in the mainstream media and in the blogosphere intersects with scientific discourse, I’m always interested in the ways that citations do, or don’t, cross the border between those domains. In 2006, for example, while checking references for a podcast with Steve Burbeck about multicellular computing, I traced a meme about how we humans are really a hybrid of human and bacterial cells. The mainstream vector was a New York Times magazine story on obesity. It got to the blogosphere by way of a Wired News story. But the original Nature Biotechnology article mentioned in the Wired story was linked nowhere that I could find.

A comment from Gordon Mohr on yesterday’s item about Many Eyes prompted a similar analysis. Gordon asks:

…do the Many Eyes founders consider the statistical paradox that when testing large numbers of hypotheses, *most* recognized ’statistically significant’ results may in fact be false?

A good discussion of the issue is here:

http://www.marginalrevolution.com/marginalrevolution/2005/09/why_most_publis.html

To answer Gordon’s question: I don’t know; it didn’t come up in our conversation. But let’s look at the conversation surrounding the PLoS Medicine article cited in the blog entry to which Gordon points.

The blog entry itself was widely noticed: it has 31 del.icio.us bookmarks. What about the PLoS Medicine article cited in this popular blog entry? It has only 6 del.icio.us bookmarks.

This is the URL cited by the marginalrevolution blog:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=16060722

It’s not the most canonical form of the article’s URL. A more canonical form would be the base PubMed record:

http://www.ncbi.nlm.nih.gov/pubmed/16060722

That URL has 0 del.icio.us citations. However, now we cross over into the realm of scientific discourse. When you visit that PubMed URL, you’ll discover citations in the PubMed domain:

Comment in:
PLoS Med. 2005 Aug;2(8):e272.
PLoS Med. 2005 Nov;2(11):e361.
PLoS Med. 2005 Nov;2(11):e386; author reply e398.
PLoS Med. 2005 Nov;2(11):e395.
PLoS Med. 2007 Apr;4(4):e168.

There’s another canonical form for the PLoS Medicine article, by the way. It has a Digital Object Identifier (DOI):

http://dx.doi.org/10.1371/journal.pmed.0020124

Interestingly, there is 1 del.icio.us citation for that DOI.

So, what did the PLoS Medicine folks have to say about the claim in the cited August 2005 PLoS Medicine article? Here’s an April 2007 reaction:

The mathematical proof offered for this in the PLoS Medicine paper shows merely that the more studies published on any subject, the higher the absolute number of false positive (and false negative) studies. It does not show what the papers’ graphs and text claim, viz, that the number of false claims will be a higher proportion of the total number of studies published (i.e., that the positive predictive value of each study decreases with increasing number of studies).

I’m not interested here in the claim and counterclaim. I’m interested in the process of discourse, in citation as the engine of that discourse, in the role that canonical identifiers play in citation, and in the disconnect between scientific and mainstream discourse.

It’s all happening on the web, but it’s happening in isolated ghettoes with few points of actual contact. How could we bring those worlds into closer contact?

Here’s one approach that could help. When the citation engines in the blogosphere find references in blog entries to scientific articles on the web, they could resolve those to their most canonical forms: DOIs, PubMed records. And they could make equivalences among those forms. That way, conversation in the blogosphere about a scientific article, and scientific conversation about the same article, would tend to hang together and would be discoverable in the same contexts.
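As a rough illustration of what that resolution step might look like, here is a Python sketch that reduces the three URL forms discussed above to a canonical identifier, then maps equivalent identifiers onto one key. The URL patterns are only the ones that appear in this post. The equivalence table is hand-written and hypothetical; a real citation engine would need some lookup service to learn that a given PubMed id and DOI name the same article.

```python
import re
from urllib.parse import urlparse, parse_qs


def canonical_id(url):
    """Reduce a URL to ('pmid', id) or ('doi', id) if recognized, else ('url', url)."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    query = parse_qs(parts.query)
    # PubMed Central article renderer: the id travels in the query string.
    if host == "www.pubmedcentral.nih.gov" and "pubmedid" in query:
        return ("pmid", query["pubmedid"][0])
    # Base PubMed record: the id is the last path segment.
    match = re.fullmatch(r"/pubmed/(\d+)/?", parts.path)
    if host == "www.ncbi.nlm.nih.gov" and match:
        return ("pmid", match.group(1))
    # DOI resolver: everything after the leading slash is the DOI.
    if host == "dx.doi.org" and parts.path.startswith("/10."):
        return ("doi", parts.path[1:])
    return ("url", url)


# Hypothetical, hand-maintained equivalence between identifier forms.
EQUIVALENT = {("pmid", "16060722"): ("doi", "10.1371/journal.pmed.0020124")}


def article_key(url):
    """Return one key per article, whichever URL form was cited."""
    key = canonical_id(url)
    return EQUIVALENT.get(key, key)
```

With something like this in place, bookmarks and comments attached to any of the three URLs could be counted under a single key rather than scattered across 6, 0, and 1 citations.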

Why does this matter? Well, the marginalrevolution blog is influential, widely cited in the blogosphere. The entry that cited the PLoS Medicine article was itself widely cited. But the PLoS Medicine reaction to the article is not part of the blog conversation. I had to work really hard to find it, and to include it here.

The conversation-tracking tools used by bloggers should discover scientific discourse related to a scientific article as easily as they discover blog discourse. Conversely, the conversation-tracking tools used by scientists should discover blog discourse as readily as scientific discourse. Public understanding of science would improve, and so would scientific understanding of the public.

For this week’s ITConversations podcast I spoke with Fernanda Viégas and Martin Wattenberg about Many Eyes, a project at the forefront of a new category called social data visualization. I was particularly interested to hear about how civic or political argumentation, which tends to devolve into posturing — especially online — might improve when it’s grounded in shared data.

Martin
I don’t think we’ve reached data analysis utopia, but there are intriguing first steps. We’ve had a couple of solid political arguments happen on the site, where someone will put up a vehemently argumentative piece, saying for example that they believe people on welfare get too much money, and they’ll point to their statistics and charts to support that. In the skirmish that follows, people often get beyond the standard red-state/blue-state divide because it is rooted in the numbers.

Of course this argumentation doesn’t happen only on the Many Eyes site. Because the visualizations are linkable and (now) embeddable, it spills over to blogs as well.

I’m also intrigued by the notion that, as more people spend more time investigating official sources of data, we’ll start to uncover problems with the quality of that data.

Fernanda
We’ve seen that people have found mistakes on official data sets from authoritative sources. And the reason they were able to do that so easily is because visualization will show you something you didn’t have time or patience to discover in a spreadsheet.

This does leave Many Eyes open to the criticism that it invites people who lack statistical expertise to draw fallacious conclusions.

Martin
We’ve certainly run into objections that visualization can be deceptive. People are afraid that visualizations will be created that are inappropriate and misleading. And in fact that’s a well-founded objection in some ways, because every visualization is a simplification of the underlying data. There’s a point of view involved, and people are suspicious of that. But my belief is we have to give people as much power as we can.

Fernanda
Also, even though we created Many Eyes for the lay person, because we felt that this was something that was needed, that there was nothing out there for people to play with in terms of interactive visualization, it seems to be powerful enough to attract scientists too.

I think that kind of cultural mashup will be really good for everyone involved. Fernanda talked about how, at one point, a user put up a visualization of the Alberto Gonzales testimony that highlighted lots of “I don’t recall” kinds of statements. An hour later, another user put up a similar visualization of Bill Clinton’s testimony about Monica Lewinsky.

Fernanda
You could see the same sorts of phrases. The really exciting part, to us, was a couple of things. One, there was a conversation going on through these visualizations. Also, people usually think of informational visualization as this neutral tool, because it is based on data. Part of what people are beginning to understand, hopefully, is that visualizations have a point of view.

It’s been a decade since Tim Berners-Lee wrote Hypertext Style: Cool URIs don’t change, the first contribution to what is still not a very extensive literature on designing namespaces for the web. Recently, when I made the suggestion that a blog engine ought not produce URLs that end with .aspx, I was asked: “Why does it matter?” For me it boils down to two reasons:

  1. Futureproofing
    A blog posting is, in theory, a permanent artifact. You’d like its URL to remain constant. Sometimes, of course, change is unavoidable. URLs aren’t digital object identifiers, and for most web resources, there’s no way to insulate yourself from an ownership change that results in a domain name change. But you don’t want to subject yourself to avoidable change, and file extensions fall into that category. Last year foo.asp, this year foo.aspx, next year something else: the only meaningful part of the name is foo. The technology that produces the name, never mind the version of that technology, is a variable that need not, and should not, be exposed. If links are pointing to foo.asp, and your upgraded blog engine produces foo.aspx, you broke those links. That’s unnecessary and avoidable.

  2. Style
    Names without extensions are cleaner and simpler. Why does that matter? I guess if you think of URLs as constructs generated by machines for machines, then it doesn’t, because machines don’t care about style. But I believe that even when they’re machine-generated, URLs are for people too. We read, cite, and exchange them. Their structure and content conveys meaning in ways that warrant thoughtful analysis. Elements that don’t convey meaning, and that detract from clarity, should be omitted.
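To illustrate the futureproofing point, here is a small Python sketch of the normalization a blog engine or rewriter could apply so that foo.asp, foo.aspx, and foo all resolve to the same name. The list of extensions is an assumption chosen for the example, and the function is a sketch of the idea rather than how any particular server does it.

```python
import posixpath

# Assumed set of technology-revealing extensions; adjust to taste.
TECH_EXTENSIONS = {".asp", ".aspx", ".php", ".cgi", ".jsp"}


def canonical_path(path):
    """Strip a technology extension so old and new links share one name."""
    root, ext = posixpath.splitext(path)
    return root if ext.lower() in TECH_EXTENSIONS else path


for requested in ("/blog/foo.asp", "/blog/foo.aspx", "/blog/foo"):
    print(requested, "->", canonical_path(requested))
```

Extensions that do convey meaning to readers, such as an image type, are left alone.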

The Strunk and White Elements of Style for the literary form that is the web’s namespace hasn’t really been written yet, but Tim Berners-Lee’s 1998 essay belongs in that genre. So does the Richardson and Ruby book RESTful Web Services which, as I noted in my review, recommends that URIs use forward slashes to encode hierarchy (/parent/child), commas to encode ordered siblings (/parent/child1,child2), and semicolons to encode unordered siblings (/parent/red;green). We can, and we should, think and act in principled ways about the web namespaces we create.
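Here is one way those three conventions could be read back out of a path, as a minimal Python sketch. The choice of a tuple for ordered siblings and a frozenset for unordered ones is mine, made so that the ordering rules show up in comparisons; the book recommends the URI punctuation, not this data model.

```python
def parse_segment(segment):
    """Interpret one path segment using the comma and semicolon conventions."""
    if "," in segment:
        return tuple(segment.split(","))      # ordered siblings: order matters
    if ";" in segment:
        return frozenset(segment.split(";"))  # unordered siblings: order ignored
    return segment


def parse_path(path):
    """Split a path on slashes (hierarchy), then interpret each segment."""
    return [parse_segment(s) for s in path.strip("/").split("/")]


print(parse_path("/parent/child1,child2"))
print(parse_path("/parent/red;green") == parse_path("/parent/green;red"))
```

Under this reading, /parent/red;green and /parent/green;red name the same thing, while /parent/child1,child2 and /parent/child2,child1 do not.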

I guess I’m extra-sensitive to the .aspx thing now that I work for Microsoft, because I know that to folks outside the Microsoft ecosystem it screams: We don’t get the web. It’s true there are plenty of .php and other extensions floating around on the web. But non-Microsoft-powered sites are far more likely to suppress them than are Microsoft-powered sites, because you have to go out of your way to get IIS and ASP.NET to do that.

Happily, that’s changing. The URL rewriting capability that’s long been standard in Apache, and is integral to modern web frameworks like Rails and Django, is coming to ASP.NET. From Scott Guthrie’s introduction to the ASP.NET MVC Framework:

It includes a very powerful URL mapping component that enables you to build applications with clean URLs. URLs do not need to have extensions within them, and are designed to easily support SEO and REST-friendly naming patterns. For example, I could easily map the /products/edit/4 URL to the “Edit” action of the ProductsController class in my project above, or map the /Blogs/scottgu/10-10-2007/SomeTopic/ URL to a “DisplayPost” action of a BlogEngineController class.

I hope that cool URIs will become the default for Microsoft-powered websites and services. Meanwhile, there are a variety of add-on URL rewriters for IIS that can streamline and normalize web namespaces. I wish they were used more extensively.

My longtime correspondent Raymond Yee, who I finally got to meet when I visited Berkeley last year, is writing a book on remixing data and services. The book mentions my elmcity.info experiment but, when Raymond visited the site the other day, all he saw was the text of the FastCGI script that runs the whole show. It turns out that when BlueHost upgraded Apache, they broke the mechanism that’s used to invoke arbitrary FastCGI scripts like my Django launcher.

It’s fixed now [1], but the incident reminds me that I haven’t yet fully developed the line of thinking that I’ve now tagged servicemanagement. What I am increasingly feeling, as I’m sure many others are — and not only geeks, but also and especially civilians — is that it’s becoming impossible to sanely manage and control our growing numbers of service relationships.

That lack of control was the real point of my Verizon story, for example. And my story is tame compared to this one from John Halamka, which begins:

On Thursday, December 20, my FiOS internet/TV service was shut off by Verizon without any notice or warning.

and concludes:

As CIO of Harvard Medical School and CareGroup, I spend millions every year with Verizon and I cannot navigate Verizon Customer Service.

One aspect of the service management console I envision here would be a common view of all the services I depend on, green/orange/red for healthy/sick/dead.

In the case of a website, it’s not enough to know that the box (or webserver) is alive; healthy means delivering the expected page. There are a million ways not to, and so in the past — although not in this case — I’ve used a cache-and-compare method to verify. It would be entirely feasible for a web hoster to wrap a user-friendly interface around that method, so that anybody could easily declare expected outputs, but I haven’t seen this done by a commercial hoster.
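A cache-and-compare check can be sketched in a few lines of Python. This is an illustration of the general idea, not the script I actually used: it caches a fingerprint of the known-good page and compares each later fetch against it. The mapping onto green/orange/red is an assumption for the example, and an exact hash only suits pages whose content is stable; a dynamic page would need a looser comparison.

```python
import hashlib
import urllib.request


def fingerprint(body: bytes) -> str:
    """Hash of the page body, cached while the page is known to be good."""
    return hashlib.sha256(body).hexdigest()


def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def check_page(url, expected_fingerprint, fetcher=fetch):
    """Return 'green' for the expected page, 'orange' for a page that
    responds with something else, and 'red' when it can't be reached."""
    try:
        body = fetcher(url)
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return "red"
    return "green" if fingerprint(body) == expected_fingerprint else "orange"
```

A server that is up but serving the source text of its launcher script, as in the incident above, would come back orange rather than green.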

Of course every service relationship sets up expectations, so there are all kinds of assertions you might want to make. I received the e-bill by the expected date. My paycheck was deposited. The payment I made was credited to the right account. The package I sent was delivered. The book I returned to the library was received and checked in.

I want a common way to make all these assertions, and to subscribe to notification of assertion failure.
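What might that common way look like? Here is a deliberately tiny Python sketch in which every expectation is a description plus a yes/no check, and anyone interested subscribes to failures. The class and method names are invented for illustration, and the checks are stand-ins; in practice each one would have to query the bank, the library, the shipper, and so on.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assertion:
    description: str
    check: Callable[[], bool]  # returns True when the expectation holds


class Console:
    """Holds assertions about services and notifies subscribers of failures."""

    def __init__(self):
        self.assertions: List[Assertion] = []
        self.subscribers: List[Callable[[str], None]] = []

    def expect(self, description, check):
        self.assertions.append(Assertion(description, check))

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def run(self):
        """Evaluate every assertion and notify subscribers of each failure."""
        failures = [a.description for a in self.assertions if not a.check()]
        for description in failures:
            for notify in self.subscribers:
                notify(description)
        return failures


console = Console()
console.subscribe(lambda failed: print("FAILED:", failed))
console.expect("My paycheck was deposited", lambda: True)          # stand-in check
console.expect("The package I sent was delivered", lambda: False)  # stand-in check
console.run()
```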

I’d also like my personal service management console to be subscribed to streams of events, from service providers, about the operation of their services. The expiry notice from EZPass, the Apache migration email I may or may not have received from BlueHost, the WordPress upgrade notice that’s only visible when I log in to WordPress — every time I turn around, another of these alerts pops up, but they’re all coming at me in different contexts, and through different channels. It feels scattered and random because it is. But logically there’s just a handful of communication patterns — like event notification, assertion failure notification, and upgrade/renewal confirmation — that could be abstracted into a common interface. As services multiply, and as their management friction increases, the need will become more apparent.


1 Fixed for me, thanks to a specific intervention, but not, I observed and the BlueHost support guy confirmed, for others who wish to map arbitrary FastCGI scripts by declaring handlers in the control panel.

The always-interesting Jeff Jonas wrote recently about outbound record-level accountability, i.e., tracking where sensitive data is sent.

Without outbound record-level accountability … ensuring data currency across information sharing ecosystems can be problematic. The challenge being when a record changes in the originating system, how will one be certain which recipients of the original record need to be notified?

He adds that while such accountability is desirable, “not every mission will warrant the cost.”

I wonder, though, how much of the cost might evaporate if we make the architectural shift from sending data around, like email, to publishing it, like blogging.

I love the phrase data blogging, which Gavin Carr coined in response to some of the articles in my hosted lifebits series. One of the things that falls out naturally, in a syndication-oriented architecture, is the ability to audit who your subscribers are, and which chunks of data they access.

Note also that Jeff’s caveat about “which recipients of the original record need to be notified” implies owner-initiated push. But if the recipient is a subscriber, that update channel is already open and ready for use.
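To show how the audit trail falls out of the publishing model, here is a minimal in-memory Python sketch. Everything in it is hypothetical: the class, the record ids, and the subscriber names are invented, and a real feed would sit behind HTTP with authenticated subscribers. The point is only that when recipients come to you for the data, recording who fetched which record is a side effect of serving it.

```python
from collections import defaultdict


class DataFeed:
    """A published set of records that remembers who fetched what."""

    def __init__(self):
        self.records = {}
        self.access_log = defaultdict(set)  # record id -> subscribers who fetched it

    def publish(self, record_id, data):
        self.records[record_id] = data

    def fetch(self, subscriber, record_id):
        # Serving the record and logging the access happen together.
        self.access_log[record_id].add(subscriber)
        return self.records[record_id]

    def readers_of(self, record_id):
        """Who has seen this record, and so would care that it changed?"""
        return set(self.access_log.get(record_id, set()))


feed = DataFeed()
feed.publish("rec-1", {"status": "original"})
feed.fetch("agency-a", "rec-1")
feed.fetch("agency-b", "rec-1")
feed.publish("rec-1", {"status": "corrected"})
print(feed.readers_of("rec-1"))
```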

In terms of the value that the syndication pattern can provide, both for inter-personal as well as for cross-organizational communication, I think we’ve hardly scratched the surface.

On Saturday I’ll be in Montreal giving a talk at CUSEC, the Canadian University Software Engineering Conference. It’s an unusual event, organized by and for students. According to the backstory, the goal was:

…to bring the most passionate software engineering students from across Canada together under one roof, to listen and learn from the smartest and the greatest software engineers the world has ever seen.

As another of this year’s speakers, Tim Bray, has noted, we are in excellent company. Of course I’m not really a software engineer, and I don’t even play one on TV, but I’m glad that the planners thought my own unusual perspective would be valuable, and I’m looking forward to hanging out with a crowd of passionate students.

For this week’s ITConversations show I spoke with fellow Keene resident Neil Giarratana, a software entrepreneur whose 12-person company reverses the usual geographic pattern. A number of folks around here, me included, operate as remote outposts of companies located in metropolitan areas. But Neil’s company, Lucidus, is headquartered here, with field offices in larger cities elsewhere.

In an era of accelerating migration to cities, this counter-cyclical pattern fascinates me. As Neil candidly admits, there are tradeoffs. But thanks to ever-improving telecommunications and the evolving decentralization of work, it’s feasible to combine a high-tech career with the lifestyle advantages of our quintessential New England town.

In a tech industry that is obsessively if not pathologically dedicated to the Next Big New Thing, it’s hard to make the case for refining, reinterpreting, and consolidating what we already have. Bill Buxton does so eloquently in a recent BusinessWeek column, The Long Nose of Innovation, which I found by way of Kevin Schofield. You may recall Bill’s name from this introduction to our podcast interview about his book, Sketching User Experiences. In the BusinessWeek column Bill writes:

The heart of the innovation process has to do with prospecting, mining, refining, and goldsmithing. Knowing how and where to look and recognizing gold when you find it is just the start. The path from staking a claim to piling up gold bars is a long and arduous one.

That resonates powerfully with me. I’ve always been a prospector, miner, refiner, and goldsmith who finds new value in mature technologies like NNTP conferencing, HTTP GET, and screencasting. Bill goes on to say:

Any technology that is going to have significant impact over the next 10 years is already at least 10 years old.

We might quibble. Was the web 10 years old in 1997? Yes and no. But I’ll grant poetic license because I think the statement is mostly true, and I’ve been wrestling with some of the consequences that flow from it.

Here’s one. Advocates for powerful ideas and methods that are long extant but have yet to fully bear fruit may tend to become nostalgic, appear misguided, act bitter, lose focus. These are counterproductive behaviors. So how do you avoid them? How do you stay the course, keep your eye on the ball, move forward, remain excited, and find ways to explore the same old things in new and different ways?

One answer, I think, is to keep engaging with different people in different contexts. Yesterday I was showing and discussing some things that I’ve known for so long, and documented so extensively, that I worried about sounding like a broken record. But in that context it was fresh information, a new perspective. People got excited. And their excitement rekindled my passion.

I finally got around to reading Michael Pollan’s excellent The Omnivore’s Dilemma, which traces several different food chains from source to prepared meal. As I’ve mentioned here before, a remarkable follow-on dialogue took place between Pollan and Whole Foods’ CEO John Mackey, in the form of a blog exchange and a joint public appearance. That dialogue, which explores the book’s critique of “industrial” or “big” organic operations, is a great example of how in the blog era a book can sustain a lively follow-on conversation.

It’s a huge book, and there are a number of other conversations that might spring from it. One I’d like to see would focus on the possibility of a more transparent food supply chain. It’s true that there’s much about that supply chain we’d rather not know, but it’s also true there’s much that we simply cannot know. As service-oriented information systems increasingly control the supply chain, that knowledge — of how food is produced, processed, and transported — becomes, at least in principle, more discoverable.

The same applies more broadly to all supply chains. When Jeff Bezos spoke at MIT last year, several different folks asked variations of the same question: Can you expose more information about the production and transportation methods employed by the makers of your products, so we can factor those into our decisions?

Unlike government data, which is nominally ours, most corporate data is not something we’re obviously entitled to. Governments might compel a certain level of supply-chain transparency. Corporations with good stories to tell about ethical/sustainable practices might reveal them voluntarily. One way or another, we may begin to expect that supply chains ought to be more transparent than they are today.

In the case of our food system, making the supply chain more transparent would be a radical innovation — painful, but health-promoting.

When Verizon recently and erroneously canceled the online bill presentment service that I’d signed up for, I told them to just start sending paper bills again. I just couldn’t face the hassle of repeating their signup process.

For me, paper and electronic bills converge on the payment screen of my bank’s online service. So while the e-bills save me typing in amounts, versus clicking on a payment option, there aren’t many amounts to type and it’s really not a big deal.

I chose this method because, again, I couldn’t face the hassle of signing up individually for a bunch of per-biller payment systems. One obvious conclusion is that the long-awaited user-centric identity technologies now emerging — OpenID, CardSpace, and more broadly the identity metasystem — will grease the wheels, eliminate a huge amount of friction, and hugely accelerate e-commerce. If we think it’s big now, we ain’t seen nothing yet.

But beyond the convenience of single sign-on, and of common registration profiles that we can transmit with a click, a deeper issue looms on the horizon. It’s not just the psychic burden of signing up for services that weighs on our minds. Increasingly it’s the psychic burden of being in many service relationships, each of which needs to be managed and monitored individually.

Consider, for example, the problem of renewing those relationships. Just yesterday, I was confronted with three different renewal scenarios involving WordPress, EZPass, and GoDaddy. In each case I had to locate and jump through a differently-shaped hoop. That kind of thing wears you down. It’s never easy enough, your past experience is always too remote to guide you in the present, and if you fail or just forget, the consequences can range from annoying to severe.

What you really want, of course, is a renewal policy. When you set up a new service relationship, you define the policy: Renew automatically, on request, or never. In my case, I’d make all three of those relationships renew automatically. That would mean that WordPress gets to take ten bucks from my PayPal account every year for domain mapping, EZPass gets to refresh the expiration date on my credit card, and GoDaddy gets to charge my credit card for domain renewals.
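Expressed as data, the policy idea is small. This Python sketch models the three options named above and the three relationships I’d set to renew automatically; the type names and the needs_attention helper are invented for illustration, and no provider actually exposes such an interface today.

```python
from dataclasses import dataclass
from enum import Enum


class Renewal(Enum):
    AUTOMATIC = "renew automatically"
    ON_REQUEST = "renew on request"
    NEVER = "never renew"


@dataclass
class ServiceRelationship:
    provider: str
    what_renews: str
    policy: Renewal


relationships = [
    ServiceRelationship("WordPress", "domain mapping", Renewal.AUTOMATIC),
    ServiceRelationship("EZPass", "credit card expiration date", Renewal.AUTOMATIC),
    ServiceRelationship("GoDaddy", "domain renewals", Renewal.AUTOMATIC),
]


def needs_attention(items):
    """Providers whose renewal still depends on me doing something."""
    return [r.provider for r in items if r.policy is not Renewal.AUTOMATIC]


print(needs_attention(relationships))
```

With all three set to automatic, the list of relationships that need my attention is empty, which is the whole point.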

What would it take to be able to review and manage all of your service policies in one place? Enterprises, for whom the need to do that is much more acute than it currently is for individuals, have concluded that service-oriented architecture is the answer. The much-maligned WS-* bells and whistles, which seem so overblown for simple point-to-point interaction on the web, come into their own in a fabric of cooperating services governed by policy-based intermediaries.

I predict that as individuals find themselves embedded in more and more service relationships, and begin to feel the need to manage those relationships more sanely, one of the current distinctions between the enterprise and the “consumer web” will start to erode. We’ll find that we are all embedded in many service relationships. And we will all benefit from technologies that enable us to flow those relationships through management consoles.

It was great to see my interviews with Beth Kanter and Dick Hardt appear on the ITConversations top 10 list for 2007. Since it’s the listmaking season, I want to make one as well. Not a list of favorites, because there are so many, but instead a list of conversations that best exemplify the theme of using technology in socially innovative ways.

Last summer I realized that this theme had become really important to me. It also occurred to me that, while there hadn’t been much overlap between ITConversations and its sister channel, Social Innovation Conversations, there should be. Doug Kaye and Phil Windley agreed, and I was delighted when my interview with Ned Gulley became my first crossover show to appear on both channels.

I think that many of my shows, including the interviews with Beth Kanter on working with digital immigrants in non-profit organizations, and with Dick Hardt on user-centric identity, touch on the theme of socially innovative uses of technology. Here’s a rundown of some others, in alphabetical order by last name.

Barbara Aronson: Making medical research literature available online, at low or no cost, to poor countries. (blog)

Ken Banks: Using SMS to create communication networks in Africa and other places ill-served by the Internet. (blog)

Gardner Campbell: Using the tools and methods of Web 2.0 to reimagine higher education. (blog)

Mike Caulfield: Bootstrapping and running a state-level community-based political blog. (blog)

Brian Dear: Enabling performers to measure and respond to demand for personal appearances.

Greg Elin: Extracting, reformulating, and making sense of the operational data of government. (blog)

Beth Jefferson: Federating the online catalogs of public libraries, and pooling the participation of patrons. (blog)

Ned Gulley: Designing problems to be solved by gameplay that teaches advanced skills using an optimal mix of cooperation and competition. (blog)

John Halamka: Modernizing the exchange of health care information, and putting patients in charge of it. (blog)

Timo Hannay: Bringing the tools and methods of Web 2.0 to the scientific world. (blog)

Ed Iacobucci: Creating a decentralized alternative to the hub-and-spoke air travel system. (blog)

Doug Kaye: Helping volunteers capture and publish audio recordings of civic events. (blog)

Matt MacLaurin: Recapturing the joy of creative expression in software, in a game inspired by LOGO and implemented using modern software principles. (blog)

Hugh McGuire: Bootstrapping and running a collective effort to record and publish public-domain audiobooks. (blog)

Simon St. Laurent: Chronicling the civic and political life of a small town. (blog)

Jim Russell: Analyzing the dynamics of the Pittsburgh diaspora. (blog)

Greg Whisenant: Enabling cities and towns to publish crime data online, and imagining the citizen/government collaborations that can flow from that. (blog)

John Willinsky: Advocating open access to academic literature, and reimagining education in the era of Net participation. (blog)

Jeannette Wing: Explaining why the principles of computational thinking will become part of everyone’s educational foundation. (blog)

On a recent flight to Seattle, Microsoft identity expert Vittorio Bertocci wrote:

I want to take some time writing down some hallucinatory (=vision without execution) thoughts about omnidirectional identities. Be warned, this may be just pointless rambling.

It isn’t pointless, not by a longshot, but the term omnidirectional identity needs to be unpacked — and maybe even revised to something like public (versus private) identity, or broadcast (versus narrowcast) identity. I had a long talk with Vittorio last month, for a new interview series I’ll be launching soon, and in the part where we discussed OpenID and CardSpace he discussed omnidirectional and unidirectional identity:

VB: OpenID is actually a kind of omnidirectional identifier, which is something that sooner or later we have to deal with. Whereas cards are metaphors that help me to do things that are unidirectional. Every time I use a card, it’s for a transaction specifically with one relying party.

The same happens with OpenID, but you have the perception that there’s a URI which describes you. This opens the way to future developments which, in my view, we desperately need. What we see happening with Facebook is just a signal that the industry needs to do for omnidirectional identifiers what we are now doing for unidirectional identifiers.

JU: Can you define those terms?

VB: The idea is that your identity, or identity in general, can have different audiences. An omnidirectional identifier is something you use for being recognized by everybody. So if you go to the Verisign website, using HTTPS, their certificate declares their public identity.

Then you have unidirectional identities. So if I land on a website that, for business purposes, asks my age, then I obtain a token specifically for that website. We call this unidirectional. The flow goes straight to that website and nobody else. When you use a card today, or OpenID, you’re in a unidirectional context. You’re transmitting attributes to one specific relying party.

But in the case of OpenID, I have my account, vibro.openid.com, and it’s a URI, it’s my identifier, and it’s omnidirectional in the sense that everybody knows it. While in the case of my cards, there’s nothing that I tell to everybody. So I think OpenID is a good starting point for thinking about an ecology of omnidirectional identity. How do I handle identity that I want projected everywhere, not just to a specific relying party?

Also, the concept of an identity provider, in both CardSpace and OpenID, is for giving you attributes about yourself. I go on a website, I want to buy wine, I am the one who is asking the identity provider to certify me. While in the world of social networks, the requester of an identity may be somebody other than me. If somebody is looking at my profile, it’s not me. But the request is still for identity information about me. This is an area that needs thought. As an industry we did an excellent job with unidirectional identity, and the ecosystem for both CardSpace and OpenID is vital. But we haven’t yet found the laws for omnidirectional identity. When we do, things like Facebook Beacon won’t happen. We need to extend the conversation to include omnidirectional identifiers for users. A website has a public identity. But at this moment, a user’s public identity is an imagined phenomenon. You search for yourself and find traces of your identity on the web, or maybe the identity of somebody who has your same name.

JU: Or someone who said something about you. Made a claim about you, in effect.

VB: Exactly.

I’ve long projected a public identity omnidirectionally, so I’ve had a long time to consider this issue. A decade ago, when I realized the asymmetry of digital certificates — the secure website identifies itself to you, but not vice versa — I began using, and advocating the use of, client digital certificates. I used them to sign my emails, and would have used them to sign my postings to the Net if there had been any kind of ecosystem in place to recognize and honor those assertions of identity. There wasn’t, and there still isn’t. Meanwhile, as Vittorio notes, we’ve done a good job of first thinking through, and then implementing, the unidirectional identity scenarios that we need for e-commerce.

I realize now that even blogging, as big a phenomenon as it has become, wasn’t enough to motivate serious thought about the kind of public identity projection that I’ve always understood blogging to be. But I think Vittorio is right. The social networks are a much bigger phenomenon, and they’re acquainting many more people with the notion of public identity projection. Perhaps now the need for a system that enables people to project and manage their own public identities — a need that I was never able to articulate convincingly before — will simply become apparent.
