November 2009


In his luminous essay Information obesity, Ned Gulley illustrates the paradox of choice:

I’m reading about the Mohawk Trail, where the Cold River crashes noisily down the granitic glacier-fractured hillside. Where whispering understory birches are sheltered by towering firs. Now my mouth is watering. I have to go. I am referred to ReserveAmerica, a well-built web site that manages thousands of parks nationwide, and — DAMN! Mohawk Trail State Forest is booked solid. I start researching other nearby campgrounds, and now I’m sucked into the game. Unfortunately, ReserveAmerica lets you pick your campsite from an interactive map, and my book tells you which sites are the very best at each campground. Just when you start to salivate about the perfect spot, your dream is dashed by some early bird camper who’s beaten you to the reservation. You can cycle through this process for hours.

I borrow the phrase paradox of choice from Barry Schwartz, who argues in a compelling TED talk that as we broaden our options in all areas, we ratchet up our expectations about how good those options will be. The result is disappointment.

Less is more — except when it isn’t. My counterexample is a recent quest of mine for a particular kind of double-stick tape I needed for an interior storm window project. Key criteria included width (roughly 5/8″) and type of adhesion (plastic to wood). Web search yielded a bewildering array of choices, from various sources, but no way to filter by my criteria. This isn’t some idle consumer whim. I’m trying to save energy in the most effective way I can. I want to see as many qualifying choices as possible. But I can’t.

In Restructuring expert attention to revive the lost art of personal customer service I described one great solution to this problem: Kevin, the resident expert at FindTape.com, with whom I discussed SCF-01, DC-4420LB, and eventually settled on 3M-4905.

When there’s a Kevin available, he’ll be my first choice. But there won’t always be a Kevin. The answer in that case is not to artificially constrain my choices. That already happens because web search doesn’t enable me to state my criteria. Instead I want to search more effectively. To do that — as noted by several comments on Barry Schwartz’s TED video — we need to overcome filter failure.

This week’s Innovators show, with Martin Hepp, explores how we can create better filters. It’s a follow-on to an earlier show with Kingsley Idehen on the topics of RDFa, the GoodRelations ontology, and the idea that we can become the masters of our own search indexes.

The conversation mainly revolves around how to express an offer for goods or services by means of RDFa snippets that use the GoodRelations e-commerce vocabulary, that are generated by a form-based tool, and that rely on the web’s venerable traditions of view source and copy/paste.

But the same vocabulary used to describe offers can also express needs. And here Martin makes a really good observation about the current architecture of web search:

You can only search synchronously. You can’t ask a question and say, ‘Work on this for two weeks, improve your results in the background, and then come back with the best answer.’ But think about the potential if we can increase the amount of computational time for returning results. Currently there is only 400 milliseconds, because this is the average patience of web users. But if you can express what you’re looking for, and save it with a name, then the search engine will have two weeks to produce a good list of results.

I was also intrigued by Martin’s comments on intermediaries and affiliates. In his view, a commerce site like Amazon is not the only possible source of filter-enhancing metadata. Affiliates can play too. A travel service, for example, might supply search engines with enhanced views of Amazon relative to certain places and certain areas of expertise.

The paradox of choice is real, and in many cases we may indeed be happier with less. But when we really need or want more options, we shouldn’t have to prematurely foreclose them. Search could be far more effective, and an approach like the one Martin envisions is the way to make it so.

Ever since Peter Wayner introduced me to the idea of a translucent database I’ve been thinking about the implications of this powerful idea. In a nutshell, the data in a translucent database service is opaque to the operator of the service, and visible only to sets of users who establish trust relationships. My 2002 review of Peter’s book summarizes his babysitter example:

Imagine a web service that enables parents to find available babysitters. A compromise would disastrously reveal vulnerable households where parents are absent and teenage girls are present. Translucency, in this case, means encrypting sensitive data (identities of parents, identities and schedules of babysitters) so that it is hidden even from the database itself, while yet enabling the two parties (parents, babysitters) to rendezvous.

Fast forwarding to 2009, here’s a current headline from InfoWorld: Microsoft adds access controls for SQL Azure online database. The article doesn’t say so, but this is database translucency in action.

The 2009 version of the babysitter example appears at 37:45 in this PDC session, where Dave Campbell and Rahul Auradkur discuss, and also show, a translucent pharmaceutical reagent marketplace. Dave Campbell spells out the scenario:

Pharma companies see reagents as being pre-competitive. They don’t compete at that level, and they’re willing to sell these reagents to one another, as long as nobody can see what’s being bought and sold. That’s the controlled trust we need to set up.

The trick is accomplished by means of encryption and careful separation of concerns. Access policies are isolated from data storage, capable of federation, and auditable by trusted intermediaries.

This is exciting new territory. Historically, we’ve always assumed that the operator of an online information system has complete access to the data in that service. Translucency turns that assumption on its head, and leads to entirely new service design patterns. To implement those patterns requires more than just a database in the cloud. You also need a coordinated suite of supporting services for identity, access control, auditing, and more. Azure, as it becomes one provider of such services, will help make translucency a practical reality.

Back in 2007 I talked with Pablo Castro about Astoria, which I described as a way of making data readable and writeable by means of a RESTful interface. The technology has continued to move forward, and I’m now a heavy user of one of its implementations: the Azure table store. Yesterday at PDC we announced the proposed standardization of this approach as OData, which InfoQ nicely summarizes here.

I’ll leave detailed analysis of the proposal, and the inevitable comparisons to Google’s GData, to others who are better qualified. Nowadays I’m mainly a developer building a web service, and from that perspective it’s very clear that wide adoption of something like “ODBC for the cloud” is needed. We have no shortage of APIs, all of which yield XML and/or JSON data, but you have to overcome friction to compose with these APIs.

For example, the elmcity service merges event information from sets of iCalendar feeds and also from three different sources — Eventful, Upcoming, and (recently added) Eventbrite. In each of those three cases, I’ve had to create slightly different versions of the same algorithm:

  • Query for future events
  • Retrieve the count of matching events
  • Page through the matching events
  • Map events into a common data model

Each service uses a slightly different syntax to query for future events. And each reports the count of matching events differently: page_count vs. total_results vs. resultcount. OData would normalize the queries. And because the spec says:

The count value included in the result MUST be enclosed in an <m:count>

it would also normalize the counting of results.
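As a minimal sketch of the friction described above, here is the kind of shim that count normalization currently requires. Only the three field names (page_count, total_results, resultcount) come from the services mentioned; the function name and the shape of the response dictionaries are assumptions for illustration, not any service's actual API.

```python
# The three count field names are the ones named in the text; everything
# else here (function name, response shape) is hypothetical.
COUNT_FIELDS = ("page_count", "total_results", "resultcount")


def matching_count(response: dict) -> int:
    """Return the count of matching events, whichever field name was used."""
    for field in COUNT_FIELDS:
        if field in response:
            # Some services report the count as a string, so coerce it.
            return int(response[field])
    raise KeyError(f"no known count field in response: {sorted(response)}")


# Usage: three differently shaped (made-up) responses, one common answer type.
for sample in ({"page_count": "3"}, {"total_results": 41}, {"resultcount": "7"}):
    print(matching_count(sample))
```

With a shared convention for reporting counts, a shim like this would not be needed at all.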

Open data on the web has enormous potential value, but if we have to overcome too much data friction in order to combine it and make sense of it, we will often fail to realize that value. ODBC in its era was a terrific lubricant. I’m hoping that OData, widely implemented in software, services, and mashup environments like the just-announced Dallas, will be another.

My guest for this week’s Innovators show is Gavin Bell, author of Building Social Web Applications. A lot has changed in the decade since I wrote my own book on this topic. One constant, as we discuss in the podcast, is that we still reach for special terminology like computer-supported collaborative work or groupware or social software. That won’t be true forever. Sooner or later we’ll take for granted that all networked information systems augment us collectively as well as individually. Until then, though, it remains appropriate to speak of social web applications as opposed to simply web applications.

Whatever we call this kind of software, it’s a challenge in this era of tech churn to write about it at book length. This effort succeeds by exploring patterns and principles that will endure no matter which technologies prevail. Yes, it’s an O’Reilly technical book, with the traditional animal picture on the cover — in this case, of spiders. But it’s not code-heavy. Gavin Bell aptly compares it to the polar bear book by Peter Morville and Louis Rosenfeld. Both books draw on a wealth of experience gleaned from building and evolving web applications.

For designers, developers, project managers, and online community managers, Building Social Web Applications addresses questions like:

What are the social objects at the core of our application?

How can relationships form around such objects?

Which search, navigation, access, and notification patterns can best support those relationships?

How do we evolve our application as our users gain experience with these object-mediated relationships?

We’ll be thinking about these kinds of questions from now on. Gavin Bell’s excellent book provides a framework in which to do that thinking.

Over the weekend I was poking around in the recipient-reported data at recovery.gov. I filtered the New Hampshire spreadsheet down to items for my town, Keene, and was a bit surprised to find no descriptions in many cases. Here’s the breakdown:

# of awards                               25
# of awards with descriptions              5   20%
# of awards without descriptions          20   80%
$ of awards                       10,940,770
$ of awards with descriptions      1,260,719   12%
$ of awards without descriptions   9,680,053   88%

In this case, the half-dozen largest awards aren’t described:

award amount funding agency recipient description
EE00161 2,601,788 Sothwestern Community Services Inc
S394A090030 1,471,540 Keene School District
AIP #3-33-SBGP-06-2009 1,298,500 City of Keene
2W-33000209-0 1,129,608 City of Keene
2F-96102301-0 666,379 City of Keene
2F-96102301-0 655,395 City of Keene
0901NHCOS2 600,930 Sothwestern Community Services Inc
2009RKWX0608 459,850 Department of Justice KEENE, CITY OF The COPS Hiring Recovery Program (CHRP) provides funding directly to law enforcement agencies to hire and/or rehire career law enforcement officers in an effort to create and preserve jobs, and to increase their community policing capacity and crime prevention efforts.
NH36S01050109 413,394 Department of Housing and Urban Development KEENE HOUSING AUTHORITY ARRA Capital Fund Grant. Replacement of roofing, siding, and repair of exterior storage sheds on 29 public housing units at a family complex

That got me wondering: Where does the money go? So I built a little app that explores ARRA awards for any city or town: http://elmcity.cloudapp.net/arra. For most places, it seems, the ratio of awards with descriptions to awards without isn’t quite so bad. In the case of Philadelphia, for example, “only” 27% of the dollars awarded ($280 million!) are not described.
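The breakdown shown for Keene is simple arithmetic, and a rough sketch of it follows. This is not the code behind the app; the function names and the (amount, description) tuple layout are assumptions, and the sample uses just four of the Keene rows listed above, with the descriptions shortened.

```python
def percent(part: float, whole: float) -> int:
    """Share of whole, rounded to the nearest percent (0 if whole is 0)."""
    return round(100 * part / whole) if whole else 0


def summarize(awards):
    """Summarize (amount, description) pairs by whether a description exists."""
    described = [amount for amount, desc in awards if desc.strip()]
    total_dollars = sum(amount for amount, _ in awards)
    return {
        "count": len(awards),
        "described_count": len(described),
        "described_count_pct": percent(len(described), len(awards)),
        "dollars": total_dollars,
        "described_dollars": sum(described),
        "described_dollars_pct": percent(sum(described), total_dollars),
    }


# Four of the Keene awards from the table above (descriptions shortened).
sample = [
    (2_601_788, ""),
    (1_471_540, ""),
    (459_850, "COPS Hiring Recovery Program (CHRP)"),
    (413_394, "ARRA Capital Fund Grant"),
]
print(summarize(sample))
```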

But even when the description field is filled in, how much does that tell us about what’s actually being done with the money? We can’t expect to find that information in a spreadsheet at recovery.gov. The knowledge is held collectively by the many people who are involved in the projects funded by these awards.

If we want to materialize a view of that collective knowledge, the ARRA data provides a useful starting point. Every award is identified by an award number. These are, effectively, webscale identifiers — that is, more-or-less unique tags we could use to collate newspaper articles, blog entries, tweets, or any other online chatter about awards.

To promote this idea, the app reports award numbers as search strings. In Keene, for example, the school district got an award for $1.47 million. The award number is S394A090030. If you search for that you’ll find nothing but a link back to a recovery.gov page entitled Where is the Money Going?

Recovery.gov can’t bootstrap itself out of this circular trap. But if we use the tags that it has helpfully provided, we might be able to find out a lot more about where the money is going.

A couple of years ago I was enamored with a clever password manager that pointed the way toward an ideal solution. It was really just a bookmarklet — a small chunk of JavaScript code — that used a simple method to produce a unique and strong password for the website you were visiting. The method was to combine a passphrase that you could remember with the domain name of the site, using a one-way cryptographic hash, in order to produce a strong password that would be unique to the site — and that you’d otherwise never be able to remember.
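The method can be sketched in a few lines. This is an illustration of the idea, not the bookmarklet's actual algorithm: the choice of SHA-256, the "passphrase:domain" format, the base64 encoding, and the 12-character length are all assumptions.

```python
import base64
import hashlib


def site_password(passphrase: str, domain: str, length: int = 12) -> str:
    """Derive a per-site password from a memorable passphrase and a domain."""
    # Assumption: hash "passphrase:domain" with SHA-256. The original tool's
    # hash function and input format are not specified in the text.
    material = f"{passphrase}:{domain.lower()}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Encode the digest as URL-safe base64 and keep the first `length` chars.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# Same passphrase, different domains: different passwords, nothing stored.
print(site_password("correct horse", "example.com"))
print(site_password("correct horse", "example.org"))
```

Because the derivation is one-way and deterministic, the same passphrase and domain always yield the same password, and nothing needs to be stored or sent anywhere.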

It wasn’t perfect. Sometimes the passwords it generated wouldn’t meet a site’s requirements. And sometimes the login domain name would vary, which broke the scheme. But it introduced me to two powerful — and related — ideas. JavaScript could turn your browser into a programmable cryptographic engine. And that engine could be used to implement protocols that relied on cryptography but transmitted no secrets over the wire.

To my way of thinking, that’s a killer combination. For years I’ve been using Bruce Schneier’s Password Safe, a Windows program that keeps my passwords in an encrypted store. There are many such programs, another example being 1Password for the Mac. This kind of app lives on your computer and talks to a local data store. That means it’s cumbersome to move the app and your data from one of your machines to another. And you can’t use it online, say from a public machine at the library or a friend’s computer.

Imagine a web application that would encrypt your credentials and store them in the cloud. It would deliver that encrypted store to any browser you happen to be using, along with a JavaScript engine that could decrypt it, display your credentials, and even use them to automatically log you onto any of your password-protected services. You’d trust it because its cryptographic code would be available for security pros to validate.

I’ve wanted this solution for a long time. Now I have it: Clipperz. My guest for this week’s Innovators show is Marco Barulli, founder and CEO of Clipperz, which he describes as a zero-knowledge web application. What Clipperz has zero knowledge of is you and your data. It just connects you with your data, on terms that you control, in a way that reminds me of Peter Wayner’s concept of translucent databases.

Clipperz is immediately useful to all of us who struggle to manage our growing collections of online credentials. But it’s also a great example of an important design principle. We reflexively build services that identify users and retain all kinds of information about them. Often we need such knowledge, but it’s a liability for the operators of services that store it, and a risk for users of those services. If it’s feasible not to know, we can embrace that constraint and achieve powerful effects.