A conversation with Richard Wallis about the Talis semantic web platform

I forgot to mention, before vanishing for the holidays, that I had published this ITConversations interview with Richard Wallis about the semantic web platform that UK-based library systems vendor Talis is building. In a sort of podcasting pas de deux, Richard simultaneously interviewed me for his podcast. Part of that interview was excerpted, and commented on, at ReadWriteWeb under the link-provoking title Sexy Librarians of the Future Will Help You Upload Your Videos to YouTube.

From oil to wood pellets: New England’s home heating future

For many of us living in New England during the first winter of oil prices above three dollars a gallon, discussions of information technology have given way to discussions of home heating technology. We’re at the beginning of an adoption curve here that will rival earlier waves of adoption — first of iron woodburning stoves, then central heating systems fueled by coal and now mostly oil.

Our future fuel is biomass, in the form of sawdust and other recycled wood waste compressed into pellets that look like rabbit food. They burn cleanly and efficiently in airtight stoves that use augers to feed the pellets slowly and steadily from hoppers into surprisingly small combustion pots.

Like several of our friends, we jumped on the bandwagon last year. Ours is an insert that converts an otherwise useless fireplace into a heat source that’s displacing a sizable chunk of our oil usage. Another friend uses a standalone unit which, because pellet stoves produce so little exhaust heat, is vented through a wall without requiring a tall flue. Two other friends have pellet stoves in their basements, one heating a small house passively and another heating a larger house through its pre-existing forced hot air ducts.

We’re all new to this game, so when we get together we compare notes on stove designs, pellet prices and quality, heat distribution strategies, cleaning and maintenance, and of course effectiveness. Everyone’s story is different. My friends who are leveraging their pre-existing forced hot air system have just about kissed the oil truck goodbye and are saving a lot of money. But they’re not quite as comfortable upstairs in really cold weather as they used to be. Others with hot water radiator systems, including us, aren’t doing nearly that well. We’re using Rube Goldberg arrangements of fans to distribute the hot air that the stoves blow, but that only gets you so far in an old New England house with lots of small rooms. For us, it’s a supplement that’ll pay for itself in a few years, but not a replacement. And we’re way less comfortable in parts of the house than we used to be.

Still, this technology represents a path to a sustainable future in which we use a locally-produced commodity to heat our homes much more cleanly and efficiently than wood ever did before, at a cost that’s already way below oil and will only look better as oil continues to skyrocket.

For most of us, though, it’s not yet a perfect replacement. And for all of us, it’s not automatic. A truck doesn’t show up at the house to pump pellets into a giant tank. We buy them by the ton, and they’re delivered on pallets bearing fifty forty-pound bags that we haul inside, stack, and then dump one at a time into our hoppers. The reload interval varies from less than a day in our case, to up to a week in other cases. Although these stoves produce very little ash — a few ounces per 40-pound bag — the ash removal chore also varies from days to months depending on the design of your stove. In terms of convenience, heating with pellets is more like heating with wood than heating with oil or gas. There’s hauling, there’s loading, and there’s maintenance.

But things will change. A home like mine, with an oil burner and hot water circulating through radiators, will use a boiler that heats the water with pellets instead of oil. These are emerging in Europe, but only starting to come onto the market here. I don’t know anyone who uses one to heat a private residence. But the Harris Center — a nature conservancy in the nearby town of Hancock — is heating 10,000 square feet for $1700/year using a pellet boiler. I was driving by Hancock today and stopped to take a look. The place was closed, but I saw the silo that stores and automatically delivers pellets. For the oil tanks in basements like mine, that’s the handwriting on the wall.

The itemized electric bill

W. David Stephenson, a homeland security and disaster management strategist who shares my interest in citizen use of government data, pointed me to this item which suggests that if data produced by smart electric meters were shared in social networks, we could work together to optimize our energy use.

Sounds great. But as I understand it, although these meters report usage in hourly or even 15-minute intervals — thereby enabling utilities to fine-tune pricing and customers to fine-tune usage — they still can’t itemize your bill on a per-appliance basis. That would be a killer application. Imagine a little device that sits between the appliance’s plug and the wall socket, measures the power use, and reports that data over the AC network to a collector. Each device would be coded, you’d map the codes to appliances (TV, refrigerator, toaster, computer), and you’d wind up with a fully itemized accounting of where all the power goes. No guessing about the payback period for a new and more efficient refrigerator, you’d just know. A few years down the road, if your new Energy Star fridge starts to leak, you’ll be alerted to the fact and know to check the seals.
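The accounting step in that scenario could be sketched roughly as follows. This is only an illustration in Python: the device codes, the reading format, and the `itemize` function are all hypothetical, and nothing here assumes that any real meter or plug-level device reports data this way.

```python
from collections import defaultdict

# Hypothetical mapping from device codes to the appliances they monitor.
DEVICE_TO_APPLIANCE = {
    "A1": "refrigerator",
    "B2": "TV",
    "C3": "toaster",
}

def itemize(readings, device_map):
    """Sum watt-hour readings per appliance.

    readings: iterable of (device_code, watt_hours) tuples, in an
    assumed format that a collector might receive.
    Readings from unmapped devices are grouped under "unknown".
    """
    totals = defaultdict(float)
    for code, watt_hours in readings:
        totals[device_map.get(code, "unknown")] += watt_hours
    return dict(totals)

# Invented sample readings for illustration.
readings = [("A1", 120.0), ("B2", 80.0), ("A1", 130.0), ("Z9", 5.0)]
bill = itemize(readings, DEVICE_TO_APPLIANCE)

# Print the itemized bill, biggest consumer first.
for appliance, wh in sorted(bill.items(), key=lambda kv: -kv[1]):
    print(f"{appliance}: {wh / 1000:.3f} kWh")
```

The hard part, of course, is the hardware that produces the readings, not the bookkeeping.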

In this scenario the network effects would get really interesting. When contemplating the purchase of that new fridge, for example, you could go beyond the rated performance to the actual performance as measured by other users of that model. And maybe even adjust for factors like the number of kids in the house who are likely to stand in front of the open fridge door pondering their options.

As Amory Lovins points out, you can’t manage what you can’t measure. His MAP/Ming lectures on power use in industry are full of stories about unmeasured energy flows that, when instrumented, yield easy yet dramatic optimizations.

In software, of course, we know this instinctively. To optimize code, we inject instrumentation that shows us the hot spots where programs spend inordinate amounts of time. We need to inject the same kind of instrumentation into the electrical devices in our homes. I’m no engineer so I’m just asking: Is there conceivably a cheap, low-tech, easily-installed way to do it?

Collaboration plus productivity

As conversations about Office Live Workspace emerge, I expect to read more like this one:

Tim: What happens when you are working on a document at the airport, your wi-fi pass expires, and you hit Save? Maybe a beta tester can answer this. Does Word or Excel prompt for a local copy instead? And if you save such a copy, how do you sync up the changes later?

Kip: Just tried it, Tim. Opened a document from Workspaces, made a change, disconnected from the internet, and hit “Save to Office Live”. Word immediately opened the local My Documents folder and offered to save it there, just as if it was a local file.

Tim: Thanks for checking this out. That’s not ideal in my view. It should save it locally but automatically synch it at the next opportunity.

In fact, I think the folder in which Workspace offered to save Kip’s file was a Workspace folder, not a local one. At least, that’s what I saw when I repeated the experiment:

From there, I clicked the Save button and saw this:

Fair enough. Network problems were preventing a connection to the server. I’d have needed to make a conscious choice to save locally.

What then? A variety of scenarios are plausible. That’s the good news, but also the bad news, about having an offline option. When editable documents can live on multiple local drives and in the cloud, people need to be aware of multiple names and locations. It’s inherently harder to collaborate. But you can untether, and you can use full-strength applications.

With Google Docs, where there is only a single cloud-based editable document, people need only keep track of one name and location. It’s inherently easier to collaborate. But you’re always tethered, and (for now) you can only use rudimentary applications.

Given the differing capabilities — and ubiquities! — of browser-based and native applications, there is no single right approach. But as always, extremes converge. Office Live Workspace already has web-only components, like its simple AJAX note writer, which tends toward the model of a single cloud-based editable document and optimizes for collaboration. Meanwhile Zoho already has a Google Gears-based offline mode, and a plug-in for Word and Excel. It goes the other way, enabling editable documents to be in more than one place and enhancing productivity.

One way or another, we arrive at a midpoint along the continuum which is aptly described by the phrase software plus services. The challenge for everyone will be to figure out how best to merge the collaboration benefits of one model with the productivity benefits of the other.

I am (not) Spock

Tim Bray wants to know if/why spock.com matters. Here’s why I think it does. At some point, people are going to throw up their hands in disgust when invited to sign up for yet another service in order to assert or defend their online identities. So, for example, Spock thinks that Jon Udell is the inspector general at the Department of Justice, based on these two blog postings of mine. In fact, that person (whom I will not name here in order to avert yet more identity confusion1) is represented thusly in Spock.

I have no interest whatsoever in setting Spock straight about these facts, because I know that effort won’t carry over to ZoomInfo or to anywhere else.

I have a huge interest in establishing a presence, anchored somewhere in the emerging identity metasystem, to which I can refer Spock and ZoomInfo and other services. If Spock inspires other folks to appreciate why they might want to establish such presences for themselves, that’d be great. And based on some of the reactions I’m seeing, perhaps Spock will help us get there.

Isn’t it delightful, by the way, that both of these books exist?

I am Spock / I am not Spock


1 Of course, by writing the phrase “Jon Udell is the inspector general” I am probably ensuring that it will show up here.

“The discovery of irregular patterns by group noticings”

Phil Shapiro sent me this unusual and fascinating request:

On the local neighborhood email list I’m on here in Greenbelt, Maryland, there’s a growing sense about a problem with postal mail being delivered on time. It’s beginning to look as if a fatigued postal worker may be doing the proverbial “storing mail at his/her home” rather than delivering it.

Local mail is taking a full week to be delivered and some mail is not arriving at all. It’s been fascinating to watch the growing consensus that something may be amiss.

The discovery of irregular patterns via group noticings needs a name. We need a name for this process to better pinpoint the source and extent of viral outbreaks (such as the pandemic flu) and we need a name for this process for countless other ways that individuals look out for the well being of others.

Thanks for asking your readers if they can think of the right neologism for this process.

I just love the phrase “the discovery of irregular patterns via group noticings.” At first blush, it certainly seems worthy of a neologism! Then again, there is nothing new about group noticings. This was always the way of tribal, village, and agricultural life, before the industrial age made our daily activities less visible to one another. So maybe there’s a perfectly good old word just waiting to be dusted off and brought back?

Technical mastery requires social innovation

A number of times, recently, I’ve made an assertion with which nobody has disagreed. The assertion is that if we invented no new information technologies for the next five or ten years, we could nevertheless move the ball significantly forward by consolidating gains that we should have made by now, but haven’t. My argument is that what people don’t know and seemingly cannot learn about computers, software, and information systems represents what Amory Lovins, speaking in terms of energy, calls negawatts, a resource whose value springs not from new production but from the rethinking and improved utilization of existing resources.

As the software pendulum swings back and forth, we alternately hail the simplicity of interfaces that do very little but are easily learned (Google Docs), and the power of interfaces that do much more but are much harder to master (Microsoft Office). Arguments for the former presume that the latter are doomed because most people never learn to use most of their power. That’s true. But does that mean that most people will never be able to make better use of that power? If so, if we assume that people are simply uneducable in this regard, then it’s a problem across the board. Because even the simplest online application can do much more than people know or appreciate.

For example, del.icio.us looks to be bare-bones simple, and in a way it is, but to use it effectively you have to master some strategies that today elude almost everybody. In a comment on that entry, Tessa Lau writes:

In order to accomplish your #1 and #2 above, people need to both realize that they can do that database query, and that they can refer to the results using a stable URL. I’m coming to believe that both those operations are still way beyond the capabilities of mainstream web users.

Here’s a related example from Gmail. Recently, the application’s URLs became more RESTful. A message URL now looks like this: https://mail.google.com/mail/#inbox/116edd484f4ca72e. Why? So that you can bookmark it, exchange it, compose it with other things. Almost nobody will, of course. But are these operations truly beyond the capabilities of mainstream web users? Or are they just skills that aren’t easily transmissible in the current environment, but might be in a differently-designed environment?

Tessa Lau’s CoScripter is, of course, a beautiful example of such a differently-designed environment. It enables people to share experiential knowledge about the use of software in a relatively frictionless way. In the realm of screencasting, Jing is another way to reduce the friction of sharing such knowledge.

My point holds no matter where the pendulum happens to be at the moment. Across the spectrum of application styles, software can do a better or worse job of augmenting human capability. Simplification is important and useful, but it’s not all that matters. Mastery of the more complex matters too. And people can handle that.

As Lucas Gonze notes here, reading and writing musical notation was once a much more common skill than it is today. The 19th-century parlour music that he’s recovering and bringing back to life was, Wikipedia says, “intended to be performed in the parlours of middle class homes by amateur singers and pianists.” Were those amateur singers and pianists more capable than their counterparts today? No, they were just embedded in a culture that was attuned to a certain sort of peer production.

The peer production of our era is based increasingly on software applications and online resources. If we aspire only to the common denominator, and assume that no forms of mastery will matter, then we do ourselves a great disservice. People can attain mastery in an environment that encourages it. Creating that environment would in fact be a major innovation, albeit more social than technical.

The psychic burden of online registration

Verizon just sent me the following email:

Your request to be removed from the Online Billing program has been received and accepted. Thank you for your participation.  Please do not reply to this Email.  Should you have questions, please contact us at http://www.verizon.com/contactus.Please do not reply to this Email.  For questions or comments, please contact us at http://www.verizon.com/contactus.

I have multiple accounts with Verizon. A couple of years ago, I took advantage of the option to stop receiving paper bills, and instead receive them electronically in my online banking system where I pay them. Now, for no apparent reason, online bill presentment has been canceled for one or more of those accounts. But I never requested to be removed from the Online Billing program!

The headers on the email message convince me that it is genuinely from Verizon, despite its repetition and the oddly constructed link. There’s no whiff of phishing here, and I’m not being asked to do anything; it’s just a misguided notification.

My options were:

  1. Debug whatever went wrong with my current online bill presentment.
  2. Sign up for Verizon’s own online bill payment service.
  3. Begin receiving paper bills again.

It is a sad commentary that, after contemplating the amount of my time and effort required for each of these options, I chose #3.

Is it just me, or is the psychic burden of online registration reaching epic proportions?

The wisdom of which crowd?

Dave Megginson recounts an interesting experience with the collective mind of Gmail users. When messages from a customer of his went AWOL, he found them in the Gmail spam trap. He surmises that even though this sender played by the opt-in/opt-out rules, the crowd made its own rule.

My guess is that they sent out an announcement, a lot of other gmail-users flagged it as spam, and whatever weighting algorithm gmail uses tipped it over so that the messages were no longer considered legit by default.

Dave concludes:

This new collaboration is an unexpected side-effect of the shift from desktop e-mail clients to web mail, and it would be foolish for companies not to pay attention.

I agree that companies should pay attention, and Dave’s list of do’s and don’ts (e.g., “I don’t care that your company just won five awards — SPAM!“) is spot on.

But while it’s true that we see this effect as a consequence of the shift from desktop e-mail to webmail, there’s no reason in principle why desktop e-mail clients can’t also contribute to this new crowd wisdom. Virtually all e-mail programs offer a “Mark as Spam” option. Those votes could be collected and processed by any cloud service, and not necessarily by one bound to any particular webmail service.

In Dave’s case, for example, he might rather not rely on the collective wisdom of the random group of folks with whom his Gmail account happens to be colocated, but instead on some trust circle of his own choosing or making.

The fact that we can’t separate these concerns is unrelated to the architectural choice of web-based versus client-based software. In both contexts we could specify whose wisdom we would like to add to our spam filters. We don’t have that choice today because we’ve barely scratched the surface when it comes to exposing services that users can compose to their liking.
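To make the idea concrete, here is a minimal sketch, in Python, of what choosing whose wisdom to use might look like. The vote format, the names, and the `spam_score` function are assumptions made for illustration; no existing mail client or service is known to expose votes this way.

```python
def spam_score(votes, trusted):
    """Fraction of trusted voters who marked a sender as spam.

    votes: dict mapping voter name -> True (spam) or False (not spam).
    trusted: set of voters whose judgment this user chooses to rely on.
    Returns None when no trusted voter has voted.
    """
    relevant = [is_spam for voter, is_spam in votes.items() if voter in trusted]
    if not relevant:
        return None
    return sum(relevant) / len(relevant)

# Invented votes on a single sender.
votes = {"alice": True, "bob": False, "carol": False, "mallory": True}

# The same votes, weighed by two different crowds.
everyone = spam_score(votes, set(votes))
my_circle = spam_score(votes, {"alice", "bob", "carol"})
```

The votes are identical in both calls. Only the crowd changes, and with it the verdict.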

Elsewhere, meanwhile, Doug Purdy notes the integration between Google Talk and Google Reader — whereby shared items become visible to friends — and remarks:

I still can’t believe we don’t have something like Live Reader.

Of course Live Reader, like Google Reader, would almost certainly expect me to share within the artificial context defined by use of the service. It’d be refreshing to see a different take, one that would enable me to reach out to friends and associates across a range of reading and bookmarking services.

How HD Photo will make happy snappers even happier

Back in July I interviewed Bill Crow about HD Photo, the image format that’s being considered for standardization as JPEG XR. One of the advantages of this new format, as Bill explained on his blog, is that it can preserve data that would normally be lost when a camera decides what color values to include in a photo. Now, there aren’t yet any cameras that implement HD Photo, but the idea is that when they arrive, you’ll get the best of both worlds. As a JPEG camera does today, an HD Photo camera will produce an image that distributes color values as best it can. But unlike a JPEG camera, an HD Photo camera won’t throw away all the values it doesn’t include. More information will be preserved in the image, and will be recoverable in the editing process.

Bill demonstrates and explains that editing process in this 5-minute screencast (Silverlight, Flash). The editing application is nothing fancier than Windows Live Photo Gallery, and that’s an important point. Most people, myself included, are not wizards in the realm of color theory and advanced image manipulation. We’re happy snappers. We’d just like to be able to move a slider and pull in some information that the camera didn’t assign to the visible range but that we want to include. This screencast shows how easy that will be.

Discovering versus teaching principles of social information management

In response to Josh Catone’s observation that del.icio.us has failed to go mainstream, Richard Ziade offers three hypotheses:

  1. Nobody really needs a way to centrally store their bookmarks
  2. Most people don’t understand what del.icio.us does
  3. People don’t feel compelled to share del.icio.us with others

The winning explanation, I am sure, is #2. Nobody understands what del.icio.us does. I am constantly explaining the nature and value of its social information management capabilities. Just this week, in various meetings on Microsoft’s Redmond campus, I found myself reiterating four of my major uses of del.icio.us.

1. Answering a question with an URL.

I’m often asked questions like “What have you written about how to do screencasting?” I answer with an URL:

http://del.icio.us/judell/screencasting+howto

This is not only wildly efficient, it’s future-proofed. If I hand you that URL today, then later add new items to the list, you’ll pick them up if you visit the URL in the future.

2. Del.icio.us as a database.

The URL shown above is an example of the pattern I discussed here. It’s actually a query: select all bookmarks where one tag is screencasting and another tag is howto. If you understand that such queries are possible, judicious assignment of tags becomes a data management discipline.
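For readers who think in code, the query behind that URL amounts to a tag intersection. Here is a minimal sketch in Python, run against an invented in-memory list of bookmarks; the `query` function and the sample data are hypothetical, and this is not the del.icio.us API.

```python
def query(bookmarks, *tags):
    """Return the bookmarks that carry every one of the given tags."""
    wanted = set(tags)
    return [b for b in bookmarks if wanted <= set(b["tags"])]

# Invented sample bookmarks for illustration.
bookmarks = [
    {"url": "http://example.com/a", "tags": ["screencasting", "howto"]},
    {"url": "http://example.com/b", "tags": ["screencasting"]},
    {"url": "http://example.com/c", "tags": ["howto", "python"]},
]

# Equivalent in spirit to the path judell/screencasting+howto.
results = query(bookmarks, "screencasting", "howto")
```

Only the first bookmark carries both tags, so it is the only result. A bookmark added later with the same two tags would show up the next time the query runs, which is why the URL stays current.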

3. Collaborative list curation.

As discussed here:

Recently I began keeping track of interesting public data sources using the del.icio.us tag judell/publicdata, and invited others to do the same using their own del.icio.us accounts. That method sets up an interesting pattern of collaboration whereby all contributions flow up to the global bucket, tag/publicdata, but individual contributors can curate subsets of that collection according to their own interests.

It’s a powerful pattern for loosely-coupled collaborative list-making.

4. Feedback monitoring.

When I’m visiting an URL, I often use my del.icio.us citations bookmarklet (available here) to see who has bookmarked the URL, which quotation and tags were used to describe it, and what the history of attention to that URL has been over time.

Is it del.icio.us’ fault that, even in the geek subculture where the service is mainly used, so few people seem to discover and exploit these patterns? I wonder about this all the time, and not just with respect to del.icio.us. True, all of our information management tools could do a better job making features more easily discoverable. But to grok the patterns and apply the strategies I’m talking about, it’s not enough to know that features exist. You need to develop a sense of how those features can be used in support of certain principles of personal and social information management. It would be great if we could create software that naturally leads us to the discovery of those principles. But that’s a tall order. While we’re waiting, I think we should admit that these principles ought to be part of what you learn in order to become a digitally literate 21st-century citizen.

Professional services for professional blogs

This morning my web presence intersected on the Information World Review blog with the web presence of Ben Toth. In an IWR interview, Ben describes himself as follows:

Ben Toth, 48, domiciled on a farm in Herefordshire. I trained as a librarian at University College London about 15 years ago. I used to be the director of the NHS National Knowledge Service when it was part of Connecting for Health. The best known service it runs is the National Library for Health (www.library.nhs.uk). Currently, I’m designing the enterprise architecture for the National Institute for Health Research (www.nihr.ac.uk). I’m also writing a book on Health 2.0, which will be published in parts later this year.

Further along in the interview:

Q: How long have you been blogging?
A: Since about 2001. Eighteen months ago I lost all my entries and had to start again.

This is nuts. Never mind the posthumous disposition of the writings of this librarian and enterprise architect. They are not even reliably available here in the present.

Here’s another example. Recently John Halamka, whom I interviewed here, launched a remarkable example of the genre I call the professional blog — by which I do not mean blogging for pay, but rather the purposeful narration of a professional life. At geekdoctor.blogspot.com, Dr. Halamka has opened a window into the life of a dynamic individual whose insights into healthcare IT, and whose stewardship of key initiatives and standards in the area of portable health records, will be historically significant but are also important touchstones here in the present.

And yet…geekdoctor.blogspot.com? That’s the best we can do? Again, I’m not picking on any particular service. None of the present options offer anything close to the levels of service that a professional person investing real effort into the narration of a professional life ought to expect.

For Dave Winer, for me, for Ben Toth, for John Halamka, and for a growing number of professional bloggers in the sense I’m defining the term, there’s got to be a better way. We don’t need services that are free. We need services that are reliable here in the present, and that offer tiered levels of future assurance. If you build it, we will pay.

Matt MacLaurin on creative expression with Boku

In this week’s ITConversations show with Matt MacLaurin we discuss Boku, a programming environment in which kids can create their own games.

What inspired Matt to create Boku was the following observation:

If I’m a kid today, looking at the computer, am I going to see it as an art tool, as something that’s there for creative expression? Or am I going to see it as a content channel, really just a television with some interactivity and a whole lot of channels?

He contrasts that with his own early experience of computers:

If you turn on a Commodore Pet, or an Apple II, the first thing you get is the Basic prompt, and it’s really just saying that the computer is a blank slate, and it’s waiting for you to create something in code. That idea that the computer screen is a surface you can paint your ideas on, and then have those ideas come to life in a magical way, that’s what I wanted to recapture.

True, the pendulum is swinging back toward an architecture of participation. Text, audio, and video artifacts are pouring onto the net. But, says Matt, “you can express creative ideas in programming that you can’t really express in any other way.”

Exploring ideas in the realms of gaming and simulation should be as accessible, Matt thinks, as blogging and podcasting have become. So he’s created, in Boku, a programming environment where programs always run, are never incorrect, and are developed by the accretion of actors, of objects, and of rules that govern their interactions.

The roots go back to Logo and Smalltalk, but there’s also a very modern aspect: the system is decentralized and loosely-coupled, behavior is emergent. The robotics examples that Henryk Nielsen demonstrates in this screencast are spiritual cousins to Boku: loosely-coupled systems of independent, rule-governed actors.

If Boku teaches kids to think in these terms, I think they’ll be well prepared not only to create their own interesting games and simulations, but also to negotiate the emerging ecosystem of online services. With the Commodore Pet and the Apple II you were the ruler of the universe. But no more. As programmers who act in a world of loosely-coupled services, we can no longer aspire to total control. Instead we need to learn how to be effective consumers, producers, and — above all — interactive peers.

Boku is also, by the way, fundamentally social.

If you make something that’s cool, not a 4-hour game but just a 15-second interesting visual phenomenon, you give it a name and maybe a tag, and you hit a button, and it goes up into the cloud so other people can pull it down and take it apart. You can’t download a game without getting the source, Boku doesn’t really distinguish between the two. We consider authoring and tweaking to be part of the play.

Matt recalls a conference at which Ray Bradbury spoke, and said that while he was impressed with all the tech, the real problem lay elsewhere.

He held up a blank sheet of paper. What do you do when you’re faced with a blank sheet of paper, and you don’t have an idea? How do you come up with that idea? How do you express it? Our shorthand for that is the blank page problem, and we think about it all the time for Boku. The community is the answer. If you’re stuck, go and surf what others have uploaded today. If you can’t figure out how to make the motorcycle jump over the canyon, go and browse motorcycle and jump, and look, someone else did it. So you pull it down, and of course the code’s built in. It’s obvious I guess, but the Internet as an infrastructure for creative communities to come together — that’s really what it’s all about.

In case you can’t tell, I hugely enjoyed this interview. I hope it will be heard not only by those who care specifically about game programming but also by those broadly interested in creativity, education, and commons-based peer production.

From Simple Sharing Extensions to FeedSync

It’s been a couple of years since Ray Ozzie kicked off the Simple Sharing Extensions (SSE) initiative on his blog. It’s not so easy, by the way, to know exactly how many years it’s been. If you search for sse ray ozzie you’ll land on this page which is dated Nov 20, but no year is mentioned. I had to browse around in the archive to remind myself it was November 2005. Blogging platforms — and I’m not singling out Live Spaces here, they’re all in the same boat — aren’t good places for documents that turn out to have historical significance. Of course you never know what will turn out to be significant. That’s why I’ve been evangelizing the idea of a hosted lifebits service that will keep our stuff intact, and available for reliable long-term citation. But, I digress.

SSE was updated this week, and renamed as FeedSync. Over on Channel 9 I published a podcast and a screencast (Silverlight, Flash) with Steven Lees, one of the folks working on FeedSync. The screencast walks through an example (available at CodePlex) in which two simple list-making applications, running on two different machines, synchronize the insertion and deletion of items. In this case they talk through a relay service, but that’s not required. FeedSync is agnostic to topology and transport, specifying only how to represent updates, deletions, and conflicts in RSS and Atom feeds, and how to process those events into a merged result.
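To give a flavor of what processing those events into a merged result involves, here is a deliberately simplified sketch in Python. It is not the algorithm from the FeedSync spec, which tracks a fuller update history for each item. The record layout, the `merge` function, and the rule of ranking versions by update count, then timestamp, then author are assumptions made for illustration.

```python
def merge(local, incoming):
    """Merge two lists of item records into one, keyed by item id.

    Each record is a dict with keys: id, updates (int), when (sortable
    timestamp string), by (endpoint name), deleted (bool), content.
    Returns (merged_items, conflicts).
    """
    def rank(record):
        # Assumed ordering: more updates wins, then later time, then name.
        return (record["updates"], record["when"], record["by"])

    merged = {item["id"]: item for item in local}
    conflicts = []
    for item in incoming:
        current = merged.get(item["id"])
        if current is None:
            merged[item["id"]] = item  # new item, just add it
            continue
        if rank(item) == rank(current):
            continue  # same version on both sides
        winner, loser = (item, current) if rank(item) > rank(current) else (current, item)
        if winner["updates"] == loser["updates"]:
            conflicts.append(loser)  # concurrent edits: keep the loser for review
        merged[item["id"]] = winner
    return list(merged.values()), conflicts

# Invented sample data: two endpoints, A and B, editing a shared list.
local = [
    {"id": "1", "updates": 1, "when": "2007-12-01T10:00:00Z", "by": "A",
     "deleted": False, "content": "milk"},
    {"id": "2", "updates": 2, "when": "2007-12-02T09:00:00Z", "by": "A",
     "deleted": False, "content": "eggs (dozen)"},
]
incoming = [
    {"id": "1", "updates": 2, "when": "2007-12-03T08:00:00Z", "by": "B",
     "deleted": True, "content": "milk"},
    {"id": "2", "updates": 2, "when": "2007-12-02T09:30:00Z", "by": "B",
     "deleted": False, "content": "eggs (half dozen)"},
    {"id": "3", "updates": 1, "when": "2007-12-03T08:05:00Z", "by": "B",
     "deleted": False, "content": "bread"},
]

items, conflicts = merge(local, incoming)
visible = [i["content"] for i in items if not i["deleted"]]
```

In this run, B's deletion of item 1 wins because it has the higher update count, B's edit of item 2 wins on timestamp while A's version is set aside as a conflict, and item 3 is simply added. Note that a deletion is carried as a flagged record rather than as an absence, so that it can propagate like any other update.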

The merge algorithm described in the spec isn’t something most of us are going to be able to bang out quickly and reliably, so I’ll be on the lookout for implementations that package up the logic. One of those is the Microsoft Sync Framework, which has just updated the Microsoft.Synchronization.Sse namespace to Microsoft.Synchronization.FeedSync. I haven’t found good examples for using it, so if you can point me to some I’d appreciate that. Similarly I’m curious to know about other implementations of the synchronization logic for other programming environments. Or, equally interesting, implementations delivered as cloud services.

Although FeedSync is capable of full-blown multi-master synchronization, there are all kinds of interesting uses, including simple one-way uses. Consider, for example, how RSS typically has no memory. Most blogs publish items into a rolling window. If you subscribe after items have scrolled out of view, you can’t syndicate them. A FeedSync implementation could enable you to synchronize a whole feed when you first subscribe, then update items moving forward. It could also enable the feed provider to delete items, which you might not want if the items are blog postings, but would want if they’re calendar items representing cancelled events.
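The one-way case is simple enough to sketch in a few lines of Python. This is an illustration under assumptions, not FeedSync itself: items are plain dicts, the field names id, updates, and deleted are my own shorthand for the sync metadata, and apply_feed and visible are hypothetical helpers. The idea it shows is that a deletion travels as a marked item (a tombstone) rather than as an absence, so a subscriber can tell “cancelled” apart from “scrolled out of the window.”

```python
def apply_feed(store, feed_items):
    """Apply one-way updates from a publisher's feed to a local store keyed by id."""
    for item in feed_items:
        current = store.get(item["id"])
        if current is not None and current["updates"] >= item["updates"]:
            continue  # we already hold this version or a newer one
        if item.get("deleted"):
            # Keep a tombstone so a stale copy seen later can't resurrect the item.
            store[item["id"]] = {"id": item["id"], "updates": item["updates"], "deleted": True}
        else:
            store[item["id"]] = dict(item)
    return store


def visible(store):
    """Items a reader should actually be shown."""
    return [item for item in store.values() if not item.get("deleted")]


# First subscription: take the whole feed (hypothetical calendar items).
calendar = {}
apply_feed(calendar, [
    {"id": "e1", "updates": 1, "title": "Town meeting"},
    {"id": "e2", "updates": 1, "title": "Library talk"},
])

# Later: the publisher cancels one event.
apply_feed(calendar, [{"id": "e2", "updates": 2, "deleted": True}])
print([item["title"] for item in visible(calendar)])
```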

RSS took off in part because it was human-readable and -writeable. To this day, you can get away with doing simple things “by hand”. With FeedSync, the RSS and Atom formats are still pretty easy to read and write. But nobody is going to do the rule-governed transformations by hand. For that we’ll need an ecosystem of libraries and services. I hope they’ll emerge because FeedSync is an extremely general mechanism that could be applied in all sorts of useful ways.

Simile: Semantic web mashups for the rest of us

I was at MIT yesterday to give a talk, and afterward visited with the Simile project team. I’d known a bit about their semantic web efforts, notably Piggy Bank, a Firefox extension that hosts JavaScript-based screenscrapers that extract data from web pages. But there’s been a ton of development since then, and it’s all good.

The best way to describe what I saw yesterday is probably to start with this example. It’s what the Simile team calls an exhibit, which is a web page that performs faceted browsing of a data set. In this case, my example exhibit is actually a mixture of two others: the CSAIL faculty page, and the CCNMTL staff page. If you visit either of those exhibits you’ll find that you can restrict the view by selecting, for example, the group facet. The CSAIL page exposes another view called position. (The CCNMTL page doesn’t expose its analog to that facet, but if it did, it would be called title.)

View source on either of these pages and you’ll find a very simple chunk of HTML that enumerates and styles the included data elements. You’ll also see references to two JavaScript files. One contains all the AJAX behavior that populates the page with data and drives the interactive experience. The other contains the data, which is a JSON serialization of a simplified form of RDF.
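For readers who haven’t looked at one of these data files, here is a rough Python sketch of what the data looks like and what faceted browsing does with it. The records are invented placeholders, not the real CSAIL or CCNMTL data, and the facet_counts and restrict functions are hypothetical names of mine; the real behavior lives in the Simile JavaScript. The overall shape, a list of items each carrying a label plus arbitrary fields, is my understanding of the format.

```python
import json
from collections import Counter

# Placeholder data in the general shape of an exhibit data file.
raw = """
{
  "items": [
    {"type": "Person", "label": "Person A", "group": "Group X", "position": "Professor"},
    {"type": "Person", "label": "Person B", "group": "Group X", "position": "Research Scientist"},
    {"type": "Person", "label": "Person C", "group": "Group Y", "position": "Professor"}
  ]
}
"""
items = json.loads(raw)["items"]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet (what a facet panel lists)."""
    return Counter(item[facet] for item in items if facet in item)


def restrict(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(name) == value for name, value in selected.items())]


print(facet_counts(items, "group"))
print([item["label"] for item in restrict(items, group="Group X", position="Professor")])
```

Nothing here depends on which fields exist. A facet is just a field name, which is why swapping in a different data file works so easily.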

The first thing to notice is that you can rip and replace the data reference. In fact that’s what happened when one of these pages was cloned from the other. The clone then proceeded to rename and specialize its data set, yielding a seemingly incompatible result.

So how did I create a merge of the two? Enter Potluck, a shockingly capable AJAX-style data mixer. I referenced the two data sources, and then combined analogous fields. I merged CSAIL’s position and CCNMT’s title into a single field called position. Even more interesting is the field called building. The CCNMT data has a thing called building, but the CSAIL data has nothing really comparable. There’s tower, but that’s not equivalent. In fact, the CSAIL equivalent to CCNMT’s building is the prefix in the CSAIL office field. In an office value of 32-G606, for example, the building is implicitly building 32.

David Huynh, the original author of Potluck, showed me how to extract that implicit information using a feature called simultaneous editing. Here’s how that works:

The editor groups things into similar columns. You can then adjust an entire column by editing a single entry. Here I’ve inserted “Building” in front of “32” and selected and deleted everything following “32”.

To define a new column you drag/drop fields. To create the merged position field, I dragged position from CSAIL, and title from CCNMT, then renamed the combined result as position.

To define a facet, you drag/drop the column name to another area of the canvas. For my merged exhibit, I used the facets origin (i.e., CSAIL vs CCNMT), plus group, position, and building.

Then I exported the merged data into the same JSON format as the original sources, cloned one of the pages, and referenced the merged data set. From there it was just a bit of tweaking to make the div elements in the HTML page reference the facets that I’d defined.
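Potluck does all of this by drag and drop, but the same steps can be written down as a short Python sketch. Everything in it is illustrative: the two record lists are invented placeholders, the functions building_from_office and normalize are hypothetical names, and I’m assuming the merged file uses the same list-of-items shape as the sources. It shows the three moves described above: renaming title to position, deriving building from the office prefix, and tagging each record with its origin.

```python
import json

# Placeholder records standing in for the two source data sets.
csail = [{"label": "Person A", "position": "Professor", "office": "32-G606"}]
ccnmt = [{"label": "Person C", "title": "Programmer", "building": "Building 5"}]


def building_from_office(office):
    """Turn an office like '32-G606' into 'Building 32'; None if there is no prefix."""
    prefix, dash, _rest = office.partition("-")
    return f"Building {prefix}" if dash and prefix else None


def normalize(item, origin, renames):
    """Rename fields to the merged vocabulary and fill in derived ones."""
    out = {renames.get(name, name): value for name, value in item.items()}
    out["origin"] = origin
    if "building" not in out and "office" in out:
        derived = building_from_office(out["office"])
        if derived:
            out["building"] = derived
    return out


merged = {
    "items": [normalize(item, "CSAIL", {}) for item in csail]
           + [normalize(item, "CCNMT", {"title": "position"}) for item in ccnmt]
}
print(json.dumps(merged, indent=2))
```

Note that neither source had to change its own field names. The equivalences live in the mapping, which is the whole point.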

Stunning.

Behind the scenes it’s all RDF, but the point is that nobody needs to know or care about that. And the larger point is that the Simile folks — having spent years fighting ontology wars — have now gone AWOL. The new stance is: Everybody gets to name their fields as they prefer, and mashup tools like Potluck can define equivalences among them. All the original source data, and all the merged data, is available in a common format that translates into grist for the engines in the RDF mill. All the data, and all the interactive behavior associated with the data, is cleanly separated from the presentation.

This is a great bootstrap strategy. When faculty group B sees the cool faceted browser that faculty group A has made, B will want one of its own. It can pretty easily figure out how to adapt its data to the format, perhaps with some help from the Babel translator in order to, say, repurpose a spreadsheet. Everybody gets to scratch their own itches, and the environment makes things easy and fun, but under the covers semantic data is being accumulated.

I don’t think that any semantic web skeptic, and I have been one, has ever disputed the value that can emerge when you traverse RDF-style data sets. The question has always been: How will we get people to create those data sets, in ways and for purposes meaningful to them? The Simile team are laser-focused on solving that problem, and from what I can see they’re biting off huge chunks of it with these tools and methods.

I’m not suggesting that ontologies will play no role, but I’ve long believed that we need to evolve toward them from real data that people can create, use interactively, and begin to cross-combine. That’s exactly the approach that Simile is taking. Seeing it in action, and then easily reproducing it myself, totally made my day.

Passwordless MyOpenID

In response to a Kim Cameron item about Blogger’s support for OpenID — and, when the OpenID provider is myopenid.com, for identity selectors — Vittorio Bertocci pointed out something I had not realized:

MyOpenID does exactly what I was asking for: it allows me to create a new openid without having to establish any password. Let me repeat/rephrase it: I can create an account that can be accessed exclusively by using a personal card.

That got my attention. Coincidentally I had just been reading the rough cut of Vittorio’s forthcoming book, Understanding CardSpace, and was at the same time reviewing how OpenID providers like MyOpenID work with OpenID relying parties like ClaimID.com. The ability to create a passwordless, card-only account on MyOpenID is a great step forward, for the reasons Vittorio explains on his blog.

I went over to MyOpenID, created a new, passwordless account, associated that OpenID URL with my ClaimID account, and away I went. Nice!

Now I’m trying to imagine how I would explain all this to a civilian. Honestly, I don’t think I could, yet. It’s a stretch even for me to hold in my head all the moving parts. Which identity selector works with which browser on which platform? What does the card represent? What does the OpenID URL represent?

But we are tantalizingly close to real use cases that will begin to walk people through these scenarios. It’s difficult to describe the abstractions, but as people begin to actually have the experiences, it’ll all start to come clear. Similarly, as people start to have the managed-card experiences that Dick Hardt discusses in our ITConversations podcast, those will start to come clear as well.

To all those attending the Internet Identity Workshop today: Thanks, and keep up the great work!

A conversation with Greg Whisenant about CrimeReports.com

For this week’s ITConversations show I spoke with Greg Whisenant, founder of CrimeReports.com. His company, called Public Engines, has ambitions to offer a range of services that enable citizens to access public data. CrimeReports, the flagship, aims to generalize the process of data extraction and reformulation that was done by Adrian Holovaty for ChicagoCrime.org. It works by installing software behind the police department’s firewall that relays crime data from internal reporting systems to the CrimeReports service.

Participating towns and cities all become part of a single federated mapping application. So if two towns are adjacent, you’ll just pan seamlessly across the political border. It’s a cool idea, and makes you wonder about how a service/syndication-oriented architecture could enable federation across different mapping applications.

What’s particularly exciting to Greg, and to me as well, is the way in which these kinds of applications begin to create a framework for citizen/government collaboration. To that end, it’ll be important to roll out these services at a pace, and in a way, that enables governments to feel comfortable as they move to a more transparent stance. So CrimeReports does things in a pretty controlled way. Police departments can internally preview the application before it’s released, and there’s also the option to run more detailed analysis internally than is available to the public.

What worries me a little, though, is that CrimeReports implementations don’t (so far) yield up feeds of the underlying data. I understand the reasons why not. But I think it’s crucial that citizens come to expect such access, and are encouraged to make effective use of it.

First things first, to be sure. Systems that enable citizens to both report and review a variety of events in the lives of their cities will bring a new and welcome era of collaboration. But let’s make sure the data flowing through those systems is, and remains, available.