The persistent blogosphere

In response to last Friday’s podcast with Tony Hammond about publishing for posterity, David Magda wrote to point out that our main topic of discussion — the DOI (digital object identifier) system — is one implementation of the CNRI (Corporation for National Research Initiatives) Handle System but there are others, including DSpace. I wondered whether this class of software might work its way into the realm of mainstream blogging. David responded:

A weblog (or web pages in general) are simply a collection of text, link, pictures. This is no different than any other document / object / entity that Dspace would handle. It’d simply be another type of CMS IMHO. I think this would be a really good project to implement for an undergrad thesis, or perhaps as part of a master’s thesis.

However as neat as all this is, I don’t think it would be implemented soon: or at least not in mainstream software. Few people will care whether their MySpace page survives over the aeons (and many people don’t want their kids to know what they did twenty years in the past).

But some of us do, and more of us will. The other day, for example, my daughter walked into my office while I was in the middle of a purge. Among the items destined for the recycling bin was a pile of InfoWorld magazines.

She: You’re throwing all these out?

Me: No, I’m keeping a few of my favorites. But as for the rest, I don’t have the space, and anyway it’s all on the web.

She: Don’t you want your grandkids to be able to see what you did?

Heh. She had me there. A pile of magazines sitting on a shelf is almost certainly a more reliable long-term archive than a website running on any current content management system.

Here’s another example. Back in 2002 I cited an essay by Ray Ozzie that appeared on what was then his blog, at ozzie.net. But if you follow the link I cited today, you’ll land on the home page of the latest incarnation of Ray’s blog. The original essay is still available, but to find it you have to do something like this:

My Blog v1 & v2 -> stories -> Why?

So OK, the web rots, get over it, we should all accept that, right?

Well, libraries and academic publishers don’t accept that. Nothing lasts forever, but they’re building content management systems that are far more durable and resilient than any of the current blogging systems.

Conventional wisdom says that it wouldn’t make sense to make blogging systems similarly durable and resilient, for two reasons. First, because the investment would be too costly. Second, because blogs aren’t meant to last anyway, they’re just throwaway content.

The first point is well taken. As Tony Hammond points out in our podcast, the cost isn’t just software. Even when that’s free, infrastructure and governance are costly.

But I violently disagree with the second point. Just because most blog entries aren’t written for posterity doesn’t mean that many can’t be or shouldn’t be. My view is that blogs are becoming our resumes, our digital portfolios, our public identities. We’re already forced to think long-term about the consequences of what we put into those public portfolios because, though no real persistence infrastructure exists, stuff does tend to hang around. And if it’s going to be remembered, it should be remembered properly.

So a logical next step, and a business opportunity for someone, is to provide real persistence. This service likely won’t emerge in the context of enterprise blogging, because enterprises nowadays are more focused on the flip side of document retention: forgetting rather than remembering. Instead it’s a service that individuals will pay for, to ensure that the public record they write will persist across a series of employers and content management systems.

A conversation with Tony Hammond about digital object identifiers

Tony Hammond works with the new technology team at Nature Publishing Group. His company publishes a flock of scientific journals in print and online including, most prominently, Nature. It also operates Connotea, a social bookmarking service for scientists. In this week’s podcast we talk about digital object identifiers, which are, in effect, super-URLs designed to survive commercial churn and to work reliably for hundreds of years.

Many of us are becoming publishers nowadays, and we’d like to imagine that all our stuff could enjoy that level of consistency and durability. Few of us are prepared to make the necessary investment, but it’s interesting to hear from someone who has.

Unintended consequences of syndication

A while back, when this blog lived over there, I decided to include a recent links widget in the left column. So I injected some JavaScript into that column in order to read my JSON (JavaScript object notation) feed from del.icio.us and convert it to HTML. One unintended consequence of this arrangement was a change in how I used del.icio.us. Of course it’s always true that your stream of bookmarks is public — except for the relatively new option to bookmark privately. You may even find that people are pointing feedreaders at your stream of bookmarks and subscribing to them. But still, it doesn’t quite feel as though you’re publishing those bookmarks in a really explicit way.
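For the curious, here is a minimal sketch of that JSON-to-HTML step. It is written in Python rather than browser JavaScript, and it is not the widget code itself. The field names `u` (URL) and `d` (description) are assumptions about the shape of the feed, and the sample entries are invented.

```python
import html
import json


def bookmarks_to_html(feed_json: str, limit: int = 10) -> str:
    """Render a JSON array of bookmarks as an HTML unordered list."""
    posts = json.loads(feed_json)
    items = []
    for post in posts[:limit]:
        # Assumed field names: "u" holds the URL, "d" the description.
        url = html.escape(post["u"], quote=True)
        title = html.escape(post["d"])
        items.append(f'<li><a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Invented sample data standing in for the real feed.
sample = json.dumps([
    {"u": "http://example.org/a?x=1&y=2", "d": "First <link>"},
    {"u": "http://example.org/b", "d": "Second link"},
])

print(bookmarks_to_html(sample))
```

Escaping both the URL and the description matters here, because whatever lands in the feed ends up verbatim on the blog page.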

If you do decide to explicitly publish your bookmarks in a sidebar widget on your blog, it may change the way you bookmark. It did for me, anyway. The balance shifted away from purely personal information management and toward the kind of editorial sensibility that governs the blog. It was around this time that private bookmarks became available in del.icio.us, and that’s been helpful. If I’m researching something and I just want to collect a list of resources labeled with some obscure tag meaningful only to me, there’s no need to flow that stuff onto my blog page. Conversely, if I want to draw attention to something in a public way, I can. It sounds great in principle, but in practice I think the friction involved in making that choice on a per-item basis made me less likely to bookmark either publicly or privately.

Here’s another unintended consequence that illustrates the surprising things that can happen in this web of information we are spinning. I realized the other day that my public del.icio.us bookmarks are appearing on every page of my mothballed InfoWorld blog. To stop that happening, I’d need to tweak its template — but I no longer have access to it!

Meanwhile, ironically, I haven’t yet figured out how (or whether) I can inject the del.icio.us JSON feed into the hosted WordPress blog I’m running here. For now, I’ve decided to embrace this constraint. Perhaps if my del.icio.us account feels less directly connected to what I am publishing now, I’ll use it more freely.

These scenarios are rather odd, quite interesting, and slightly scary. As building software systems with components morphs into building information systems with feeds, they’ll become increasingly common.

Divergent citation-indexing paths

As I mentioned the other day, it’d be useful to have audio-only versions of many of the Channel 9 videos for folks who (like me) have more time to listen away from the computer than to watch at the computer. Pretty soon that’ll start happening for new stuff. Then, of course, there’s the archive. One proposal was to sift out the most popular videos and convert them first. But what does “most popular” mean? It depends.

Pageviews and downloads are one way to measure, and of course the sites have those stats. Citations are another. I’m always interested to see how frequently things are being cited in blog postings and shared bookmark systems. The visualization I did here, which tracks citations of the ACLU Pizza fictional screencast as seen through the lenses of del.icio.us and Bloglines, is a nice example.

So I scooped up the video URLs mentioned in this RSS feed and ran the same sort of analysis. The Bloglines results were fine, but the del.icio.us results were wonky. Eventually I found the problem. Del.icio.us, unlike Bloglines, treats the URLs that you feed to its citation counter in a case-sensitive way. And there are multiple spellings of the “same” URL pattern on Channel 9, including:

channel9.msdn.com/ShowPost.aspx?PostId=
channel9.msdn.com/Showpost.aspx?postid=
channel9.msdn.com/ShowPost.aspx?PostID=

Because del.icio.us citations attach to particular spellings, you’d need to query using each of them to get the whole story.

This case-sensitivity is one of the subtler forms of a much more general problem. Content management systems quite commonly provide different URLs for the same resource, and each of those is an invitation for citation indexing systems to wander down divergent paths.
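One way to reunite those divergent paths after the fact is to fold every spelling to a single key and add up the counts. The sketch below does that in Python. It assumes the server treats path and query case-insensitively, which looks true of these Channel 9 URLs but is not true of URLs in general. The counts are made-up numbers, not del.icio.us data, and `canonical_key` and `merge_counts` are illustrative names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Fold case so spellings that differ only in case share one key."""
    if "//" not in url:
        # Let urlsplit recognize the host in scheme-less URLs.
        url = "//" + url
    parts = urlsplit(url)
    return f"{parts.netloc}{parts.path}?{parts.query}".lower()


def merge_counts(counts_by_spelling: dict) -> dict:
    """Sum the citation counts recorded under each spelling of a URL."""
    totals = defaultdict(int)
    for url, count in counts_by_spelling.items():
        totals[canonical_key(url)] += count
    return dict(totals)


# Made-up counts for three spellings of the same post.
sample_counts = {
    "channel9.msdn.com/ShowPost.aspx?PostId=169962": 12,
    "channel9.msdn.com/Showpost.aspx?postid=169962": 5,
    "channel9.msdn.com/ShowPost.aspx?PostID=169962": 3,
}

print(merge_counts(sample_counts))
```

Note that this only merges results you already have. You would still need to query the citation counter once per spelling to collect them.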

For a long time I’ve thought that strong URL discipline was the best way to avoid this problem. But there are other approaches. In the academic world, where citations are taken very seriously, digital object identifiers play a much more important role than they do on the web. We web folk could learn a thing or two from our academic cousins, and that’s one reason why I’ll be interviewing Nature.com’s Tony Hammond for an upcoming podcast.

If you’re curious, by the way, here’s the data from Bloglines. I don’t have the del.icio.us data yet because I managed to get myself blocked from the server while futzing around — sorry Josh, I’ll query more slowly next time.

bloglines citations   show   title
158 169962 Otto Berkes – Origami’s Architect gives first look at Ultramobile PCs
134 151853 Robert Fripp – Behind the scenes at Windows Vista recording session
090 271984 Scott Guthrie – MIX07, Work, and Personal Details Revealed
057 270965 Windows Home Server
043 261254 Looking at XNA – Part Two
034 116347 Steve Ball – Learning about Audio in Windows Vista
029 116702 Paul Vick and Erik Meijer – Dynamic Programming in Visual Basic
028 019174 Andy Wilson – First look at MSR’s "touch light"
024 056393 Steve Swartz – Talking about SOA
016 069437 Office Communicator
010 268480 Special Holiday Episode IV: Don Box and Chris Anderson
010 256597 WCF Ships, Doug Purdy Dances, and Don Box Sings
009 252457 A Chat and Demo about LINQ with Wee Hyong (Singapore MVP SQL)
009 208891 What’s Microsoft Speech Server (Beta)?
008 039280 Herb Sutter – The future of Visual C++, Part I
008 010189 Anders Hejlsberg – What’s so great about generics?
006 273697 Anders Hejlsberg, Herb Sutter, Erik Meijer, Brian Beckman: Software Composability and the Future of Languages
006 272229 Ulrik Molgaard Honoré: Production planning with Dynamics AX
006 229585 Programming in the Age of Concurrency: The Accelerator Project
006 159231 Office 12 – Word to PDF File Translation
005 267098 The Best XNA Movie in the UNIVERSE
005 221610 Shankar Vaidyanathan – VC++ IDE: Past, Present and Future
004 274865 Scott Hanselman & Jeffrey Snover Discuss Windows PowerShell
004 009894 Building a Picture Frame with Windows CE 5.0 – Step 1
003 274069 Brad Abrams on AJAX for ISVs
003 266221 MultiPoint: What. How. Why.
003 248575 Software Security at Microsoft: ACE Team Tour, Part 2
003 246477 Exploring the new Domain-Specific Language (DSL) Tools with Stuart Kent
003 237142 VSTO 2005 Second Edition Beta: Martin Sawicki
003 029505 Gabriel Torok – Protecting .NET applications through obfuscation
002 271257 Adam Carter and Mike Adams on Managed Services
002 269462 Tara Roth: Not your father’s world of Software Test
002 265667 Revisiting WiMo – The Windows Mobile Robot
002 263358 Joe Stegman talks about the "WPF/E" CTP
002 013653 Jason Flaks – What is Windows Media Connect?
001 274644 Beam me over, Scotty: Introducing Transporter Suite
001 274641 Sharepoint Templates: What. How. Why.
001 273337 New Vista GUI Stuff For Devs
001 273061 Mike Barrett: Testing and Deploying IPV6
001 270453 Technology Roundtable #1
001 267604 UK Community: DeveloperDeveloper Day
001 263442 Expression – Part One: The Overview
001 232481 WPF Chart Control (from the perspective of summer interns)
000 273120 Ask The Experts! : Anders Hejlsberg
000 271378 Ask The Experts! : KD Hallman
000 264874 Rob Short: Operating System Evolution
000 263902 Windows 2000 to Windows Vista: Road to Compatibility
000 238608 Windows Vista: Ready for ReadyDrive

A screencast about common feeds in Vista


Today’s 4-minute screencast, which explores Vista’s common feed system, serves multiple purposes. First, I wanted to familiarize myself with this stuff, and do so in a way that would elicit responses that help me understand how other folks are reacting to it. I am intensely interested in the reasons why people do or don’t take to the notion of reading RSS feeds. Mostly, as we know, they haven’t.

The assumption is that surfacing the concepts more prominently in the OS will help, and I think that’s true, but there’s a lot going on here. For example, even just explaining to people how feeds are like-but-unlike email is a huge challenge. When you start from the perspective of reading feeds versus reading email, it’s hard to see the difference. One key distinction — that feeds are by-invitation-only and can be easily and effectively shut down, versus email which is uninvited and can be very hard to deflect — is fairly abstract and hasn’t sunk in yet for most people.

When you start from the perspective of writing feeds versus writing email, the differences, and the benefits that flow from those differences, are even more compelling — at least to me. But the reasons why are even more abstract: manufactured serendipity, maximization of scope, awareness networking. How might Vista, or any desktop operating system, help surface these concepts?

I also made this screencast to find out what it’s like to make screencasts of Vista. I haven’t yet installed Camtasia on my newly-acquired Vaio laptop, because I want to repave that machine with a final version of Vista that I don’t have yet. But no worries, there’s always good old Windows Media Encoder. I’ve always said it’s an underappreciated jewel, and evidently that’s still true as it is not included in Vista.

After capturing with Windows Media Encoder I transferred the file to my XP box for editing in Camtasia. As always, the process reminded me of Pascal’s famous quote: “If I had time, I would write a shorter letter.” Boiling a screencast down to its essence is really hard. One of the biggest challenges is meshing the video footage with the audio narration. I want to produce a series of screencasts that illustrate this process, but I’m not sure how best to separate out the kinds of general principles I outlined here from details of specific applications and delivery formats.

A couple of final points about the RSS features shown in the screencast. It shows how to acquire feeds one at a time into the common pool using IE, and how to acquire batches of feeds into Outlook by importing an OPML file, but there’s no obvious way to load a batch from OPML into the common pool. I know I could write that app, but is there one lying around somewhere that I’ve missed? Also, how do you batch-delete feeds from Outlook once you’ve acquired them via OPML?
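The OPML-reading half of such an app is small. Here is a hedged Python sketch that pulls the feed URLs out of an OPML document, relying on the conventional `xmlUrl` attribute on `outline` elements. The sample document is invented, and handing the resulting URLs to Vista’s common feed list is deliberately left out, since that part depends on an API not covered here.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Return the feed URLs found in an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so feeds grouped into folders are found.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


# Invented sample: one feed inside a folder, one at the top level.
sample_opml = """<opml version="1.1">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="News">
      <outline text="Feed A" type="rss" xmlUrl="http://example.org/a.xml"/>
    </outline>
    <outline text="Feed B" type="rss" xmlUrl="http://example.org/b.xml"/>
  </body>
</opml>"""

for url in feed_urls_from_opml(sample_opml):
    print(url)
```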

Matthew Levine’s holy grail

Yesterday I noticed that my new home page was a disaster in IE7. I’d cribbed a CSS-based three-column layout from Google’s Page Creator. Then, as is my custom, I’d pruned away as much of the complexity as I could. But evidently I pruned too much of, or the wrong parts of, the CSS gymnastics encoded in that example. So I wrecked the layout for Internet Explorer.

As per comments here, standards support in IE7 is a thorny issue and discussions of it are heavily polarized. But I aim to be (cue Jon Stewart) a uniter not a divider. So here I simply want to give props to this article by Matthew Levine, from A List Apart. From it, I cribbed the wrapper-free Holy Grail. It’s a minimalistic CSS-based three-column layout that seems to work well in every browser I’ve tried: IE6, IE7, Firefox, Safari.

To be honest, although I’m hugely fond of CSS styling, I’ve always struggled with CSS layouts, and I know I’m not the only one in that boat. When you read the explanation in Matthew’s article, you can see why. CSS layout is like one of those games where you slide 15 tiles around in a 16-square matrix. In principle it is a declarative language, but in practice the techniques are highly procedural: Step 1, Step 2, etc.

Whether that’s good or bad, and to what extent CSS layout really does trump table-based layout — these are interesting questions but separate discussions. The bottom line here is that I wanted to do a CSS-based layout, I wanted it to be as minimal as possible so I’d have the best shot at understanding and maintaining it, and I wanted it to behave reasonably in a variety of browsers. For that I needed the pattern, or recipe, which Matthew’s article helpfully provided.

It’s appropriate that he calls the technique the Holy Grail because the three-column layout applies very broadly. Yet, though there are tips and tricks all around the web, I’m not aware of a well-known cookbook, or pattern library, that:

  • Identifies the handful of most popular layouts.
  • Illustrates minimal, bare-bones CSS treatments.
  • Certifies those treatments for cross-browser use.

For extra credit, this cookbook could filter the recipes according to whether support in each of the major browsers is must have or nice to have or optional.

Cross-browser issues have always been a headache, they still are, and the reality is that dealing with them requires hacks. The more we consolidate and simplify the hacks, a la Matthew Levine’s holy grail, the better.