Sitemaps, segmentation, and streaming

The audio accompaniment for yesterday’s exercise hour was Tom Raftery’s interview with Brad Abrams, group program manager for Silverlight. I mention it for three reasons.

First, it’s a nice comprehensive overview of the history and mission of the Silverlight project. Now that the flurry of MIX announcements is over, this is a good time to step back and reflect on the big picture. As someone who’s been working on the .NET Common Language Runtime since its inception, Brad’s in a good position to paint that picture.

Second, it reminds me of an obvious strategy for podcasts that I’ve somehow managed to ignore: solicit questions ahead of time! Tom Raftery does that routinely. In this case people asked a bunch of great questions, Brad Abrams engaged straightforwardly with them, and the resulting show was much richer and deeper than it otherwise would have been. Given that I was an avid practitioner of this method in my journalism days, it’s crazy that I haven’t carried it forward into my podcasting. Gotta fix that.

Third, one particular segment of the interview really grabbed me. Referring to his talk at MIX (WMV, MP4), Brad discusses a strategy for exposing videos to search engines. The ingredients of the strategy are:

  1. A feature of the ASP.NET “Futures” release — DynamicDataSearchSiteMapProvider — that helps developers dynamically generate sitemaps that provide the breadcrumb trails otherwise unavailable to search engines when they visit dynamically-generated sites.
  2. A data source from which the sitemap provider can extract titles and timecodes for chapters within a video.
  3. A SMIL wrapper that provides closed captioning both to the video and, indirectly, to the web pages that the sitemap points crawlers to.
  4. A streaming server.
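To make the first two ingredients concrete, here is a minimal sketch of what a dynamically generated sitemap might look like when each video chapter gets its own crawlable URL. This is not the ASP.NET provider's API; the data source, the `?t=` URL scheme, and the field names are all illustrative assumptions.

```python
# Hypothetical sketch: generate a sitemap whose entries point at
# timecoded chapter pages for a video. The chapter list stands in
# for whatever data source the sitemap provider would consult.
from xml.sax.saxutils import escape

chapters = [
    {"title": "Introduction", "start": 0},
    {"title": "Deep Zoom demo", "start": 1425},
    {"title": "Search and sitemaps", "start": 3590},
]

def sitemap(base_url, chapters):
    urls = []
    for c in chapters:
        # One crawlable URL per chapter, keyed by start time in seconds.
        loc = "%s?t=%d" % (base_url, c["start"])
        urls.append("  <url><loc>%s</loc></url>" % escape(loc))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(urls) + "\n</urlset>")

print(sitemap("http://example.com/talks/mix07", chapters))
```

Each chapter entry is a breadcrumb the crawler can follow, and the page it lands on is where the SMIL-derived caption text would live.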

As an industry we’ve gone back and forth on that last point. In the beginning there was Real, which relied primarily on streaming servers rather than standard web servers. The downside was that these servers were specialized and non-ubiquitous. One of the upsides was that they enabled random access. But hardly anybody took advantage of that opportunity. As you can see here, although it’s quite feasible to form URLs that point into Real streams, the details are just geeky enough to deter almost everyone.

Then things shifted. Increasingly the media encoders and players conspired to support progressive downloading. In this mode, you only need a standard webserver, serving up static files. The encoders tuck enough extra information into the files so that players can begin playing right away, after only a short buffering delay. It looks like streaming to most people, and a lot of applications and services even call it streaming rather than progressive downloading.

The upside here was that no specialized servers were needed. Any regular webserver would work, so this approach is very blog-friendly. Got an audio or video file? Just upload it to your blog, and bingo, you’re podcasting or videoblogging.

This radically democratized media publishing, and continues to do so. But, although few recognized the tradeoff, there was one. You couldn’t randomly access a static media file.

Or so most of us thought.

As it turned out, that’s not strictly true, at least not for MP3 files. I realized that some players were able to randomly access parts of statically-served MP3 files, found out how, and prototyped a gateway that would enable anyone to form a URL to a timecoded segment from an MP3 file hosted on a remote webserver.

This was an interesting result, but it was even clunkier than the methods already supported by the Real servers and players — and as we’ve seen, hardly anybody ever discovered or used their random-access features. What’s more, my method only worked for MP3 files by virtue of a special property of that format: frames are (usually) independent of one another, so you can reach blindly into the middle of a file, shove bytes at a player, and expect it to find the next frame boundary and start playing. I’m mostly ignorant of the details of video formats but, so far as I can tell, they don’t tend to work that way.
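The trick can be sketched in a few lines, under simplifying assumptions: a constant-bitrate MP3 (so a byte offset is proportional to a timecode) and a server that honors HTTP Range requests. The URL and default bitrate are placeholders, not details from my original gateway.

```python
# Sketch of random access into a statically-served MP3, assuming
# constant bitrate and a Range-capable web server.
import urllib.request

def find_frame_sync(data):
    """Return the index of the next MP3 frame-sync pattern
    (0xFF followed by a byte whose top three bits are set), or -1."""
    for i in range(len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1

def fetch_from_timecode(url, seconds, bitrate_kbps=128, chunk=65536):
    # For a constant-bitrate MP3, bytes-per-second is bitrate / 8,
    # so we can reach blindly into the middle of the file.
    offset = seconds * bitrate_kbps * 1000 // 8
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (offset, offset + chunk - 1)})
    data = urllib.request.urlopen(req).read()
    # Frames are (usually) independent, so playback can begin at
    # the next frame boundary we find.
    i = find_frame_sync(data)
    return data[i:] if i >= 0 else b""
```

The scan for the frame-sync pattern is exactly the "find the next frame boundary" step the format makes possible; VBR files would need the bitrate assumption replaced by a seek table.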

Now I wonder if we’re heading back to the future. Flash (with FLV) and Silverlight (with WMV) don’t require streaming servers on the back end; they can do progressive downloading as well. But in the services era, you’re less likely to worry about deploying your own streaming server and more likely to use an instance of one that runs in the cloud. That instance can react to requests for timecoded segments in a more intelligent way than by seeking to byte offsets.

It’s true that we failed, the first time around, to make the formation of those requests easy and obvious to people using media players. But a new generation of players — again, both Flash-based and now Silverlight-based — can be friendlier to that kind of innovation.

An example of what we should expect appears at 59:50 in Brad Abrams’ MIX talk (WMV, MP4). You search, find some title or caption text (thanks to a sitemap), click the link, and begin playing a segment at a timecode.
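Forming those segment links means agreeing on a timecode syntax. Here is a small sketch, assuming a hypothetical `?t=` query parameter that carries seconds; the URL is a made-up example, not the actual MIX talk address.

```python
# Hypothetical helper: turn a human-readable timecode like "59:50"
# into the seconds value a ?t= query parameter might carry.
def timecode_to_seconds(tc):
    parts = [int(p) for p in tc.split(":")]
    seconds = 0
    for p in parts:          # handles SS, MM:SS, and HH:MM:SS
        seconds = seconds * 60 + p
    return seconds

def segment_link(base_url, tc):
    return "%s?t=%d" % (base_url, timecode_to_seconds(tc))

print(segment_link("http://example.com/mix07/brad-abrams", "59:50"))
# → http://example.com/mix07/brad-abrams?t=3590
```

A player that understands the parameter can then begin playback at the segment, which is what makes the sitemap-to-search-result-to-segment flow work end to end.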

The hardest part, of course, is the data preparation. On my trip to the UK in January I mentioned the Open University’s FlashMeeting system which does a great job of segmenting captured video on the fly, then making it randomly accessible.

There are already too many triple-S acronyms so I probably shouldn’t do this, but the formula I’m looking for is: Sitemaps + Segmentation + Streaming.


  1. Jon,

    thanks a million for the kind words about the podcast, and I love the fact that you are using the Rubric theme on this blog! Rubric was an early WordPress theme for WP 1.2 – when WordPress went to version 1.5, one of the changes was a new theme management system.

    I re-wrote Rubric at the time and released the modified version for WP 1.5.

    Great to see it still in use, and still looking good.

  2. This is great, similar to what the Annodex folks have been doing for a while now. Unfortunately their insistence upon using Ogg has hindered adoption, but the basic approach is sound.

  3. Jon,

    We’ve been working for a while to implement a deep-link-to-audio feature in our online music player. Currently, we just support a start point expressed in number of seconds from the beginning of the file. Our approach is different from the others I’ve seen in that it works via JavaScript in the browser. Here’s an example link to the bridge of one of Lucas Gonze’s public-domain hymn resurrections:

    We could add automatic pausing and an end-point, or more complex (minutes:seconds) formats for time expressions, but it feels good just to have the feature out the door.
