As I mentioned the other day, it’d be useful to have audio-only versions of many of the Channel 9 videos for folks who (like me) have more time to listen away from the computer than to watch at the computer. Pretty soon that’ll start happening for new stuff. Then, of course, there’s the archive. One proposal was to sift out the most popular videos and convert them first. But what does “most popular” mean? It depends.
Pageviews and downloads are one way to measure, and of course the sites have those stats. Citations are another. I’m always interested to see how frequently things are being cited in blog postings and shared bookmark systems. The visualization I did here, which tracks citations of the ACLU Pizza fictional screencast as seen through the lenses of del.icio.us and Bloglines, is a nice example.
So I scooped up the video URLs mentioned in this RSS feed and ran the same sort of analysis. The Bloglines results were fine, but the del.icio.us results were wonky. Eventually I found the problem. Del.icio.us, unlike Bloglines, treats the URLs that you feed to its citation counter in a case-sensitive way. And there are multiple spellings of the “same” URL pattern on Channel 9, including:
channel9.msdn.com/ShowPost.aspx?PostId=
channel9.msdn.com/Showpost.aspx?postid=
channel9.msdn.com/ShowPost.aspx?PostID=
Because del.icio.us citations attach to particular spellings, you’d need to query using each of them to get the whole story.
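One way to get the whole story is to collect the count for each spelling and then fold the spellings together. Here is a minimal sketch of that folding step in Python. It assumes you already have per-spelling counts in a dictionary; the `canonical_key` and `merge_counts` names are mine, the numbers in the usage example are invented for illustration, and nothing here talks to del.icio.us itself. It also assumes that lowercasing the path and the query parameter names is safe for these Channel 9 URLs, which will not be true of every site.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def canonical_key(url):
    """Build a case-insensitive key for a URL (hypothetical helper).

    Host, path, and query parameter names are lowercased; parameter
    values are kept as-is. Assumes the site treats these as equivalent.
    """
    if "://" not in url:
        url = "http://" + url  # the spellings in the post omit the scheme
    parts = urlsplit(url)
    query = sorted(
        (name.lower(), value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    return (parts.netloc.lower(), parts.path.lower(), tuple(query))


def merge_counts(counts_by_spelling):
    """Sum citation counts across all spellings of the same URL."""
    totals = defaultdict(int)
    for url, count in counts_by_spelling.items():
        totals[canonical_key(url)] += count
    return dict(totals)


# Usage: the counts below are made up, not real del.icio.us data.
example = {
    "channel9.msdn.com/ShowPost.aspx?PostId=169962": 3,
    "channel9.msdn.com/Showpost.aspx?postid=169962": 2,
    "channel9.msdn.com/ShowPost.aspx?PostID=169962": 1,
    "channel9.msdn.com/ShowPost.aspx?PostID=151853": 4,
}
merged = merge_counts(example)
```

With the invented numbers above, the three spellings of post 169962 collapse into a single entry, and post 151853 stays separate.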
This case-sensitivity is one of the subtler forms of a much more general problem. Content management systems quite commonly provide different URLs for the same resource, and each of those is an invitation for citation indexing systems to wander down divergent paths.
For a long time I’ve thought that strong URL discipline was the best way to avoid this problem. But there are other approaches. In the academic world, where citations are taken very seriously, digital object identifiers play a much more important role than they do on the web. We web folk could learn a thing or two from our academic cousins, and that’s one reason why I’ll be interviewing Nature.com’s Tony Hammond for an upcoming podcast.
If you’re curious, by the way, here’s the data from Bloglines. I don’t have the del.icio.us data yet because I managed to get myself blocked from the server while futzing around — sorry Josh, I’ll query more slowly next time.
bloglines citations | show | title |
158 | 169962 | Otto Berkes – Origami’s Architect gives first look at Ultramobile PCs |
134 | 151853 | Robert Fripp – Behind the scenes at Windows Vista recording session |
090 | 271984 | Scott Guthrie – MIX07, Work, and Personal Details Revealed |
057 | 270965 | Windows Home Server |
043 | 261254 | Looking at XNA – Part Two |
034 | 116347 | Steve Ball – Learning about Audio in Windows Vista |
029 | 116702 | Paul Vick and Erik Meijer – Dynamic Programming in Visual Basic |
028 | 019174 | Andy Wilson – First look at MSR’s "touch light" |
024 | 056393 | Steve Swartz – Talking about SOA |
016 | 069437 | Office Communicator |
010 | 268480 | Special Holiday Episode IV: Don Box and Chris Anderson |
010 | 256597 | WCF Ships, Doug Purdy Dances, and Don Box Sings |
009 | 252457 | A Chat and Demo about LINQ with Wee Hyong (Singapore MVP SQL) |
009 | 208891 | What’s Microsoft Speech Server (Beta)? |
008 | 039280 | Herb Sutter – The future of Visual C++, Part I |
008 | 010189 | Anders Hejlsberg – What’s so great about generics? |
006 | 273697 | Anders Hejlsberg, Herb Sutter, Erik Meijer, Brian Beckman: Software Composability and the Future of Languages |
006 | 272229 | Ulrik Molgaard Honoré: Production planning with Dynamics AX |
006 | 229585 | Programming in the Age of Concurrency: The Accelerator Project |
006 | 159231 | Office 12 – Word to PDF File Translation |
005 | 267098 | The Best XNA Movie in the UNIVERSE |
005 | 221610 | Shankar Vaidyanathan – VC++ IDE: Past, Present and Future |
004 | 274865 | Scott Hanselman & Jeffrey Snover Discuss Windows PowerShell |
004 | 009894 | Building a Picture Frame with Windows CE 5.0 – Step 1 |
003 | 274069 | Brad Abrams on AJAX for ISVs |
003 | 266221 | MultiPoint: What. How. Why. |
003 | 248575 | Software Security at Microsoft: ACE Team Tour, Part 2 |
003 | 246477 | Exploring the new Domain-Specific Language (DSL) Tools with Stuart Kent |
003 | 237142 | VSTO 2005 Second Edition Beta: Martin Sawicki |
003 | 029505 | Gabriel Torok – Protecting .NET applications through obfuscation |
002 | 271257 | Adam Carter and Mike Adams on Managed Services |
002 | 269462 | Tara Roth: Not your father’s world of Software Test |
002 | 265667 | Revisiting WiMo – The Windows Mobile Robot |
002 | 263358 | Joe Stegman talks about the "WPF/E" CTP |
002 | 013653 | Jason Flaks – What is Windows Media Connect? |
001 | 274644 | Beam me over, Scotty: Introducing Transporter Suite |
001 | 274641 | Sharepoint Templates: What. How. Why. |
001 | 273337 | New Vista GUI Stuff For Devs |
001 | 273061 | Mike Barrett: Testing and Deploying IPV6 |
001 | 270453 | Technology Roundtable #1 |
001 | 267604 | UK Community: DeveloperDeveloper Day |
001 | 263442 | Expression – Part One: The Overview |
001 | 232481 | WPF Chart Control (from the perspective of summer interns) |
000 | 273120 | Ask The Experts! : Anders Hejlsberg |
000 | 271378 | Ask The Experts! : KD Hallman |
000 | 264874 | Rob Short: Operating System Evolution |
000 | 263902 | Windows 2000 to Windows Vista: Road to Compatibility |
000 | 238608 | Windows Vista: Ready for ReadyDrive |