Divergent citation-indexing paths

As I mentioned the other day, it’d be useful to have audio-only versions of many of the Channel 9 videos for folks who (like me) have more time to listen away from the computer than to watch at the computer. Pretty soon that’ll start happening for new stuff. Then, of course, there’s the archive. One proposal was to sift out the most popular videos and convert them first. But what does “most popular” mean? It depends.

Pageviews and downloads are one way to measure, and of course the sites have those stats. Citations are another. I’m always interested to see how frequently things are being cited in blog postings and shared bookmark systems. The visualization I did here, which tracks citations of the ACLU Pizza fictional screencast as seen through the lenses of del.icio.us and Bloglines, is a nice example.

So I scooped up the video URLs mentioned in this RSS feed and ran the same sort of analysis. The Bloglines results were fine, but the del.icio.us results were wonky. Eventually I found the problem. Del.icio.us, unlike Bloglines, treats the URLs that you feed to its citation counter in a case-sensitive way. And there are multiple spellings of the “same” URL pattern on Channel 9, including:

channel9.msdn.com/ShowPost.aspx?PostId=
channel9.msdn.com/Showpost.aspx?postid=
channel9.msdn.com/ShowPost.aspx?PostID=

Because del.icio.us citations attach to particular spellings, you’d need to query using each of them to get the whole story.

This case-sensitivity is one of the subtler forms of a much more general problem. Content management systems quite commonly provide different URLs for the same resource, and each of those is an invitation for citation indexing systems to wander down divergent paths.

For a long time I’ve thought that strong URL discipline was the best way to avoid this problem. But there are other approaches. In the academic world, where citations are taken very seriously, digital object identifiers play a much more important role than they do on the web. We web folk could learn a thing or two from our academic cousins, and that’s one reason why I’ll be interviewing Nature.com’s Tony Hammond for an upcoming podcast.

If you’re curious, by the way, here’s the data from Bloglines. I don’t have the del.icio.us data yet because I managed to get myself blocked from the server while futzing around — sorry Josh, I’ll query more slowly next time.

bloglines
citations
show title
158 169962 Otto Berkes – Origami’s Architect gives first look at Ultramobile PCs
134 151853 Robert Fripp – Behind the scenes at Windows Vista recording session
090 271984 Scott Guthrie – MIX07, Work, and Personal Details Revealed
057 270965 Windows Home Server
043 261254 Looking at XNA – Part Two
034 116347 Steve Ball – Learning about Audio in Windows Vista
029 116702 Paul Vick and Erik Meijer – Dynamic Programming in Visual Basic
028 019174 Andy Wilson – First look at MSR’s "touch light"
024 056393 Steve Swartz – Talking about SOA
016 069437 Office Communicator
010 268480 Special Holiday Episode IV: Don Box and Chris Anderson
010 256597 WCF Ships, Doug Purdy Dances, and Don Box Sings
009 252457 A Chat and Demo about LINQ with Wee Hyong (Singapore MVP SQL)
009 208891 What’s Microsoft Speech Server (Beta)?
008 039280 Herb Sutter – The future of Visual C++, Part I
008 010189 Anders Hejlsberg – What’s so great about generics?
006 273697 Anders Hejlsberg, Herb Sutter, Erik Meijer, Brian Beckman: Software Composability and the Future of Languages
006 272229 Ulrik Molgaard Honoré: Production planning with Dynamics AX
006 229585 Programming in the Age of Concurrency: The Accelerator Project
006 159231 Office 12 – Word to PDF File Translation
005 267098 The Best XNA Movie in the UNIVERSE
005 221610 Shankar Vaidyanathan – VC++ IDE: Past, Present and Future
004 274865 Scott Hanselman & Jeffrey Snover Discuss Windows PowerShell
004 009894 Building a Picture Frame with Windows CE 5.0 – Step 1
003 274069 Brad Abrams on AJAX for ISVs
003 266221 MultiPoint: What. How. Why.
003 248575 Software Security at Microsoft: ACE Team Tour, Part 2
003 246477 Exploring the new Domain-Specific Language (DSL) Tools with Stuart Kent
003 237142 VSTO 2005 Second Edition Beta: Martin Sawicki
003 029505 Gabriel Torok – Protecting .NET applications through obfuscation
002 271257 Adam Carter and Mike Adams on Managed Services
002 269462 Tara Roth: Not your father’s world of Software Test
002 265667 Revisiting WiMo – The Windows Mobile Robot
002 263358 Joe Stegman talks about the "WPF/E" CTP
002 013653 Jason Flaks – What is Windows Media Connect?
001 274644 Beam me over, Scotty: Introducing Transporter Suite
001 274641 Sharepoint Templates: What. How. Why.
001 273337 New Vista GUI Stuff For Devs
001 273061 Mike Barrett: Testing and Deploying IPV6
001 270453 Technology Roundtable #1
001 267604 UK Community: DeveloperDeveloper Day
001 263442 Expression – Part One: The Overview
001 232481 WPF Chart Control (from the perspective of summer interns)
000 273120 Ask The Experts! : Anders Hejlsberg
000 271378 Ask The Experts! : KD Hallman
000 264874 Rob Short: Operating System Evolution
000 263902 Windows 2000 to Windows Vista: Road to Compatibility
000 238608 Windows Vista: Ready for ReadyDrive
Posted in .

One thought on “Divergent citation-indexing paths

Leave a Reply