Speaking and writing webscale identifiers

I’ve really enjoyed the conversation about webscale identifiers. Naming web resources is such a crucial discipline, and yet one we’re all still making up as we go along. I ended the earlier post by suggesting that when we invent namespaces we should, where feasible, prefer names that make sense to people. In comments, a number of folks who have wrestled with the problem of ambiguity pointed out all sorts of reasons why that often just isn’t feasible.

Gavin Bell likes Amazon’s hybrid approach:

The model that Amazon have since moved to with a unique URL identifier and an ignored pretty human readable section is a good compromise.
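
To make the hybrid pattern concrete, here is a minimal sketch of how a resolver could read only the unique identifier and ignore the readable part. It uses the Amazon book URL cited later in this post. The function name amazon_id, the /dp/ regular expression, and the second sample URL are assumptions for illustration, not a description of how Amazon actually routes requests.

```python
import re
from urllib.parse import urlparse

def amazon_id(url):
    """Return the identifier after /dp/, ignoring the human-readable slug."""
    path = urlparse(url).path
    match = re.search(r"/dp/([0-9A-Za-z]+)", path)
    return match.group(1) if match else None

# The slug differs between these two URLs, but the extracted identifier does not.
# The second URL is hypothetical and exists only to show that the slug is ignored.
pretty = "http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/1565925378"
other = "http://www.amazon.com/anything-at-all/dp/1565925378"
print(amazon_id(pretty), amazon_id(other))
```

In this reading the slug is for people and the trailing ID is for machines, so each half can change without breaking the other.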

Michael Smethurst agreed with me that the BBC’s opaque IDs — for example, b006qpgr for The Archers — could be promoted as a tag vocabulary that people would be encouraged to use:

Shownar is a prototype by Schulze and Webb that aims to track “buzz” around bbc programmes. For now it’s based on inbound links from blogs/twitter/etc but it could be expanded to use machine tags!?!

On Shownar, I find that this episode of Miss Marple was discussed in this blog entry:

BBC Radio have just started an Agatha Christie season and a whole host of programmes about the Queen of Crime are available to UK listeners on the iPlayer.

They include dramatizations of works starring super sleuths from Miss Marple to the Mysterious Mr Quin, as well as revealing documentaries.

The entry uses URLs that embed these BBC IDs: b00mk71d, b007jvht. How did the author find them? Clearly, in this case, by way of the search URL, which is also cited in the entry:

http://www.bbc.co.uk/iplayer/search/?q=agatha christie

The search term agatha christie is wildly ambiguous, of course. Shownar would never have included this item had it not cited specific BBC shows by way of their opaque IDs. Nor would the author have cited them if that had required typing b00mk71d or b007jvht. It only works thanks to copy/paste, but it works quite nicely, and it shows why site-specific search still matters in an era of uber search engines.
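
A service like Shownar could, in principle, pull those opaque IDs out of any text that links to a programme. The sketch below is only a guess at the idea: the pattern (a "b" followed by seven lowercase letters or digits, with at least one digit) is inferred from the three IDs quoted in this post, and the /programmes/ URLs in the sample are hypothetical stand-ins for the links in the blog entry.

```python
import re

# Assumed shape of a BBC programme ID, inferred from b006qpgr, b00mk71d, b007jvht.
# The lookahead requires a digit so ordinary eight-letter words are not matched.
PID_PATTERN = re.compile(r"\bb(?=[a-z]*[0-9])[0-9a-z]{7}\b")

def extract_pids(text):
    """Return the distinct ID-like tokens found in text, in order of appearance."""
    seen = []
    for pid in PID_PATTERN.findall(text):
        if pid not in seen:
            seen.append(pid)
    return seen

sample = (
    "Listen at http://www.bbc.co.uk/programmes/b00mk71d and "
    "http://www.bbc.co.uk/programmes/b007jvht, or search "
    "http://www.bbc.co.uk/iplayer/search/?q=agatha christie"
)
print(extract_pids(sample))  # ['b00mk71d', 'b007jvht']
```

The search URL contributes nothing to the result, which is the point: only the pasted IDs tie the entry to specific programmes.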

This example got me thinking about the character strings that we can and do type, easily and naturally, versus those we can’t and won’t. For example:

| queries (what we can and do type) | results (what we can’t and don’t type) |
| --- | --- |
| http://www.librarything.com/catalog/jonudell&deepsearch=practical internet groupware | http://www.librarything.com/work/16804 |
| | http://www.librarything.com/work/16804/book/28447984 |
| http://www.google.com/search?q=practical internet groupware | http://oreilly.com/catalog/9781565925373 |
| | http://oreilly.com/catalog/pracintgr |
| http://www.bing.com/results.aspx?q=practical internet groupware | http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/156592537 |
| | http://my.safaribooksonline.com/1565925378 |
| http://www.worldcat.org/search?q=practical internet groupware | http://www.worldcat.org/oclc/43188074 |
| http://www.amazon.com/s?index=blended&field-keywords=practical internet groupware | http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/1565925378 |

Looking at the consistency in the left column, and the variation in the right, I’ve got to conclude that:

  1. Practical Internet Groupware is the de facto webscale identifier for my book.

  2. 16804, 28447984, 9781565925373, pracintgr, 156592537, 1565925378, and 43188074 will never converge.

I’ve long imagined a class of equivalence services that would help us bridge the gap between vocabularies we can speak and write and those we’ll never speak and need help to write.
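
One small equivalence of this kind is already purely mechanical. Two of the identifiers listed above, 1565925378 and 9781565925373, are the ISBN-10 and ISBN-13 forms of the same number: the longer one adds a 978 prefix and recomputes the check digit. A minimal sketch of that conversion, assuming a well-formed ISBN-10 as input (the function name is an illustrative choice):

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 string to ISBN-13 using the 978 prefix."""
    # Drop the old check digit and prepend the 978 prefix.
    core = "978" + isbn10[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... across the 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)

print(isbn10_to_isbn13("1565925378"))  # 9781565925373
```

Most of the other pairs have no such formula. Relating 16804 to 43188074, for example, would take a lookup that somebody has to build and maintain.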

Both are sets of webscale identifiers that we’ll need to use in complementary ways. That’ll require a mix of social conventions and technical services.