Speaking and writing webscale identifiers

17 Sep 2009 / 26 Mar 2010 ~ Jon Udell

I’ve really enjoyed the conversation about webscale identifiers. Naming web resources is such a crucial discipline, and yet one we’re all still making up as we go along. I ended the earlier post by suggesting that when we invent namespaces we should, where feasible, prefer names that make sense to people. In comments, a number of folks who have wrestled with the problem of ambiguity pointed out all sorts of reasons why that often just isn’t feasible.

Gavin Bell likes Amazon’s hybrid approach:

The model that Amazon have since moved to with a unique URL identifier and an ignored pretty human readable section is a good compromise.
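A minimal Python sketch of that compromise: the resolver looks only at the opaque identifier and ignores the readable slug. The assumption that the segment after `/dp/` is the identifier comes from the Amazon URLs cited in this post, not from any Amazon documentation. The function name `amazon_product_id` and the slug-free URL are hypothetical, used here only for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def amazon_product_id(url: str) -> Optional[str]:
    """Return the path segment that follows 'dp', ignoring any readable slug.

    Assumption: the identifier is the segment right after 'dp', as in the
    URLs quoted in this post.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if "dp" in segments:
        i = segments.index("dp")
        if i + 1 < len(segments):
            return segments[i + 1]
    return None


# URL form cited in the post: readable slug plus opaque ID.
pretty = "http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/1565925378"
# Hypothetical slug-free form, for comparison only.
bare = "http://www.amazon.com/dp/1565925378"

print(amazon_product_id(pretty))
print(amazon_product_id(bare))
```

Both calls yield the same identifier, which is the point: the slug is there for people, and the ID is what the machine keys on.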

Michael Smethurst agreed with me that the BBC’s opaque IDs — for example, b006qpgr for The Archers — could be promoted as a tag vocabulary that people would be encouraged to use:

Shownar is a prototype by Schulze and Webb that aims to track “buzz” around bbc programmes. For now it’s based on inbound links from blogs/twitter/etc but it could be expanded to use machine tags!?!

On Shownar, I find that this episode of Miss Marple was discussed in this blog entry:

BBC Radio have just started an Agatha Christie season and a whole host of programmes about the Queen of Crime are available to UK listeners on the iPlayer.

They include dramatizations of works starring super sleuths from Miss Marple to the Mysterious Mr Quin, as well as revealing documentaries.

The entry uses URLs that embed these BBC ids: b00mk71d, b007jvht. How did the author find them? Clearly, in this case, by way of the search URL which is also cited in the entry:

http://www.bbc.co.uk/iplayer/search/?q=agatha christie

The search term agatha christie is wildly ambiguous, of course. Shownar would never have included this item had it not cited specific BBC shows by way of their opaque IDs. Nor would the author have cited them if that had required typing b00mk71d or b007jvht. It only works thanks to copy/paste, but it works quite nicely, and it shows why site-specific search still matters in an era of uber search engines.
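A link-tracking service of the kind described here could, in principle, pull the opaque IDs out of a blog entry mechanically. The Python sketch below is only an illustration. The regular expression is inferred from the handful of IDs quoted in this post (it is not a documented BBC format), the two episode URLs are hypothetical, and `find_pids` is a made-up name.

```python
import re

# Assumption: PID shape inferred only from the IDs quoted in this post
# (b006qpgr, b00mk71d, b007jvht); the real format may be broader.
PID_PATTERN = re.compile(r"\bb0[0-9a-z]{6}\b")


def find_pids(text):
    """Return the distinct PID-like tokens in text, in order of first appearance."""
    found = []
    for match in PID_PATTERN.finditer(text):
        pid = match.group(0)
        if pid not in found:
            found.append(pid)
    return found


# Hypothetical blog text: the episode URLs are illustrative, the search URL
# is the one cited in the post.
entry = (
    "Listen at http://www.bbc.co.uk/iplayer/episode/b00mk71d/ and "
    "http://www.bbc.co.uk/iplayer/episode/b007jvht/ or browse "
    "http://www.bbc.co.uk/iplayer/search/?q=agatha christie"
)

print(find_pids(entry))
```

The search URL contributes nothing to the result, which mirrors the observation above: the ambiguous query string cannot identify a programme, while the copied-and-pasted IDs can.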

This example got me thinking about the character strings that we can and do type, easily and naturally, versus those we can’t and won’t. For example:

| queries (what we can and do type) | results (what we can’t and don’t type) |
| --- | --- |
| http://www.librarything.com/catalog/jonudell&deepsearch= `practical internet groupware` | http://www.librarything.com/work/`16804`, http://www.librarything.com/work/16804/book/`28447984` |
| http://www.google.com/search?q= `practical internet groupware` | http://oreilly.com/catalog/`9781565925373`, http://oreilly.com/catalog/`pracintgr` |
| http://www.bing.com/results.aspx?q= `practical internet groupware` | http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/`156592537`, http://my.safaribooksonline.com/`1565925378` |
| http://www.worldcat.org/search?q= `practical internet groupware` | http://www.worldcat.org/oclc/`43188074` |
| http://www.amazon.com/s?index=blended&field-keywords= `practical internet groupware` | http://www.amazon.com/Practical-Internet-Groupware-Jon-Udell/dp/`1565925378` |

Looking at the consistency in the left column, and the variation in the right, I’ve got to conclude that:

Practical Internet Groupware is the de facto webscale identifier for my book.
16804, 28447984, 9781565925373, pracintgr, 156592537, 1565925378, and 43188074 will never converge.

I’ve long imagined a class of equivalence services that would help us bridge the gap between vocabularies we can speak and write and those we’ll never speak and need help to write.
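One narrow slice of such an equivalence service can be purely mechanical. Two of the identifiers in the right-hand column, 1565925378 and 9781565925373, are the ISBN-10 and ISBN-13 forms of the same number, and the standard check-digit rules convert one into the other. The Python sketch below shows that single bridge. The function names are hypothetical, and the other identifiers in the table (LibraryThing, OCLC, the O’Reilly short name) have no such arithmetic relationship and would need a lookup service instead.

```python
def isbn10_is_valid(isbn10: str) -> bool:
    """Check an ISBN-10 using its weighted mod-11 check digit ('X' means 10)."""
    chars = isbn10.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += value * (10 - position)  # weights 10, 9, ..., 1
    return total % 11 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a valid ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    chars = isbn10.replace("-", "").upper()
    if not isbn10_is_valid(chars):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    core = "978" + chars[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("1565925378"))  # → 9781565925373
```

This covers only the easy case where two vocabularies share an algorithmic relationship. The harder cases are exactly the ones where no formula exists and somebody has to maintain the mapping.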

Both are sets of webscale identifiers that we’ll need to use in complementary ways. That’ll require a mix of social conventions and technical services.

Published by Jon Udell


10 thoughts on “Speaking and writing webscale identifiers”

Tony Hirst says:

18 Sep 2009 at 7:42 am

I haven’t explored what’s been going on with BBC URIs, but there is some naming starting to appear…

eg http://www.bbc.co.uk/iplayer/episode/b00fl9sw/A_History_of_Scotland_Hammers_of_the_Scots/

Jon Udell says:

18 Sep 2009 at 8:13 am

> but there is some naming starting to appear

Yes. As Gavin Bell noted, it’s the Amazon approach which combines an opaque ID and readable slug.

So far, the id is promoted only for developers:

/programmes/:groupPID/episodes/upcoming
/programmes/:groupPID/episodes/upcoming/debut
/programmes/:groupPID/episodes/player

“To access these add .xml, .json or .yaml to the end of the url.”

However it can be used to find a cluster of related things:

http://elmcity.info/doublesearch/?q=b00fl9sw

And the ID finds more things than the name:

http://elmcity.info/doublesearch/?q=A_History_of_Scotland_Hammers_of_the_Scots

Interestingly, neither finds anything on the BBC site:

http://search.bbc.co.uk/search?go=homepage&scope=all&q=A_History_of_Scotland_Hammers_of_the_Scots&Search=Search

http://search.bbc.co.uk/search?go=homepage&scope=all&q=b00fl9sw&Search=Search
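The developer-facing routes quoted in the comment above can be expanded into concrete URLs. The Python sketch below only builds the strings and does not fetch anything. The host `http://www.bbc.co.uk` is an assumption (the quoted routes are relative), `feed_urls` is a hypothetical helper, and whether b006qpgr (the ID cited for The Archers) counts as a valid `:groupPID` for these routes is not verified here.

```python
# Assumption: the relative routes quoted above hang off this host.
BASE = "http://www.bbc.co.uk"

# Route templates as quoted, with :groupPID written as {pid}.
TEMPLATES = (
    "/programmes/{pid}/episodes/upcoming",
    "/programmes/{pid}/episodes/upcoming/debut",
    "/programmes/{pid}/episodes/player",
)

# "To access these add .xml, .json or .yaml to the end of the url."
FORMATS = ("xml", "json", "yaml")


def feed_urls(group_pid, fmt="json"):
    """Build one URL per quoted route for the given PID and format suffix."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    return [f"{BASE}{template.format(pid=group_pid)}.{fmt}" for template in TEMPLATES]


for url in feed_urls("b006qpgr"):
    print(url)
```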

Mohan Arun L says:

18 Sep 2009 at 9:41 am

Facebook has been using profile IDs (long strings of digits) to identify people, so when you sign up for FB you end up by default with a long URL like
facebook.com/profile.php?id=12345678, which obviously you can’t pass around to other people, or even remember.
Then you had to go into settings and do something to get a URL like
facebook.com/myname
On the other hand, MySpace has always provided URLs of the form
myspace.com/myname

Andrew Gilmartin says:

24 Sep 2009 at 12:47 pm

Adding content hints to identifiers degrades into the case where you have several variations of the identifiers in the wild. For automation to help us, it needs to know which identifiers are for the same thing. If all identifiers used the same syntax, it would be possible to do this automatically. Using the same syntax does not seem likely to happen given the number of global identification systems we already have, e.g. Handle (http://www.handle.net/rfc/rfc3650.html), DOI, URL, ISSN, ISBN, LOC, etc.

The core of the problem is not coming up with another identification system but coming up with an identification relationship system. This would address not only the same thing with multiple identifiers but also the relationships of parts to whole and variations, such as revisions or translations. Let’s work on that for a while (independent of the larger semantic-web stuff).
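As a rough illustration of what an identification relationship system might track, here is a small Python sketch. It keeps "same thing" groups with a union-find structure and stores other typed relationships (parts, editions, revisions) as plain edges. Everything in it is an assumption: the class and method names, the scheme labels, and the relation names `edition_of` and `instance_of` are hypothetical. The identifier values come from the post's table, and only the ISBN-10/ISBN-13 equivalence is a mechanical fact; the other links are asserted purely for the example.

```python
from collections import defaultdict


class IdentifierRelations:
    """Toy store: equivalence groups plus typed edges between identifiers."""

    def __init__(self):
        self._parent = {}                # union-find parent pointers
        self._edges = defaultdict(set)   # relation name -> {(source, target)}

    def _find(self, ident):
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def same_as(self, a, b):
        """Declare that two identifiers name the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def relate(self, source, relation, target):
        """Record a typed, directed relationship such as part or edition."""
        self._find(source)
        self._find(target)
        self._edges[relation].add((source, target))

    def equivalents(self, ident):
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)

    def related(self, ident, relation):
        """Targets of `relation` from ident or from anything equivalent to it."""
        group = set(self.equivalents(ident))
        return sorted(t for s, t in self._edges[relation] if s in group)


rel = IdentifierRelations()
isbn10 = ("isbn10", "1565925378")
isbn13 = ("isbn13", "9781565925373")
oclc = ("oclc", "43188074")
lt_work = ("librarything-work", "16804")
lt_book = ("librarything-book", "28447984")

rel.same_as(isbn10, isbn13)                  # mechanical equivalence
rel.same_as(isbn13, oclc)                    # illustrative assertion
rel.relate(isbn13, "edition_of", lt_work)    # illustrative assertion
rel.relate(lt_book, "instance_of", lt_work)  # illustrative assertion

print(rel.equivalents(oclc))
print(rel.related(isbn10, "edition_of"))
```

Keeping equivalence separate from the other relation types is a design choice, not a requirement: it lets a question asked with any one identifier reach the relationships recorded against its synonyms.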

Jon Udell says:

28 Sep 2009 at 9:35 pm

> The core of the problem is not coming up
> with another identification system but
> coming up with an identification relationship
> system.

I agree. And that is (not coincidentally) why I’ve been talking to Kingsley Idehen and Stefano Mazzocchi about that:

http://blog.jonudell.net/2009/09/28/talking-with-stefano-mazzocchi-about-reconciling-web-naming-systems/

http://blog.jonudell.net/2009/09/09/talking-with-kingsley-idehen-about-mastering-your-own-search-index/

JXL87 says:

22 Oct 2009 at 1:23 pm

Thanks for the link to the long one.

Andrew Gilmartin says:

22 Oct 2009 at 2:13 pm

John, would you please change “semitic-web” to “semantic-web”. I am sure there is much groundbreaking scholarship being done regards the semitic-web but I really wanted to reference the semantic-web work. The other typos can stay.

Jon Udell says:

22 Oct 2009 at 4:52 pm

> groundbreaking scholarship being done regards the semitic-web

[Chuckle] OK, done.

Pingback: Where is the money going? « Jon Udell
Pingback: Fear not, book lovers. The future of marginalia is bright! « Jon Udell
