Talking with Kingsley Idehen about mastering your own search index

Kingsley Idehen’s vision of a web of linked data long predates the recognition I accorded him in 2003. He’s seen the big picture for a very long time, and has been driving toward it consistently. Over the years we’ve had conversations in which I’ve always wound up saying: “Yes, OK, but how will we get people to create this web of linked data that we want to navigate and query?”

On this week’s Innovators show he responds with what I find to be a plausible scenario. Every business, and increasingly every person, presents some kind of home page to the world. On those pages you will find, implied but not clearly stated, one or both of the following kinds of assertions:

1. Things I offer.

2. Things I seek.

A plumber, for example, may offer hydronic heating services, and may seek an assistant with certain qualifications. By encoding these kinds of assertions as subject-verb-object triples we could, in theory, build a semantic web that matches seekers and finders more efficiently than the current searchable web can. But that first step was always a doozy. Writing the assertions required an XML syntax that has never become a web mainstay.
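To make the subject-verb-object idea concrete, here is a minimal Python sketch. It is only an illustration of the matching idea, not how any real semantic web store works: the `Triple` type, the `offers` and `seeks` verbs, and the identifiers such as `plumber:alice` are all hypothetical names invented for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: subject, verb, object (all plain strings here)."""
    subject: str
    verb: str
    obj: str


# Hypothetical assertions of the "things I offer" / "things I seek" kind.
ASSERTIONS = [
    Triple("plumber:alice", "offers", "hydronic heating service"),
    Triple("plumber:alice", "seeks", "plumbing assistant"),
    Triple("person:bob", "offers", "plumbing assistant"),
    Triple("homeowner:carol", "seeks", "hydronic heating service"),
]


def match_seekers_to_offers(triples):
    """Pair each 'seeks' assertion with every 'offers' assertion on the same object."""
    # Index the offers by the thing being offered.
    offers = {}
    for t in triples:
        if t.verb == "offers":
            offers.setdefault(t.obj, []).append(t.subject)

    # Look up each sought thing in that index.
    matches = []
    for t in triples:
        if t.verb == "seeks":
            for provider in offers.get(t.obj, []):
                matches.append((t.subject, provider, t.obj))
    return matches


if __name__ == "__main__":
    for seeker, provider, thing in match_seekers_to_offers(ASSERTIONS):
        print(f"{seeker} seeks {thing!r}; {provider} offers it")
```

Matching on exact strings is the weak point of this toy version. A shared vocabulary is what lets independently written assertions line up.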

There are other ways to write them, however. Using an approach called RDFa, you can embed them directly into human-readable web pages. This isn’t a new idea. A decade ago, in my book Practical Internet Groupware, I showed how CSS class attributes could do double duty within a web page, governing style while also conveying meaning. In 2003 I was still experimenting with the idea, which I then called microcontent. Nowadays the term is microformats.

Although we’ve heard plenty about this idea over the years, it has yet to bear fruit. I don’t know that it will, but the scenario Kingsley Idehen outlines strikes me as plausible because, as Dries Buytaert evocatively says, structured data is the new search engine optimization. Formerly of concern only to publishers, the rationale for search engine optimization is now becoming evident to everyone who writes an About page for a business or, what often comes to the same thing, for themselves.

The formula for an About page is well known: name, address, services offered, hours of operation, etc. Everyone writes this stuff once for the About page, and then again in countless variations for inclusion in various directories. Kingsley and I both hope that the time is now ripe for a web-friendly way to write this data into About pages once, for common use by human visitors, search crawlers, and syndicated directories.

His proposal relies on RDFa to encode factual assertions, and on an e-commerce ontology called GoodRelations which, as its creator Martin Hepp says, provides the vocabulary to say things like:

  • a particular Web site describes an offer to sell cellphones of a certain make and model at a certain price,
  • a pianohouse offers maintenance for pianos that weigh less than 150 kg,
  • a car rental company leases out cars of a certain make and model from a particular set of branches across the country.
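As a rough sketch of what "woven into an About page" could look like, the Python below holds a small HTML fragment annotated with RDFa-style `property` attributes and pulls the annotated values back out with the standard library's `html.parser`. The markup is illustrative only: the business name and description are made up, the fragment has not been validated against the RDFa specification or the GoodRelations cookbook, and the extractor is a naive attribute scan rather than a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical About-page fragment. The gr: prefix points at the GoodRelations
# namespace; the business details are invented for illustration.
ABOUT_PAGE = """
<div xmlns:gr="http://purl.org/goodrelations/v1#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     about="#company" typeof="gr:BusinessEntity">
  <h1 property="gr:legalName">Example Plumbing</h1>
  <div rel="gr:offers">
    <p typeof="gr:Offering" property="rdfs:comment">
      Hydronic heating installation and repair
    </p>
  </div>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements carrying a property attribute.

    Naive by design: it assumes well-nested markup with no void elements
    (such as <br> or <img>) and ignores about/rel/typeof entirely.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one (tag, property-or-None, text parts) per open element

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, dict(attrs).get("property"), []))

    def handle_data(self, data):
        # Text belongs to every open element that declared a property.
        for _tag, prop, parts in self._stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        _tag, prop, parts = self._stack.pop()
        if prop:
            # Collapse runs of whitespace so the page's layout does not leak through.
            self.pairs.append((prop, " ".join("".join(parts).split())))


def extract_properties(html):
    """Return the (property, text) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for prop, text in extract_properties(ABOUT_PAGE):
        print(f"{prop}: {text}")
```

The point of the sketch is that a human visitor sees an ordinary heading and paragraph, while a crawler reading the same markup can recover the labeled facts.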

The GoodRelations wiki shows cookbook examples for Yahoo and Google. You’d have to be fairly technical to adapt these using cut-and-paste, but there’s also a form that, although currently still wired to emit the older RDF/XML kinds of assertions, will soon also emit RDFa that can be woven into existing About pages.

To navigate and query a web of linked data you need, obviously, mechanisms by which to do the navigation and the querying. That’s never been the problem. Technologists love to figure such things out. But we’ve spectacularly failed to help people create that web of linked data in the first place. I don’t know if the approach Kingsley Idehen sketches in this week’s podcast will succeed. But it feels right, and I love his tagline: “Be the master of your own index.”