For this week’s Perspectives show I spoke with Scott Prevost, general manager and product director for Powerset, the semantic search engine that was recently acquired by Microsoft, and that can currently be seen in action working with the combined contents of Wikipedia and Freebase.
In our interview, Scott discusses the natural language engine, 30 years in the making, that Powerset acquired from PARC (formerly Xerox PARC). But he makes clear that the engine is one part of a blended strategy that also takes advantage of statistical and machine learning techniques.
If you try Powerset, you find that your mileage will vary depending on a lot of factors. It’s clearly a work in progress, as all of natural language technology has been since, really, the dawn of computing. But the approach that Scott describes here sounds like a flexible and pragmatic way to leverage the technology as it continues to evolve.
Here’s one evocative use of Powerset:
The first result, Dreams from My Father, comes from Freebase, where that book is one of two items in the Works Written slot of Obama’s Person record. In this case there’s no need to discover structure, because Freebase has already encoded it. But the natural language technology is being used in a complementary way, to map between a natural form of the question and the corresponding Freebase query.
To see a glimpse of what Powerset’s linguistic analysis of Wikipedia can do, try this query:
Here, Powerset uses its semantic representation of my Wikipedia page to extract “Factz” that match one of its linguistic patterns. In this case the pattern is subject / verb / object, and two Factz are adduced. One is bogus:
udell authored advisor
And the other is valid:
udell authored Practical Internet Groupware
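To make the subject / verb / object pattern concrete, here is a minimal Python sketch of how triples like these might be held and grouped. It is only an illustration, not Powerset's implementation: the `Fact` type and the `parse_fact` and `index_by_subject_verb` helpers are hypothetical names I made up, and the parser assumes the subject and verb are each a single word.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """A hypothetical subject / verb / object triple."""
    subject: str
    verb: str
    obj: str


def parse_fact(line: str) -> Fact:
    # Assumption: the first two words are subject and verb,
    # and everything after them is the object.
    subject, verb, obj = line.split(" ", 2)
    return Fact(subject, verb, obj)


def index_by_subject_verb(facts):
    # Group objects under each (subject, verb) pair, keeping input order.
    index = defaultdict(list)
    for fact in facts:
        index[(fact.subject, fact.verb)].append(fact.obj)
    return dict(index)


facts = [
    parse_fact("udell authored advisor"),
    parse_fact("udell authored Practical Internet Groupware"),
]
index = index_by_subject_verb(facts)
print(index[("udell", "authored")])
```

Note that nothing in this structure can tell the bogus triple from the valid one. Both have the same shape, which is why the quality of the upstream linguistic analysis matters so much.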
There isn’t much in Wikipedia about me, but if you pick a more notable person — say, Tim Bray — the list of Factz includes:
chaired Atompub Working group
Missing from this list, by the way, is:
live-engineered Electric Eel Shock
OK, I’m just kidding about that. Electric Eel Shock’s live engineer was another Tim Bray, which points out the need, as Scott and I discussed briefly, for named-entity recognition and disambiguation.
I’ve always been fascinated by the ongoing effort to understand and produce natural language using computers and software. Fifty years ago, early computer scientists thought they’d lick the problem in five years. Now many people believe it may never happen. I think it will, but gradually over a long time. And as Scott Prevost points out, it’s just one tool in the kit, and should be used appropriately, in concert with other tools.