In my writeup on MIT’s Project Simile, and again in my talk at the CUSEC conference, I lauded an approach to collective information management that respects our actual linguistic nature. People don’t normally create vocabularies by committee. Rather, we absorb, imitate, innovate, and negotiate the vocabularies we use. Simile embraces that reality. It encourages people to name resources in ways that make sense to them, within the context of their tribes. Then it provides ways to map out equivalences among the terms used by different tribes.
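To make the equivalence-mapping idea concrete, here is a minimal sketch in Python. It is not how Simile itself works; it only assumes that each term can be identified by a (tribe, term) pair and that people assert equivalences one pair at a time. The class name, method names, and sample tribes are all hypothetical.

```python
class Equivalences:
    """Tracks which (tribe, term) pairs have been declared equivalent.

    Hypothetical illustration using a union-find structure; not Simile's API.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, term):
        # Register unseen terms as their own representative.
        self._parent.setdefault(term, term)
        # Walk up to the representative, shortening the path as we go.
        while self._parent[term] != term:
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def declare_equivalent(self, a, b):
        """Record that two tribes' terms mean the same thing."""
        self._parent[self._find(a)] = self._find(b)

    def same(self, a, b):
        """True if a chain of declared equivalences links a and b."""
        return self._find(a) == self._find(b)


eq = Equivalences()
eq.declare_equivalent(("librarians", "author"), ("bloggers", "writer"))
eq.declare_equivalent(("bloggers", "writer"), ("photographers", "creator"))

linked = eq.same(("librarians", "author"), ("photographers", "creator"))
unlinked = eq.same(("librarians", "author"), ("bloggers", "tag"))
```

Each tribe keeps its own vocabulary; nothing is renamed. The mapping only records which names can be treated as interchangeable, and equivalences chain, so the librarians' "author" ends up linked to the photographers' "creator" without anyone declaring that pair directly.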
This same idea of pluralistic naming and equivalence mapping came up in last week’s Perspectives interview with Quentin Clark, “Where is WinFS now?” The connection was implicit, but it’s worth making explicit. Here’s what Quentin said:
QC: Going through the litany of technologies that have come from WinFS, one of them is the notion of what I refer to as semi-structured records. The schema is not necessarily all that well defined at the outset of the application. How does the database handle that? We had built WinFS around a feature called UDTs [user-defined types], which is a column type — a CLR type system type.
We finished that up, and we built a whole spatial datatype on it in SQL Server 2008, it’s all good stuff.
But when we stepped back and looked at the semi-structured data problem in a larger context, beyond the WinFS requirements, we saw the need to extend the top-level SQL type system in that way. Not just UDTs, but to have arbitrary extensibility.
So we did this feature in SQL Server 2008 that we internally refer to as sparse columns. It’s a combination of various things. First, a large number of columns. Right now there’s a 1024 limit on the number of columns in a single SQL table. We’re way widening that out.
That comes of course with the ability to store data that’s very sparsely populated across a large number of columns. In SQL Server 2005 we actually allocate space for every column in every row, whether it’s filled or not.
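The difference between allocating every column and storing only the populated ones can be sketched in a few lines of Python. This is a conceptual illustration, not SQL Server's actual storage format; the column names and the 1024-column width are placeholders borrowed from the limit mentioned above.

```python
# Hypothetical wide table: 1024 attribute columns.
COLUMNS = [f"attr_{i}" for i in range(1024)]


def dense_row(values):
    """Allocate a slot for every column, whether it is filled or not."""
    return [values.get(name) for name in COLUMNS]


def sparse_row(values):
    """Keep only the columns that actually hold a value."""
    return {name: v for name, v in values.items() if v is not None}


def read_sparse(row, name):
    """An absent column reads back as None, like a NULL."""
    return row.get(name)


# A record that uses 2 of the 1024 available attributes.
bookmark = {"attr_3": "python", "attr_900": "semantic-web"}

dense = dense_row(bookmark)
sparse = sparse_row(bookmark)

print(len(dense))   # 1024 slots, almost all of them None
print(len(sparse))  # 2 entries
```

The dense form pays for every column in every row; the sparse form pays only for what is present, which is the property that matters when attributes are scattered thinly across many records.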
JU: This is what the semantic web folks are interested in, right? Having attributes scattered through a sparse matrix?
QC: That’s right. And that leads to another thing which we call column groups, which allow you to clump a few of them together and say, that’s a thing, I’m going to put a moniker on that and treat it as an equivalence class in some dimension.
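One way to picture a column group is as a named bundle of columns that can be pulled out of a sparse record as a unit. The sketch below is my own reading of that description, not SQL Server's API; the group names, column names, and sample record are hypothetical.

```python
# Hypothetical column groups: a moniker mapped to the columns it bundles.
COLUMN_GROUPS = {
    "geo": ("latitude", "longitude"),
    "citation": ("title", "creator", "date"),
}


def project_group(row, group):
    """Return the part of a sparse row that belongs to the named group."""
    return {col: row[col] for col in COLUMN_GROUPS[group] if col in row}


def has_group(row, group):
    """True if the row populates at least one column of the group."""
    return any(col in row for col in COLUMN_GROUPS[group])


# A sparse record: only three attributes are populated.
record = {"title": "example", "latitude": 42.0, "rating": 5}

geo_part = project_group(record, "geo")
citation_part = project_group(record, "citation")
```

Here `geo_part` holds only the latitude and `citation_part` only the title, so code that cares about one group never has to know which other attributes a record happens to carry.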
Given my enduring fascination with del.icio.us as a prime example of social tagging services that enable real people to evolve metadata vocabularies in a natural way, that really got my spidey sense tingling.
3 thoughts on “Semi-structured database records for social tagging”
Jon, this is a fascinating thread for me. I come at this from a different perspective, having spent 30 years in leading-edge to bleeding-edge enterprise software, but always on the biz side (i.e., I could not program to save my life). What I saw again and again was the biz-level frustration with having to define and model upfront what was going to change as soon as you finished that exercise. I remember a banking system based on Pick that did this “arbitrary extensibility,” but at a great price in other areas. Bernard