Understanding Wikipedia notability

Some fellow residents of my town have recently noticed, and pointed out to me, that I’m listed in Wikipedia as a notable inhabitant of Keene, NH. They’re more impressed than they should be. All forms of notability are subject to bias, but Internet notability is subject to a different kind of bias than most people realize.

For example, friends and family used to be impressed by the fact that I was the top result in Google for my first name — and then second to Jon Stewart for a long while, until I had to reboot my InfoWorld archive. Why? Just because I’ve projected a large surface area of searchable documents whose titles include the trigram jon.

Glenn Fine, who was in my grade in junior high school and is now Inspector General for the Department of Justice, is a far more notable person than me. Yet you won’t find him anywhere near the top of a search for his first name, because Inspectors General don’t (yet) project a large surface area of documents onto the web.

To put my newfound Wikipedia notability into a similar context, I wanted to show people how these lists of notable inhabitants are made. I figured the person who added me was someone who knew of my work, because I’ve written about it so much online, and who was inclined to edit Wikipedia, a trait that correlates with an interest in my work.

I wanted to illustrate exactly who, when, and how, so I went to Wikipedia with the confident expectation that it would be easy to answer those questions.

Surprisingly, it wasn’t. I hadn’t really tried searching Wikipedia revision histories before, but in this case, and in a few others I’ve tried lately, it proved quite difficult to pinpoint the author of a change.

For example, on Twitter I asked:

Wikipedia: “The term ‘Web 2.0’ was coined by Darcy DiNucci in 1999.” Added when, by whom? WikiBlame seems an ineffective way to find out.

@bazzargh replied: Robert Gehl. http://bit.ly/46r1a

I followed up: Thanks. By the way, how’d you do that?

He explained: switch to 500 view in history, then rough bisection from oldest. Couple of minutes; used this a lot to find long-lived vandalism.

if older, I progressively back off 2..4..8… pages through this. In this case though, there was a clueful log message!

That’s pretty much what I’ve found myself doing when trying to track down changes, so I was glad to know it wasn’t just me. But this highlights an important point about transparency: It’s all relative.
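The procedure @bazzargh describes, backing off 2, 4, 8 pages and then bisecting, amounts to a galloping (exponential) search followed by a binary search. Here is a minimal Python sketch of that idea, not his actual method: it assumes the revision texts have already been downloaded into a list ordered oldest to newest, and that the phrase, once added, stayed in the article. Vandalism that deletes and later restores the text breaks that second assumption. The function name `first_revision_with` and the sample history are hypothetical.

```python
def first_revision_with(revisions, phrase):
    """Return the index of the earliest revision containing phrase, or None.

    Hypothetical helper. Assumes `revisions` is a list of revision texts
    ordered oldest -> newest, and that the phrase stays put once added.
    """
    def has(i):
        return phrase in revisions[i]

    n = len(revisions)
    if n == 0 or not has(n - 1):
        return None  # phrase is not in the current text at all

    # Galloping phase: step back 1, 2, 4, 8... revisions from the newest
    # until we hit one that lacks the phrase (or run off the start).
    hi = n - 1        # invariant: revisions[hi] contains the phrase
    step = 1
    lo = hi - step
    while lo >= 0 and has(lo):
        hi = lo
        step *= 2
        lo = hi - step
    lo = max(lo, -1)  # -1 stands for "before the first revision"

    # Bisection phase: revisions[lo] lacks the phrase (or lo == -1) and
    # revisions[hi] has it; halve the gap until it is one revision wide.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Made-up sample history: the phrase first appears in revision 3.
history = ["Web 2.0 is a term."] * 3 + ["Web 2.0 was coined by Darcy DiNucci in 1999."] * 4
print(first_revision_with(history, "Darcy DiNucci"))  # prints 3
```

Because each probe in the bisection phase halves the remaining range, searching 500 revisions takes at most nine probes (2^9 = 512), which is why a couple of minutes of clicking is enough.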

One of the reasons we think of government as opaque is that while records may be notionally public, it takes time, effort, and skill to visit city hall, dig through them, and find what you’re looking for.

I have always regarded Wikipedia as an extreme counter-example. And that’s true. It is radically transparent. You can ultimately find out exactly how any statement in any article came to be. You may not be able to correlate the author’s pseudonym to a real-world identity, but you can evaluate that author’s corpus and reputation within the context of Wikipedia.

And yet, the ability to do this spelunking requires more time, effort, and skill than most people possess. Although I’m reluctant to deflate my status as a notable inhabitant of Keene, I wish it were easier for people who read that to also find out what it does — and doesn’t — mean.

13 Comments

  1. You could easily write a program to find out who entered a particular phrase on a wiki page. You could not do the same for city hall info.

  2. Jon, it seems the earlier tools you used to create the Heavy Metal Umlaut visualization of wikipedia history are no longer functional.

    http://waxy.org/2005/06/wikipedia_histo/

    Pretty sure none of these tools are around anymore: Wikipedia Animate, WikiDiff, aniwiki.

    They contributed greatly to increased transparency in Wikipedia.

    It’s an interesting paradox: a lot of the credibility of Wikipedia comes from the persistent visibility of high-quality results and the low likelihood of encountering vandalism.

    However, extreme transparency of the revision history could up-end the dirty laundry cart, if you’ll pardon the mixed metaphor.

    Micah

  3. > Of course, I could track this down myself…

    I had previously tracked it down, but forgot to record the contributor’s IP address.

    How long did it take you, and by what method?

    > Not much there.

    We only have the set of contributions by 98.117.85.23:

    http://en.wikipedia.org/w/index.php?title=Special:Contributions&limit=500&target=98.117.85.23

    Interestingly my edit appears to have been the first from that IP address.

    Entities found in the list of subsequent edits include:

    Qimonda
    David Gilmour
    Polysaccharide
    Robert Pedon
    Oenotrians
    Centriole
    x86 virtualization
    Wilson, North Carolina

    From this list a real-world identity could very likely be triangulated. It’s interesting to think about when and how the friction involved in doing so will drop enough to make it broadly possible.

    1. > How long did it take you, and by what method?

      Binary search: about 5 minutes, as there are fewer than 500 changes for that article.

  4. > Jon, it seems the earlier tools you used
    > to create the Heavy Metal Umlaut
    > visualization of wikipedia history are
    > no longer functional.

    Actually, those tools came later. To make the movie I stepped through the revision history capturing frames, made them into the first layer of the movie, then recorded a second layer in which I scanned back and forth over the first layer, highlighting and narrating interesting bits.

    But yeah, the tools built in response to Andy Baio’s LazyWeb request seem to have rusted.

    > They contributed greatly to increased
    > transparency in Wikipedia.

    I’m not so sure. I suspect they were rarely used, partly because they weren’t well known and partly because the necessary mechanism, sucking down the whole revision history, was awkward.

    The concept embodied in the tools, if more gracefully supported in software, would indeed make Wikipedia — and any other revision-aware system — more transparent.

    > However, extreme transparency of the
    > revision history could up-end the dirty
    > laundry cart, if you’ll pardon the mixed
    > metaphor.

    That’s a really good point. It comes up elsewhere in the domain of legislation. We can imagine extremely transparent and accountable visibility into the drafting of bills. That probably wouldn’t be a 100% win. Still worthwhile? My gut says yes but I’d love to see the experiment tried.

  5. > Say, why were you trying to figure out
    > where the DiNucci reference came from?

    I randomly ran across the fact that it had been added to Wikipedia, which reminded me that I’d been noodling on how easy it is (or isn’t) to pinpoint the source of changes.

    It’s funny that we’ve connected in this way.

    What prompted you to make the change?

  6. I’m working on a dissertation on Web 2.0. I’m studying the discourses about it and its material impact on culture. Naturally, I wanted to have a good understanding of the history of the term. That’s the earliest mention of it I could find in print, and since it’s up to users to edit Wiki entries, I figured I had to post it there. Wikipedia’s been really helpful to me and so there’s a lot of incentive to give back, and it doesn’t take long.

  7. More on-topic to your post as a whole – I agree that Wikipedia presents an intriguing model by which changes can be made more transparent. However, Wikipedia is really more like an iceberg now. Since anyone can make changes, the revision histories of entries in Wikipedia outweigh what’s on the surface. The changes are all there for anyone to review, but who will take the time to pore through pages of minute changes, vandalism, sentence-level edits, and the occasional substantial edit? It’s the same problem we have with states; the information is usually there (somewhere), but who has the time to gather it, sort it, and, more importantly, contextualize it?
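The first commenter’s point, that a program could find who entered a particular phrase, can be sketched against the MediaWiki Action API. This is a minimal sketch, assuming a `prop=revisions` query with `formatversion=2` and `rvslots=main`, under which each revision’s wikitext appears at `slots.main.content`. The helper names are hypothetical, and the HTTP fetch and paging loop (feeding each response’s `continue.rvcontinue` back in) are left out.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, rvcontinue=None):
    """Build an Action API URL listing a page's revisions, oldest first.

    Hypothetical helper; parameter names follow the API's prop=revisions module.
    """
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment|content",
        "rvslots": "main",
        "rvdir": "newer",  # oldest revision first
        "rvlimit": "50",   # kept small: the API limits batches that include content
    }
    if rvcontinue:         # continuation token from the previous response
        params["rvcontinue"] = rvcontinue
    return API_URL + "?" + urlencode(params)


def who_added(payload, phrase):
    """Scan one decoded JSON response and return (user, timestamp) of the
    first revision whose wikitext contains phrase, or None if none does.

    Assumes the formatversion=2 response shape described above.
    """
    for page in payload["query"]["pages"]:
        for rev in page.get("revisions", []):
            if phrase in rev["slots"]["main"]["content"]:
                return rev["user"], rev["timestamp"]
    return None
```

A linear scan like this finds the phrase’s very first appearance, even if it was later removed and restored, but it must download the text of every revision, which is exactly the cost that makes this kind of spelunking tedious.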
