A literary appreciation of the Olson/Zoneinfo/tz database

You will probably never need to know about the Olson database, also known as the Zoneinfo or tz database. And were it not for my elmcity project I never would have looked into it. I knew roughly that this bedrock database is a compendium of definitions of the world’s timezones, plus rules for daylight saving time (DST) transitions, used by many operating systems and programming languages.

I presumed that it was written Unix-style, in some kind of plain-text format, and that’s true. Here, for example, are top-level DST rules for the United States since 1918:

# Rule NAME FROM  TO    IN   ON         AT      SAVE    LETTER/S
Rule   US   1918  1919  Mar  lastSun    2:00    1:00    D
Rule   US   1918  1919  Oct  lastSun    2:00    0       S
Rule   US   1942  only  Feb  9          2:00    1:00    W # War
Rule   US   1945  only  Aug  14         23:00u  1:00    P # Peace
Rule   US   1945  only  Sep  30         2:00    0       S
Rule   US   1967  2006  Oct  lastSun    2:00    0       S
Rule   US   1967  1973  Apr  lastSun    2:00    1:00    D
Rule   US   1974  only  Jan  6          2:00    1:00    D
Rule   US   1975  only  Feb  23         2:00    1:00    D
Rule   US   1976  1986  Apr  lastSun    2:00    1:00    D
Rule   US   1987  2006  Apr  Sun>=1     2:00    1:00    D
Rule   US   2007  max   Mar  Sun>=8     2:00    1:00    D
Rule   US   2007  max   Nov  Sun>=1     2:00    0       S

What I didn’t appreciate, until I finally unzipped and untarred a copy of ftp://elsie.nci.nih.gov/pub/tzdata2009o.tar.gz, is the historical scholarship scribbled in the margins of this remarkable database, or document, or hybrid of the two.

You can see a glimpse of that scholarship in the above example. The most recent two rules define the latest (2007) change to US daylight savings. The spring-forward rule says: “On the second Sunday in March, at 2AM, save one hour, and use D to change EST to EDT.” Likewise, the fall-back rule says: on the fast-approaching first Sunday in November, spend the saved hour and use S to change EDT back to EST.
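To see those two rules in action, here’s a minimal sketch (mine, not part of the tz distribution) that uses Python’s standard zoneinfo module, which reads the compiled tz data:

from datetime import datetime
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# March 14, 2021 was the second Sunday in March; under the 2007 rule,
# local clocks jumped from 2:00 to 3:00 that morning.
before = datetime(2021, 3, 14, 1, 59, tzinfo=eastern)
after = datetime(2021, 3, 14, 3, 1, tzinfo=eastern)

print(before.tzname(), after.tzname())  # EST EDT
print(after - before)                   # 0:02:00

Because both datetimes are timezone-aware, the subtraction happens in UTC, so the wall-clock hour that vanished at 2AM doesn’t count: only two minutes of real time elapsed.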

But look at the rules for Feb 9 1942 and Aug 14 1945. The letters are W and P instead of D and S. And the comments tell us that during that period there were timezones like Eastern War Time (EWT) and Eastern Peace Time (EPT). Arthur David Olson elaborates:

From Arthur David Olson (2000-09-25):

Last night I heard part of a rebroadcast of a 1945 Arch Oboler radio drama. In the introduction, Oboler spoke of “Eastern Peace Time.” An AltaVista search turned up: “When the time is announced over the radio now, it is ‘Eastern Peace Time’ instead of the old familiar ‘Eastern War Time.’ Peace is wonderful.”


Most of this Talmudic scholarship comes from founding contributor Arthur David Olson and editor Paul Eggert, both of whose Wikipedia pages, although referenced from the Zoneinfo page, strangely do not exist.

But the Olson/Eggert commentary is interspersed with contributions from many others, like this one about the Mount Washington Observatory.

From Dave Cantor (2004-11-02):

Early this summer I had the occasion to visit the Mount Washington Observatory weather station atop (of course!) Mount Washington [, NH]…. One of the staff members said that the station was on Eastern Standard Time and didn’t change their clocks for Daylight Saving … so that their reports will always have times which are 5 hours behind UTC.


Since Mount Washington has a climate all its own, I guess it makes sense for it to have its own time as well.

Here’s a glimpse of Alaska’s timezone history:

From Paul Eggert (2001-05-30):

Howse writes that Alaska switched from the Julian to the Gregorian calendar, and from east-of-GMT to west-of-GMT days, when the US bought it from Russia. This was on 1867-10-18, a Friday; the previous day was 1867-10-06 Julian, also a Friday. Include only the time zone part of this transition, ignoring the switch from Julian to Gregorian, since we can’t represent the Julian calendar.

As far as we know, none of the exact locations mentioned below were permanently inhabited in 1867 by anyone using either calendar. (Yakutat was colonized by the Russians in 1799, but the settlement was destroyed in 1805 by a Yakutat-kon war party.) However, there were nearby inhabitants in some cases and for our purposes perhaps it’s best to simply use the official transition.


You have to have a sense of humor about this stuff, and Paul Eggert does:

From Paul Eggert (1999-03-31):

Shanks writes that Michigan started using standard time on 1885-09-18, but Howse writes (pp 124-125, referring to Popular Astronomy, 1901-01) that Detroit kept local time until 1900 when the City Council decreed that clocks should be put back twenty-eight minutes to Central Standard Time. Half the city obeyed, half refused. After considerable debate, the decision was rescinded and the city reverted to Sun time. A derisive offer to erect a sundial in front of the city hall was referred to the Committee on Sewers. Then, in 1905, Central time was adopted by city vote.


This story is too entertaining to be false, so go with Howse over Shanks.


The document is chock full of these sorts of you-can’t-make-this-stuff-up tales:

From Paul Eggert (2001-03-06), following a tip by Markus Kuhn:

Pam Belluck reported in the New York Times (2001-01-31) that the Indiana Legislature is considering a bill to adopt DST statewide. Her article mentioned Vevay, whose post office observes a different time zone from Danner’s Hardware across the street.


I love this one about the cranky Portuguese prime minister:

Martin Bruckmann (1996-02-29) reports via Peter Ilieve that Portugal is reverting to 0:00 by not moving its clocks this spring. The new Prime Minister was fed up with getting up in the dark in the winter.


Of course Gaza could hardly fail to exhibit weirdness:

From Ephraim Silverberg (1997-03-04, 1998-03-16, 1998-12-28, 2000-01-17 and 2000-07-25):

According to the Office of the Secretary General of the Ministry of Interior, there is NO set rule for Daylight-Savings/Standard time changes. One thing is entrenched in law, however: that there must be at least 150 days of daylight savings time annually.


The rule names for this zone are poignant too:

# Zone  NAME       GMTOFF   RULES      FORMAT  [UNTIL]
Zone    Asia/Gaza  2:17:52  -          LMT     1900 Oct
                   2:00     Zion       EET     1948 May 15
                   2:00     EgyptAsia  EE%sT   1967 Jun  5
                   2:00     Zion       I%sT    1996
                   2:00     Jordan     EE%sT   1999
                   2:00     Palestine  EE%sT

There’s also some wonderful commentary in the various software libraries that embody the Olson database. Here’s Stuart Bishop on why pytz, the Python implementation, supports almost all of the Olson timezones:

As Saudi Arabia gave up trying to cope with their timezone definition, I see no reason to complicate my code further to cope with them. (I understand the intention was to set sunset to 0:00 local time, the start of the Islamic day. In the best case caused the DST offset to change daily and worst case caused the DST offset to change each instant depending on how you interpreted the ruling.)
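All of this scholarship is available programmatically, of course. Here’s a minimal sketch (mine, not Stuart Bishop’s) of pytz resolving the Asia/Gaza zone shown above:

from datetime import datetime
import pytz

gaza = pytz.timezone("Asia/Gaza")

# pytz asks you to call localize() rather than pass tzinfo= to the
# constructor, so it can consult the Olson rules for this wall-clock time.
dt = gaza.localize(datetime(2009, 7, 1, 12, 0))
print(dt.tzname(), dt.utcoffset())  # whatever abbreviation and offset the rules dictate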


It’s all deliciously absurd. And according to Paul Eggert, Ben Franklin is having the last laugh:

From Paul Eggert (2001-03-06):

Daylight Saving Time was first suggested as a joke by Benjamin Franklin in his whimsical essay “An Economical Project for Diminishing the Cost of Light” published in the Journal de Paris (1784-04-26). Not everyone is happy with the results.


So is Olson/Zoneinfo/tz a database or a document? Clearly both. And its synthesis of the two modes is, I would argue, a nice example of literate programming.

More Python and C# idioms: Finding the difference between two lists

Recently I’ve posted two examples[1, 2] of Python idioms alongside corresponding C# idioms. It always intrigues me to look at the same concept through different lenses, and it seems to intrigue others as well, so here’s a third installment.

Today’s example comes from a real scenario. I’ve recently added a feature to the elmcity service that enables curators to control their hubs by sending Twitter direct messages to the service. One method, GetDirectMessagesFromTwitter, calls the Twitter API and returns a list of direct messages sent to the elmcity service. Another method, GetDirectMessagesFromAzure, calls the Azure table storage API and returns a list of direct messages stored there. The difference between the two lists — if any — represents new messages to be processed.

Here’s one take on Python and C# idioms for finding the difference between two lists:

Python:

fetched_messages = GetDirectMessagesFromTwitter()
stored_messages = GetDirectMessagesFromAzure()
diff = set(fetched_messages) - set(stored_messages)
return list(diff)

C#:

var fetched_messages = GetDirectMessagesFromTwitter();
var stored_messages = GetDirectMessagesFromAzure();
var diff = fetched_messages.Except(stored_messages);
return diff.ToList();

I can’t decide which one I prefer. Python’s set arithmetic is mathematically pure. But C#’s noun-verb syntax is appealing too. Which do you prefer? And why?


PS: The Python example above is slightly concocted. It won’t work as shown here because I’m modeling Twitter direct messages as .NET objects. IronPython can use those objects, but the set subtraction fails because the objects returned from the two API calls aren’t directly comparable.

A real working example would add something like this:

# Build comparable string "signatures" from each message's text and
# timestamp, then diff the signature sets.
fetched_message_sigs = [x.text + x.datetime for x in fetched_messages]
stored_message_sigs = [x.text + x.datetime for x in stored_messages]
diff = list(set(fetched_message_sigs) - set(stored_message_sigs))

But that’s a detail that would only obscure the side-by-side comparison I’m making here.
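Still, for completeness, here’s a self-contained sketch of the signature idea. It uses tuples instead of concatenated strings, so a boundary ambiguity between text and datetime can’t create a false match; the Message type is a hypothetical stand-in for the real .NET objects.

from collections import namedtuple

# Hypothetical stand-in for the .NET direct-message objects described above.
Message = namedtuple("Message", ["text", "datetime"])

fetched_messages = [Message("start", "2009-10-14T12:00"),
                    Message("start", "2009-10-15T09:30")]
stored_messages = [Message("start", "2009-10-14T12:00")]

# Signatures are hashable tuples, so set subtraction works directly.
stored_sigs = set((m.text, m.datetime) for m in stored_messages)
new_messages = [m for m in fetched_messages
                if (m.text, m.datetime) not in stored_sigs]

print(new_messages)  # only the message not yet stored in Azure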

To: elmcity, From: @curator, Message: start

Because I am lazy, curious, and evangelical, the elmcity service works in an unusual way. Anything that I can delegate to other services, I do. So when curators add feeds to hubs, or modify the behavior of hubs, they do it by bookmarking and tagging URLs at delicious.com. It would be foolish to keep that registry and configuration data only in delicious, so I don’t; I persist it to Azure tables. But for now, I’m delegating the data entry interface to delicious.

It’s a lazy approach, in the good sense of lazy. I don’t want to build my own data entry system unless I can add important value, and in this case I can’t.

I’m also curious to see how far this approach can take us. As the project has evolved, so has the tag vocabulary spoken between curators and the service. It’s an easy and natural process, and I don’t see any roadblocks ahead.

Finally, I’m evangelizing this way of doing things because I continue to think that more people should appreciate it.

In this scenario I’ve delegated something else to delicious: authentication. My service doesn’t have its own user accounts. Instead, as the administrator of the service, I tell it to trust a specific set of delicious accounts. When one of those accounts bookmarks an iCalendar URL, and tags it in a particular way, the service regards that as an authenticated request to add the feed to that hub’s registry.

Other requests that curators can make include:

Make the radius for my hub 5 miles.

Make my timezone Arizona.

Get my CSS file from this URL.

But here’s one that curators have wanted to make and couldn’t:

I just added a feed or changed a configuration option. Please reprocess my hub ASAP.

We could represent this message with a tag. Or we could use the rudimentary messaging system in delicious. But these approaches seemed awkward, and I rejected them.

Well, why not Twitter? True, it means that curators who want to send messages to the service will now need accounts in two places. But if they don’t already have accounts on both delicious and Twitter, they can create them. And those accounts will serve them in a variety of ways, unlike a single-purpose account on elmcity.

So, it’s done. As the curator for Keene, I’ve added the tag twitter=judell to the delicious account that controls the Keene hub. As the elmcity service periodically scans its designated set of delicious accounts, it follows any Twitter handle it isn’t already following. Those Twitter accounts can then send direct messages to the Twitter account of the elmcity service.

For now there’s only one thing a curator can say to the service in a direct message — “start” — which means “please reprocess my hub ASAP.” But I’m sure the control vocabulary will evolve. And of course the service can use the channel to send notifications back to curators.
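In code, the dispatch step might look something like this minimal sketch. All the names here are hypothetical stand-ins; the real service does this work against the Twitter and Azure APIs.

from collections import namedtuple

DirectMessage = namedtuple("DirectMessage", ["sender", "text"])

def reprocess_hub(curator):
    # Hypothetical stand-in for the real work: re-read the delicious
    # registry for this curator's hub and rebuild its calendar.
    print("reprocessing hub for", curator)

def dispatch(dm):
    # The control vocabulary currently has one word: "start" means
    # "please reprocess my hub ASAP". Anything else is ignored for now.
    if dm.text.strip().lower() == "start":
        reprocess_hub(dm.sender)

dispatch(DirectMessage("judell", "start"))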

Twitter is famously unreliable, but that should be OK for my purposes. We’re not controlling the space shuttle. If a message doesn’t get through to the service on the first or second try, it’ll get through eventually, and that’ll be good enough.

Someday I may have to build a data entry system and an accounts system. Then again, maybe not. Meanwhile I’m going to keep exploring this lightweight approach. It’s effective and, not coincidentally, it’s fun.

Restructuring expert attention to revive the lost art of personal customer service

Instead of mourning the lost art of personal customer service, I would rather celebrate examples that show it’s still possible. Yesterday I found two gems.

First, Southwest Airlines. I had booked a round-trip flight and then needed to change to one-way. You can’t do that online. So I clenched my jaw, called customer service, and prepared for the long wait.

Instead, this:

IVR: “Would you like us to call you back in about 20 minutes?”

Me: “Why…yes! Beep, beep, beep, beep, beep, beep, beep, #.”

My jaw relaxed.

Twenty or so minutes later, an agent called back and we made the change. Now the unclenched jaw morphed into a smile.

Second, FindTape.com. I’m making interior storm windows and I need double-stick tape for the project. Which, sure, you can buy online. But the smorgasbord of choices is paralyzing. I wasted a half-hour trying to figure out which product would best suit my unusual application and made no progress whatsoever.

Then, at FindTape.com, I read this:

If you have a specific question related to which tape would work best in your application please fill out and submit the following fields so that we can have an appropriate representative get back in contact with you.

A fellow named Kevin wrote back, we’ve been discussing my options, and now I’m ready to buy.

Both examples remind me of Michael Nielsen’s luminous phrase: the restructuring of expert attention. He coined it to define a new era of scientific collaboration, but it applies more broadly.

We’ve been told that companies can’t afford to focus expert attention on customers. The truth, of course, is that they can’t afford not to.

For a generation and more we’ve driven a wedge between people who have expertise with products and services and people who need that expertise. How’s that working for you? Me neither.

It’s true that expert attention is a scarce resource. But we’re living through a Cambrian explosion of awareness networks and communication modes. Used adroitly, they can optimize the allocation of that scarce resource. Which is a fancy way of saying: Maybe personal customer service isn’t a lost art after all.

Allman Brothers, Oct 14: Huntington or Nashville? A parable about syndication and provenance.

Yesterday Bill Rawlinson, the elmcity curator for Huntington, WV, noticed something odd about an event that showed up on Eventful.com:

Here’s the example: http://eventful.com/huntington/events/allman-brothers-/E0-001-020736056-0. It appears the Allman Brothers were in concert today, but I’m pretty sure they weren’t.

I’m pretty sure they weren’t either. At AllmanBrothersBand.com it says they were in Nashville on October 14. But if that’s true, Eventful isn’t the only site that got the date wrong. So, apparently, did a number of event-gathering and ticket-selling sites. Here are a couple of examples I found.

In cases like these it’s hard to nail down the provenance of a “fact” such as Allman Brothers, Huntington WV, October 14 2009. There is clearly syndication going on, but who’s upstream and who’s downstream? How is the network of feeds interconnected? Which is the authoritative source?

I know what the answer to all these questions should be. The Allman Brothers themselves should be the authoritative source, and everyone else should syndicate from them.

If AllmanBrothersBand.com published its schedule as calendar data rather than as calendarish web pages, the organization could control the data. Was there originally a concert planned for Huntington on the 14th? I don’t know, but say for the sake of argument there was. The Allman Brothers’ calendarish web page cannot effectively propagate a change of plan.

An iCalendar feed, on the other hand, could. But calendarish web pages are almost never also available as machine-readable iCalendar data that can reliably syndicate.
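For concreteness, a minimal RFC 2445 feed for a single concert might look like this (all details invented for illustration):

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//AllmanBrothersBand.com//tour schedule//EN
BEGIN:VEVENT
UID:20091014-nashville@allmanbrothersband.com
DTSTART:20091014T200000
SUMMARY:The Allman Brothers Band
LOCATION:Nashville\, TN
END:VEVENT
END:VCALENDAR

Publish that alongside the web page, and every downstream aggregator has one machine-readable source to syndicate from; a change of plan propagates by simply updating the feed.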

Looking under the covers, I see that AllmanBrothersBand.com is a PostNuke site. Are there calendar modules for PostNuke that export iCalendar? None of the ones that I found seem to.

Why don’t more content management systems make event information available as useful data? Why do they instead advertise things like XHTML compliance and not-very-useful RSS feeds? Because, chicken-and-egg, nobody ever seems to expect an iCalendar feed.

If we can change that expectation, a nice chunk of the real-world semantic web will fall into place. And it won’t require RDFa or SPARQL or ontologies. Just good old RFC2445, right under our noses the whole time, if only we would open our eyes and look.

Talking with Daniel Debow about using Rypple to open the Johari Window

On this week’s Innovators show, with Daniel Debow of Rypple, I learned about a cognitive psychological tool called the Johari Window. Rypple focuses on the quadrant of the Johari Window at the intersection of “known to others” and “not known to self” — the so-called blind area. The company is dedicated to the proposition that if we can become more aware of what others know about us that we don’t, we can improve ourselves along various axes: personal, social, and — critically for Rypple’s business model — professional.

How do you gain that awareness? By asking questions like:

Am I giving sufficiently clear guidance?

or

Do I interrupt people too often?

You direct these questions to a set of people whose feedback you value. Rypple anonymizes their responses and, to the extent you buy into the service, provides a progressively capable framework within which to continue the dialogue. This is a great idea, and one of the very few appropriate uses for online anonymity that I can imagine.

Rypple, as a company, lives at the intersection of a couple of key trends. Social media, obviously, but also the services ecosystem. As we discuss in the podcast, corporate HR has historically been a monolith that expects 100% compliance with its systems. But people, as we know, differ emotionally and cognitively. We should be able to use a variety of methods to manage and evaluate people, and help them manage and evaluate themselves. Software delivered as a service is an enabler of that possibility.

Here’s a twist: A company won’t have access to the feedback that employees solicit using Rypple. Daniel Debow says that HR folks, well aware of mainstream social software, are ready to embrace this model. I hope he’s right.

His favorite recent story about Rypple goes like this:

At an HR conference I talked to the CEO of a company that uses Rypple. He’s excited about what we’re doing, but he said: “You have a real problem. Use of your system might make your system obsolete. We’ve been using it for a while now, and I’ve noticed that people are much more willing to give me feedback face-to-face, they’re willing to talk to me.”

Well that’s the furthest thing from a problem I can imagine. It’s like saying to Facebook, you’ve got a problem, people keep meeting on Facebook and then meeting up in person and creating real relationships offline.

Actually that would be a problem for Facebook. But Rypple isn’t about pageviews, it’s about helping people improve. Which seems like a great idea to me.

You can, by the way, use Rypple not only to solicit anonymized feedback from a chosen set of responders, but also from an open-ended set. So here’s my question:

How can I make my ideas more accessible and more actionable?

I’m asking a chosen set too, but if you can perceive my blind spot I’d love to know what you see there.

More visualization of Nobel Peace Prize winners in Freebase

To sharpen the point I made the other day about the eroding bias toward giving the Nobel Peace Prize to Americans and Europeans, here’s a comparison of the nationalities of winners before and after 1960.

[Charts: 1901-2009 Nobel Peace Prize winners by nationality, before 1960 and after 1960]

Here’s another point I forgot to mention. There are gaps in the timeline for the Nobel Peace Prize, because it wasn’t awarded in 1914-1916, 1918, 1923, 1924, 1928, 1932, 1939-1943, 1948, 1955-1956, 1966-1967 and 1972. The timeline shows those gaps concisely.

As in the earlier examples, you can do this with point-and-click filtering in Freebase, no query-writing required. Which is awesome.

Finally, Stefano Mazzocchi offers a clarification of a point that came up in our recent interview:

I made it sound like Freebase loaded directly IMDB data while what I should have specified is that we loaded the IMDB ‘identifiers’ along with our movie data.

Thanks Stefano. And, kudos to the Metaweb team!

Recovering forgotten methods of construction

After feasting on audio podcasts for years, I realized that I don’t always want somebody else’s voice in my head while running, biking, and hiking. So I went on an audio fast for a couple of months. But now I’m ready for more input, and I’m once again reminded how wonderful it is to be able to bring engaging minds with me on my outdoor excursions.

One of my companions on yesterday’s hike was John Ochsendorf, a historian and structural engineer who explores the relevance of ancient and sometimes forgotten construction methods, like Incan suspension bridges woven from grass. One of his passions is Guastavino tile vaulting, a system that was patented in 1885. Although the system was widely used in many notable structures — including Grand Central Station — Ochsendorf says that some of them have been torn down and rebuilt conventionally because modern engineers no longer understand how the Guastavino system works, and cannot evaluate its integrity.

This theme of forgotten knowledge echoes something I heard in Amory Lovins’ epic MAP/Ming lecture series. He describes a large government building in Washington, DC, that was made of stone and cooled by a carefully-designed pattern of air flow. The cooling system wasn’t completely passive, though. You had to open and close windows in a particular sequence throughout the day. Now that building is cooled by hundreds of window-mounted air conditioners. I’m sure our modern expectation of extreme cooling is part of the reason why. But Lovins also says that air conditioning became necessary because people forgot how to operate the building.

I love the idea of recovering — and scientifically validating — forgotten knowledge. That’s what John Ochsendorf’s research group does. One of his students, Joe Dahmen, did a project called Rammed Earth — a long-term experiment to see if that ancient construction method could actually work in present-day New England. John Ochsendorf says:

Historical methods of construction that are very green, very local, may create beautiful low-energy architecture, we’ve forgotten how to do them. So we have to rediscover them, and do testing to prove to clients and building owners that you can use these methods. And it’s a good example of MIT’s motto of mind and hand. We don’t like to just read about rammed earth walls, we like to get dirty and build them.

Very cool. I think the MacArthur Foundation invested wisely in this guy.

Visualizing Nobel Peace Prize winners in Freebase

When I watched Barack Obama accept the Nobel Peace Prize, I thought about how the world has changed since the inception of the prize, and how it will continue to change. Since the winners of the Prize are themselves a reflection of what’s changing, I thought I’d try using Freebase to visualize them over the century the Prize has existed.

What you can find out, with Freebase, depends on its coverage of the topics you’re asking about. So realize that what I’ll show here is possible because Nobel Peace Prize winners are a well-covered topic. Still, it’s wildly impressive.

The Nobel site tells us that 89 Nobel Peace Prizes have been awarded since 1901. I haven’t been able to reproduce that number in Freebase because there are multiple winners in a few years, and I haven’t found a way to group results by year. But for my purposes a related query is good enough: a count of individual award winners, which comes to 100.

That number, 100, isn’t as closely related to 89 as you might think. It’s reduced by the years in which no award was given, and increased by the years in which the prize went to multiple recipients. Perhaps a Freebase guru can show us how to measure those uncertainties, but I’ve eyeballed them and I don’t think they invalidate my results.

How did I wind up querying the topic /award/award_winner? It wasn’t immediately obvious. I spent a while searching, and then exploring the facets that emerged along the way.

The crazy thing about Freebase is that, in a way, it doesn’t matter where you start. Everything’s connected to everything, so you can pick up any node of the graph and re-dangle the rest.

Except when you can’t. I haven’t yet gotten a good feel for which paths to prefer and why.

But in the end I came up with the kind of results I’d envisioned:

[Chart: 1901-2009 Nobel Peace Prize winners by gender]

[Chart: 1901-2009 Nobel Peace Prize winners by nationality]

Taken together they show a couple of trends. First, of course, most female winners appear after about 1960. Second, we see a more even geographic distribution of winners after about 1960; prior to 1960, most winners were not only male but also American or European.

These results didn’t surprise me. What did is the relative ease with which I was able to discover and document them. I thought it would be necessary to write MQL queries in order to do this kind of analysis. I’d previously done a bit of work with MQL, and dug further into it this time around.
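For the record, an MQL query is just JSON written by example. Something like the following would list winners along with the years they won; I’m reconstructing the property names from memory, so treat them as approximate rather than as the exact Freebase schema:

[{
  "type": "/award/award_winner",
  "name": null,
  "awards_won": [{
    "award": "Nobel Peace Prize",
    "year": null
  }]
}]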

But in the end I found that it was just as effective to use interactive filtering. Now to be clear, getting the software to actually do the things I’ve shown here wasn’t a cakewalk. I had to develop a feel for the web of topics in the domain I chose. And it’s painfully slow to add and drop filters.

But still, it’s doable. And you can do it yourself by pointing and clicking. That is an astonishing tour de force, and a glimpse of what things will be like when we can all fluently visualize information about our world.

Magic glasses and magic projectors: Private versus public augmentation of experience

At its core, your browser is powered by an engine called the Document Object Model, hereafter DOM. You can think of the DOM as an outline, and the browser as an outline processor that shows and hides things, displays things in different ways, and even adds, removes, or rearranges things. Nowadays what you see, when you view a web page, is the result of a complex interaction between data and code. The data is the HTML content of the page, and the code is its JavaScript behavior. But these are slippery terms. A lot of content never originates as HTML, but is instead produced dynamically — by a web server, but also quite possibly in the browser as it manipulates the DOM. And a lot of behavior happens opportunistically in response to content on the page.

This arrangement has radical implications. For example, back in 2002 I invented LibraryLookup, a bookmarklet that noticed when you were visiting an Amazon or Barnes and Noble book page and offered a one-click search for that book in your local library. A few years later, a Firefox extension called Greasemonkey arrived on the scene. It offered two capabilities that, working together, enabled a zero-click LibraryLookup. First, it could call out to a web service. Second, it could modify the DOM based on the response. Putting these two things together, I wrote a script that would notice that you were visiting an Amazon book page, check to see if the book was available at your local library, and if so, insert a paragraph into the DOM that said: “Hey, it’s available at the [YOUR LIBRARY NAME] library!”

Is this kosher? I think so, but it’s a tricky question. At the time I made a short screencast that reflected on questions of ownership and fair use in an environment that’s designed and built to support intermediation and remixing. These questions were still largely hypothetical, though, because Firefox users who had also installed Greasemonkey were a very small number indeed.

But now, thanks to modern browser-independent JavaScript libraries like jQuery, those hypothetical questions are becoming very real. Here’s Phil Windley demonstrating his 2009 version of LibraryLookup.

The example comes from Phil’s recent essay The Forgotten Edge: Building a Purpose-Centric Web, which makes the case for contextualized browsing as enabled by libraries like jQuery and by infrastructure like that provided by Phil’s company, Kynetx.

In Phil’s next blog item, Claiming My Right to a Purpose-Centric Web: SideWiki, he asserts:

I claim the right to mash-up, remix, annotate, augment, and otherwise modify Web content for my purposes in my browser using any tool I choose and I extend to everyone else that same privilege.

That item grew a long tail of comments. It includes some interesting back-and-forth between Phil Windley and Dave Winer, but I want to focus on this observation from Greg Yardley:

Sites also generally come with a contract attached – some implicit (the view-through), some explicit (the click-through) – and these contracts, done correctly, are generally enforceable.

This whole post mystifies me, because you don’t have the right to mash-up, remix, annotate, augment, and otherwise modify Web content – it’s not your content.

Earlier in the thread, Jeremy Pickens cited an example of such a contract: Google’s terms of service:

8.2 You should be aware that Content presented to you as part of the Services, including but not limited to advertisements in the Services and sponsored Content within the Services may be protected by intellectual property rights which are owned by the sponsors or advertisers who provide that Content to Google (or by other persons or companies on their behalf). You may not modify, rent, lease, loan, sell, distribute or create derivative works based on this Content (either in whole or in part) unless you have been specifically told that you may do so by Google or by the owners of that Content, in a separate agreement.

In response to Greg Yardley, Phil Rees cites fair use:

Actually we do have those rights.

http://www.law.cornell.edu/uscode/17/107.html

I believe so too. Sooner or later, that belief will be tested.

After my March interview with Phil about Kynetx, I wrote:

There’s a continuum of ways in which I can modify a web page in a browser, ranging from font enlargement to translation to contextual overlays. I wouldn’t draw a line anywhere along that continuum. It seems to me that I’m entitled to view the world through any lens I choose.

This doesn’t only apply to my view of the virtual world, by the way. It will apply to my view of the physical world too. We don’t yet have magic glasses that overlay web prices on shelf items, or web reputations on store signage, but someday we will.

I can’t see how I could be prevented from creating a heads-up display — for realspace or cyberspace — that’s advantageous to me. But I’ve got a hunch that those magic glasses are going to be controversial.

I wonder if it’s going to boil down to magic glasses versus magic projectors. Or, in other words, private versus public augmentation of our experiences of the virtual and real worlds. I can wear my magic glasses, but I can’t necessarily project the view that I’m seeing.

Talking with Victoria Stodden about Science Commons

On this week’s Innovators show I spoke with Victoria Stodden about Science Commons, an effort to bring the values and methods of Creative Commons to the realm of science. Because modern science is so data- and computation-intensive, Science Commons provides legal tools that govern the sharing of data and code. There are lots of good reasons to share the artifacts of scientific computation. Victoria particularly focuses on the benefit of reproducibility. It’s one thing to say that your analysis of a data set leads to a conclusion. It’s quite another to give me your data, and the code you used to process it, and invite me to repeat the experiment.

In this kind of discussion, the word “repository” always comes up. If you put your stuff into a repository, I can take it out and work with it. But I’ve always had a bit of an allergic reaction to that word, and during this podcast I realized why: it connotes a burial ground. What goes into a repository just sits there. It might be looked at, it might be copied, but it’s essentially inert, a dead artifact divorced from its live context.

Sooner or later, cloud computing will change that. The live context in which primary research happens will be a shareable online space. Publishing won’t entail pushing your code and data to a repository, but rather granting access to that space.

It’s a hard conceptual shift to make, though. We think of publishing as a way of pushing stuff out from where we work on it to someplace else where people can get at it. But when we do our work in the cloud, publishing is really just an invitation to visit us there.