I had a hunch that if I grew sunflowers in a fenced enclosure inside the chicken run they’d get big, since that’s the most fertile part of my backyard. Tonight I measured the tallest at 10 feet, 8 inches (3.25 meters). It’s stout, too, I feel like I could almost climb it. Impressive!

Yeah, but how impressive? And, even more interesting to me, how can we find data to help answer the question? Perhaps with a sequence of searches like so:

“1-foot sunflower”

“2-foot sunflower”

…etc…

“26-foot sunflower”

“27-foot sunflower”

These are parallel searches of Google and Bing for “[1..27]-foot sunflower”. Here are the resulting counts, with Bing scaled up by a factor of 100 to make the trends comparable:

So, maybe my near-11-footer isn’t so special after all. This method of finding out is interesting, though. It seems incredibly naive. If you try those queries you’ll find all sorts of stuff that isn’t relevant to what I mean by an n-foot sunflower. But if the amount of irrelevance is constant across the range, it factors out, right? And the two independent search engines make this a controlled experiment.

I wonder how well this proxy for sunflower height distribution correlates with the actual distribution. Of course there are a million other questions you could try to answer this way. It’d be easy to make a web app to automate this method. I lazily hope somebody already has, or will, so I don’t have to.


PS: My sunflowers are actually a second crop. The first one had a crazy head start, because we had freaky warm weather in February. But then in early April, when they were already 3 feet high, the chickens broke into the enclosure and demolished them. What lofty heights could my sunflowers have reached this summer? We’ll never know.


PPS: Here’s the data:

1,2,0
2,994,10
3,8,4
4,10,4
5,9,4
6,3270,37
7,74,11
8,135,12
9,176,11
10,1690,39
11,75,9
12,472,37
13,82,12
14,220,8
15,54,9
16,9,4
17,2,1
18,55,4
19,6,2
20,119,8
21,0,0
22,2,0
23,0,0
24,8,3
25,891,2
26,3,2
27,0,0
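
For anyone who wants to replot the trends, here is a minimal Python sketch that loads the rows above and applies the 100x scaling. The column meanings are an assumption on my part (height in feet, Google count, Bing count), inferred from the fact that the third column is the smaller one, and the names `RAW`, `BING_SCALE`, and `load_counts` are purely illustrative.

```python
import csv
import io

# Rows copied from the post. Column order is ASSUMED to be:
# height in feet, Google count, Bing count.
RAW = """\
1,2,0
2,994,10
3,8,4
4,10,4
5,9,4
6,3270,37
7,74,11
8,135,12
9,176,11
10,1690,39
11,75,9
12,472,37
13,82,12
14,220,8
15,54,9
16,9,4
17,2,1
18,55,4
19,6,2
20,119,8
21,0,0
22,2,0
23,0,0
24,8,3
25,891,2
26,3,2
27,0,0
"""

BING_SCALE = 100  # the scaling factor mentioned in the post


def load_counts(raw):
    """Return (feet, google, scaled_bing) tuples parsed from the raw rows."""
    rows = []
    for row in csv.reader(io.StringIO(raw)):
        if not row:
            continue  # tolerate stray blank lines
        feet, google, bing = (int(cell) for cell in row)
        rows.append((feet, google, bing * BING_SCALE))
    return rows


rows = load_counts(RAW)

# The height with the most hits in each (assumed) column.
peak_google = max(rows, key=lambda r: r[1])
peak_bing = max(rows, key=lambda r: r[2])

for feet, google, bing in rows:
    print(f"{feet:>2} ft  google={google:>5}  bing(x100)={bing:>5}")
```

Under that column assumption, the second column peaks at 6 feet and the third at 10 feet.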

Last week Kevin Curry dug into some data about school violence in his district. In this case the data was made available as HTML, which means it was sort-of-but-not-really published on the web. Kevin writes:

Whenever I come across data like this the first thing I want to know is whether or not it can actually be used as data. In order to be used/usable as data the contents of this HTML table need to be, at minimum, copy-and-paste-able into a spreadsheet.

Or, alternatively, the HTML table needs to be parseable as data. In this case, I was surprised to find that a couple of tools I normally use to do that parsing — Dabble DB and Excel — didn’t work. That’s because Kevin’s target page doesn’t include a static HTML table. It’s dynamic instead: First you select a district, then the table appears. This mechanism defeats tools that try to parse data from HTML tables, so it’s a bad way to publish data that you want to be available as data.

Lacking the option to parse the HTML table, Kevin’s only choice was to copy and paste. That’s clumsy, and you have to be really motivated to do it, but it can be done. Here’s the Google spreadsheet Kevin made from the data he copied and pasted. And here’s the same stuff as an Excel Web App.

If you haven’t tried out the new Excel Web App, by the way, it’s interesting to compare the two. One key difference, at least from my point of view, is — not surprisingly — the Excel Web App’s ability to roundtrip with Excel. A Google spreadsheet is, at this point, more functional in standalone mode. While you can edit both a Google spreadsheet and an Excel Web App in the browser, for example, the Google spreadsheet can insert and modify charts, whereas the Excel Web App only edits data.

Of course if you have Excel you’d rather use it to insert and modify charts. It’s a lot more capable than any browser app is likely to be anytime soon. So it’s pretty sweet to be able to open the cloud-based Excel spreadsheet, edit locally, and then save to the web. A related limitation of the Google spreadsheet is that you lose charts when you download to, or upload from, Excel.

Another key difference: The Excel Web App currently lacks an API like the one Google provides. I really hope that the Excel Web App will grow an OData interface. In this comment at social.answers.microsoft.com, Christopher Webb cogently explains why that matters:

The big advantage of doing this [OData] would be that, when you published data to the Excel Web App, you’d be creating a resource that was simultaneously human-readable and machine-readable. Consider something like the Guardian Data Store (http://www.guardian.co.uk/data-store): their first priority is to publish data in an easily browsable form for the vast majority of people who are casual readers and just want to look at the data on their browsers, but they also need to publish it in a format from which the data can be retrieved and manipulated by data analysts. Publishing data as html tables serves the first community but not the second; publishing data in something like SQL Azure would serve the second community and not the first, and would be too technically difficult for many people who wanted to publish data in the first place.

The Guardian are using Google docs at the moment, but simply exporting the entire spreadsheet to Excel is only a first step to getting the data into a useful format for data analysts and writing code that goes against the Google docs API is a hassle. That’s why I like the idea of exposing tables/ranges through OData so much: it gives you access to the data in a standard, machine-readable form with minimal coding required, even while it remains in the spreadsheet (which is essentially a human-readable format). You’d open your browser, navigate to your spreadsheet, click on your table and you’d very quickly have the data downloaded into PowerPivot or any other OData-friendly tool.

Some newspapers may be capable of managing all of their data in SQL databases, and publishing from there to the web. For them, an OData interface to the database would be all that’s needed to make the same data uniformly machine-readable. But for most newspapers — including even the well funded and technically adept Guardian — the path of least resistance runs through spreadsheets. In those cases, it’ll be crucial to have online spreadsheets that are easy for both humans and machines to read.

Last week Scott Hanselman summed up the principle of keystroke conservation like so:

There are a finite number of keystrokes left in your hands before you die. Next time someone emails you, ask yourself “Is emailing this person back the best use of my remaining keystrokes?”

Several of the comments on Scott’s post focused on the notion that keyboards will one day be obsolete, and that speech recognition will break the typing bottleneck. But that’s not the real bottleneck. The keystroke conservation principle is just one way of getting at the notion of scalable communication powered by network effects.

One of my favorite stories comes from Larry Moore, who was a Lotus executive. To illustrate why people didn’t “get” Lotus Notes, he used to talk about the early days of the telephone business, when there were roadshows to introduce people to the concept of telephony. Demonstrators would set up two phones on either end of a stage, with a wire strung between, and talk to each other. But it made no sense to the audiences. Obviously those people could already hear each other! Who needed the wire?

It’s the same thing with the principle of keystroke conservation. If I talk to one person, or a few people, faster than I can type messages to one or a few, I can communicate more, but not orders of magnitude more, and not in ways that fully exploit the power of the network.

Forget keystrokes for a moment and look at how Sal Khan is rewiring math and science education. He started out doing one-on-one tutoring with his cousin Nadia. It’s clearly ridiculous to say that his ability to scale that effort is constrained by the rate at which he can talk. On his instructional videos he talks no faster than normal. But he has strategically placed those videos in a pub/sub network where they can be discovered, subscribed to, shared, and reused. There are nearly 60,000 subscribers to his YouTube channel. That’s scalable communication.

The problem with examples like this one, of course, is that most of us aren’t rock-star performers like Sal Khan. If we push all the communication that we can into open networks, we’re not going to boost our reach by five orders of magnitude. Maybe only two. Maybe even just one. But that’s significant! You’ll never type a message 10x faster, or speak it 10x faster. But you can easily reach 10x more people by adopting communication habits that make it more likely that your message will be discovered, shared, and reused.

Face-to-face discussion, phone calls, email, and text messages are narrowcasting modes that don’t scale in this way. Blogs, Twitter, Facebook, wikis, and audio or video podcasts are broadcasting modes that do. How do we use both together in the right ways for given situations? It’s subtle. One commenter on Scott’s post writes:

My emails very rarely contain anything to blog about or update a wiki with.

What amount of email do you think is actually appropriate to becoming a blog entry in your life or in a less technical person’s life?

For what it’s worth, I think in terms of an inventory of reusable parts and the DRY (don’t repeat yourself) principle. For example, I’m often asked about how to publish iCalendar feeds from popular calendar apps. So I’ve written up a series of how-to blog posts. And I’ve encapsulated that series into a query: http://delicious.com/judell/icalpub+howto. None of those posts would have been email messages. But there are many email messages in my outbox that contain links to the series. Because the link is a query, it yields fresh results for anyone who has ever received the link in email as well as for anyone who ever will. The same posts are also quite often found directly by way of search.

Counting keystrokes is just one way to think about the underlying pattern. It’s not about typing versus talking. It’s about choosing the mix of modes that will best repay the effort you invest in communication.

Wakened this morning, about three o’clock, by Mr. Griffin with a letter from Sir W. Coventry to W. Pen

So begins today’s installment of The Diary of Samuel Pepys, as rendered by Phil Gyford. It’s a remarkable project that maps January 1, 1660 (the start of Pepys’ famous diary) to January 1, 2003 (the start of Phil’s Movable Type recreation of the diary) and has continued faithfully ever since.

The Pepys blog is enhanced in all sorts of useful ways. People, places, and topics are cross-linked with indexes, places are mapped, all references are viewable on a timeline — it’s a brilliant example of advanced blog customization.

Back in 2003 I mused about what kind of content management system would enable somebody to do a project like this without a lot of inspired hacking. The question came up again recently when my sister Ruth decided to recreate an archive of letters that my parents wrote home from our 15-month stay in New Delhi during 1961 and 1962.

I’ve long held that blog publishing systems are really lightweight content management systems that can be used for almost any purpose. So I pointed her to WordPress.com, explained that you can use pages instead of posts to arrange items however you like, and waited to see what would happen.

Well, it didn’t work. It’s true that you can build an arbitrary collection of pages, but there’s no way Ruth would be able to manage that collection without automation. I could write code to help her, but I don’t want to. That’s partly laziness, and partly curiosity about how to use the standard kit to achieve the desired effects.

One of the biggest limitations of pages, in WordPress, is something I’d never noticed until now: No tags! So ended my plan to have Ruth use tags on pages to achieve a lightweight version of Phil Gyford’s indexes.

Why not just use posts? Originally I thought it would be cool to mimic the Pepys diary: start with a date in 1961, and continue in “real” time. But Ruth doesn’t want to do it that way. She wants to be able to process the archive in any order that’s convenient. And she wants it to read forward, like a book of letters, not backward like a blog. These perfectly reasonable requirements turn out to be harder to satisfy than you’d think.

It turns out that you can make the letters run forward on the Posts page by manipulating the publication dates. So here was the scheme I tried first:

July 2 1961 -> Jan 01 1961 15:01
July 4 1961 -> Jan 01 1961 15:00
...
Oct 19 1961  -> Jan 01 1961 04:01
Oct 22 1961  -> Jan 01 1961 04:00

In this scheme, every letter maps to the same day, chosen arbitrarily as Jan 1, 1961. Every month maps to an hour of that day, each letter maps to a minute within that hour, and the times run backward. Since WordPress reverses the sequence again when displaying items on the Posts page, that makes time run forward in that view.

The benefits are huge. Now Ruth can use tags to organize sets of letters, imposing as much or as little structure as she wants. Views by tag are neatly presented as sets of blurbs with “Continue reading” links. Each item automatically links to its predecessor and successor.

But there’s irreducible weirdness too. For example, the Jan 01 1961 date — which has now become an abstract database key used only for sorting — is part of every post URL. You wind up with patterns like this:

/1961/01/01/june-30-1961-from-anita/

This gets even weirder because dates prior to the start of Unix time — Jan 1, 1970 — don’t display in the management UI. However that turns out to be both a feature and a bug. It’s a feature because WordPress reverts to the current date for display, so you see “Posted on June 28, 2010 by Ruth” instead of “Posted on January 1, 1961 by Ruth.” And it’s a bug because you can’t easily scan and adjust the dates that control sorting.

More weirdness arises from the deeply hardwired assumption — in WordPress, but also in all blogs, really — that entries post in reverse chronological order. Although the backwards time mapping seemed at first glance to work, it turned out to be broken in two ways. On the Posts page, after the break, the link pointed to “Older entries” which were really, in our scheme, “Newer entries.” And within posts, the next and previous logic was also reversed.

So for now I’ve gone back to a forward mapping of hours and minutes within Jan 1, 1961. I’ve ditched the default Posts page in favor of a hand-crafted page that presents items in ascending order. Once you’re in an item, the next and previous links work as expected because, when you move from item to item, WordPress uses a forward arrow of time.
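
Here is a rough Python sketch of what a forward mapping like that could look like. It is not the procedure we actually used, which was done by hand. It assumes each month that contains letters gets the next hour of Jan 1, 1961, and each letter within that month gets the next minute. The function name `forward_sort_keys` and the sample dates are made up for illustration.

```python
from datetime import date, datetime

# The arbitrary sort-key day from the post.
BASE_DAY = date(1961, 1, 1)


def forward_sort_keys(letter_dates):
    """Pair each real letter date with a synthetic timestamp on BASE_DAY.

    Assumption: each month that has letters takes the next hour, and each
    letter within that month takes the next minute, so the synthetic times
    ascend in step with the real dates. datetime() raises ValueError if
    there are more than 24 such months or more than 60 letters in a month.
    """
    pairs = []
    hour = -1
    minute = 0
    current_month = None
    for d in sorted(letter_dates):
        month = (d.year, d.month)
        if month != current_month:
            # First letter of a new month: move to the next hour.
            current_month = month
            hour += 1
            minute = 0
        else:
            minute += 1
        key = datetime(BASE_DAY.year, BASE_DAY.month, BASE_DAY.day, hour, minute)
        pairs.append((d, key))
    return pairs


# Hypothetical input: a few letter dates, deliberately out of order.
letters = [date(1961, 7, 4), date(1961, 7, 2), date(1961, 10, 22), date(1961, 10, 19)]
keys = forward_sort_keys(letters)

for real, synthetic in keys:
    print(real.isoformat(), "->", synthetic.strftime("%b %d %Y %H:%M"))
```

Because the input is sorted before keys are assigned, Ruth could still process the archive in any order and regenerate the keys whenever letters are added. The catch is that existing posts would then need their dates adjusted to match.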

I’m not complaining. It’s astonishing that WordPress provides a free service that Ruth can use to publish this archive of letters, and I’m hugely grateful. I think we’ll be able to come up with a technique that will satisfy her requirements without demanding heroic effort from her or custom software from me. But it sure is interesting to see what happens when you mess with a blog’s notion of the direction of time.

This could have been me:

A bicyclist riding along Old Homestead Highway was hit by a vehicle Friday evening.

At about 6:43 p.m. Swanzey Police and Fire Department responded to a reported hit-and-run accident on Route 32.

The vehicle was described as a white SUV, possibly a Chevy Blazer, with a black roof rack. It’s missing its passenger-side mirror as a result of the accident, according to Cpl. Robert Eccleston of the Swanzey police.

The cyclist suffered serious injuries and was transported to Cheshire Medical Center/Dartmouth-Hitchcock Keene.

A couple of years ago it was me. I got sideswiped on a bike ride in another part of the county. In that case, too, the impact broke off the passenger-side mirror. Luckily I only suffered a bruised leg. According to a follow-up report, this cyclist suffered “skull fractures on the left side of his head, where his helmet hit the pavement, a broken shoulder and severe road rash.”

When it happened to me, I was furious for weeks. Every time I saw a sedan similar to the one that knocked me off my bike I looked for the telltale missing passenger-side mirror. And I formed a clear idea of a product that might have prevented the hit-and-run, or failing that, nabbed the perpetrator. It’s a pair of bicycle-mounted cameras, front and rear, that trigger on approaching traffic and take sequences of shots that can identify approaching vehicles.

Here’s why I imagine this could work. I don’t know about yesterday’s hit-and-run, but in my case it didn’t feel like an accident. We were the only two vehicles on the road. There was plenty of room for the car to give me wide berth. But some motorists like to hassle cyclists verbally, and once in a while that escalates to a cat-and-mouse game. That’s a game these people play because they think they can get away with it. There’s no expectation that the sideswiped cyclist will be able to prove that it happened, or capture the identity of the car. In my case, when I jumped to my feet after tumbling along the roadside, only to see the car speeding over the top of the next hill, I remember thinking: “You bastard, if I only had your license plate number you would regret this.”

Defensive surveillance isn’t just a capability that cyclists need, of course. It makes sense for motorists to identify and record oncoming traffic too. But car-on-car violence is a game played on a level field. Car-on-bike violence is so unequal that I’ll jump at any advantage I can get.

Does the product I imagine already exist? Maybe, but I don’t think so. There are obviously scads of cheap helmet- or bike-mountable cameras. What I’m looking for, though, is one that’s optimized for defensive surveillance. I think that means a gadget that senses oncoming traffic, and then shoots sequences of high-resolution stills. Ideally it’d come with two pairs of mounts. One pair would be fitted to my bike’s handlebar and seat. The other pair would be fitted to my car’s dashboard and rear deck. For extra credit, the car would keep the cameras charged so they’re always ready to defend the bike.


PS: Meanwhile, my low-tech solution is a helmet-mounted rear view mirror. I have always used one, and can now scarcely imagine what it used to be like to have to crane my head around — and wobble my bike — in order to see what’s behind me. With a helmet mirror, situational awareness only requires rapid eye flicks that become an automatic habit. Obviously the habit wasn’t fully automatic, but after the incident a couple of years ago I’m even more vigilant. I watch every car that approaches from the rear, and am always mentally preparing a dive into the ditch.

When I posted Permalinks and hashtags for city council agenda items last week, I embedded a permalink and a hashtag to illustrate the idea. The post links to the video of Keene’s recent city council meeting, at the point where Patty Little introduces Tom LePage’s request to expand the Armadillo’s sidewalk cafe. The post also refers to this agenda item using the hashtag generated for it by the Granicus system.

I figured this would enable two ways to find pages, like my blog post, that refer to agenda items, like Tom’s request. First, you could search for pages that mention the hashtag. For example, this combined search of Google and Bing for granicus732_7716 finds my blog post because it mentions that tag. These searches also find my tweet containing the tag, and some echoes of the tweet. Finally, of course, you could search Twitter directly for the tag.

A second approach would be to search for pages that link to the video segment. I expected to be able to find my blog post by searching for this permalink which it cites:

http://keene.granicus.com/MediaPlayer.php?view_id=2&clip_id=77&meta_id=7716

I planned to use the link: operator, which finds pages pointing to a URL. And I figured this would work for both Google and Bing. But I was wrong on several counts. Bing doesn’t seem to support the link: operator. And even though Google does, this query doesn’t find my blog post.

Using the permalink as a plain search term doesn’t work either. And after reviewing the advanced search operators for both Google and Bing, I’m left wondering: How do you find pages that cite a permalink?

One weekend last year I was hiking with my dog along the Washington Street Extension in Keene, NH. It’s the old Route 9, now an abandoned road that runs alongside Beaver Brook and climbs up to Beaver Brook Falls. The road has been returning to nature since before we came to Keene. It’s lined on both sides, for over a mile, with 25-year trees that now entangle a course of utility cables. On that hike last year, I wondered if the owner of those cables might want to take a look and maybe schedule some pruning.

I tried calling the power company first. Directory services gave me the main number, but I failed repeatedly to find any path through the IVR system that would enable me to report the problem. When I got home I also failed to find the PSNH web page that has the number to call: 1-800-662-7764. (Menu path: Residential or Business -> Safety Center -> Tree Trimming. Effective search: tree trimming not report a problem.) When I tweeted my query to Martin Murray (@psnh), though, he got back to me promptly. It turns out these aren’t power cables, they’re telephone cables.

So I tried to report the problem to Fairpoint. Again there was no obvious way to do it online. And I couldn’t find anybody at the phone company who would answer the phone on the weekend. Eventually I got distracted by other things and never followed up.

Fast forward to yesterday. I’m hiking with my dog along the same abandoned road. The 25-year trees are now 26-year trees. And some big 60- and 80-year trees, tilting on banks eroded by spring floods, threaten to bring down the cables.

So I call again. There’s got to be some way to report this, right?

It becomes a game. Every path through the IVR system leads, after much delay — and, infuriatingly, an advertisement — to a message saying that business hours are Monday through Friday, 9 to 5. I might have tried the website again, but:

a) I am not carrying a connected, browser-equipped device.

b) You are the fracking phone company. Answer the phone!

Finally somebody answers. It’s Patrick, in Internet tech support.

Patrick: What’s your phone number for DSL service?

Me: 603.355.xxxx

Patrick: And what operating system are you using?

Me: Never mind that, here’s the deal. I’m standing on the old Washington Street Extension, looking at what I suppose is Keene’s Internet trunk. There are 26-year-old trees entangling it for a mile. And right here, at pole 13-T, there are 60- and 80-year-old trees leaning at a 45-degree angle over the cables. They’re going to bring those wires down in the next big ice or wind storm, if not before.

Look, I know this isn’t your department, but I’m having a hell of a time finding anybody at Fairpoint who cares about this. There must be some way to report the problem.

Patrick: I totally get what you’re saying. But you’ve reached the lowest guy on the totem pole. And, I hate to say it, but this really isn’t my department.

Me: I know. But you’re several hops closer to the right department than I am. Can you please just take a report, email it to your supervisor, and cc me on the email?

Patrick: OK, hang on…done.

Me: Thanks Patrick! You may have just prevented a whole shitload of Internet technical support calls!


Update: Got these responses from @MyFairPoint on Monday AM:

@judell Hi, Jon – thanks so much for the heads up (just saw your tweet come up in our alerts). I really appreciate you looking out!

@judell Also, our active acct is @MyFairPoint and we’re working to ramp up our social media efforts, so expect to hear more soon! Thx again!

@judell – I’ll see what I can do based on this and your attached article. ^JP

Nice!

The KUOW Speakers’ Forum continues to deliver the most consistently valuable talks I listen to these days. The latest is Hernando de Soto on Shadow Economies. It’s about facts, relationships, linked data, identity, property rights, the rule of law, derivatives, toxic assets, and permanent credit crunch. Bottom line: We need to get the facts about those assets, link them together, and bring them out of the shadows. So far as I can tell, the current crop of financial reform bills aren’t saying that. The following excerpts from de Soto’s talk explain why they should, and also why they probably won’t.


Facts were the subject of all the reformers who made the market economy come into being, between 1850 and 1950. We’re all clear about the ideology of the people who talked about the market, and the capitalist system, from 1750 to 1850: Adam Smith, Marx. They all talked about division of labor. What they didn’t say is that once labor is divided, and you have many sources of production, how do you coordinate them?

That crisis actually came. The whole system faltered in the 19th century because feudalism had collapsed, patrimony had collapsed, there was freedom, but freedom without law and structure. So different people, who wrote very little — you find the details in things that stopped being published a hundred years ago — said, We are in front of swarms of facts. They have nothing to do with our immediate vicinity, our village, our feudal lots, it’s about the world as a whole, and we can’t digest it.

So, property rights had to become universal. We had to make them explicit as facts. And we had to make sure that everybody had access to a new business instrument, the corporation. Before, even in the US, you needed an act of Congress to make a corporation. That changed. It was a big battle, but finally the argument that won was, they’re doing it anyway, and if we don’t get them on the books they’ll stay in the shadows. So gradually textiles, and cotton, and machinery started recording facts, and it all started coming under property law.

Facts isn’t just information. Here we have an apple, it’s mine, it looks just like a stolen apple, but it has a property right associated with it. That apple can be bought, sold, rented, used as a mortgage, there are a hundred things I can do with the apple. Those are its relations to the rest of society. For that you need something that describes those relations.

Charles Sanders Peirce, when asked to describe the universe, said: “Things in relation to one another.” The wonderful thing about the rule of law, especially as developed in the United States, is that you’ve been able to put together things and relationships in organized documents that are accessible and actionable. When that happens, the shadow economy goes away and you’re in control. You know who you’re dealing with, and you know what their assets are.

Now, here’s my concern about what’s happening with the recession. I’m watching TV, October 2008, and I see your Mr. Paulson, secretary of the Treasury, say, “We’re in trouble. We have troubled assets. So I’m going to buy them up, and then we’ll see what’s what.” Basically, he was saying: “We don’t have the facts, so I’ve got to produce them so we know who’s solvent and who isn’t.”

Later, I turn on the TV and he says, “We’ve thought about it, and we’ve decided we’re not going to buy the toxic assets, these derivatives, and sort them out. Instead we’ll just give enough money to the banks so that everybody knows they’re not going to break.” In other words, I’m not going to find out where the assets are, or record property rights.

Why that change? I asked. The reply was: “Well, he couldn’t find the toxic assets.” I thought that was really interesting. In the United States, everything is recorded: every house, every car, every boat. You know where things are. You’ve got facts. It is a factual economy, not like my economy which is a shadow economy where there are no facts.

I asked Chris Cox: “How many of these assets that are called derivatives are not on record?” And he said, “Well, we think there’s 600 trillion dollars of them.” That violates the crucial law of property as you have developed it over 150 years. No wonder nobody feels safe. You have created the world’s largest shadow economy.

As long as you don’t know who owns the greatest amount of your assets, there’s no info as to who owns what, who is related to what, you have a shadow economy. We live in one, and it has as a characteristic a permanent credit crunch. We know more about it than you do. Credit crunch is where you don’t know who you’d be lending to, so you don’t lend. It’s permanent, we live with it, and now you’re going to have to learn to live with it too, because until you know who is solvent how can you give anybody credit? You’re flying blind.

Einstein used to say: “What does the fish know about the water in which it swims?” That was his way of saying you have to be outside the aquarium to understand what’s going on inside the aquarium. Well, as an outsider looking in, I’m a great admirer of the United States, of your rule of law, which says that everything has to be identified because you are a nation of facts. As opposed to us, a nation of rumors and shadows. But you’ve slipped up really badly. You’ve got to get your banks to put these things on the record.

Back in the 1930s, Roosevelt saw that it was important to find out how much liquidity there was. To do that he needed to know where the gold was. He made a law, you had to record your holdings of gold or go to jail for ten years. Very soon he knew where all the gold was. That’s where you’re at. The problem is, what happens if when you do it, you find out that most of your top banks are insolvent? So you’ll need to involve the FDIC. But you’ve got to get the facts.

It’s very easy to get there, but it will mean that a sector of your society that is today in power will not be in power a month later, because they’ll be broke. Peter Munk, who owns gold mines in Canada, is building a marina in Montenegro for the biggest yachts in the world. When he was thinking that the U.S. administration was going to clean up the mess, and find out where the derivatives were, he said “You see all those yachts?” (He was looking at Sardinia.) “Well, in 2011, 4/10 of them will belong to somebody else.” Those 4/10 are holding out, obviously, because they don’t want that to be known. But they’re really screwing the rest of us.

Update: From Crain’s:

The Senate legislation would push most of the $615 trillion in over-the-counter derivatives onto regulated exchanges or similar electronic systems, a measure that would make it easier for the market and regulators to track the trades.

Really? Well OK then! Fingers crossed.

After long study of the psychological effects that computers and information systems are having on us, Linda Stone has turned to the physiological effects. Her elevator pitch used to be continuous partial attention. Now it’s email apnea. When we use these technologies, Linda says, we project ourselves into them, we become disembodied, we lose the ability to regulate our posture and breathing. On this week’s Innovators show she discusses what she has learned, and challenges us to find ways to remain embodied as we interact with networked devices and information systems.

Coincidentally, I got to hang out with Linda this weekend and try the HeartMath system that she’s been experimenting with. It senses your pulse, displays the variability of your heart rate, and then guides you through a breathing exercise that helps you regulate it. The HeartMath hardware and software support regulation of the autonomic nervous system, bringing awareness to breathing patterns that emphasize a fight or flight (sympathetic) or a rest and digest (parasympathetic) state.

One expert in this field, Steve Elliott, refers to this state as Coherent Breathing, and also offers a set of exercises. Now, to be honest, when I land on web pages like this one, where scholarly charts and footnotes rub elbows with ads for Swarovski Crystal Reminder Bracelets, my instinct is to move along. I’m fiercely non-mystical. I had to quit a yoga class because I just couldn’t listen to all the chatter about sun energy and moon energy. Can’t we just breathe and stretch?

The thing is, I’m also fiercely rational about physiology and health. I know that good posture, deep breathing, and slow stretching have profound benefits. I have resolved several health crises, ones that our medical system would prefer to address with drugs and surgery, by paying attention to my body and then adjusting how I use it. But I’ve never had a chance to try biofeedback. So I was intensely curious about the HeartMath system. It uses a pulse monitor clipped to your earlobe to monitor your heart rate, plus software to guide you through an exercise that levels out the variability and leads you into a state of breathing and pulse “coherence.”

For me it was easy, and fun, to achieve a high coherence score. Linda asked: “Are you a meditator?” No. “An athlete?” Yes. So that makes sense. I have decades of experience regulating my own breathing and heart rate. But never in a work context, and that’s the point Linda is driving at. In our work environments we leave our bodies and project ourselves into computers and networks. If we can reconnect with our bodies in those environments, we’ll be healthier. I can’t prove that, but I feel sure that it’s right.

I haven’t yet plunked down $300 for the HeartMath system, but I’m trying to talk myself into it. Although the company advertises it as a “desktop personal stress relief system,” I like the way that Linda is articulating a larger vision. For her, it’s about human performance. We are more powerful when augmented by computers and networks, but also less healthy. One answer is to decouple ourselves from computers and networks, and sometimes that’s the right answer. But another answer is to find ways to remain embodied as we use computers and networks. Linda thinks that’s a crucial way forward, and I agree.

Why not just jump on the bandwagon then? Because my antipathy to mysticism has lately also extended to geek crazes. I’m suspicious of the instinct to solve problems created by our computerized gadgets by acquiring and using more computerized gadgets. And I’m wary of the quasi-autistic compulsion at the heart of the quantified self movement whose manifesto, the data-driven life, appeared in this Sunday’s New York Times Magazine.

I have been a runner and a biker for decades. People always ask: How far did you run? How fast do you bike? I don’t know. I don’t want to know. It’s enough for me to be outside, moving over the landscape, breathing deeply, thinking my own thoughts and listening to other people’s thoughts.

I’m the kind of guy who hates waiting for a machine at the Y while the person who just did 20 reps pauses to scribble in a journal that he did 20 reps.

I’m certain that we will see, in a year or two, the emergence of 12-step programs for people who are addicted to self-monitoring.

And yet…I really liked the coherent breathing exercise. I want to repeat it, and I think it can become a helpful part of my routine.

The endnotes for the book I’m now reading are a mixture of conventional citations and URLs. The former, expressed as publisher, book or journal title, author, date, and page number, seem not nearly so useful as the latter. Would you rather visit the library or click a link? But nowadays cited URLs also come with disclaimers like this: Accessed July 27, 2009. It might be inconvenient to verify a conventional citation in its original context, but I know that if I had to, I could. There’s no guarantee that I’ll be able to revisit a cited URL. Even if the page itself has not gone missing, there’s no way to know that the page I view on April 22, 2010 is the same one that the author viewed on July 27, 2009.

This anecdote was the springboard for my conversation with Herbert Van de Sompel about Memento, a proposed (and prototyped) method for adding the dimension of time to the web’s existing mechanism for content negotiation.

That mechanism has, to be sure, not taken the world by storm. The most common scenario involves a browser telling a multilingual server that its user prefers to read, say, French. A paper about Memento published last fall walks through the HTTP protocol that enables this negotiation. Odds are, though, that you’ve never seen this actually happen. It’s much more likely for a multilingual website to present itself as “a multiplication of language-specific mini-sites, instead of thinking of it as one site, with one set of URIs, only with different versions and languages available.” Wikipedia, for example, works that way.

The quote comes from a 2006 W3C article, Content Negotiation: Why it is useful, and how to make it work. The article blames the awkwardness of Apache’s implementation of the protocol (since corrected):

For a long time, with the most popular negotiation-enabled Web server (the ubiquitous apache), failed negotiation (for instance, a reader of french being proposed only english and german variants of a document), resulted in a nasty “406 not acceptable” HTTP error, which, while technically conforming to HTTP, failed to follow the recommendation that a server should try to serve some resource rather than an error message, whenever possible.
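To make the failure mode concrete, here is a minimal sketch of server-side language negotiation that falls back to a default variant instead of returning a 406. It is not Apache’s algorithm; the function names and the simplified matching on the primary language subtag are my own assumptions for illustration.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda pref: -pref[1])


def choose_variant(header, available, default):
    """Pick the best available language, or the default rather than an error."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 marks a language as not acceptable
        if tag == "*":
            return default
        primary = tag.split("-")[0]
        for lang in available:
            if lang.lower().split("-")[0] == primary:
                return lang
    return default  # serve something instead of "406 Not Acceptable"
```

With this sketch, a reader of French who is offered only English and German variants gets the default document rather than an error page.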

Is there any reason to suppose that time negotiation will succeed where language negotiation has so far mainly failed? That’s a hard question, and one I wish I’d thought to ask Herbert in the interview, but maybe we can continue the dialogue here.

Meanwhile, the fact that content negotiation is tricky to get right doesn’t invalidate the core of the Memento proposal. Time is fundamental, the web could have a reliable memory, and if we can build such a memory into the fabric of the web the benefits will be profound.

Examples are everywhere. Consider mediabugs.org. Founded by Scott Rosenberg, whom I interviewed last week, the site is dedicated to finding and fixing errors in media reports. A few days ago, the first bug was marked Closed:Corrected. The mediabugs.org bug page initially said:

Listing for Josh Kornbluth’s show “Andy Warhol: Good for the Jews?” says the show is at the Jewish Community Center in SF, but actually it’s at The Jewish Theater in the Theater Artaud building.

There’s a comment pointing out the error but it’s still showing with the wrong info on the Express home page.

And later:

This is fixed now!

If you visit the original news report, though, there’s no record of the correction. It’s no big deal in this particular case, but media organizations should want to be transparent about when and how they alter published items.

Likewise governments. The Citability project aims to account for the history of changes made to items published on government websites. As with mediabugs.org, the approach will initially require third parties to monitor and chronicle the changes.

The Memento idea is that media organizations, governments, and other kinds of web publishers will be accountable for their own change histories.1 And they’ll do so in a standard way, so that people viewing these sites in browsers can straightforwardly say: “Show me this page as it existed on July 7, 2009.”
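On the publisher’s side, answering that request comes down to picking the right version from a change history. Here is a minimal sketch of that selection step, assuming the history is kept as a list of (timestamp, content) pairs sorted oldest first. The function name and data layout are hypothetical, and the sketch says nothing about how the request itself would be expressed in HTTP.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_as_of(snapshots, when):
    """Return the latest (timestamp, content) pair at or before `when`.

    `snapshots` must be sorted by timestamp, oldest first.
    Returns None if the page did not exist yet at `when`.
    """
    timestamps = [stamp for stamp, _ in snapshots]
    index = bisect_right(timestamps, when)
    if index == 0:
        return None
    return snapshots[index - 1]


history = [
    (datetime(2009, 1, 1), "original text"),
    (datetime(2009, 7, 1), "first correction"),
    (datetime(2010, 1, 1), "second correction"),
]
# The page as it existed on July 7, 2009 is the July 1 version.
print(snapshot_as_of(history, datetime(2009, 7, 7)))
```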

This is wildly ambitious, but I applaud the ambition. Ever since I made the Heavy Metal umlaut screencast, I have imagined what it would be like to scroll back and forth along the timelines of evolving web pages. At one point Andy Baio sponsored a contest to write a script that would animate the revision history for any Wikipedia page, and I made a screencast of Dan Phiffer’s solution.

Clearly we want this. Will it be hard to arrive at a well-known and well-used standard? Sure. Is it worth doing? Absolutely.


1 Third-party watchdogs will often be needed, of course. We’d like to trust self-reported change histories, but we’d also like to verify them. Even so, third parties shouldn’t be the only mechanisms. Self-reported histories should exist.

I’m editing an interview with John Hancock, who leads the PowerPivot charge and championed its support of OData. During our conversation, I told him this story about how pleased I was to discover that OData “just works” with PubSubHubbub. His response made me smile, and I had to stop and transcribe it:

Any two teams can invent a really efficient way to exchange data. But every time you do that, every time you create a custom protocol, you block yourself off from the effect you just described. If you can get every team — and this is something we went for a long time telling people around the company — look, REST and Atom aren’t the most efficient things you can possibly imagine. We could take some of your existing APIs and our engine and wire them together. But we’d be going around and doing that forever, with every single pair of things we wanted to wire up. So if we take a step back and look at what is the right way to do this, what’s the right way to exchange data between applications, and bet on a standard thing that’s out there already, namely Atom, other things will come along that we haven’t imagined. Dallas is a good example of that. It developed independently of PowerPivot. It was quite late in the game before we finally connected up and started working with it, but we had a prototype in an afternoon. It was so simple, just because we had taken the right bets.

There are, of course, many kinds of efficiency. Standards like Atom aren’t most efficient in all ways. But they are definitely the most efficient in the “it just works” way.

The elmcity project’s newest hub is called Madison Jazz. The curator, Bob Kerwin, will be aggregating jazz-related events in Madison, Wisconsin. Bob thought about creating a Where hub, which merges events from Eventful, Upcoming, and Eventbrite with a curated list of iCalendar feeds. That model works well for hyperlocal websites looking to do general event coverage, like the Falls Church Times and Berkeleyside. But Bob didn’t want to cast that kind of wide net. He just wanted to enumerate jazz-related iCalendar feeds.

So he created a What hub — that is, a topical rather than a geographic hub. It has a geographic aspect, of course, because it serves the jazz scene in Madison. But in this case the topical aspect is dominant. So to create the hub, Bob spun up the delicious account MadisonJazz. And in its metadata bookmark he wrote what=JazzInMadisonWI instead of where=Madison,WI.

If you want to try something like this, for any kind of local or regional or global topic, the first thing you’ll probably want to do — as Bob did — is set up your own iCalendar feed where you record events not otherwise published in a machine-readable way. You can use Google Calendar, or Live Calendar, or Outlook, or Apple iCal, or any other application that publishes an iCalendar feed.

If you are very dedicated, you can enter individual future events on that calendar. But it’s hard, for me anyway, to do that kind of data entry for single events that will just scroll off the event horizon in a few weeks or months. So for my own hub I use this special kind of curatorial calendar mainly for recurring events. As I use it, the effort invested in data entry pays recurring dividends and builds critical mass for the calendar.

Next, you’ll want to look for existing iCalendar feeds to bookmark. Most often, these are served up by Google Calendar. Other sources include Drupal-based websites, and an assortment of other content management systems. Sadly there’s no easy way to search for these. You have to visit websites relevant to the domain you’re curating, look for the event sections on websites, and then look for iCalendar feeds as alternatives to the standard web views. These are few and far between. Teaching event sponsors how and why to produce such feeds is a central goal of the elmcity project.

When a site does offer a Google Calendar feed, it will often be presented as seen here on the Surrounded By Reality blog. The link to its calendar of events points to this Google Calendar. Its URL looks like this:

1. google.com/calendar/embed?src=surroundedbyreality@gmail.com

That’s not the address of the iCalendar feed, though. It is, instead, a variant that looks like this:

2. google.com/calendar/ical/surroundedbyreality@gmail.com/public/basic.ics

To turn URL #1 into URL #2, just transfer the email address into a URL like #2. Alternatively, click the Google icon on the HTML version to add the calendar to the Google Calendar app, then open its settings, right-click the green ICAL button, and capture the URL of the iCalendar feed that way.
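The first method is mechanical enough to script. Here is a minimal sketch that assumes the two URL shapes shown above (an embed URL with a src parameter, and a feed URL ending in /public/basic.ics). The function name is my own, and it only applies to public calendars that follow this pattern.

```python
from urllib.parse import urlsplit, parse_qs, quote


def embed_to_ical(embed_url):
    """Rewrite a Google Calendar embed URL (#1) as an iCalendar feed URL (#2)."""
    marker = "/calendar/embed"
    if marker not in embed_url:
        raise ValueError("not a calendar embed URL: %r" % embed_url)
    base = embed_url.split(marker, 1)[0]  # keeps any scheme and host as given
    src = parse_qs(urlsplit(embed_url).query).get("src")
    if not src:
        raise ValueError("no src parameter in %r" % embed_url)
    # src is the calendar's email-style address; keep '@' readable
    return "%s/calendar/ical/%s/public/basic.ics" % (base, quote(src[0], safe="@"))


print(embed_to_ical("google.com/calendar/embed?src=surroundedbyreality@gmail.com"))
```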

Note that even though a What hub will not automatically aggregate events from Eventful or Upcoming, these services can sometimes provide iCalendar feeds that you’ll want to include. For example, Upcoming lists the Cafe Montmartre as a wine bar and jazz cafe. If there were future events listed there, Bob could add the iCalendar feed for that venue to his list of MadisonJazz bookmarks.

Likewise for Eventful. One of the Google Calendars that Bob Kerwin has collected is for Restaurant Magnus. It is also an Eventful venue that provides an iCalendar feed for its upcoming schedule. If Restaurant Magnus weren’t already publishing its own feed, the Eventful feed would be an alternate source Bob could collect.

For curators of musical events, MySpace is another possible source of iCalendar feeds. For example, the band dot to dot management plays all around the midwest, but has a couple of upcoming shows in Madison. I haven’t been able to persuade anybody at MySpace to export iCalendar feeds for the zillions of musical calendars on its site. But although the elmcity service doesn’t want to be in the business of scraping web pages, it does make exceptions to that rule, and MySpace is one of them. So Bob could bookmark that band’s MySpace web page, filter the results to include only shows in Madison, and bookmark the resulting iCalendar feed.

This should all be much more obvious than it is. Anyone publishing event info online should expect that any publishing tool used for the purpose will export an iCalendar feed. Anyone looking for event info should expect to find it in an iCalendar feed. Anyone wishing to curate events should expect to find lots of feeds that can be combined in many ways for many purposes.

Maybe, as more apps and services support OData, and as more people become generally familiar with the idea of publishing, subscribing to, and mashing up feeds of data … maybe then the model I’m promoting here will resonate more widely. A syndicated network of calendar feeds is just a special case of something much more general: a syndicated network of data feeds. That’s a model lots of people need to know and apply.

I’ve posted the Python script I used to make the Pivot visualization of this blog. I need to set it aside for now and do other things, but here’s a snapshot of the process for my future self and for anyone else who’s interested.

Using deepzoom.py to create Deep Zoom images and collections

I’m using this Python component to create Deep Zoom images and collections. I made the following changes to it:

1. tile_size=256 (not 254) at line 59, line 160, and line 224

2. source_path.name instead of source_path at line 291

3. destination + '.xml' instead of destination at line 341

Let’s assume that Python is installed, along with the Python Imaging Library, and that your current directory contains the files 001.jpg, 002.jpg, and 003.jpg:

001.jpg
002.jpg
003.jpg

You could run deepzoom.py from the command line once for each image file, like so:

python deepzoom.py -d 001.xml 001.jpg
python deepzoom.py -d 002.xml 002.jpg
python deepzoom.py -d 003.xml 003.jpg

My script doesn’t actually do it that way; it enumerates JPEGs and instantiates deepzoom.py’s ImageCreator object once for each. But either way, for each JPEG you end up with a DZI (Deep Zoom Image) package that consists of (for 001.jpg):

  • A settings file: 001.xml
  • A subdirectory: 001_files
  • More subdirectories (named 0, 1, etc.) inside 001_files
  • JPG files inside those subdirectories
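The numbered subdirectories are the levels of an image pyramid. Here is a rough sketch of how many levels and tiles to expect, assuming the usual Deep Zoom convention that the highest level holds the full-size image and each lower level halves it until it fits in a single pixel. It ignores tile overlap, and the function names are mine for illustration.

```python
import math


def pyramid_levels(width, height):
    """Number of levels (subdirectories 0..n-1) for an image of this size."""
    return math.ceil(math.log2(max(width, height))) + 1


def level_dimensions(width, height, level):
    """Pixel dimensions of the image at a given pyramid level."""
    top = pyramid_levels(width, height) - 1
    scale = 2 ** (top - level)  # each step down halves the image
    return math.ceil(width / scale), math.ceil(height / scale)


def tiles_at_level(width, height, level, tile_size=256):
    """Columns and rows of tiles at a given level (overlap ignored)."""
    w, h = level_dimensions(width, height, level)
    return math.ceil(w / tile_size), math.ceil(h / tile_size)


# Example: a 532x502 image with 256-pixel tiles
print(pyramid_levels(532, 502))
print(tiles_at_level(532, 502, 10))
```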

Now, in this case, the current directory looks like this (using -> to mark additions):

001.jpg
-> 001.xml
-> 001_files
002.jpg
-> 002.xml
-> 002_files
003.jpg
-> 003.xml
-> 003_files
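The enumerate-and-instantiate approach mentioned above can be sketched like this. It is not the script itself: dzi_jobs and build_images are hypothetical helper names, and the creator is passed in as a factory so that the sketch does not depend on deepzoom.py’s exact constructor arguments.

```python
import os


def dzi_jobs(directory="."):
    """List (source JPEG, destination XML) pairs for every JPEG in a directory."""
    jobs = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext.lower() in (".jpg", ".jpeg"):
            jobs.append((os.path.join(directory, name),
                         os.path.join(directory, stem + ".xml")))
    return jobs


def build_images(jobs, make_creator):
    """Create one Deep Zoom image per job, using a fresh creator each time."""
    for source, destination in jobs:
        make_creator().create(source, destination)


# Usage, assuming deepzoom.py is importable and its ImageCreator
# accepts a tile_size keyword (an assumption worth checking):
#   import deepzoom
#   build_images(dzi_jobs(), lambda: deepzoom.ImageCreator(tile_size=256))
```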

To build a collection, do something like this in Python:

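# Build a Deep Zoom collection from the per-image DZI settings files.
from deepzoom import CollectionCreator

images = ['001.xml', '002.xml', '003.xml']
creator = CollectionCreator()
# Writes dzc_output.xml and the dzc_output_files directory.
creator.create(images, 'dzc_output')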

Now the current directory looks like:

001.jpg
001.xml
001_files
002.jpg
002.xml
002_files
003.jpg
003.xml
003_files
-> dzc_output.xml
-> dzc_output_files

The Pivot collection’s CXML file will refer to dzc_output.xml, like so:

<Items ImgBase="dzc_output.xml">

Using IECapt to grab screenshots

This tool uses Internet Explorer, so it only works on Windows. There is also CutyCapt for WebKit, which I haven’t tried but would be curious to hear about.

Here’s an example of the IECapt command line I’m using:

iecapt --url=http://blog.jonudell.net/… --delay=1000 --out=tmp.jpg

The result in most cases is a tall skinny JPEG, because it renders the whole page — which can be very long — before imaging it. When I ran it over a 600-item collection, it hung a couple of times because of JavaScript errors. So I went to Internet Options->Browsing in IE, checked Disable script debugging, and unchecked Display a notification about every script error.

Using ImageMagick to crop screenshots

Here’s a picture of an image produced by IECapt, overlaid with a rectangle marking where I want to crop:

The rectangle’s origin is at x=30 and y=180. Its width is 530 pixels, and height 500. Here’s the ImageMagick command to crop a captured image in tmp.jpg into a cropped image in 001.jpg:

convert -quality 100 -crop 530x500+30+180 -border 1x1 -bordercolor Black tmp.jpg 001.jpg

I’m writing this down here mainly for myself. ImageMagick can do everything under the sun, but it always takes me a while to dig up the recipe for a given operation.
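For a batch of screenshots, the same recipe can be generated from the rectangle’s numbers. Here is a minimal sketch; crop_command is a hypothetical helper that only builds the argument list for the convert command shown above, and running it assumes ImageMagick is installed and on the path.

```python
def crop_command(source, destination, x, y, width, height, border=1):
    """Build the ImageMagick argument list for one crop-and-border operation."""
    geometry = "%dx%d+%d+%d" % (width, height, x, y)  # WIDTHxHEIGHT+X+Y
    return [
        "convert", "-quality", "100",
        "-crop", geometry,
        "-border", "%dx%d" % (border, border),
        "-bordercolor", "Black",
        source, destination,
    ]


# Usage (requires ImageMagick):
#   import subprocess
#   subprocess.run(crop_command("tmp.jpg", "001.jpg", 30, 180, 530, 500), check=True)
print(" ".join(crop_command("tmp.jpg", "001.jpg", 30, 180, 530, 500)))
```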

Parsing the WordPress export file

I found to my surprise that WordPress currently exports invalid XML. So the script starts with a search-and-replace that looks for this:

xmlns:wp="http://wordpress.org/export/1.0/"

And replaces it with this:

xmlns:wp="http://wordpress.org/export/1.0/"
xmlns:atom="http://www.w3.org/2005/Atom"

Then it walks through the items in the export file, extracting the various things that will become Pivot facets. For the description, it tries to parse the content:encoded element as XML, and find the first paragraph element within it. If that fails, it just treats the element as text and grabs the beginning of it.
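Here is a minimal sketch of those two steps, the namespace repair and the description fallback. It is a reconstruction under assumptions, not the posted script: the function names, the 300-character cutoff, and the content module namespace URI are my own choices.

```python
import xml.etree.ElementTree as ET

WP_NS = 'xmlns:wp="http://wordpress.org/export/1.0/"'
ATOM_NS = 'xmlns:atom="http://www.w3.org/2005/Atom"'
# Assumed namespace for the content:encoded element
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def fix_namespaces(text):
    """Add the missing Atom namespace declaration next to the wp one."""
    if ATOM_NS in text:
        return text  # already declared, nothing to repair
    return text.replace(WP_NS, WP_NS + "\n" + ATOM_NS, 1)


def first_paragraph(html, limit=300):
    """Return the text of the first <p>, or the start of the raw text."""
    try:
        root = ET.fromstring("<div>" + html + "</div>")
        para = root.find(".//p")
        if para is not None:
            return "".join(para.itertext()).strip()
    except ET.ParseError:
        pass  # content is not well-formed XML; fall back to plain text
    return html.strip()[:limit]


def descriptions(export_text):
    """One description per item in the export file."""
    root = ET.fromstring(fix_namespaces(export_text))
    return [first_paragraph(item.findtext(CONTENT_ENCODED) or "")
            for item in root.iter("item")]
```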

Weaving the collection

There are two control files that need to be synchronized. First, there’s dzc_output.xml, for the Deep Zoom collection. It has elements like this:

<I Id="596" N="596" Source="2245.xml">

Then there’s pivot.cxml which drives the visualization. It has elements like this:

<Item Id="596" Img="#596"
  Name="Freebase Gridworks: A power tool for data scrubbers"
  Href="http://blog.jonudell.net/2010/03/26/...
<Description><![CDATA[
I've had many conversations with Stefano Mazzocchi and David Huynh [1, 2, 3]
about the data magic they performed at MIT's Project Simile and now perform
at Metaweb. If you're somebody who values clean data and has wrestled with
the dirty stuff, these screencasts about a forthcoming product called
Freebase Gridworks will make you weep with joy.
]]></Description>
<Facets>
  <Facet Name="date">
    <DateTime Value="2010-03-26T00:00:00-00:00" />
  </Facet>
  <Facet Name="tag">
    <String Value="freebase" />
    <String Value="gridworks" />
    <String Value="metaweb" />
  </Facet>
  <Facet Name="comments">
    <Number Value="24" />
  </Facet>
</Facets>
</Item>

In this example, Source="2245.xml" in dzc_output.xml refers to a Deep Zoom image whose name comes from the WordPress post_id for that entry, which is:

<wp:post_id>2245</wp:post_id>

But Id="596", which is the connection between dzc_output.xml and pivot.cxml, comes from a counter in the script that increments for each item processed. I don’t know why the numbering of items in the WordPress export file is sparse, but it is, hence the difference.
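The relationship between the two numbering schemes can be sketched as follows. This is an illustration, not the script: weave is a hypothetical name, and the counter’s starting value is an assumption, since the post shows only one Id.

```python
from xml.sax.saxutils import quoteattr


def weave(post_ids, start=0):
    """Pair a dense running counter with sparse WordPress post ids.

    The counter supplies Id and N in dzc_output.xml and Img in pivot.cxml;
    the post id supplies the name of the Deep Zoom image file.
    """
    rows = []
    for counter, post_id in enumerate(post_ids, start):
        ident = quoteattr(str(counter))
        rows.append({
            "id": counter,
            "dzc": "<I Id=%s N=%s Source=%s>"
                   % (ident, ident, quoteattr("%s.xml" % post_id)),
            "img": "#%d" % counter,
        })
    return rows


# Two consecutive items whose post ids are not consecutive
for row in weave([2243, 2245], start=595):
    print(row["dzc"], row["img"])
```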

Things to do

Here are some ideas for next steps.

1. Check the comment logic. I just noticed the counts seem odd. Maybe because I’m counting all comments instead of approved comments?

2. Use HTML Tidy to ensure that item content will parse as XML, and then count various kinds of elements within it: tables, images, etc.

3. Use APIs of various services (Twitter, bit.ly, etc.) to count reactions to each item.
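For the first idea, here is a minimal sketch of counting only approved comments. It assumes each comment in the export is a wp:comment element containing a wp:comment_approved child whose value is 1 when approved. That structure is an assumption worth checking against an actual export file, and the function name is mine.

```python
import xml.etree.ElementTree as ET

WP = "{http://wordpress.org/export/1.0/}"


def approved_comment_count(item):
    """Count comments on one export item that are marked as approved."""
    count = 0
    for comment in item.iter(WP + "comment"):
        approved = (comment.findtext(WP + "comment_approved") or "").strip()
        if approved == "1":  # assumed marker for an approved comment
            count += 1
    return count
```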

A Pivot experiment

Pivot, from Microsoft Live Labs, is a data browser that visualizes data facets represented as Deep Zoom images and collections. I’ve been meaning to try my hand at creating a Pivot collection. My first experiment is a visualization of my blog which, in its current incarnation at WordPress.com, has about 600 entries. That’s a reasonable number of items for the simplest (and most common) kind of collection in which data and visuals are pre-computed and cached in the viewer. Here’s the default Pivot view of those entries.

The default view

To create this collection, I needed a visual representation of each blog entry. I didn’t think screenshots would be very useful, but the method worked out better than I expected. At the default zoom level there’s not much to see, but you can pick out entries that include pictures.

A selected entry

When you select an entry, the view zooms about halfway in to focus on it.

A text-only entry

Here’s a purely textual entry at the same resolution. If you click to enlarge that picture, you’ll see that at this level the titles of the current entry and its surrounding entries are legible.

The Show Info control

Clicking the Show Info control opens up an information window that displays title, description, and metadata. I’ve included the first paragraph of each entry as the description.

Zooming closer

If I zoom in further, the text becomes fully legible.

Histogram of entries

Of course the screenshot doesn’t capture the entire entry; it’s just a picture of the first screenful. To read the full entry, you click the Open control to view the entire HTML page inside Pivot.

Pivot itself isn’t a reader, it’s a data browser. This becomes clear when you switch from item view to graph view. 2006 and 2010 are incomplete years, but the period 2007-2009 shows a clear decline. I suspect a lot of blogs would show a similar trend, reflecting Twitter’s eclipse of the blogosphere.

2007 distribution

Here’s the distribution for just the year 2007.

Histogram of comments

And here’s the comments facet, which counts the number of comments on each entry.

Histogram of entries with more than 20 comments

Adjusting the slider limits the view to entries with more than 20 comments.

Filtering by tags

Of course I can also view entries by tags or tag combinations.

Filtering by keywords

When I start typing a keyword, the wordwheel displays matches from two namespaces: tags and titles.

Other possible views

Facets can be anything you can enumerate and count. I could, for example, count the number of images, tables, and other kinds of HTML constructs in each entry. That isn’t just a gratuitous exercise. Some years back, I outfitted my blog with an XQuery service that could search for items that contained more than a few images or tables, and it was useful for finding items that I remembered that way.
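Counting HTML constructs doesn’t require well-formed markup if you use a tolerant parser. Here is a minimal sketch using the standard library’s HTMLParser; the class and function names are hypothetical, and the choice of img and table as the default tags simply follows the examples above.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Tally every start tag seen in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        self.counts[tag] += 1


def count_constructs(html, tags=("img", "table")):
    """Return {tag: count} for the tags of interest in one entry."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {tag: parser.counts[tag] for tag in tags}


entry = '<p>Chart:<img src="a.png"></p><table><tr><td><img src="b.png"/></td></tr></table>'
print(count_constructs(entry))
```

Each count could then become a Number facet for its entry.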

It would also be nice to include facets based on the WordPress stats API. And since a lot of the flow to the blog nowadays comes through bit.ly-shortened URLs on Twitter, a facet based on those referrals would be handy.

How I did it

Life’s too short to make 600 screenshots by hand, so the process had to be automated. Also, I want to be able to update this collection as I add entries to the blog. So I’m using IECapt to programmatically render pages as images, and the indispensable ImageMagick to crop the images in a standard way.

To automate the creation of Deep Zoom images (and XML files), I’m using deepzoom.py. (Note that I had to make two small changes to that version. At line 224, I changed tile_size=254 to tile_size=256. And at line 291 I changed PIL.Image.open(source_path) to PIL.Image.open(source_path.name).)

To build the main CXML (collection XML) file, I export my WordPress blog and run a Python script against it. I hadn’t looked at that export file in a long time, and was surprised to find that currently it isn’t quite valid XML. The declaration of the Atom namespace is missing. My script does a search-and-replace to fix that before it parses the XML.

I haven’t uploaded the collection to a server yet, because there are a bazillion little files and I’m still tweaking. Once I’m happy with the results, though, I should be able to establish a baseline collection on a server and then easily extend it an entry at a time.

If there’s interest I’ll publish the script. It’ll be more compelling, I suspect, once Pivot becomes available as a Silverlight control. Currently you have to download and install the Windows version of Pivot to use this visualization. But imagine if WordPress.com could deliver something like this for all of its blogs as a zero-install, cross-platform, cross-browser feature. That would be handy.

The other day I read the following statement in the Economist:

Sensitivity of the data will decide if an application is suitable for processing in the cloud.

The writer does not mention, and probably is unaware of, the principle of translucent data. In a translucent database, the data is encrypted and thus opaque to the operator of the database. Users of the data share keys to unlock the data, and can do anything with cleartext copies that they keep locally. Can real and useful applications be built in this kind of regime? We don’t really know, because hardly anybody has tried. But if it turns out to be possible, it could become a foundation of cloud computing.

I wanted to advance the story. In particular, I wanted to help make a connection between that statement in the Economist and the idea of data translucency. I’ve written about translucency on my blog, and those entries are tagged on delicious. But nowadays the attention stream flows mainly through Twitter. So I composed this tweet:

Economist: “Sensitivity of the data will decide if an application is suitable for processing in the cloud.” Unless the data is #translucent.

There’s a limit to what you can do in 140 characters. That tweet uses all 140, but still falls short of what I wanted to do:

  • Quote from the Economist
  • Link to the Economist
  • Colonize a formerly empty hashtag namespace (#translucency)
  • Connect that namespace to its delicious counterpart

Inevitably I failed to do all that in 140 characters. Reflecting on the failure, I made this LazyWeb wish:

I wish I could tweet the command “join http://delicious.com/judell/translucency to #translucent and #translucency”

I’ve had some success joining tag namespaces from different domains. I mentioned the idea in this entry, and a commenter (engtech) provided a nifty solution based on Yahoo Pipes. I have since used it to keep track of items tagged icalvalid on blogs, on delicious, and on Twitter.1

My LazyWeb wish came from that experience, plus another which I wrote up in an entry entitled To: elmcity, From: @curator, Message: start. That entry describes how elmcity curators can now use Twitter direct messages to send commands to the elmcity service. The mechanism harkens back to Rael Dornfest’s brilliant Sandy, a service that acted as a personal assistant and responded to a repertoire of command messages.

Sandy lost her job when Rael went to work for Twitter. I’ve wondered if she would be rehired there. If so, a command like the one I proposed might be an example of the kind of thing she could do.

On further reflection, I’m not really sure what such a command would mean, or whether it would make sense to use Twitter to send it, or indeed whether it would make sense for Twitter (rather than some other service) to respond to it. But I’m in an exploratory mood, so let’s explore.

It would be straightforward to create a service that would take the Yahoo Pipes trick to the next level. Instead of editing and saving a Yahoo Pipe, you’d just command that service to merge the set of feeds for some tag. That command might best take the form of a URL:

http://tagjoiner.org/join/TAG?delicious=yes&twitter=yes&wordpress=yes

As is true for my combined icalvalid feed, the result formats could be HTML for viewing and RSS for feed splicing. As the creator of the joined feed, I’m aware that it exists, and I can cite it when I want to direct people’s attention to the union of the namespaces.
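Here is a minimal sketch of what such a service might do internally, assuming each source’s feed has already been fetched and reduced to a list of items with link, date, and source fields. The tagjoiner.org address is the imagined one from above, and join_url and join_feeds are hypothetical names.

```python
from urllib.parse import quote, urlencode


def join_url(tag, services):
    """Build the command URL for the imagined tag-joining service."""
    query = urlencode([(service, "yes") for service in services])
    return "http://tagjoiner.org/join/%s?%s" % (quote(tag, safe=""), query)


def join_feeds(*feeds):
    """Merge feeds into one list, newest first, dropping repeated links.

    Each item is a dict with 'link', 'date' (ISO 8601 string), and 'source'.
    """
    seen = set()
    merged = []
    every_item = (item for feed in feeds for item in feed)
    for item in sorted(every_item, key=lambda i: i["date"], reverse=True):
        if item["link"] in seen:
            continue  # the same page was tagged on more than one service
        seen.add(item["link"])
        merged.append(item)
    return merged


print(join_url("icalvalid", ["delicious", "twitter", "wordpress"]))
```

Keeping the source field on each merged item makes it possible to label entries by origin in the HTML view.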

But suppose I wanted the joined namespace to be more discoverable than that? Here’s where it might make sense for Twitter to be involved. If a hashtag search on Twitter did the join, it could be made evident to the followers of the person making the join request, or even to anyone searching for the hashtag involved in the request.

This is almost surely too indirect and too abstract to ever make sense as a mainstream feature. But it’s fun to imagine. If I’ve made an investment in a tag on delicious, or WordPress, or somewhere else, I’d like to be able to bring those items to the attention of people who encounter the corresponding Twitter hashtag.

The general idea behind all this goes way beyond Twitter, of course. Waiting in the wings is a whole class of services that reconcile different web namespaces.


1 That feed used to include a mix of items marked [DELICIOUS] and [TWITTER]. But the Twitter items are less durable and seem to have aged out of the combined feed.

The other day my colleague Scott Hanselman wrote a useful essay called 10 Guerilla Airline Travel Tips for the Geek-Minded Person. It’s a mixture of technical and social strategies. The tech strategies include marshaling data with the help of services like Tripit, FlightStats, and SMS alerts. The social strategies include being nice to service reps, and using the information you’ve marshaled in order to make precise requests that they’re most likely to be able to satisfy.

Scott writes:

I’m a geek, I like tools and I solve problems in my own niche way.

That statement, along with the essay’s tagline — …Tips for the Geek-Minded Person — has been bothering me ever since I read it. Why is it geeky to marshal the best available data? Why is it geeky to use that data to improve your interaction with people and processes?

My Wikipedia page includes this sentence:

Udell has said, “I’m often described as a leading-edge alpha geek, and that’s fair”. 1

I did say that, it’s true. But I’ve come to regret that I did. For a while I thought that was because geek was once defined primarily as a carnival freak. That’s changed, of course. Nowadays the primary senses of the word are obsessive technical enthusiasm and social awkwardness. Which is better than being somebody who bites the heads off chickens. But it’s still not how I want to identify myself. Much more importantly, it’s not how I want the world to identify the highest and best principles of geek identity and culture.

Fluency with digital tools and techniques shouldn’t be a badge of membership in a separate tribe. In conversations with Jeannette Wing and Joan Peckham I’ve explored the idea that what they and others call computational thinking is a form of literacy that needs to become a fourth ‘R’ along with Reading, Writing, and Arithmetic.

The term computational thinking is itself, of course, a problem. In comments here, several folks suggested systems thinking which seems better.

Here’s a nice example of that kind of thinking, from Scott’s essay:

#3 Make their job easy

Speak their language and tell them what they can do to get you out of their hair. Refer to flights by number when calling reservations, it saves huge amounts of time. For example, today I called United and I said:

“Hi, I’m on delayed United 686 to LGA from Chicago. Can you get me on standby on United 680?”

Simple and sweet. I noted that UA680 was the FIRST of the 6 flights delayed and the next one to leave. I made a simple, clear request that was easy to grant. I told them where I was, what happened, and what I needed all in one breath. You want to ask questions where the easiest answer is “Sure!”

I see two related kinds of systems thinking at work here. One engages with an information system in order to marshal data. Another engages with a business process — and with the people who implement that process — in a way that leverages the data, reduces process friction, and also reduces interpersonal friction.

These are basic life skills that everyone should want to master. If we taught them broadly, and if everyone learned them, then this sort of mastery wouldn’t attract the geek label. But we don’t teach these skills broadly, most people don’t learn them, and the language we use isn’t our friend. If systems thinking is geeky then only geeks will be systems thinkers. We can’t afford for that to be true. We need everyone to be a systems thinker.


1 Actually I’d say that Scott Hanselman is a leading-edge alpha geek. I am, at best, a trailing-edge beta or gamma geek. But if someone were to remove the word entirely from my Wikipedia page, I’d be fine with that. I no longer want to be labeled as any kind of geek.

The sound track for yesterday’s run was a compelling talk by Atul Gawande about his new book The Checklist Manifesto, which grew from an article in the New Yorker. Although his story is grounded in the practice of health care, the lessons apply much more broadly to every field in which we grapple with complexity.

For most of human history, he argues, we were limited by lack of knowledge. We just didn’t know how to do things right. Now that knowledge is abundant the enemy is no longer ignorance but rather ineptitude — the failure to marshal and apply what we know.

The surprising thing Atul Gawande learned, and now passionately conveys, is that simple checklists turn out to be extraordinarily powerful tools for marshalling knowledge and for ensuring its correct use.

The biggest roadblock is pushback from highly-trained experts who are offended by the idea. After 8 years of medical school, and in a regime that already demands vast amounts of paperwork, why should a doctor have to check off basic items on a list? Because we are fallible in the face of complexity, Gawande says, and because checklists work. Although he led research in this area he was skeptical about adopting checklists in his own operating rooms. But when he did, he made two critical discoveries. First, well-made checklists are easy to use. Second, they almost always caught errors.

Most of those errors turned out to be non-critical. Only a few of the catches saved lives. That alone, of course, is enough reason to adopt checklist discipline. But it was shocking for the medical teams to discover that simple and basic procedures, which they thought were being carried out with 100% fidelity, in fact weren’t.

We are willing to tolerate failure when it results from unavoidable ignorance, Gawande says. If we really don’t know how to cure a disease, then OK. You tried your best, you failed, that’s how it is. But if we do know, and screw up, that’s unforgivable. What do you mean she died because somebody forgot to administer the antibiotic, or to wash his hands? Unacceptable.

The struggle with complexity I know best happens in the realm of software. What do our checklists look like? One obvious form is the test suite. If my software keeps passing its tests as I evolve it, there’s still plenty that can and will go wrong, but at least I know it still does what the tests say it does. Once, recently, I deployed a version of the service I’m building that failed in a way my tests would have caught. How did that happen? I was so sure I hadn’t changed anything that the tests would catch that I didn’t bother to rerun them. That’s an unforgivable lapse of discipline I don’t plan to repeat.

But software tests aren’t really the sort of checklist that Gawande writes and speaks about. Here’s something closer to what he means: Best practices in web development with Python and Django. That list comes from Christopher Groskopf, a web developer at the Chicago Tribune, who writes:

In our fast-paced environment there is little justification for being confused when it could have been avoided by simply writing it down.

We need to recognize and honor this kind of work. It is unsexy but heroic, and I use that word deliberately. The power of the checklist discipline, Gawande says, should prompt us to rethink our definition of heroism. Consider Capt. Chesley “Sully” Sullenberger:

It was fascinating to watch people responding to the miracle on the Hudson. All of us, staring in amazement, thinking what a hero he was. But none of us willing to listen to what he really was saying. He kept saying it wasn’t flight ability, but instead adherence to discipline, and teamwork. But it was as if we couldn’t process what he was trying to tell us.

Because there were checklists, and because everybody used them, Sully could rise above the dumb stuff and focus on the one key decision for which human judgement was required. The heroic part of that flight was not the flight ability of Capt. Sullenberger, it was the willingness of the entire team — including the flight attendants, who then acted through their protocols to get the passengers off that plane in three minutes — to acknowledge their fallibility, admit that they could fail by relying only on training and memory, and exercise the discipline to overcome that fallibility.

The talk raises important questions for practitioners in every field. What makes checklists easy to use? What makes them effective? In the realm of software, we have plenty of examples to look at: Django, WordPress, C#, ASP.NET, etc. It might be fruitful to explore these, merge similar lists, and codify stylistic patterns that can govern all such lists.

Last night I attended a lecture by Vincent Malmström who, in 1973, published a paper in Science proposing an answer to the mysterious (and still controversial) question: Why did the Maya use a 260-day calendar?

Malmström’s 1997 book Cycles of the Sun, Mysteries of the Moon, which he has also made freely available here, tells the whole story from his point of view. It’s a remarkable tale of geography, religion, culture, computation, science, and human foibles.

The Maya actually used three different calendars. The Tzolk’in ran on a 260-day cycle, and the Haab’ used a 365-day cycle. Then there was the Long Count, which counted days since a mythical beginning of time and also included the other two.

The Long Count’s start date was written, in its full form, like this:

0.0.0.0.0, 4 Ahau, 8 Cumku

The first five digits measure days in units of 144,000, 7,200, 360, 20, and 1. 4 Ahau is a Tzolk’in day, based on a cycle of 13 numbers with a cycle of 20 day names. 8 Cumku is a Haab’ day, based on 18 20-day months.
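The place-value arithmetic is easy to sketch. The snippet below assumes nothing beyond the five unit sizes given above; the function names are made up for illustration.

```python
# Day values of the five Long Count positions, as listed in the text.
PLACE_VALUES = (144_000, 7_200, 360, 20, 1)


def long_count_to_days(long_count):
    """Convert a dotted Long Count string to days since the start date."""
    parts = [int(p) for p in long_count.split(".")]
    if len(parts) != len(PLACE_VALUES):
        raise ValueError("expected five dot-separated numbers")
    return sum(n * v for n, v in zip(parts, PLACE_VALUES))


def days_to_long_count(days):
    """Convert a day count back to a dotted Long Count string."""
    parts = []
    for value in PLACE_VALUES:
        count, days = divmod(days, value)
        parts.append(str(count))
    return ".".join(parts)
```

So long_count_to_days("1.0.0.0.0") is 144,000 days, and days_to_long_count(144000) gives the string back.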

Today’s date is 12.19.17.2.3, which Wikipedia’s Long Count page helpfully computes for you using this markup:

Today, {{CURRENTDATE}}, in the Long Count is {{Maya date}} (GMT correlation)

(Here GMT doesn’t stand for Greenwich Mean Time, but rather for Goodman-Martinez-Thompson.)

But today might be 12.19.17.2.2, according to this calculator. There has, evidently, been epic confusion and controversy about whether the mythical start date was 584,283 or 584,284 or 584,285 days ago. Thompson originally thought 584,285, then changed his mind and decided on 584,283.
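To see how the choice of constant moves the answer, here is a rough sketch of the conversion. It assumes that each correlation constant is the Julian Day Number of the Long Count's start date, which is how these constants are conventionally used; the function name and the ORDINAL_TO_JDN offset are my own labels, not anything from the calculators mentioned above.

```python
from datetime import date

# Offset from Python's proleptic Gregorian ordinal to the Julian Day Number:
# date(2000, 1, 1).toordinal() is 730120, and that day's JDN is 2451545.
ORDINAL_TO_JDN = 1_721_425


def long_count_for(day, correlation=584_283):
    """Return the Long Count of a Gregorian date as a dotted string.

    Assumes `correlation` is the Julian Day Number of 0.0.0.0.0.
    """
    days = day.toordinal() + ORDINAL_TO_JDN - correlation
    parts = []
    for value in (144_000, 7_200, 360, 20, 1):
        count, days = divmod(days, value)
        parts.append(str(count))
    return ".".join(parts)
```

Under that assumption, raising the constant by one lowers the last digit by one for any given day, which is exactly the kind of one-day disagreement seen between the two results above.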

Prof. Malmström likes 584,285, which fixes the start date as August 13, 3114 B.C. Why? Thompson didn’t think there was any astronomical basis for the 260-day calendar, but Malmström figured there had to have been. And he wondered where, in that part of the world, you might observe a 260-day astronomical cycle.

It turns out that at latitude 14.8 º N, the sun is directly overhead on August 13 passing southward, and again on April 30 passing northward, an interval of 260 days. August 13 is also the day after the peak of the Perseid meteor shower. Malmström writes:

The signs were therefore unmistakable. First the heavens would give their notice. All night long the skygazer would watch as stars burst from behind the towering mountains to the northeast and flashed across the sky. And the following morning, as the sun arched higher and higher across the heavens, he would watch as the shadow it cast grew steadily shorter, until, as the sun reached its zenith, its shadow completely disappeared. This then, he decided, was the day for his count to begin.

Why count days? If you’re planting maize, you need to calibrate carefully to the arrival of the monsoon rains. The two solar passages correspond roughly to the beginning of the rainy season at the end of April, and the harvest in mid-August.
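The 260-day figure can be checked with ordinary date arithmetic. This is just a sanity check on the two dates given above, using a modern calendar; zenith_interval is a made-up helper name.

```python
from datetime import date


def zenith_interval(start_year):
    """Days from the August 13 southward passage to the next April 30."""
    southward = date(start_year, 8, 13)
    northward = date(start_year + 1, 4, 30)
    return (northward - southward).days
```

For a span that doesn't cross a leap day, such as zenith_interval(2010), the answer is 260. The other leg of the year, April 30 back to August 13, is the remaining 105 days.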

Note that these passages, and the associated latitude 14.8 º N, don’t apply to the Maya in the Yucatán Peninsula, but instead to an earlier Olmec civilization to the southwest, on the Pacific coast near what is now the border between Mexico and Guatemala. The Mayan new year was July 26, not August 13. But the 260-day calendar predated the Mayans by a millennium.

Just a few decades after its inception, the 260-day “sacred” calendar was augmented by a 365-day “secular” calendar. The problem was that the sacred calendar didn’t quite work. There were 13 20-day cycles — or 20 13-day cycles — during the sun’s southward passage, and what seemed like 8 more 13-day cycles during the northward passage. So when the calendar started running, things seemed to work out — albeit in a delightfully curious way.

Each time the zenithal sun passed overhead on its way south, a new 260-day cycle would begin on a day numbered “1” but with a different name. Thus, the skygazer watched as the beginning of each successive cycle shifted from “1 Alligator” to “1 Snake” to “1 Water” to “1 Reed” and then to “1 Earthquake.”

That didn’t last long, though.

Where the priest had erred, of course, was in concluding that the cycle of the sun could be measured in 28 “bundles” of 13 days. This meant that he had equated its annual migration through the heavens with an interval of 364 days, when in actuality it took about a day and a quarter longer than that. Thus, after only four years had elapsed his count was already off by 5 days. This might go unnoticed by the commoners at first, but certainly, as the error increased with each passing year, it wouldn’t be long before “the cat was out of the bag.”
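The arithmetic behind that error is short enough to sketch. The 365.2422-day figure for the solar year is a modern value I'm supplying, not one from Malmström's text, and the function names are hypothetical.

```python
TROPICAL_YEAR = 365.2422  # mean solar year in days (modern value, an assumption)
PRIEST_YEAR = 28 * 13     # 28 bundles of 13 days = 364 days


def drift_days(years):
    """How far the 364-day count falls behind the sun after `years` years."""
    return years * (TROPICAL_YEAR - PRIEST_YEAR)


def first_day_name_positions(cycles, names=20):
    """Position (1 to 20) of the day name on which each 364-day round starts."""
    return [(n * PRIEST_YEAR) % names + 1 for n in range(cycles)]
```

After four years drift_days(4) comes to just under 5 days, matching the quote. And because 364 is a multiple of 13 but leaves a remainder of 4 when divided by 20, each round starts again on the number 1 while the name moves ahead four places: positions 1, 5, 9, 13, and 17, which is where Alligator, Snake, Water, Reed, and Earthquake sit in the usual ordering of the twenty day names.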

What a colossal screwup! I like to imagine the priests furiously backpedaling.

OK, wait, I know we said 260, but it’s really 365, but we’ll keep both, don’t worry, it’ll work out, trust us, we know what we’re doing.

Of course the fun never stops. We’re less than three years away from Y 13.0.0.0.0. That’s in 2012, on Dec 23. Or on Dec 22, or Dec 21, depending on which correlation constant you choose. On one of those dates the world will end. Or not. Prof. Malmström suggests you choose 584,285. That’ll give you two extra days to put your affairs in order.


For more on the endlessly weird human reckoning of time, see A literary appreciation of the Olson/Zoneinfo/tz database.

A while ago I asked the Lazy Web for a service that would produce a tag cloud of the names of the lists on which a Twitter user appears. Mine, for example, would look like this:

The Lazy Web seems not to have taken up the challenge, so I took a crack at it. The solution I came up with is a single-page application, which is just a web page that uses HTML, CSS, and Ajax to do something that’s (hopefully) interesting and useful.

Here’s the page: http://jonudell.net/NamesOfTwitterListsFor.html

It defaults to my Twitter name but you’ll of course want to try yours, and those of others you’re curious about. The first time through, you’ll be prompted to authenticate to api.twitter.com. This looks like the password anti-pattern, but really isn’t. You’re authenticating yourself to the Twitter API in the same way that you normally do to the Twitter website.

Note that since the API call used to build the tag cloud is rate-limited, queries through this page will be charged against your daily allotment of Twitter API usage, just as when you use client applications like TweetDeck or Seesmic.
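Once the list names are in hand, building the cloud is mostly counting. The sketch below is in Python rather than the page's own JavaScript, and it assumes the names have already been fetched; the sample data, the pixel range, and the cloud_sizes name are all made up for illustration.

```python
from collections import Counter


def cloud_sizes(list_names, min_px=12, max_px=36):
    """Map each distinct list name to a font size scaled by how often it occurs."""
    counts = Counter(name.lower() for name in list_names)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        name: round(min_px + (n - low) * (max_px - min_px) / span)
        for name, n in counts.items()
    }


# Hypothetical names of lists a user appears on.
sample = ["tech", "tech", "bloggers", "tech", "media", "bloggers", "tech"]
sizes = cloud_sizes(sample)
```

The most frequent name gets the largest font, the rarest gets the smallest, and everything else is interpolated linearly between them.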

What will your tag cloud say about you? I don’t think you’ll be surprised. It’s just another of the unique signatures written for us by others. That those signatures do get written, though, and that they can be discovered and read, never ceases to surprise me.

The dynamics of single-page applications also never cease to surprise me. In this case, a tiny 4K web page is all that’s delivered from my modestly-equipped personal webserver. It would probably survive a Slashdotting. If not, the page could be hosted on any other server, or on a local drive, and would continue to work the same way.

I’m also using jQuery, in this case served from the Microsoft content delivery network, so that’s unlikely to be a bottleneck. The only real limit is Twitter API usage, and that’s spread across all the Twitter users who authenticate through the page.

When you arrange and deploy a tiny amount of HTML, CSS, and JavaScript in this way, you can create a lot of leverage!

I’ve long been familiar with the idea of software patterns. But I didn’t connect it to its roots in the architectural writings of Christopher Alexander until I recently listened to Kent Beck’s keynote at the 2008 Rails conference. Kent was deeply influenced by The Timeless Way of Building. That book wasn’t available in my local library. But the companion volume, A Pattern Language: Towns, Buildings, Construction, was. It’s been a revelation to read it for the first time, more than thirty years after it was published, through lenses formed by my experience with software and networks.

Here’s how A Pattern Language summarizes a pattern called FOUR-STORY LIMIT:

Therefore, in any urban area, no matter how dense, keep the majority of buildings four stories high or less.

And here’s how the Portland Pattern Repository summarizes the Singleton Pattern:

Therefore, let the class create and manage the single instance of itself, the Singleton. Wherever in the system you need access to this single instance, query the class.

The stylistic allusion shows a direct literary influence flowing from architectural pattern language to software pattern language. Alexander’s book, by the way, is a pre-Web hypertext. The pattern called FOUR-STORY LIMIT (21), for example, refers to NUMBER OF STORIES (96), DENSITY RINGS (29), BUILDING COMPLEX (95), HOUSING HILL (39), and HIGH PLACES (62). Each of these numbered patterns links to a set of related patterns, as does each page in the Portland Pattern Repository — which was also, of course, the Ur-wiki from which all things wiki are descended.
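For readers who haven't met it, the Singleton summary quoted above translates into very little code. This is one common way to write it in Python, offered as a sketch; the Registry class and its settings attribute are hypothetical.

```python
class Registry:
    """A class that creates and manages the single instance of itself."""

    _instance = None

    def __init__(self):
        self.settings = {}

    @classmethod
    def instance(cls):
        """Query the class for the one shared instance, creating it on first use."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
```

Anywhere in the system, Registry.instance() hands back the same object, which is the "query the class" half of the pattern's prescription.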

I suspect we’ve yet to fully elaborate the connections between software, architecture, and networks. Consider these pattern names from Alexander’s 1977 book:

THE DISTRIBUTION OF TOWNS
WEB OF PUBLIC TRANSPORTATION
NETWORK OF LEARNING
WEB OF SHOPPING
ACTIVITY NODES
NECKLACE OF COMMUNITY PROJECTS
CONNECTED PLAY
NETWORK OF PATHS AND CARS
CIRCULATION REALMS

These evocative names, and the sketches that accompany them, arise from a deeply network-oriented way of thinking. Many of the higher-level patterns express core values about connectivity and decentralization. And those values resonate more powerfully now, in our Net-aware world of 2010, than they might have in 1977.

Some of the prescriptions in A Pattern Language can seem absurd. For example, Alexander argues that an optimal urban core should serve a “catch basin” of about 300,000 people, that these cores should be widely distributed, and that each should specialize in some way that makes it world-class. Why?

The problem is clear. On the one hand people will only expend so much effort to get goods and services and attend cultural events, even the very best ones. On the other hand, real variety and choice can only occur where there is concentrated, centralized activity; and when the concentration and centralization becomes too great, then people are no longer willing to take the time to go to it.

Which is fine in theory, but we’ve already built megalopolises surrounded by suburbs. It’s not like we can do it over.

Except when we can. In America the most striking examples are in Michigan, where I lived for many years. Detroit, once a city of two million, is being recreated as a city that may end up at less than a third that size. What will become of the rest? It just might be plowed into farmland. If so, a pattern called CITY COUNTRY FINGERS may turn out to be a useful guide.

Likewise there are plans afoot to gather the remaining population of Flint into a few viable neighborhoods and let the vacated land become parks and forests. If that happens, many of the ideas in A Pattern Language, about how to organize neighborhoods and their transportation networks, could come into play.

In Asia, meanwhile, entire new cities are being built from scratch. I recently met an engineer who works for the global consulting firm Arup. For one of their projects, he told me they’re using a simulation of wind flow as one of the constraints on the layout of streets and buildings. The layout is also informed by RING ROADS and LOCAL TRANSPORT AREAS, patterns that yield a tiered distribution network which optimizes the use of delivery trucks.

Networked software is highly malleable, and we take for granted that we can try out different design patterns. The built environment rarely affords the same opportunity. But in this century of urbanization, as circumstances force us to rethink our energy, transportation, and settlement networks, it may turn out to be softer than we suppose, and more open to the influence of pattern languages.

Linda Stone, coiner of the marvelous phrase continuous partial attention, has lately been exploring another modern pathology she calls email apnea, which means failure to breathe while checking email. In retrospect, we shouldn’t be surprised. Look:

  • The new 25-payline special edition of Wheel of Wealth will have you holding your breath in excitement…

  • Play Online Slot Machine Game. Coin in – spin – hold your breath……Watch those symbols…..Will it or won’t it?

  • After the first two hits you’re holding your breath for the third reel…

We don’t talk about slot-machine apnea but it’s the same syndrome, produced by the same cause: an intermittent, or variable-interval, schedule of reinforcement. Any activity that exhibits this pattern will be powerfully addictive. A dog begging for scraps of food at the table, rewarded only once in a thousand times, will always beg. Likewise a human begging for scraps of attention.

The link between variable-interval reinforcement and email addiction is well known. Less studied is how this plays out in other modes of electronic discourse. The architecture of those modes introduces another key variable: attention payoff. In a group-structured system, like email or Facebook, the payoff is bounded by group size. It’s true that email messages can escape and go viral, but when that happens the attention payoff is never the kind you want.

But in open pub/sub systems, like blogs or Twitter, the payoff is unlimited. Any item that you post could attract worldwide attention, boost your reputation, land you a job, or make a key personal or professional connection. However, there’s no guarantee that you’ll get any reinforcement at all. So some fall by the wayside, others become addicted.

“Technology is here to stay,” Linda says. “Can our relationship to it change?”

It must, it can, and it will. But we’ll need to develop some intuitions about global scale and connectedness for which evolution did not prepare us. And then we’ll need to translate them back down to the human scale. Evolution has taught us how to be social. Technology amplifies our ability to give and receive attention, but it doesn’t change the rules of the game. There’s a time to listen, a time to talk, a time to breathe. We’ll remember, and we’ll figure it out.

My guest for this week’s Innovators show is Sal Khan. He’s the creator of http://khanacademy.org, a catalog of more than 1000 YouTube video lessons in math, physics, biology, chemistry, and economics. All of these videos are made by Sal himself, in an engagingly personal style, using simple screencasting tools.

When I first got interested in screencasting, I envisioned the medium not only as a way to demonstrate software, but also as a way to share knowledge at Internet scale. Sal’s work fulfills that vision, and points the way toward a profound and much-needed disruption of our educational system.

At its core, Sal’s project isn’t about YouTube screencasts. It’s about intuition.

I always got frustrated by what went on in the classroom. You see otherwise intelligent peers memorizing facts and not really caring about the actual intuition. And because they didn’t care about the intuition in their junior year, when that same idea pops up in senior year, it’s like they’ve never seen it before. It boggled my mind. You’re just relabeling the same concept over and over.

Sal cares about the intuition, and he wants others to care about the intuition too. The first beneficiary of that desire was his cousin Nadia, whom he tutored remotely. Then followed other cousins and family friends. Then it dawned on him that there were no limits. The project could scale out. He could become a superempowered individual, reaching anyone who finds value in his method.

One of the key ingredients of that method is improvisation. These videos aren’t carefully planned, and they aren’t edited. As a viewer, you find yourself looking over the shoulder of a smart and broadly knowledgeable person who is solving problems by thinking on his feet. You watch a practitioner at work: engaged with his medium, wrestling with his tools, correcting false starts.

It was Chris Gemignani who first showed me the value of this approach, in a screencast that teaches how to do unexpectedly powerful and elegant Excel charting. He did it in one take. I’d have been tempted to edit out the false starts. But Chris knew better. Learning how a practitioner really thinks about solving a problem is even more valuable than learning the solution to the problem.

One thing that Sal’s lessons can’t be, of course, is interactive. Nor does he pretend that these videos will make teachers obsolete. But he does suggest, and I violently agree, that teachers can and should become curators of online assets like the ones Sal is creating, and should know when and how to weave those assets into their classes.

Teachers should also become connectors. Sal won’t be the only game in town. Other superempowered tutors will emerge. Each will have a unique style. For a given student, a given subject, and a given problem, one or another of those styles may be right. The best teachers will know their own strengths and limitations, will know which online tutors complement their strengths in a variety of ways, and will connect their students with those tutors.

Sal Khan is on fire. He burns with a passion to share his intuitions with anyone and everyone. It is a beautiful thing to see. He has abandoned a lucrative career in finance to do this full time, and I am quite sure he will find a way to keep doing it.


PS: The title of this piece refers to Richard Ankrom’s Los Angeles freeway project. At a busy intersection, millions of motorists have been directed to North 5 by a sign that Caltrans omitted. Ankrom created and installed that missing sign.

PPS: I wrote to my son’s math teacher about Sal Khan. She replied: “Thanks for that link to the Khan Academy. I was overwhelmed by how many video lessons he has! He does seem like an inspiring man. Unfortunately, You Tube is blocked here at the high school.”

OData, the Open Data Protocol, is described at odata.org:

The Open Data Protocol (OData) is a web protocol for querying and updating data. OData applies web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores.

The other day, Pablo Castro wrote an excellent post explaining how developers can implement aspects of the modular OData spec, and outlining some benefits that accrue from each. One of the aspects is query, and Pablo gives this example:

http://ogdi.cloudapp.net/v1/dc/BankLocations?$filter=zipcode eq 20007

One benefit for exposing query to developers, Pablo says, is:

Developers using the Data Services client for .NET would be able to use LINQ against your service, at least for the operators that map to the query options you implemented.

I’d like to suggest that there’s a huge benefit for users as well. Consider Pablo’s example, based on some Washington, DC datasets published using the Open Government Data Initiative toolkit. Let’s look at one of those datasets, BankLocations, through the lens of Excel 2010’s PowerPivot.

PowerPivot adds heavy-duty business analytics to Excel in ways I’m not really qualified to discuss, but for my purposes here that’s beside the point. I’m just using it to show what it can be like, from a user’s perspective, to point an OData-aware client, which could be any desktop or web application, at an OData source, which could be provided by any backend service.

In this case, I pointed PowerPivot at the following URL:

http://ogdi.cloudapp.net/v1/dc/BankLocations

I previewed the Atom feed, selected a subset of the columns, and imported them into a pivot table. I used slicers to help visualize the zipcodes associated with each bank. And I wound up with a view which reports that there are three branches of WashingtonFirst Bank in DC, at three addresses, in two zipcodes.

If I were to name this worksheet, I’d call it WashingtonFirst Bank branches in DC. But it has another kind of name, one that’s independent of the user who makes such a view, and of the application used to make it. Here is that other name:

http://ogdi.cloudapp.net/v1/dc/BankLocations?$filter=name eq 'WashingtonFirst Bank'

If you and I want to have a conversation about banks in Washington, DC, and if we agree that this dataset is an authoritative list of them, then we (and anyone else who cares about this stuff) can converse using a language in which phrases like ‘WashingtonFirst Bank branches in DC’ or ‘banks in zipcode 20007’ are well defined.
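Those phrases map onto URLs mechanically. Here is a rough sketch of that mapping using only Python's standard urllib.parse. The odata_filter_url helper is hypothetical, it handles only a single "eq" comparison, and the quote-doubling rule for string literals is my understanding of the OData conventions rather than something shown in Pablo's post.

```python
from urllib.parse import quote

BASE = "http://ogdi.cloudapp.net/v1/dc/BankLocations"


def odata_filter_url(base, field, value):
    """Build a URL that names the subset of records where field equals value."""
    if isinstance(value, str):
        # String literals go in single quotes; an embedded quote is doubled
        # (an assumption about the OData literal syntax).
        literal = "'" + value.replace("'", "''") + "'"
    else:
        literal = str(value)
    expression = f"{field} eq {literal}"
    # Percent-encode spaces and other unsafe characters, but keep the quotes.
    return f"{base}?$filter={quote(expression, safe=chr(39))}"


by_zip = odata_filter_url(BASE, "zipcode", 20007)
by_name = odata_filter_url(BASE, "name", "WashingtonFirst Bank")
```

The point of the sketch is that the name of a view can be computed by anyone, in any language, from nothing more than a field and a value.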

If we incorporate this kind of fully articulated web namespace into public online discourse, then others can engage with it too. Suppose, to take just one small example, I find what I think is an error in the dataset. Maybe I think one of the branch addresses is wrong. Or maybe I want to associate some extra information with the address. Today, the way things usually work, I’d visit the source website and look for some kind of feedback mechanism. If there is one, and if I’m willing to provide my feedback in a form it will accept, and if my feedback is accepted, then my effort to engage with that dataset will be successful. But that’s a lot of ifs.

When public datasets provide fully articulated web namespaces, though, things can happen in a more loosely coupled way. I can post my feedback anywhere — for example, right here on this blog. If I have something to say about the WashingtonFirst branch at 1500 K Street, NW, I can refer to it using a URL: 1500 K Street, NW.

That URL is, in effect, a trackback that points to one record in the dataset.1 The service that hosts the dataset could scan the web for these inbound links and, if desired, reflect them back to its users. Or any other service could do the same. Discourse about the dataset can grow online in a decentralized way. The publisher need not explicitly support, maintain, or be liable for that discourse. But it can be discovered and aggregated by any interested party.
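The scanning step could be quite simple. As a sketch of the idea only: given the HTML of some page found on the web, collect every link whose target falls inside the dataset's namespace. This uses Python's standard html.parser; the class and function names are hypothetical, and a real service would also need crawling and deduplication, which are left out here.

```python
from html.parser import HTMLParser

DATASET_PREFIX = "http://ogdi.cloudapp.net/v1/dc/BankLocations"


class InboundLinkFinder(HTMLParser):
    """Collect href values that point into a given URL namespace."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(self.prefix):
                self.links.append(value)


def inbound_links(html, prefix=DATASET_PREFIX):
    """Return the links in `html` that refer to records in the dataset."""
    finder = InboundLinkFinder(prefix)
    finder.feed(html)
    return finder.links
```

Each link returned is both a pointer to the commentary (the page it was found on) and a pointer to the records being discussed (the URL itself).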

The open data movement, in government and elsewhere, aims to help people engage with and participate in processes represented by the data. When you publish data in a fully articulated way, you build a framework for engagement, a trellis for participation. This is a huge opportunity, and it’s what most excites me about OData.


1 PowerPivot doesn’t currently expose that URL, but it could, and so could any other OData-aware application.

I’m listening to the audio version of a very cool talk given by astronaut-turned-artist Alan Bean. (Skip the hokey intro, though, and jump in at minute 7 when he starts.)

He tells great stories about the space program, but also offers wider perspectives on life, art, and human potential.

Along the way, he tells an amusing anecdote about the famous picture of Neil Armstrong planting an American flag onto the moon’s surface. Armstrong told Bean it had been a scary moment, and Bean asked why. Armstrong said (as paraphrased by Bean):

Well, I couldn’t get that flag into the ground, like in training. Up there, those particles in the dirt aren’t rounded like regular sand. On Earth I would just do like that, and it would go in. But up there I did like that and it didn’t go in.

I imagined that when I let go, it would fall into the dirt, and people all over the world would see the American flag fall into the dirt. So I tipped it back until the center of gravity was over the hole. Then I put a little dirt around it. I knew that if I could get it balanced, and get away from it, that without any wind it would stay balanced. So that’s what we did. We got away from it, and we never got close to it again.

Bean adds: “It probably blew over when they launched, but it didn’t make any difference. That’s an engineer’s solution!”

What a great hack!

Today on a conference call I was reminded of another. A few years ago, in an airport, I saw a guy with a cellphone in one hand and a payphone in the other. His ear, brain, and mouth were trying to bridge two phone networks together, it wasn’t working well, and he was visibly frustrated. Finally he removed his head from between the two phones, stuck them together, and reversed them earphone-to-microphone, so the two parties were talking directly to each other.

My conference call today presented a different version of that scenario. It was scheduled as a VOIP call, then was switched to a POTS call, but not everybody got the memo. So I made the POTS call. And since I have a podcast rig that lets me do POTS calls through my computer, using the same headset I use for VOIP, I made the call that way.

Then people started to show up on both the POTS side and the VOIP side. I realized that, unexpectedly, I was hearing both sides and they were hearing me. Both were being conveyed through my computer’s audio subsystem. I was just like the guy with the cellphone on one ear and the payphone on the other.

It would have been cool to do the same kind of earphone-to-microphone hack. But before I got the chance to try, the VOIP folks hung up and dialed back in on the POTS side.

Oh well, maybe next time.

If you’re interested in the use of computers and networks to support collaboration, you’ll have heard of PLATO. It was an early courseware system, and by early I mean circa 1960, running on vacuum tubes. But it was also a petri dish in which much of what we now know as online culture first evolved.

I’ve long known that PLATO inspired many other systems, including VAX Notes and Lotus Notes. But I never heard the backstory. So when I found out that Brian Dear is completing a history of PLATO, and planning a conference to commemorate its 50th anniversary, I invited him onto my weekly show to find out more about it. PLATO matters, Brian says, because

it challenges our assumptions of how the online world evolved. It rewrites the history. It’s as if we discovered Wilbur and Orville Wright were not the first to fly a powered plane — that it’d been done faster and longer with a jet aircraft 30 years earlier.

Of course the same can be said of other early technologies, notably Smalltalk, which introduced ideas and methods that are only now hitting the mainstream. It’s fun to wax nostalgic, but I’d rather explore how these systems arose, why they flourished, and what accounts for the propagation of their memes but not their genes.

From that perspective Brian reminds us, first, that PLATO was expensive. Few universities were willing or able to invest millions in a Control Data mainframe and a fleet of gas-plasma flat-panel bitmapped touch-screen display terminals. Those terminals enabled some extraordinary things, like the interactive music software that captivated Brian as a University of Delaware undergrad. They also enabled a now-extinct species of emoticons, which relied on the bitmapped graphics. But since much of what became PLATO’s essential DNA required only character-mapped graphics, those expensive bitmapped screens became an evolutionary bottleneck.

Another feature that didn’t pass through that bottleneck was PLATO’s ability to make sense of natural language input. Many thousands of programmer hours were invested in enabling PLATO to recognize a variety of human utterances. That in turn enabled courseware authors to create lessons that responded intelligently — and, Brian says, in ways that are sadly still not typical of modern courseware.

Today we can attack that problem by creating open source libraries, by reusing them, and by extending them. That’s a great way to create DNA that can propagate. But it’s useful to consider why it might not. We still, for the most part, create dependencies on specific programming languages, and on the environments in which they run.

As we move into an era of services, though, we can start to imagine a more fluid environment in which capabilities persist across language and system boundaries. Consider this exhibit from an antique PLATO library:

This is a screenshot from the live PLATO system running (in emulation) at cyber1.org. It’s a page from the catalog of functions in PLATO’s CYBIS library. Shown here are some of the methods available to process responses to questions.

Some of those methods might still be useful. And if they’d been packaged in a language- and system-independent way, some might conceivably still be in use.

PLATO programmers didn’t have the option to package their work in such a way. Now we’re on the cusp of an era in which these kinds of library services can also be language- and system-independent web services. Will we exploit this new possibility? Will some of today’s core services still be delivering value decades from now, freeing developers to add value farther up the stack? It’s worth pondering.

A while back I reviewed the reading machine that my mom, who suffers from macular degeneration, now depends on. I gave it a thumbs up, but also noted that she was having some problems.

On my last visit I came up with a method that will help, if she can get the hang of it. The method is non-obvious, and isn’t documented anywhere I’ve been able to find, so I made a short movie to illustrate it.

The key insights are:

Use the left margin screw to set a left margin somewhere

It almost doesn’t matter where; you just need a guide for carriage returns.

Position the book and the tray

Getting this right makes a huge difference. My mom was constantly fiddling with the position of the book on the tray. This frustrated her, and seriously impaired her ability to read fluidly.

But if you position the tray correctly, and the book relative to the tray, then you can easily read the whole page without touching or moving the book at all. Here’s how:

Align the bottom left corner of the book with the bottom left corner of the screen.

This is counter-intuitive. The natural expectation is to start at the top of the page. And you do want to start reading there. But I found that establishing a bottom margin is a crucial first maneuver, and it involves three steps:

1. Push tray all the way forward and rightward

2. Place book on tray

3. Move book to align bottom left corner of page with bottom left corner of screen

With the tray still as far forward and as far right as it will go, you have defined both a left margin and a bottom margin for the page. Now read the whole page without touching the book again. Here’s how:

Find the top of the page.

To do that you pull the tray out (forward, towards yourself) until the top margin of the page lines up with the top of the screen.

Read as many lines vertically as the screen can display.

Use only a two-stroke left/right motion of the tray. The sequence is:

1. Slide tray left to reveal ends of lines

2. Slide tray right for carriage return

My mom had been advancing the tray (by pushing it in) once per line. This wastes effort and disrupts context. If the left margin screw is set, a carriage return always goes to the same place. So it was easy — at least for her — to make a visual connection from the end of the previous line to the beginning of the next one.

I realize this part may not work for everyone, and maybe not even for her as her vision worsens. Right now, at her magnification, her screen can display 8 or 10 lines. At higher magnification, when only a few are visible, there will be less context to help make that connection. Then it may become necessary to scroll vertically once per line. But the longer that can be avoided, the better.

Why was this necessary?

Shouldn’t multi-thousand-dollar gizmos like this come with training materials that help people figure this stuff out? Yes, but I’ve given up being shocked that they don’t.

If you’ve got a friend or relative in the same boat, let me know if this writeup — and/or the accompanying video — makes sense.

A note on making the movie

The video combines slides with a side-by-side animation of the tray and the screen. I wound up using PowerPoint, which conveniently handles the three ingredients: text, bitmap graphics, and vector graphics.

Rather than use PowerPoint’s animation features, though, I made a sequence of frames, nudging objects by small increments from frame to frame. This turned out to be a surprisingly easy and approachable technique.

Then I turned on a screen recorder — I used Camtasia, but it could have been any other — and stepped through the frames.

On this week’s podcast, Greg Wilson tells the story of a university course he created, and has taught for many years, called Software Carpentry. I have known Greg for a long time. We are kindred spirits in several ways. Most notably, we like to mine veins of knowledge, experience, and technique that some practitioners take for granted, but that many others haven’t yet discovered — or don’t yet use as well as they could.

I, for example, wonder why we don’t teach everyone basic principles of structured information, namespace design, and syndication. Greg, similarly, wonders why student programmers — and student scientists whose careers increasingly depend on computational methods — are not taught basic principles of version control, debugging, and refactoring. And why we don’t read great software in the same way we read great literature or study landmark scientific experiments. And why the controlled reproducibility of commercial software development isn’t typical of computational science.

If you care about these issues, there are two ways you can help. First, take a look at the reboot of the Software Carpentry course that Greg’s experience has led him to propose. Second, help him find the funding to keep doing this work.

On FiveThirtyEight.com the other day, Andrew Gelman posted this chart illustrating the high cost of US health care:

He did so to correct a “somewhat misleading (in my opinion) presentation of these numbers [that] has been floating around on the web recently.” The misleading graph, which appeared on a National Geographic blog, was — I agree — a confusing way to show information better represented in a scatterplot.

But I’ve seen this data before, and there’s more to the story. Neither the National Geographic nor FiveThirtyEight has anything to say about which numbers they’re charting.

Back in 2005, in a review of John Abramson’s excellent book Overdo$ed America, I noted that he had used a different source to reach a slightly different conclusion.

His chart, based on OECD health-expenditure data (link now 404) and WHO healthy life expectancy data (link still alive), looked like this:

He used it to make the oft-cited point that US healthcare isn’t just wildly expensive, but that it also correlates with worse life expectancy than in many countries that spend less.

I wondered what the chart would look like if based on the same OECD expenditure data but on the OECD’s rather than the WHO’s definition of life expectancy. The result looked like this:

The U.S. is the clear cost outlier on both charts. The first chart, however, places us near the low end of the life expectancy range, justifying Abramson’s assertion that we combine “poor health and high costs.” The second chart places us near the high end of the life expectancy range, suggesting that while value still isn’t proportional to cost, we’re at least buying more value than the first chart indicates.

Although based on older data, this second chart closely resembles the ones recently shown and discussed by the National Geographic and FiveThirtyEight.

My review of Abramson’s book concluded:

Has Abramson spun the data to make his point, just as he accuses the pharmaceutical industry of doing? Of course. Everybody spins the data. What matters is that:

  • Everybody can access the source data, as we can in the case of Abramson’s book but cannot (he argues) in the case of much medical research
  • The interpretation used to drive policy expresses the values shared by the citizenry

Would we generally agree that we should measure the value of our health care in terms of healthy life expectancy, not raw life expectancy? That the WHO’s way of assessing healthy life expectancy is valid? These are kinds of questions that citizens have not been able to address easily or effectively. Pushing the data and surrounding discussion into the blogosphere is the best way — arguably the only way — to change that.

That was five years ago. The data was, and is, out there. So it’s disheartening to see the same chart pop up again without any further discussion of the sources of its data, or of the definitions underlying those sources.

On this week’s Innovators show, Doug Day joins me to discuss the new iCalendar validator he has recently deployed on Azure.

The project draws inspiration from the pathbreaking RSS/Atom feed validator originally created by Mark Pilgrim and Sam Ruby. The RSS/Atom validator’s test-driven and advice-oriented approach is exemplary, and the iCalendar validator follows in its footsteps.

The tests, in this case, are iCalendar snippets that are, or are not, valid according to the spec. These snippets, packaged into XML files, form a library of examples that does not depend on the programming language used to run the tests. So although Doug’s validator, based on his open source parser, is written in C#, another validator written in Java or Python or Ruby could use the same test suite.
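To make the idea of a language-neutral test suite concrete, here is a minimal Python sketch. The XML layout (`tests`, `test`, `expect`, `ics`) is my own invention for illustration, not the format Doug's suite actually uses, and `toy_validator` is a stand-in that only checks BEGIN/END nesting. The point is just that the runner knows nothing about the validator's implementation language.

```python
import xml.etree.ElementTree as ET

# Hypothetical test-suite layout; the real XML files may be organized differently.
SUITE_XML = """
<tests>
  <test id="minimal-event" expect="valid">
    <ics>BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//example//sketch//EN
BEGIN:VEVENT
UID:1@example.com
DTSTAMP:20100101T000000Z
DTSTART:20100101T090000Z
END:VEVENT
END:VCALENDAR</ics>
  </test>
  <test id="missing-end-vcalendar" expect="invalid">
    <ics>BEGIN:VCALENDAR
VERSION:2.0</ics>
  </test>
</tests>
"""


def toy_validator(ics_text):
    """Stand-in validator: checks only that BEGIN/END lines are balanced."""
    stack = []
    for line in ics_text.strip().splitlines():
        line = line.strip()
        if line.startswith("BEGIN:"):
            stack.append(line[len("BEGIN:"):])
        elif line.startswith("END:"):
            if not stack or stack.pop() != line[len("END:"):]:
                return False
    return not stack


def run_suite(xml_text, validator):
    """Run each snippet through the validator; map test id -> did it behave as expected."""
    results = {}
    for test in ET.fromstring(xml_text.strip()).iter("test"):
        expected_valid = test.get("expect") == "valid"
        results[test.get("id")] = validator(test.findtext("ics")) == expected_valid
    return results


if __name__ == "__main__":
    for test_id, passed in run_suite(SUITE_XML, toy_validator).items():
        print(test_id, "ok" if passed else "FAILED")
```

A validator written in Java or Ruby would need only its own few lines of XML-reading glue to consume the same file.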

The advice offered is minimal so far, but I hope it will expand as the test suite grows. Sam Ruby observes:

Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.

In that spirit, I am gathering examples of calendars in the wild and looking for ways to help Doug add value.

In the podcast we discuss a nice example that came up recently in the curators’ room of the elmcity project. A custom-built calendar contained events (VEVENT components, in iCalendar-speak) with no start or end times (DTSTART and DTEND properties). This, it turns out, is not prohibited by the spec. But reporting no error is unhelpful. The author of the calendar — or of the software that produced the calendar — ought to be warned that such a calendar won’t yield a useful or expected result.
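Here is a rough sketch, in Python, of what such a warning might look like. It is not how Doug's validator works. The function names are hypothetical, the parsing is deliberately naive (it unfolds continuation lines and reads property names, nothing more), and it treats the missing start time as advice rather than as an error.

```python
import re


def unfold(ics_text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def warn_untimed_events(ics_text):
    """Return one advisory message per VEVENT that has no DTSTART property."""
    warnings = []
    in_event = False
    names = set()
    index = 0
    for line in unfold(ics_text):
        marker = line.strip().upper()
        if marker == "BEGIN:VEVENT":
            in_event, names = True, set()
            index += 1
        elif marker == "END:VEVENT" and in_event:
            in_event = False
            if "DTSTART" not in names:
                detail = " or DTEND" if "DTEND" not in names else ""
                warnings.append(
                    f"VEVENT #{index}: no DTSTART{detail}; "
                    "consumers may not be able to place this event on a calendar"
                )
        elif in_event:
            # The property name is everything before the first ';' or ':'.
            names.add(re.split("[;:]", line, maxsplit=1)[0].strip().upper())
    return warnings


if __name__ == "__main__":
    sample = (
        "BEGIN:VCALENDAR\r\n"
        "BEGIN:VEVENT\r\nSUMMARY:Untimed\r\nEND:VEVENT\r\n"
        "BEGIN:VEVENT\r\nSUMMARY:Timed\r\nDTSTART;VALUE=DATE:20100101\r\nEND:VEVENT\r\n"
        "END:VCALENDAR\r\n"
    )
    for message in warn_untimed_events(sample):
        print("warning:", message)
```

The wording of the message matters as much as the check: it should tell the producer what a consumer will fail to do, not just which property is absent.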

Why would anyone produce such a calendar in the first place? This harkens back to the early days of RSS. Many of us found that we could craft simple ad-hoc feeds in order to leverage RSS as a lightweight data exchange. It was liberating to be able to do that. But hand-crafted feeds, or feeds written by hand-crafted software, were valuable only to the extent they would reliably interoperate. Often they would not. The feed validator, by showing what was wrong with these feeds, and explaining why and how to fix them, was a powerful ally for those of us trying to bootstrap a feed ecosystem.

The iCalendar validator has a long way to go yet. But the road ahead is well lit, and I’m grateful to Doug Day for resolving to travel it.

The other day I listened to a Spark (CBC Radio) interview with Larry Lessig about his New Republic essay Against Transparency, which begins:

We are not thinking critically enough about where and when transparency works, and where and when it may lead to confusion, or to worse. And I fear that the inevitable success of this movement–if pursued alone, without any sensitivity to the full complexity of the idea of perfect openness–will inspire not reform, but disgust. The “naked transparency movement,” as I will call it here, is not going to inspire change. It will simply push any faith in our political system over the cliff.

The essay was published in October 2009. In this interview from November, Prof. Lessig reflected on the reactions that it provoked. Although the delicious and bitly feedback now suggests that most people understood the essay to be a thoughtfully nuanced critique, there were evidently some early responders who read it as a retreat from openness and an assault on the Internet.

I’m glad I missed the essay when it first appeared. Reading it along with a cloud of feedback from readers and from the author amplifies one of the key points: We don’t really want naked transparency, we want transparency clothed in context.

The Net can be an engine for context assembly, a wonderful phrase I picked up years ago from Jack Ozzie and echoed in several essays. But it can also be a context destroyer.

In the interview, Lessig notes one example of context destruction. The article, which most people will read online, spans eleven pages, each of which wraps its nugget of “content” in layers of distraction. Some early negative comments, Lessig says, came from people who had clearly not read to the end.

Our increasingly compressed and fragmented attention can also be a context destroyer:

What about when the claims are neither true nor false? Or worse, when the claims actually require more than the 140 characters in a tweet?

This is the problem of attention-span. To understand something–an essay, an argument, a proof of innocence– requires a certain amount of attention. But on many issues, the average, or even rational, amount of attention given to understand many of these correlations, and their defamatory implications, is almost always less than the amount of time required. The result is a systemic misunderstanding–at least if the story is reported in a context, or in a manner, that does not neutralize such misunderstanding. The listing and correlating of data hardly qualifies as such a context. Understanding how and why some stories will be understood, or not understood, provides the key to grasping what is wrong with the tyranny of transparency.

Transparency is a necessary but not a sufficient condition. Recently my town’s crime data and council meetings have appeared online. But this remarkable transparency does not alone enable the sort of collaborative sense-making that we all rightly envision.

In the case of crime data, we require a context that includes historical trends, regional and national comparisons, guidance from government about how its local taxonomy relates to regional and national taxonomies, and reporting by newspapers and citizens.

In the case of city council meetings, we require a context that includes relevant state law and local code, and reporting by stakeholders, by newspapers, and by affected citizens.

To enable context assembly, we’ll need to organize the numeric and narrative data produced by the “naked transparency” movement in ways friendly to linking, aggregation, and discovery.

But these principles will need to be adopted more broadly than by governments alone. Everyone needs to understand the principles of linking, aggregation, and discovery, so that everyone can help create the context we crave.