Ever since I saw tomatoes growing in a greenhouse that had a suspension system to hoist them up, I’ve wanted to do something like that. I’ve also been wanting to make a structure using Starplate connectors. This year the two ideas came together to create a tomato suspension dome.

The structure

The kit

The Starplate kit is just 11 metal plates that accept 2-by-3s or 2-by-4s on edge, like so:

I used 8-foot 2-by-3s. Around the edge of the pentagonal base I planted peas, pole beans, and morning glories. Inside, it was all tomatoes and basil. Although I used indeterminate vines, they didn’t reach as high as I’d imagined. So I never had to climb a ladder to pick tomatoes.

The big question in my mind was how to hoist the tomatoes. I ended up putting eyehooks into the upper struts, spaced about 18″ apart, and running string through them to form concentric pentagons descending from the peak. Then I could toss the weighted end of a string up and over to make a pulley anywhere in the enclosure.

Suspension

Here’s the suspension method:

It entails:

  1. Wrapping a loop of tomato velcro around the vine
  2. Tying one end of string to the loop
  3. Running the other end up over a skyhook, down through the loop, and back up six inches or so
  4. Hoisting the vine
  5. Tying the end into a slipknot around the pair of strings

Every couple of weeks, as the vines grew, I’d detach the collar, raise it up, reattach, and hoist.

Outcomes

The peas and beans did OK, but were happier in other parts of the garden. The tomatoes rocked. I’m not ambitious enough to do any real canning, but here’s one happy outcome: 6 quarts of fresh salsa and a couple of gallons of juice infused with jalapenos, serranos, and poblanos.

Another outcome: oven-dried tomatoes. These are just like sun-dried except they only take 12 hours in the oven at 200 instead of days in the sun.

The salsa was a ton of work, but oven drying is dead easy. I’ve got a lot more tomatoes still to come, and this is the future for many of them.

Next year

Things to do differently:

  1. Start the morning glories sooner. When the peas and beans didn’t cooperate, I wanted another use for all the height I’d created, but the morning glories got a late start.
  2. Abandon netting. Part of the problem with the peas and beans was that I hung netting for them to climb. Bad idea. Next time, I’ll just dangle a bunch of strings.
  3. In late winter, dump in manure to generate heat and enclose with plastic to create a greenhouse.

Is this really practical?

Probably not. If you’ve ever been bitten by the dome bug, it’s just something you have to get out of your system sooner or later. Domes are preposterous structures, really, as Stewart Brand pointed out hilariously in How Buildings Learn. There’s a reason why we build rectangularly: You can use standard materials, you can expand outward, you can use interior space efficiently. Domes create big structures from small amounts of material, but they’re not very practical structures. There are surely easier ways to hoist tomatoes. Still, it’s been fun!

I had a hunch that if I grew sunflowers in a fenced enclosure inside the chicken run they’d get big, since that’s the most fertile part of my backyard. Tonight I measured the tallest at 10 feet, 8 inches (3.25 meters). It’s stout, too; I feel like I could almost climb it. Impressive!

Yeah, but how impressive? And, even more interesting to me, how can we find data to help answer the question? Perhaps with a sequence of searches like so:

“1-foot sunflower”

“2-foot sunflower”

…etc…

“26-foot sunflower”

“27-foot sunflower”

These are parallel searches of Google and Bing for “[1..27]-foot sunflower”. Here are the resulting counts, with Bing scaled up by a factor of 100 to make the trends comparable:

So, maybe my near-11-footer isn’t so special after all. This method of finding out is interesting, though. It seems incredibly naive. If you try those queries you’ll find all sorts of stuff that isn’t relevant to what I mean by an n-foot sunflower. But if the amount of irrelevance is constant across the range, it factors out, right? And the two independent search engines make this a controlled experiment.

I wonder how well this proxy for sunflower height distribution correlates with the actual distribution. Of course there are a million other questions you could try to answer this way. It’d be easy to make a web app to automate this method. I lazily hope somebody already has, or will, so I don’t have to.
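Here’s a minimal sketch of what the automation might look like. It only builds the quoted phrase queries and hands each one to a counting function. The names height_queries, collect_counts, and count_hits are hypothetical, and count_hits is a stand-in that the caller would supply to wrap whichever search service is available; no real search API is assumed here.

```python
from typing import Callable, Dict, List


def height_queries(noun: str, low: int, high: int) -> List[str]:
    """Build one quoted phrase query per height, e.g. '"10-foot sunflower"'."""
    return [f'"{n}-foot {noun}"' for n in range(low, high + 1)]


def collect_counts(queries: List[str],
                   count_hits: Callable[[str], int]) -> Dict[str, int]:
    """Run every query through a caller-supplied hit counter.

    count_hits is hypothetical: it should take a query string and return
    the number of results reported by some search engine.
    """
    return {query: count_hits(query) for query in queries}


# The 27 queries described above.
queries = height_queries("sunflower", 1, 27)

# A fake counter, used only to show the shape of the result.
fake_counts = collect_counts(queries, lambda query: len(query))
```

Running the same query list through two different counters would give the two series to compare.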


PS: My sunflowers are actually a second crop. The first one had a crazy head start, because we had freaky warm weather in February. But then in early April, when they were already 3 feet high, the chickens broke into the enclosure and demolished them. What lofty heights could my sunflowers have reached this summer? We’ll never know.


PPS: Here’s the data:

1,2,0
2,994,10
3,8,4
4,10,4
5,9,4
6,3270,37
7,74,11
8,135,12
9,176,11
10,1690,39
11,75,9
12,472,37
13,82,12
14,220,8
15,54,9
16,9,4
17,2,1
18,55,4
19,6,2
20,119,8
21,0,0
22,2,0
23,0,0
24,8,3
25,891,2
26,3,2
27,0,0
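
The columns aren’t labeled, so the following sketch assumes they are height in feet, Google count, and unscaled Bing count, in that order. Under that assumption it parses the rows, applies the 100x Bing scaling mentioned earlier, and ranks the heights by count. The function name top_heights is just for illustration.

```python
import csv
import io

# The rows listed above, copied verbatim.
RAW = """1,2,0
2,994,10
3,8,4
4,10,4
5,9,4
6,3270,37
7,74,11
8,135,12
9,176,11
10,1690,39
11,75,9
12,472,37
13,82,12
14,220,8
15,54,9
16,9,4
17,2,1
18,55,4
19,6,2
20,119,8
21,0,0
22,2,0
23,0,0
24,8,3
25,891,2
26,3,2
27,0,0"""

# Assumed column order: feet, Google count, Bing count.
rows = [tuple(int(x) for x in rec) for rec in csv.reader(io.StringIO(RAW))]

BING_SCALE = 100  # the scaling factor used for the comparison chart
scaled = [(feet, google, bing * BING_SCALE) for feet, google, bing in rows]


def top_heights(table, column, k=3):
    """Return the k heights with the largest counts in the given column."""
    ranked = sorted(table, key=lambda r: r[column], reverse=True)
    return [r[0] for r in ranked[:k]]


print(top_heights(rows, 1))  # ranked by the second column
print(top_heights(rows, 2))  # ranked by the third column
```

If the column guess is right, both series peak at 6 and 10 feet, which is consistent with an 11-footer not being so special.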

Last week Kevin Curry dug into some data about school violence in his district. In this case the data was made available as HTML, which means it was sort-of-but-not-really published on the web. Kevin writes:

Whenever I come across data like this the first thing I want to know is whether or not it can actually be used as data. In order to be used/usable as data the contents of this HTML table need to be, at minimum, copy-and-paste-able into a spreadsheet.

Or, alternatively, the HTML table needs to be parseable as data. In this case, I was surprised to find that a couple of tools I normally use to do that parsing — Dabble DB and Excel — didn’t work. That’s because Kevin’s target page doesn’t include a static HTML table. It’s dynamic instead: First you select a district, then the table appears. This mechanism defeats tools that try to parse data from HTML tables, so it’s a bad way to publish data that you want to be available as data.
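
To make “parseable as data” concrete, here’s a minimal sketch that turns a static HTML table into rows using only Python’s standard library. The class name TableRows and the sample markup are invented for illustration; they are not Kevin’s data. A table that only appears after a selection, like the one on his target page, would never reach a parser like this, which is exactly the problem.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder markup, not real district data.
SAMPLE = (
    "<table>"
    "<tr><th>School</th><th>Incidents</th></tr>"
    "<tr><td>Example High</td><td>12</td></tr>"
    "</table>"
)

parser = TableRows()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)
```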

Lacking the option to parse the HTML table, Kevin’s only choice was to copy and paste. That’s clumsy, and you have to be really motivated to do it, but it can be done. Here’s the Google spreadsheet Kevin made from the data he copied and pasted. And here’s the same stuff as an Excel Web App.

If you haven’t tried out the new Excel Web App, by the way, it’s interesting to compare the two. One key difference, at least from my point of view, is — not surprisingly — the Excel Web App’s ability to roundtrip with Excel. A Google spreadsheet is, at this point, more functional in standalone mode. While you can edit both a Google spreadsheet and an Excel Web App in the browser, for example, the Google spreadsheet can insert and modify charts, whereas the Excel Web App only edits data.

Of course if you have Excel you’d rather use it to insert and modify charts. It’s a lot more capable than any browser app is likely to be anytime soon. So it’s pretty sweet to be able to open the cloud-based Excel spreadsheet, edit locally, and then save to the web. A related limitation of the Google spreadsheet is that you lose charts when you download to, or upload from, Excel.

Another key difference: The Excel Web App currently lacks an API like the one Google provides. I really hope that the Excel Web App will grow an OData interface. In this comment at social.answers.microsoft.com, Christopher Webb cogently explains why that matters:

The big advantage of doing this [OData] would be that, when you published data to the Excel Web App, you’d be creating a resource that was simultaneously human-readable and machine-readable. Consider something like the Guardian Data Store (http://www.guardian.co.uk/data-store): their first priority is to publish data in an easily browsable form for the vast majority of people who are casual readers and just want to look at the data on their browsers, but they also need to publish it in a format from which the data can be retrieved and manipulated by data analysts. Publishing data as html tables serves the first community but not the second; publishing data in something like SQL Azure would serve the second community and not the first, and would be too technically difficult for many people who wanted to publish data in the first place.

The Guardian are using Google docs at the moment, but simply exporting the entire spreadsheet to Excel is only a first step to getting the data into a useful format for data analysts and writing code that goes against the Google docs API is a hassle. That’s why I like the idea of exposing tables/ranges through OData so much: it gives you access to the data in a standard, machine-readable form with minimal coding required, even while it remains in the spreadsheet (which is essentially a human-readable format). You’d open your browser, navigate to your spreadsheet, click on your table and you’d very quickly have the data downloaded into PowerPivot or any other OData-friendly tool.

Some newspapers may be capable of managing all of their data in SQL databases, and publishing from there to the web. For them, an OData interface to the database would be all that’s needed to make the same data uniformly machine-readable. But for most newspapers — including even the well funded and technically adept Guardian — the path of least resistance runs through spreadsheets. In those cases, it’ll be crucial to have online spreadsheets that are easy for both humans and machines to read.

Last week Scott Hanselman summed up the principle of keystroke conservation like so:

There are a finite number of keystrokes left in your hands before you die. Next time someone emails you, ask yourself “Is emailing this person back the best use of my remaining keystrokes?”

Several of the comments on Scott’s post focused on the notion that keyboards will one day be obsolete, and that speech recognition will break the typing bottleneck. But that’s not the real bottleneck. The keystroke conservation principle is just one way of getting at the notion of scalable communication powered by network effects.

One of my favorite stories comes from Larry Moore, who was a Lotus executive. To illustrate why people didn’t “get” Lotus Notes, he used to talk about the early days of the telephone business, when there were roadshows to introduce people to the concept of telephony. Demonstrators would set up two phones on either end of a stage, with a wire strung between, and talk to each other. But it made no sense to the audiences. Obviously those people could already hear each other! Who needed the wire?

It’s the same thing with the principle of keystroke conservation. If I talk to one person, or a few people, faster than I can type messages to one or a few, I can communicate more, but not orders of magnitude more, and not in ways that fully exploit the power of the network.

Forget keystrokes for a moment and look at how Sal Khan is rewiring math and science education. He started out doing one-on-one tutoring with his cousin Nadia. It’s clearly ridiculous to say that his ability to scale that effort is constrained by the rate at which he can talk. On his instructional videos he talks no faster than normal. But he has strategically placed those videos in a pub/sub network where they can be discovered, subscribed to, shared, and reused. There are nearly 60,000 subscribers to his YouTube channel. That’s scalable communication.

The problem with examples like this one, of course, is that most of us aren’t rock-star performers like Sal Khan. If we push all the communication that we can into open networks, we’re not going to boost our reach by five orders of magnitude. Maybe only two. Maybe even just one. But that’s significant! You’ll never type a message 10x faster, or speak it 10x faster. But you can easily reach 10x more people by adopting communication habits that make it more likely that your message will be discovered, shared, and reused.

Face-to-face discussion, phone calls, email, and text messages are narrowcasting modes that don’t scale in this way. Blogs, Twitter, Facebook, wikis, and audio or video podcasts are broadcasting modes that do. How do we use both together in the right ways for given situations? It’s subtle. One commenter on Scott’s post writes:

My emails very rarely contain anything to blog about or update a wiki with.

What amount of email do you think is actually appropriate to becoming a blog entry in your life or in a less technical person’s life?

For what it’s worth, I think in terms of an inventory of reusable parts and the DRY (don’t repeat yourself) principle. For example, I’m often asked about how to publish iCalendar feeds from popular calendar apps. So I’ve written up a series of how-to blog posts. And I’ve encapsulated that series into a query: http://delicious.com/judell/icalpub+howto. None of those posts would have been email messages. But there are many email messages in my outbox that contain links to the series. Because the link is a query, it yields fresh results for anyone who has ever received the link in email as well as for anyone who ever will. The same posts are also quite often found directly by way of search.

Counting keystrokes is just one way to think about the underlying pattern. It’s not about typing versus talking. It’s about choosing the mix of modes that will best repay the effort you invest in communication.

Wakened this morning, about three o’clock, by Mr. Griffin with a letter from Sir W. Coventry to W. Pen

So begins today’s installment of The Diary of Samuel Pepys, as rendered by Phil Gyford. It’s a remarkable project that maps January 1, 1660 (the start of Pepys’ famous diary) to January 1, 2003 (the start of Phil’s Movable Type recreation of the diary) and has continued faithfully ever since.

The Pepys blog is enhanced in all sorts of useful ways. People, places, and topics are cross-linked with indexes, places are mapped, all references are viewable on a timeline — it’s a brilliant example of advanced blog customization.

Back in 2003 I mused about what kind of content management system would enable somebody to do a project like this without a lot of inspired hacking. The question came up again recently when my sister Ruth decided to recreate an archive of letters that my parents wrote home from our 15-month stay in New Delhi during 1961 and 1962.

I’ve long held that blog publishing systems are really lightweight content management systems that can be used for almost any purpose. So I pointed her to WordPress.com, explained that you can use pages instead of posts to arrange items however you like, and waited to see what would happen.

Well, it didn’t work. It’s true that you can build an arbitrary collection of pages, but there’s no way Ruth would be able to manage that collection without automation. I could write code to help her, but I don’t want to. That’s partly laziness, and partly curiosity about how to use the standard kit to achieve the desired effects.

One of the biggest limitations of pages, in WordPress, is something I’d never noticed until now: No tags! So ended my plan to have Ruth use tags on pages to achieve a lightweight version of Phil Gyford’s indexes.

Why not just use posts? Originally I thought it would be cool to mimic the Pepys diary: start with a date in 1961, and continue in “real” time. But Ruth doesn’t want to do it that way. She wants to be able to process the archive in any order that’s convenient. And she wants it to read forward, like a book of letters, not backward like a blog. These perfectly reasonable requirements turn out to be harder to satisfy than you’d think.

It turns out that you can make the letters run forward on the Posts page by manipulating the publication dates. So here was the scheme I tried first:

July 2 1961 -> Jan 01 1961 15:01
July 4 1961 -> Jan 01 1961 15:00
...
Oct 19 1961  -> Jan 01 1961 04:01
Oct 22 1961  -> Jan 01 1961 04:00

In this scheme, every letter maps to the same day, chosen arbitrarily as Jan 1, 1961. Every month maps to an hour of that day, each letter maps to a minute within that hour, and the times run backward. Since WordPress reverses the sequence again when displaying items on the Posts page, that makes time run forward in that view.
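
Here’s a minimal sketch of that backward mapping. The function name backward_sort_keys and the top_hour parameter are hypothetical, and the sketch assumes each successive month that has letters simply gets the next lower hour, which may not match the exact hour numbers shown above. The point is only that sorting the synthetic timestamps newest-first, as a blog does, yields the letters oldest-first.

```python
from datetime import date, datetime
from itertools import groupby


def backward_sort_keys(letter_dates, top_hour=15):
    """Map real letter dates to synthetic timestamps on Jan 1, 1961.

    Each month with letters gets one hour, counting down from top_hour.
    Within a month, each letter gets one minute, also counting down, so
    the earliest letter receives the latest synthetic time.
    """
    ordered = sorted(letter_dates)
    keys = {}
    months = groupby(ordered, key=lambda d: (d.year, d.month))
    for month_index, (_, group) in enumerate(months):
        letters = list(group)
        hour = top_hour - month_index
        for position, letter in enumerate(letters):
            minute = len(letters) - 1 - position
            # datetime raises ValueError if hour or minute goes out of range.
            keys[letter] = datetime(1961, 1, 1, hour, minute)
    return keys


letters = [date(1961, 7, 4), date(1961, 8, 10), date(1961, 7, 2)]
keys = backward_sort_keys(letters)

# A reverse-chronological sort on the synthetic keys reads forward in real time.
reading_order = sorted(letters, key=keys.get, reverse=True)
print(reading_order)
```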

The benefits are huge. Now Ruth can use tags to organize sets of letters, imposing as much or as little structure as she wants. Views by tag are neatly presented as sets of blurbs with “Continue reading” links. Each item automatically links to its predecessor and successor.

But there’s irreducible weirdness too. For example, the Jan 01 1961 date — which has now become an abstract database key used only for sorting — is part of every post URL. You wind up with patterns like this:

/1961/01/01/june-30-1961-from-anita/

This gets even weirder because dates prior to the start of Unix time — Jan 1, 1970 — don’t display in the management UI. However that turns out to be both a feature and a bug. It’s a feature because WordPress reverts to the current date for display, so you see “Posted on June 28, 2010 by Ruth” instead of “Posted on January 1, 1961 by Ruth.” And it’s a bug because you can’t easily scan and adjust the dates that control sorting.

More weirdness arises from the deeply hardwired assumption — in WordPress, but also in all blogs, really — that entries post in reverse chronological order. Although the backwards time mapping seemed at first glance to work, it turned out to be broken in two ways. On the Posts page, after the break, the link pointed to “Older entries” which were really, in our scheme, “Newer entries.” And within posts, the next and previous logic was also reversed.

So for now I’ve gone back to a forward mapping of hours and minutes within Jan 1, 1961. I’ve ditched the default Posts page in favor of a hand-crafted page that presents items in ascending order. Once you’re in an item, the next and previous links work as expected because, when you move from item to item, WordPress uses a forward arrow of time.

I’m not complaining. It’s astonishing that WordPress provides a free service that Ruth can use to publish this archive of letters, and I’m hugely grateful. I think we’ll be able to come up with a technique that will satisfy her requirements without demanding heroic effort from her or custom software from me. But it sure is interesting to see what happens when you mess with a blog’s notion of the direction of time.

In Defensive surveillance for cyclists I made a LazyWeb request for a helmet-mounted camera that can strongly identify passing vehicles. John Faughnan isn’t in a position to satisfy that request, so he did the next best thing: he refined the specification.

That makes at least two customers. How many more, I wonder? If only there were a way to make the demand visible.

Yesterday the Flickr blog announced that the Keene Public Library has joined the Flickr commons. I’ve been watching the library’s photo stream for a few years now, as its archive of historical photos and postcards has been steadily and carefully uploaded, described, cataloged, and tagged.

Last weekend I went rock climbing with some friends at the Stone Arch Bridge and wondered what it looked like when the trains ran. Here’s a postcard that answers the question.

Postcard of the Cheshire Railroad Bridge (Stone Arch Bridge) in Keene New Hampshire, also called the Keystone Arch Bridge. The bridge had a 90 foot span and was 60 feet high at the center of the arch.

“An enduring example of the excellence which characterized the construction of the Cheshire Railroad. It was designed by Lucian Tilton. The stone came from a quarry on the Thompson farm, within a half-mile of the bridge. The keystone was set Dec.9, 1846. The removal of the original parapet and the substitution of an iron railing has detracted somewhat from its beauty.”

This postcard says “Largest Stone Arch in New England.”

Nice!

I’ve just wrapped up a screencast about the elmcity project. It’ll stand in for me at an upcoming event I can’t attend, and serve as an explanation I can point others to. This is the first screencast I’ve worked on in ages, and also the first in which I appear as a picture-in-picture talking head. The process has been challenging, and I want to write about it while the details are fresh.

Software teleprompters

After writing the script, I realized I’d need a teleprompter in order to read it effectively into the camera. You’d expect to find lots of software prompters floating around on the web, including some free ones, and you’d be right about that. But I had to work through a bunch of them before I found one that worked well for me. I tried CuePrompter, TeleKast, and many others. All failed in some dimension of control: margins, speed, transport. Finally I settled on PromptDog, which is free to try but is the one I’ll buy when I go this route again. It does everything well, but what really put it over the top for me was the way it wires scroll speed to the mouse wheel.

If you’ve read from a software-based teleprompter before, you’ll already know this, but it was new to me. No matter what scroll speed you choose, you need to vary it as you go along. That’s because words and sentences take varying amounts of time to speak, but you need to keep your eyes focused near the top of the screen where the camera sits. With most of the programs I tried, you manage this focal zone by stopping and starting the scroll. But for me, at least, the stops and starts were distracting. PromptDog’s mouse-wheel-driven variable speed control made it much easier to stay in the focal zone. Reading from a software teleprompter is hard, at least for me. I was happy for all the help I could get.

Picture-in-picture video

For this screencast, I upgraded from Camtasia 5 to Camtasia 7. It can record directly from a camcorder, but my second-hand Panasonic PV-GS400 doesn’t seem to work well in that mode. So I recorded to tape, imported the results to a file, and imported that file into Camtasia as a PIP (picture-in-picture) video. On import you tell Camtasia how big your PIP window will be, and where it will show up in the larger video window. I made the PIP window a quarter the size of, vertically centered in, and flush right with the larger 1024×768 video window.

I’d assumed you could move the PIP window around, and grow it or shrink it, to accommodate different kinds of underlying screencast action. But that assumption was wrong. For a given segment of PIP video, the window stays where you put it. This leads to my first feature request for Camtasia 8: a PIP preview rectangle when recording the screen.

Often it’s OK to let the PIP video just overlay the screen action. But sometimes you don’t want it to hide an essential part of the screen. To avoid that, you have to compose the screen around the PIP window. Lacking a visual cue for the PIP window’s borders, I had to guess. Often I guessed wrong, and had to recompose and reshoot a piece of screen action.
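
Lacking that preview, the PIP rectangle can at least be worked out on paper. The sketch below assumes “a quarter the size” means a quarter of the area, so half the width and half the height of the 1024×768 window; if it means a quarter of each dimension, pass scale=0.25 instead. Rect, pip_rect, and overlaps are illustrative names and have nothing to do with Camtasia itself.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int       # left edge, in pixels from the left of the frame
    y: int       # top edge, in pixels from the top of the frame
    width: int
    height: int


def pip_rect(frame_w: int, frame_h: int, scale: float = 0.5) -> Rect:
    """Rectangle that is flush right and vertically centered in the frame."""
    w = int(frame_w * scale)
    h = int(frame_h * scale)
    return Rect(frame_w - w, (frame_h - h) // 2, w, h)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


pip = pip_rect(1024, 768)
print(pip)

# Would a 400x300 region in the top-left corner be hidden by the PIP window?
print(overlaps(Rect(0, 0, 400, 300), pip))
```

Checking a planned region of screen action against the rectangle before recording is a poor substitute for a visual cue, but it beats guessing.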

Note that you can vary the size and location of the PIP window by splitting the PIP video into segments and assigning different sizes and locations to each segment. That’s a lot of work, though. And you don’t really want to split the PIP video into segments because then you can’t manipulate the whole track.

Editing audio, motion video, and screen video all together

I made things hard on myself because I’d forgotten that Camtasia invites you to do more integrated editing than you should. In principle you can, for example, run a noise-reduction pass on your audio in Camtasia. In practice, I would prefer to do that in Adobe Audition, which does the job faster and better. What I should have done is grab the sound track out of the captured motion video, run Audition’s noise reducer, recombine the audio and video, and then import into Camtasia for editing.

Instead I edited everything down in Camtasia, then tried to do an export/process/import pass on the audio. When you export, Camtasia renders the audio based on your edits. Unfortunately it came out a few seconds longer than expected. I think that’s because the differing frame rates for screen video on the one hand, and motion video plus audio on the other, make it hard to keep things in synch. Next time around I’m going to try matching the frame rates to see if that helps.

(In the end I decided it was worth redoing the edit anyway, so I split the AVI file I’d recorded from the camcorder, fixed the audio, imported it back into Camtasia, and redid the relatively few edits I’d made to the PIP video.)

In the past, I’ve done some carefully edited screencasts where things that I say are tightly synched to things happening on screen. (“…when I click on this link, we see that …”) It’s easy to pull that off when you can’t see the speaker, because you can mess with the screen video, or the audio, or both. When you can see the speaker, it’s much harder. Motion video isn’t nearly so forgiving as audio, so you have to do almost all the synch adjustment in the screen video. Or else re-record some or all of the motion video.

To PIP or not to PIP?

Is all this effort worth the trouble? When Scott Hanselman surveyed his readers about screencasts, he asked, among other things, “PIP or no PIP?” More than half agreed with the statement: “Too much PIP (Picture in Picture) video of the presenter is distracting.” And I think that’s true for screencasts that show how to do stuff with software.

When a screencast shows why to do stuff with software, though, I think the talking head may make more sense. Now, my instinct is to be a voice only, as I am on my podcast. But if the screencast is going to represent me at an event, it seems like I should try to project myself there.

More broadly, the topic is something I care about and have struggled to communicate effectively. If this method of presentation works better than others I’ve tried, even if only for some people, then it’s worth doing. My communication kit needs as many tools as I can pack into it.

Now that I’ve knocked the rust off my screencasting skills, I’m looking forward to redoing this video based on feedback. And since it was made for a ten-minute conference slot, I should probably also do some shorter versions that will work in different contexts.

One thing that’s becoming terribly clear: If I want these to make sense to broad audiences, I need to speak plainly and illustrate with simple everyday examples. I’ve been embarrassingly slow to figure that out, but I am learning. In the screencast I just wrapped, which is all about syndication, I never use that word. It’s a start!

This could have been me:

A bicyclist riding along Old Homestead Highway was hit by a vehicle Friday evening.

At about 6:43 p.m. Swanzey Police and Fire Department responded to a reported hit-and-run accident on Route 32.

The vehicle was described as a white SUV, possibly a Chevy Blazer, with a black roof rack. It’s missing its passenger-side mirror as a result of the accident, according to Cpl. Robert Eccleston of the Swanzey police.

The cyclist suffered serious injuries and was transported to Cheshire Medical Center/Dartmouth-Hitchcock Keene.

A couple of years ago it was me. I got sideswiped on a bike ride in another part of the county. In that case too, the impact broke off the passenger-side mirror. Luckily I only suffered a bruised leg. According to a follow-up report, this cyclist suffered “skull fractures on the left side of his head, where his helmet hit the pavement, a broken shoulder and severe road rash.”

When it happened to me, I was furious for weeks. Every time I saw a sedan similar to the one that knocked me off my bike I looked for the telltale missing passenger-side mirror. And I formed a clear idea of a product that might have prevented the hit-and-run, or failing that, nabbed the perpetrator. It’s a pair of bicycle-mounted cameras, front and rear, that trigger on approaching traffic and take sequences of shots that can identify approaching vehicles.

Here’s why I imagine this could work. I don’t know about yesterday’s hit-and-run, but in my case it didn’t feel like an accident. We were the only two vehicles on the road. There was plenty of room for the car to give me wide berth. But some motorists like to hassle cyclists verbally, and once in a while that escalates to a cat-and-mouse game. That’s a game these people play because they think they can get away with it. There’s no expectation that the sideswiped cyclist will be able to prove that it happened, or capture the identity of the car. In my case, when I jumped to my feet after tumbling along the roadside, only to see the car speeding over the top of the next hill, I remember thinking: “You bastard, if I only had your license plate number you would regret this.”

Defensive surveillance isn’t just a capability that cyclists need, of course. It makes sense for motorists to identify and record oncoming traffic too. But car-on-car violence is a game played on a level field. Car-on-bike violence is so unequal that I’ll jump at any advantage I can get.

Does the product I imagine already exist? Maybe, but I don’t think so. There are obviously scads of cheap helmet- or bike-mountable cameras. What I’m looking for, though, is one that’s optimized for defensive surveillance. I think that means a gadget that senses oncoming traffic, and then shoots sequences of high-resolution stills. Ideally it’d come with two pairs of mounts. One pair would be fitted to my bike’s handlebar and seat. The other pair would be fitted to my car’s dashboard and rear deck. For extra credit, the car would keep the cameras charged so they’re always ready to defend the bike.


PS: Meanwhile, my low-tech solution is a helmet-mounted rear view mirror. I have always used one, and can now scarcely imagine what it used to be like to have to crane my head around — and wobble my bike — in order to see what’s behind me. With a helmet mirror, situational awareness only requires rapid eye flicks that become an automatic habit. Obviously the habit wasn’t fully automatic, but after the incident a couple of years ago I’m even more vigilant. I watch every car that approaches from the rear, and am always mentally preparing a dive into the ditch.

Noting that Windows 7 has been shipping with multi-touch support since October 2009, Charles Fitzgerald recently asked: Where are the Windows 7 tablets? Well, I’ve got one. It’s the Acer Aspire 1420P, which is the same machine that Microsoft PDC attendees got last fall. The moniker is “convertible tablet PC” but for me, it’s really a “do-everything PC” because I use it in three modes: as a desktop, laptop, and tablet.

In tablet mode it’s no iPad, I’ll be the first to admit. But as a general-purpose machine that morphs into a tablet, it has exceeded my expectations. Conventional wisdom holds that Windows 7 running standard apps can’t make effective use of a multi-touch screen. But while standard apps clearly aren’t optimized for multi-touch, basic gestures work and are very useful. Tapping and scrolling are my staple gestures, but I was delighted to find that pinching and spreading map to font size adjustment in browser windows. I do this all the time now.

My primary use for tablet mode is reading — mostly reading web pages. Before I got this machine, I had already developed the habit of loading up a bunch of pages into browser tabs, using Readability to discard the cruft, and then kicking back on a sofa, or in an airplane seat, to cycle through the tabs. Now I can do this with the Acer in tablet mode. At 1.7kg (3.8 pounds) it’s not something I can conveniently hold for a long time without propping up with my legs or with a pillow. But to put things in perspective, Wolfram Alpha reports that 1.7 kg is 0.68 x the mass of the book A New Kind of Science.

Full-on tablet mode is just one option for reading, though. The other day, sitting in an airport bar reading, I used the machine in laptop mode but with the screen spun around so that the keyboard was safely away from my drink. Later, in a meeting with colleagues, I spun the screen to a variety of angles to show things to them, and to enable them to show things to one another. Now that I’ve had a taste of this kind of flexibility, I’ll never want another laptop that doesn’t have a screen you can spin around and fold back.

As a pure laptop it’s a bit of a compromise, as you’d expect. The keyboard is solid, but the screen outweighs the body of the machine which can make it tippy. The screen is also wider and skinnier than I’d like. That said, multi-touch makes it a different kind of laptop than I’ve ever used before. Now, when using other machines, I find myself reaching for the screen to scroll or adjust fonts. It’s true that general-purpose computers aren’t optimized for touch. But it’s an incredibly useful adjunct. I won’t ever want another computer that doesn’t support touch.

With previous laptops I’ve always used docking stations. For the Acer, though, I just plug in a giant second monitor. And I use a USB keyboard/mouse adapter to command the machine from my Captain Kirk chair.

Now I’m really looking forward to a next-gen version of this do-everything computer. It would be a bit squarer. It would be a bit lighter — say, .4 x the mass of A New Kind of Science. The accelerometer and multi-touch display would be more responsive. Given all that, though, I’m not sure I’d ever need or want a slate-style tablet. This machine has raised my expectations for just how flexible an all-purpose computer can be.

In my town there’s one guy who does shoe and leather repair: Ed Hutchins. He resoles my Birkenstocks, fixes kids’ hockey skate boots, refurbishes leather jackets. If you search the web, you’ll find two reviews of Ed’s work:

1. They do a great job repairing shoes here and just recently he did a fantastic job repairing my son’s hockey skate boot. His rates are very reasonable the only downfall is he takes his time so don’t be in a hurry to get your item back soon.

2. Ed Hutchins is a very nice man who does very good work. The only complaint I can think of is that he is not the fastest at getting work done. He does a fabulous job though and his repairs last!

Those comments appear on a number of sites: Kuzdu, InsiderPages, LocalTom. Beyond address and phone number, that’s all the web knows about Ed’s business. The comments are accurate. I’ve been waiting months for my Birkenstocks. It’s clear to me that there’s unmet demand here for shoe and leather repair. But if you’re somebody who could help Ed meet that demand, how could you know about the opportunity?

In principle, the demand can be made visible. I think it was Esther Dyson who coined the phrase visible demand. In 2006 she published an issue of Release 1.0 entitled Visible Demand: The New Air-Taxi Market. The idea, which I discussed at length with DayJet’s Ed Iacobucci, is that when we use the web to aggregate demand — in this case, for direct flights among regional airports — we can optimize the delivery of services.

The same idea shows up in Eventful Demand:

1. Demand that your favorite performers come to your town.

2. Spread the word to get your friends and family to join the Demand.

3. Eventful will alert you when your events are scheduled.

DayJet was up and running until the 2008 credit crunch killed it. Eventful Demand is alive although not really kicking. (It’s unlikely that you’ve ever Demanded a performer. And I suspect it’s also unlikely that you’ve ever attended a performance at a venue chosen to satisfy a Demand.) But the idea of visible demand seems so powerful, and so right, that I hope it will play out on a broader stage.

How? That’s a $64 billion question which I hope people smarter than me are trying to answer. Meanwhile, I’ll just put this fact out onto the web. If you’re a great shoe and leather repairer, and you’d like to ply your trade in a picturesque New England town, the folks in Keene will welcome you with open arms.

Last week Peter Wayner wrote in the NY Times about the Canon Hack Development Kit, which makes it possible to write scripts to control Canon PowerShot cameras. The article describes how hobbyists fly this kit on weather balloons to perform high-altitude surveillance.

It can also be used for low-altitude surveillance. Last week I moderated a panel at Gov 2.0. One of the panelists, John Crowley, showed ultra-high-resolution (9cm/pixel!) imagery of the Gulf oil spill that was taken from a kite carrying one of these hacked Canon cameras. This isn’t just way faster and way cheaper imagery than we’ve seen from official sources. It’s also way better.

John Crowley doesn’t regard this as a hobby. Working for the Harvard Humanitarian Initiative and STAR-TIDES, he researches and develops emergency infrastructure for stressed populations. That means shelter, water, power, and sanitation, but also information and communication technologies (ICT). Kite surveillance of the Gulf was one of his compelling ICT examples. Another was an OpenStreetMap project that collaboratively mapped out the affected areas of Haiti in the days following the quake. I don’t yet have links for these examples but I’m going to ask @jcrowley to provide them, and I hope he’ll join me for an Innovators podcast.

When I posted Permalinks and hashtags for city council agenda items last week, I embedded a permalink and a hashtag to illustrate the idea. The post links to the video of Keene’s recent city council meeting, at the point where Patty Little introduces Tom LePage’s request to expand the Armadillo’s sidewalk cafe. The post also refers to this agenda item using the hashtag generated for it by the Granicus system.

I figured this would enable two ways to find pages, like my blog post, that refer to agenda items, like Tom’s request. First, you could search for pages that mention the hashtag. For example, this combined search of Google and Bing for granicus732_7716 finds my blog post because it mentions that tag. These searches also find my tweet containing the tag, and some echoes of the tweet. Finally, of course, you could search Twitter directly for the tag.

A second approach would be to search for pages that link to the video segment. I expected to be able to find my blog post by searching for this permalink which it cites:

http://keene.granicus.com/MediaPlayer.php?view_id=2&clip_id=77&meta_id=7716

I planned to use the link: operator, which finds pages pointing to a URL. And I figured this would work for both Google and Bing. But I was wrong on several counts. Bing doesn’t seem to support the link: operator. And even though Google does, this query doesn’t find my blog post.

Using the permalink as a plain search term doesn’t work either. And after reviewing the advanced search operators for both Google and Bing, I’m left wondering: How do you find pages that cite a permalink?
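This doesn't answer the search-engine question, but here is a minimal Python sketch of the check a crawler or aggregator would have to make once it has a candidate page in hand: does the page contain a link to the permalink? The function names and the normalization rule (ignore host case and query-parameter order) are my own assumptions, not anything Google, Bing, or Granicus documents.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl

# The permalink cited in the post.
PERMALINK = "http://keene.granicus.com/MediaPlayer.php?view_id=2&clip_id=77&meta_id=7716"


def normalize(url):
    """Reduce a URL to (host, path, sorted query pairs).

    Assumption: two links count as the same citation if they differ
    only in host case or in the order of their query parameters.
    """
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path, tuple(sorted(parse_qsl(parts.query))))


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def cites_permalink(html, permalink=PERMALINK):
    """Return True if any link in the page points at the permalink."""
    collector = LinkCollector()
    collector.feed(html)
    target = normalize(permalink)
    return any(normalize(href) == target for href in collector.hrefs)
```

A page that writes the same three parameters in a different order, or escapes the ampersands as HTML requires, still counts as citing the permalink under this rule.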

One weekend last year I was hiking with my dog along the Washington Street Extension in Keene, NH. It’s the old Route 9, now an abandoned road that runs alongside Beaver Brook and climbs up to Beaver Brook Falls. The road has been returning to nature since before we came to Keene. It’s lined on both sides, for over a mile, with 25-year trees that now entangle a course of utility cables. On that hike last year, I wondered if the owner of those cables might want to take a look and maybe schedule some pruning.

I tried calling the power company first. Directory services gave me the main number, but I failed repeatedly to find any path through the IVR system that would enable me to report the problem. When I got home I also failed to find the PSNH web page that has the number to call: 1-800-662-7764. (Menu path: Residential or Business -> Safety Center -> Tree Trimming. The effective search term is “tree trimming”, not “report a problem”.) When I tweeted my query to Martin Murray (@psnh), though, he got back to me promptly. It turns out these aren’t power cables, they’re telephone cables.

So I tried to report the problem to Fairpoint. Again there was no obvious way to do it online. And I couldn’t find anybody at the phone company who would answer the phone on the weekend. Eventually I got distracted by other things and never followed up.

Fast forward to yesterday. I’m hiking with my dog along the same abandoned road. The 25-year trees are now 26-year trees. And some big 60- and 80-year trees, tilting on banks eroded by spring floods, threaten to bring down the cables.

So I call again. There’s got to be some way to report this, right?

It becomes a game. Every path through the IVR system leads, after much delay — and, infuriatingly, an advertisement — to a message saying that business hours are Monday through Friday, 9 to 5. I might have tried the website again, but:

a) I am not carrying a connected, browser-equipped device.

b) You are the fracking phone company. Answer the phone!

Finally somebody answers. It’s Patrick, in Internet tech support.

Patrick: What’s your phone number for DSL service?

Me: 603.355.xxxx

Patrick: And what operating system are you using?

Me: Never mind that, here’s the deal. I’m standing on the old Washington Street Extension, looking at what I suppose is Keene’s Internet trunk. There are 26-year-old trees entangling it for a mile. And right here, at pole 13-T, there are 60- and 80-year old trees leaning at a 45-degree angle over the cables. They’re going to bring those wires down in the next big ice or wind storm, if not before.

Look, I know this isn’t your department, but I’m having a hell of a time finding anybody at Fairpoint who cares about this. There must be some way to report the problem.

Patrick: I totally get what you’re saying. But you’ve reached the lowest guy on the totem pole. And, I hate to say it, but this really isn’t my department.

Me: I know. But you’re several hops closer to the right department than I am. Can you please just take a report, email it to your supervisor, and cc me on the email?

Patrick: OK, hang on…done.

Me: Thanks Patrick! You may have just prevented a whole shitload of Internet technical support calls!


Update: Got these responses from @MyFairPoint on Monday AM:

@judell Hi, Jon – thanks so much for the heads up (just saw your tweet come up in our alerts). I really appreciate you looking out!

@judell Also, our active acct is @MyFairPoint and we’re working to ramp up our social media efforts, so expect to hear more soon! Thx again!

@judell – I’ll see what I can do based on this and your attached article. ^JP

Nice!

Among the agenda items that came before the Keene City Council last night was a request by Tom LePage, who runs Armadillo’s Burritos, to extend his sidewalk cafe around the corner onto Railroad Square. I’m in favor of it! As we head into the summer season, I’ll be going to Armadillo’s a lot. I always want to sit outside, but there are only a few tables out front, and they’re usually occupied. Around the corner, where the restaurant abuts Railroad Square, there’s more space available, and it’d be fun to have a ringside seat for the various musical, artistic, and political activities that happen in the square.

Tom’s proposal came up at this point in the meeting. The relevant piece of video, served up by Granicus, lasts only about five seconds. That’s how long it took for city clerk Patty Little to mention the item, and for mayor Dale Pregent to refer it to the Planning, Licenses, and Development Committee. But thanks to a new feature added to the Granicus service, that bit of city business now has a permalink and also a hashtag (#granicus732_7716).

This is a simple idea, it’s easy to implement, yet it’s a powerful enabler of modes of communication that we all envision. When the folks on the Planning, Licenses, and Development Committee get around to considering Tom’s request, they could — and I hope they will — search the web for pages (like this one) that use the permalink, and tweets (like the one I’m about to write) that cite the hashtag. Citizens who want to express views on the matter can do so in their own online spaces, wherever those may be. No single authority is responsible for monitoring, or gathering, or moderating, or displaying the set of items joined together by these unique tokens. But the web’s ability to find that set of things, easily and reliably, assures that they can be brought together in a variety of contexts, to serve a variety of purposes.

The title of a recent post, Every package has its own home page on the web, echoes an epiphany that Andrew Schulman had in 1997 when he realized the implications of every Fedex package having its own unique URL. Every piece of public business should have one too. It’s easy to mint new unique names for things. It’ll be a bit harder to show people how and why to use those names as rendezvous points for loosely-coupled, decentralized interaction. But I hope examples like this one will help get the idea across.

My wife Luann wants to help promote an annual event called the Fall Foliage Artist Studio Tour (FFAST). The organization has a website, and could publish a calendar there, but a calendar with only a single date doesn’t make much sense. And yet this event wants to be written down only once, then flow through the Keene hub as well as other local and regional hubs. How can you arrange that?

As the curator of the Keene hub, I keep a special calendar of one-off and recurring events. These are events that I happen to know and care about, that aren’t available in any existing calendar feed, but that ought to be syndicating through the hub. I only do this for stuff that I care about, though, and the FFAST event is Luann’s thing, not my thing.

She’s willing and able to curate certain art-related events for our region. One way to do that would be for her to spin up a new elmcity hub for the purpose. But that’s a heavyweight solution. For things like FFAST she needs something lighter. Hence the method described in this post, which for lack of a better term I am calling subcuration.

The idea, in a nutshell, is to combine private and public use of an online calendar. I’ll demonstrate it for Google Calendar, and also for Windows Live Calendar. In both cases, the method entails:

1. Using a private calendar for your personal stuff.

2. Using an auxiliary public calendar for public stuff.

3. Viewing both calendars together so you see everything, just as if you kept it all in your personal calendar.

4. Making the public calendar’s iCalendar feed available for syndication.

Subcuration with Google Calendar

My personal Google Calendar is called Jon Udell (private). To verify that it’s private, I can follow this trail of links from the GCal home page: Settings -> Calendar Settings -> Calendars -> Jon Udell (private) -> Share this calendar. The checkbox named Mark this calendar public is unchecked, as it should be.

Now I’ll create a new calendar, called Jon Udell (public). To make it public, I check the checkbox.

As Google explains, that means the events here will appear in public Google search results. As Google does not explain, it also means that the iCalendar feed for this calendar is open to syndication.

Now I’ll add the FFAST event to my public calendar:

Here’s a view of both calendars. It combines stuff from my personal calendar (birthdays) with stuff from my public calendar (FFAST). From this point of view, it’s just like keeping everything in my personal calendar.

But there’s a key difference. The public calendar has a public iCalendar feed, and I can give its URL to the curator of a syndication hub. To find the URL, I follow this link trail: Settings -> Calendar Settings -> Calendars -> Jon Udell (public). Scrolling down from there, I find a section labeled Calendar Address which contains:

The URL for the iCalendar feed is hiding behind the green ICAL button. To capture it:

1. Right-click (or alt-click) the button.

2. Copy the link address.

3. Bookmark it (if you’re a curator), or paste it into an email to a curator (if you’re a subcurator).

In case you’re curious, here’s the actual feed that a personal calendar app, or a syndication hub, will retrieve at that URL:

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Jon Udell (public)
X-WR-TIMEZONE:America/New_York
BEGIN:VEVENT
DTSTART;VALUE=DATE:20101009
DTEND;VALUE=DATE:20101011
DTSTAMP:20100519T151655Z
UID:kf4e4qjk08tfd0cmm1v9mc5kbc@google.com
CREATED:20100519T150628Z
DESCRIPTION:http://www.fallfoliageartstudiotour.com/
LAST-MODIFIED:20100519T151054Z
LOCATION:http://www.fallfoliageartstudiotour.com/
SEQUENCE:2
STATUS:CONFIRMED
SUMMARY:Fall Foliage Art Studio Tour
TRANSP:TRANSPARENT
END:VEVENT
END:VCALENDAR
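To show how little a client needs from that feed, here is a rough Python sketch that pulls the events out of it using only the standard library. It is not how Google Calendar or the elmcity hub actually parses feeds; the function names are hypothetical, and the parser is deliberately naive (it unfolds continuation lines but ignores quoted parameter values, time zones, and recurrence). The sample text is a subset of the lines shown above.

```python
import re
from datetime import datetime, timedelta

# A subset of the feed shown above.
FEED = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTART;VALUE=DATE:20101009
DTEND;VALUE=DATE:20101011
UID:kf4e4qjk08tfd0cmm1v9mc5kbc@google.com
LOCATION:http://www.fallfoliageartstudiotour.com/
SUMMARY:Fall Foliage Art Studio Tour
END:VEVENT
END:VCALENDAR
"""


def parse_events(ics_text):
    """Return one dict per VEVENT, keyed by property name (parameters dropped)."""
    # iCalendar folds long lines: a newline followed by a space or tab
    # continues the previous line, so join those first.
    unfolded = re.sub(r"\r?\n[ \t]", "", ics_text)
    events, current = [], None
    for line in unfolded.splitlines():
        line = line.rstrip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # "DTSTART;VALUE=DATE" becomes plain "DTSTART".
            current[name.split(";", 1)[0].upper()] = value
    return events


def all_day_span(event):
    """Return (first day, last day) for an all-day event.

    In iCalendar, DTEND is exclusive, so the last day is DTEND minus one.
    """
    start = datetime.strptime(event["DTSTART"], "%Y%m%d").date()
    end_exclusive = datetime.strptime(event["DTEND"], "%Y%m%d").date()
    return start, end_exclusive - timedelta(days=1)


for event in parse_events(FEED):
    first_day, last_day = all_day_span(event)
    print(event["SUMMARY"], first_day, last_day)
```

Because DTEND is exclusive, the DTSTART of 20101009 and DTEND of 20101011 describe a two-day event that covers October 9 and 10.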

Subcuration with Windows Live Calendar

As before, my private calendar is Jon Udell (private). Now I’ll create a new calendar, called Jon Udell (public).

To make it public I click Edit Sharing which leads to:

Here I check Share This Calendar and Make Your Calendar Public.

Now I add the FFAST event to the public calendar:

Here’s the same combined view of private and public events:

To capture the URL of the public iCalendar feed, I follow this link trail from the Live Calendar home page: Calendars -> Jon Udell (public) -> Edit sharing -> ICS: Import into another calendar application. That leads to:

That’s the URL of the iCalendar feed. When a client (personal calendar app or a syndication hub) retrieves the feed, it gets this:

BEGIN:VCALENDAR
METHOD:PUBLISH
VERSION:2.0
PRODID:-//Microsoft Corporation//Windows Live Calendar//EN
BEGIN:VTIMEZONE
BEGIN:VEVENT
UID:29ca7340-9f29-43f5-a62e-7e989ddb99a9
CLASS:PUBLIC
X-MICROSOFT-CDO-BUSYSTATUS:FREE
TRANSP:TRANSPARENT
SEQUENCE:0
CREATED:20100519T164446Z
LAST-MODIFIED:20100519T164446Z
DTSTART;VALUE=DATE:20101009
DTEND;VALUE=DATE:20101011
SUMMARY:Fall Foliage Art Studio Tour
LOCATION:http://www.fallfoliageartstudiotour.com
PRIORITY:0
END:VEVENT
END:VCALENDAR

Summary

These two examples illustrate a set of principles in the context of two different online calendar applications. The same principles will apply to other calendar applications that support multiple calendars, and can publish selected calendars in iCalendar format to open URLs.

The principles are, once again:

1. Use a private calendar for your personal stuff.

2. Use an auxiliary public calendar for public stuff.

3. View both calendars together so you see everything, just as if you kept it all in your personal calendar.

4. Make the public calendar’s iCalendar feed available for syndication.
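On the receiving end, a hub that syndicates feeds from several subcurators has to merge them. The sketch below is one hypothetical way to do that in Python; it is not a description of how the elmcity service works. It assumes events have already been parsed into dictionaries, and it treats two events as duplicates when their title and start date match, because (as the two feeds above show) the same event gets a different UID from each calendar service. The gallery opening is invented purely for illustration.

```python
def merge_feeds(*feeds):
    """Combine lists of event dicts, keeping the first copy of each event.

    Assumption: (SUMMARY, DTSTART) identifies an event across services,
    since each service mints its own UID for the same event.
    """
    seen, merged = set(), []
    for feed in feeds:
        for event in feed:
            key = (event["SUMMARY"].strip().lower(), event["DTSTART"])
            if key not in seen:
                seen.add(key)
                merged.append(event)
    # Date strings in YYYYMMDD form sort chronologically as plain text.
    return sorted(merged, key=lambda e: e["DTSTART"])


# The FFAST event as published by each service, plus one made-up event.
google_feed = [
    {"UID": "kf4e4qjk08tfd0cmm1v9mc5kbc@google.com",
     "SUMMARY": "Fall Foliage Art Studio Tour", "DTSTART": "20101009"},
]
live_feed = [
    {"UID": "29ca7340-9f29-43f5-a62e-7e989ddb99a9",
     "SUMMARY": "Fall Foliage Art Studio Tour", "DTSTART": "20101009"},
    {"UID": "hypothetical-1",
     "SUMMARY": "Hypothetical gallery opening", "DTSTART": "20100920"},
]

hub_events = merge_feeds(google_feed, live_feed)
for event in hub_events:
    print(event["DTSTART"], event["SUMMARY"])
```

Matching on title and date is crude, and a real hub would need a more forgiving rule, but it shows why writing an event down once in one public calendar is the simpler path.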

The KUOW Speakers’ Forum continues to deliver the most consistently valuable talks I listen to these days. The latest is Hernando de Soto on Shadow Economies. It’s about facts, relationships, linked data, identity, property rights, the rule of law, derivatives, toxic assets, and permanent credit crunch. Bottom line: We need to get the facts about those assets, link them together, and bring them out of the shadows. So far as I can tell, the current crop of financial reform bills aren’t saying that. The following excerpts from de Soto’s talk explain why they should, and also why they probably won’t.


Facts were the subject of all the reformers who made the market economy come into being, between 1850 and 1950. We’re all clear about the ideology of the people who talked about the market, and the capitalist system, from 1750 to 1850: Adam Smith, Marx. They all talked about division of labor. What they didn’t say is that once labor is divided, and you have many sources of production, how do you coordinate them?

That crisis actually came. The whole system faltered in the 19th century because feudalism had collapsed, patrimony had collapsed, there was freedom, but freedom without law and structure. So different people, who wrote very little — you find the details in things that stopped being published a hundred years ago — said, We are in front of swarms of facts. They have nothing to do with our immediate vicinity, our village, our feudal lots, it’s about the world as a whole, and we can’t digest it.

So, property rights had to become universal. We had to make them explicit as facts. And we had to make sure that everybody had access to a new business instrument, the corporation. Before, even in the US, you needed an act of Congress to make a corporation. That changed. It was a big battle, but finally the argument that won was, they’re doing it anyway, and if we don’t get them on the books they’ll stay in the shadows. So gradually textiles, and cotton, and machinery started recording facts, and it all started coming under property law.

Facts isn’t just information. Here we have an apple, it’s mine, it looks just like a stolen apple, but it has a property right associated with it. That apple can be bought, sold, rented, used as a mortgage, there are a hundred things I can do with the apple. Those are its relations to the rest of society. For that you need something that describes those relations.

Charles Sanders Peirce, when asked to describe the universe, said: “Things in relation to one another.” The wonderful thing about the rule of law, especially as developed in the United States, is that you’ve been able to put together things and relationships in organized documents that are accessible and actionable. When that happens, the shadow economy goes away and you’re in control. You know who you’re dealing with, and you know what their assets are.

Now, here’s my concern about what’s happening with the recession. I’m watching TV, October 2008, and I see your Mr. Paulson, secretary of the Treasury, say, “We’re in trouble. We have troubled assets. So I’m going to buy them up, and then we’ll see what’s what.” Basically, he was saying: “We don’t have the facts, so I’ve got to produce them so we know who’s solvent and who isn’t.”

Later, I turn on the TV and he says, “We’ve thought about it, and we’ve decided we’re not going to buy the toxic assets, these derivatives, and sort them out. Instead we’ll just give enough money to the banks so that everybody knows they’re not going to break.” In other words, I’m not going to find out where the assets are, or record property rights.

Why that change? I asked. The reply was: “Well, he couldn’t find the toxic assets.” I thought that was really interesting. In the United States, everything is recorded: every house, every car, every boat. You know where things are. You’ve got facts. It is a factual economy, not like my economy which is a shadow economy where there are no facts.

I asked Chris Cox: “How many of these assets that are called derivatives are not on record?” And he said, “Well, we think there’s 600 trillion dollars of them.” That violates the crucial law of property as you have developed it over 150 years. No wonder nobody feels safe. You have created the world’s largest shadow economy.

As long as you don’t know who owns the greatest amount of your assets, there’s no info as to who owns what, who is related to what, you have a shadow economy. We live in one, and it has as a characteristic a permanent credit crunch. We know more about it than you do. Credit crunch is where you don’t know who you’d be lending to, so you don’t lend. It’s permanent, we live with it, and now you’re going to have to learn to live with it too, because until you know who is solvent how can you give anybody credit? You’re flying blind.

Einstein used to say: “What does the fish know about the water in which it swims?” That was his way of saying you have to be outside the aquarium to understand what’s going on inside the aquarium. Well, as an outsider looking in, I’m a great admirer of the United States, of your rule of law, which says that everything has to be identified because you are a nation of facts. As opposed to us, a nation of rumors and shadows. But you’ve slipped up really badly. You’ve got to get your banks to put these things on the record.

Back in the 1930s, Roosevelt saw that it was important to find out how much liquidity there was. To do that he needed to know where the gold was. He made a law, you had to record your holdings of gold or go to jail for ten years. Very soon he knew where all the gold was. That’s where you’re at. The problem is, what happens if when you do it, you find out that most of your top banks are insolvent? So you’ll need to involve the FDIC. But you’ve got to get the facts.

It’s very easy to get there, but it will mean that a sector of your society that is today in power will not be in power a month later, because they’ll be broke. Peter Munk, who owns gold mines in Canada, is building a marina in Montenegro for the biggest yachts in the world. When he was thinking that the U.S. administration was going to clean up the mess, and find out where the derivatives were, he said “You see all those yachts?” (He was looking at Sardinia.) “Well, in 2011, 4/10 of them will belong to somebody else.” Those 4/10 are holding out, obviously, because they don’t want that to be known. But they’re really screwing the rest of us.

Update: From Crain’s:

The Senate legislation would push most of the $615 trillion in over-the-counter derivatives onto regulated exchanges or similar electronic systems, a measure that would make it easier for the market and regulators to track the trades.

Really? Well OK then! Fingers crossed.

In his latest essay, Cliff Gerrish riffs on my keynote at the Kynetx conference:

There was a moment when he was talking about a meeting in local government where the agenda was managed using a web-based tool. Udell talked about wanting to be able to hyperlink to agenda items, he had a blog post that was relevant to one of the issues under discussion. The idea was that a citizen attending the meeting, in person or virtually, should be able to link those two things together, and that the link should be discoverable by anyone via some kind of search. And while the linking of these two things would be useful in terms of reference, if the link simply pulled Udell’s blog post into the agenda at the relevant spot, that might be even more useful.

The reason this kind of thing probably won’t happen is the local government doesn’t want to be held responsible for things a citizen may choose to attach to their agenda items. A whole raft of legal issues are stirred up by this kind of mixing. However, while the two streams of data can’t be literally mixed, they can be virtually mixed by the user. Udell was looking at this agenda and mixing in his own blog post, creating a mental overlay. A technology like Kynetx allows the presentation of a literal overlay and could provide access to this remix to a whole group of people interested in this kind of interaction with the agenda of the meeting.

The Network provides the kind of environment where two things can be entirely separate and yet completely mixed at the same time. And the mixing together can be located in a personal or group overlay that avoids the issues of liability that the local government was concerned about.

That’s one example of a general pattern and best practice at the core of what we mean when we talk about linked data. In 1997, Andrew Schulman gave a talk entitled The Web is the API in which he meditated on the then-revolutionary UPS tracking application:

A URL can drive a process. Thus, UPS was not only opening up its business practices to its customers, but also publishing an implicit API (hmm, “the UPS software developers kit”?). For example: http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799.

Every package has its own home page on the web!

So should every city council agenda item. So should every ARRA contract. This was obvious in 1997, it’s even more obvious now.

The other day I listened to Tim Berners-Lee’s 2009 TED talk on linked data. He said nothing about RDF, but a whole lot about HTTP. The message boiled down to: “Give everything that matters its own home page on the web.”

That simple idea is easy enough to understand, but entails much that isn’t obvious. It costs nothing to mint new web namespace. We can name trillions of things, and we can declare trillions of relationships among named things. No central authority will govern these names and relationships. There is an infinite supply of unique names, each a needle in a vast haystack. Search engines can easily find any needle in that haystack, and they can easily find related needles too.

Here’s one more non-obvious point: Naming is hard. When we have to stop and think what to name something, we can end up thinking for a long time. In order to get to trillions of named things we’ll need to automate the naming. That’s part of what I was driving at in OData for collaborative sense-making. Given any set of things, the web names for those things, and for the relationships among those things, need to arise organically from the systems we use to create and share them. Information systems should mint usefully granular web namespace. If the right kind of naming is built in, we won’t have to bolt it on later. Things will naturally form relationships with other things. Views of those relationships will emerge from many perspectives, for many purposes.

At a service stop on the Merritt Parkway over the weekend, I was approached by a young couple in a jam. They were halfway to their destination, had pulled in for gas, then realized neither had brought a wallet. They were both on their phones, working the problem, and the guy looked up to ask if I’d heard of a roadside assistance program that could help in that situation. I wound up giving them ten bucks. Maybe it was a scam, in which case I only lost $10. But maybe it wasn’t, in which case I helped some folks in need.

Ten bucks wasn’t enough to get them as far as they said they needed to go, though. And later I got to thinking about how we might have created enough trust, in an ad-hoc way, for me to make a short-term loan of, say, $50. It’s an interesting thought experiment. I wonder what solutions you can imagine? Here are a few that occurred to me.

Web identity. Given a web connection, I could have searched for the couple’s names, found their web footprints, and verified that their photographs, locations, and other attributes matched what they claimed.

Six degrees of separation. If we could trace our connection through social network space, that might be enough. It might even be possible to do that with voice calls, but with a web connection it could be almost trivial.

PayPal. Given a web connection, we could have brought up a browser and done a PayPal transaction. In that case I wouldn’t even be making a loan, I’d know that the funds had been transferred before handing over cash.

Losing my wallet while traveling is a nightmare scenario for me. It’s never happened but I dread the thought. I hate being so dependent on documents that I carry around in a wallet that could easily be lost or stolen.

Those documents embody claims made on my behalf by identity providers that we have all agreed to trust. That arrangement became necessary when society grew beyond what interpersonal trust could scale out to support. And it will remain necessary. But as voice and data connectivity become ubiquitous, and as interpersonal trust scales out in ways it never could before, I wonder if we’ll see a re-emergence of pre-bureaucratic modes of identity.

Last week I said that confusion about the visibility of events in Facebook had thwarted my plan to include Facebook as an event source for elmcity hubs. The day after I wrote that post, though, Stephen Judd noted in a comment that a new data entry method has appeared — one that clears up the confusion.

Until April 30, your choices when publishing an event were:

Open: Anyone can see this Event and its content.

Closed: Anyone can see this Event, but its content is only shown to guests.

Secret: Only people who are invited can see this Event and its content.

Some people opted for Closed when they really ought to have picked Secret. With the advent of API-based search that meant automated tools like the elmcity aggregator could surface events — like surprise birthday parties — not meant to be seen.

But on May 1 the choices had narrowed to just public or private. It’s implemented as a checkbox:

[x] Anyone can view and RSVP (public event)

It defaults to checked, i.e. public. That’s consistent with the general tilt, in Facebook, toward public rather than private defaults. Many people think that’s the wrong default, and I’m inclined to agree. But at least a confusing three-valued choice has been reduced to an easier-to-understand two-valued choice.

Given that, I’ve decided to add Facebook as an elmcity event source. I’m mindful of the power of defaults, so haven’t made this a default behavior. When a curator spins up a new elmcity hub, the event sources included by default — that is, before you add any iCalendar feeds to your registry — were, and still are, Eventful, Upcoming, and EventBrite. If you want to add Facebook events, you can now do so by adding a new name/value pair to your hub’s metadata record in delicious:

facebook=yes

Curators can, by the way, now include or exclude any of the services. These are the defaults:

eventful=yes
upcoming=yes
eventbrite=yes
facebook=no

All of these settings can be tweaked.
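Here is a minimal Python sketch of how name/value pairs like these could be layered over the defaults. It is not the elmcity code; the function name and the parsing rules (case-insensitive names, unknown names ignored) are my assumptions for illustration.

```python
# Defaults as listed above; everything else here is an illustrative assumption.
DEFAULT_SOURCES = {
    "eventful": "yes",
    "upcoming": "yes",
    "eventbrite": "yes",
    "facebook": "no",
}

def enabled_sources(metadata_lines):
    """Return the sorted names of the services switched on after overrides."""
    settings = dict(DEFAULT_SOURCES)
    for line in metadata_lines:
        name, sep, value = line.partition("=")
        name = name.strip().lower()
        # Ignore lines without "=" and names that are not event services.
        if sep and name in settings:
            settings[name] = value.strip().lower()
    return sorted(name for name, value in settings.items() if value == "yes")

print(enabled_sources([]))
print(enabled_sources(["facebook=yes", "upcoming=no"]))
```

With no overrides, the three default services come back. Adding facebook=yes and upcoming=no swaps one for the other.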

The elmcity service finds Facebook events by searching for them using the location you specify in your metadata record. Here are some sample searches:

If you change the location parameter in that URL you can see which Facebook events will be included for your town. So far, I’m not seeing many public events, even for very populous locations. Facebook’s event system was always more appropriate for friends-and-family events that you wouldn’t expect to see on a community calendar. If you wanted to advertise an event open to the general public, services like Eventful or Upcoming or EventBrite were better ways to do it. Or you can create a public iCalendar feed.

It will be interesting to see if Facebook’s new event system, which defaults to public, produces more public events than before. To the extent that it does, it could become a useful source for elmcity curators. But if people who create public events in Facebook want that to happen, they’ll need to learn more about how those events appear in other contexts.

Consider this event, one of a handful that turns up in a search for Keene, NH. Here’s what anyone can see in Facebook:

What film is being screened? Neither the title nor the description tells us. My guess is that if you know Susan Hay, and are affiliated with mothersuniting.org, that information is part of a shared context that Susan just took for granted when she posted this “public” Facebook event. When she marked the event Public it hadn’t occurred to her that the actual scope of Public means she ought to have named the film in the title or description.

Note that there is an events page at mothersuniting.org, albeit five years behind the times. My own view is that mothersuniting.org should be the authoritative source for its own event information. It could use Google Calendar, for example, to publish an HTML view of a calendar into the events page on its website, while at the same time producing an iCalendar feed that could be listed in a community registry. Facebook really ought to be a downstream consumer of that kind of event source, not an upstream producer.

But no matter what I think, there will be people, maybe a lot of people, who end up making Facebook the authoritative source for their event information, instead of their own websites. So I’m enabling curators to capture those streams. We’ll see how it unfolds. As always, it’ll be fascinating to watch people walk the slippery path that divides private from public.


PS: If you’re a developer working with the new events API, here’s an odd quirk I’ve uncovered. Dates and times reported through the Facebook API don’t correlate sensibly with dates and times reported in the Facebook application.

At first I thought this was a timezone issue, and tried various Ptolemaic adjustments to make things work out. It got weirder and weirder, until finally I went empirical and made this table of observations:

    Where: Keene: GMT-5
 FB start: 2010-05-09 19:00
API start: 2010-05-10 02:00
     Diff: +7 hours

http://www.facebook.com/event.php?eid=113825365318937

    Where: Chicago: GMT-6
 FB start: 2010-05-01 11:00
API start: 2010-05-02 06:00
     Diff: +7 hours

http://www.facebook.com/event.php?eid=114150548619789

    Where: Salt Lake City: GMT-7
 FB start: 2010-04-28 06:00
API start: 2010-04-28 13:00
     Diff: +7 hours  

http://www.facebook.com/event.php?eid=115044215191908

    Where: Fresno: GMT-8
 FB start: 2010-05-03 11:00
API start: 2010-05-03 18:00
     Diff: +7 hours

For no reason I can see, the API reports a local time that’s 7 hours ahead of the time you see when you view the event in Facebook. After making that adjustment, things seem to work. Why 7 hours? Beats me.
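The adjustment amounts to subtracting a fixed 7 hours from whatever the API reports. Here is a minimal Python sketch, assuming the timestamp format used in the table above; the function name is made up for illustration, and the offset is purely empirical.

```python
from datetime import datetime, timedelta

# Empirical offset taken from the observations above, not from any documentation.
API_OFFSET = timedelta(hours=7)
STAMP_FORMAT = "%Y-%m-%d %H:%M"

def api_to_facebook_time(api_start):
    """Convert an API-reported start time to the time shown in Facebook."""
    parsed = datetime.strptime(api_start, STAMP_FORMAT)
    return (parsed - API_OFFSET).strftime(STAMP_FORMAT)

print(api_to_facebook_time("2010-05-10 02:00"))  # Keene: 2010-05-09 19:00
print(api_to_facebook_time("2010-04-28 13:00"))  # Salt Lake City: 2010-04-28 06:00
```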

After long study of the psychological effects that computers and information systems are having on us, Linda Stone has turned to the physiological effects. Her elevator pitch used to be continuous partial attention. Now it’s email apnea. When we use these technologies, Linda says, we project ourselves into them, we become disembodied, we lose the ability to regulate our posture and breathing. On this week’s Innovators show she discusses what she has learned, and challenges us to find ways to remain embodied as we interact with networked devices and information systems.

Coincidentally, I got to hang out with Linda this weekend and try the HeartMath system that she’s been experimenting with. It senses your pulse, displays the variability of your heart rate, and then guides you through a breathing exercise that helps you regulate it. The HeartMath hardware and software support regulation of the autonomic nervous system, bringing awareness to breathing patterns that emphasize either a fight-or-flight (sympathetic) state or a rest-and-digest (parasympathetic) state.

One expert in this field, Steve Elliott, refers to this state as Coherent Breathing, and also offers a set of exercises. Now, to be honest, when I land on web pages like this one, where scholarly charts and footnotes rub elbows with ads for Swarovski Crystal Reminder Bracelets, my instinct is to move along. I’m fiercely non-mystical. I had to quit a yoga class because I just couldn’t listen to all the chatter about sun energy and moon energy. Can’t we just breathe and stretch?

The thing is, I’m also fiercely rational about physiology and health. I know that good posture, deep breathing, and slow stretching have profound benefits. I have resolved several health crises, ones that our medical system would prefer to address with drugs and surgery, by paying attention to my body and then adjusting how I use it. But I’ve never had a chance to try biofeedback. So I was intensely curious about the HeartMath system. It uses a pulse monitor clipped to your earlobe to monitor your heart rate, plus software to guide you through an exercise that levels out the variability and leads you into a state of breathing and pulse “coherence.”

For me it was easy, and fun, to achieve a high coherence score. Linda asked: “Are you a meditator?” No. “An athlete?” Yes. So that makes sense. I have decades of experience regulating my own breathing and heart rate. But never in a work context, and that’s the point Linda is driving at. In our work environments we leave our bodies and project ourselves into computers and networks. If we can reconnect with our bodies in those environments, we’ll be healthier. I can’t prove that, but I feel sure that it’s right.

I haven’t yet plunked down $300 for the HeartMath system, but I’m trying to talk myself into it. Although the company advertises it as a “desktop personal stress relief system,” I like the way that Linda is articulating a larger vision. For her, it’s about human performance. We are more powerful when augmented by computers and networks, but also less healthy. One answer is to decouple ourselves from computers and networks, and sometimes that’s the right answer. But another answer is to find ways to remain embodied as we use computers and networks. Linda thinks that’s a crucial way forward, and I agree.

Why not just jump on the bandwagon then? Because my antipathy to mysticism has lately also extended to geek crazes. I’m suspicious of the instinct to solve problems created by our computerized gadgets by acquiring and using more computerized gadgets. And I’m wary of the quasi-autistic compulsion at the heart of the quantified self movement whose manifesto, the data-driven life, appeared in this Sunday’s New York Times Magazine.

I have been a runner and a biker for decades. People always ask: How far did you run? How fast do you bike? I don’t know. I don’t want to know. It’s enough for me to be outside, moving over the landscape, breathing deeply, thinking my own thoughts and listening to other people’s thoughts.

I’m the kind of guy who hates waiting for a machine at the Y while the person who just did 20 reps pauses to scribble in a journal that he did 20 reps.

I’m certain that we will see, in a year or two, the emergence of 12-step programs for people who are addicted to self-monitoring.

And yet…I really liked the coherent breathing exercise. I want to repeat it, and I think it can become a helpful part of my routine.

I’ve long wanted to be able to add Facebook to the list of sources that my elmcity service queries for local event information. It was never possible before, but the recent changes to the Facebook API (and terms of service) prompted me to take another look.

At first glance, it seems doable. Here are some sample queries:

http://elmcity.info/fb_events?location=keene,nh

http://elmcity.info/fb_events?location=ann arbor, mi

http://elmcity.info/fb_events?location=portsmouth,nh
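Place names with spaces and commas are safer when URL-encoded. Here is a small Python sketch that builds a query like the ones above for any town. The function name is hypothetical, and I’m assuming the endpoint accepts the encoded form.

```python
from urllib.parse import urlencode

BASE_URL = "http://elmcity.info/fb_events"

def fb_events_url(location):
    """Build a sample-query URL for a 'city, state' location string."""
    return BASE_URL + "?" + urlencode({"location": location})

print(fb_events_url("ann arbor, mi"))
```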

You can see what turns up for your town by swapping in your city and state. A lot of the events are public and could reasonably be included in a citywide aggregation. But then there are ones like this:

SURPRISE Lantheaume Baby Shower
1000 Market Street, Portsmouth, NH 03801
2010-06-26T20:00:00+0000

Clearly this baby shower should not appear on a citywide public calendar. Why does search find it? Let’s look at the data about this event that’s visible to the world:

{ "id": "314667046847",
"owner": {
"name": "Jesse Barnes",
"id": "11000551"},
"name": "SURPRISE Lantheaume Baby Shower",
"description": "Baby \"Ox\" is on his or her way! Come and celebrate with the mom-to-be and her closest friends and family! Please remember to bring your decorated onesie so that we can display them for Kris. \n\nLook on this site for additional details that are still being determined. ",
"start_time": "2010-06-26T20:00:00+0000",
"end_time": "2010-06-26T23:00:00+0000",
"location": "1000 Market Street, Portsmouth, NH 03801",
"privacy": "CLOSED",
"updated_time": "2010-04-02T15:01:10+0000"}
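The privacy field is the only visibility signal in that record. Here is a minimal Python sketch of reading it from a trimmed copy of the record. The "UNKNOWN" fallback is a placeholder of mine, not a value Facebook returns. Note that the field reports what the owner chose, not what the owner meant.

```python
import json

# Trimmed copy of the record shown above.
record = """{"id": "314667046847",
 "name": "SURPRISE Lantheaume Baby Shower",
 "location": "1000 Market Street, Portsmouth, NH 03801",
 "privacy": "CLOSED"}"""

def privacy_of(event_json):
    """Return the event's privacy setting, or a placeholder if it is absent."""
    return json.loads(event_json).get("privacy", "UNKNOWN")

print(privacy_of(record))  # CLOSED
```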

When you create an event, there are three options:

Open: Anyone can see this Event and its content.

Closed: Anyone can see this Event, but its content is only shown to guests.

Secret: Only people who are invited can see this Event and its content.

Clearly Jesse should have marked this event Secret, not Closed. Until very recently, an error like that would be unlikely to result in an embarrassing information leak. But now things have changed, and people are going to start learning harsh lessons about the visibility of their Facebook stuff.

I don’t see any way to teach my service to exclude events that people marked as Closed because they thought it meant Secret. So I guess elmcity’s Facebook feature is going to have to wait until those lessons are learned.

Kingsley Idehen pointed me to this nice little Ottawa Trash Schedule app, based on a set of iCalendar feeds derived by Shawn Hooper from a set of PDF files published by the city.

These iCalendar feeds were Shawn’s contribution to the Open Data Ottawa Hackfest. In a letter to the city councillors Shawn writes:

If you are not yet familiar with the topic of Open Data, the basic idea is that information should be made available in standard formats that can be read by a variety of computer programs. This data should be available for use by the public, without restrictions.

For example, the City currently publishes its Garbage Collection Calendar as a PDF file on its website. Although PDF files are a popular format in which to present data, they are not open. All you can do with a PDF-based calendar is look at it, or print it.

An open version of this same calendar would be, in its simplest form, a list of dates on which garbage collection would occur and which types of garbage and recycling are collected on those dates. Although not as visually appealing as the current PDF calendar, this list of dates can be read by many different pieces of software to provide value-added services such as sending reminders to your cell phone or e-mail address the night your recycling should be put out, or adding the collection schedule to your calendar software of choice.

In addition to www.ottawatrash.ca, another service that can work with these feeds is the Ottawa hub of the elmcity calendar syndication service.

It’s great to see these kinds of things happening. But we shouldn’t depend on inspired citizen activism to make them happen. The publication of data is a routine act that will increasingly be performed by individuals, groups, non-profit and for-profit organizations, and governments. The default setting, in almost all cases, involves publishing that data in formats that people can read but that machines can’t. We’ve got to flip the default setting. Publication of machine-readable formats along with human-readable formats has to be the new default.

In my own town, Keene, NH, I’m delighted to be able to point to an example. Last year, the schedule for hazardous household waste collection was only available as a PDF file. But now it’s in iCalendar. The next collection date shows up on the combined calendar: Saturday, May 8. It got there by way of the city’s calendar which is available both as a human-readable HTML page and an iCalendar feed available for syndication. So simple, so useful.

The endnotes for the book I’m now reading are a mixture of conventional citations and URLs. The former, expressed as publisher, book or journal title, author, date, and page number, seem not nearly so useful as the latter. Would you rather visit the library or click a link? But nowadays cited URLs also come with disclaimers like this: Accessed July 27, 2009. It might be inconvenient to verify a conventional citation in its original context, but I know that if I had to, I could. There’s no guarantee that I’ll be able to revisit a cited URL. Even if the page itself has not gone missing, there’s no way to know that the page I view on April 22, 2010 is the same one that the author viewed on July 27, 2009.

This anecdote was the springboard for my conversation with Herbert Van de Sompel about Memento, a proposed (and prototyped) method for adding the dimension of time to the web’s existing mechanism for content negotiation.

That mechanism has, to be sure, not taken the world by storm. The most common scenario involves a browser telling a multilingual server that its user prefers to read, say, French. A paper about Memento published last fall walks through the HTTP protocol that enables this negotiation. Odds are, though, that you’ve never seen this actually happen. It’s much more likely for a multilingual website to present itself as “a multiplication of language-specific mini-sites, instead of thinking of it as one site, with one set of URIs, only with different versions and languages available.” Wikipedia, for example, works that way.

The quote comes from a 2006 W3C article, Content Negotiation: Why it is useful, and how to make it work. The article blames the awkwardness of Apache’s implementation of the protocol (since corrected):

For a long time, with the most popular negotiation-enabled Web server (the ubiquitous apache), failed negotiation (for instance, a reader of french being proposed only english and german variants of a document), resulted in a nasty “406 not acceptable” HTTP error, which, while technically conforming to HTTP, failed to follow the recommendation that a server should try to serve some resource rather than an error message, whenever possible.
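To make the negotiation concrete, here is a simplified Python sketch of the server side: rank the languages in an Accept-Language header by their q values, pick the first one the server can supply, and fall back to a default variant instead of answering 406. The function names are mine, and the parsing ignores wildcards and other corner cases of the real header grammar.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag with no q parameter gets full weight
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]

def choose_variant(header, available, default):
    """Pick the best available language, falling back to a default variant."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. fr-ca falls back to fr
        if primary in available:
            return primary
    return default  # serve something rather than an error

print(choose_variant("fr-CA, fr;q=0.9, en;q=0.5", ["en", "fr"], "en"))  # fr
print(choose_variant("fr", ["en", "de"], "en"))  # en
```

The second call is the case in the quote: a reader of French offered only English and German gets the default variant rather than an error.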

Is there any reason to suppose that time negotiation will succeed where language negotiation has so far mainly failed? That’s a hard question, and one I wish I’d thought to ask Herbert in the interview, but maybe we can continue the dialogue here.

Meanwhile, the fact that content negotiation is tricky to get right doesn’t invalidate the core of the Memento proposal. Time is fundamental, the web could have a reliable memory, and if we can build such a memory into the fabric of the web the benefits will be profound.

Examples are everywhere. Consider mediabugs.org. Founded by Scott Rosenberg, whom I interviewed last week, the site is dedicated to finding and fixing errors in media reports. A few days ago, the first bug was marked Closed:Corrected. The mediabugs.org bug page initially said:

Listing for Josh Kornbluth’s show “Andy Warhol: Good for the Jews?” says the show is at the Jewish Community Center in SF, but actually it’s at The Jewish Theater in the Theater Artaud building.

There’s a comment pointing out the error but it’s still showing with the wrong info on the Express home page.

And later:

This is fixed now!

If you visit the original news report, though, there’s no record of the correction. It’s no big deal in this particular case, but media organizations should want to be transparent about when and how they alter published items.

Likewise governments. The Citability project aims to account for the history of changes made to items published on government websites. As with mediabugs.org, the approach will initially require third parties to monitor and chronicle the changes.

The Memento idea is that media organizations, governments, and other kinds of web publishers will be accountable for their own change histories [1]. And they’ll do so in a standard way, so that people viewing these sites in browsers can straightforwardly say: “Show me this page as it existed on July 7, 2009.”
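Stripped of the protocol details, answering that kind of request is a lookup against a dated revision history. Here is a minimal Python sketch of the lookup only. It is not the Memento protocol, and the function name and the sample history are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime

def version_as_of(history, when):
    """Return the text in effect at `when`, or None if nothing existed yet.

    `history` is a list of (datetime, text) pairs sorted by datetime.
    """
    timestamps = [stamp for stamp, _ in history]
    index = bisect_right(timestamps, when)
    return history[index - 1][1] if index else None

# Invented revision history for a single page.
history = [
    (datetime(2009, 6, 1), "first version"),
    (datetime(2009, 8, 15), "corrected version"),
]

print(version_as_of(history, datetime(2009, 7, 7)))  # first version
```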

This is wildly ambitious, but I applaud the ambition. Ever since I made the Heavy Metal umlaut screencast, I have imagined what it would be like to scroll back and forth along the timelines of evolving web pages. At one point Andy Baio sponsored a contest to write a script that would animate the revision history for any Wikipedia page, and I made a screencast of Dan Phiffer’s solution.

Clearly we want this. Will it be hard to arrive at a well-known and well-used standard? Sure. Is it worth doing? Absolutely.


[1] Third-party watchdogs will often be needed, of course. We’d like to trust self-reported change histories, but we’d also like to verify them. Even so, third parties shouldn’t be the only mechanisms. Self-reported histories should exist.

While reading Jonathan Safran Foer’s Eating Animals I got to wondering about global and national trends in the production of meat and fish. He mentions, for example, that US chicken production is way up over the past few decades. How do we compare to other countries? Here’s how I answered that question:

The screenshot is of Excel 2010 augmented by PowerPivot, the business intelligence add-in that’s the subject of last week’s Innovators show with John Hancock.

Using the same spreadsheet, I asked and answered some questions about fish production:

What’s the worldwide trend for capture versus aquaculture?

How much fish are certain countries capturing?

On a per capita basis?

How much fish are certain countries growing?

On a per capita basis?

The book raises very different kinds of questions. How should we treat the animals we eat? How much land should we use to raise crops that feed the animals we eat, instead of raising crops that feed people directly? We won’t find the answers to these kinds of questions in spreadsheets. But what I hope we will find — in spreadsheets, in linked databases, in data visualizations — is a framework that will ground our discussion of these and many other issues.

In order to get there, a number of puzzle pieces will have to fall into place. The exercise that yielded this particular spreadsheet led me to explore two that I want to discuss. One is PowerPivot. It’s a tool that comes from the world of business intelligence, but that I think will appeal much more widely as various sources of public data come online, and as various kinds of people realize that they want to analyze that data.

The other piece of this puzzle is Freebase Gridworks, which I’m testing in pre-release. The exercise I’ll describe here is really a collaboration involving Excel, PowerPivot, and Gridworks, in a style that I think will become very common.

My starting point, in this case, was data on fish and meat production from the Food and Agriculture Organization, via a set of OData feeds in Dallas. These series report total production by country, but since I also wanted to look at per-capita production I added population data from data.un.org to the mix.

To see how Gridworks and Excel/PowerPivot complement one another, let’s look at two PowerPivot tables. First, population:

Second, fish production:

PowerPivot is relational. Because these tables are joined by the concatenation of Country and Year, the PerCapita value in the production table is able to use that relationship. Here is the formula for the column:

=[value] / RELATED('population'[Pop in Millions])

In other words, divide what’s in the Value column of the production table by what’s in the Pop in Millions column of the population table for the corresponding Country and Year. This declarative style, available in PowerPivot, is vastly more convenient than Excel’s procedural style, which requires table-flattening and lookup gymnastics.
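For readers who don’t have PowerPivot, the same join-then-divide logic can be sketched in a few lines of Python. This is only an analogy to the RELATED lookup, not how PowerPivot works internally. The country names and numbers are invented, not FAO or UN figures.

```python
# Invented illustrative data: population in millions, keyed by (Country, Year).
population = {("Freedonia", 2005): 4.0, ("Sylvania", 2005): 10.0}

production = [
    {"Country": "Freedonia", "Year": 2005, "Value": 200.0},
    {"Country": "Sylvania", "Year": 2005, "Value": 50.0},
]

def add_per_capita(rows, pop_by_key):
    """Return copies of the rows with a PerCapita column added."""
    result = []
    for row in rows:
        # The join key is the combination of Country and Year.
        pop = pop_by_key.get((row["Country"], row["Year"]))
        per_capita = row["Value"] / pop if pop else None
        result.append({**row, "PerCapita": per_capita})
    return result

for row in add_per_capita(production, population):
    print(row["Country"], row["Year"], row["PerCapita"])
```

A row with no matching population entry gets None rather than raising an error, which is where mismatched country names would show up.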

But here’s the thing. In the world of business intelligence, the tables you feed to PowerPivot are likely to come from relational databases with strong referential integrity. In the realm of open-ended data mashups from a variety of sources, that’s not likely. For example, the Food and Agriculture series has rows for United States, but the population series has rows for United States of America. You have to reconcile those names before you can join the tables.

Enter Gridworks. To feed it the list of names to reconcile, I took this population table from Excel:

And stacked it on top of this food production table from Excel:

The only column that lines up is the Country column, but that’s all that mattered. I read the combined table into Gridworks, and told it to reconcile that column against the Freebase type country:

On the first pass, 8146 rows matched and 868 didn’t.

I focused on the rows that didn’t match.

And then I worked through the list. To approve all the rows with British Indian Ocean Territory, I clicked on the double checkmark shown here:

Sometimes the reconciled name differs:

Sometimes Gridworks doesn’t know what match to propose. One problem name that came up was a garbled rendering of Côte d’Ivoire, which is something that happens commonly when the proper character encoding is lost during data transfer. In that case, you can search Freebase for a match.
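One common way this kind of garbling arises is UTF-8 bytes being read back as Latin-1. Here is a short Python sketch that reproduces the damage and reverses it. I’m assuming that particular mix-up for illustration; I don’t know which encodings were involved in this data.

```python
def garble(text):
    """Simulate UTF-8 bytes being misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(text):
    """Reverse the misreading; only valid if garble() is what happened."""
    return text.encode("latin-1").decode("utf-8")

name = "Côte d'Ivoire"
damaged = garble(name)  # the ô turns into the two characters Ã´

print(damaged != name)
print(repair(damaged) == name)
```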

Proceeding in this manner I quickly reduced the set of unmatched names until I got to one that should not match.

Should it be Belgium? Luxembourg? Actually neither. At this point I realized that the population table was a mixture of country names and region names. I wanted to exclude the latter. So I matched up everything else, and was left with 202 rows that had names like Belgium/Luxembourg, Australia/New Zealand, Northern America, Western Africa, and World. When I selected just the matching rows for export, these unmatched rows were left on the cutting room floor.

I split the tables apart again, took them back into the Excel/PowerPivot environment, and found that things still didn’t quite work. In cases where the original and reconciled names differed, Gridworks was exporting the original name (e.g. Iran, Islamic Republic of) rather than the reconciled name (e.g., Iran). To export the reconciled names, I added a new column in Gridworks, based on the country column, and used a Gridworks expression to display the reconciled name.
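Reduced to a toy, the net effect of that reconciliation pass looks like the Python sketch below. A hand-built alias table stands in for Freebase, which is an assumption on my part. The names come from the examples above; the function and variable names are made up.

```python
# Hand-built stand-ins for what a reconciliation service would know.
ALIASES = {
    "United States of America": "United States",
    "Iran, Islamic Republic of": "Iran",
}
CANONICAL = {"United States", "Iran"}

def reconcile(names):
    """Map each original name to its reconciled name and collect the rest."""
    matched, unmatched = {}, []
    for name in names:
        candidate = ALIASES.get(name, name)
        if candidate in CANONICAL:
            matched[name] = candidate
        else:
            unmatched.append(name)  # e.g. regions that should be left out
    return matched, unmatched

matched, unmatched = reconcile([
    "United States",
    "United States of America",
    "Iran, Islamic Republic of",
    "Belgium/Luxembourg",
])
print(matched)
print(unmatched)  # ['Belgium/Luxembourg']
```

Exporting the values of the matched mapping, rather than its keys, corresponds to exporting the reconciled names instead of the originals.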

There will be much more to say about PowerPivot and Gridworks. Each, on its own, is an amazing tool. But the combination makes my spidey sense tingle in a way I haven’t felt for a long time.

Part of my weekend bicycling sound track was a 2007 talk by Frederick Brooks, author of the seminal book The Mythical Man-Month. I found it on John D. Cook’s blog, which sums up the themes of the talk and of Brooks’ new book, The Design of Design. Here, amusingly, is the result of a WorldCat “Find a copy in the library” search for that book:

Something tells me that inter-library loan may not work in this case!

Anyway, as noted in John D. Cook’s blog, the theme of the talk (and of the book) is how to maintain the conceptual integrity of a large and complex design. Although Brooks fiercely opposes the waterfall model, he asserts that you do need to have a single mind — or pair of minds — running the show.

Assuming that’s the case, how do you organize the work? At one point, reflecting on the task of reinventing the air traffic control system, Brooks argues that the open source model could not produce a system he’d trust. Challenged on that point in the Q and A, he allowed that the Linux development process has cathedral-like as well as bazaar-like aspects.

What if you’re not part of the aristocracy? What motivates you to get out of bed in the morning and go to work? Here Brooks makes a wonderful point. The many who implement designs can have as much scope for creative expression as the few who specify the designs. If we fail to teach and celebrate that, shame on us.

I also loved Brooks’ take on remote collaboration. When he looked at the literature he found it was always about tools for document-sharing and telepresence, never about the biological, social, and organizational issues that such tools might (or might not) address. He cites, by way of example, a separated team of engineers who have easy access to a world-class videoconferencing system but prefer to use screensharing in combination with a voice-only call.

This happens a lot, and we don’t yet understand all the reasons why. Clearly the emotional bandwidth of the voice channel dwarfs its network bandwidth. I am starting to wonder if that’s because mirror neurons can sync up over the voice channel. In any event, having read The Mythical Man-Month so long ago, it was delightful to finally hear the voice of Frederick Brooks.

I’m editing an interview with John Hancock, who leads the PowerPivot charge and championed its support of OData. During our conversation, I told him this story about how pleased I was to discover that OData “just works” with PubSubHubbub. His response made me smile, and I had to stop and transcribe it:

Any two teams can invent a really efficient way to exchange data. But every time you do that, every time you create a custom protocol, you block yourself off from the effect you just described. If you can get every team — and this is something we went for a long time telling people around the company — look, REST and Atom aren’t the most efficient things you can possibly imagine. We could take some of your existing APIs and our engine and wire them together. But we’d be going around and doing that forever, with every single pair of things we wanted to wire up. So if we take a step back and look at what is the right way to do this, what’s the right way to exchange data between applications, and bet on a standard thing that’s out there already, namely Atom, other things will come along that we haven’t imagined. Dallas is a good example of that. It developed independently of PowerPivot. It was quite late in the game before we finally connected up and started working with it, but we had a prototype in an afternoon. It was so simple, just because we had taken the right bets.

There are, of course, many kinds of efficiency. Standards like Atom aren’t most efficient in all ways. But they are definitely the most efficient in the “it just works” way.

The elmcity project’s newest hub is called Madison Jazz. The curator, Bob Kerwin, will be aggregating jazz-related events in Madison, Wisconsin. Bob thought about creating a Where hub, which merges events from Eventful, Upcoming, and Eventbrite with a curated list of iCalendar feeds. That model works well for hyperlocal websites looking to do general event coverage, like the Falls Church Times and Berkeleyside. But Bob didn’t want to cast that kind of wide net. He just wanted to enumerate jazz-related iCalendar feeds.

So he created a What hub — that is, a topical rather than a geographic hub. It has a geographic aspect, of course, because it serves the jazz scene in Madison. But in this case the topical aspect is dominant. So to create the hub, Bob spun up the delicious account MadisonJazz. And in its metadata bookmark he wrote what=JazzInMadisonWI instead of where=Madison,WI.

If you want to try something like this, for any kind of local or regional or global topic, the first thing you’ll probably want to do — as Bob did — is set up your own iCalendar feed where you record events not otherwise published in a machine-readable way. You can use Google Calendar, or Live Calendar, or Outlook, or Apple iCal, or any other application that publishes an iCalendar feed.

If you are very dedicated, you can enter individual future events on that calendar. But it’s hard, for me anyway, to do that kind of data entry for single events that will just scroll off the event horizon in a few weeks or months. So for my own hub I use this special kind of curatorial calendar mainly for recurring events. As I use it, the effort invested in data entry pays recurring dividends and builds critical mass for the calendar.

Next, you’ll want to look for existing iCalendar feeds to bookmark. Most often, these are served up by Google Calendar. Other sources include Drupal-based websites, and an assortment of other content management systems. Sadly there’s no easy way to search for these. You have to visit websites relevant to the domain you’re curating, look for the event sections on websites, and then look for iCalendar feeds as alternatives to the standard web views. These are few and far between. Teaching event sponsors how and why to produce such feeds is a central goal of the elmcity project.

When a site does offer a Google Calendar feed, it will often be presented as seen here on the Surrounded By Reality blog. The link to its calendar of events points to this Google Calendar. Its URL looks like this:

1. google.com/calendar/embed?src=surroundedbyreality@gmail.com

That’s not the address of the iCalendar feed, though. It is, instead, a variant that looks like this:

2. google.com/calendar/ical/surroundedbyreality@gmail.com/public/basic.ics

To turn URL #1 into URL #2, just transfer the email address into a URL like #2. Alternatively, click the Google icon on the HTML version to add the calendar to the Google Calendar app, then open its settings, right-click the green ICAL button, and capture the URL of the iCalendar feed that way.
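The first method is mechanical enough to script. Here is a minimal Python sketch, assuming the two URL patterns shown above; the function name is hypothetical. It only rewrites the address and doesn’t check that the calendar is actually public.

```python
from urllib.parse import parse_qs, urlsplit

ICAL_PATTERN = "google.com/calendar/ical/{address}/public/basic.ics"

def embed_to_ical(embed_url):
    """Turn a Google Calendar embed URL (#1) into an iCalendar feed URL (#2)."""
    query = parse_qs(urlsplit(embed_url).query)
    if "src" not in query:
        raise ValueError("no src parameter in " + embed_url)
    return ICAL_PATTERN.format(address=query["src"][0])

print(embed_to_ical("google.com/calendar/embed?src=surroundedbyreality@gmail.com"))
```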

Note that even though a What hub will not automatically aggregate events from Eventful or Upcoming, these services can sometimes provide iCalendar feeds that you’ll want to include. For example, Upcoming lists the Cafe Montmartre as a wine bar and jazz cafe. If there were future events listed there, Bob could add the iCalendar feed for that venue to his list of MadisonJazz bookmarks.

Likewise for Eventful. One of the Google Calendars that Bob Kerwin has collected is for Restaurant Magnus. It is also an Eventful venue that provides an iCalendar feed for its upcoming schedule. If Restaurant Magnus weren’t already publishing its own feed, the Eventful feed would be an alternate source Bob could collect.

For curators of musical events, MySpace is another possible source of iCalendar feeds. For example, the band dot to dot management plays all around the midwest, but has a couple of upcoming shows in Madison. I haven’t been able to persuade anybody at MySpace to export iCalendar feeds for the zillions of musical calendars on its site. But although the elmcity service doesn’t want to be in the business of scraping web pages, it does make exceptions to that rule, and MySpace is one of them. So Bob could bookmark that band’s MySpace web page, filter the results to include only shows in Madison, and bookmark the resulting iCalendar feed.

This should all be much more obvious than it is. Anyone publishing event info online should expect that any publishing tool used for the purpose will export an iCalendar feed. Anyone looking for event info should expect to find it in an iCalendar feed. Anyone wishing to curate events should expect to find lots of feeds that can be combined in many ways for many purposes.

Maybe, as more apps and services support OData, and as more people become generally familiar with the idea of publishing, subscribing to, and mashing up feeds of data … maybe then the model I’m promoting here will resonate more widely. A syndicated network of calendar feeds is just a special case of something much more general: a syndicated network of data feeds. That’s a model lots of people need to know and apply.

I’ve posted the Python script I used to make the Pivot visualization of this blog. I need to set it aside for now and do other things, but here’s a snapshot of the process for my future self and for anyone else who’s interested.

Using deepzoom.py to create Deep Zoom images and collections

I’m using this Python component to create Deep Zoom images and collections. I made the following changes to it:

1. tile_size=256 (not 254) at line 59, line 160, and line 224

2. source_path.name instead of source_path at line 291

3. destination + '.xml' instead of destination at line 341

Let’s assume that Python is installed, along with the Python Imaging Library, and that your current directory contains the files 001.jpg, 002.jpg, and 003.jpg:

001.jpg
002.jpg
003.jpg

You could run deepzoom.py from the command line once for each image file, like so:

python deepzoom.py -d 001.xml 001.jpg
python deepzoom.py -d 002.xml 002.jpg
python deepzoom.py -d 003.xml 003.jpg

My script doesn’t actually do it that way; it enumerates the JPEGs and instantiates deepzoom.py’s ImageCreator object once for each. But either way, for each JPEG you end up with a DZI (Deep Zoom Image) package that consists of (for 001.jpg):

  • A settings file: 001.xml
  • A subdirectory: 001_files
  • More subdirectories (named 0, 1, etc.) inside 001_files
  • JPG files inside those subdirectories
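
As a rough sketch of the enumeration step (not the actual script), the following pairs each JPEG with the settings file it should produce and builds the equivalent command lines. The helper names dzi_jobs and dzi_commands are hypothetical, and the sketch deliberately sticks to the command-line form shown above rather than guessing at ImageCreator’s arguments.

```python
import glob
import os

def dzi_jobs(directory="."):
    """Yield (source_jpeg, settings_file) pairs, one per JPEG in the directory."""
    for jpeg in sorted(glob.glob(os.path.join(directory, "*.jpg"))):
        base = os.path.splitext(jpeg)[0]
        # 001.jpg -> 001.xml, matching the -d argument used on the command line
        yield jpeg, base + ".xml"

def dzi_commands(directory="."):
    """Build one deepzoom.py command line (as an argument list) per JPEG."""
    return [["python", "deepzoom.py", "-d", xml, jpeg]
            for jpeg, xml in dzi_jobs(directory)]

for command in dzi_commands("."):
    print(" ".join(command))
```

Each argument list could be handed to subprocess.run, or the loop could call into deepzoom.py directly as my script does.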

Now, in this case, the current directory looks like this (using -> to mark additions):

001.jpg
-> 001.xml
-> 001_files
002.jpg
-> 002.xml
-> 002_files
003.jpg
-> 003.xml
-> 003_files

To build a collection, do something like this in Python:

from deepzoom import *
images = ['001.xml', '002.xml', '003.xml']
creator = CollectionCreator()
creator.create(images, 'dzc_output')

Now the current directory looks like:

001.jpg
001.xml
001_files
002.jpg
002.xml
002_files
003.jpg
003.xml
003_files
-> dzc_output.xml
-> dzc_output_files

The Pivot collection’s CXML file will refer to dzc_output.xml, like so:

<Items ImgBase="dzc_output.xml">

Using IECapt to grab screenshots

This tool uses Internet Explorer, so it only works on Windows. There is also CutyCapt for WebKit, which I haven’t tried but would be curious to hear about.

Here’s an example of the IECapt command line I’m using:

iecapt --url=http://blog.jonudell.net/… --delay=1000 --out=tmp.jpg

The result in most cases is a tall skinny JPEG, because it renders the whole page — which can be very long — before imaging it. When I ran it over a 600-item collection, it hung a couple of times because of JavaScript errors. So I went to Internet Options->Browsing in IE, checked Disable script debugging, and unchecked Display a notification about every script error.
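
Another way to guard against hangs when capturing a large batch is to run each capture with a timeout. This is only a sketch: the function names and the 60-second timeout are my assumptions, and the options are the same three used in the command line above, written with two ASCII hyphens each.

```python
import subprocess

def capture_command(url, out="tmp.jpg", delay_ms=1000):
    """Build the IECapt argument list for one page."""
    return ["iecapt", "--url=" + url, "--delay=%d" % delay_ms, "--out=" + out]

def capture(url, out="tmp.jpg", delay_ms=1000, timeout_s=60):
    """Run IECapt once; return False instead of hanging if it stalls or fails."""
    try:
        # timeout_s is a hypothetical limit; subprocess.run kills the
        # child process if it has not finished by then.
        subprocess.run(capture_command(url, out, delay_ms),
                       timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return False

print(capture_command("http://example.com/"))
```

A batch loop can then log the URLs for which capture returns False and retry them later.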

Using ImageMagick to crop screenshots

Here’s a picture of an image produced by IECapt, overlaid with a rectangle marking where I want to crop:

The rectangle’s origin is at x=30 and y=180. Its width is 530 pixels, and height 500. Here’s the ImageMagick command to crop a captured image in tmp.jpg into a cropped image in 001.jpg:

convert -quality 100 -crop 530x500+30+180 -border 1x1 -bordercolor Black tmp.jpg 001.jpg
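
The part I always have to look up is the geometry argument, which is written WIDTHxHEIGHT+X+Y with a plain letter x. Here is a small sketch that builds the same command from the rectangle’s numbers; crop_command is a hypothetical helper, not part of ImageMagick.

```python
def crop_command(src, dst, x, y, width, height):
    """Build the ImageMagick convert command for one crop, as an argument list."""
    # Geometry is WIDTHxHEIGHT+X+Y: size first, then the offset of the origin.
    geometry = "%dx%d+%d+%d" % (width, height, x, y)
    return ["convert", "-quality", "100", "-crop", geometry,
            "-border", "1x1", "-bordercolor", "Black", src, dst]

# Origin at x=30, y=180; 530 pixels wide, 500 high.
print(" ".join(crop_command("tmp.jpg", "001.jpg", 30, 180, 530, 500)))
```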

I’m writing this down here mainly for myself. ImageMagick can do everything under the sun, but it always takes me a while to dig up the recipe for a given operation.

Parsing the WordPress export file

I found to my surprise that WordPress currently exports invalid XML. So the script starts with a search-and-replace that looks for this:

xmlns:wp="http://wordpress.org/export/1.0/"

And replaces it with this:

xmlns:wp="http://wordpress.org/export/1.0/"
xmlns:atom="http://www.w3.org/2005/Atom"
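
A minimal sketch of that search-and-replace, assuming the export has been read into a string and that the parse failure comes from an atom: prefix that is used but never declared. The function name is hypothetical.

```python
OLD = 'xmlns:wp="http://wordpress.org/export/1.0/"'
NEW = OLD + '\n xmlns:atom="http://www.w3.org/2005/Atom"'

def declare_atom_namespace(export_text):
    """Add the missing atom namespace declaration next to the wp one."""
    # Leave the text alone if the declaration is already there.
    if "xmlns:atom=" in export_text:
        return export_text
    # Replace only the first occurrence, which sits on the root element.
    return export_text.replace(OLD, NEW, 1)

sample = '<rss ' + OLD + '><channel><atom:link href="x"/></channel></rss>'
print(declare_atom_namespace(sample))
```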

Then it walks through the items in the Atom feed, extracting the various things that will become Pivot facets. For the description, it tries to parse the content:encoded element as XML, and find the first paragraph element within it. If that fails, it just treats the element as text and grabs the beginning of it.
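
The description logic can be sketched like this. It is an approximation of the approach, not the script itself: wrapping the content in a div so that it has a single root element, and cutting the fallback text at 300 characters, are both my assumptions.

```python
import xml.etree.ElementTree as ET

def first_paragraph(content, limit=300):
    """Return the text of the first <p> if content parses as XML, else its beginning."""
    try:
        # Assumption: wrap in a div so multiple top-level elements still parse.
        root = ET.fromstring("<div>" + content + "</div>")
        p = root.find(".//p")
        if p is not None:
            text = "".join(p.itertext()).strip()
            if text:
                return text
    except ET.ParseError:
        pass  # not well-formed XML, fall through to the plain-text path
    # Fallback: treat the element as text and grab the beginning of it.
    return content.strip()[:limit]

print(first_paragraph("<p>Hello <b>world</b></p><p>Second paragraph</p>"))
```

Typical blog HTML often fails the XML parse (unclosed tags, named entities), so the fallback path gets plenty of use.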

Weaving the collection

There are two control files that need to be synchronized. First, there’s dzc_output.xml, for the Deep Zoom collection. It has elements like this:

<I Id="596" N="596" Source="2245.xml">

Then there’s pivot.cxml which drives the visualization. It has elements like this:

<Item Id="596" Img="#596"
  Name="Freebase Gridworks: A power tool for data scrubbers"
  Href="http://blog.jonudell.net/2010/03/26/...
<Description><![CDATA[
I've had many conversations with Stefano Mazzocchi and David Huynh [1, 2, 3]
about the data magic they performed at MIT's Project Simile and now perform
at Metaweb. If you're somebody who values clean data and has wrestled with
the dirty stuff, these screencasts about a forthcoming product called
Freebase Gridworks will make you weep with joy.
]]></Description>
<Facets>
  <Facet Name="date">
    <DateTime Value="2010-03-26T00:00:00-00:00" />
  </Facet>
  <Facet Name="tag">
    <String Value="freebase" />
    <String Value="gridworks" />
    <String Value="metaweb" />
  </Facet>
  <Facet Name="comments">
    <Number Value="24" />
  </Facet>
</Facets>
</Item>

In this example, Source="2245.xml" in dzc_output.xml refers to a Deep Zoom image whose name comes from the WordPress post_id for that entry, which is:

<wp:post_id>2245</wp:post_id>

But Id="596", which is the connection between dzc_output.xml and pivot.cxml, comes from a counter in the script that increments for each item processed. I don’t know why the numbering of items in the WordPress export file is sparse, but it is, hence the difference.
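
To make the relationship concrete, here is a sketch of how a single counter can keep the two files in step. The function name weave and the starting value of the counter are assumptions, and the strings it produces are only the opening-tag fragments shown above, not complete elements.

```python
def weave(post_ids, start=0):
    """Pair a sequential item id with each (sparse) WordPress post_id."""
    dzc_lines, pivot_lines = [], []
    for item_id, post_id in enumerate(post_ids, start):
        # dzc_output.xml: sequential Id, Source named after the post_id
        dzc_lines.append('<I Id="%d" N="%d" Source="%s.xml">'
                         % (item_id, item_id, post_id))
        # pivot.cxml: the same sequential Id, referenced again as Img="#Id"
        pivot_lines.append('<Item Id="%d" Img="#%d"' % (item_id, item_id))
    return dzc_lines, pivot_lines

# Hypothetical post_ids; start=595 is chosen so the second item gets Id 596.
dzc, pivot = weave([2243, 2245], start=595)
print(dzc[1])
print(pivot[1])
```

Because both lists come out of the same loop, the Id values cannot drift apart even though the post_ids are sparse.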

Things to do

Here are some ideas for next steps.

1. Check the comment logic. I just noticed the counts seem odd. Maybe because I’m counting all comments instead of approved comments?

2. Use HTML Tidy to ensure that item content will parse as XML, and then count various kinds of elements within it: tables, images, etc.

3. Use APIs of various services (Twitter, bit.ly, etc.) to count reactions to each item.

My guest for this week’s Innovators show is Scott Rosenberg. He’s the author of two books, most recently Say Everything, subtitled How blogging began, what it’s becoming, and why it matters. Before that he was the Chandler project’s embedded journalist, and told its story in Dreaming in Code. His current project is MediaBugs, a soon-to-be-launched service that aims to crowd-source the reporting and correction of errors in media coverage.

We began with a discussion of Say Everything. Its account of how blogging came to be is a great read, and a much-needed history of the era. Since I know that story quite well, though, we focused on the blogosphere’s present state and future prospects. Blogging is still a new medium. But those of us who experienced blogging as a conversation flowing through decentralized networks of blogs have now seen still newer (and more centralized) social media capture a lot of that conversation.

The good news is that more people are able to be involved. The fact that millions of people fired up blogs was, and remains, astonishing. But active blogging has proven to be a hard thing to sustain. Meanwhile hordes of people find it relatively easy to be active on Facebook and Twitter.

The bad news is that, as always, there’s no free lunch. While it’s easier to create and sustain network effects using Facebook and Twitter, you sacrifice control of your own data. Scott thinks we’re moving through a transitional phase, and I hope he’s right. We really need the best of two worlds. First, control of the avatars we project into the cloud, and of the data that surrounds them, insofar as that’s possible. Second, frictionless interaction. The tension between these two conflicting needs will define the future of social media.

Two of Scott’s other projects, Dreaming in Code and MediaBugs, are connected in an interesting way. The media project adopts terminology (“filing bugs”) and process (version control, issue tracking) from the realm of software. If MediaBugs helps make non-technical people aware of that crucial way of thinking and acting, it will be a bonus outcome.
