Mystery outlet at O’Hare

28 Sep 2007 ~ Jon Udell ~ 20 Comments

On my last trip through Chicago’s O’Hare Airport I got fooled by the mystery outlet depicted in the photo. At a distance it looked like an AC power outlet, but it’s not one of these. Annoying, because I’d rather spread out on the floor than crouch among the huddled masses at the power bar. What is that thing anyway?

First look at Resolver, an IronPython-based spreadsheet

27 Sep 2007 (updated 10 Jun 2009) ~ Jon Udell ~ 55 Comments

Last month in an item about working with crime data I asked:

Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.

Several folks suggested that I should take a look at Resolver, an IronPython-based spreadsheet that deeply unifies Pythonic object-oriented programming with the sort of direct manipulation that makes the spreadsheet so useful. Resolver was and still is in private beta, but today’s screencast (Flash, Silverlight) will give you a good sense of what it’s all about.

The presenters are Giles Thomas, managing director and CTO of Resolver Systems (and creator of his own Resolver screencast), and Michael Foord, who blogs about Python, contributes to the IronPython cookbook, and is also working on the forthcoming book IronPython in Action.

If you are (or would like to be) using Python to wrangle business data, Resolver will make sense immediately. You’ll love the idea of wielding Python’s powerful data manipulation features in that context. You’ll appreciate what it would mean to harness not only the Python standard libraries but, because Resolver is IronPython-based, also the .NET Framework and the universe of third-party .NET assemblies. And you’ll be intrigued by the way in which the IronPython code that represents and animates a Resolver spreadsheet can be reused elsewhere — for example, in web applications.

But there’s more to the story. Because a cell in a Resolver spreadsheet can contain a reference to any .NET object, Resolver creates, as Giles Thomas says, “a somewhat pathological but entirely new way of programming using a spreadsheet.” You can, for example, define an anonymous function — say, a function that returns the square of its argument — and store it in cell B4. Then you can place a value — say, 5 — in cell A2. Then you can store this formula in cell B6:

=B4(A2)

That says: “Apply the squaring function in B4 to the value in A2.” The result in B6 will be 25.
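For readers who want to see the same idea in plain Python, here is a minimal sketch. It models the sheet as a dictionary keyed by cell name; the `cells` dictionary is an assumption made for illustration and is not Resolver's actual object model.

```python
# A toy model of the example above: a dict stands in for the spreadsheet grid.
# Illustration only; this is not Resolver's API.
cells = {}

cells["B4"] = lambda x: x * x   # an anonymous squaring function stored in a cell
cells["A2"] = 5                 # a plain value

# The formula =B4(A2): call the object held in B4 with the value held in A2.
cells["B6"] = cells["B4"](cells["A2"])

print(cells["B6"])  # 25
```

The point is only that a cell can hold a callable as easily as a number, so a formula can be a function call.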

I’ve long argued that the interactive and exploratory style of dynamic object-oriented languages is an important but underappreciated benefit. As I may have mentioned before, IronPython’s creator Jim Hugunin told me that when he first showed IronPython to folks at Microsoft, he was surprised by their reaction. He thought the big wow would be IronPython’s ability to streamline and accelerate use of the .NET Framework. But while people did appreciate that, they were truly wowed by something that’s second nature to every Python programmer — the read/eval/print loop which traces all the way back to the earliest Lisp systems.

It is a magical and powerful thing to be able to explore and modify a running program’s code and data. From those early Lisp systems to today’s Python and Ruby implementations, we have been doing that exploration and modification using a command line.¹ We can trick it out with recall, name completion, and search, but it’s still a command line with all the limitations that entails. If I’ve defined an object A and stored some code or data there, my definition and invocations of A will scroll out of view as I continue to work. They won’t be visually persistent.

In a Resolver spreadsheet, these objects are visually persistent. I haven’t yet got my hands on Resolver, but here’s an example of what I think that will mean. Suppose that I have a data set I want to transform, against which I’m testing five different versions of a transformation function. I’d put the data in cell A1, the functions in cells B1..B5, and the results in C1..C5. Now I’ll see everything at a glance. The spreadsheet that would conventionally have been the results viewer at the end of a series of tests becomes the environment in which the tests are written, performed, and evaluated.
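Here is a rough sketch of that layout in plain Python, again using a dictionary as a stand-in for the grid. The sample data and the five candidate functions are hypothetical, chosen only to show the shape of the idea; none of this is Resolver code.

```python
# Toy grid keyed by cell name (an assumption for illustration,
# not how Resolver stores a sheet).
sheet = {"A1": [3, 1, 2]}

# Five hypothetical candidate transformations, one per cell B1..B5.
candidates = [
    sorted,
    lambda data: sorted(data, reverse=True),
    lambda data: [x * 2 for x in data],
    sum,
    max,
]

for row, func in enumerate(candidates, start=1):
    sheet["B%d" % row] = func
    # Each C cell holds the result of applying the matching B cell to A1.
    sheet["C%d" % row] = func(sheet["A1"])

results = [sheet["C%d" % row] for row in range(1, 6)]
for row, value in enumerate(results, start=1):
    print("C%d" % row, value)
```

In a command-line session these five results would scroll away; in a grid they stay side by side with the functions that produced them.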

The spreadsheet is also an important bridge between programmers and their business sponsors. It’s no accident that Ward Cunningham’s FIT (Framework for Integrated Test) was originally inspired by Ward’s experience of inviting business analysts to write test cases in spreadsheets. In its current form, FIT uses HTML tables in a wiki as the bridge between analysts who write tests and developers who write the code that must pass those tests. I think Resolver and FIT may prove to be a marriage made in heaven.

While Resolver will initially appeal to business programmers who appreciate Python as a language, and IronPython as a way of leveraging the .NET Framework and .NET-based business logic, the ideas it embodies transcend Python and .NET. I’ll be fascinated to see how this “pathological but entirely new way of programming using a spreadsheet” will evolve.

¹ Smalltalkers will note that they have been using a three-pane browser all along, and that’s true. However, the spreadsheet metaphor, in this context, is something else again.

Screencasting and scripting

26 Sep 2007 ~ Jon Udell ~ 8 Comments

I was chatting the other day with Jim Hugunin about an earlier posting on automation and accessibility, and Jim highlighted a point that’s worth calling out separately. If you had a script that could drive an application through all of the things shown in a screencast, you wouldn’t need the screencast. The script would not only embody the knowledge contained passively in the screencast, but would also activate that knowledge, combining task demonstration with task performance.

Of course this isn’t an either/or kind of thing. There would still be reasons to want a screencast too. As James MacLennon pointed out yesterday:

All too often, the classic on-the-job training technique has been “just follow Jim around, and do what he does for the next three weeks …”. This kind of unstructured training doesn’t lend itself easily to written documentation – it’s the nature of the process as well as the nature of the people. Video, however, allows us to simulate this “follow him around” approach.

Citing Chris Gemignani’s Excel recreation of a New York Times graphic, James says:

This kind of approach clicked with me, because this was my preferred method for learning a new programming environment. If I could just get an experienced programmer to take me through the edit / compile / debug / build cycle, I would be off and running.

So you’d really want both the screencast and the script — for extra credit, synchronized to work together.

What stands in the way of doing this? Don’t the Office applications, for example, already have the ability to record scripts? Yes, they do, but that flavor of scripting targets what I called engine-based rather than UI-based automation. Try this: Launch Word, turn on macro recording, and then perform the following sequence of actions:

Mailings
Recipients
Type New List

Now switch off the recorder and look at your script. It’s empty, because you haven’t yet done anything with the engine that’s exposed by Word’s automation interface; you’ve only interacted with the user interface in preparation for doing something with the engine.

It would be really useful to be able to capture and replay that interaction. And in fact, I’ve written a little IronPython script that does replay it, using the UI Automation mechanism I discussed in the earlier posting. It’s not yet even really a proof of concept, but it does contain three lines of code that correspond exactly to the above sequence. Each line animates the corresponding piece of Word’s user interface. So when you run the script, the Mailings ribbon is activated, then the Recipients button is highlighted and selected, and then the Type New List menu choice appears and is selected.

What I’m envisioning here is UI-based semantic automation. I call it UI-based to distinguish it from the engine-based approach that bypasses the user interface. I call it semantic because it deals with named objects in addition to keystrokes and mouseclicks. Is this even possible? I think so, but so far I’ve only scratched the surface. Deeper in there be dragons, some of which John Robbins contends with in the article I cited. I’d be curious to know who else has fought those dragons and what lessons have been learned.

Talk faster! No, slower!

25 Sep 2007 ~ Jon Udell ~ 5 Comments

Peter Williams:

I learned a couple of things and it spurred some interesting ideas. However, neither of them talk very fast…I just cannot stand that most people talk so slowly.

Peter Ring:

Have no fear of pauses, they help frame and structure the noise between the pauses.

Silverlight for screencasters

24 Sep 2007 ~ Jon Udell ~ 9 Comments

I’ve been doing some experiments to find out how the Silverlight plug-in will work as a player for screencasts. On this test page you’ll find four different versions of a 23-second clip. There’s one for Quicktime, one for Windows Media, one for Flash, and one for Silverlight.

Some important variables, from a screencaster’s perspective, are: legibility, file size, and convenience of production, deployment, and viewing.

That legibility matters seems obvious, but I see an awful lot of screencasts delivered at squinty resolutions. This puzzles me. The purpose of a screencast is to show and describe on-screen action. If you can’t read the screen, what’s the point?

All four of these examples are legible. The Quicktime version achieves the best clarity, but there’s a tradeoff: it’s also the largest file.

That size matters is perhaps less obvious to those of us living in the developed world. But as I’ve been recently reminded by both Beth Kanter and Barbara Aronson, much of the world remains bandwidth-challenged. Videos that don’t squeeze themselves down will not be seen in many places where they should be.

Among these four examples, Windows Media weighs in lightest at under half a megabyte. That works out to about a megabyte per minute, which is the target I like to shoot for. If it’s possible to deliver a legible screencast at a data rate significantly less than that, I’d like to know how.

The sizes of the other versions in this example, in ascending order: Flash 1.2MB, Silverlight 1.5MB, Quicktime 2MB.
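The per-minute arithmetic for all four versions can be sketched in a few lines of Python. The sizes are the ones quoted above and the clip length is the 23 seconds mentioned earlier. The 0.5 figure for Windows Media is an assumed upper bound, since that file is described only as under half a megabyte.

```python
CLIP_SECONDS = 23  # length of the test clip

# Sizes in megabytes as reported above. 0.5 is an assumed upper bound for
# Windows Media, which is described only as "under half a megabyte".
sizes_mb = {
    "Windows Media": 0.5,
    "Flash": 1.2,
    "Silverlight": 1.5,
    "Quicktime": 2.0,
}

def mb_per_minute(size_mb, seconds=CLIP_SECONDS):
    """Scale a clip's file size to a megabytes-per-minute data rate."""
    return size_mb * 60.0 / seconds

rates = {name: mb_per_minute(size) for name, size in sizes_mb.items()}

# Print the formats from lightest to heaviest.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print("%-14s %.1f MB/min" % (name, rate))
```

Because the Windows Media figure is a ceiling, its real rate sits somewhat below what this prints, closer to the megabyte-per-minute target.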

Of course these sizes depend on which encoder is used, and on which settings are applied. For these tests, I produced all of the screencasts in Camtasia. For Quicktime and Windows Media, Camtasia uses the encoders that come with those platforms. For Flash, it supplies an encoder. For Silverlight, it doesn’t yet supply an encoder so I produced an uncompressed AVI and then used Expression Encoder to create a Silverlight-compatible WMV file.

I should add here that, despite all the work I’ve done in this area, I’m still a bit vague on the concept of a screen encoder — that is, a video encoder that’s tuned for the kinds of low-motion but text-rich content that’s typical of screencasts. In beta versions of Silverlight and Expression Encoder, for example, there wasn’t a screen video option, so the only way to produce a legible screencast was to crank up a motion-video encoder to the maximum data rate, which produced a massive file. Now Expression Encoder provides a screen encoding option, which I used for this test and which Silverlight 1.0 can obviously play back.

It seems to me that Camtasia should be able to use that encoder directly, but until I figure out how, it will be less convenient to produce Silverlight screencasts from Camtasia than to produce the other formats. Rendering to AVI as an intermediate step is doable, but time-consuming.

In terms of deployment convenience, one measure is the number of supporting HTML, JavaScript, configuration, and other files required in order to play a screencast. I’m a minimalist, so when I deploy Camtasia screencasts I throw away the wrappers that Camtasia generates and go with the Simplest Thing That Could Possibly Work. From my perspective, that winds up being an OBJECT tag (and, sigh, also an EMBED tag) for Quicktime or Windows Media, plus a reference to a minimal player in the case of Flash. By comparison, my Expression-generated Silverlight example has lots of moving parts — an HTML file, a XAML file, a flock of JavaScript files, and the WMV file.

The Silverlight example could of course be simplified by coalescing the JavaScript support, but that alone won’t solve another issue of deployment convenience. It’s nice to be able to embed a screencast in any arbitrary web host. From the perspective of my WordPress.com blog, that’s an issue for all four of these approaches. WordPress is always coming up with new ways to embed video from various services, but the reason that’s necessary is that WordPress.com — quite rationally — strips out most of the advanced HTML tags and JavaScript support that you might want to include in your blog postings. In general, embedded video seems to be a game of point solutions. In order to embed video flavor X in web host Y you need a specific X+Y adapter. I understand the reasons why, but it’s frustrating.

One of those adapters, by the way, will be needed for WordPress.com and Silverlight Streaming, which is the Microsoft hosting service announced at the MIX conference earlier this year. I’ve hosted another version of my Silverlight example there. It’s the same set of files as this example, minus the HTML wrapper and the core Silverlight JavaScript code, plus an XML manifest, all packaged up in a zip file. I’m not expecting my little test to attract millions of viewers, but if I were, this hosting service would be one way to handle the load.

In terms of viewing convenience, the Silverlight example exhibits a nice property that I wasn’t expecting. When you resize the window containing the player, the player scales to fit. I’m pretty sure the embeddable Quicktime and Windows Media players can’t do that. Flash-based media players are more customizable, and can respond to container resize events, but I don’t think I’ve ever seen the technique applied to a screencast. It’s a nice idea. A screencast at 1:1 resolution is guaranteed to be legible, but will also consume a lot of screen real estate. So it’s tempting to shrink its width and height in production. But by how much? Any fixed resolution will work well for some people and not others. Resizable screencasts would be great for accessibility.

Of course you can resize any standalone player. So this issue boils down to what’s possible when the player is embedded in a web page. And as we’ve seen, embedding can be problematic. In general, we need to work toward a smoother transition between embedded and standalone viewing experiences.

The ultimate test of viewing convenience is, of course: Does it play instantly, regardless of the operating system or browser I happen to be using? Flash leads the way in that regard. Silverlight aspires to the same level of plug-in ubiquity, and with the announcement of Moonlight that aspiration seems achievable.

Ultimately a screencaster wants to be able to produce one video that works well for everyone, everywhere, for various definitions of works well. That’s a hard problem. Solutions depend on the raw capabilities of media players, it’s true. But they also depend on an ecosystem of plug-ins, browsers, encoders, operating systems, and hosting services.

A conversation with Beth Kanter about social software and non-profit organizations

24 Sep 2007 ~ Jon Udell ~ 2 Comments

My guest for this week’s ITConversations show is Beth Kanter. We share a common interest in showing people how and why to use social software. In this conversation Beth reflects on her work with “digital immigrants” in non-profit organizations. The cornucopia of free services is a blessing for these organizations. But even when financial hurdles are stripped away, conceptual hurdles remain. Helping people to understand what’s possible, and to exploit online services in appropriate ways, is both a great challenge and a great opportunity.

The fourth platform

21 Sep 2007 ~ Jon Udell ~ 7 Comments

In my podcast with Ed Iacobucci about DayJet’s approach to reinventing air travel, Ed recalls the moment when he knew that the Eclipse VLJ (very light jet) represented the hardware component of a new platform. His contribution would be to create the operating system that would enable new travel applications.

Antonio Rodriguez, who joined me for another podcast about Tabblo, the online photo service he founded, enjoyed the Ed Iacobucci podcast but concluded:

I think I am beginning to develop an aversion to the term platform.

When I read that, I said to myself: “Yeah, me too.”

But we can’t help ourselves. On the very same day, Antonio responded to Marc Andreessen’s taxonomy of platforms. So did lots of others, including Joshua Allen.

Marc’s post defines three levels of platform:

1. Flickr-style data-access APIs.
2. Facebook-style containers of “Internet plug-ins.” While hosted by their container, these plug-in applications must also provide their own life support.
3. Ning-, Salesforce-, and Second Life-style runtime environments that fully support their dependent applications.

As both Antonio and Joshua point out, Marc’s level 3 runtime leads to the sort of lock-in scenario that developers have learned to regard with suspicion. Antonio pushes back on Marc’s characterization of Amazon’s S3 and EC2 as “only sort of” a level 3 platform. Facebook alone may be a level 2 platform, he says, but in combination with Amazon’s neutral infrastructure it already reaches level 3. And of course Amazon’s services can recombine with other level 2 platforms to yield other level 3 platforms.

Joshua, meanwhile, questions the need for levels 2 and 3. He thinks that Marc’s base platform — the data flowing at level 1 — has potential we’ve scarcely begun to realize. I agree. Syndication-oriented architecture surely has limitations, but we haven’t run into them yet.

It’s also worth noting that Marc’s taxonomy is wholly cloud-centric:

I think that kids coming out of college over the next several years are going to wonder why anyone ever built apps for anything other than “the cloud” — the Internet — and, ultimately, why they did so with anything other than the kinds of Level 3 platforms that we as an industry are going to build over the next several years.

I’ll go along with that, but only if we can extend our definition of the cloud to encompass what the Internet originally was: a network of peers. With rare but notable exceptions (e.g. BitTorrent) it hasn’t been that for a long time. I think it will be that again. There’s a level 4 platform waiting in the wings. At level 4, the cloud of storage and computation is partly centralized in a handful of intergalactic clusters, and partly distributed across a network of humble peers. Microsoft’s forthcoming Internet service bus is one example of a level 4 platform. I hope, and expect, we’ll see others.

Appreciating Common Craft’s “paperworks” sketchcasts

19 Sep 2007 ~ Jon Udell ~ 8 Comments

I am an immediate fan of Common Craft’s style of concept videos. Their explanations of how and why to use del.icio.us and Google Docs are crisp and entertaining. They convey the essence of these activities more clearly than any other visual explanations I’ve seen, including many of the screencasts I’ve made.

The style is called paperworks because these sketchcasts are made by capturing screenshots, printing out key elements, and then filming, animating, annotating, and narrating arrangements and rearrangements of these scraps of paper. The first time you watch one, you’ll be captivated: it’s cute, it’s fresh. But is this just a gimmick? After you watch a few more, and you begin to acclimate to the style, does its effectiveness wane? Not yet, for me, because these productions have more going for them than cuteness and freshness.

One of the principles at work here is the moral equivalent of cropping and zooming in the screencast medium. When you’re trying to explain software on a conceptual level, images captured from screens can be a mixed blessing. It’s valuable to show exactly what screens look like, and exactly how actions flow within and across them. But the amount of detail that’s visible in a typical screen can often distract from the story you’re trying to tell. By cropping the screen, and/or by zooming in on the active region, you can prune away a lot of visual clutter and focus on key interactions. The paperworks style is an extreme form of cropping and zooming; it prunes and focuses very aggressively.

Another principle is sketching. According to Bill Buxton, sketching goes hand in hand with what he calls design thinking. When I asked Bill how he would have used sketching in the design of a feature like the Office ribbon, he said:

You’d start with paper prototyping — quickly hand-rendered versions, and for the pulldown menus and other objects you’d have Post-It notes. So when somebody comes with a pencil and pretends it’s their stylus and they click on something, you’ve anticipated the things they’ll do, and you stick down a Post-It note.

If that’s a helpful way to imagine software interaction in the design phase, why wouldn’t it also be a helpful way to conceptualize the software in use? The paperworks style strongly suggests that it is. These sketchcasts are great visual explanations of working software. I suspect they’d be equally useful during the design of that software.

A conversation with Ed Iacobucci about the reinvention of air travel

15 Sep 2007 ~ Jon Udell ~ 25 Comments

In Free Flight, the seminal book on the forthcoming reinvention of air travel, James Fallows tells a story about Bruce Holmes, who was then the manager of NASA’s general aviation program office. For years Holmes clocked his door-to-door travel times for commercial flights, and he found that for trips shorter than 500 miles, flying was no faster than driving. The hub-and-spoke air travel system is the root of the problem, and there’s no incremental fix. The solution is to augment it with a radically new system that works more like a peer-to-peer network.

Today Bruce Holmes works for DayJet, one of the companies at the forefront of a movement to invent and deliver that radically new system. Ed Iacobucci is DayJet’s co-founder, president, and CEO, and I’m delighted to have him join me for this week’s episode of Interviews with Innovators.

I first met Ed way back in 1991 when he came to BYTE to show us the first version of Citrix, which was the product he left IBM and founded his first company to create. As we discuss in this interview, the trip he made then — from Boca Raton, Florida to Peterborough, New Hampshire — was a typically grueling experience, and it would be no different today. A long car trip to a hub airport, a multi-hop flight, another long car trip from hub airport to destination.

In a few weeks, DayJet will begin offering a different kind of experience for travel within a trial network of small Florida airports. If all goes well, the network will then expand to the entire Southeast, and eventually — I sincerely hope — will reactivate small airports around the country, including the one that’s two miles from my home.

In this interview, Ed describes how he worked through a false start, realized that on-demand air travel would require a platform, decided that Eclipse Aviation’s line of precision-engineered, mass-producible, and affordable jets would be the platform’s equivalent to the personal computer, and then conceived and created its network operating system and software service infrastructure.

There were two major research and development challenges. First, how do you find an optimal routing solution when there’s no fixed schedule and when every new reservation ripples through the entire network?

It didn’t take very long to figure out that if you replace one 25-million-dollar plane with 25 one-million-dollar planes, it fixes a lot of problems. And if you couple that with doing it by the seat instead of by the plane, that lets you interleave packets, or payloads, and increases the efficiency even more. So it became very clear that we needed to build a large, self-optimizing network that would take a lot of other factors into consideration, like the physics of the airplane, the temperature, the loads. The beauty of aviation is that it’s like physics meets business, right? How much you can carry depends on temperatures, altitudes, runway lengths — and safety is all expressed in terms of parameters that the optimizer has to take into account as it starts shuffling around customers. It’s not a straight optimization, it has to be done in real time, and it has an incredible number of constraints.

So I hired mathematicians, really smart guys, and we brought them on and gave them the challenge of their lifetime — really, for the rest of their lives — because you’ll never find a solution to the problem, it’s what mathematicians call NP-hard, which means you can take every computer made between now and the end of our lives, and run them until the end of our solar system, and you’ll never find the optimal solution. You have to move from traditional hard optimization to heuristics aided by optimization techniques. So then we brought in an operations research group from Georgia Tech, real heavy hitters who did optimization for large air carriers. But optimizing assets around a fixed schedule is a vastly different problem from trying to determine the most optimal solution in real time for something that doesn’t have a fixed schedule and morphs with every new request that comes in. Their response was, “Nobody’s ever done this.”

If it were just chartering airplanes, that’s not very exciting. But now we’ve got new science, new math, it’s a lot of green fields in areas where we could get collaboration with major universities where topnotch people want to work with us, and assign Ph.D. students to work with us.

The goal was to be able to respond yes or no, within ten seconds, to a customer’s request for a flight between two participating airports, and that goal has been achieved. Given that capability, DayJet has been able to create a new business model that prices tickets according to the value of each customer’s time. If you value your time highly, you can request a narrow time window for your flight, and pay more for your ticket. If you’re willing to accept a wider window — say, you’re OK with leaving anywhere between 10AM and 3PM — you’ll pay a lot less. Ed calls this “time arbitrage” and it’s at the heart of what’s really revolutionary about this system from the customer’s point of view.

There was another big problem. As you build out the network of regional airports, you have to make big asset investment decisions. How can you model demand in order to guide those investment decisions?

Believe me, there are many more ways to go bankrupt than to make money. What we’re really building here is a value network, and the composition of that network determines the load. If I have a network of nodes A, B, C, and D, and I add a new node, E, that can have an impact on all the nodes at various times of the day. But if I add F, that could impact some nodes differently than others. It’s an interrelated loading problem that’s very difficult to model. So I thought, OK, we’ve got these guys taking chaos and organizing it into order, so we can file flight plans and make it all look organized, or actually be organized, on the back end. What I need is another group of people to create organized chaos, or complexity, to mimic the behaviors of a region of travelers, that can be used to test how well we can reorganize that chaos into order. That’s not simple either, it’s going to depend on pricing, and time/value tradeoffs, and density of your transportation network, and what nodes you introduce, and what the interactions between the nodes are, because every city you introduce has a different effect on the others.

I realized we needed the kind of thing that SimCity represents. When I was in school we called it discrete time simulation, but then it got a biological twist and became complexity science, and at one point chaos theory, though complexity science is the more accepted term. Along with one of my directors I had served on the board of the BIOS Institute in Santa Fe, an offshoot of the Santa Fe Institute which was biological or evolutionary modeling of large complex systems. So I got in touch with some of those guys and we offered them a job. We said, hey, come on board and we’re going to build the most sophisticated regional travel model that’s ever been built, and we’re going to use it not just to postulate the future but to build a business.

So they came on and worked for about four years and came up with this other piece of technology, which is married to the optimizer, and the simplest way to describe it is as SimCity on steroids, very targeted on the problem of regional travel. So we’ve got nine different types of agents or sims, populated using IRS statistics, operating in ten-square-mile zones, they all have different rules on how they book trips and what flexibility they have. Then we loaded on top of that a bunch of demographic data — some we bought, some we got from DOT, some from IRS. And then we loaded all the schedules for all the airlines between all the airports in the contiguous 48. And then we developed algorithms so we could estimate driving times, and added time-of-day congestion through various nodes. And then we added train schedules. The result is a very sophisticated, very high-fidelity model of the transportation options you would face if you lived in one ten-square-mile region of the US, and needed to go to another one.

The story so far sounds like a high-tech dream come true, and if DayJet succeeds that’s just what it will be. But there are a couple of big reality checks. First and foremost is regulation. Although DayJet’s technology is built to exploit the benefits of extreme virtualization, the FAA places severe limits on how far that can go. So while DayJet would have preferred not to own and operate its entire fleet, hire and train all its crews, and manage all of its airport facilities, that’s exactly what it must do to meet current regulations. The business can only succeed if it works within the current regulatory regime. Of course if it does succeed on those terms, and if that success paves the way for regulations that are friendlier to a more virtualized approach, DayJet’s travel-oriented network operating system will become all the more valuable.

Another reality check is the customer’s experience. Because there are no fixed schedules, there’s much more fluidity than you can reasonably present to a customer. Internally the system may reschedule your trip a dozen times, but you won’t want to be flooded with rescheduling notifications.

This is what we’ve been wrestling with for the last six months. What it means is that you just add more constraints. We won’t flip you around all over the place. You start by negotiating as big a window as you can accept, because the bigger the window, the cheaper the ticket. And it’s not a departure window, it’s a window in which we will complete the mission. Then the challenge operationally is how to shrink those windows down as you get closer to flight time, leaving enough space for disruption recovery. We’re learning, and we’ve discovered that the night before we can crunch that window down. So we notify the customer the night before that you have to be at the airport by time X, and you’ll be at your destination by time Y.

It’s all music to my ears. I’ve been dreaming about this for years. Ed’s actually doing it, and I can’t wait to see how things turn out.

Tools of the trade

14 Sep 2007 ~ Jon Udell ~ 12 Comments

My wife, who is an artist, recently picked up a copy of David Hockney’s book Secret Knowledge: Recovering the Lost Techniques of the Old Masters. She’d been having a discussion with an artist friend of hers about whether it’s wrong to use tracing, or other optical aids, when doing illustrations or paintings. In the book, Hockney advances the highly controversial theory that the dramatic surge in visual realism that occurred in the early 15th century was propelled by the use of optical projection techniques. The old masters, he claims, used mirrors, lenses, and the camera obscura to capture the outlines of the people and objects they painted.

Hockney says that a newly-available form of visualization led him to this conjecture:

Now with colour photocopiers and desktop printers anyone can produce cheap but good reproductions at home, and so place works that were previously separated by hundreds of miles side by side. This is what I did in my studio, and it allowed me to see the whole sweep of it all. It was only by putting pictures together in this way that I began to notice things; and I’m sure these things could only have been seen by an artist, a mark-maker, who is not as far from practice, or science, as an art historian. After all, I’m only saying that artists once knew how to use a tool, and that this knowledge was lost.

It all sounded perfectly plausible to me, but the art world isn’t buying it. Hockney’s critics cite a research paper by Microsoft’s Antonio Criminisi and Ricoh’s David Stork in which the authors show that Hockney’s central example — a painting of a chandelier — exhibits more irregularity than you’d expect if it had been rendered using an optical aid.

It’s fascinating stuff. I was already thinking about interviewing some folks at Microsoft Research about the state of computer vision, so maybe this will be the place to start.

But meanwhile, I’m left wondering about the context of the debate. Setting aside for the moment whether Hockney is right or wrong about the old masters’ use of optical aids, would they have been wrong to have used them in the way he suggests? Is that cheating? Should it diminish our appreciation of the work?

The art world says yes, it is cheating and does cheapen the work. But Hockney doesn’t see it that way. These optical aids, he argues, were just tools used by professionals who wisely chose to automate where they could in order to free up time and energy so they could add creative value where it mattered most.

My wife’s artist friend concurs. It’s not that she can’t draw freehand. She can and she does, but she also uses tracing techniques to identify landmarks and — because it’s commercial work she’s doing — to speed up some of the foundation-laying drudgery.

I’m sure the analogy is imperfect but, to a software guy, this all sounds very familiar. There are right and wrong ways to rely on software tools and frameworks, but I don’t think less of programmers who rely on them in the right ways. On the contrary, I think less of programmers who don’t.

The blurred line between personal information management and publishing

12 Sep 2007 (updated 10 Jun 2009) ~ Jon Udell ~ 16 Comments

When I mothballed my InfoWorld blog and moved in here, I decided not to use WordPress categories but instead to continue the del.icio.us-based method I’d been using before. Of the many strategies woven together in my use of del.icio.us, two principal ones are keeping track of stuff in general, and keeping track of my own stuff. In terms of the latter, I like to be able to answer a question like this:

Where is your collection of articles about how to do screencasting?

With an answer like this:

http://del.icio.us/judell/howto+screencasting

If I relied on WordPress categories, the scope of such a query would be restricted to my WordPress blog. Because I use del.icio.us tags instead, the scope can include my old blog, my new blog, essays I’ve published elsewhere, and of course material from anywhere else on the web.

So that was the plan, but when I switched blogs I never got around to adapting my tagging workflow to the new setup. After a while I began to realize that I couldn’t answer questions with URLs because none of my recent items were queryable in that way.

So I went through and tagged all the items in this blog, from January to August, in a single blitz. That might sound like an insurmountable task but really it isn’t. I exported the blog to a file, captured just the titles and links, and opened those up in a browser. Then I grabbed items in batches of twenty or so, opened them into tabs, and worked through them. It took an hour and a half. Being the tagaholic that I am, it wasn’t just an exercise in drudgery. I appreciated the opportunity to reflect on the evolution of my tag vocabulary.

At the time I did worry about how this would look to somebody watching my del.icio.us tagstream. And for good reason. Here’s how it looked to Chris Muscarella:

Jon Udell tags his own things almost exclusively. That’s lame.

Historically that’s not true, but recently it looks that way, and in any case it’s a fair comment. When you mix personal information management with publishing, the lines can get blurry.

On reflection I realized that I’d made things worse by including my del.icio.us links in the blog’s sidebar. On my old blog, I filtered these to not include my own postings, which are all identified with the tag jonudell. (And eerily, although that blog is mothballed, it is still syndicating my current non-personal del.icio.us links.) I could probably do that here as well, but not with the WordPress del.icio.us widget. It offers a filter for tag inclusion:

Show only these tags (separated by spaces):

But there’s no filter for tag exclusion (e.g., everything not tagged jonudell). So I’ve yanked that widget for now. Come to think of it, that same exclusion filter would be useful for my del.icio.us feed. Should WordPress and del.icio.us add these features? Perhaps. Then again, this is exactly the sort of thing a general purpose syndication bus ought to be able to do for us.
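Here is a minimal sketch of what such an exclusion filter might look like in plain Python. The item structure (a dict with a title and a list of tags) and the sample bookmarks are assumptions made up for illustration; this is not the del.icio.us feed format or the WordPress widget’s API.

```python
def exclude_tag(items, banned):
    """Keep only the items that do not carry the banned tag."""
    return [item for item in items if banned not in item["tags"]]

# Hypothetical bookmarks; in practice these would come from a parsed feed.
bookmarks = [
    {"title": "Mystery outlet at O'Hare", "tags": ["jonudell", "travel"]},
    {"title": "CoScripter", "tags": ["automation", "socialscripting"]},
    {"title": "Tools of the trade", "tags": ["jonudell", "art"]},
]

# Everything not tagged jonudell.
others = exclude_tag(bookmarks, "jonudell")
```

The inclusion filter the widget already offers is the same list comprehension with the test reversed.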

A conversation with Rohit Khare about syndication-oriented architecture

11 Sep 2007 (updated 10 Jun 2009) ~ Jon Udell ~ 26 Comments

This week’s ITConversations podcast with Rohit Khare focuses on a topic that is near and dear to my heart: syndication. For both of us, that is the real substance of Facebook. Says Rohit:

Imagine there’s an application someday with 35 million users, and the first thing they see every morning is a news feed, and it’ll do a really intelligent job of summarizing what everyone they know has been up to since they last logged in. You wouldn’t have thought, “I need to sign up for a new consumer service that will tell me when people break up or get married or give talks.” And yet here we have this wonderful new phenomenon showing that there is pent-up demand. Now you can come back to the office and say, “Don’t you wish you had an interface like that so all of our field service techs could know what was going on, and be just as collaborative as this is?”

So how do we get there? Start by “RSSifying” everything in sight. Then flow all the feeds through a “syndication bus”:

You do in some ways centralize the information flow, but you get the benefit of decentralized awareness — it’s an interesting paradox. If I have one syndication bus that’s responsible for delivering information to all of my users, and everyone in the community, then that same piece of software is in a very good position to detect patterns and emerging trends. If you think about meme trackers that can report, hey, this is a hot story that’s come up in the last few hours, that’s going to be really powerful when it mainstreams.
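To make the pattern-detection idea concrete, here is a rough sketch of how a bus that sees every item could spot a hot story. It assumes each item reaches the bus as a (feed, link, timestamp) tuple, with the timestamp in seconds; the function name, the thresholds, and the sample data are all hypothetical, and nothing here describes KnowNow’s software.

```python
from collections import defaultdict

def hot_links(items, now, window_hours=6, min_feeds=2):
    """Return (link, feed_count) pairs for links mentioned by at least
    min_feeds distinct feeds within the last window_hours,
    most widely mentioned first."""
    cutoff = now - window_hours * 3600
    feeds_by_link = defaultdict(set)
    for feed, link, timestamp in items:
        if timestamp >= cutoff:
            feeds_by_link[link].add(feed)
    hot = [(link, len(feeds)) for link, feeds in feeds_by_link.items()
           if len(feeds) >= min_feeds]
    return sorted(hot, key=lambda pair: (-pair[1], pair[0]))

# Made-up traffic: two recent mentions of one story, one stale mention,
# and one unrelated link.
now = 100 * 3600
items = [
    ("alice", "http://example.com/story", now - 1 * 3600),
    ("bob", "http://example.com/story", now - 2 * 3600),
    ("carol", "http://example.com/story", now - 30 * 3600),
    ("carol", "http://example.com/other", now - 1 * 3600),
]
trending = hot_links(items, now)
```

Because every feed flows through one place, the counting needs no crawling. That is the centralized half of the paradox Rohit describes.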

By way of disclosure, the backstory for this interview begins in 2002 when Rohit, who had co-founded KnowNow in 2000, gave a great talk at the Emerging Technology conference on what he was then calling application-layer internetworking (ALIN). (I mentioned it in this InfoWorld column.) Among his other talents, Rohit is a great coiner of sticky buzzphrases and acronyms. Phil Windley, for example, conceded that ALIN was catchier than his own Layer 5 routing for web services.

Then in 2004, I interviewed KnowNow’s Michael Terner and Richard Treadway. The company’s tagline then — Simple Integration Connecting Data, Applications, and People: Business-to-Business, Event-Driven, Loosely-Coupled — was descriptive but decidedly less catchy.

Now Rohit and KnowNow are pitching a new buzzphrase, Syndication-Oriented Architecture, and a new acronym, SynOA. We are admittedly pushing the envelope when it comes to variations on the -OA theme, but I can’t help myself, I like this one for two reasons. First, the idea of syndication needs all the marketing help it can get. We’ve been at this for almost a decade and it hasn’t really caught on in the way it deserves to. Second, it’s just so obviously the right thing on so many levels, one of which happens to be information flow within the enterprise.

Automation and accessibility

10 Sep 2007 ~ Jon Udell ~ 7 Comments

In last week’s item on social scripting, I suggested that CoScripter’s automation strategy — based on simple English instructions that people can easily read, write, and share — could in theory work across the continuum of application styles. And arguably it will need to, because we’re increasingly likely to mix those styles. If you begin to rely on an automation sequence for your bank’s web application, for example, you’ll be sorry to have it broken by an upgrade that introduces AJAX, Flash, or Silverlight components.

What enables CoScripter to work in the web domain is the document object model (DOM) of which every web page is a rendering. Because JavaScript code can explore and interact with the DOM’s tree of user-interface objects, the browser can be driven semantically, by object names and properties, rather than literally, by mouse clicks and keystrokes. The literal method is workable, and there are many tools that make excellent use of it. The semantic method is more reliable if available, but it isn’t always. So the literal method winds up being the common denominator, because every style of application will respond to mouse clicks and keystrokes.

There is another kind of semantic technique long supported by desktop applications that define object models, notably the Mac’s AppleScript object model and Windows’ Component Object Model. These technologies enable automation scripts to reach below the user interface of applications, and to work with their internal machinery.

Using the Word object model, for example, you can automate a mail merge. If you run this program, you’ll see Word launch, you’ll see a data document written by an invisible hand, and then you’ll see a mail merge appear. What you won’t see are the user-interface actions required to produce these effects, because this level of automation bypasses the user interface.

So let’s distinguish between two flavors of semantic automation. The mail merge script does what I’ll call engine-based semantic automation. And CoScripter does what I’ll call UI-based semantic automation.

These two flavors are useful in quite different ways. With the engine-based approach, an automation script uses the application as if it (the application) were a service. In this case you don’t want windows and dialog boxes popping up all over the place, you just want to feed inputs and harvest outputs. The engine-based approach works accurately and efficiently, but it doesn’t yield a representation of task knowledge that a normal person could use, learn from, adapt, or share.

With the UI-based approach, an automation script uses the application as if it (the script) were a human being. It sees and touches exactly what the human sees and touches. This is not the optimal way to crank out a thousand mailing labels. But the UI-based approach does yield a representation of task knowledge that a normal person could use, learn from, adapt, or share.

Shareable representations of task knowledge are incredibly useful and powerful. Screencasts are one such representation, and as many people have noticed in recent years, they can radically outperform traditional forms of documentation. But you can’t interact with a screencast or concisely describe it. You can only watch and learn and imitate. Although that’s way better than not being able to watch and learn and imitate, interaction and concise description would be better still.

CoScripter delivers that superior experience of interaction and concise description. It does so by means of UI-based semantic automation which, in turn, is enabled by the browser’s document object model.

What might enable a more comprehensive flavor of UI-based semantic automation? Noodling on this question I arrived at one possible answer: the Windows UI Automation API, which is part of .NET Framework 3.0. I’d heard of it, but hadn’t connected the dots. In this June 2005 article for the ACM’s Special Interest Group on Accessible Computing, Rob Haverty lays out the rationale for this relatively new mechanism:

Windows UI Automation unifies disparate UI Frameworks such as Avalon [Windows Presentation Foundation], Trident [the browser], and Win32 so that code can be written against one API rather than several.

The basis of this unification is a tree of automation elements that is, in effect, a generic document object model. Automation providers map various specific object models, notably those of the browser and of Windows, into the generic tree. The API provides mechanisms for searching the tree and interacting with its elements.
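The idea of one generic tree fed by several providers can be sketched in a few lines of plain Python. The Element class and the find function below are toys invented for illustration. They are not the System.Windows.Automation classes, and the real API’s search and interaction mechanisms are considerably richer.

```python
class Element:
    """A node in a toy automation tree (not the real UI Automation classes)."""
    def __init__(self, control_type, name, children=()):
        self.control_type = control_type
        self.name = name
        self.children = list(children)

def find(root, control_type, name):
    """Depth-first search for the first element matching type and name."""
    if root.control_type == control_type and root.name == name:
        return root
    for child in root.children:
        match = find(child, control_type, name)
        if match is not None:
            return match
    return None

# Two hypothetical providers map their own object models into one tree:
# a desktop window and a web page.
desktop = Element("window", "Notepad", [
    Element("menu", "File", [Element("menuitem", "Save")]),
])
page = Element("document", "Facebook", [
    Element("textbox", "Email:"),
    Element("button", "Login"),
])
root = Element("desktop", "root", [desktop, page])

# The same search code reaches into both worlds.
login = find(root, "button", "Login")
save = find(root, "menuitem", "Save")
```

The point of the sketch is that the caller never needs to know whether an element came from Win32 or from the browser.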

It’s a powerful system that is also accurately described by John Robbins as “intensively fiddly.” So in this March 2007 MSDN article, he provides and illustrates the use of a set of convenience wrappers around the raw System.Windows.Automation classes. The sample program included with that article drives Notepad through a few basic operations. Could it be extended in the direction of CoScripter, in a way that realizes UI Automation’s ambition to uniformly control Windows and web applications?

I took a crack at that, and concluded that creating even a proof-of-concept will require more time and more programming chops than I can muster. But I’d be interested to hear from anyone who’s gone further down that path. I think this is potentially a very big deal. Although I suspect most programmers see UI Automation in the context of software testing, for which it is indeed well suited, Rob Haverty’s article suggests that it was primarily motivated by the need for better assistive technologies and improved accessibility.

When Tessa Lau says that accessibility guidelines are the lifeblood of CoScripter, she’s talking about affordances for people who cannot otherwise use the full capability of their software. But consider Rob Haverty’s definition of accessible technology:

Accessible technology enables individuals to adjust their computers to meet their visual, hearing, dexterity, cognitive, and speech needs.

I like his use of the word cognitive because in some sense we are all cognitively impaired when we try to use software. For most people, most of the time, the concept count is way too high. We don’t normally think of automation as an assistive technology. But arguably it is one. And when automation yields interactive documentation that lives in shared information spaces, it becomes a really potent assistive technology.

In case it’s not obvious, I am not claiming that Windows UI Automation can realize this vision of assistive automation across the spectrum of application types. It’s currently only available by default for Vista, and optionally for Windows XP if enhanced with the .NET Framework 3.0. It is not part of Silverlight or Moonlight, though conceivably one day it might be. And it clearly has nothing to do with Mac OS X, or Java, or Flash, or the Linux desktop.

But the idea of UI-based semantic automation is something that could apply in all these domains. A proof-of-concept CoScripter-like application-plus-service spanning two major domains — Windows desktop apps and browser-based apps running on Windows — would be a big step toward that broader vision.

The social scripting continuum

6 Sep 2007 (updated 10 Jun 2009) ~ Jon Udell ~ 18 Comments

Back in June, IBM’s Tessa Lau joined me on my ITConversations podcast to discuss Koala, “a system for recording, automating, and sharing business processes performed in a web browser.” The service is now available on the AlphaWorks site as CoScripter, where the first script I tried was Tessa’s own Update your Facebook status. Here is the text of the script as it appears in the CoScripter wiki:

* go to "http://www.facebook.com"
* enter your "e-mail address" (e.g. tlau@tlau.org) into the "Email:" textbox
* enter your password into the "Password:" textbox
* click the "Login" button
* click the "Profile" link
* click the "Update your status..." link
* enter your status into the status field

Interestingly there was a bug in that script. The fourth step was originally:

* click the "Password" button

Because there is no button labeled “Password” on Facebook’s login page, the script failed.¹ When I made the change from “Password” to “Login” in the CoScripter sidebar I simultaneously fixed the script and added the corrected version to the wiki. After posting this entry, I added a comment to the wiki that points back here. All in all, it’s a nice illustration of the emerging style of social programming that we also see in applications like Yahoo! Pipes and Popfly.

As Tessa explains in the podcast, many scripts — including this Facebook example — require secrets, notably usernames and passwords. These you can conveniently record as name/value pairs stored in a personal database. I have two observations about that. First, secrets appear to be stored remotely. If so, I’d prefer to keep them local. (Update: They are indeed local, see Tessa’s comment below.) Second, there should be a way to qualify them by domain, because names like “Email Address” and “Password” will soon become overloaded.
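The second observation can be sketched as a store keyed by domain as well as by name. SecretStore and its methods are hypothetical names chosen for this example, and the domains and values are made up. Nothing here reflects how CoScripter actually stores secrets.

```python
class SecretStore:
    """Toy personal database of secrets, keyed by (domain, name)."""
    def __init__(self):
        self._secrets = {}

    def put(self, domain, name, value):
        # Names are matched case-insensitively, so "Password" == "password".
        self._secrets[(domain, name.lower())] = value

    def get(self, domain, name):
        # Returns None when no secret is recorded for this domain and name.
        return self._secrets.get((domain, name.lower()))

store = SecretStore()
store.put("facebook.com", "Password", "fb-secret")
store.put("bank.example", "Password", "bank-secret")

facebook_password = store.get("facebook.com", "password")
bank_password = store.get("bank.example", "Password")
```

With the domain in the key, two sites can both ask for “Password” without overwriting each other.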

One of the delightful things about CoScripter is the simple and natural language used to express sequences of actions. It looks just like the instructions an ordinary user would write down for another ordinary user to follow. By embedding those instructions in an interpreter that makes it easy for anyone to run and debug them step by step, and by reflecting them into a versioned wiki, CoScripter creates a rich environment in which people can record, exchange, and refine their operational knowledge of web applications.
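To get a feel for how instructions like these could be turned into actions, here is a small sketch that parses three of the step forms from the Facebook script. The patterns and field names are assumptions made for illustration, not CoScripter’s actual grammar, which also handles looser phrasings such as the unquoted “status field” in the last step.

```python
import re

# One hypothetical pattern per step form; the real interpreter is more forgiving.
PATTERNS = [
    ("goto", re.compile(r'^go to "(?P<url>[^"]+)"$')),
    ("click", re.compile(r'^click the "(?P<label>[^"]+)" (?P<kind>\w+)$')),
    ("enter", re.compile(
        r'^enter your (?P<value>.+) into the "(?P<label>[^"]+)" (?P<kind>\w+)$')),
]

def parse_step(line):
    """Turn one instruction line into a dict, or None if it is not understood."""
    text = line.lstrip("* ").strip()
    for action, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return dict(match.groupdict(), action=action)
    return None

click = parse_step('* click the "Login" button')
goto = parse_step('* go to "http://www.facebook.com"')
enter = parse_step('* enter your password into the "Password:" textbox')
unknown = parse_step('* enter your status into the status field')
```

Each parsed step names a target by label and kind rather than by screen position.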

Currently CoScripter is a creature of the web, and specifically of a Firefox-based, Flash-free web. Adapting it to another browser would be hard but doable. Adapting it to work with RIA (rich Internet application) plug-ins like Flash or Silverlight is really problematic, though, because RIA plug-ins don’t mesh very well with the web’s RESTful style.

There are minor exceptions. Back in 2004 I raised that issue in terms of Flash, and Adobe’s Kevin Lynch showed how to materialize URLs for states within a Flash application. But this doesn’t occur normally and naturally when you write a Flash application, as it does when you write a web application. Or rather, as it used to when you wrote a web application, because AJAX also tends to hide an application’s URL namespace.

Because the same issue is going to come up all over again in the context of Silverlight, now would be a good time to think about how Silverlight apps can expose automation interfaces that cooperate with the RESTful web they’re part of.

With any flavor of web application, whether it’s based on simple HTML and JavaScript, or enriched with AJAX, or turbocharged with Flash or Silverlight, it would be great not only to be able to automate as CoScripter can, but also to share and collaboratively refine the scripts. How can we best assure that possibility? Tessa Lau thinks that web accessibility guidelines represent our best hope. If CoScripter-style automation were to catch on it would be a further incentive to adopt those guidelines, and would likely reshape them in useful ways as well.

But why stop there? In principle there’s no reason why desktop applications can’t play the same game, and there are compelling reasons why they should. Today, for example, I found the answers to the 25 top “How do I?” questions asked about Word. Those answers are pointers to articles in the Microsoft knowledge base. For the ever-popular “How do I create mailing labels?”, the answer includes instructions like these:

Open the document in Word, and then start the mail merge. To start a mail merge, follow these steps, as appropriate for the version of Word that you are running:
- Microsoft Word 2002:
  On the Tools menu, click Letters and Mailings, and then click Mail Merge Wizard.
- Microsoft Office Word 2003:
  On the Tools menu, click Letters and Mailings, and then click Mail Merge.
- Microsoft Office Word 2007:
  On the Mailings tab, click Start Mail Merge, and then click Step by Step Mail Merge Wizard.
Under Select document type, click Labels, and then click Next: Starting Document. Step 2 of the Mail Merge appears.
Under Select starting document, click Change document layout or Start from existing document. With the Change document layout option, you can use one of the mail-merge templates to set your label options. When you click Label options, the Label Options dialog box appears. Select the type of printer (dot matrix or laser), the type of label product (such as Avery), and the product number. If you are using a custom label, click Details, and then type the size of the label. Click OK. With the Start from existing document option, you can open an existing mail-merge document and use that as your main document.
Click Next: Select Recipients

The resemblance to CoScripter’s step-by-step instructions is striking. Why shouldn’t instructions like these be able to drive Word’s automation interfaces? Why couldn’t users create and share their own instructions? Sure it’s a desktop application, but nowadays that’s just an endpoint along a continuum of application styles — HTML, JavaScript, AJAX, RIA, desktop app — all of which are connected and can communicate. Collaborative automation is just one of many opportunities to exploit that ability to communicate, but it’s a huge one.

¹ I suspect that Tessa planted that bug intentionally to see if we were paying attention!

XML documents: flavors versus essence

5 Sep 2007 ~ Jon Udell ~ 2 Comments

I have steered clear of the politics surrounding XML document formats both before and after joining Microsoft. But I was, and will always be, an outspoken advocate for the idea of XML documents. That’s a message that doesn’t make headlines but bears repeating. We have hardly begun to appreciate or exploit the value of XML. A couple of articles in the current issue of CTQuarterly, a journal about how cyberinfrastructure enables science, illuminate that point.

In Next-Generation Implications of Open Access, Paul Ginsparg writes:

One of the surprises of the past two decades is how little progress has been made in the underlying document format employed. Equation-intensive physicists, mathematicians, and computer scientists now generally create PDF from TeX. It is a methodology based on a pre-1980s print-on-paper mentality and not optimized for network distribution. The implications of widespread usage of newer document formats such as Microsoft’s Open Office XML or the OASIS OpenDocument format and the attendant ability to extract semantic information and modularize documents are scarcely appreciated by the research communities.

As the developer of the arXiv (formerly LANL) preprint archive, which predates the web, he understands better than almost anyone how that “pre-1980s print-on-paper” mentality thwarts the advancement of knowledge.

In The Shape of the Scientific Article in The Developing Cyberinfrastructure, Clifford Lynch writes:

We are seeing the deployment of software that computes upon the entire corpus of scientific literature. Such computation includes not only the now familiar and commonplace indexing by various search engines, but also computational analysis, abstraction, correlation, anomaly identification and hypothesis generation that is often termed “data mining” or “text mining.”

I like his tagline for this: “Scientific literature that is computed upon, not merely read by humans.”

XML document formats aren’t a panacea, but when we use them to reduce friction and lower activation thresholds, data will find data, and people will find people. To achieve those effects, the essential property of machine readability matters more than its flavor.

SharePoint, IronPython, and another lesson in the virtue of laziness

4 Sep 2007 (updated 5 Sep 2007) ~ Jon Udell ~ 6 Comments

I’m doing an internal project that involves reading several different data sources from a SharePoint 2007 server, merging them, and posting the merged data back to the server. Being lazy, I wanted to use IronPython, write as little code as possible, and do everything dynamically.

Reading the data sources, which are customized SharePoint lists (i.e., database tables), was straightforward. Every SharePoint list offers an “Export to Spreadsheet” link which produces an XML dump. Given that export URL, here’s a recipe for reading the data (from a Windows client that’s already authenticated to the server) and converting it to a list of Python dictionaries.

import clr
clr.AddReferenceByPartialName('System.Xml')
from System.Xml import XmlDocument, XmlNamespaceManager
from System.Net import WebRequest
from System.IO import StreamReader

def getDataAsListOfXmlNodes(URL):
  # Fetch the export URL using the logged-in Windows user's credentials.
  request = WebRequest.Create(URL)
  request.Method = "GET"
  request.UseDefaultCredentials = True
  response = request.GetResponse()
  result = StreamReader(response.GetResponseStream()).ReadToEnd()
  # Parse the XML dump and select every row element.
  doc = XmlDocument()
  doc.LoadXml(result)
  nsmgr = XmlNamespaceManager(doc.NameTable)
  nsmgr.AddNamespace('z', '#RowsetSchema')
  nodes = doc.SelectNodes("//z:row", nsmgr)
  return nodes

def convertNodesToDicts(nodes):
  # Each row's XML attributes become the keys and values of one dictionary.
  listOfDicts = []
  for node in nodes:
    row = {}
    for a in node.Attributes:
      row[a.Name] = a.Value
    listOfDicts.append(row)
  return listOfDicts

nodes = getDataAsListOfXmlNodes('http://host/sites/mysite/_vti_bin/...')
dicts = convertNodesToDicts(nodes)
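For readers without IronPython handy, the same rows-to-dictionaries step can be sketched in standard Python with xml.etree.ElementTree. The sample document below is made up for illustration: only the z:row element and the #RowsetSchema namespace come from the recipe above, while the wrapper elements and the attribute names are assumptions about what an export might contain.

```python
import xml.etree.ElementTree as ET

# Same prefix-to-namespace mapping the IronPython recipe registers.
ROWSET_NS = {"z": "#RowsetSchema"}

def rows_to_dicts(xml_text):
    """Convert every z:row element into a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iterfind(".//z:row", ROWSET_NS)]

# Hypothetical export; a real SharePoint dump has more structure.
sample = """<rows xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema">
  <rs:data>
    <z:row ows_Title="First" ows_ID="1" />
    <z:row ows_Title="Second" ows_ID="2" />
  </rs:data>
</rows>"""

dicts = rows_to_dicts(sample)
```

Once each source is a list of dictionaries, merging them is ordinary list and dictionary work.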

Uploading my merged file to a document library on the server wasn’t so straightforward. I knew that SharePoint provides a set of web services APIs, so I started by acquiring the IronPython “Dynamic Web Services Helpers” from the Web Services sample. Among other things, these wrappers make it trivial to consume a WSDL-based web service. Here, for example, is a snippet that uploads a photo using the Imaging web service:

import clr
clr.AddReference("DynamicWebServiceHelpers.dll")
from DynamicWebServiceHelpers import *
from System.IO import File

filename = 'jon.jpg'
ws = WebService.Load('http://HOST/_vti_bin/Imaging.asmx')
ws.UseDefaultCredentials = True
# Read the photo straight into a .NET byte array.
data = File.ReadAllBytes(filename)
ws.Upload('Photos','',data,filename,True)

So far, so good. But when I then looked for a generic service to upload any file to any document library, I found myself on a slippery slope. In the midst of exploring how to use the Swiss-Army-knife Lists service to accomplish a simple file upload, I realized I was working way too hard. Back in 2004, Bill Simser reached the same conclusion:

There seemed to be a lot of argument about using Web Services, lists, and all that just to upload a document. It can’t be that hard.

And it isn’t. As others have discovered too, SharePoint responds to a plain old HTTP PUT. Here’s an IronPython update to Bill’s recipe:

from System.IO import File
from System.Net import WebClient

def upload(HOST,fname,rdir,rfile):
  wc = WebClient()
  wc.UseDefaultCredentials = True
  # Read the local file straight into a .NET byte array.
  data = File.ReadAllBytes(fname)
  url = '%s/%s/%s' % (HOST, rdir, rfile)
  # SharePoint accepts a plain HTTP PUT to the target URL.
  wc.UploadData(url, 'PUT', data)

And presto. A local file called, say, myfile.html, lands someplace like http://host/sites/mysite/MyLibrary/myfile.html.
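The same request can be sketched in standard Python with urllib.request. This version only builds the PUT request and does not send it. The build_put name is hypothetical, and the sketch does not cover authentication: urllib has no direct counterpart to UseDefaultCredentials, so a real SharePoint server would need credentials supplied some other way.

```python
import urllib.request

def build_put(host, rdir, rfile, data):
    """Build (but do not send) an HTTP PUT for a file in a document library."""
    url = "%s/%s/%s" % (host, rdir, rfile)
    return urllib.request.Request(url, data=data, method="PUT")

request = build_put("http://host/sites/mysite", "MyLibrary",
                    "myfile.html", b"<html></html>")
# urllib.request.urlopen(request) would perform the upload.
```

The whole upload is one verb, one URL, and a body.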

Sheesh. If it feels like you’re working too hard, maybe you are. Step back, take a deep breath, and look for a lazier solution.