I was chatting the other day with Jim Hugunin about an earlier posting on automation and accessibility, and Jim highlighted a point that’s worth calling out separately. If you had a script that could drive an application through all of the things shown in a screencast, you wouldn’t need the screencast. The script would not only embody the knowledge contained passively in the screencast, but would also activate that knowledge, combining task demonstration with task performance.

Of course this isn’t an either/or kind of thing. There would still be reasons to want a screencast too. As James MacLennon pointed out yesterday:

All too often, the classic on-the-job training technique has been “just follow Jim around, and do what he does for the next three weeks …”. This kind of unstructured training doesn’t lend itself to easily to written documentation – it’s the nature of the process as well is the nature of the people. Video, however, allows us to simulate this “follow him around” approach.

Citing Chris Gemingnani’s Excel recreation of a New York Times graphic, James says:

This kind of approach clicked with me, because this was my preferred method for learning a new programming environment. If I could just get an experienced programmer to take me through the edit / compile / debug / build cycle, I would be off and running.

So you’d really want both the screencast and the script — for extra credit, synchronized to work together.

What stands in the way of doing this? Don’t the Office applications, for example, already have the ability to record scripts? Yes, they do, but that flavor of scripting targets what I called engine-based rather than UI-based automation. Try this: Launch Word, turn on macro recording, and then perform the following sequence of actions:

  1. Mailings
  2. Recipients
  3. Type New List

Now switch off the recorder and look at your script. It’s empty, because you haven’t yet done anything with the engine that’s exposed by Word’s automation interface, you’ve only interacted with the user interface in preparation for doing something with the engine.

It would be really useful to be able to capture and replay that interaction. And in fact, I’ve written a little IronPython script that does replay it, using the UI Automation mechanism I discussed in the earlier posting. It’s not yet even really a proof of concept, but it does contain three lines of code that correspond exactly to the above sequence. Each line animates the corresponding piece of Word’s user interface. So when you run the script, the Mailings ribbon is activated, then the Recipients button is highlighted and selected, and then the Type New List menu choice appears and is selected.

What I’m envisioning here is UI-based semantic automation. I call it UI-based to distinguish it from the engine-based approach that bypasses the user interface. I call it semantic because it deals with named objects in addition to keystrokes and mouseclicks. Is this even possible? I think so, but so far I’ve only scratched the surface. Deeper in there be dragons, some of which John Robbins contends with in the article I cited. I’d be curious to know who else has fought those dragons and what lessons have been learned.