Search Results for 'screencasts'


I’ve just wrapped up a screencast about the elmcity project. It’ll stand in for me at an upcoming event I can’t attend, and serve as an explanation I can point others to. This is the first screencast I’ve worked on in ages, and also the first in which I appear as a picture-in-picture talking head. The process has been challenging, and I want to write about it while the details are fresh.

Software teleprompters

After writing the script, I realized I’d need a teleprompter in order to read it effectively into the camera. You’d expect to find lots of software prompters floating around on the web, including some free ones, and you’d be right about that. But I had to work through a bunch of them before I found one that worked well for me. I tried CuePrompter, TeleKast, and many others. All failed in some dimension of control: margins, speed, transport. Finally I settled on PromptDog, which is free to try but is the one I’ll buy when I go this route again. It does everything well, but what really put it over the top for me was the way it wires scroll speed to the mouse wheel.

If you’ve read from a software-based teleprompter before, you’ll already know this, but it was new to me. No matter what scroll speed you choose, you need to vary it as you go along. That’s because words and sentences take varying amounts of time to speak, but you need to keep your eyes focused near the top of the screen where the camera sits. With most of the programs I tried, you manage this focal zone by stopping and starting the scroll. But for me, at least, the stops and starts were distracting. PromptDog’s mouse-wheel-driven variable speed control made it much easier to stay in the focal zone. Reading from a software teleprompter is hard, at least for me. I was happy for all the help I could get.

Picture-in-picture video

For this screencast, I upgraded from Camtasia 5 to Camtasia 7. It can record directly from a camcorder, but my second-hand Panasonic PV-GS400 doesn’t seem to work well in that mode. So I recorded to tape, imported the results to a file, and imported that file into Camtasia as a PIP (picture-in-picture) video. On import you tell Camtasia how big your PIP window will be, and where it will show up in the larger video window. I made the PIP window a quarter the size of, vertically centered in, and flush right with the larger 1024×768 video window.

I’d assumed you could move the PIP window around, and grow it or shrink it, to accommodate different kinds of underlying screencast action. But that assumption was wrong. For a given segment of PIP video, the window stays where you put it. This leads to my first feature request for Camtasia 8: a PIP preview rectangle when recording the screen.

Often it’s OK to let the PIP video just overlay the screen action. But sometimes you don’t want it to hide an essential part of the screen. To avoid that, you have to compose the screen around the PIP window. Lacking a visual cue for the PIP window’s borders, I had to guess. Often I guessed wrong, and had to recompose and reshoot a piece of screen action.
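One way to reduce the guesswork is to work out the PIP rectangle ahead of time and keep essential screen content outside it. What follows is a minimal sketch, not anything Camtasia provides. It assumes that "a quarter the size" means a quarter of the area (half the width and half the height); the function name pip_rect and its scale parameter are made up for illustration. If the quarter was meant per side, pass scale=0.25 instead.

```python
def pip_rect(frame_w, frame_h, scale=0.5):
    """Return (x, y, width, height) of a PIP window that is flush right
    and vertically centered inside a frame of frame_w by frame_h pixels.

    scale is the fraction of each frame dimension the PIP occupies.
    scale=0.5 gives a window with a quarter of the frame's area (assumption).
    """
    pip_w = int(frame_w * scale)
    pip_h = int(frame_h * scale)
    x = frame_w - pip_w            # flush with the right edge
    y = (frame_h - pip_h) // 2     # equal space above and below
    return x, y, pip_w, pip_h


# The 1024x768 frame described above, under the quarter-area assumption.
print(pip_rect(1024, 768))
```

With those numbers in hand you could tape or draw a matching rectangle on the screen while recording.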

Note that you can vary the size and location of the PIP window by splitting the PIP video into segments and assigning different sizes and locations to each segment. That’s a lot of work, though. And you don’t really want to split the PIP video into segments because then you can’t manipulate the whole track.

Editing audio, motion video, and screen video all together

I made things hard on myself because I’d forgotten that Camtasia invites you to do more integrated editing than you should. In principle you can, for example, run a noise-reduction pass on your audio in Camtasia. In practice, I would prefer to do that in Adobe Audition, which does the job faster and better. What I should have done is grab the sound track out of the captured motion video, run Audition’s noise reducer, recombine the audio and video, and then import into Camtasia for editing.

Instead I edited everything down in Camtasia, then tried to do an export/process/import pass on the audio. When you export, Camtasia renders the audio based on your edits. Unfortunately it came out a few seconds longer than expected. I think that’s because the differing frame rates for screen video on the one hand, and motion video plus audio on the other, make it hard to keep things in synch. Next time around I’m going to try matching the frame rates to see if that helps.

(In the end I decided it was worth redoing the edit anyway, so I split the AVI file I’d recorded from the camcorder, fixed the audio, imported it back into Camtasia, and redid the relatively few edits I’d made to the PIP video.)

In the past, I’ve done some carefully edited screencasts where things that I say are tightly synched to things happening on screen. (“…when I click on this link, we see that …”) It’s easy to pull that off when you can’t see the speaker, because you can mess with the screen video, or the audio, or both. When you can see the speaker, it’s much harder. Motion video isn’t nearly so forgiving as audio, so you have to do almost all the synch adjustment in the screen video. Or else re-record some or all of the motion video.

To PIP or not to PIP?

Is all this effort worth the trouble? When Scott Hanselman surveyed his readers about screencasts, he asked, among other things, “PIP or no PIP?” More than half agreed with the statement: “Too much PIP (Picture in Picture) video of the presenter is distracting.” And I think that’s true for screencasts that show how to do stuff with software.

When a screencast shows why to do stuff with software, though, I think the talking head may make more sense. Now, my instinct is to be a voice only, as I am on my podcast. But if the screencast is going to represent me at an event, it seems like I should try to project myself there.

More broadly, the topic is something I care about and have struggled to communicate effectively. If this method of presentation works better than others I’ve tried, even if only for some people, then it’s worth doing. My communication kit needs as many tools as I can pack into it.

Now that I’ve knocked the rust off my screencasting skills, I’m looking forward to redoing this video based on feedback. And since it was made for a ten-minute conference slot, I should probably also do some shorter versions that will work in different contexts.

One thing that’s becoming terribly clear: If I want these to make sense to broad audiences, I need to speak plainly and illustrate with simple everyday examples. I’ve been embarrassingly slow to figure that out, but I am learning. In the screencast I just wrapped, which is all about syndication, I never use that word. It’s a start!

I’ve posted the Python script I used to make the Pivot visualization of this blog. I need to set it aside for now and do other things, but here’s a snapshot of the process for my future self and for anyone else who’s interested.

Using deepzoom.py to create Deep Zoom images and collections

I’m using this Python component to create Deep Zoom images and collections. I made the following changes to it:

1. tile_size=256 (not 254) at line 59, line 160, and line 224

2. source_path.name instead of source_path at line 291

3. destination + '.xml' instead of destination at line 341

Let’s assume that Python is installed, along with the Python Imaging Library, and that your current directory contains the files 001.jpg, 002.jpg, and 003.jpg:

001.jpg
002.jpg
003.jpg

You could run deepzoom.py from the command line once for each image file, like so:

python deepzoom.py -d 001.xml 001.jpg
python deepzoom.py -d 002.xml 002.jpg
python deepzoom.py -d 003.xml 003.jpg

My script doesn’t actually do it that way; it enumerates JPEGs and instantiates deepzoom.py’s ImageCreator object once for each. But either way, for each JPEG you end up with a DZI (Deep Zoom Image) package that consists of (for 001.jpg):

  • A settings file: 001.xml
  • A subdirectory: 001_files
  • More subdirectories (named 0, 1, etc.) inside 001_files
  • JPG files inside those subdirectories

Now, in this case, the current directory looks like this (using -> to mark additions):

001.jpg
-> 001.xml
-> 001_files
002.jpg
-> 002.xml
-> 002_files
003.jpg
-> 003.xml
-> 003_files
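The enumerate-and-instantiate approach could look roughly like the following. This is a sketch rather than the posted script. The helper names dzi_jobs and build_all are invented here, and the create(source, destination) call is an assumption modeled on the command-line form shown earlier, so check it against your copy of deepzoom.py.

```python
import os


def dzi_jobs(filenames):
    """Pair each JPEG name with the .xml settings file its DZI should get."""
    jobs = []
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        if ext.lower() in (".jpg", ".jpeg"):
            jobs.append((name, stem + ".xml"))
    return jobs


def build_all(make_creator, filenames):
    """Create one Deep Zoom image per JPEG.

    make_creator is a factory such as deepzoom.ImageCreator; a fresh
    creator is instantiated for every image.
    """
    for source, destination in dzi_jobs(filenames):
        # Assumption: mirrors `python deepzoom.py -d 001.xml 001.jpg`.
        make_creator().create(source, destination)


# Possible usage, if deepzoom.py is importable from the current directory:
#   import deepzoom
#   build_all(deepzoom.ImageCreator, os.listdir("."))
```

Passing the creator in as a factory keeps the enumeration logic testable without the imaging library installed.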

To build a collection, do something like this in Python:

from deepzoom import *
images = ['001.xml','002.xml', '003.xml']
creator = CollectionCreator()
creator.create(images, 'dzc_output')

Now the current directory looks like:

001.jpg
001.xml
001_files
002.jpg
002.xml
002_files
003.jpg
003.xml
003_files
-> dzc_output.xml
-> dzc_output_files

The Pivot collection’s CXML file will refer to dzc_output.xml, like so:

<Items ImgBase="dzc_output.xml">

Using IECapt to grab screenshots

This tool uses Internet Explorer, so it only works on Windows. There is also CutyCapt for WebKit, which I haven’t tried but would be curious to hear about.

Here’s an example of the IECapt command line I’m using:

iecapt --url=http://blog.jonudell.net/… --delay=1000 --out=tmp.jpg

The result in most cases is a tall skinny JPEG, because it renders the whole page — which can be very long — before imaging it. When I ran it over a 600-item collection, it hung a couple of times because of JavaScript errors. So I went to Internet Options->Browsing in IE, checked Disable script debugging, and unchecked Display a notification about every script error.
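When capturing hundreds of pages, a timeout around each capture is another way to keep one stuck page from stalling the whole run. Here is a minimal sketch, assuming Python 3.5 or later and an iecapt executable on the PATH. The function names and the 60-second default are illustrative choices, not part of IECapt.

```python
import subprocess


def iecapt_command(url, out_path, delay_ms=1000):
    """Build the IECapt argument list used for one screenshot."""
    return [
        "iecapt",
        "--url=" + url,
        "--delay=" + str(delay_ms),
        "--out=" + out_path,
    ]


def capture(url, out_path, timeout_s=60):
    """Run IECapt once; return False instead of hanging or raising."""
    try:
        subprocess.run(iecapt_command(url, out_path),
                       timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired,
            subprocess.CalledProcessError,
            FileNotFoundError):
        # Timed out, exited with an error, or iecapt is not installed.
        return False
```

A driver loop could collect the URLs for which capture returns False and retry them later.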

Using ImageMagick to crop screenshots

Here’s a picture of an image produced by IECapt, overlaid with a rectangle marking where I want to crop:

The rectangle’s origin is at x=30 and y=180. Its width is 530 pixels, and height 500. Here’s the ImageMagick command to crop a captured image in tmp.jpg into a cropped image in 001.jpg:

convert -quality 100 -crop 530x500+30+180 -border 1x1 -bordercolor Black tmp.jpg 001.jpg

I’m writing this down here mainly for myself. ImageMagick can do everything under the sun, but it always takes me a while to dig up the recipe for a given operation.
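The part of the recipe that is easiest to get wrong is the geometry string, which lists width and height first and the origin second. A small sketch of building it from the rectangle’s coordinates follows. The helper names are invented here, and the argument order (input file first, then operations, then output) is the order ImageMagick’s documentation recommends rather than the order used in the command above.

```python
def crop_geometry(x, y, width, height):
    """Format an ImageMagick geometry string: WIDTHxHEIGHT+X+Y."""
    return "%dx%d+%d+%d" % (width, height, x, y)


def crop_command(source, destination, x, y, width, height):
    """Build a convert command that crops, then adds a 1-pixel black border."""
    return [
        "convert", source,
        "-crop", crop_geometry(x, y, width, height),
        "-bordercolor", "Black",   # set the color before applying the border
        "-border", "1x1",
        "-quality", "100",
        destination,
    ]


# The rectangle described above: origin (30, 180), 530 wide, 500 high.
print(" ".join(crop_command("tmp.jpg", "001.jpg", 30, 180, 530, 500)))
```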

Parsing the WordPress export file

I found to my surprise that WordPress currently exports invalid XML. So the script starts with a search-and-replace that looks for this:

xmlns:wp="http://wordpress.org/export/1.0/"

And replaces it with this:

xmlns:wp="http://wordpress.org/export/1.0/"
xmlns:atom="http://www.w3.org/2005/Atom"

Then it walks through the items in the Atom feed, extracting the various things that will become Pivot facets. For the description, it tries to parse the content:encoded element as XML, and find the first paragraph element within it. If that fails, it just treats the element as text and grabs the beginning of it.
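The namespace patch and the description fallback can be sketched as follows. This is an illustration of the two steps described above, not the posted script. The function names, the wrapping div element, and the 300-character limit are all assumptions.

```python
import xml.etree.ElementTree as ET

WP_NS = 'xmlns:wp="http://wordpress.org/export/1.0/"'
ATOM_NS = 'xmlns:atom="http://www.w3.org/2005/Atom"'


def add_atom_namespace(export_text):
    """Declare the atom prefix next to the wp prefix if it is missing."""
    if ATOM_NS in export_text:
        return export_text
    return export_text.replace(WP_NS, WP_NS + "\n" + ATOM_NS, 1)


def describe(content, limit=300):
    """Return the first paragraph's text, or the start of the raw content."""
    try:
        # Wrap in a single root so multi-paragraph content can parse.
        root = ET.fromstring("<div>" + content + "</div>")
        para = root.find(".//p")
        if para is not None:
            return "".join(para.itertext()).strip()
    except ET.ParseError:
        pass  # not well-formed XML; fall through to plain text
    return content[:limit].strip()


print(describe("<p>Hello <b>world</b></p><p>second</p>"))
```

Content that contains HTML-only entities or unclosed tags will fail to parse and take the plain-text path.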

Weaving the collection

There are two control files that need to be synchronized. First, there’s dzc_output.xml, for the Deep Zoom collection. It has elements like this:

<I Id="596" N="596" Source="2245.xml">

Then there’s pivot.cxml which drives the visualization. It has elements like this:

<Item Id="596" Img="#596"
  Name="Freebase Gridworks: A power tool for data scrubbers"
  Href="http://blog.jonudell.net/2010/03/26/...
<Description><![CDATA[
I've had many conversations with Stefano Mazzocchi and David Huynh [1, 2, 3]
about the data magic they performed at MIT's Project Simile and now perform
at Metaweb. If you're somebody who values clean data and has wrestled with
the dirty stuff, these screencasts about a forthcoming product called
Freebase Gridworks will make you weep with joy.
]]></Description>
<Facets>
  <Facet Name="date">
    <DateTime Value="2010-03-26T00:00:00-00:00" />
  </Facet>
  <Facet Name="tag">
    <String Value="freebase" />
    <String Value="gridworks" />
    <String Value="metaweb" />
  </Facet>
  <Facet Name="comments">
    <Number Value="24" />
  </Facet>
</Facets>
</Item>

In this example, Source="2245.xml" in dzc_output.xml refers to a Deep Zoom image whose name comes from the WordPress post_id for that entry, which is:

<wp:post_id>2245</wp:post_id>

But Id="596", which is the connection between dzc_output.xml and pivot.cxml, comes from a counter in the script that increments for each item processed. I don’t know why the numbering of items in the WordPress export file is sparse, but it is, hence the difference.
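The relationship between the running counter and the sparse post_id values can be sketched like this. The function name weave and the start parameter are hypothetical, and the real script’s counter may begin at a different value.

```python
def weave(post_ids, start=0):
    """Assign a dense counter to each post and emit the matching references.

    Returns (dzc_elements, img_refs): the <I> elements for dzc_output.xml
    and the Img attribute values for the corresponding pivot.cxml items.
    """
    dzc_elements = []
    img_refs = []
    for item_id, post_id in enumerate(post_ids, start):
        # Source is named after the sparse WordPress post_id...
        dzc_elements.append(
            '<I Id="%d" N="%d" Source="%s.xml">' % (item_id, item_id, post_id))
        # ...while both files agree on the dense counter.
        img_refs.append("#%d" % item_id)
    return dzc_elements, img_refs


# Reproduces the example above: post 2245 processed as item 596.
elements, refs = weave([2245], start=596)
print(elements[0], refs[0])
```

Generating both references in the same loop is what keeps the two control files synchronized.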

Things to do

Here are some ideas for next steps.

1. Check the comment logic. I just noticed the counts seem odd. Maybe because I’m counting all comments instead of approved comments?

2. Use HTML Tidy to ensure that item content will parse as XML, and then count various kinds of elements within it: tables, images, etc.

3. Use APIs of various services — Twitter, bit.ly, etc. — to count reactions to each item.

I’ve had many conversations with Stefano Mazzocchi and David Huynh [1, 2, 3] about the data magic they performed at MIT’s Project Simile and now perform at Metaweb. If you’re somebody who values clean data and has wrestled with the dirty stuff, these screencasts about a forthcoming product called Freebase Gridworks will make you weep with joy.

There’s one by David, and another by Stefano. Using common public datasets about food, international disasters, and US government contracts, they fly through a series of transformations that:

  • Merge similar names using a host of methods:

    • Automatic title-casing
    • A rich expression language
    • Analysis of “edit distance” between similar phrases, using several clustering algorithms
  • Split multi-valued facets
  • Create new facets (e.g., a year column from a date column)
  • Morph linear scales to log scales where appropriate
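For readers who haven’t met the term, edit distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the textbook Levenshtein calculation. It is not taken from Gridworks, whose clustering methods are not detailed here.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Names whose distance is small relative to their length are candidates for merging.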

It’s all live, undoable, and fully instrumented, by which I mean that every transformation updates the counts of the values in each facet, and displays histograms of the new distribution of values — along with sliders for selecting and focusing on subsets.

As the open data juggernaut picks up steam, a lot of folks are going to discover what some of us have known all along. Much of the data that’s lying around is a mess. That’s partly because nobody has ever really looked at it. As a new wave of visualization tools arrives, there will be more eyeballs on more data, and that’s a great thing. But we’ll also need to be able to lay hands on the data and clean up the messes we can begin to see. As we do, we’ll want to be using tools that do the kinds of things shown in the Gridworks screencasts.

My guest for this week’s Innovators show is Sal Khan. He’s the creator of http://khanacademy.org, a catalog of more than 1000 YouTube video lessons in math, physics, biology, chemistry, and economics. All of these videos are made by Sal himself, in an engagingly personal style, using simple screencasting tools.

When I first got interested in screencasting, I envisioned the medium not only as a way to demonstrate software, but also as a way to share knowledge at Internet scale. Sal’s work fulfills that vision, and points the way toward a profound and much-needed disruption of our educational system.

At its core, Sal’s project isn’t about YouTube screencasts. It’s about intuition.

I always got frustrated by what went on in the classroom. You see otherwise intelligent peers memorizing facts and not really caring about the actual intuition. And because they didn’t care about the intuition in their junior year, when that same idea pops up in senior year, it’s like they’ve never seen it before. It boggled my mind. You’re just relabeling the same concept over and over.

Sal cares about the intuition, and he wants others to care about the intuition too. The first beneficiary of that desire was his cousin Nadia, whom he tutored remotely. Then followed other cousins and family friends. Then it dawned on him that there were no limits. The project could scale out. He could become a superempowered individual, reaching anyone who finds value in his method.

One of the key ingredients of that method is improvisation. These videos aren’t carefully planned, and they aren’t edited. As a viewer, you find yourself looking over the shoulder of a smart and broadly knowledgeable person who is solving problems by thinking on his feet. You watch a practitioner at work: engaged with his medium, wrestling with his tools, correcting false starts.

It was Chris Gemignani who first showed me the value of this approach, in a screencast that teaches how to do unexpectedly powerful and elegant Excel charting. He did it in one take. I’d have been tempted to edit out the false starts. But Chris knew better. Learning how a practitioner really thinks about solving a problem is even more valuable than learning the solution to the problem.

One thing that Sal’s lessons can’t be, of course, is interactive. Nor does he pretend that these videos will make teachers obsolete. But he does suggest, and I violently agree, that teachers can and should become curators of online assets like the ones Sal is creating, and should know when and how to weave those assets into their classes.

Teachers should also become connectors. Sal won’t be the only game in town. Other superempowered tutors will emerge. Each will have a unique style. For a given student, a given subject, and a given problem, one or another of those styles may be right. The best teachers will know their own strengths and limitations, will know which online tutors complement their strengths in a variety of ways, and will connect their students with those tutors.

Sal Khan is on fire. He burns with a passion to share his intuitions with anyone and everyone. It is a beautiful thing to see. He has abandoned a lucrative career in finance to do this full-time, and I am quite sure he will find a way to keep doing it.


PS: The title of this piece refers to Richard Ankrom’s Los Angeles freeway project. At a busy intersection, millions of motorists have been directed to North 5 by a sign that Caltrans omitted. Ankrom created and installed that missing sign.

PPS: I wrote to my son’s math teacher about Sal Khan. She replied: “Thanks for that link to the Khan Academy. I was overwhelmed by how many video lessons he has! He does seem like an inspiring man. Unfortunately, You Tube is blocked here at the high school.”

In his writeup on Google Wave, Dare Obasanjo says:

I’m sure there are thousands of Web developers out there right now asking themselves “would my app be better if users could see each others’ edits in real time?”, “should we add a playback feature to our service as well” [ed note - wikipedia could really use this] and “why don’t we support seamless drag and drop in our application?”. All inspired by their exposure to Google Wave.

Indeed, every application that preserves a change history needs playback. Wikipedia, as Dare notes, is a prime candidate. Back in 2006, I made this LazyWeb request:

Animation is the best way to visualize the flow of change, as I discovered when I made my Wikipedia screencast. For Wikipedia, and indeed for all kinds of living documents supported by revision history and diff tools, I can imagine being able to isolate a paragraph or section and autogenerate the screencast of its evolution. I can even imagine the content of such visualizations being considered not just cutting-room floor debris but, rather, part of the “real” document, like footnotes.

Andy Baio responded by sponsoring a contest for a tool that would do just that. And I made a screencast demonstrating Dan Phiffer’s winning entry.

That script is unavailable at the moment because, ironically, Dan’s server reports:

Oh noes! I got HACK*D. I’m sifting through my files and should restore things back to normal soon.

In any case, it probably wasn’t practical for routine use. Fetching every revision on the fly really hammers Wikipedia. What’s really needed — again, not just for Wikipedia but everywhere — is a general way to query change history, and return a stream of versions and differences.
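To make "a stream of versions and differences" concrete, here is a minimal sketch using Python’s standard difflib module. It assumes the revisions are already available as an ordered list of strings. The generator name change_stream is invented, and a real service would need to page through history rather than hold it all in memory.

```python
import difflib


def change_stream(revisions):
    """Yield (index, text, diff_lines) for each revision in order.

    diff_lines is a unified diff against the previous revision; the
    first revision is diffed against an empty document.
    """
    previous = ""
    for index, text in enumerate(revisions):
        from_label = "empty" if index == 0 else "rev%d" % (index - 1)
        diff = list(difflib.unified_diff(
            previous.splitlines(), text.splitlines(),
            fromfile=from_label, tofile="rev%d" % index, lineterm=""))
        yield index, text, diff
        previous = text


history = ["First draft.\nSecond line.", "First draft.\nSecond line, revised."]
for index, text, diff in change_stream(history):
    print("\n".join(diff))
```

A playback tool could consume such a stream and animate each difference in turn.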

One way of doing the latter would be to use FeedSync, an open extension to RSS/Atom that supports synchronization in Live Mesh. Another would be to use Google’s Wave protocol. Because FeedSync deals with lists of items, which can be arbitrary chunks of content, whereas Wave deals with lists of document-mutation operations, like delete-element and start-annotation, it seems to me that FeedSync is more general, albeit less immediately useful for collaborative editing.

To explain why generality matters, consider change animation in a very different domain: software configuration. My wife, for example, sometimes changes her settings — in Word or Firefox — in ways that cause problems. If these apps persisted their settings to Live Mesh, as they could and arguably should, I’d be able to debug a mishap locally or remotely. But ideally, the change visualization would be sufficiently user-friendly so that she’d have a shot at figuring it out for herself.


PS: Speaking of history and restoration, I’ve been feeling like an amnesiac ever since my InfoWorld archive went dark. So in spare moments I’ve been reconstructing and republishing it. I’ll have the text of all the old blog entries up soon. And I’ve been restoring the screencasts as well. I’m keeping track of my progress at delicious.com/judell/screencast+restored.

On this week’s ITConversations show I finally got to meet Jean-Claude Bradley, the Drexel chemistry professor who coined the phrase open notebook science and who champions the principles behind it.

There were a couple of surprises for me. First, I was intrigued to learn about Jean-Claude’s vision for mechanized research. I’ve always thought of open notebook science as a way to speed up the iterative cycle of research and publication, and to engage more human minds in collaboration. Of course Jean-Claude thinks so too. But he also thinks that when data are published in accessible formats, and exposed to computational processes running in the cloud, we’ll be able to automate certain aspects of research.

It reminds me of George Hripcsak’s effort to mechanize the interpretation of electronic health records. In general, we’re collecting way more data than the collectors can analyze. Crowdsourcing is one solution to this problem. Mechanization is another. We’ll need both.

The other surprise was hearing about Drexel’s fairly aggressive use of Second Life. I’ve been an amused skeptic on that front, but Jean-Claude’s passionate advocacy requires me to rethink that stance.

What didn’t surprise me, but might well surprise tuition-paying parents of Drexel students, was Jean-Claude’s attitude toward the classroom. He mostly doesn’t see a need for it. The content delivery aspect of education, he feels, is best handled in other ways, including screencasts and podcasts as well as traditional texts. There can, and should, be a range of sources, to accommodate the differing inclinations of learners. And teachers need to be competent producers and orchestrators of those sources. But for Jean-Claude, the best way to engage directly with students is to meet with individuals, not with whole classes.

Now admittedly, a chemistry class doesn’t invite and thrive on group discussion in the same way that, for example, a literature class does. And yet Jean-Claude says that a literature class was one of the models for his use of Second Life. When group interaction is central to the educational experience, he thinks that virtual environments — though he doesn’t require their use — may outperform real ones.

I remain skeptical on that point, but I’m always open-minded, so I hope Jean-Claude will take me up on my offer to visit one of his virtual environments and document the interactions that happen there.

I’m loving YouTube’s new video annotation feature, which Phil Shapiro alerted me to. Lots of people are going to have lots of fun with that. If you remember when MTV first started doing popup video, you’ll have some idea how much fun.

But from Phil’s perspective and mine, this is a seriously useful tool as well. He’s planning to annotate screencasts with it. And I found a great use for it here.

That short video features Bob Coffey, the senior climber at our YMCA. When I made and posted the video, I wasn’t quite sure how senior Bob was so I didn’t say. Yesterday I remembered to ask. Turns out he is 79.

It’s painful to add new information to a video. Opening up the raw file (if you even kept it around), adding a caption, recompressing, reuploading — it’s too much overhead, and unless there’s a compelling need you’re just not going to bother.

Of course you can update the textual wrapper, and alter the title or description. But in this case, I didn’t want to do that. The information is much more effective when inserted midstream. After he’s scampered halfway up the wall, the popup annotation saying “Bob Coffey is 79 years old” makes the point more subtly and powerfully.

The point, by the way, is that we can do more, physically, at all ages, than we think. I’ve known a few people over the years who have redefined what’s possible, and it’s always an inspiring thing to see.

Today I’m launching a new Microsoft-oriented interview series called Perspectives. The show will touch on a variety of topics including robotics, digital identity, e-science, and social software. I’ll be speaking mostly with passionate Microsoft innovators, and sometimes also with key partners from academia and industry.

The format is an audio podcast and a blog, where the blog provides a partial (but substantial) text transcription in order to make these conversations accessible to folks who don’t listen to podcasts, and also to expose them to the Net’s ecosystem of search, linking, and aggregation. Where appropriate, I’ll also use screencasts to show software in action.

Perspectives runs on the same publishing platform that supports Channel 10 (for enthusiasts), Channel 8 (for students), TechNet Edge (for IT pros), and VisitMIX (for Web designers and developers). (Channel 9, the original site, will migrate to this platform too.) Perspectives intersects with the interests of all these sites, but it doesn’t really belong in any of them, so we’ve created an independent home for it. Thanks to the EvNet team, especially Duncan Mackenzie, David Shadle, and Jeff Sandquist, for making that happen.

The first episode, with Henrik Nielsen and Tandy Trower, explores the Microsoft Robotics initiative. We discuss why robotics is — as futurist Paul Saffo believes — a Next Big Thing. And Henrik and Tandy explain how the concurrency and decentralized-services infrastructure that supports the robotics platform is broadly relevant in an era of loosely-coupled services.

I was pleased to see the announcement that Novell and Microsoft are collaborating on the User Interface Automation (UIA) stuff. My mom can use all the help she can get. But as I discussed in Automation and accessibility, beefing up our ability to automate software in a consistent way can give us huge leverage in other areas, like education, training, and collaboration.

In The social scripting continuum I suggested that a system like CoScripter could automate desktop and web applications in a common way. Here’s one way to think of the benefit of doing that. Today, I can share software-related task knowledge in a social manner by creating and posting screencasts. But you can only watch a screencast. If I could instead share that task knowledge in the form of standardized high-level scripts, you wouldn’t need to watch the screencast. Of course, you might want to, for other reasons, but not simply to get the procedural knowledge transferred from my brain and fingers to yours.

Given how popular screencasts have become in three years, I’ve got a hunch that taking things to that next level would be huge. And lord knows I’d love to be able to convey packages of procedural knowledge to my mom that way.

I spend a lot of time recording and editing audio interviews for two shows: ITConversations and Perspectives. I also do a lot of interview-style screencasts. I’ve been meaning to write up a FAQ for interviewees, so here goes.

Preparation

As the interviewee, you need not prepare anything. Your life is the preparation. You might, however, want to help me prepare, by referring me to background materials that I may not already know about. I don’t show up with a script in mind, but I do like to be as informed as I can be.

Recording

My preference is that you use a landline, not a cellphone or a speakerphone. If you have a strong preference for Skype, I can accommodate.

Either way, it’s ideal if you can make a decent recording of your half of the call — for example, by using a USB microphone plugged into your computer, or a standalone digital audio recorder — and convey that recording to me as uncompressed audio. It’s easy to splice the two halves of the conversation together in post-production, and if you got a decent result on your end, the combined result will be way better than any current scheme for squirting audio through a long-haul network. If the local recording doesn’t pan out, we’ll just fall back to the phone recording that will occur in parallel.

Editing

As discussed in this essay on the audio digital darkroom, I’m fairly aggressive about editing audio interviews. As a result, you and I will come out sounding somewhat better than we really are. I do this out of respect both for the listeners’ attention, and for the importance of the ideas we’re discussing.

The amount of editing varies from show to show. Some hour-long interviews have produced twenty-minute shows; others have produced fifty-five-minute shows. I would say the compression is normally in the ten-to-fifteen-percent range. In all cases, I apply one rule: Focus on the most interesting and important stuff. Interviewees have so far always been pleased with the results.

One of the useful consequences of this approach is that, since you know there’s a safety net, you can relax. There’s no pressure to perform flawlessly, and we can work together to capture the interesting and important stuff.

Last month in an item about working with crime data I asked:

Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.

Several folks suggested that I should take a look at Resolver, an IronPython-based spreadsheet that deeply unifies Pythonic object-oriented programming with the sort of direct manipulation that makes the spreadsheet so useful. Resolver was and still is in private beta, but today’s screencast (Flash, Silverlight) will give you a good sense of what it’s all about.

The presenters are Giles Thomas, managing director and CTO of Resolver Systems (and creator of his own Resolver screencast), and Michael Foord, who blogs about Python, contributes to the IronPython cookbook, and is also working on the forthcoming book IronPython in Action.

If you are (or would like to be) using Python to wrangle business data, Resolver will make sense immediately. You’ll love the idea of wielding Python’s powerful data manipulation features in that context. You’ll appreciate what it would mean to harness not only the Python standard libraries but, because Resolver is IronPython-based, also the .NET Framework and the universe of third-party .NET assemblies. And you’ll be intrigued by the way in which the IronPython code that represents and animates a Resolver spreadsheet can be reused elsewhere — for example, in web applications.

But there’s more to the story. Because a cell in a Resolver spreadsheet can contain a reference to any .NET object, Resolver creates, as Giles Thomas says, “a somewhat pathological but entirely new way of programming using a spreadsheet.” You can, for example, define an anonymous function — say, a function that returns the square of its argument — and store it in cell B4. Then you can place a value — say, 5 — in cell A2. Then you can store this formula in cell B6:

=B4(A2)

That says: “Apply the squaring function in B4 to the value in A2.” The result in B6 will be 25.
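In plain Python, with a dictionary standing in for the grid, the same idea looks like this. It is only an analogy for readers without access to the beta; it does not use or imitate Resolver’s actual API.

```python
# A dict keyed by cell name stands in for the spreadsheet (an assumption
# made for illustration; Resolver's own object model is not shown here).
cells = {}

cells["B4"] = lambda x: x * x            # anonymous squaring function
cells["A2"] = 5                          # the value to square
cells["B6"] = cells["B4"](cells["A2"])   # the formula =B4(A2)

print(cells["B6"])  # prints 25
```

The point is that B4 holds a callable object, not a number or a string, and another cell can invoke it.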

I’ve long argued that the interactive and exploratory style of dynamic object-oriented languages is an important but underappreciated benefit. As I may have mentioned before, IronPython’s creator Jim Hugunin told me that when he first showed IronPython to folks at Microsoft, he was surprised by their reaction. He thought the big wow would be IronPython’s ability to streamline and accelerate use of the .NET Framework. But while people did appreciate that, they were truly wowed by something that’s second nature to every Python programmer — the read/eval/print loop which traces all the way back to the earliest Lisp systems.

It is a magical and powerful thing to be able to explore and modify a running program’s code and data. From those early Lisp systems to today’s Python and Ruby implementations, we have been doing that exploration and modification using a command line.1 We can trick it out with recall, name completion, and search, but it’s still a command line with all the limitations that entails. If I’ve defined an object A and stored some code or data there, my definition and invocations of A will scroll out of view as I continue to work. They won’t be visually persistent.

In a Resolver spreadsheet, these objects are visually persistent. I haven’t yet got my hands on Resolver, but here’s an example of what I think that will mean. Suppose that I have a data set I want to transform, against which I’m testing five different versions of a transformation function. I’d put the data in cell A1, the functions in cells B1..B5, and the results in C1..C5. Now I’ll see everything at a glance. The spreadsheet that would conventionally have been the results viewer at the end of a series of tests becomes the environment in which the tests are written, performed, and evaluated.
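The layout described above (data in A1, candidate functions in B1..B5, results in C1..C5) can be sketched in plain Python. Everything here is hypothetical: the data set, the five transformations, and the `cells` dictionary standing in for the grid are made up for the example and say nothing about how Resolver works internally.

```python
# Hypothetical data set in cell A1; a dictionary stands in for the grid.
cells = {"A1": [3, 1, 2]}

# Five hypothetical versions of a transformation, destined for B1..B5.
versions = [
    lambda xs: sorted(xs),
    lambda xs: sorted(xs, reverse=True),
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x > 1],
    lambda xs: list(reversed(xs)),
]

for row, fn in enumerate(versions, start=1):
    cells[f"B{row}"] = fn                # the function lives in column B
    cells[f"C{row}"] = fn(cells["A1"])   # its result sits beside it in column C

# Everything is visible at a glance, one row per candidate.
for row in range(1, 6):
    print(f"C{row}:", cells[f"C{row}"])
```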

The spreadsheet is also an important bridge between programmers and their business sponsors. It’s no accident that Ward Cunningham’s FIT (Framework for Integrated Test) was originally inspired by Ward’s experience of inviting business analysts to write test cases in spreadsheets. In its current form, FIT uses HTML tables in a wiki as the bridge between analysts who write tests and developers who write the code that must pass those tests. I think Resolver and FIT may prove to be a marriage made in heaven.

While Resolver will initially appeal to business programmers who appreciate Python as a language, and IronPython as a way of leveraging the .NET Framework and .NET-based business logic, the ideas it embodies transcend Python and .NET. I’ll be fascinated to see how this “pathological but entirely new way of programming using a spreadsheet” will evolve.


1 Smalltalkers will note that they have been using a three-pane browser all along, and that’s true. However the spreadsheet metaphor, in this context, is something else again.

I’ve been doing some experiments to find out how the Silverlight plug-in will work as a player for screencasts. On this test page you’ll find four different versions of a 23-second clip. There’s one for Quicktime, one for Windows Media, one for Flash, and one for Silverlight.

Some important variables, from a screencaster’s perspective, are: legibility, file size, and convenience of production, deployment, and viewing.

That legibility matters seems obvious, but I see an awful lot of screencasts delivered at squinty resolutions. This puzzles me. The purpose of a screencast is to show and describe on-screen action. If you can’t read the screen, what’s the point?

All four of these examples are legible. The Quicktime version achieves the best clarity, but there’s a tradeoff: it’s also the largest file.

That size matters is perhaps less obvious to those of us living in the developed world. But as I’ve been recently reminded by both Beth Kanter and Barbara Aronson, much of the world remains bandwidth-challenged. Videos that don’t squeeze themselves down will not be seen in many places where they should be.

Among these four examples, Windows Media weighs in lightest at under half a megabyte. That works out to about a megabyte per minute, which is the target I like to shoot for. If it’s possible to deliver a legible screencast at a data rate significantly less than that, I’d like to know how.

The sizes of the other versions in this example, in ascending order: Flash 1.2MB, Silverlight 1.5MB, Quicktime 2MB.
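For comparison, the file sizes can be converted to megabytes per minute for the 23-second clip. This is a rough sketch: the Windows Media figure is only known to be under half a megabyte, so 0.5 is used here as an assumed upper bound, which means its computed rate is a ceiling rather than a measurement.

```python
CLIP_SECONDS = 23  # length of the test clip

# Sizes in MB as reported above; 0.5 for Windows Media is an assumed upper bound.
sizes_mb = {
    "Windows Media": 0.5,
    "Flash": 1.2,
    "Silverlight": 1.5,
    "Quicktime": 2.0,
}

def mb_per_minute(size_mb, seconds=CLIP_SECONDS):
    """Scale a clip's size to a per-minute data rate."""
    return size_mb * 60 / seconds

rates = {name: mb_per_minute(size) for name, size in sizes_mb.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/min")
```

By this arithmetic Windows Media comes in at no more than about 1.3 MB per minute, and Quicktime at roughly four times that.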

Of course these sizes depend on which encoder is used, and on which settings are applied. For these tests, I produced all of the screencasts in Camtasia. For Quicktime and Windows Media, Camtasia uses the encoders that come with those platforms. For Flash, it supplies an encoder. For Silverlight, it doesn’t yet supply an encoder so I produced an uncompressed AVI and then used Expression Encoder to create a Silverlight-compatible WMV file.

I should add here that, despite all the work I’ve done in this area, I’m still a bit vague on the concept of a screen encoder — that is, a video encoder that’s tuned for the kinds of low-motion but text-rich content that’s typical of screencasts. In beta versions of Silverlight and Expression Encoder, for example, there wasn’t a screen video option, so the only way to produce a legible screencast was to crank up a motion-video encoder to the maximum data rate, which produced a massive file. Now Expression Encoder provides a screen encoding option, which I used for this test and which Silverlight 1.0 can obviously play back.

It seems to me that Camtasia should be able to use that encoder directly, but until I figure out how, it will be less convenient to produce Silverlight screencasts from Camtasia than to produce the other formats. Rendering to AVI as an intermediate step is doable, but time-consuming.

In terms of deployment convenience, one measure is the number of supporting HTML, JavaScript, configuration, and other files required in order to play a screencast. I’m a minimalist, so when I deploy Camtasia screencasts I throw away the wrappers that Camtasia generates and go with the Simplest Thing That Could Possibly Work. From my perspective, that winds up being an OBJECT tag (and, sigh, also an EMBED tag) for Quicktime or Windows Media, plus a reference to a minimal player in the case of Flash. By comparison, my Expression-generated Silverlight example has lots of moving parts — an HTML file, a XAML file, a flock of JavaScript files, and the WMV file.

The Silverlight example could of course be simplified by coalescing the JavaScript support, but that alone won’t solve another issue of deployment convenience. It’s nice to be able to embed a screencast in any arbitrary web host. From the perspective of my WordPress.com blog, that’s an issue for all four of these approaches. WordPress is always coming up with new ways to embed video from various services, but the reason that’s necessary is that WordPress.com — quite rationally — strips out most of the advanced HTML tags and JavaScript support that you might want to include in your blog postings. In general, embedded video seems to be a game of point solutions. In order to embed video flavor X in web host Y you need a specific X+Y adapter. I understand the reasons why, but it’s frustrating.

One of those adapters, by the way, will be needed for WordPress.com and Silverlight Streaming, which is the Microsoft hosting service announced at the MIX conference earlier this year. I’ve hosted another version of my Silverlight example there. It’s the same set of files as this example, minus the HTML wrapper and the core Silverlight JavaScript code, plus an XML manifest, all packaged up in a zip file. I’m not expecting my little test to attract millions of viewers, but if I were, this hosting service would be one way to handle the load.

In terms of viewing convenience, the Silverlight example exhibits a nice property that I wasn’t expecting. When you resize the window containing the player, the player scales to fit. I’m pretty sure the embeddable Quicktime and Windows Media players can’t do that. Flash-based media players are more customizable, and can respond to container resize events, but I don’t think I’ve ever seen the technique applied to a screencast. It’s a nice idea. A screencast at 1:1 resolution is guaranteed to be legible, but will also consume a lot of screen real estate. So it’s tempting to shrink its width and height in production. But by how much? Any fixed resolution will work well for some people and not others. Resizable screencasts would be great for accessibility.

Of course you can resize any standalone player. So this issue boils down to what’s possible when the player is embedded in a web page. And as we’ve seen, embedding can be problematic. In general, we need to work toward a smoother transition between embedded and standalone viewing experiences.

The ultimate test of viewing convenience is, of course: Does it play instantly, regardless of the operating system or browser I happen to be using? Flash leads the way in that regard. Silverlight aspires to the same level of plug-in ubiquity, and with the announcement of Moonlight that aspiration seems achievable.

Ultimately a screencaster wants to be able to produce one video that works well for everyone, everywhere, for various definitions of works well. That’s a hard problem. Solutions depend on the raw capabilities of media players, it’s true. But they also depend on an ecosystem of plug-ins, browsers, encoders, operating systems, and hosting services.

I am an immediate fan of Common Craft’s style of concept videos. Their explanations of how and why to use del.icio.us and Google Docs are crisp and entertaining. They convey the essence of these activities more clearly than any other visual explanations I’ve seen, including many of the screencasts I’ve made.

The style is called paperworks because these sketchcasts are made by capturing screenshots, printing out key elements, and then filming, animating, annotating, and narrating arrangements and rearrangements of these scraps of paper. The first time you watch one, you’ll be captivated: it’s cute, it’s fresh. But is this just a gimmick? After you watch a few more, and you begin to acclimate to the style, does its effectiveness wane? Not yet, for me, because these productions have more going for them than cuteness and freshness.

One of the principles at work here is the moral equivalent of cropping and zooming in the screencast medium. When you’re trying to explain software on a conceptual level, images captured from screens can be a mixed blessing. It’s valuable to show exactly what screens look like, and exactly how actions flow within and across them. But the amount of detail that’s visible in a typical screen can often distract from the story you’re trying to tell. By cropping the screen, and/or by zooming in on the active region, you can prune away a lot of visual clutter and focus on key interactions. The paperworks style is an extreme form of cropping and zooming; it prunes and focuses very aggressively.

Another principle is sketching. According to Bill Buxton, sketching goes hand in hand with what he calls design thinking. When I asked Bill how he would have used sketching in the design of a feature like the Office ribbon, he said:

You’d start with paper prototyping — quickly hand-rendered versions, and for the pulldown menus and other objects you’d have Post-It notes. So when somebody comes with a pencil and pretends it’s their stylus and they click on something, you’ve anticipated the things they’ll do, and you stick down a Post-It note.

If that’s a helpful way to imagine software interaction in the design phase, why wouldn’t it also be a helpful way to conceptualize the software in use? The paperworks style strongly suggests that it is. These sketchcasts are great visual explanations of working software. I suspect they’d be equally useful during the design of that software.

In last week’s item on social scripting, I suggested that CoScripter’s automation strategy — based on simple English instructions that people can easily read, write, and share — could in theory work across the continuum of application styles. And arguably it will need to, because we’re increasingly likely to mix those styles. If you begin to rely on an automation sequence for your bank’s web application, for example, you’ll be sorry to have it broken by an upgrade that introduces AJAX, Flash, or Silverlight components.

What enables CoScripter to work in the web domain is the document object model (DOM) of which every web page is a rendering. Because JavaScript code can explore and interact with the DOM’s tree of user-interface objects, the browser can be driven semantically, by object names and properties, rather than literally, by mouse clicks and keystrokes. The literal method is workable, and there are many tools that make excellent use of it. The semantic method is more reliable if available, but it isn’t always. So the literal method winds up being the common denominator, because every style of application will respond to mouse clicks and keystrokes.

There is another kind of semantic technique long supported by desktop applications that define object models, notably the Mac’s AppleScript object model and Windows’ Component Object Model. These technologies enable automation scripts to reach below the user interface of applications, and to work with their internal machinery.

Using the Word object model, for example, you can automate a mail merge. If you run this program, you’ll see Word launch, you’ll see a data document written by an invisible hand, and then you’ll see a mail merge appear. What you won’t see are the user-interface actions required to produce these effects, because this level of automation bypasses the user interface.

So let’s distinguish between two flavors of semantic automation. The mail merge script does what I’ll call engine-based semantic automation. And CoScripter does what I’ll call UI-based semantic automation.

These two flavors are useful in quite different ways. With the engine-based approach, an automation script uses the application as if it (the application) were a service. In this case you don’t want windows and dialog boxes popping up all over the place, you just want to feed inputs and harvest outputs. The engine-based approach works accurately and efficiently, but it doesn’t yield a representation of task knowledge that a normal person could use, learn from, adapt, or share.

With the UI-based approach, an automation script uses the application as if it (the script) were a human being. It sees and touches exactly what the human sees and touches. This is not the optimal way to crank out a thousand mailing labels. But the UI-based approach does yield a representation of task knowledge that a normal person could use, learn from, adapt, or share.

Shareable representations of task knowledge are incredibly useful and powerful. Screencasts are one such representation, and as many people have noticed in recent years, they can radically outperform traditional forms of documentation. But you can’t interact with a screencast or concisely describe it. You can only watch and learn and imitate. Although that’s way better than not being able to watch and learn and imitate, interaction and concise description would be better still.

CoScripter delivers that superior experience of interaction and concise description. It does so by means of UI-based semantic automation which, in turn, is enabled by the browser’s document object model.

What might enable a more comprehensive flavor of UI-based semantic automation? Noodling on this question I arrived at one possible answer: the Windows UI Automation API, which is part of .NET Framework 3.0. I’d heard of it, but hadn’t connected the dots. In this June 2005 article for the ACM’s Special Interest Group on Accessible Computing, Rob Haverty lays out the rationale for this relatively new mechanism:

Windows UI Automation unifies disparate UI Frameworks such as Avalon [Windows Presentation Foundation], Trident [the browser], and Win32 so that code can be written against one API rather than several.

The basis of this unification is a tree of automation elements that is, in effect, a generic document object model. Automation providers map various specific object models, notably those of the browser and of Windows, into the generic tree. The API provides mechanisms for searching the tree and interacting with its elements.
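The notion of a generic tree that can be searched by name and control type can be sketched in a few lines of Python. This is a conceptual illustration only: `Element`, `walk`, and `find` are hypothetical names invented for the sketch, and none of it is the System.Windows.Automation API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """A hypothetical node in a generic UI tree (not a real UI Automation type)."""
    name: str
    control_type: str
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find(root: Element, name: str, control_type: str) -> Optional[Element]:
    """Return the first element matching both name and control type, else None."""
    for element in root.walk():
        if element.name == name and element.control_type == control_type:
            return element
    return None


# A desktop window and a browser window mapped into the same tree.
desktop = Element("Desktop", "pane", [
    Element("Notepad", "window", [
        Element("File", "menu item"),
        Element("Edit", "menu item"),
    ]),
    Element("Browser", "window", [
        Element("Search", "button"),
    ]),
])

hit = find(desktop, "Search", "button")
print(hit.name if hit else "not found")
```

The point of the sketch is that the caller searches by what an element is called and what kind of thing it is, without caring whether a desktop framework or a web page supplied it.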

It’s a powerful system that is also accurately described by John Robbins as “intensively fiddly.” So in this March 2007 MSDN article, he provides and illustrates the use of a set of convenience wrappers around the raw System.Windows.Automation classes. The sample program included with that article drives Notepad through a few basic operations. Could it be extended in the direction of CoScripter, in a way that realizes UI Automation’s ambition to uniformly control Windows and web applications?

I took a crack at that, and concluded that creating even a proof-of-concept will require more time and more programming chops than I can muster. But I’d be interested to hear from anyone who’s gone further down that path. I think this is potentially a very big deal. Although I suspect most programmers see UI Automation in the context of software testing, for which it is indeed well suited, Rob Haverty’s article suggests that it was primarily motivated by the need for better assistive technologies and improved accessibility.

When Tessa Lau says that accessibility guidelines are the lifeblood of CoScripter, she’s talking about affordances for people who cannot otherwise use the full capability of their software. But consider Rob Haverty’s definition of accessible technology:

Accessible technology enables individuals to adjust their computers to meet their visual, hearing, dexterity, cognitive, and speech needs.

I like his use of the word cognitive because in some sense we are all cognitively impaired when we try to use software. For most people, most of the time, the concept count is way too high. We don’t normally think of automation as an assistive technology. But arguably it is one. And when automation yields interactive documentation that lives in shared information spaces, it becomes a really potent assistive technology.

In case it’s not obvious, I am not claiming that Windows UI Automation can realize this vision of assistive automation across the spectrum of application types. It’s currently only available by default for Vista, and optionally for Windows XP if enhanced with the .NET Framework 3.0. It is not part of Silverlight or Moonlight, though conceivably one day it might be. And it clearly has nothing to do with Mac OS X, or Java, or Flash, or the Linux desktop.

But the idea of UI-based semantic automation is something that could apply in all these domains. A proof-of-concept CoScripter-like application-plus-service spanning two major domains — Windows desktop apps and browser-based apps running on Windows — would be a big step toward that broader vision.

When I read this story about cancer care in the Sunday New York Times yesterday, I was struck by one particular information graphic which I thought was very nicely done:

It turns out that Chris Gemignani was impressed too, and he decided to recreate the image using Excel. Here’s what he came up with:

Going one huge step further, and in the spirit of today’s theme of narrating the work, he created a screencast in which he demonstrates the process of making that graphic. It’s a wonderful example of the dynamic I’ve been describing. One of the commenters on Chris’ blog thanks him for teaching him some helpful techniques. Another suggests a technique that Chris hadn’t used but thinks is interesting. Very cool!

With Excel, as with all software — on the desktop and on the web — there’s so much untapped potential. The obstacles are knowing what’s even possible, and then knowing how to achieve it. Screencasts like this one leap over both obstacles in a single bound.

While I was editing today’s screencast I kept a log of my edits, and I’ve included that log below. As is typical when I edit screencasts, this one squeezed down quite a lot: from almost 54 minutes to 34 minutes. The result not only saves the viewer a precious 20 minutes, but also unfolds in a far more entertaining and engaging way.

I’ve written a lot about why to do this kind of editing, but never shown in detail what the process is like. For folks who are familiar with the editing process — in any medium — this is all just basic knowledge and common sense. But there are lots of folks who are not familiar with the editing process in any medium. So to convey what it’s like, I decided to narrate (part of) the editing of this particular screencast.

As I’ve mentioned before, there’s one huge difference between editing audio and editing video. With audio, as with text, you can seamlessly cut and rearrange to your heart’s content. With video, the need to preserve visual continuity imposes severe limits, especially on the so-called internal edits that elide words and phrases. It’s interesting to note that, in this respect, the demo/interview genre of screencasting has more in common with audio than with video. There’s usually a lot less happening in a screencast than in motion video. So you can usually get away with the sort of heavy editing that’s normally only possible in the audio domain. And it’s very useful to be able to do that.


(Initial length: 53:45.)

I cut the first 2.5 mins of Henrik talking in general terms about CCR, DSS, the programming model. Why? Nothing to show, and this info is available elsewhere.

The real meat of this demo is to show how the Robotics Studio exposes a RESTful interface, and to demo interactions with (real and simulated) robots using that interface.

In the next segment, Henrik starts by saying “I have a nice big robot next to me, I might be able to show you, if I can just…”

I then cut 15 seconds of him fumbling around in the services directory and muttering to himself, while hunting for the webcam interface. So it went from:

“I might be able to show you, if I can just” …. 15 seconds of fumbling and muttering … “there you go! [image appears]”

to:

“I might be able to show you, if I can just … there you go! [image appears]”

This is partly about respect for the viewer’s time, because people have better things to do than watch and listen to 15 seconds of fumbling and muttering. And it’s partly about keeping the storyline moving forward in an engaging way.

A subtle point here is that I left in just enough of the fumbling and muttering. If I had reduced it to:

“I might just be able to show you…there you go!”

then it would have felt overproduced and inauthentic. I want Henrik to fumble and mutter a little bit, that’s part of the whole charm of the thing. But I want to limit the fumbling and muttering to a reasonable length. I think that leaving in “if I can just…” retains just enough of that quality — but not too much:

“I might be able to show you…if I can just…there you go! [image appears]”

During the next stretch I made no major cuts, but lots of little ones in the range of 2 to 5 seconds. These are places where the audio pauses because Henrik is thinking, or waiting for the computer to respond. And they are also places where he’s just verbally warming up to what he really wants to say — or where I am doing the same.

Example: “…and so, um…” –> “” == 2 seconds saved

Example: “And what we have here is that, um, and so, everybody has seen a web server” –> “Everybody has seen a web server” == 5 seconds saved

These internal cuts are completely inaudible and, so long as they don’t interrupt the onscreen action, also invisible. Since a typical screencast is often visually quiescent there are many opportunities to make these cuts. They not only reduce the end-to-end time, but also — just as important — they make the video far more watchable.

The next major cut was a 20-second setup leading to the statement: “In our model, everybody is a client and everybody is a server.” In the setup, Henrik talked about how typical web apps (like home banking) exhibit a more classical client/server architecture. It was a judgement call but, in this case, I decided that the kinds of folks who will care about RESTful interfaces to a robotic services fabric didn’t need the setup, and that it was more valuable to shave those 20 seconds than to keep them.

It’s worth noting how the context supported making this cut. Originally:

“We have services that talk to each other, that wire each other up, and use each other to construct and compose applications.”

… 20-second setup ….

“In our model, everybody is a client and everybody is a server.”

Finally:

“We have services that talk to each other, that wire each other up, and use each other to construct and compose applications. In our model, everybody is a client and everybody is a server.”

It flows perfectly.

Next I cut a restatement of “everybody is a client and a server” which chewed up 5 or 6 seconds without adding anything new. In doing so I ran into a logistical problem. When trying to make precise audio cuts in Camtasia you can run into trouble in tight spaces. (I keep meaning — and keep forgetting — to mitigate this problem by capturing at a higher frame rate than the one at which I finally produce.) A workaround is to silence a region that’s too small to accurately cut.

So, for example, after cutting that restatement I wound up with:

...at the same time same time. That has some great benefits.
                    ---------

I wasn’t able to cut the redundant “same time” without affecting the “That has” — but I was able to replace the redundant “same time” with silence:

...at the same time. ________ That has some great benefits.

That left a perfectly natural-sounding 1-second pause.
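The difference between cutting a region and silencing it can be sketched on a list of audio samples. This is a toy model, not anything Camtasia exposes: the sample values and the `cut` and `silence` helpers are hypothetical, and real audio would hold thousands of samples per second.

```python
def cut(samples, start, end):
    """Remove samples[start:end]; the track gets shorter and neighbors shift."""
    return samples[:start] + samples[end:]


def silence(samples, start, end):
    """Zero out samples[start:end]; the track length and timing are unchanged."""
    return samples[:start] + [0] * (end - start) + samples[end:]


# Hypothetical samples; positions 3 and 4 hold the redundant phrase.
track = [5, 7, 9, 4, 4, 8, 6]

print(cut(track, 3, 5))      # [5, 7, 9, 8, 6]
print(silence(track, 3, 5))  # [5, 7, 9, 0, 0, 8, 6]
```

Because silencing leaves every other sample where it was, it cannot clip the words on either side of the region, which is why it works in spaces too tight for a clean cut.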

(Length now: 49:38)

Through the next section I made assorted internal cuts, and one major cut. After Henrik contrasted OO-style inheritance with the additive composition of RESTful services which is the extension pattern for the Robotics Studio, we got into a several-minute discussion about the tradeoffs between these approaches. It wasn’t really conclusive, though, and I realized that it would be better to factor that out. In fact, while recording, I decided at this point to do a separate podcast in which we’ll drill down on these more abstract points. In a screencast, you want to keep the visuals moving along.

(Length now: 46:38)

For the remainder, more of the same: internal cuts, plus 20- or 30-second chunks that were disposable.

Final length: 34:30
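The running lengths in the log make it easy to tally how much each stage saved. A small sketch, using only the four checkpoints recorded above (the helper names are made up for the example):

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert a number of seconds back to 'M:SS'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


# Lengths recorded in the edit log, from initial to final.
checkpoints = ["53:45", "49:38", "46:38", "34:30"]
secs = [to_seconds(c) for c in checkpoints]

# Time removed between each pair of consecutive checkpoints.
saved_per_stage = [to_mmss(a - b) for a, b in zip(secs, secs[1:])]
total_saved = to_mmss(secs[0] - secs[-1])

print(saved_per_stage)  # ['4:07', '3:00', '12:08']
print(total_saved)      # 19:15
```

So the edits removed 19 minutes and 15 seconds in all, a little over a third of the original recording.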

While I’m back on the topic of screencasting, I’ve been meaning to mention another important use of the medium. Recently a colleague reported severe trouble trying to present demos that rely on a live connection to the Internet. My solution is a variation of the old joke:

Patient: It hurts when I do that.

Doctor: Don’t do that!

To avoid the pain I use screencasts instead of live demos. There are a variety of reasons for doing so. An obvious one is that it makes you immune to network glitches.

A subtler reason is that it’s hard to show software in use without wasting effort and motion. You reach for the wrong menu item, you fumble while typing. These are perfectly normal and natural behaviors, but they only add dead time to your presentation and therefore, by definition, they detract from it. When you edit out the wasted motion and false starts you create an effect that isn’t quite real — it’s hyperreal — but that’s exactly the effect that you want (or anyway, I want) a presentation to achieve.

Another subtler reason is that video playback gives you more control over timing. It can be hard or even impossible to replay a piece of a demo in response to an audience question. Likewise, it can be hard or impossible to fast-forward a demo if you’re running short on time, or if you’re losing the audience. When you’ve canned your demos as screencasts, you have a lot more flexibility.

Finally, there’s just the peace of mind that comes with only having to keep track of one single media file, as opposed to lots of moving parts. When you are speaking and showing demos, the fewer moving parts, the better.

As part of my re-exploration of the walled-garden social networks, I’ve accepted the entire batch of LinkedIn invitations that had queued up in my dormant account. One of them was a request from Beth Kanter for advice on screencasting. From my point of view, LinkedIn was superfluous in this case because the same request had already been made (implicitly) in this blog post in which Beth summarizes what she had figured out for herself, and then invites feedback.

Although we should probably not yet simply assume that linking to a blog post will draw the attention of the author of that post, the blogosphere does in fact propagate awareness in that way, and does so with remarkable speed and reliability. So in this case I’d seen Beth’s item before I began receiving requests from her via LinkedIn intermediaries. Because I was boycotting walled-garden social networks at the time, I thought this was a good opportunity to show how, in a case like this, the open Net can obviate the need for a closed network. So I replied to Beth’s blog item in a comment. Or rather, I thought I did. But although I wrote the reply it seems I never managed to post it. Oops. I’m sorry about that, Beth, and I’ll try to make up for it here.

But first, I want to note that your item is a textbook example of how to construct an online query for information. By summarizing what you’ve already learned, you’re helping bring other folks up to speed. At the same time, you’re helping me understand where and how I can add value. This custom is just good common sense, of course, but one that’s honored more often in the breach than in the observance. If I were teaching this kind of thing in grade school, I’d use del.icio.us to keep lists of good examples of netiquette, and I’d put this example on one of those lists.

Now, in the context of the genre that I’ve called the conversational demo, here are your questions and my answers.

Q: How much scripting does he do prior to the interview?

A: None.

Q: Does he “rehearse” with his guest?

A: No. I do, of course, choose topics in which I’m interested, and to which I bring plenty of domain knowledge.

Q: Or does he capture everything and edit?

A: Yes. As with my podcasts, I lean heavily on editing when making these conversational screencasts. The editing happens on two levels: macro and micro. On the macro level, because we (interviewer and interviewee) know that whole scenes can be cut, we don’t need to worry about the performance. If something doesn’t work we can just call it a bad take and try again. We can also plan, on the fly, where to go next, again knowing that such discussion is effectively out of band and will be deleted.

On the micro level, there’s internal editing. The term comes from the audio domain and it applies in the same way here. If I can eliminate ums and you knows and false starts without compromising the video, I do.

Q: What tools does he use to capture these interviews?

A: For video I mostly use Camtasia in conjunction with one or another of the many screen-sharing tools. Since most software demos don’t require a high frame rate and don’t push lots of screen bits, that’s usually OK. However in this screencast about tagging in Photo Gallery — which, by the way, was edited down from 35 minutes to 14 minutes — the screen-sharing setup couldn’t keep pace with all the images. So I had Scott Dart record locally using Windows Media Encoder, and then ship me the resulting WMV file. I was able to follow along in screen-sharing well enough to carry on the conversation, even though what was displayed on my screen would have been useless for production. This is a variation of a technique that’s really useful for podcasters who are struggling with expensive and/or poor-quality phone lines. If you can get both parties to record high-quality audio locally, you can use a marginal VoIP setup to converse and then join up the high-quality audio later. I did that here and I’d love to do more shows that way.

For audio I also mostly use Camtasia. Originally I didn’t; I used Audacity, because I hadn’t figured out how to get Camtasia to record the two channels (caller, callee) from my Telos as a stereo track. Eventually I found that setting. It’s in Tools -> Options -> Streams -> Audio Setup.

Because the conversational screencast is a superset of a podcast, you’re dealing with all of the same audio production issues as in podcasting. For me, working remotely, that’s been an ongoing challenge. Telephone recording is just plain hard. Although I’ve been using a Telos for a while, for example, I only recently discovered that I’d been using it incorrectly. On the other hand, VoIP recording is hard too.

Granted, I wasn’t born with an audio chromosome, but then neither were most folks. So, remote audio is going to be a problem for most of us — a problem that, I reckon, somebody is going to make money by solving. At this point I can muddle through fairly well. But if I hadn’t already invested in the Telos I’d be looking really hard at the technique of recording locally on both ends and then joining the results in post-production. It’s not particularly hard to do that, and it’s really nice to simply abolish all the problems associated with the voice channel — whether it’s the public telephone system or the Internet.

Because it rarely applies to me, I haven’t mentioned the scenario in which both parties are together in the same place looking at the same computer. In that case I’d use whatever capture software was convenient. If the software were on the interviewee’s computer, I’d ask the interviewee to install the free Windows Media Encoder and capture video that way. And I’d probably use a standalone digital audio recorder with a handheld microphone to separately capture audio.

One final point from my recent conversation with Doug Kaye: a lot of people who think they don’t have digital audio recorders overlook the fact that they have camcorders which can perform that function. A related point: if you use a camcorder, it’s tempting to let it do the whole job — that is, screen capture as well as audio capture. Although my Channel 9 colleagues do that all the time, I don’t recommend it. You’d much rather use perfect screen capture than fuzzy camcorder capture. And ideally you’d like to be able to do that without installing any software on the target computer, using a direct-capture device. I’ve never seen one of those, but next week in Redmond I’ll be visiting our new production studio where I’m told we have such a beast. I’m curious to see it in action.

Q: Does he edit in Camtasia?

A: Yes, I do. I’d honestly rather edit in iMovie, because I find it to be more elegant and more capable, but it’s a huge hassle to get stuff in and out of iMovie, so I usually take the path of least resistance and edit in Camtasia. If you want to do micro-edits in Camtasia, one important tip is to record at a higher frame rate than you will ultimately produce. A screencast is legible at 5 or even fewer frames per second. But if you only capture at that rate, you’ll find that you can’t make intra-frame audio micro-edits. So record at 15 or more frames per second, then produce at a lower rate.
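To put rough numbers on that tip, here is a minimal Python sketch. It assumes the editor only lets you cut on video frame boundaries, which is my reading of the limitation rather than documented Camtasia behavior, and the function name is made up for illustration.

```python
def frame_duration_ms(fps):
    """Length of one video frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Assumption: a cut can only land on a frame boundary, so one frame
    # is the smallest slice of audio you can remove.
    for fps in (5, 15, 30):
        print(f"{fps:>2} fps: smallest cut is about {frame_duration_ms(fps):.1f} ms")
```

Under that assumption, a 5 fps capture only permits cuts in 200 ms steps, which is longer than many of the ums you would want to remove, while a 15 fps capture brings the step down to roughly 67 ms.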

Q: What are some best practices in terms of production and editing?

A: It’s tempting to jump in and start editing right away, and to be honest I often do. But I think it’s better to just watch the raw recording all the way through, setting markers along the way to annotate the segments that you want to include, discard, or perhaps rearrange. Ultimately you’re trying to tell a story, and those markers will help you visualize the outline of the story.

Sorry this took so long, Beth. I hope it helps.

For several of my screencasts I used an unusual method which I mentioned here. I made my camcorder be the computer’s display, and dubbed the output to tape 1. My reasons were twofold. First, I wanted to capture a lot of raw footage without having to wait for the captured data to get written to a file, which can be slow. Second, I wanted to be able to edit in iMovie. Although I have Camtasia and use it often, I reach for iMovie when I need precise frame-by-frame control, and when I’m laying down audio narration in a precise way. Camtasia isn’t good at those things, and neither is Windows Movie Maker. I’ve tried Adobe Premiere but it does way more than I need and the learning curve intimidated me. (It also ain’t cheap.) If there is a basic Windows movie editor that meets my requirements, I’d love to hear about it, and so would my screencasting colleagues at MSDN Channel 9. Meanwhile I’ll continue to reach for iMovie. But moving files from a Windows-based capture tool over to iMovie on the Mac, and then back to Windows where I continue to rely on Camtasia for final production, is a huge hassle. Hence the notion of using the camcorder as a bridge between the two worlds.

For the screencasts mentioned above, I connected my Mac to the camcorder with an S-Video cable, detected the camcorder as a display, and captured at 720×480. It’s a challenge to arrange a presentation in that small rectangle, but — particularly when you’re demonstrating a single application window — it can be done.

Today when I updated the Vista video driver for my Compaq nc8340, which has an ATI Mobility Radeon X1600, I repeated the experiment in Vista. This 20-second screencast shows the results for two different capture resolutions: 1024×768 and 800×600. (With this Windows-based setup, talking to the same camcorder, 720×480 doesn’t seem to be an option.) Both captures get squashed down to the standard digital video resolution of 720×480, and neither is crystal clear, but I think both are usable, though you should judge for yourself. I’d lean toward the 800×600 resolution which I’ve found to be ideal for two reasons. First, it minimizes the amount of video data you have to ship over the wire to your viewers, and that still matters. Second, it forces the demo to focus on where the action is, rather than displaying the full panoply of the modern GUI which can often be overwhelming.
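As a rough way to compare the two capture sizes, the Python sketch below computes how much each axis shrinks when a capture is squashed to 720×480. It is plain arithmetic on the resolutions mentioned above; the function name is made up for illustration, and the sketch ignores the non-square pixels of DV video.

```python
def squash_factors(src_w, src_h, dst_w=720, dst_h=480):
    """Return (horizontal, vertical) scale factors from source to target size."""
    return dst_w / src_w, dst_h / src_h


if __name__ == "__main__":
    for width, height in ((1024, 768), (800, 600)):
        fx, fy = squash_factors(width, height)
        print(f"{width}x{height} -> 720x480: x scaled by {fx:.3f}, y scaled by {fy:.3f}")
```

The 800×600 capture keeps 90% of its horizontal detail and 80% of its vertical detail, against roughly 70% and 62.5% for 1024×768, which fits the preference for the smaller capture size.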


1 One of my goals in writing that post was to ensure that a future search for ‘udell pv-gs400 s-video’ would find the reminder to myself, embedded in that post, about how to dub to tape. And now, sure enough, it does.

Yesterday’s screencast turned out to be a nice example of how the screencasting medium can communicate what otherwise cannot be explained easily, if at all. Here’s the kind of reaction you hope a screencast will elicit:

I checked out the Photo Gallery earlier, but didn’t see the added value. Now I do.

It’s hard to quantify the impact of a timely and well-produced screencast, but my gut tells me that Simon Willison’s outstanding effort, How to use OpenID, has more than a little to do with the momentum now building around OpenID.

I’ve written before about how to make screencasts that communicate effectively, and I’ll be updating those observations from time to time because it’s an evolving story.

One of my goals is to help folks inside Microsoft use this medium more effectively. Another is to help everyone else do so, because there’s a major obstacle in the way of my vision of the future of software and networks: Much of the value and capability of this stuff is unappreciated by most people.

In trying to understand why, I’ve settled on what I call the “ape with a termite stick” argument. If you’ve heard it before, skip ahead. If not, it goes like this. People learn to use tools by watching how other people use them, and imitating what they see. Observation is the key. Suppose apes had language, and the discoverer of the termite stick could explain to the tribe:

“So, you find a stick about yea long, and strip off the bark so it’s sticky, and poke it into the hole, and presto, it comes up bristling with yummy ants.”

Some of the other apes might get it, but most of them wouldn’t. On the other hand, any ape who could observe this technique would get it immediately, and never forget it.

Given all the network connectivity that we have nowadays, it’s perhaps surprising — but nevertheless true — that we have few opportunities to directly observe how other people, who are proficient users of software tools, do what they do. Screencasts are the best way I’ve found to make such tool use observable, and thus learnable.

Enough theory. When you get down to brass tacks and try to capture those “aha” moments, it’s easier said than done for a bunch of reasons. In the case of this particular screencast, I just want to point out three things.

Focus.

I always ask presenters to size the application window (or windows) to something like 800 by 600. That’s partly to minimize the quantity of video that has to be delivered, which continues to matter because broadband isn’t yet where it needs to be. But equally, it’s a way to focus on the real action. In the case of the Photo Gallery screencast, for example, I cropped away the window chrome because nothing was going on there. It’s a subtle and subliminal thing but, when you eliminate the uninteresting and uninformative, the interesting and informative aspects of what remains will emerge more clearly.

With some screencasting tools, including the one I mostly use, Camtasia, it’s also possible to pan and zoom in order to focus even more precisely. I haven’t used that feature yet, because I’m usually pressed for time and the basic kinds of editing that I do are already time-consuming. But I do want to add this technique to my repertoire, and use it in selective and appropriate ways.

Editing is crucial. The raw capture for yesterday’s screencast was 30 minutes. It included some false starts, some extraneous material, and a fair bit of verbal stuttering on the part of both Scott and myself. When we finished the capture, I wasn’t sure we even had anything that would be usable. But as I trimmed away the clutter, a reasonably clear storyline emerged.

Even the 14-minute version will, of course, be too long for many people. One solution would be to divide the material into chapters. But since none of those would work well standalone, a better solution might be to make an elevator-pitch version that tells the same story in just 3 to 5 minutes. I’d want that version to complement the 14-minute version, though, not replace it.

Interactivity.

Almost all the screencasts that I’ve seen, and many that I’ve made, are solo efforts. But I also love to do interview-style screencasts, and the Photo Gallery screencast is an example of that genre. When it works well, as I think it did in this case, the interaction between the interviewer and the presenter can help the presenter — who in some ways knows the subject too well — recognize what’s not obvious to viewers and adapt accordingly.

As an aside, I should mention that although we made this screencast remotely — Scott was in Redmond and I was in my home office in New Hampshire — we used a technique that was new for me. Normally I record screens projected to my computer using a screensharing application. In this case, because of all the images in the presentation, that didn’t work well. The projection couldn’t keep up. So I had Scott record his screen on his end, while I recorded the audio on my end. It worked great. I was able to follow the visual action well enough on my end, Scott captured a high-quality video which he later posted for me to download, and it was straightforward for me to marry up his video track with my audio track.

Show, don’t tell.

The “aha” moment, if there is one, speaks for itself. When the ape can see that termite stick bristling with ants, there is no need for someone to say: “This is a really cool benefit.” It’s just obvious.

In our session, Scott was actually quite restrained. But there were a few places where he made editorial comments like “this is really convenient” or “this is a great benefit”. I took them out. If I could give only one piece of advice to technical marketers everywhere, it would be this: Show me, don’t tell me.


Today’s 4-minute screencast, which explores Vista’s common feed system, serves multiple purposes. First, I wanted to familiarize myself with this stuff, and do so in a way that would elicit responses that help me understand how other folks are reacting to it. I am intensely interested in the reasons why people do or don’t take to the notion of reading RSS feeds. Mostly, as we know, they haven’t.

The assumption is that surfacing the concepts more prominently in the OS will help, and I think that’s true, but there’s a lot going on here. For example, even just explaining to people how feeds are like-but-unlike email is a huge challenge. When you start from the perspective of reading feeds versus reading email, it’s hard to see the difference. One key distinction — that feeds are by-invitation-only and can be easily and effectively shut down, versus email which is uninvited and can be very hard to deflect — is fairly abstract and hasn’t sunk in yet for most people.

When you start from the perspective of writing feeds versus writing email, the differences, and the benefits that flow from those differences, are even more compelling — at least to me. But the reasons why are even more abstract: manufactured serendipity, maximization of scope, awareness networking. How might Vista, or any desktop operating system, help surface these concepts?

I also made this screencast to find out what it’s like to make screencasts of Vista. I haven’t yet installed Camtasia on my newly-acquired Vaio laptop, because I want to repave that machine with a final version of Vista that I don’t have yet. But no worries, there’s always good old Windows Media Encoder. I’ve always said it’s an underappreciated jewel, and evidently that’s still true as it is not included in Vista.

After capturing with Windows Media Encoder I transferred the file to my XP box for editing in Camtasia. As always, the process reminded me of Pascal’s famous quote: “If I had time, I would write a shorter letter.” Boiling a screencast down to its essence is really hard. One of the biggest challenges is meshing the video footage with the audio narration. I want to produce a series of screencasts that illustrate this process, but I’m not sure how best to separate out the kinds of general principles I outlined here from details of specific applications and delivery formats.

A couple of final points about the RSS features shown in the screencast. It shows how to acquire feeds one at a time into the common pool using IE, and how to acquire batches of feeds into Outlook by importing an OPML file, but there’s no obvious way to load a batch from OPML into the common pool. I know I could write that app, but is there one lying around somewhere that I’ve missed? Also, how do you batch-delete feeds from Outlook once you’ve acquired them via OPML?
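The first half of such an app, pulling the feed addresses out of an OPML file, is easy to sketch in Python. This is a minimal illustration that assumes the common OPML convention of storing each feed address in an outline element's xmlUrl attribute. The function name and sample data are hypothetical, and the sketch stops short of registering the feeds with the common pool, since that API isn't covered here.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return every feed address (xmlUrl attribute) found in an OPML document."""
    root = ET.fromstring(opml_text)
    # Subscription lists may nest outlines inside folders, so walk the whole tree.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


# Hypothetical sample data, not taken from any real subscription list.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Sample subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Example A" type="rss" xmlUrl="http://example.com/a.xml"/>
      <outline text="Example B" type="rss" xmlUrl="http://example.com/b.xml"/>
    </outline>
    <outline text="Example C" type="rss" xmlUrl="http://example.org/c.xml"/>
  </body>
</opml>"""

if __name__ == "__main__":
    for url in feed_urls_from_opml(SAMPLE_OPML):
        print(url)
```

Each address that comes back could then be handed, one at a time, to whatever subscription mechanism is available.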

In honor of my first get-together with the MSDN Channel 9 and 10 folks later today, I thought I’d do a spot of media hacking in support of the cause. One of the things that caught my eye recently was Brian Jones’ screencast on data/view separation in Word 2007. It’s published as a SWF (Shockwave Flash) movie and, like other SWF files on Channel 9, it’s delivered into the browser directly, without a controls wrapper. So there’s no way to see the length of the screencast, or pause it, or scroll around in it, or — as I was inclined to do — refer to a segment within the screencast.

I figured it would be a snap to grab the controller that Camtasia Studio emits and tweak its configuration file to point to Brian’s screencast. But that seemingly simple hack turned into a merry chase. It turns out that the Camtasia controller isn’t entirely generic. It embeds (at least) the width and height of the controlled video. I could use Camtasia to create a new controller, but I don’t have that software here with me, and in any case it seems like there should be a way to override those values.

First, though, I took a step back and spent some time looking for a generic SWF component to play back SWFs. For FLV (Flash video) files, I’ve made great use of flvplayer.swf 1. It’s a nice simple widget that does just the one thing I want: it accepts the address of an FLV file as a parameter, and it plays that file. There has to be an analogous swfplayer.swf, right? Well, I looked hard and didn’t find it. Maybe someone can enlighten us on that score.

Circling back to the Camtasia controller, I asked myself another question. There has to be an easy way to not only display, but also edit, the header of a SWF file, right? Again, I looked hard and came up empty handed. Now it was a challenge, so I dug into things like SWF::File and Flasm, tools for picking apart and reassembling SWF files. Neither quite did the trick. Then I remembered a tip from Rich Kilmer about Kinetic Fusion, a Java toolkit for roundtripping between SWF and XML. Using it, I was able to convert the SWF to XML, alter the width/height values, and recreate the SWF 2.
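For the display half of that question, here is a minimal Python sketch that reads the stage size from a SWF header. It is not the Kinetic Fusion route described above. It assumes the standard SWF header layout: a three-byte FWS or CWS signature, a version byte, a four-byte length, then a bit-packed RECT measured in twips (20 per pixel). The function name is made up for illustration, and the dimensions the Camtasia controller embeds may well live elsewhere in the file.

```python
import zlib


def swf_stage_size(data):
    """Return (width, height) in pixels from the header of a SWF file's bytes."""
    signature = data[:3]
    if signature not in (b"FWS", b"CWS"):
        raise ValueError("not a SWF variant this sketch understands")
    body = data[8:]  # skip signature, version byte, and 4-byte file length
    if signature == b"CWS":
        body = zlib.decompress(body)  # CWS compresses everything after byte 8

    # RECT: 5 bits give the field width, then Xmin, Xmax, Ymin, Ymax.
    nbits = body[0] >> 3
    total_bits = 5 + 4 * nbits
    nbytes = (total_bits + 7) // 8
    bits = int.from_bytes(body[:nbytes], "big") >> (nbytes * 8 - total_bits)

    fields = []
    for i in range(4):
        value = (bits >> ((3 - i) * nbits)) & ((1 << nbits) - 1)
        if nbits and value & (1 << (nbits - 1)):
            value -= 1 << nbits  # sign-extend negative coordinates
        fields.append(value)
    xmin, xmax, ymin, ymax = fields
    return (xmax - xmin) / 20, (ymax - ymin) / 20  # twips to pixels
```

Editing is harder than reading, because changing the RECT's bit width shifts every byte that follows and, for CWS files, the body has to be recompressed. That may be part of why a round trip through XML ended up being the practical route.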

I know, I know, this is crazy, there has to be a better way, and I hope someone will enlighten me. But in any case, I finally did succeed, sort of. Here’s a controllable version of Brian’s screencast:

s3.amazonaws.com/jon/9/DataViewSeparation.html

One further complication: I’d hoped to publish only the modified controller and configuration file, leaving the screencast in situ on Channel 9. But the cross-domain nature of that arrangement seems to rule it out. So I wound up rehosting the video on the same server as the controller and configuration file. In this case, just to keep things interesting, that server happens to be my Amazon S3 account.

Anyway, if you’ve made it this far, I can now refer you to the segment of that screencast. At 6:45 (of 9:53), Brian shows how to swap out one batch of XML data associated with a Word document and swap in another. I’ll say more about why I found that interesting in another post. Meanwhile, I’ll be pondering how one of my perennial interests — URL-addressable and randomly accessible rich media — can help expose more of the considerable value that’s contained in the Channel 9 screencasts.


1. If you’re a Flash developer, it’s trivial to whip up your own playback control. But it’s non-trivial for regular folks who just want to embed videos in HTML pages. These folks find themselves rooting around on the net for components that should be way easier to find and use.

2. If you try this on the Camtasia controller, note that the decompiled XML won’t immediately recompile. The generated ActionScript contains a handful of references to this:componentName that instead should be this.componentName.
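If there are more than a handful of those references, a search-and-replace can do the repair. The Python sketch below is one hedged way to do it. The function name is made up, and the pattern assumes that every `this:` followed directly by an identifier is one of the broken references, which may not hold for all decompiled code (a ternary expression could match too), so review the result before recompiling.

```python
import re


def patch_component_refs(actionscript):
    """Rewrite 'this:name' as 'this.name' in decompiled ActionScript text."""
    # Only touch 'this:' when an identifier character follows immediately.
    return re.sub(r"\bthis:(?=[A-Za-z_$])", "this.", actionscript)


if __name__ == "__main__":
    print(patch_component_refs("this:playButton._visible = true;"))
```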

Sean McCown is a professional database administrator who writes the Database Underground blog for InfoWorld. Lately his postings have been full of references to videos. One day, he watched a Sysinternals training flick, combining live video with screencasting, and made immediate use of it to pinpoint and fix a problem. Another day, he made his own training screencast:

I sat down last night and made a video of the restore procedure for one of our ETL processes. It was 10mins long, and it explained everything someone would need to know to recover the process from a crash. [Database Underground: Not just a DR plan anymore]

Screencasting is poised to become a routine tool of business communication, but there are still a few hurdles to overcome. For starters, video capture isn’t as accessible as it ought to be. Second Life gets it right: there’s always a camera available, and you can turn it on at any time. Every desktop OS should work like that.

Meanwhile, I’ll reiterate some advice: Camtasia is an excellent tool for capturing screen video on Windows, but its $300 price tag covers a lot of editing and production features that you may never use if you’re capturing in stream-of-consciousness mode for purposes of documentation. In that case, the free Windows Media Encoder is perfectly adequate.

On the Mac I’d been using Snapz Pro X for short flicks, but it takes forever to save long sessions. Next time I do a long-form Mac screencast I’ll try iShowU. That’s what Peter Wayner used for his AJAX screencasts. Peter says that iShowU saves instantly. I tried the demo, and it does.

Finally, there’s the odd hack I tried here: I used the camera’s display as the Mac’s screen, and captured to tape. If the 720×480 format is appropriate for your subject — and when the focus is a single application window, it can be — this is a nice way to collect a lot of raw material without chewing up a ton of disk space.

Capture mechanics aside, I think the bigger impediment is mindset. To do what Sean did — that is, narrate and show an internal process, for internal consumption — you have to overcome the same natural reticence that makes dictation such an awkward process for those of us who haven’t formerly incorporated it into our work style. You also have to overcome the notion, which we unconsciously absorb from our entertainment-oriented culture, that video is a form of entertainment. It can be. Depending on the producer, a screencast documenting a disaster recovery scenario could be side-splittingly funny. And if the humor didn’t compromise the message, a funny version would be much more effective than a dry recitation. But even a dry recitation is way, way better than what’s typically available: nothing.

Mike Champion raises an interesting point that applies to Microsoft but also more broadly:

The culture at MS is very F2F-oriented…if you’re out of sight, you have to work hard not to be out of mind.

But then he adds:

Geographic distance will help keep you from getting sucked into the groupthink of whatever group you’re in. Microsoft collectively needs to be constantly reminded what the world looks like to people whose view isn’t fogged up by our typical drizzle or distracted by the scenery on the sunny days.

We’re entering an era in which our personal, social, and professional lives are increasingly network-mediated. Trust-at-a-distance is a new possibility, with economic ramifications that everyone from Yochai Benkler to Jim Russell is trying to figure out. As someone who’s worked remotely for 8 years, and is about to work remotely for a company with relatively few remote employees, this question is extremely interesting to me.

On the one hand, I’ve learned that I can accomplish a lot because I spend an abnormal percentage of my waking hours in flow rather than in meetings. I’ve also learned that network-mediated interactions can be more productive than F2F interactions. Consider my August screencast with Jim Hugunin, or my May screencast with Anders Hejlsberg, or indeed any of the other screencasts in that series. They’re all scheduled events, mediated by telephone and screensharing. I can’t see how physical colocation would improve them.

On the other hand, there’s the “watercooler” effect: being in a place, you see and hear and smell things that aren’t otherwise transmitted through the network. I have no doubt whatsoever that shared physical space matters in ways we can’t begin to describe or understand.

But as collaboration in shared virtual space takes its rightful place alongside collaboration in shared physical space, shouldn’t a company whose products are key enablers of virtual collaboration be eating its own dogfood?

Of course things are never as black-and-white as they appear. So I’m going to bookmark this posting and return to it in six months. Hopefully by then I’ll know more about the value of being here and of being there.


Jon Udell is an author, information architect, software developer, and new media innovator. His 1999 book, Practical Internet Groupware, helped lay the foundation for what we now call social software. Udell was formerly a software developer at Lotus, BYTE Magazine’s executive editor and Web maven, and an independent consultant.

A hands-on thinker, Udell’s analysis of industry trends has always been informed by his own ongoing experiments with software, information architecture, and new media.

From 2002 to 2006 he was InfoWorld’s lead analyst, author of the weekly Strategic Developer column, and blogger-in-chief. During his InfoWorld tenure he also produced a series of screencasts and an audio show that continues as Interviews with Innovators on the Conversations Network.

In January 2007, Udell joined Microsoft as a writer, speaker, and producer of another series of interviews: Perspectives. This show features projects in which Microsoft works with partners — universities, governments, NGOs — to develop new and socially impactful uses of its technology portfolio.

He is currently working on the elmcity project, a service running on Microsoft’s Azure platform that enables curators to aggregate and syndicate calendar information for their communities.