Screencasting tips

Yesterday’s screencast turned out to be a nice example of how the screencasting medium can communicate what otherwise cannot be explained easily, if at all. Here’s the kind of reaction you hope a screencast will elicit:

I checked out the Photo Gallery earlier, but didn’t see the added value. Now I do.

It’s hard to quantify the impact of a timely and well-produced screencast, but my gut tells me that Simon Willison’s outstanding effort, How to use OpenID, has more than a little to do with the momentum now building around OpenID.

I’ve written before about how to make screencasts that communicate effectively, and I’ll be updating those observations from time to time because it’s an evolving story.

One of my goals is to help folks inside Microsoft use this medium more effectively. Another is to help everyone else do so, because there’s a major obstacle in the way of my vision of the future of software and networks: Much of the value and capability of this stuff is unappreciated by most people.

In trying to understand why, I’ve settled on what I call the “ape with a termite stick” argument. If you’ve heard it before, skip ahead. If not, it goes like this. People learn to use tools by watching how other people use them, and imitating what they see. Observation is the key. Suppose apes had language, and the discoverer of the termite stick could explain to the tribe:

“So, you find a stick about yea long, and strip off the bark so it’s sticky, and poke it into the hole, and presto, it comes up bristling with yummy ants.”

Some of the other apes might get it, but most of them wouldn’t. On the other hand, any ape who could observe this technique would get it immediately, and never forget it.

Given all the network connectivity that we have nowadays, it’s perhaps surprising — but nevertheless true — that we have few opportunities to directly observe how other people, who are proficient users of software tools, do what they do. Screencasts are the best way I’ve found to make such tool use observable, and thus learnable.

Enough theory. When you get down to brass tacks and try to capture those “aha” moments, it’s easier said than done for a bunch of reasons. In the case of this particular screencast, I just want to point out three things.


I always ask presenters to size the application window (or windows) to something like 800 by 600. That’s partly to minimize the quantity of video that has to be delivered, which continues to matter because broadband isn’t yet where it needs to be. But equally, it’s a way to focus on the real action. In the case of the Photo Gallery screencast, for example, I cropped away the window chrome because nothing was going on there. It’s a subtle and subliminal thing but, when you eliminate the uninteresting and uninformative, the interesting and informative aspects of what remains will emerge more clearly.

With some screencasting tools, including the one I mostly use, Camtasia, it’s also possible to pan and zoom in order to focus even more precisely. I haven’t used that feature yet, because I’m usually pressed for time and the basic kinds of editing that I do are already time-consuming. But I do want to add this technique to my repertoire, and use it in selective and appropriate ways.

Editing is crucial. The raw capture for yesterday’s screencast was 30 minutes. It included some false starts, some extraneous material, and a fair bit of verbal stuttering on the part of both Scott and myself. When we finished the capture, I wasn’t sure we even had anything that would be usable. But as I trimmed away the clutter, a reasonably clear storyline emerged.

Even the 14-minute version will, of course, be too long for many people. One solution would be to divide the material into chapters. But since none of those would work well standalone, a better solution might be to make an elevator-pitch version that tells the same story in just 3 to 5 minutes. I’d want that version to complement the 14-minute version, though, not replace it.


Almost all the screencasts that I’ve seen, and many that I’ve made, are solo efforts. But I also love to do interview-style screencasts, and the Photo Gallery screencast is an example of that genre. When it works well, as I think it did in this case, the interaction between the interviewer and the presenter can help the presenter — who in some ways knows the subject too well — recognize what’s not obvious to viewers and adapt accordingly.

As an aside, I should mention that although we made this screencast remotely — Scott was in Redmond and I was in my home office in New Hampshire — we used a technique that was new for me. Normally I record screens projected to my computer using a screensharing application. In this case, because of all the images in the presentation, that didn’t work well. The projection couldn’t keep up. So I had Scott record his screen on his end, while I recorded the audio on my end. It worked great. I was able to follow the visual action well enough on my end, Scott captured a high-quality video which he later posted for me to download, and it was straightforward for me to marry up his video track with my audio track.

Show, don’t tell.

The “aha” moment, if there is one, speaks for itself. When the ape can see that termite stick bristling with ants, there is no need for someone to say: “This is a really cool benefit.” It’s just obvious.

In our session, Scott was actually quite restrained. But there were a few places where he made editorial comments like “this is really convenient” or “this is a great benefit”. I took them out. If I could give only one piece of advice to technical marketers everywhere, it would be this: Show me, don’t tell me.

Tagging and foldering in Photo Gallery

In this 14-minute screencast I interview Scott Dart, who blogs here, about how tagging works in Vista’s Photo Gallery. I wanted to look over Scott’s shoulder, rather than do this myself with my own photos, because Scott’s been managing a lot of photos in this app for a long time, and he’s in a position to reflect on the evolution of his tag vocabulary.

The metadata storage strategies discussed here lately are just plumbing. What you see in this screencast is the payoff: An application that will be, for many people, the first experience of a style of personal information management that relies on tagging and search as much as, or more than, on folders and navigation.

Conventional wisdom was that people could never be bothered to invest effort in tagging their stuff. What and then Flickr and then a host of other web applications showed is that people will invest that effort if the activation threshold is low and the reward is immediate. On the web, the rewards are both personal (I can more easily find my photos) and broadly social (I can interact not only with friends and family but with like-minded photographers everywhere). On the desktop, the rewards will mainly be personal and more narrowly social (friends and family), though if photos can bring their tags with them when they travel to the cloud, the broader social rewards become available too.

One of the fascinating threads in this screencast is the interplay between foldering and tagging. In principle you don’t need a folder hierarchy rooted in the file system, and doing away with it entirely would reduce the concept count. In practice that’s not yet possible, if only because cameras don’t produce endless streams of uniquely-identified files. When DSCF0004.JPG rolls around again, you have to put it into a different file-system folder than the last time.

It’s too bad, really, because those file-system folders serve little other purpose. They’re conceptual clutter that obstructs your view of tagging, and of tag-oriented search and navigation, which is where all the action really wants to be.

A further complication is that, unlike most of the popular tag systems on the web, tagging in Photo Gallery is hierarchical. You don’t have to use it that way, you could keep a flat list of tags, but the system invites you to nest your tags in a way that seems folderish but that has a magical property. The same thing — not a copy of the thing — can be in two or more places at once.
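To make the same-thing-in-two-places property concrete, here’s a minimal Python sketch. It is not how Photo Gallery stores tags. The TagTree class, the slash-separated tag paths, and the sample tag names are all assumptions made for illustration.

```python
from collections import defaultdict


class TagTree:
    """Toy model of hierarchical tags: each tag is a path like 'Places/Hawaii'."""

    def __init__(self):
        # Maps a full tag path to the set of photo names filed under it.
        self._photos_by_tag = defaultdict(set)

    def tag(self, photo, tag_path):
        self._photos_by_tag[tag_path].add(photo)

    def photos_under(self, prefix):
        """Return photos tagged with `prefix` or with any tag nested below it."""
        found = set()
        for tag_path, photos in self._photos_by_tag.items():
            if tag_path == prefix or tag_path.startswith(prefix + "/"):
                found |= photos
        return found


tree = TagTree()
# One photo, two places in the hierarchy, and no copy of the file.
tree.tag("DSCF0004.JPG", "Places/Hawaii")
tree.tag("DSCF0004.JPG", "People/Family")
tree.tag("DSCF0005.JPG", "Places/Oregon")

print(sorted(tree.photos_under("Places")))
print(sorted(tree.photos_under("People")))
```

The photo is only ever referenced by name, so filing it under a second tag costs nothing and never produces a duplicate. A file-system folder can’t do that without copies or links.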

It’ll be fascinating to observe what people make of this. For example, that magical same-thing-in-two-places property may seem less magical to the majority of folks who don’t know what I know about directory structures on disks. I experience cognitive dissonance when I see a “real” file-system hierarchy and a “virtual” tag hierarchy living in the same navigational tree. But somebody who never had a strong sense of the difference between those two modes might not be bothered at all.

Are people actually using tags to organize and search for their photos? According to Scott, data from the opt-in software quality metrics (SQM) feature — which relays anonymized usage data to product teams for analysis — says that they are.

How private tag vocabularies develop, and what happens when they intersect with the web, are two processes that I’d love to be able to study over time. That raises an interesting question. Can I access that SQM usage data myself? Could groups of willing participants pool their data and do independent analyses of it? It’s our data, there’s no reason why not. Does anyone know how?

Who’s got the tag? Database truth versus file truth, part 3

I’ve recently been exploring the implications of the following mantra:

The truth is in the file.

In this context it refers to a strategy for managing metadata (e.g., tags) primarily in digital files (e.g., JPEG images, Word documents) and only secondarily in a database derived from those files.

Commenting on an entry that explores how Vista uses this technique for photo tags, Brian Dorsey throws down a warning flag:

Many applications are guilty of changing JPEGs [ed: RAW files, not JPEGs, are the issue, see below] behind the scenes and there is nothing forcing them to do it in compatible ways. Here is a recent example with Vista.

A cautionary tale, indeed. This is the kind of subject that doesn’t necessarily yield right and wrong answers. But we can at least put the various options on the table and discuss them.

There is an interesting comparison to be made, for example, between OS X and Vista. While researching this topic I found this Lifehacker article on a feature of OS X that I completely missed. You can tag a file in the Get Info dialog, and when you do, the file will be instantly findable (by that tag) in Spotlight.

My purpose here is not to discuss or debate the OS X and Vista interfaces for tagging files and searching for tagged files. I do however want to explore the implications of two different strategies: “the truth is in the file” versus “the truth is in the database”.

In Vista, if I tag yellowflower.jpg with iris, that tag lives primarily in the file yellowflower.jpg and secondarily in a database. An advantage is that if I transfer that file to another operating system, or to a cloud-based service like Flickr, the effort I’ve invested in tagging that file is (or anyway can be) preserved. A disadvantage, as Brian points out, is that when different applications try to manage data that’s living inside JPEG files, my investment in tagging can be lost.

Conversely, if I tag yellowflower.jpg with iris in OS X, yellowflower.jpg is untouched; the tag lives only in Spotlight’s database. If I transfer the file elsewhere, my investment in tagging is lost. But on my own system, my tags are less vulnerable to corruption.
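Here’s a rough Python sketch of the two strategies. It models neither Vista nor Spotlight. Files are reduced to a name plus a list of embedded tags, the database is an in-memory SQLite table, and every function name is hypothetical.

```python
import sqlite3


def new_index():
    """Create an empty tag database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tags (filename TEXT, tag TEXT)")
    return db


def index_from_files(files):
    """File truth: derive the database from tags embedded in the files."""
    db = new_index()
    for filename, embedded_tags in files.items():
        for tag in embedded_tags:
            db.execute("INSERT INTO tags VALUES (?, ?)", (filename, tag))
    return db


def find(db, tag):
    rows = db.execute(
        "SELECT filename FROM tags WHERE tag = ? ORDER BY filename", (tag,))
    return [row[0] for row in rows]


# File truth: the tag is embedded, so a copy of the file carries it along,
# and any machine can rebuild its index from the files alone.
files_here = {"yellowflower.jpg": ["iris"]}
files_elsewhere = {name: list(tags) for name, tags in files_here.items()}
file_truth_elsewhere = find(index_from_files(files_elsewhere), "iris")

# Database truth: the file is untouched and the tag lives only in the local index.
untagged_files = {"yellowflower.jpg": []}
local_db = new_index()
local_db.execute("INSERT INTO tags VALUES (?, ?)", ("yellowflower.jpg", "iris"))
db_truth_here = find(local_db, "iris")
db_truth_elsewhere = find(index_from_files(untagged_files), "iris")

print(file_truth_elsewhere)
print(db_truth_here)
print(db_truth_elsewhere)
```

What the sketch leaves out is the failure mode on the file-truth side: a second application that rewrites the embedded tags badly. In this model that would be any code that mangles the list inside files_here.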

Arguably these are both valid strategies. The Vista way optimizes for cross-system interoperability and collaboration, while the OS X way optimizes for single-system consistency. Of course as always we’d really like to have the best of both worlds. Can we?

It’s a tough problem. Vista tries to help with consistency by offering APIs in the .NET Framework for manipulating photo metadata. But those APIs don’t yet cover all the image formats, and even if they did, there’s nothing to prevent developers from going around them and writing straight to the files.

For its part, OS X offers APIs for querying the Spotlight database. So an application that wanted to marry up images and their metadata could do so, but there’s no guarantee that a backup application or a Flickr uploader would do so.

It’s an interesting conundrum. Because I am mindful of the lively discussion over at Scoble’s place about what matters to people in the real world, though, I don’t want to leave this in the realm of technical arcana. There are real risks and benefits associated with each of these strategies. And while it’s true that people want things to Just Work, that means different things to different people.

If you’re an avid Flickr user, if you invest effort tagging photos in OS X, and if that effort is lost when you upload to Flickr, then OS X did not Just Work for you. Conversely if you don’t care about online photo sharing, if you invest effort tagging photos in Vista, and then another application corrupts your tags, then Vista did not Just Work for you.

I think many people would understand that explanation. In principle, both operating systems could frame the issue in exactly those terms, and could even offer a choice of strategy based on your preferred workstyle. In practice that’s problematic because people don’t really want choice, they want things to Just Work, and they’d like technology to divine what Just Work means to them, which it can’t. It’s also problematic because framing the choice requires a frank assessment of both risks and benefits, and no vendor wants to talk about risks.

I guess that in the end, both systems are going to have to bite the bullet and figure out how to Just Work for everybody.

Blogging from Word 2007, crossing the chasm

The other day I wrote:

…as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML.

Operating in that mode for years has given me a deep understanding of how documents, and collections of documents, are also databases. It has led me to imagine and prototype a way of working with documents that’s deeply informed by that duality. But none of this is apparent to most people and, if it requires them to write semantic CSS tags in XHTML using emacs, it never will become apparent.
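For readers who haven’t worked this way, here’s a small Python sketch of what “documents are also databases” can mean in practice. The markup is invented for the example: the class name quote and the use of the title attribute to name the speaker are assumptions, not my actual conventions, and the quoted text is abridged.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# A made-up blog entry. The class and title conventions are illustrative only.
ENTRY = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Mike notes:</p>
    <blockquote class="quote" title="Mike Linksvayer">
      XMP is embedded in a binary file, completely opaque to nearly all users.
    </blockquote>
    <blockquote class="quote" title="Danny Ayers">
      I first assumed he meant XMP could be replaced with a uF.
    </blockquote>
  </body>
</html>"""


def quotes_by(xhtml_text, person):
    """Treat the document as a database: select quote elements by attribute."""
    root = ET.fromstring(xhtml_text)
    path = ".//{%s}blockquote[@class='quote']" % XHTML
    return [" ".join(el.text.split())
            for el in root.findall(path)
            if el.get("title") == person]


print(quotes_by(ENTRY, "Mike Linksvayer"))
```

Because the entry is well-formed XHTML, an ordinary XML parser is all the query engine you need. That’s the “low barrier to XML processing” I have in mind.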

So it’s time to cross the chasm and find out how to make these effects happen for people in editors that they actually use. Here’s how I’m writing this entry:

This is the display you get when you connect Word 2007 to a blog publishing system, in my case WordPress, and when you use the technique shown in this screencast to minimize the ribbon.

Here’s a summary of the tradeoffs between my homegrown approach and the Word-to-WordPress system I’m using here:




My homegrown approach

  • Can use any text editor
  • Source is inherently web-ready
  • Easy to create and use new semantic features
  • Low barrier to XML processing
  • Only for geeks

Word 2007

  • A powerful editor that anyone can use
  • Source is not inherently web-ready
  • Harder to create and use new semantic features
  • Higher barrier to XML processing

These are two extreme ends of a continuum, to be sure, but there aren’t many points in between. For example, I claim that if I substitute OpenOffice Writer for Word 2007 in the above chart, nothing changes. So I’m going to try to find a middle ground between the extremes.

To that end, I’m developing some Python code to help me wrangle Word’s default .docx format, which is a zip file containing the document in WordML and a bunch of other stuff. At the end of this entry you can see what I’ve got so far. I’m using this code to explore what kind of XML I can inject programmatically into a Word 2007 document, what kind comes back after a round trip through the application, how that XML relates to the HTML that gets published to WordPress, and which of these representations will be the canonical one that I’ll want to store and process.
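If the idea of a document that is really a zip file is new to you, this Python sketch shows the shape of it. It builds a stand-in package in memory rather than opening a real Word file. The part names word/document.xml and docProps/core.xml are the ones my code works with; the XML bodies here are placeholders, and the helper names are made up for the example.

```python
import io
import zipfile


def make_toy_docx():
    """Build an in-memory zip that mimics the layout of a .docx package.

    The XML bodies are placeholders; a real Word file has more parts
    and far richer content.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
    return buf.getvalue()


def list_parts(docx_bytes):
    """Return the names of the parts stored inside the package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


parts = list_parts(make_toy_docx())
print(parts)
```

Pointing list_parts at the bytes of a real .docx should give a longer list of the same kind.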

So far my conclusion is that none of these representations will be the canonical one, and that I’ll need to find (or more likely create) a transform to and from the canonical representation where I’ll store and process all my stuff. We’ll see how it goes.

Meanwhile here’s one immediately useful result. The tagDocx method shown below parallels the picture-tagging example I showed last week. Here, the truth is also in the file. When you use the Vista explorer to tag a Word 2007 file, the tag gets shoved into one of the XML subdocuments stored inside the document. But any application can read and write the tag. Watch.


Run this code:

$ python
import wordxml
wordxml.tagDocx('Blogging from Word2007.docx', 'word2007 blogging tagging')
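For anyone who wants to try the idea without a Word file handy, here’s a self-contained Python 3 sketch of the same move. It uses an XML parser rather than the regular expression in my module, and it works on an in-memory stand-in package. The function names are hypothetical, and the namespace URI is the one I understand the cp prefix to be bound to in docProps/core.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace that the cp: prefix is assumed to be bound to in core.xml.
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
ET.register_namespace("cp", CP)


def tag_package(docx_bytes, tags):
    """Return a copy of the package with cp:keywords in core.xml set to `tags`."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w") as out:
        for name in src.namelist():
            data = src.read(name)
            if name == "docProps/core.xml":
                root = ET.fromstring(data)
                keywords = root.find("{%s}keywords" % CP)
                if keywords is None:
                    # No tags yet: add the element instead of giving up.
                    keywords = ET.SubElement(root, "{%s}keywords" % CP)
                keywords.text = tags
                data = ET.tostring(root, encoding="utf-8")
            out.writestr(name, data)
    return out_buf.getvalue()


def read_tags(docx_bytes):
    """Read the keywords back out of docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return root.findtext("{%s}keywords" % CP)


# A stand-in package containing only the part we care about.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("docProps/core.xml", '<cp:coreProperties xmlns:cp="%s"/>' % CP)

tagged = tag_package(buf.getvalue(), "word2007 blogging tagging")
print(read_tags(tagged))
```

One caveat if you run this against a real file: re-serializing core.xml this way may rename namespace prefixes other than cp. The XML stays equivalent, but I haven’t checked how every consumer reacts to that.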



Here’s why this might matter to me. In my current workflow, I manage my blog entries in an XML database (really just a file). I extract the tags from that XML and inject them into That enables great things to happen. I can explore my own stuff in a tag-oriented way. And I can exploit the social dimension of to see how my stuff relates to other people’s stuff.

But in the truth is not in the file, it’s in a database that asserts things about the file — its location on the web, its tags. If I revise my tag vocabulary in, the new vocabulary will be out of synch with what’s in my XML archive. So I have to do those revisions in my archive. I can, and I do, but it’s all programmatic work, there’s no user interface to assist me.

What I’m discovering about Vista and the Office apps is that they offer a nice combination of programmatic and user interfaces for doing these kinds of things. This blog entry uses three photos, for example. It’s easy for me to assign them the same tags I’m assigning this entry. If I do, I can interactively search for both the entry and the photos in the Vista shell. And I can build an alternate interface that runs that same search on the web and correlates results to published blog entries.

That’s still not the endgame. At heart I’m a citizen of the cloud, and I don’t want any dependencies on local applications or local storage. Clearly Vista and Office entail such dependencies. But they can also cooperate with the cloud and, over time, will do so in deeper and more sophisticated ways. It’s my ambition to do everything I can to improve that cooperation.

Note: There will be formatting problems in this HTML rendering which, for now, painful though it is, I am not going to try to fix by hacking around in the WordPress editor. There are a lot of moving parts here: Word, WordPress, the editor embedded in WordPress (which itself has a raw mode, a visual mode, and a secret/advanced visual mode). I haven’t sorted all this out yet, and I’m not sure I can. (Formatting source code. Why is that always the toothache?)

Anyway, if you want to follow along, I’ve posted the original .docx version of this file here.

Here’s the wordxml module, which was imported in the above example. Note that this is CPython, not IronPython. That’s because I’m relying here on CPython’s zipfile module, which in turn relies on a compiled DLL.

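import zipfile, re

def readDocx(docx):
  # Load every part of the .docx (a zip archive) into a dict keyed by part name.
  inarc = zipfile.ZipFile(docx, 'r')
  names = inarc.namelist()
  parts = {}
  for name in names:
    parts[name] = inarc.read(name)
  inarc.close()
  print(list(parts.keys()))
  return parts

def readDocumentFromDocx(docx):
  # Extract the main document part and save a copy as document.xml.
  arc = zipfile.ZipFile(docx, 'r')
  s = arc.read('word/document.xml')
  arc.close()
  f = open('document.xml', 'wb')
  f.write(s)
  f.close()
  return s

def updateDocumentInDocx(docx, doc):
  # Rewrite the archive, swapping in doc as the main document part.
  parts = readDocx(docx)
  archive = zipfile.ZipFile(docx, 'w')
  for name in parts.keys():
    if name == 'word/document.xml':
      parts[name] = doc
    archive.writestr(name, parts[name])
  archive.close()

def tagDocx(docx, tags):
  # Rewrite the archive, replacing the keywords in docProps/core.xml with tags.
  parts = readDocx(docx)
  archive = zipfile.ZipFile(docx, 'w')
  for name in parts.keys():
    if name == 'docProps/core.xml':
      # Decode the part to text for the regex, then encode it again.
      xml = parts[name].decode('utf-8')
      xml = re.sub('<cp:keywords>(.*)</cp:keywords>',
                   '<cp:keywords>%s</cp:keywords>' % tags, xml)
      parts[name] = xml.encode('utf-8')
    archive.writestr(name, parts[name])
  archive.close()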





A conversation with Dan Chudnov about OpenURL, context-sensitive linking, and digital archiving

Today’s podcast with Dan Chudnov is a sequel to my earlier podcast with Tony Hammond about the Nature Publishing Group’s use of digital object identifiers. I invited Dan to discuss related topics including the OpenURL standard for context-sensitive linking.

I’m not the only one who’s had a hard time understanding how these technologies relate to one another and to the web. See, for example, Dorothea Salo’s rant I hate library standards, also Dan’s own recent essay Rethinking OpenURL.

I have ventured into this confusing landscape because I think that the issues that libraries and academic publishers are wrestling with — persistent long-term storage, permanent URLs, reliable citation indexing and analysis — are ones that will matter to many businesses and individuals. As we project our corporate, professional, and personal identities onto the web, we’ll start to see that the long-term stability of those projections is valuable and worth paying for.

Recently, for example, Dave Winer — who’s been exploring Amazon’s S3 — wrote:

I have an idea of making a proposal to Amazon to pay it a onetime fee for hosting the content for perpetuity, that way I can remove a concern for my heirs, and feel that my writing may survive me, something I’d like to assure.

Beyond long-term storage of bits, there’s a whole cluster of related services that we’re coming to depend on, but that flow from relationships that are transient. When I moved this blog from to, for example, InfoWorld very graciously redirected the RSS feed, but another organization might not have done so. I could have finessed that issue by using FeedBurner, but I wasn’t — and honestly, still am not — ready to make a long-term bet on that service.

For most people today, digital archiving and web publishing services are provided to you by your school, by your employer, or — increasingly — by some entity on the web. When your life circumstances change, it’s often necessary or desirable to change your provider, but it’s rarely easy to do that, and almost never possible to do it without loss of continuity.

There are no absolute guarantees, of course, but a relatively strong assurance of continuity is something that more and more folks will be ready to pay for. Amazon is on the short list of organizations in a position to make such assurances. So, obviously, is Microsoft. Will Microsoft’s existing and future online services move in that direction? I hope so. Among other things, it’s a business model that doesn’t depend on advertising, and that would be a refreshing change.

XMP and microformats revisited

Yesterday I exercised poetic license when I suggested that Adobe’s Extensible Metadata Platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:

Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

Danny Ayers: Like Mike I don’t really understand Jon’s references to microformats – I first assumed he meant XMP could be replaced with a uF.

Actually, I’m serious about this. If I step back and ask myself what are the essential qualities of a microformat, it’s a short list:

  1. A small chunk of machine-readable metadata,
  2. embedded in a document.

Mike notes:

XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.

Yes, I understand. And as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML. And I’ve been applying this technique since long before I ever heard the term microformats — my own term was originally microcontent.

But some things that have mattered to me in my ivory tower, like “colocation of metadata with human-visible HTML,” matter to almost nobody else. In the real world, people have been waiting — still are waiting — for widespread deployment of the tools that will enable them to embed chunks of metadata in documents, work with that metadata in-place, and exchange it.

We’ll get there, I hope and pray. But when we finally do, how different are these two scenarios, really?

  1. I use an interactive editor to create the chunk of metadata I embed in a blog posting.
  2. I use an interactive editor to create the chunk of metadata I embed in a photo.

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!
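To give a flavor of the HTML side of that wrestling match, here’s a hand-rolled Python sketch that pulls a few hCalendar properties out of a page. It handles only the class names vevent, summary, dtstart, and location, it is nowhere near a conforming parser, and the sample event is made up.

```python
from html.parser import HTMLParser


class HCalendarParser(HTMLParser):
    """Collect text from elements whose class is a known hCalendar property."""

    PROPERTIES = {"summary", "dtstart", "location"}

    def __init__(self):
        super().__init__()
        self.events = []       # one dict per vevent found
        self._current = None   # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({})
        if not self.events:
            return
        for prop in self.PROPERTIES.intersection(classes):
            title = attributes.get("title")
            if prop == "dtstart" and title:
                # The machine-readable date often lives in the title attribute.
                self.events[-1][prop] = title
            else:
                self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.events[-1][self._current] = data.strip()
            self._current = None


# A made-up event, marked up with hCalendar class names.
SAMPLE = """
<div class="vevent">
  <span class="summary">Screencasting workshop</span>
  <abbr class="dtstart" title="2007-02-15">February 15</abbr>
  <span class="location">Room 101</span>
</div>
"""

parser = HCalendarParser()
parser.feed(SAMPLE)
print(parser.events)
```

Even this toy has to decide where the date lives, what to do with nested elements, and which properties to ignore. Those are the details I mean.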

In both of these cases, there either will or won’t be enough use of these formats to kickstart the kind of virtuous cycle where production of the formats gets reasonably well normalized. In the ivory tower we pretend that the formats matter above all, and we argue endlessly about them. Personally I’d rather see what I’d consider to be a simpler and cleaner XMP. Others will doubtless argue that XMP doesn’t go far enough in its embrace of semantic web standards. But when we have that argument we are missing the point. What matters is use. This method of embedding metadata in photos is going to be used a whole lot, and in ways that are very like how I’ve been imagining microformats would be used.

PS: As per this comment, Scott Dart informs me that PNG (and to a lesser extent GIF) can embed arbitrary metadata, but that support for those embeddings regrettably didn’t make the cut in .NET Framework 3.0.

Truth, files, microformats, and XMP

In 2005 I noted the following two definitions of truth:

1. WinFS architect Quentin Clark: “We [i.e. the WinFS database] are the truth.”

2. Indigo architect Don Box: “Message, oh Message / The Truth Is On The Wire / There Is Nothing Else”

Today I’m adding a third definition:

3. Scott Dart, program manager for the Vista Photo Gallery: “The truth is in the file.”

What Scott means is that although image metadata is cached in a database, so that Photo Gallery can search and organize quickly, the canonical location for metadata, including tags, is the file itself. As a result, when you use Photo Gallery to tag your images, you’re making an investment in the image files themselves. If you copy those files to another machine, or upload them to the Net, the tags will travel with those image files. Other applications will be able to make them visible and editable, and those edits can flow back to your local store if you transfer the files back.

That’s huge. It’s also, of course, a bit more complicated. As Scott explains, there are different flavors of metadata: EXIF, IPTC, and the new favorite, XMP. And not all image formats can embed image metadata. In fact many popular formats can’t, including PNG, GIF, and BMP. [Update: Incorrect, see next rock.] But JPG can, and it’s a wonderful thing to behold.

For example, I selected a picture of a yellow flower in Photo Gallery and tagged it with flower. Here’s the XML that showed up inside yellowflower.jpg:

<xmp:xmpmeta xmlns:xmp="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="">
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
<rdf:Bag xmlns:rdf="">
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  <rdf:Bag xmlns:rdf="">
<rdf:Description xmlns:MicrosoftPhoto="">
<rdf:Description xmlns:xmp="">

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.
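If you pull a packet like that out of a file, ordinary XML tools can read it. Here’s a minimal Python sketch. The packet is a simplified, well-formed stand-in that I wrote for the example, not the literal bytes Photo Gallery produced, and it assumes the tags live in a dc:subject bag, as the query path in the script further down suggests.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An illustrative packet, not a dump from a real image.
PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject>
        <rdf:Bag>
          <rdf:li>flower</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_tags(packet):
    """Return the text of every rdf:li inside the dc:subject bag."""
    root = ET.fromstring(packet)
    return [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]


print(read_tags(PACKET))
```

Locating the packet inside the JPEG is a separate job that this sketch skips.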

Perhaps we’ve been looking in the wrong places for the first microformat to achieve liftoff. Many of us hoped hCalendar would, but it’s hard to argue that it has. I suppose that’s partly because even though we have a variety of online event services that produce the hCalendar format, there just aren’t that many people publishing and annotating that many events.

There are already a lot of people saving, publishing, and annotating photos. And the tagging interface in Vista’s Photo Gallery, which is really sweet, is about to recruit a whole lot more.

There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata. In the example above, the tag flower was assigned interactively in Photo Gallery. Here’s an IronPython script to read that tag, and change it to iris.

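import clr
clr.AddReference("PresentationCore")  # provides System.Windows.Media.Imaging
from System.IO import FileStream, FileMode, FileAccess, FileShare
from System.Windows.Media.Imaging import JpegBitmapDecoder
from System.Windows.Media.Imaging import BitmapCreateOptions, BitmapCacheOption

def ReadFirstTag(jpg):
  # Return the first entry of the XMP dc:subject list, i.e. the first tag.
  f = FileStream(jpg, FileMode.Open)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat,
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  metadata = frame.Metadata
  tag = metadata.GetQuery("/xmp/dc:subject/{int=0}")
  f.Close()  # release the file so it can be reopened for writing
  return tag

def WriteFirstTag(jpg, tag):
  # Overwrite the first tag in place, without re-encoding the image.
  f = FileStream(jpg, FileMode.Open, FileAccess.ReadWrite,
    FileShare.ReadWrite)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat,
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  writer = frame.CreateInPlaceBitmapMetadataWriter()
  writer.SetQuery("/xmp/dc:subject/{int=0}", tag)
  if not writer.TrySave():
    print "cannot save metadata"
  f.Close()

print ReadFirstTag('yellowflower.jpg')
WriteFirstTag('yellowflower.jpg', 'iris')
print ReadFirstTag('yellowflower.jpg')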

The output of this script is:

flower
iris

And when you revisit the photo in Photo Gallery, the tag has indeed changed from flower to iris. Very cool.