I’ve recently been exploring the implications of the following mantra:

The truth is in the file.

In this context it refers to a strategy for managing metadata (e.g., tags) primarily in digital files (e.g., JPEG images, Word documents) and only secondarily in a database derived from those files.

Commenting on an entry that explores how Vista uses this technique for photo tags, Brian Dorsey throws down a warning flag:

Many applications are guilty of changing JPEGs [ed: RAW file, not JPEGs, are the issue, see below] behind the scenes and there is nothing forcing them to do it in compatible ways. Here is a recent example with Vista.

A cautionary tale, indeed. This is the kind of subject that doesn’t necessarily yield right and wrong answers. But we can at least put the various options on the table and discuss them.

There is an interesting comparison to be made, for example, between OS X and Vista. While researching this topic I found this Lifehacker article on a feature of OS X that I completely missed. You can tag a file in the GetInfo dialog, and when you do, the file will be instantly findable (by that tag) in SpotLight.

My purpose here is not to discuss or debate the OS X and Vista interfaces for tagging files and searching for tagged files. I do however want to explore the implications of two different strategies: “the truth is in the file” versus “the truth is in the database”.

In Vista, if I tag yellowflower.jpg with iris, that tag lives primarily in the file yellowflower.jpg and secondarily in a database. An advantage is that if I transfer that file to another operating system, or to a cloud-based service like Flickr, the effort I’ve invested in tagging that file is (or anyway can be) preserved. A disadvantage, as Brian points out, is that when different applications try to manage data that’s living inside JPEG files, my investment in tagging can be lost.

Conversely, if I tag yellowflower.jpg with iris in OS X, yellowflower.jpg is untouched, the tag only lives in Spotlight’s database. If I transfer the file elsewhere, my investment in tagging is lost. But on my own system, my tags are less vulnerable to corruption.

Arguably these are both valid strategies. The Vista way optimizes for cross-system interoperability and collaboration, while the OS X way optimizes for single-system consistency. Of course as always we’d really like to have the best of both worlds. Can we?

It’s a tough problem. Vista tries to help with consistency by offering APIs in the .NET Framework for manipulating photo metadata. But those APIs don’t yet cover all the image formats, and even if they did, there’s nothing to prevent developers from going around them and writing straight to the files.

For its part, OS X offers APIs for querying the Spotlight database. So an application that wanted to marry up images and their metadata could do so, but there’s no guarantee that a backup application or a Flickr uploader would do so.

It’s an interesting conundrum. Because I am mindful of the lively discussion over at Scoble’s place about what matters to people in the real world, though, I don’t want to leave this in the realm of technical arcana. There are real risks and benefits associated with each of these strategies. And while it’s true that people want things to Just Work, that means different things to different people.

If you’re an avid Flickr user, if you invest effort tagging photos in OS X, and if that effort is lost when you upload to Flickr, then OS X did not Just Work for you. Conversely if you don’t care about online photo sharing, if you invest effort tagging photos in Vista, and then another application corrupts your tags, then Vista did not Just Work for you.

I think many people would understand that explanation. In principle, both operating systems could frame the issue in exactly those terms, and could even offer a choice of strategy based on your preferred workstyle. In practice that’s problematic because people don’t really want choice, they want things to Just Work, and they’d like technology to divine what Just Work means to them, which it can’t. It’s also problematic because framing the choice requires a frank assessment of both risks and benefits, and no vendor wants to talk about risks.

I guess that in the end, both systems are going to have to bite the bullet and figure out how to Just Work for everybody.