Search Results for 'truth is in the file'


Back in February I noted that Photo Gallery can embed metadata — including tags — directly into images. One implication of this “truth in the file” strategy was that photos tagged on the desktop in Photo Gallery would someday be able to carry their tags with them when uploaded to Flickr. Today is that day. The latest beta refresh of Photo Gallery introduces a tag-preserving Flickr upload feature. Excellent!

There is, of course, no absolute truth in this game of tag. From Photo Gallery’s perspective, not all image formats are created equal. Some are friendlier than others to the embedding of metadata. So while you can tag both a JPEG and a PNG image in Photo Gallery, the tags have different relationships to the photos. In the case of the JPEG, the tag lives inside the image file. In the case of the PNG, it lives in an external data store.

It was already the case that a tag assigned to a PNG image would not survive a trip through your local recycle bin. For the same reason, a tag assigned to a PNG image will not survive a trip to Flickr.

What about the reverse trip? If you download a tagged JPEG from Flickr, will Photo Gallery see the tag? So far as I can determine, no. (Update: As per kellan’s comment below, the largest size of your photo — and the one you’re likeliest to want to fetch back — is unmodified and does retain the tag.)

It follows from these observations that there is no straightforward way to synchronize your cloud-based tag vocabulary with your local tag vocabulary. If these evolve, they will diverge.

Mind you, I’m not complaining. There really is no practical solution to this problem. I’m delighted that we’ve taken this first step, not only because it offers real practical benefit but also because it will get more people thinking about how we might want to manage our self-assigned metadata in a decentralized world.

Here’s what I envision. In a hosted lifebits scenario, every chunk of bits I produce — on the desktop or in the cloud — has a canonical, globally-unique, and cloud-accessible name, along with one or more representations and associated pieces of metadata. When I upload a photo to Flickr I don’t transfer the bits that directly represent the photo. Instead I transfer that canonical name. When Flickr or another service needs to materialize a representation of the photo, it sources the image data from one of my hosted representations. These services likewise syndicate my metadata, including tags, so that when my tag vocabulary evolves, it evolves everywhere.

I know, I know, there is a crazy amount of stuff that would need to happen in order to make this possible. But it would be good stuff, for users and for service providers, and I can’t help imagining what it would be like.

I’ve recently been exploring the implications of the following mantra:

The truth is in the file.

In this context it refers to a strategy for managing metadata (e.g., tags) primarily in digital files (e.g., JPEG images, Word documents) and only secondarily in a database derived from those files.

Commenting on an entry that explores how Vista uses this technique for photo tags, Brian Dorsey throws down a warning flag:

Many applications are guilty of changing JPEGs [ed: RAW file, not JPEGs, are the issue, see below] behind the scenes and there is nothing forcing them to do it in compatible ways. Here is a recent example with Vista.

A cautionary tale, indeed. This is the kind of subject that doesn’t necessarily yield right and wrong answers. But we can at least put the various options on the table and discuss them.

There is an interesting comparison to be made, for example, between OS X and Vista. While researching this topic I found this Lifehacker article on a feature of OS X that I completely missed. You can tag a file in the GetInfo dialog, and when you do, the file will be instantly findable (by that tag) in SpotLight.

My purpose here is not to discuss or debate the OS X and Vista interfaces for tagging files and searching for tagged files. I do however want to explore the implications of two different strategies: “the truth is in the file” versus “the truth is in the database”.

In Vista, if I tag yellowflower.jpg with iris, that tag lives primarily in the file yellowflower.jpg and secondarily in a database. An advantage is that if I transfer that file to another operating system, or to a cloud-based service like Flickr, the effort I’ve invested in tagging that file is (or anyway can be) preserved. A disadvantage, as Brian points out, is that when different applications try to manage data that’s living inside JPEG files, my investment in tagging can be lost.

Conversely, if I tag yellowflower.jpg with iris in OS X, yellowflower.jpg is untouched, the tag only lives in Spotlight’s database. If I transfer the file elsewhere, my investment in tagging is lost. But on my own system, my tags are less vulnerable to corruption.

Arguably these are both valid strategies. The Vista way optimizes for cross-system interoperability and collaboration, while the OS X way optimizes for single-system consistency. Of course as always we’d really like to have the best of both worlds. Can we?

It’s a tough problem. Vista tries to help with consistency by offering APIs in the .NET Framework for manipulating photo metadata. But those APIs don’t yet cover all the image formats, and even if they did, there’s nothing to prevent developers from going around them and writing straight to the files.

For its part, OS X offers APIs for querying the Spotlight database. So an application that wanted to marry up images and their metadata could do so, but there’s no guarantee that a backup application or a Flickr uploader would do so.

It’s an interesting conundrum. Because I am mindful of the lively discussion over at Scoble’s place about what matters to people in the real world, though, I don’t want to leave this in the realm of technical arcana. There are real risks and benefits associated with each of these strategies. And while it’s true that people want things to Just Work, that means different things to different people.

If you’re an avid Flickr user, if you invest effort tagging photos in OS X, and if that effort is lost when you upload to Flickr, then OS X did not Just Work for you. Conversely if you don’t care about online photo sharing, if you invest effort tagging photos in Vista, and then another application corrupts your tags, then Vista did not Just Work for you.

I think many people would understand that explanation. In principle, both operating systems could frame the issue in exactly those terms, and could even offer a choice of strategy based on your preferred workstyle. In practice that’s problematic because people don’t really want choice, they want things to Just Work, and they’d like technology to divine what Just Work means to them, which it can’t. It’s also problematic because framing the choice requires a frank assessment of both risks and benefits, and no vendor wants to talk about risks.

I guess that in the end, both systems are going to have to bite the bullet and figure out how to Just Work for everybody.

In 2005 I noted the following two definitions of truth:

1. WinFS architect Quentin Clark: “We [i.e. the WinFS database] are the truth.”

2. Indigo architect Don Box: “Message, oh Message / The Truth Is On The Wire / There Is Nothing Else”

Today I’m adding a third definition:

3. Scott Dart, program manager for the Vista Photo Gallery: “The truth is in the file.”

What Scott means is that although image metadata is cached in a database, so that Photo Gallery can search and organize quickly, the canonical location for metadata, including tags, is the file itself. As a result, when you use Photo Gallery to tag your images, you’re making an investment in the image files themselves. If you copy those files to another machine, or upload them to the Net, the tags will travel with those image files. Other applications will be able to make them visible and editable, and those edits can flow back to your local store if you transfer the files back.

That’s huge. It’s also, of course, a bit more complicated. As Scott explains, there are different flavors of metadata: EXIF, IPTC, and the new favorite, XMP. And not all image formats can embed image metadata. In fact many popular formats can’t, including PNG, GIF, and BMP. [Update: Incorrect, see next rock.] But JPG can, and it’s a wonderful thing to behold.

For example, I selected a picture of a yellow flower in Photo Gallery and tagged it with flower. Here’s the XML that showed up inside yellowflower.jpg:

<xmp:xmpmeta xmlns:xmp="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:subject>
<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li>horse</rdf:li></rdf:Bag>
</dc:subject>
</rdf:Description>
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" 
  xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:LastKeywordXMP>
  <rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:li>flower</rdf:li>
  </rdf:Bag>
  </MicrosoftPhoto:LastKeywordXMP>
</rdf:Description>
<rdf:Description xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0">
  <MicrosoftPhoto:Rating>1</MicrosoftPhoto:Rating>
 </rdf:Description>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/">
  <xmp:Rating>1</xmp:Rating>
</rdf:Description>
</rdf:RDF>
</xmp:xmpmeta>

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.

Perhaps we’ve been looking in the wrong places for the first microformat to achieve liftoff. Many of us hoped hCalendar would, but it’s hard to argue that it has. I suppose that’s partly because even though we have a variety of online event services that produce the hCalendar format, there just aren’t that many people publishing and annotating that many events.

There are already a lot of people saving, publishing, and annotating photos. And the tagging interface in Vista’s Photo Gallery, which is really sweet, is about to recruit a whole lot more.

There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata. In the example above, the tag flower was assigned interactively in Photo Gallery. Here’s an IronPython script to read that tag, and change it to iris.

import clr
clr.AddReferenceByPartialName("PresentationCore")
from System.IO import FileStream, FileMode, FileAccess, FileShare
from System.Windows.Media.Imaging import JpegBitmapDecoder, 
 BitmapCreateOptions,BitmapCacheOption


def ReadFirstTag(jpg):
  f = FileStream(jpg,FileMode.Open)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  metadata = frame.Metadata
  f.Close()
  return metadata.GetQuery("/xmp/dc:subject/{int=0}")


def WriteFirstTag(jpg,tag):
  f = FileStream(jpg,FileMode.Open, FileAccess.ReadWrite, 
    FileShare.ReadWrite)
  decoder = JpegBitmapDecoder(f, BitmapCreateOptions.PreservePixelFormat, 
    BitmapCacheOption.Default)
  frame = decoder.Frames[0]
  writer = frame.CreateInPlaceBitmapMetadataWriter()
  try:
    writer.SetQuery("/xmp/dc:subject/{int=0}",tag)
    writer.TrySave()
  except:
    print "cannot save metadata"
  f.Close()
  writer.GetQuery("/xmp/dc:subject/{int=0}")

print ReadFirstTag('yellowflower.jpg') 
WriteFirstTag('yellowflower.jpg','iris')
print ReadFirstTag('yellowflower.jpg')

The output of this script is:

flower
iris

And when you revisit the photo in Photo Gallery, the tag has indeed changed from flower to iris. Very cool.

For the last five years, Bill Crow has been working on HD Photo, a new image file format that’s intended to supplant the JPEG format currently at the heart of the digital photography ecosystem.

I first met Bill many years ago when he came to BYTE to show us HP NewWave, which was probably the earliest effort to produce an object-oriented file system for Windows — originally, believe it or not, for Windows 1.0. The connection between NewWave and HD Photo is tenuous, but it does exist in the sense that the metadata strategies we’re seeing today (see the truth is in the file) point the way toward ending the tyranny of the hierarchical file system.

Today’s podcast begins by revisiting NewWave, but it’s mostly about HD Photo: Why it was created, how it works, what it will mean to both amateur photographers (“happy snappers”) as well as pros, and how it will be standardized and baked into a next generation of digital cameras.

Along the way I learned a huge amount about the current state of digital photography. For example, I knew that pros prefer to shoot in RAW format, but I wasn’t clear what that meant. According to Bill, a RAW image is just sensor data from a high-end camera, which photo processing software later turns into an image. The professional photographer trades away convenience for control and flexibility. In the case of the JPEG images produced by the vast majority of digicams, though, it’s the other way around. We get usable images without any fuss, but we give up the ability to reinterpret the data. HD Photo aims for the best of both worlds: ultimate control and flexibility if you desire, convenience when you don’t.

Although Bill guesses we’re two years away from commercial HD Photo cameras, the format is being used today to support Photosynth. As he explains on his blog, a compressed Photo HD image has a regular structure that makes it possible to extract images at various levels of detail without decoding the entire image.

There’s a whole lot more to the story. I hugely enjoyed this conversation, and I think you will too.

In a comment on an item last month about Photo Gallery, Mary Branscombe writes:

I’m having an issue at the moment where renaming a file in Windows Live Photo Gallery seems to reset the date on the file so WLPG sees a file from May 2006 as having been taken today. Has anyone else seen this? Changing the name also loses my tags and confuses WLPG so it can’t upload it to flickr… All JPEGs.

That’s a perfectly plausible problem report. But I couldn’t reproduce it, and neither could a couple of product managers I asked to take a look. If it “won’t repro” we’re stuck.

But there might be a way out of this bind. The description “renaming a file in Windows Live Gallery seems to reset the date” is reasonably precise, but there can be all kinds of nuances that would be difficult for Mary to convey, or that she might not even be aware of. That’s why it’s a great idea to capture a screencast that illustrates the problem you’re having. You can do that with Windows Media Encoder which remains, as I’ve been saying for years, sadly unknown and unappreciated. If you’ve never installed or used it, John Montgomery’s recent screencast shows you how.

I wish that this sort of diagnostic screencasting were more accessible to people. Even I don’t reach for the tool as often as I should, and when I don’t, I regret it. For example, last year I ran into an issue with an application that suddenly wouldn’t save a particular file type. Of course it “wouldn’t repro” and I got into a long back-and-forth with the developer in the course of which I wound up installing a specially instrumented version of the program to capture a detailed log file.

In the end we found that, as is so often the case, it was a silly little thing. The export feature broke when I switched, without realizing it, from an absolute path:

c:/jon/…

to a relative path:

/jon/…

It turns out that a third-party component used by the program for this export operation won’t accept relative paths. The program needed to know that (which it didn’t) and, if a user entered a relative path, needed to transform it into an absolute one.

We’ll never know how things might have otherwise turned out. But if I’d shown the developer a screencast of my problem scenario, there’s a decent chance he’d have said: “Hmm. Something unusual about that path in the file save box.”

So I’ve asked Mary Branscombe to make a diagnostic screencast, and if you find yourself in a similar situation I urge you to do the same. Pictorial descriptions of software behavior can enhance verbal descriptions with details that we ourselves aren’t aware of.

The other day I wrote:

…as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML.

Operating in that mode for years has given me a deep understanding of how documents, and collections of documents, are also databases. It has led me to imagine and prototype a way of working with documents that’s deeply informed by that duality. But none of this is apparent to most people and, if it requires them to write semantic CSS tags in XHTML using emacs, it never will become apparent.

So it’s time to cross the chasm and find out how to make these effects happen for people in editors that they actually use. Here’s how I’m writing this entry:

This is the display you get when you connect Word 2007 to a blog publishing system, in my case WordPress, and when you use the technique shown in this screencast to minimize the ribbon.

Here’s a summary of the tradeoffs between my homegrown approach and the Word-to-WordPress system I’m using here:

method

pros

cons

My homegrown approach

  • Can use any text editor
  • Source is inherently web-ready
  • Easy to add create and use new semantic features
  • Low barrier to XML processing
  • Only for geeks

Word 2007

  • A powerful editor that anyone can use
  • Source is not inherently web-ready
  • Harder to create and use new semantic features
  • Higher barrier to XML processing

These are two extreme ends of a continuum, to be sure, but there aren’t many points in between. For example, I claim that if I substitute OpenOffice Writer for Word 2007 in the above chart, nothing changes. So I’m going to try to find a middle ground between the extremes.

To that end, I’m developing some Python code to help me wrangle Word’s default .docx format, which is a zip file containing the document in WordML and a bunch of other stuff. At the end of this entry you can see what I’ve got so far. I’m using this code to explore what kind of XML I can inject programmatically into a Word 2007 document, what kind comes back after a round trip through the application, how that XML relates to the HTML that gets published to WordPress, and which of these representations will be the canonical one that I’ll want to store and process.

So far my conclusion is that none of these representations will be the canonical one, and that I’ll need to find (or more likely create) a transform to and from the canonical representation where I’ll store and process all my stuff. We’ll see how it goes.

Meanwhile here’s one immediately useful result. The tagDocx method shown below parallels the picture-tagging example I showed last week. Here, the truth is also in the file. When you use the Vista explorer to tag a Word 2007 file, the tag gets shoved into one of XML subdocuments stored inside the document. But any application can read and write the tag. Watch.

Before:

Run this code:

$ python

import wordxml

wordxml.tagDocx(‘Blogging from Word2007.docx’,’word2007 blogging tagging’)

 

After:

Here’s why this might matter to me. In my current workflow, I manage my blog entries in an XML database (really just a file). I extract the tags from that XML and inject them into del.icio.us. That enables great things to happen. I can explore my own stuff in a tag-oriented way. And I can exploit the social dimension of del.icio.us to see how my stuff relates to other people’s stuff.

But in del.icio.us the truth is not in the file, it’s in a database that asserts things about the file — its location on the web, its tags. If I revise my tag vocabulary in del.icio.us, the new vocabulary will be out of synch with what’s in my XML archive. So I have to do those revisions in my archive. I can, and I do, but it’s all programmatic work, there’s no user interface to assist me.

What I’m discovering about Vista and the Office apps is that they offer a nice combination of programmatic and user interfaces for doing these kinds of things. This blog entry uses three photos, for example. It’s easy for me to assign them the same tags I’m assigning this entry. If I do, I can interactively search for both the entry and the photos in the Vista shell. And I can build an alternate interface that runs that same search on the web and correlates results to published blog entries.

That’s still not the endgame. At heart I’m a citizen of the cloud, and I don’t want any dependencies on local applications or local storage. Clearly Vista and Office entail such dependencies. But they can also cooperate with the cloud and, over time, will do so in deeper and more sophisticated ways. It’s my ambition to do everything I can to improve that cooperation.

Note: There will be formatting problems in this HTML rendering which, for now, painful though it is, I am not going to try to fix by hacking around in the WordPress editor. There are a lot of moving parts here: Word, WordPress, the editor embedded in WordPress (which itself has a raw mode, a visual mode, and a secret/advanced visual mode). I haven’t sorted all this out yet, and I’m not sure I can. (Formatting source code. Why is that always the toothache?)

Anyway, if you want to follow along, I’ve posted the original .docx version of this file here.

Here’s wordxml.py which was imported in the above example. Note that this is CPython, not IronPython. That’s because I’m relying here on CPython’s zipfile module, which in turn relies on a compiled DLL.

import zipfile, re

 

def readDocx(docx):

inarc = zipfile.ZipFile(docx,’r’)

names = inarc.namelist()

dict = {}

for name in names:

dict[name] = inarc.read(name)

inarc.close

print dict.keys()

return dict

 

def readDocumentFromDocx(docx):

arc = zipfile.ZipFile(docx,’r’)

s = arc.read(‘word/document.xml’)

f = open(‘document.xml’,’w’)

f.write(s)

f.close()

return s

 

def updateDocumentInDocx(docx,doc):

dict = readDocx(docx)

archive = zipfile.ZipFile(docx,’w’)

for name in dict.keys():

if (name == ‘word/document.xml’):

dict[name] = doc

archive.writestr(name,dict[name])

archive.close()

 

def tagDocx(docx,tags):

dict = readDocx(docx)

archive = zipfile.ZipFile(docx,’w’)

for name in dict.keys():

if (name == ‘docProps/core.xml’):

dict[name] = re.sub(‘<cp:keywords>(.*)</cp:keywords>’,’<cp:keywords>%s</cp:keywords>’ %

tags, dict[name])

archive.writestr(name,dict[name])

archive.close()

 

 

Yesterday I exercised poetic license when I suggested that Adobe’s Extensible metadata platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:

Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

Danny Ayers: Like Mike I don’t really understand Jon’s references to microformats – I first assumed he meant XMP could be replaced with a uF.

Actually, I’m serious about this. If I step back and ask myself what are the essential qualities of a microformat, it’s a short list:

  1. A small chunk of machine-readable metadata,
  2. embedded in a document.

Mike notes:

XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.

Yes, I understand. And as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I’m obviously all about metadata coexisting with human-readable HTML. And I’ve been applying this technique since long before I ever heard the term microformats — my own term was originally microcontent.

But some things that have mattered to me in my ivory tower, like “colocation of metadata with human-visible HTML,” matter to almost nobody else. In the real world, people have been waiting — still are waiting — for widespread deployment of the tools that will enable them to embed chunks of metadata in documents, work with that metadata in-place, and exchange it.

We’ll get there, I hope and pray. But when we finally do, how different are these two scenarios, really?

  1. I use an interactive editor to create the chunk of metadata I embed in a blog posting.
  2. I use an interactive editor to create the chunk of metadata I embed in a photo.

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!

In both of these cases, there either will or won’t be enough use of these formats to kickstart the kind of virtuous cycle where production of the formats gets reasonably well normalized. In the ivory tower we pretend that the formats matter above all, and we argue endlessly about them. Personally I’d rather see what I’d consider to be a simpler and cleaner XMP. Others will doubtless argue that XMP doesn’t go far enough in its embrace of semantic web standards. But when we have that argument we are missing the point. What matters is use. This method of embedding metadata in photos is going to be used a whole lot, and in ways that are very like how I’ve been imagining microformats would be used.

PS: As per for this comment, Scott Dart informs me that PNG (and to a lesser extent GIF) can embed arbitrary metadata, but that support for those embeddings regrettably didn’t make the cut in .NET Framework 3.0.

Follow

Get every new post delivered to your Inbox.

Join 6,095 other followers