So I wanted to make an HTML page of just the titles of my blog items, with the titles hyperlinked. Here’s a solution in PowerShell:

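# Load the cleaned WordPress export as an XML document
[xml]$xml = Get-Content 'wordpress.xml'
# Keep only the title and link of each blog item
$items = $xml.rss.channel.item | Select-Object title, link
foreach ($item in $items)
  {
  # Emit one paragraph per item, with the title as a hyperlink
  $s = '<p><a href="' + $item.link + '">' + $item.title + '</a></p>'
  Write-Output $s
  }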

I like how the XML handling is just woven into the fabric.

That said, the XML file that WordPress exports is, as I just discovered to my chagrin, not actually well-formed XML. The comments contain all sorts of junk that chokes an XML parser. I couldn’t find an example of a multiline non-greedy regular expression search-and-replace in PowerShell, so I stripped out the comments using Python:

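import re

# Read the raw WordPress export
with open('wordpress.2007-11-09.xml') as f:
    s = f.read()
# DOTALL lets '.' match newlines; '.+?' is non-greedy, so each comment is removed on its own
pat = re.compile(r'<wp:comment>.+?</wp:comment>', re.DOTALL)
s = pat.sub('', s)
# Write the cleaned file that the PowerShell script reads; 'with' closes (and flushes) it
with open('wordpress.xml', 'w') as f:
    f.write(s)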
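
To see why both the non-greedy quantifier and the DOTALL flag matter here, a minimal sketch on a tiny made-up string may help. The sample text and the two function names are assumptions for illustration; only the regular expression itself comes from the script above.

```python
import re

# Hypothetical sample: two comments (one spanning two lines) around content we want to keep.
sample = (
    "<wp:comment>line one\nline two</wp:comment>"
    "<title>Kept</title>"
    "<wp:comment>another</wp:comment>"
)

def strip_comments(text):
    """Remove each <wp:comment>...</wp:comment> block separately, even across lines."""
    # re.DOTALL lets '.' match newlines; '+?' stops at the nearest closing tag.
    return re.sub(r"<wp:comment>.+?</wp:comment>", "", text, flags=re.DOTALL)

def strip_comments_greedy(text):
    """Greedy variant, for contrast only: one match runs from the first opening
    tag to the last closing tag and swallows everything in between."""
    return re.sub(r"<wp:comment>.+</wp:comment>", "", text, flags=re.DOTALL)

cleaned = strip_comments(sample)
print(cleaned)                               # <title>Kept</title>
print(repr(strip_comments_greedy(sample)))   # ''
```

The greedy version deletes the title along with the comments, which is exactly the failure the `?` is there to prevent.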

Mapping idioms from one language to another is such an interesting problem. I’ve always imagined a kind of Rosetta Stone of patterns. It would contain patterns like "multiline non-greedy regular expression search-and-replace," and you could map examples from any language onto those patterns. Does any resource on the web approximate that kind of pattern vocabulary?
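
As one small entry in such a Rosetta Stone, here is a rough sketch of how the "list each item's title as a hyperlink" idiom from the PowerShell script might map onto Python's standard library. The function name `title_links` and the two-item sample feed are hypothetical. The HTML escaping is an addition the PowerShell version does not have.

```python
import html
import xml.etree.ElementTree as ET

def title_links(rss_text):
    """Return one '<p><a href="...">title</a></p>' string per <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    lines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Escape both values so characters like '&' or '"' cannot break the HTML.
        lines.append(
            '<p><a href="%s">%s</a></p>'
            % (html.escape(link, quote=True), html.escape(title))
        )
    return lines

# Hypothetical feed standing in for the cleaned WordPress export.
sample = """<rss version="2.0"><channel>
<item><title>Tom &amp; Jerry</title><link>http://example.com/?p=1</link></item>
<item><title>Second post</title><link>http://example.com/?p=2</link></item>
</channel></rss>"""

for line in title_links(sample):
    print(line)
```

Against the real file, the same function could be fed the contents of wordpress.xml instead of the sample string. The XML handling is an explicit import rather than woven into the language, which is the kind of difference a pattern vocabulary would make easy to compare.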