While updating my home page today, I noticed that the page listing my InfoWorld articles had become a graveyard of broken links. The stuff is all still there, but at some point the site switched to another content management system without redirecting old URLs. This happens to me from time to time. It’s always annoying. In some cases I’ve moved archives to my own personal web space. But I prefer to keep them alive in their original contexts, if possible. This time around, I came up with a quick and easy way to do that. I’ll describe it here because it illustrates a few simple and effective strategies.
My listing page looks like this:
<p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
<p><a href="http://www.infoworld.com/article/06/11/08/46OPstrategic_1.html">Web apps, just give me the data | Column | 2006-11-08</a></p>
It’s easy to see the underlying pattern:
LINK | CATEGORY | DATE
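For readers who think in code, here is a minimal sketch of that pattern. It assumes the link text always uses " | " as the separator; the names `Entry` and `parse_entry` are hypothetical, invented for this illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    title: str
    category: str
    date: str


def parse_entry(link_text: str) -> Entry:
    """Split 'TITLE | CATEGORY | DATE' link text into its three fields.

    Splitting from the right keeps a title intact even if it
    happens to contain ' | ' itself.
    """
    title, category, date = link_text.rsplit(" | ", 2)
    return Entry(title.strip(), category.strip(), date.strip())


entry = parse_entry("XQuery and the power of learning by example | Column | 2006-11-15")
print(entry.title)     # XQuery and the power of learning by example
print(entry.category)  # Column
print(entry.date)      # 2006-11-15
```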
When I left InfoWorld I searched the site for everything I’d written there and made a list, in the HTML format shown above, that conformed to the pattern. Today I needed to alter all the URLs in that list. My initial plan was to search for each title using this pattern:
site:infoworld.com “jon udell” “TITLE”
For example, try this in Google or Bing:
site:infoworld.com “jon udell” “xquery and the power of learning by example”
Either way, you bypass the now-broken original URL (http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html) and are led to the current one (http://www.infoworld.com/article/2660595/application-development/xquery-and-the-power-of-learning-by-example.html).
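The same search can be expressed as a URL. The sketch below builds one with Python's standard `urllib.parse.urlencode`; the function name `search_url` is hypothetical, and the Bing endpoint with its `q` parameter is the one that appears in the search URLs later in this post.

```python
from urllib.parse import urlencode


def search_url(title: str, engine: str = "http://www.bing.com/search") -> str:
    """Build a title search constrained to infoworld.com and the author's name."""
    query = f'site:infoworld.com "jon udell" "{title}"'
    # urlencode percent-encodes the colon and quotes and turns spaces into '+'
    return f"{engine}?{urlencode({'q': query})}"


print(search_url("xquery and the power of learning by example"))
# http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22xquery+and+the+power+of+learning+by+example%22
```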
The plan was then to write a script that would robotically perform those searches and extract the current URL from each result. But life’s short, I’m lazy, and I realized a couple of things. First, the desired result is usually but not always first, so the script would need to deal with that. Second, what if the URLs change yet again?
That led to an interesting conclusion: the search URLs themselves are good enough for my purposes. I just needed to transform the page of links to broken URLs into a page of links to title searches constrained to infoworld.com and my name. So that’s what I did. It works nicely, and the page is now protected against further URL breakage.
I could have written code to do that transformation, but I’d rather not. Also, contrary to a popular belief, I don’t think everyone can or should learn to write code. There are other ways to accomplish a task like this, ways that are easier for me and, more importantly, accessible to non-programmers. I alluded to one of them in “A web of agreements and disagreements,” which shows how to translate from one wiki format to another just by recording and using a macro in a text editing program. I used that same strategy in this case.
Of course recording a macro is a kind of coding. It’s tricky to get it to do what you intend. So here’s a related strategy: divide a complex transformation into a series of simpler steps. Here are the steps I used to fix the listing page.
Step 1: Remove the old URLs
The old URLs are useless clutter at this point, so just get rid of them.
old: <p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
new: <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
how: Search for href=", mark the spot, search for ">, delete the highlighted selection between the two search targets, go to the next line.
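For comparison, here is roughly what Step 1 looks like as code rather than as a recorded macro. This is a sketch, not what I actually ran. It assumes the file uses plain ASCII double quotes around attribute values and one link per line; `strip_href` is a hypothetical name.

```python
import re

# Matches the first href attribute and whatever URL it currently holds.
# Assumption: attribute values are wrapped in straight double quotes.
HREF = re.compile(r'href="[^"]*"')


def strip_href(line: str) -> str:
    """Empty out the href value on one line, leaving everything else alone."""
    return HREF.sub('href=""', line, count=1)


old = ('<p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">'
       'XQuery and the power of learning by example | Column | 2006-11-15</a></p>')
print(strip_href(old))
# <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
```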
Step 2: Add query templates
We’ve already seen the pattern we need: site:infoworld.com “jon udell” “TITLE”. Now we’ll replace the empty URLs with URLs that include the pattern. To create the template, search Google or Bing for the pattern. (I used Bing but you can use Google the same way.) You’ll see some funny things in the URLs they produce, things like %3A and %22. These are percent-encoded representations of the colon and the double quote. They make things harder to read, but you need them to preserve the integrity of the URL. Copy this URL from the browser’s location window to the clipboard.
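If you want to check what those codes stand for, Python's standard `urllib.parse` module can encode and decode them. This is only an illustration; the query string is the one that appears in the Bing template in this step.

```python
from urllib.parse import quote, unquote_plus

# Percent-encode a few characters one at a time.
# safe="" tells quote() not to leave any reserved character unencoded.
for ch in ':"[]':
    print(repr(ch), "->", quote(ch, safe=""))  # %3A, %22, %5B, %5D in turn

# Decode the query part of the template; '+' stands for a space.
encoded = "site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22"
decoded = unquote_plus(encoded)
print(decoded)  # site:infoworld.com "jon udell" "[TITLE]"
```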
old: <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
new: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
how: Copy the template URL to the clipboard. Then for each line, search for href="", put the cursor after the first double quote, paste, and go to the next line.
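The code equivalent of Step 2 is a plain string replacement. Again, this is a sketch under the assumption of straight ASCII quotes and one link per line; `add_template` is a hypothetical name.

```python
# The template copied from the browser, with the encoded [TITLE] placeholder.
TEMPLATE = "http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22"


def add_template(line: str) -> str:
    """Paste the template URL into the first empty href on the line."""
    return line.replace('href=""', f'href="{TEMPLATE}"', 1)


old = '<p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>'
print(add_template(old))
```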
Step 3: Replace [TITLE] in each template with the actual title
old: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
new: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22XQuery and the power of learning by example%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>
how: For each line, search for ">, mark the spot, search for |, copy the highlighted section between the two search targets, search for [TITLE] (if your template shows the encoded form %5BTITLE%5D, as above, search for that instead), put the cursor at the start of the placeholder, delete it (7 characters for [TITLE], 11 for %5BTITLE%5D), paste from the clipboard.
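Step 3 can also be sketched in code. One deliberate difference from the macro: this version percent-encodes the title with `quote_plus` before pasting it in, so spaces become `+` instead of being left raw in the URL. It assumes straight ASCII quotes, one link per line, and the encoded placeholder `%5BTITLE%5D`; `fill_title` is a hypothetical name.

```python
import re
from urllib.parse import quote_plus

PLACEHOLDER = "%5BTITLE%5D"  # the encoded form of [TITLE]

# Captures the link text between the end of the opening tag and the first '|'.
LINK_TEXT = re.compile(r'">([^<|]*)\|')


def fill_title(line: str) -> str:
    """Replace the placeholder in the href with the line's own title."""
    match = LINK_TEXT.search(line)
    if match is None or PLACEHOLDER not in line:
        return line  # nothing to do on lines that don't fit the pattern
    title = match.group(1).strip()
    return line.replace(PLACEHOLDER, quote_plus(title), 1)


old = ('<p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com'
       '+%22jon+udell%22+%22%5BTITLE%5D%22">'
       'XQuery and the power of learning by example | Column | 2006-11-15</a></p>')
print(fill_title(old))
```

Chaining the three steps over every line of the listing page would reproduce the whole transformation, which is exactly the divide-into-simpler-steps strategy in a different medium.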
Now that I’ve written all this down, I’ll admit it looks daunting, and doesn’t really qualify as a “no coding required” solution. It is a kind of coding, to be sure. But this kind of coding doesn’t involve a programming language. Instead you work out how to do things interactively, and then capture and replay those interactions.
I’ll also admit that, even though word processors like Microsoft Word and LibreOffice can do capture and replay, you’ll be hard pressed to pull off a transformation like this using those tools. They’re not set up to do incremental search, or switch between searching and editing while recording. So I didn’t use a word processor; I used a programmer’s text editor. Mine’s an ancient one from Lugaru Software. There are many others, all of which will be familiar only to programmers. Which, of course, defeats my argument for accessibility. If you are not a programmer, you are not going to want to acquire and use a tool made for programmers.
So I’m left with a question. Are there tools — preferably online tools — that make this kind of text transformation widely available? If not, there’s an opportunity to create one. What IFTTT is doing for manual integration of web services is something that could also be done for manual transformation of text. If you watch over an office worker’s shoulder for any length of time, you’ll see that kind of manual transformation happening. It’s a colossal waste of time (and source of error). I could have spent hours reformatting that listing page. Instead it took me a few minutes. In the time I saved I documented how to do it. I wasn’t able to give you a reusable and modifiable online recipe, but that’s doable and would be a wonderful thing to enable.