My recent post about redirecting a page of broken links weaves together two different ideas. First, that the titles of the articles on that page of broken links can be used as search terms in alternate links that lead people to those articles’ new locations. Second, that non-programmers can create macros to transform the original links into alternate search-driven links.
There was lots of useful feedback on the first idea. As Herbert Van de Sompel and Michael Nelson pointed out, it was a really bad idea to discard the original URLs, which retain value as lookup keys into one or more web archives. Alan Levine showed how to do that with the Wayback Machine. That method, however, leads the user to sets of snapshots that don’t consistently mirror the original article, because (I think) Wayback’s captures happened both before and after the breakage.
So for now I’ve restored the original page of broken links, alongside the new page of transformed links. I’m grateful for the ensuing discussion about ways to annotate those transformed links so they’re aware of the originals, and so they can tap into evolving services — like Memento — that will make good use of the originals.
The second idea, about tools and automation, drew interesting commentary as well. dyerjohn pointed to NimbleText, though we agreed it’s more suited to tabular than to textual data. Owen Stephens reminded me that the tool I first knew as Freebase Gridworks, then Google Refine, is going strong as OpenRefine. And while it too is more tabular than textual, “the data is line based,” he says, “and sort of tabular if you squint at it hard.” In Using OpenRefine to manipulate HTML he presents a fully-worked example of how to use OpenRefine to do the transformation I made by recording and playing back a macro in my text editor.
The point about MS Word styles is spot on. That mechanism asks people to think abstractly, in terms of classes and instances. It’s never caught on. So with my text-transformation puzzle, Les suggests. Even with tools that enable non-coders to solve the puzzle, getting people across the cognitive threshold is Too Hard.
While mulling this over, I happened to watch Jeremy Howard’s TED talk on machine learning. He demonstrates a wonderful partnership between human and machine. The task is to categorize a large set of images. The computer suggests groupings, the human corrects and refines those groupings, the process iterates.
We’ve yet to inject that technology into our everyday productivity tools, but we will. And then, maybe, we will finally start to bridge the gap between coders and non-coders. The computer will watch the styles I create as I write, infer classes, offer to instantiate them for me, and we will iterate that process. Similarly, when I’m doing a repetitive transformation, it will notice what’s happening, infer the algorithm, offer to implement it for me, we’ll run it experimentally on a sample, then iterate.
Maybe in the end what people will most need to learn is not how to design stylistic classes and instances, or how to write code that automates repetitive tasks, but rather how to partner effectively with machines that work with us to make those things happen. Things that are Too Hard for most living humans and all current machines to do on their own.