Recently I migrated a wiki from one platform to another. It was complicated in a couple of ways. The first wrinkle was hosting. The old wiki ran on a Linux-based virtual machine and the new one runs on GitHub. The second wrinkle was markup. MediaWiki uses one flavor of lightweight markup and GitHub uses (a variant of) another.
The process was confusing even for me. But logistics aside, it raised questions about standards, interoperability, and the challenge of working in an evolving digital realm.
The wiki in question is the documentation for the Thali project, which I’ve mentioned in a number of posts. The project is mainly documented by Thali’s creator, Yaron Goland. Why use a wiki? Thali is a fast-moving project. Yaron has a blog, and he could use that to document Thali. But while blogs are agile publishing tools, they don’t shine when it comes to restructuring and spontaneous editing. Those are the great strengths of wikis.
Thali was originally hosted on CodePlex. Since that service doesn’t offer a built-in wiki, Yaron augmented it with a Bitnami MediaWiki image hosted in Azure. This was a DIY setup, not a managed service, which meant that when the Heartbleed Bug showed up he had to patch it himself, and he would have been on the hook again when Shellshock arrived. Life’s too short for that.
Also, with the project’s source code hosted on GitHub, it made sense to explore hosting the documentation there too. It’s simpler for readers of the code and the documentation to find everything in one place. And it’s simpler for writers of both forms of text to put everything in that place. There’s just one service to authenticate to, and tools for version control and issue tracking can be used for both forms of text.
I started by moving a few experimental pages from the MediaWiki to the GitHub wiki. Were there tools that could automate the translation? Maybe, but I’ve learned to walk before attempting to run. Converting a few pages by hand gave me an appreciation of the differences between the two markup languages. Each is a de facto standard with many derived variations. GitHub, for example, uses a variant of Markdown called GitHub Flavored Markdown (GFM). Tools that read and write “standard” Markdown don’t properly read and write GFM.
If I were teaching a course in advanced web literacy, I’d pose the following homework exercise:
You’re required to migrate a wiki from MediaWiki to GitHub. Possible strategies include:
- Use a tool that does the translation automatically.
- Create that tool if it doesn’t exist.
- Do the job manually.
Evaluate these options.
Of course there are assumptions buried in the problem statement. A web-literate student should first ask: “Why? Are we just chasing a fad? What problems will this migration solve? What problems will it create?”
Assuming we agree it makes sense, I’d like to see responses that:
- Enumerate available translators.
- Cite credible evaluations of them (and explain why they’re credible).
- Analyze the source and target data to find out which markup features might or might not be supported by the available translators.
- Consider the translators’ implementation costs. Are they local or cloud-based? If local, how much infrastructure must be installed, and how complex are its dependencies? If cloud-based, how will bulk operations work?
- If no translators emerge, make a back-of-the-envelope estimate of the distance between the two formats and the effort required to create software to map between them.
- Evaluate the time and effort required to research, acquire, and use an automated tool, versus that required to do the job manually.
- Estimate the break-even point at which a reusable automated tool pays off.
- Recognize that there really isn’t a manual option. Doing the job “by hand” in a text editor means using a tool that enables a degree of automation.
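The break-even estimate in that list comes down to simple arithmetic. Here is a rough sketch, assuming a one-time setup cost for the tool and a constant per-page cost for each route. The function name and every number below are hypothetical, not measurements from this migration.

```python
import math


def break_even_pages(setup_minutes, manual_minutes_per_page, tool_minutes_per_page):
    """Return the smallest page count at which the automated route
    costs no more than the manual one, or None if it never does."""
    saved_per_page = manual_minutes_per_page - tool_minutes_per_page
    if saved_per_page <= 0:
        # The tool is no faster per page, so its setup cost is never recovered.
        return None
    return math.ceil(setup_minutes / saved_per_page)


# Hypothetical figures: a day (480 minutes) to research and install a translator,
# 10 minutes per page by hand, 2 minutes per page to check the tool's output.
print(break_even_pages(480, 10, 2))  # prints 60
```

With these made-up figures the tool would pay for itself at around 60 pages. Different estimates will move that threshold a lot, which is the point of doing the exercise.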
In my case that last point proved salient. The tools landscape looked messy, there were only a few dozen pages to move over, the distance between the two markups wasn’t great, it was (for me) a one-time thing, and I wanted to make an editorial pass through the stuff anyway. So I wound up using a text editor. To bridge one gap between the two formats — different syntaxes for hyperlinks — I recorded a macro to convert one to the other.
In MediaWiki, to create a link with custom text, you type this:
[[Frog|all about frogs]]
In a GitHub wiki it’s this:
[Frog](all about frogs)
So much writing nowadays happens in browsers, never mind word processors, never mind old-school text editors, that it’s worth pointing out those old dogs can do some cool tricks. I won’t even mention which editor I use because people get religious about this stuff. Suffice it to say that it’s one of a class of tools that make it easy to record, and then play back, a sequence of actions like this:
- Search for [[
- Put the cursor on the first [
- Delete it
- Search for |
- Change it to ]
- Type (
- Search for ]]
- Change it to )
You might find an automated translator that encodes that same recipe. You might be able to write code to implement it. But for a large class of textual transformations like this you can most certainly use an editor that records and runs macros. Given that the web is still a largely textual medium, where transformations like this one are often needed, it’s a shame that macros are a forgotten art. I often use them to prototype recipes that I’ll then translate into code. But sometimes, as in this case, they’re all the code I need. That’s something I’d want students of web literacy to realize.
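Translated into code, the recorded steps might look something like the sketch below. It is only an illustration of the recipe, not the macro I actually ran. The names `PIPED_LINK` and `convert_piped_links` are made up for this example, and it assumes every link to be converted has the two-part piped form. By default it leaves the two parts in the order the recorded steps leave them. The optional `swap` flag, also hypothetical, exchanges them, which matters when the target syntax expects the link text first, as standard Markdown inline links do.

```python
import re

# Matches [[first|second]]: two opening brackets, text with no "]" or "|",
# a pipe, text with no "]", then two closing brackets.
PIPED_LINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")


def convert_piped_links(text, swap=False):
    """Rewrite each [[first|second]] as [first](second).

    With swap=True the two parts trade places, giving [second](first).
    Links without a pipe, such as [[Frog]], are left untouched.
    """
    def rewrite(match):
        first, second = match.group(1), match.group(2)
        if swap:
            first, second = second, first
        return f"[{first}]({second})"

    return PIPED_LINK.sub(rewrite, text)


print(convert_piped_links("See [[Frog|all about frogs]]."))
print(convert_piped_links("See [[Frog|all about frogs]].", swap=True))
```

Unlike the macro, which has to be replayed once per link, the regular expression handles every piped link in a page in one pass. It's the same recipe, though.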
What would really make my day, though, would be for one of those students to say:
“Hey, wait a sec. This doesn’t make sense. There is no such thing as GitHub Flavored HTML. Why is there GitHub Flavored Markdown?”
Or Standard Flavored Markdown, which quickly became Common Markdown, then CommonMark. How de facto standards become de jure standards, or don’t, is a fascinating subject. The web works as well as it does because we mostly agree on a set of tools and practices. But it evolves when we disagree, try different approaches, and test them against one another in a marketplace of ideas. Citizens of a web-literate planet should appreciate both the agreements and the disagreements.