Individual voices in the Federated Wiki chorus

In recent days I’ve been immersed in the Federated Wiki Happening, a group exploration of Ward Cunningham’s Smallest Federated Wiki (SFW). When I first saw what Ward was up to, nearly a year ago, I resisted the temptation to dive in because I knew it would be a long and deep dive that I couldn’t make time for. But when Mike Caulfield brought together a diverse group of like-minded scholars for the #fedwikihappening, I had the time and took the plunge. It’s been a joyful experience that reminds me of two bygone eras. The first was the dawn of the web, when I built the BYTE website and explored the Internet’s precursors to today’s social software. The second was the dawn of the blogosphere, when I immersed myself in Radio UserLand and RSS.

During both of those eras I participated in online communities engaged in, among other things, the discovery of emergent uses of the networked software that enabled those communities to exist. The interplay of social and technological dynamics was exciting in ways I’d almost forgotten. This week, the FedWikiHappening took me there again.

I want to explain why but, as Mike says today, so much has happened so quickly that it’s hard to know where to begin. For now, I’ll choose a single narrative thread: identity.

SFW inverts the traditional wiki model, which enables many authors to work on a canonical page. In SFW there is no canonical page. We all create our own pages and edit them exclusively. But we can also copy pages from others, and make changes. Others may (or may not) notice those changes, and may (or may not) merge the changes.

In this respect SFW resembles GitHub, and its terminology — you “fork” a page from an “origin site” — invites the comparison. But SFW is looser than GitHub. What GitHub calls a pull request, for example, isn’t (yet) a well-developed feature of SFW. And while attribution is crystal-clear in GitHub — you always know who made a contribution — it is (by design) somewhat vague in SFW. In the Chorus of Voices that Ward envisions, individual voices are not easy to discern.

That notion was hard for some of us in The Happening, myself included, to swallow. In SFW we were represented not as avatars with pictures but as neutral “flags” made of color gradients. Identity was Discoverable But Not Obvious.

Then Alex North cracked the code. He read through the FedWiki sources, found the hook for uploading the favicon that serves as SFW’s flag/avatar, and worked out a procedure for using that hook to upload an image.

The next day I worked out a Windows-friendly variant of Alex’s method and uploaded my own image. Meanwhile a few other Happening participants used Alex’s method to replace their colored gradients with photos.

The next day Mike Caulfield bowed to the will of the people and uploaded a batch of photos on behalf of participants unable to cope with Alex’s admittedly geeky hack. Suddenly the Happening looked more like a normal social network, where everyone’s contributions have identifying photos.

That was a victory, but not an unqualified one.

It was a victory in part because Alex showed the group that SFW is web software, and like all web software is radically open to unintended uses. Also, of course, because we were able to alter the system in response to a perceived need.

And yet, we may have decided too quickly not to explore a mode of collaboration that favors the chorus over the individual voice. Can we work together effectively that way, in a federated system that ultimately gives us full control of our own data? That remains an open question for me, one of many that the Happening has prompted me to ask and explore.

TypeScript Successes and Failures

My last post buried the lead, so I’ll hoist it to the top here: I’ve left Microsoft. While I figure out what my next gig will be, I’ll be doing some freelance writing and consulting. My first writing assignment will be an InfoWorld feature on TypeScript. It’s an important technology that isn’t yet well understood or widely adopted. I made two efforts to adopt it myself. The first, almost a year ago, didn’t stick. The second, a few weeks ago, did.

I’ll reflect on those experiences in the article. But I’m also keen to mine other perspectives on why TypeScript adoption fails or succeeds. And I’m particularly interested to hear about experiences with TypeScript toolchains other than Visual Studio. If you have perspectives and experiences to share, please drop a note here or to jon at jonudell.info.

Skype Translator will (also) be a tool for language learners

When I saw this video of Skype Translator I realized that beyond just(!) translation, it will be a powerful tool for language learning. Last night I got a glimpse of that near future. Our next door neighbor, Yolanda, came here from Mexico 30 years ago and is fluently bilingual. She was sitting outside with her friend, Carmen, who speaks almost no English. I joined them and tried to listen to their Spanish conversation. I learned a bit of Spanish in high school but I’ve never been conversational. Here in Santa Rosa I’m surrounded by speakers of Spanish, it’s an opportunity to learn, and Yolanda — who has worked as a translator in the court system — is willing to help.

I find myself on parallel tracks with respect to my learning of two different languages: music and Spanish. In both cases I’ve historically learned more from books than by ear. Now I want to put myself into situations that force me to set the books aside, listen intently, and then try to speak appropriately. I can use all the help I can get. Luckily we live in an era of unprecedented tool support. On the musical front, I’ve made good use of Adrian Holovaty’s SoundSlice, a remarkable tool for studying and transcribing musical performances that it pulls from YouTube. I haven’t used SoundSlice much for annotation, because I’m trying to develop my ear and my ability to converse musically in real time. But its ability to slow down part of a tune, and then loop it, has been really helpful in my efforts to interact with real performances.

I suspect that’s why Skype Translator will turn out to be great for language learning. Actually I’m sure that will happen, and here’s why. Last night I showed the Skype Translator video to Yolanda and Carmen. Neither is remotely tech-savvy but both instantly understood what was happening. Yolanda marveled to see machine translation coming alive. Carmen, meanwhile, was transfixed by the bilingual exchange. And when she heard the English translation of a Spanish phrase, I could see her mouthing the English words. I found myself doing the same for the Spanish translation of English phrases.

That’s already a powerful thing, and yet we were only observers of a conversation. When we can be participants, motivated to communicate, the service won’t just be a way to speak across a language gap. It’ll be a way to learn one another’s languages.

No disclosure is needed here, by the way, because I’m a free agent for now. My final day with Microsoft was last Friday. In the end I wasn’t able to contribute in the ways I’d hoped I could. But great things are happening there, and Skype Translator is only one of the reasons I’m bullish on the company’s future.

Human/machine partnership for problems otherwise Too Hard

My recent post about redirecting a page of broken links weaves together two different ideas. First, that the titles of the articles on that page of broken links can be used as search terms in alternate links that lead people to those articles’ new locations. Second, that non-programmers can create macros to transform the original links into alternate search-driven links.

There was lots of useful feedback on the first idea. As Herbert Van de Sompel and Michael Nelson pointed out, it was a really bad idea to discard the original URLs, which retain value as lookup keys into one or more web archives. Alan Levine showed how to do that with the Wayback Machine. That method, however, leads the user to sets of snapshots that don’t consistently mirror the original article, because (I think) Wayback’s captures happened both before and after the breakage.
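For reference, the Wayback Machine’s public URL patterns make that kind of lookup easy to construct from an original URL. Here is a minimal sketch in Python (not Alan’s exact recipe, just the two standard link forms; the example article URL is one from my listing page):

original = "http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html"

# Calendar of every snapshot the Wayback Machine holds for this URL.
calendar = "https://web.archive.org/web/*/" + original

# Redirects to the capture closest to the requested date (YYYYMMDD).
nearest = "https://web.archive.org/web/20061115/" + original

print(calendar)
print(nearest)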

So for now I’ve restored the original page of broken links, alongside the new page of transformed links. I’m grateful for the ensuing discussion about ways to annotate those transformed links so they’re aware of the originals, and so they can tap into evolving services — like Memento — that will make good use of the originals.

The second idea, about tools and automation, drew interesting commentary as well. dyerjohn pointed to NimbleText, though we agreed it’s more suited to tabular than to textual data. Owen Stephens reminded me that the tool I first knew as Freebase Gridworks, then Google Refine, is going strong as OpenRefine. And while it too is more tabular than textual, “the data is line based,” he says, “and sort of tabular if you squint at it hard.” In Using OpenRefine to manipulate HTML he presents a fully-worked example of how to use OpenRefine to do the transformation I made by recording and playing back a macro in my text editor.

Meanwhile, on Twitter, Paul Walk and Les Carr and I were rehashing the old permathread about coding for non-coders.

The point about MS Word styles is spot on. That mechanism asks people to think abstractly, in terms of classes and instances. It’s never caught on. So it is with my text-transformation puzzle, Les suggests. Even with tools that enable non-coders to solve the puzzle, getting people across the cognitive threshold is Too Hard.

While mulling this over, I happened to watch Jeremy Howard’s TED talk on machine learning. He demonstrates a wonderful partnership between human and machine. The task is to categorize a large set of images. The computer suggests groupings, the human corrects and refines those groupings, the process iterates.

We’ve yet to inject that technology into our everyday productivity tools, but we will. And then, maybe, we will finally start to bridge the gap between coders and non-coders. The computer will watch the styles I create as I write, infer classes, offer to instantiate them for me, and we will iterate that process. Similarly, when I’m doing a repetitive transformation, it will notice what’s happening, infer the algorithm, offer to implement it for me, we’ll run it experimentally on a sample, then iterate.

Maybe in the end what people will most need to learn is not how to design stylistic classes and instances, or how to write code that automates repetitive tasks, but rather how to partner effectively with machines that work with us to make those things happen. Things that are Too Hard for most living humans and all current machines to do on their own.

Where’s the IFTTT for repetitive manual text transformation?

While updating my home page today, I noticed that the page listing my InfoWorld articles had become a graveyard of broken links. The stuff is all still there, but at some point the site switched to another content management system without redirecting old URLs. This happens to me from time to time. It’s always annoying. In some cases I’ve moved archives to my own personal web space. But I prefer to keep them alive in their original contexts, if possible. This time around, I came up with a quick and easy way to do that. I’ll describe it here because it illustrates a few simple and effective strategies.

My listing page looks like this:

<p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

<p><a href="http://www.infoworld.com/article/06/11/08/46OPstrategic_1.html">Web apps, just give me the data | Column | 2006-11-08</a></p>

It’s easy to see the underlying pattern:

LINK | CATEGORY | DATE

When I left InfoWorld I searched the site for everything I’d written there and made a list, in the HTML format shown above, that conformed to the pattern. Today I needed to alter all the URLs in that list. My initial plan was to search for each title using this pattern:

site:infoworld.com “jon udell” “TITLE”

For example, try this in Google or Bing:

site:infoworld.com “jon udell” “xquery and the power of learning by example”

Either way, you bypass the now-broken original URL (http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html) and are led to the current one (http://www.infoworld.com/article/2660595/application-development/xquery-and-the-power-of-learning-by-example.html).

The plan was then to write a script that would robotically perform those searches and extract the current URL from each result. But life’s short, I’m lazy, and I realized a couple of things. First, the desired result is usually but not always first, so the script would need to deal with that. Second, what if the URLs change yet again?

That led to an interesting conclusion: the search URLs themselves are good enough for my purposes. I just needed to transform the page of links to broken URLs into a page of links to title searches constrained to infoworld.com and my name. So that’s what I did, it works nicely, and the page is future-proofed against future URL breakage.

I could have written code to do that transformation, but I’d rather not. Also, contrary to a popular belief, I don’t think everyone can or should learn to write code. There are other ways to accomplish a task like this, ways that are easier for me and — more importantly — accessible to non-programmers. I alluded to one of them in A web of agreements and disagreements, which shows how to translate from one wiki format to another just by recording and using a macro in a text editing program. I used that same strategy in this case.

Of course recording a macro is a kind of coding. It’s tricky to get it to do what you intend. So here’s a related strategy: divide a complex transformation into a series of simpler steps. Here are the steps I used to fix the listing page.

Step 1: Remove the old URLs

The old URLs are useless clutter at this point, so just get rid of them.

old: <p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

new: <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

how: Search for href=", mark the spot, search for ">, delete the highlighted selection between the two search targets, go to the next line.
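For anyone who would rather script it, that edit is a single substitution. A quick sketch in Python, using the sample line shown above:

import re

line = '<p><a href="http://www.infoworld.com/article/06/11/15/47OPstrategic_1.html">XQuery and the power of learning by example | Column | 2006-11-15</a></p>'

# Blank out whatever is inside href="...".
print(re.sub(r'href="[^"]*"', 'href=""', line))
# <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>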

Step 2: Add query templates

We’ve already seen the pattern we need: site:infoworld.com "jon udell" "TITLE". Now we’ll replace the empty URLs with URLs that include the pattern. To create the template, search Google or Bing for the pattern. (I used Bing but you can use Google the same way.) You’ll see some funny things in the URLs they produce, things like %3A and %22. These are alternate ways of representing the colon and the double quote. They make things harder to read, but you need them to preserve the integrity of the URL. Copy this URL from the browser’s location window to the clipboard.

old: <p><a href="">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

new: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

how: Copy the template URL to the clipboard. Then for each line, search for href="", put the cursor after the first double quote, paste, and go to the next line.
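If you’re curious where those escapes come from, any URL-encoding routine will produce them. A small sketch with Python’s standard library, using the query pattern from above:

from urllib.parse import quote_plus

query = 'site:infoworld.com "jon udell" "[TITLE]"'
print("http://www.bing.com/search?q=" + quote_plus(query))
# http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22
# %3A is the colon, %22 the double quote, %5B and %5D the square brackets,
# and + stands in for spaces.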

Step 3: Replace [TITLE] in each template with the actual title

old: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22%5BTITLE%5D%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

new: <p><a href="http://www.bing.com/search?q=site%3Ainfoworld.com+%22jon+udell%22+%22XQuery and the power of learning by example%22">XQuery and the power of learning by example | Column | 2006-11-15</a></p>

how: For each line, search for ">, mark the spot, search for |, copy the highlighted section between the two search targets, search for [TITLE], put the cursor at [, delete the next 7 characters, paste from the clipboard.
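For comparison, here is roughly what the three steps collapse into if you do reach for code: a minimal sketch in Python, assuming the listing lines look exactly like the examples above (the file names are placeholders).

import re
from urllib.parse import quote_plus

def rewrite(line):
    # Pull the title out of: <a href="...">TITLE | CATEGORY | DATE</a>
    m = re.search(r'<a href="[^"]*">([^|<]+)\|', line)
    if not m:
        return line
    title = m.group(1).strip()
    query = 'site:infoworld.com "jon udell" "%s"' % title
    search_url = "http://www.bing.com/search?q=" + quote_plus(query)
    # Swap the broken href for the search URL.
    return re.sub(r'href="[^"]*"', 'href="%s"' % search_url, line)

with open("listing.html", encoding="utf-8") as f:
    lines = f.readlines()

with open("listing-fixed.html", "w", encoding="utf-8") as f:
    f.writelines(rewrite(line) for line in lines)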

Now that I’ve written all this down, I’ll admit it looks daunting, and doesn’t really qualify as a “no coding required” solution. It is a kind of coding, to be sure. But this kind of coding doesn’t involve a programming language. Instead you work out how to do things interactively, and then capture and replay those interactions.

I’ll also admit that, even though word processors like Microsoft Word and LibreOffice can do capture and replay, you’ll be hard pressed to pull off a transformation like this using those tools. They’re not set up to do incremental search, or to switch between searching and editing while recording. So I didn’t use a word processor, I used a programmer’s text editor. Mine’s an ancient one from Lugaru Software; there are many others, all of which will be familiar only to programmers. Which, of course, defeats my argument for accessibility. If you are not a programmer, you are not going to want to acquire and use a tool made for programmers.

So I’m left with a question. Are there tools — preferably online tools — that make this kind of text transformation widely available? If not, there’s an opportunity to create one. What IFTTT is doing for manual integration of web services is something that could also be done for manual transformation of text. If you watch over an office worker’s shoulder for any length of time, you’ll see that kind of manual transformation happening. It’s a colossal waste of time (and source of error). I could have spent hours reformatting that listing page. Instead it took me a few minutes. In the time I saved I documented how to do it. I wasn’t able to give you a reusable and modifiable online recipe, but that’s doable and would be a wonderful thing to enable.

Why shouting won’t help you talk to a person with hearing loss

I’ve written a few posts [1, 2] about my mom’s use of a reading machine to compensate for macular degeneration, and I made a video that shows the optimal strategy for using the machine. We’re past the point where she can get any benefit from the gadget, though. She needs such extreme magnification that it’s just not worth it any more.

So she’s more dependent than ever on her hearing. Sadly her hearing loss is nearly as profound as her vision loss, and hearing aids can’t compensate as well as we wish. She’s still getting good mileage out of audiobooks, and also podcasts, which she listens to on MP3 players that I load up and send her. The clear and well-modulated voice of a single speaker, delivered through headphones that block out other sound, works well for her. But in real-world situations there are often several voices, not clear or well-modulated, coming from different parts of the room and competing with other ambient noise. She depends on hearing aids, but as good as they’ve gotten, they can’t yet isolate and clarify those kinds of voices.

One of the best ways to communicate with my mom is to speak to her on the phone. That puts the voice directly in her ear while the phone blocks other sounds. And here’s a pro tip I got from the audiologist I visited today. If she removes the opposite hearing aid, she’ll cut down on ambient noise in the non-conversational ear.

In person, the same principle applies. Put the voice right into her ear. If I lean in and speak directly into her ear, I can speak in a normal voice and she can understand me pretty well. It’s been hard to get others to understand and apply that principle, though. People tend to shout from across the room or even from a few feet away. Those sounds don’t come through as clearly as sounds delivered much more softly directly into the ear. And shouting just amps up the stress in the room, which nobody needs.

Lately, though, the voice-in-the-ear strategy — whether on the phone or in person — had been failing us. We had thought maybe the hearing aids needed to be cleaned, but that wasn’t the problem. She’s been accidentally turning down the volume! There’s a button on each hearing aid that you tap to cycle through the volume settings. I don’t think Mom understood that, and I know she can’t sense if she touches the button while reseating the device with her finger. To compound the problem, the button’s action defaults to volume reduction. If it went the other way she might have been more likely to notice an accidental change. But really, given that she’s also losing dexterity, the volume control is just a useless affordance for her.

Today’s visit to the audiologist nailed the problem. When he hooked the hearing aids up to his computer and read their logs(!), we could see they’d often been running at reduced volume. On her last visit he’d set them to boot up at a level we’ll call 3 on a scale of 1 to 5. That’s the level he’d determined was best for her. He’d already had an inkling of what could go wrong, because on that visit he’d disabled the button on the left hearing aid. Now both are disabled, and the setting will stick to 3 unless we need to raise it permanently.

Solving that problem will help matters, but hearing aids can only do so much. The audiologist’s digital toolkit includes a simulator that enabled us to hear a pre-recorded sample voice the way my mom hears it. That was rather shocking. The unaltered voice was loud and clear. Then he activated mom’s profile, and the voice faded so low I thought it was gone completely. I had to put my ear right next to the computer’s speaker to hear it at all, and then it was only a low murmur. When there aren’t many hair cells doing their job in the inner ear, it takes a lot of energy to activate the few that still work, and it’s hard to apply that energy with finesse.

I’m sure we’ll find ways to compensate more effectively. That won’t happen soon enough for my mom, though. I wonder if the audiologist’s simulator might play a useful role in the meantime. When we speak to a person with major hearing loss we don’t get any feedback about how we’re being heard. It’s easy to imagine a device that would record a range of speech samples, from shouting at a microphone from across the room to shouting at it from a few feet away to speaking softly directly into it. Then the gadget would play those sounds back two ways: first unaltered, then filtered through the listener’s hearing-loss profile. Maybe that would help people realize that shouting doesn’t help, but proper positioning does.

Alternative sources of data on police homicides

There were empty seats at the table on Thursday for young males of color who have been shot by police, most recently Tamir Rice, a 12-year-old boy who was carrying a toy gun. His case resonates powerfully in Santa Rosa where, last year, 13-year-old Andy Lopez was shot for the same reason. He is memorialized in this moveable mural currently on display at the Peace and Justice Center around the corner from Luann’s studio:

Our son is an Airsoft enthusiast, just like Tamir Rice and Andy Lopez were. Unlike them he is white. In circumstances like theirs, would that have made the crucial difference? We think so. But when you look for data to confirm or reject that intuition, it’s thin and unreliable.

Criminal justice experts note that, while the federal government and national research groups keep scads of data and statistics — on topics ranging from how many people were victims of unprovoked shark attacks (53 in 2013) to the number of hogs and pigs living on farms in the U.S. (upwards of 64,000,000 according to 2010 numbers) — there is no reliable national data on how many people are shot by police officers each year.

How many police shootings a year? No one knows, Washington Post, 09/08/2014

The one available and widely-reported statistic is that, in recent years, there have been about 400 justifiable police homicides annually. In Nobody Knows How Many Americans The Police Kill Each Year, FiveThirtyEight’s Reuben Fischer-Baum reviews several sources for that number, and concludes that while it’s a reasonable baseline, the real number is likely higher.

Fischer-Baum’s article, and others he cites, draw on a couple of key Bureau of Justice Statistics reports. They are not hard to find:

https://www.google.com/?q=site:bjs.gov+”Justifiable+Homicide+by+Police

One report, Policing and Homicide, 1976-98: Justifiable Homicide by Police, Police Officers Murdered by Felons [1999], says this about race:

A growing percentage of felons killed by police are white, and a declining percentage are black (figure 4).

Race of felons killed

year  White  Black
----  -----  -----
1978    50%    49%
1988    59%    39%
1998    62%    35%

Felons justifiably killed by police represent a tiny fraction of the total population. Of the 183 million whites in 1998, police killed 225; of the 27 million blacks, police killed 127. While the rate (per million population) at which blacks were killed by police in 1998 was about 4 times that of whites (the figure below and figure 5), the difference used to be much wider: the black rate in 1978 was 8 times the white rate.

A more recent report, Homicide Trends in the United States, 1980-2008, is one key source for the widely-cited number of 400 justifiable police homicides per year:

Interestingly, the gap between justifiable police homicides and justifiable citizen homicides has widened in recent decades.

Table 14 addresses race:

It’s a complicated comparison involving the races of shooters and shootees when the shooters are civilians and also when they are police. In the latter case, the trend noted in the earlier report — “a declining percentage [of ‘felons’ killed by police] are black” — has reversed. Combining this table with the previous one, we get:

Blacks as a % of those killed by police

year  percent
----  -------
1978      49%
1988      39%
1998      35%
2008      38%

What is our level of confidence in this data? Low. From How many police shootings a year? No one knows:

“What’s there is crappy data,” said David A. Klinger, a former police officer and criminal justice professor at the University of Missouri who studies police use of force.

Several independent trackers, primarily journalists and academics who study criminal justice, insist the accurate number of people shot and killed by police officers each year is consistently upwards of 1,000 each year.

“The FBI’s justifiable homicides and the estimates from (arrest-related deaths) both have significant limitations in terms of coverage and reliability that are primarily due to agency participation and measurement issues,” said Michael Planty, one of the Justice Department’s chief statisticians, in an email.

Are there other sources we might use? Well, yes. Wikipedia is one place to start. It has lists of killings by law enforcement officers in the U.S. The introductory page says:

Listed below are lists of people killed by nonmilitary law enforcement officers, whether in the line of duty or not, and regardless of reason or method. Inclusion in the lists implies neither wrongdoing nor justification on the part of the person killed or the officer involved. The listing merely documents the occurrence of a death.

The lists below are incomplete, as the annual average number of justifiable homicides alone is estimated to be near 400.

Each entry cites a source, typically a newspaper report. About once a day (or maybe about twice a day), a police officer shoots a civilian somewhere in the U.S. That’s a rare and dramatic event that will almost certainly be noted in a local newspaper. The report may or may not provide complete details, but it’s an independent data point. Everything else we know about the phenomenon is based on self-reporting by law enforcement.

To an analyst, the data that lives in Wikipedia tables is semi-structured. It can be extracted into a spreadsheet or a database, but extracting fully structured data almost always requires some massaging. The script I wrote to massage Wikipedia’s lists of police homicides handles the following irregularities (a sketch follows the list):

  1. Through 2009 the table lives in a single per-year page. From 2010 onward, the per-year pages subdivide into per-month pages.
  2. The city and state are usually written like this: Florida (Jacksonville). But for the first five months of 2012 they are written like this: Jacksonville, Florida.
  3. The city name, and/or the city/state combination, is sometimes written as plain text, and sometimes as a link to the corresponding Wikipedia page.
  4. The city name is sometimes omitted.
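The kind of normalization that entails looks roughly like this. It isn’t the script itself, and the Wikipedia page-title convention is my reading of the list pages, but it shows the shape of the problem:

import csv
import re

BASE = "List of killings by law enforcement officers in the United States"
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def page_titles(year):
    # Through 2009, one list page per year; from 2010 onward, one per month.
    if year <= 2009:
        return ["%s, %d" % (BASE, year)]
    return ["%s, %s %d" % (BASE, month, year) for month in MONTHS]

def parse_location(text):
    # Normalize the two formats that appear in the tables:
    # 'Florida (Jacksonville)' and 'Jacksonville, Florida'.
    # Returns (city, state); city may be empty when it was omitted.
    text = re.sub(r'\[\[|\]\]', '', text).strip()   # strip wiki-link brackets
    m = re.match(r'^([^(,]+)\(([^)]*)\)$', text)
    if m:
        return m.group(2).strip(), m.group(1).strip()
    if ',' in text:
        city, state = text.rsplit(',', 1)
        return city.strip(), state.strip()
    return '', text

# Example rows (made up) just to exercise the parser and show the CSV step.
rows = [("2012-01-15", "Florida (Jacksonville)"),
        ("2012-03-02", "Jacksonville, Florida"),
        ("2013-07-09", "Texas")]
with open("police-homicides.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "city", "state"])
    for date, location in rows:
        city, state = parse_location(location)
        writer.writerow([date, city, state])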

The script produces a CSV file. The data prior to 2009 is sparse, so I’ve omitted it. Here’s a yearly summary since 2009:

year  count
----  -----
2009     60
2010     82
2011    157
2012    580
2013    309
2014    472

It looks like this listmaking process didn’t really kick into high gear until 2012. Since then, though, it has produced a lot of data. And for one of those years, 2012, the count of police homicides is 580, versus the Uniform Crime Report’s 426. Each of those 580 incidents cites a source. Here’s one of them I’ve picked randomly:

One person is dead following an officer-involved shooting in Anderson, according to police.

Investigators said they were called to Kings Road about 2:45 a.m. Thursday for a domestic-related incident between a husband and wife.

Coroner Greg Shore said three officers went inside the home, and that’s when a man pointed what appeared to be a gun at officers. At least one officer fired his gun, and the suspect died on the scene, Shore said.

Investigators said the man’s wife, who was inside the home, was taken to the hospital due to injuries from a physical fight with the suspect. Shore said she was doing OK.

Shore said the man officers shot was 47-year-old Paul Leatherwood. Sgt. David Creamer said officers have been called to the same house six times in the past six months for domestic-related issues.

The State Law Enforcement Division is headed to the scene to investigate, which is standard protocol for officer-involved shootings in South Carolina.

One killed in Anderson officer-involved shooting, FoxCarolina.com

We don’t know the race of the shooter or the shootee. We don’t know whether what appeared to be a gun turned out to be a gun. We don’t know whether this was or wasn’t reported as a justifiable homicide. But we could find out.

Every week, a million people listen to the blockbuster podcast Serial, an investigation into a cold case from 1999. A staggering amount of cognitive surplus has been invested in trying to figure out whether Adnan Syed did or did not murder Hae Min Lee. In this blog post, which I picked randomly from the flood of Reddit commentary, a lawyer named Susan Simpson has produced a 14,000-word, densely illustrated “comparison of Adnan’s cell phone records to the witness statements provided by Adnan, Jay, Jenn, and Cathy.”

With a fraction of the investigative effort being poured into that one murder, we could know a lot more about many others.