Category Archives: Uncategorized

Online scientific collaboration: the sequel

In 2000 I was commissioned to write a report called Internet Groupware for Scientific Collaboration. That was before modern social media, before blogs even really got going. But arxiv.org was already well-established, and wikis and calendar services and Dave Winer’s proto-blog, Manila, and many kinds of discussion forums were relevant to my theme. On the standards front, RSS, MathML, and SVG were emerging. One of my premonitions, that lightweight and loosely-coupled web services would matter, turned out to be accurate. Another, the notion of a universal canvas for creating and editing words, pictures, data, and computation, remains part of the unevenly distributed future, though projects like IPython Notebook and Federated Wiki rekindle my hope that we’ll get there.

Now I’m writing an update to that report. There’s unfinished business to reconsider, but also much new activity. Scientific collaboration happens in social media and on blogs, obviously. It happens in scientific social media. It happens in and around open access journals. It happens on GitHub where you can find open software and open data projects in many scientific disciplines. It happens on Reddit, on StackExchange-based Q&A sites, on citizen science websites, and in other places I don’t even know about.

I want to interview researchers engaged in various aspects of online scientific collaboration. I’m well connected to some of the tribes I need to reach, but need to cast a wider net. I want to hear from practitioners in the natural sciences, social sciences, and digital humanities about the ways you and your colleagues, in disciplines near and far, do (and don’t) collaborate online, both in specific contexts (OA journals, academic social networks) and wider contexts (blogs, mainstream social media). How does your activity in those settings advance your work (or not)? How does it help connect your work to society at large (or not)?

If you’re somebody who ought to be involved in this project, please do get in touch here or here. And if you know someone who ought to be involved, please pass this along.

Thanks!

Another Internet miracle!

I’m among the many fans of the entertaining physics lectures that made Walter Lewin a star of stage (MIT OpenCourseWare) and screen (YouTube). And I was among those saddened, last month, to learn that charges of harassment had ended his career on the OpenCourseWare stage.

When it severed its ties to Lewin, MIT made the controversial decision to remove his lectures from ocw.mit.edu. Searching for perspective on that decision, I landed on Scott Aaronson’s blog, where I found much useful discussion. One comment in particular, from Temi Remmen, had the ring of truth:

I agree Walter Lewin’s lectures should be made available through a different source so everyone around the world may enjoy them. Having known him for most of my life, I am not in the least surprised that this happened to him. None of us enjoy his downfall. However, he managed to alienate many of his peers, colleagues and people in his personal life to an extreme. It is my gut feeling, that prominent people at MIT had enough of his antics, in spite of his success as a teacher and brilliance as a scientist. In the scientific community, he is widely known for being very demeaning and insulting to those he does not feel are as intelligent as he is — and for having had numerous problems with women in the past. His online sexual harassment does not appear to warrant this kind of punishment, not even by MIT. This was a long time coming and they got rid of him this way. Emails destroy careers. Sorry to say. I feel sorry for Walter too for lacking the insight to treat others better and that he did this to himself.

That was on December 10th, the day after the news broke. I read the comment thread a few days later, absorbed the discussion, and moved on.

So I was surprised the other night by Conor Friedersdorf’s The Blog Comment That Achieved an Internet Miracle, inspired by that very same comment thread. When I’d last checked in, the Aaronson thread ended at about comment #75. The comment to which Friedersdorf refers was #171, posted on December 14.

It would be insane to add many more words to the outpouring that followed the now-infamous Comment #171, both on Aaronson’s blog and elsewhere. So instead I’ll just add a couple of pictures.

Contributors by number of comments:

Contributors by number of bytes:

What these charts show is that two people dominate the thread, which by the other night had grown to over 600 comments. There’s Scott Aaronson, the author of the blog, who in the two weeks leading up to Christmas wrote 107 comments adding up to about 30,000 words (assuming an average word length of 5 characters). And there’s Amy, who over those same two weeks wrote 82 comments adding up to about 36,000 words.

I can’t begin to summarize the discussion, so I’ll just agree with Conor Friedersdorf’s assessment:

Aaronson and his interlocutors transformed an obscure, not-particularly-edifying debate into a broad, widely read conversation that encompassed more earnest, productive, revelatory perspectives than I’d have thought possible. The conversation has already captivated a corner of the Internet, but deserves wider attention, both as a model of public discourse and a window into the human experience.

There were many interlocutors, but one in particular stood head and shoulders above the crowd: Amy. How often is she mentioned in three widely-cited blog posts about the Comment 171 affair? Let’s look.

0: http://www.theatlantic.com/politics/archive/2015/01/the-blog-comment-that-achieved-an-internet-miracle/384539/ (Conor Friedersdorf)

0: http://www.newstatesman.com/laurie-penny/on-nerd-entitlement-rebel-alliance-empire (Laurie Penny)

0: http://slatestarcodex.com/2015/01/01/untitled/ (Scott Alexander)

Another Internet miracle!

A network of neighbors

A new acquaintance here in Santa Rosa recommended Nextdoor, a service that describes itself as “the private social network for your neighborhood.” Yet another social network? I know. Back in 2007 Gary McGraw nailed the problem of social network fatigue. “People keep asking me to join the LinkedIn network,” he said, “but I’m already part of a network, it’s called the Internet.”

Nevertheless, I joined. We’re new in town, and I don’t want to let my antipathy to walled gardens get in the way of making useful connections. If you haven’t seen Nextdoor it’s because you haven’t joined it. Nextdoor resembles Facebook in many ways. But it’s only visible after you sign up, and you can only do that by proving residence in a neighborhood.

The signup protocol is intriguing:

The postcard method seems safest but I didn’t want to wait. The credit card method is immediate but I dislike using my card that way. So I tried the phone method. You’re asked to provide a phone number that’s billed to your residence, then the phone receives a code you use to complete the signup.

How did the site get my service provider, AT&T, to confirm that my phone’s billing address matched the one I was claiming on Nextdoor? Beats me. On reflection that feels as creepy as identifying to a social network with a credit card, maybe creepier. It’s a shame that services like Nextdoor can’t yet verify such claims with identity providers that we choose for the purpose — banks, for example, or state governments. But I digress.

Once you sign in, Nextdoor begs you in the usual ways to help grow the network. It prompts you to upload contacts who will be targets for email invitations, and offers a $25 Amazon Gift Card if you’ll post an invitation link on Facebook or Twitter. But there are some uniquely local alternatives too. Nextdoor will send postcards to nearby households, and help you make flyers you can post around the neighborhood.

Nextdoor’s map of my neighborhood reports that 51 of 766 households are claimed by registered users. A progress bar shows that’s 7% of the total neighborhood saturation to which it aspires. The map is a patchwork quilt of claimed addresses, shown in green, and ones yet to be assimilated, shown in pink.

The neighborhood directory lists people who live at the claimed addresses, it links to their profiles, and it offers to send direct messages to them. Local chatter appears on the neighborhood wall and is what you’d expect: a filing cabinet is available for $85, a neighborhood watch meeting will be held next month.

This social network is private in an interesting way. The zone of privacy is defined by the neighborhood boundary. You can most easily find and interact with others within that zone. But you’re also made aware of activities in the wider zone of nearby neighborhoods. Maps of those neighborhoods aren’t as detailed. But you can see posts from people in nearby neighborhoods, communicate with them, and discover them by searching for things they’ve said.

Our neighborhood is near Santa Rosa’s downtown. Nextdoor considers fifteen others, comprising much of the downtown core, to be nearby neighborhoods. You can choose to communicate with nearby neighbors or not. If you do, you reveal less about yourself than to immediate neighbors. It’s a clever design that encourages people to explore the boundaries between what’s public and what’s private, to realize how the online world renders such distinctions fluid and relative, and to learn to behave accordingly.

None of this will appeal to everyone, much less to millennials like Carmen DeAmicis, who covers social media for Gigaom. But in a recent Gigaom post she explains why she suddenly found Nextdoor compelling:

Twenty-somethings in urban areas by-and-large don’t have kids, their lives don’t revolve around their home and they know their neighbors hardly, if at all. So even though I covered Nextdoor, I never felt compelled to actually become a user.

That changes today. Nextdoor has introduced a new element to its application that makes it a must-use network, even for the disinterested younger generations. It has started partnering with police and fire departments across the country — in 250 cities initially, with more to come — to use Nextdoor to communicate about emergencies and safety issues with local residents.

Given that Nextdoor sings a familiar tune — “We will NOT require members to pay to use Nextdoor and we will not sell users’ private information to other companies” — that’s a plausible business model. But to partner cities it’s yet another channel of communication to keep track of. And to citizens it’s yet another fragment of online identity.

Cities need to engage with people as individuals, members of interest groups, and residents of neighborhoods, in multi-faceted ways that reflect personal preferences, local customs, and generational trends. Nextdoor is interesting and useful, but I would rather see neighborhood social networks arise as organically online as they do on the ground. There isn’t an app for that, but there is a network, or rather there will be. In that network you’ll choose various parties to certify claims about aspects of your identity to various other parties. Those claims will define your affiliations to various geographic and interest groups. Your communication within those groups will flow through channels that you specify. What is that network? We’ll call it the Internet, and we’ll all be neighbors there.

A federated Wikipedia

Writing for the Chronicle of Higher Education in 2012, Timothy Messer-Kruse described his failed efforts to penetrate Wikipedia’s gravitational field. He begins:

For the past 10 years I’ve immersed myself in the details of one of the most famous events in American labor history, the Haymarket riot and trial of 1886. Along the way I’ve written two books and a couple of articles about the episode. In some circles that affords me a presumption of expertise on the subject. Not, however, on Wikipedia.

His tale of woe will be familiar to countless domain experts who thought Wikipedia was the encyclopedia anyone can edit but found otherwise. His research had led to the conclusion that a presumed fact, often repeated in the scholarly literature, was wrong. Saying so triggered a rejection based on Wikipedia’s policy on reliable sources and undue weight. Here was the ensuing exchange:

“Explain to me, then, how a ‘minority’ source with facts on its side would ever appear against a wrong ‘majority’ one?” I asked the Wiki-gatekeeper. He responded, “You’re more than welcome to discuss reliable sources here, that’s what the talk page is for. However, you might want to have a quick look at Wikipedia’s civility policy.”

(You can relive his adventure by visiting this revision of the article’s talk page and clicking the Next edit link a half-dozen times. You have to dig to find backstories like this one. But to Wikipedia’s credit, they are preserved and can be found.)

Timothy Messer-Kruse’s Wikipedia contributions page summarizes his brief career as a Wikipedia editor. He battled the gatekeepers for a short while, then sensibly retreated. As have others. In The Closed, Unfriendly World of Wikipedia, Internet search expert Danny Sullivan blogged his failed effort to offer some of his expertise. MIT Technology Review contributor Tom Simonite, in The Decline of Wikipedia, calls Wikipedia “a crushing bureaucracy with an often abrasive atmosphere that deters newcomers” and concludes:

Today’s Wikipedia, even with its middling quality and poor representation of the world’s diversity, could be the best encyclopedia we will get.

That would be a sad outcome. It may be avoidable, but only if we take seriously the last of Wikipedia’s Five pillars. “Wikipedia has no firm rules,” that foundational page says; it has “policies and guidelines, but they are not carved in stone.” Here is the policy that most desperately needs to change, Content forking:

A point of view (POV) fork is a content fork deliberately created to avoid neutral point of view guidelines, often to avoid or highlight negative or positive viewpoints or facts. All POV forks are undesirable on Wikipedia, as they avoid consensus building and therefore violate one of our most important policies.

That policy places Wikipedia on the wrong side of history. Not too long ago, we debated whether a distributed version control system (DVCS) could possibly work, and regarded forking an open source project as a catastrophe. Now GitHub is the center of an open source universe in which DVCS-supported forking is one of the gears of progress.

Meanwhile, as we near the 20th anniversary of wiki software, its inventor Ward Cunningham is busily reimagining his creation. I’ve written a lot lately about his new federated wiki, an implementation of the wiki idea that values a chorus of voices. In the federated wiki you fork pages of interest and may edit them. If you do, your changes may or may not be noticed. If they are noticed they may or may not be merged. But they belong to the network graph that grows around the page. They are discoverable.

In Federated Education: New Directions in Digital Collaboration, Mike Caulfield offers this key insight about federated wiki:

Wiki is a relentless consensus engine. That’s useful.

But here’s the thing. You want the consensus engine, eventually. But you don’t want it at first.

How can we ease the relentlessness of Wikipedia’s consensus engine? Here’s a telling comment posted to Timothy Messer-Kruse’s User talk page after his Chronicle essay appeared:

Great article. Next time just go ahead and make all of your changes in one edit, without hesitation. If you are reverted, then make a reasonable educated complaint in the talk page of the article (or simply write another article for the Chronicle, or a blog post). Other people with more, eh, “wikiexperience” will be able to look at your edit, review the changes, and make them stand.

To “write another article for the Chronicle, or a blog post” is, of course, a way of forking the Wikipedia article. So why not encourage that? There aren’t an infinite number of people in the world who have deep knowledge of the Haymarket affair and are inclined to share it. The network graph showing who forked that Wikipedia article, and made substantive contributions, needn’t be overwhelming. Timothy Messer-Kruse’s fork might or might not emerge as authoritative in the judgement not only of Wikipedia but also of the world. If it did, Wikipedia might or might not choose to merge it. But if the consensus engine is willing to listen for a while to a chorus of voices, it may be able to recruit and retain more of the voices it needs.

Thoughts in motion

In Federated Wiki for teaching and learning basic composition I speculated about a FedWiki plugin that would surface the version history of individual paragraphs within an essay. Over the weekend I prototyped that plugin and you can see it in action here. The paragraph that begins with “The relevance to SFW” is the one that I featured in the original blog post. On the wiki, in a forked copy of the Kate Bowles essay that prompted my inquiry, I’ve injected a plugin that lists paragraphs that have changed at least once since originally written, and that displays the version-to-version changes for each. On that page the plugin shows up in light yellow. If you click the + that precedes “The relevance to SFW” there, you should see the same record of changes I reported in the blog post. It looks like this:

This view improves on the mock-up shown in the original blog post, adding color-coded highlighting of version-to-version differences. I think it’s a striking illustration of how a thought evolves through a sequence of written snapshots. It reminds me of Michael Rubinstein’s astonishing TED talk on a video processing technique that reveals and amplifies normally undetectable color change and motion.

If Kate had been using a conventional writing tool, it would be much harder to observe what we see here. But in FedWiki a paragraph is special. It has its own version history. Every time you open up a paragraph to change it, that change is recorded. To the extent there’s a correspondence between paragraphs and thoughts — and in prose that is often the case — FedWiki intrinsically enables the study of those thoughts in motion.
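For the technically curious, here is roughly how a plugin like my prototype can recover a paragraph’s history. This is a minimal sketch, assuming the shape of FedWiki’s page JSON as I understand it (a story array of items with ids, and a journal of actions that record the item they touched); the function name paragraphHistory is mine, and this is not the prototype’s actual code.

    // Minimal sketch, not the prototype's actual code. Assumes page JSON like:
    //   { title, story: [{id, type, text}, ...],
    //     journal: [{type, id, item, date}, ...] }
    // Given a page and the id of one paragraph, collect the successive
    // versions of that paragraph's text in journal order.
    function paragraphHistory(page, paragraphId) {
      return (page.journal || [])
        .filter(action =>
          action.id === paragraphId &&
          (action.type === 'add' || action.type === 'edit') &&
          action.item && typeof action.item.text === 'string')
        .map(action => ({ date: action.date, text: action.item.text }));
    }

    // Usage: fetch a page's JSON, pick a paragraph id from page.story,
    // then diff adjacent entries of paragraphHistory(page, id).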

When computers make visible what was formerly invisible there is always an initial rush of excitement. Then the question becomes: Is this useful? And if so, in what ways?

I can envision two primary uses of this technique. First, for writers. We all compose differently, and not everyone will want to be able to replay the changes to an individual paragraph. But if you do want to, conventional tools aren’t much help. In Google Docs, for example, you can roll through versions of a whole document but it’s not easy to focus on how a single paragraph changes.

A second use, as suggested in the original post, is for teachers of writing and their students. In other domains, such as music, computers have become powerful tools for studying finished compositions. Adrian Holovaty’s Soundslice, for example, makes YouTube performances accessible to study and annotation. In that case there’s nothing hidden, the tool’s job is to slow things down and provide a synchronized framework for annotation. But what if you wanted to study the process of musical composition? Then you’d want the composition to occur in an environment that records changes in chunks of varying granularity that correspond to the structure of the piece.

Because FedWiki naturally divides writing into paragraph chunks, it enables us to see how paragraph-sized thoughts evolve. But a FedWiki page is not only a sequence of paragraphs. Other kinds of objects can be injected into the page by plugins that manage, for example, data sets and small scripts written in domain-specific languages. These things have their own version histories too.

Most knowledge workers, with the notable exception of software developers, don’t yet use tools that take version control for granted. That will change. The artifacts produced by knowledge work are simply too valuable to be managed any other way. As versioning tools evolve for other disciplines, we have the opportunity to rethink what those tools can do. I hope we’ll do that with sensitivity to the natural granularity of the material. In many cases, whole documents aren’t the right chunks.

Even in software development, of course, we are still working with document-sized chunks. Compilers know that programs are made of modules and functions, but editing tools don’t track changes that way and as a result GitHub can’t directly show us the history of an individual function. That would be useful for the same reasons I’ve mentioned. It would help programmers reflect on their own work, and enable teachers to show students more about how code is written and how it evolves.

Tools for structured writing aren’t a new idea, of course. There are reasons why they haven’t caught on. But FedWiki reminds us that there are also reasons to hope they will.

How Federated Wiki neighborhoods grow and change

Federated Wiki sites form neighborhoods that change dynamically as you navigate FedWiki space. Sites that are within your current neighborhood are special in two ways: you can link to them by names alone (versus full URLs), and you can search them.

Here’s one neighborhood I can join.

A row of flags (icons) in the bottom right corner of the screen (1) indicates that there are five sites in this neighborhood: my own and four others. The number next to the search box in the bottom middle (2) says that 772 pages can be searched. That number is the sum of all the pages in the neighborhood.

From each site in the neighborhood, FedWiki retrieves a summary called the sitemap: a list of all the pages on the site. Each item in the list has a page’s title, date, and complete first paragraph (which might be very short or very long). FedWiki’s built-in search uses these sitemaps, which means it sees only the titles and first paragraphs of the pages in your neighborhood.
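To make that concrete, here is a rough sketch of a sitemap-based search. It assumes each site serves its sitemap as JSON at /system/sitemap.json (the path FedWiki sites use, as far as I know), with entries shaped like { slug, title, date, synopsis }; it is not FedWiki’s actual search code.

    // Rough sketch, not FedWiki's built-in search. Assumes each site serves
    // /system/sitemap.json as a list of { slug, title, date, synopsis } entries.
    async function searchNeighborhood(sites, term) {
      const needle = term.toLowerCase();
      const results = [];
      for (const site of sites) {
        const response = await fetch(`http://${site}/system/sitemap.json`);
        const sitemap = await response.json();
        for (const page of sitemap) {
          // Only titles and first paragraphs are visible to this search.
          const haystack = `${page.title} ${page.synopsis || ''}`.toLowerCase();
          if (haystack.includes(needle)) {
            results.push({ site, slug: page.slug, title: page.title });
          }
        }
      }
      return results;
    }

    // e.g. searchNeighborhood(['jon.sf.fedwikihappening.net', 'ward.fed.wiki.org'], 'federation')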

Here are the sites in this neighborhood:

  1. forage.ward.fed.wiki.org
  2. jon.sf.fedwikihappening.net
  3. sites.fed.wiki.org
  4. video.fed.wiki.org
  5. ward.fed.wiki.org

You can find these names by hovering over the row of flags. If you are technical you might also want to observe them in a JavaScript debugger. In this picture, I used Control-J in Chrome to launch the debugger, then clicked into the Console tab, then typed the name of the JavaScript variable that represents the neighborhood: wiki.neighborhood.
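If you try that yourself, a couple of console one-liners summarize what the client knows. I’m assuming here that wiki.neighborhood is an object keyed by site name; that matches what I observe in the console, but treat it as an observation rather than a documented API.

    // In the browser console on a FedWiki page (observation, not documented API):
    Object.keys(wiki.neighborhood)          // the site names behind the row of flags
    Object.keys(wiki.neighborhood).length   // should match the number of flags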

Why are these five sites in my neighborhood? It’s obvious that my own site, jon.sf.fedwikihappening.net, belongs. And since I’ve navigated to a page on forage.ward.fed.wiki.org, it’s not surprising to find that site in my neighborhood too. But what about the other three? Why are they included?

The answer is that Ward’s page includes references to sites.fed.wiki.org, video.fed.wiki.org, and ward.fed.wiki.org. A FedWiki reference looks like a paragraph, but its blue tint signals that it’s special. Unlike a normal paragraph, which you inject into the page using the HTML or Markdown plugin, a reference is injected using the Reference plugin. It’s a dynamic element that displays the flag, the page name, and synopsis (first paragraph) of the referenced page. It also adds that page’s origin site to the neighborhood.
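Under the hood, a reference is just another story item. Here is a sketch of what one looks like in the page JSON; the field names follow my reading of FedWiki pages, and the values are invented for illustration.

    // Sketch of a reference item in a page's story (field names are my reading
    // of FedWiki's page JSON; the values here are invented for illustration):
    const referenceItem = {
      type: 'reference',
      id: '8f2c4a1be9d07c35',
      site: 'video.fed.wiki.org',
      slug: 'welcome-visitors',
      title: 'Welcome Visitors',
      text: 'First paragraph of the referenced page, shown as the synopsis.'
    };
    // When the client renders an item like this, it adds item.site to the neighborhood.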

Two of the five sites in this example neighborhood — jon.sf.fedwikihappening.net and forage.ward.fed.wiki.org — got there directly by way of navigation. The other three got there indirectly by way of references.

To add a reference to one of your own pages, you click the + symbol to add a factory, drag the flag (or favicon) of a remote FedWiki page, and drop it onto the factory.

To illustrate, I’ll start with a scratch page that has a factory ready to accept a drop.

In a second browser tab, I’ll navigate to forage.ward.fed.wiki.org’s Ward Cunningham page, the one with the three references we saw above. Then I’ll drag that page’s favicon into the first browser tab and drop it onto the factory. Dragging between browser tabs may be unfamiliar to you. It was to me as well, actually. But it’s a thing.

The setup in this example is:

Tab 1: http://jon.sf.fedwikihappening.net/view/welcome-visitors/view/scratch

Tab 2: http://forage.ward.fed.wiki.org/view/ward-cunningham

Here is the result:

How many sites are in this neighborhood? When I did this experiment, I predicted either 2 or 5. It would be 2 if the neighborhood included only my site and the origin of the referenced page. It would be 5 if FedWiki included, in addition, sites referenced on the referenced page. Things aren’t transitive in that way, it turns out, so the answer is 2.

Except that it isn’t. It’s 3! Look at the row of flags in the bottom right corner. There are three of them: jon.sf.fedwikihappening.net, forage.ward.fed.wiki.org, and mysteriously, fedwikihappening.rodwell.me. That’s Paul Rodwell’s site. How did he get into this neighborhood?

This closeup of the journal will help explain the mystery. The page was forked 5 days ago.

We can view the source of the page to find out more.

And here’s the answer. Early in the life of my scratch page I forked Paul Rodwell’s scratch page from fedwikihappening.rodwell.me.

So we’ve now discovered a third way to grow your neighborhood. First by navigating to remote pages directly. Second by including references to remote pages. And third by forking remote pages.
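Putting the three together, here is a sketch of how a page’s contribution to the neighborhood could be computed from its JSON: start with the site you navigated to, then add the sites named by reference items in the story and by fork actions in the journal. It is my reconstruction of the behavior described above, not the FedWiki client’s actual code, and the field names follow the earlier sketches.

    // Sketch: derive a page's contribution to the neighborhood.
    // Not the FedWiki client's real code; field names as in earlier sketches.
    function neighborhoodFor(originSite, page) {
      const sites = new Set([originSite]);               // 1. navigation
      for (const item of page.story || []) {
        if (item.type === 'reference' && item.site) {
          sites.add(item.site);                          // 2. references
        }
      }
      for (const action of page.journal || []) {
        if (action.type === 'fork' && action.site) {
          sites.add(action.site);                        // 3. forks
        }
      }
      return [...sites];
    }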

FedWiki for collaborative analysis of data

FedWiki presents one or more wiki pages side by side. This arrangement is called the lineup. During interactive use of FedWiki the lineup grows rightward as you navigate the federation. But you can also compose a lineup by forming a URL that describes a purposeful arrangement of wiki pages. In Federated Wiki for teaching and learning basic composition I composed two lineups. The first compares two versions of a page on Kate Bowles’ FedWiki site. The second compares two versions of that page from two different sites: mine and Kate’s. With these two lineups I’m exploring the notion that FedWiki could be a writers’ studio in which students watch their own paragraphs evolve, and also overlay suggestions from teachers (or other students).
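Composing such a lineup URL is mechanical enough to script. The helper below, lineupUrl, is a hypothetical sketch that simply alternates /view/<slug> segments on a single host, which is all the broccoli lineup shown below requires; lineups that mix pages from several sites appear to encode the remote site into the path instead, and I have not handled that case here.

    // Small helper for single-site lineups: alternate /view/<slug> segments.
    // (Pages from other sites appear as <site>/<slug> pairs in place of
    // view/<slug>, a case this sketch does not handle.)
    function lineupUrl(host, slugs) {
      return `http://${host}` + slugs.map(slug => `/view/${slug}`).join('');
    }

    lineupUrl('jon.sf.fedwikihappening.net', [
      'italian-broccoli',
      'broccoli-fried-with-sesame-and-raspberry',
      'favorite-broccoli-recipes'
    ]);
    // → the broccoli lineup URL shown below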

In those two lineups the order of wiki pages isn’t important. You can compare versions left-to-right or right-to-left. But here’s another example where left-to-right sequence matters:

Link: Favorite Broccoli Recipes

URL: http://jon.sf.fedwikihappening.net/view/italian-broccoli/view/broccoli-fried-with-sesame-and-raspberry/view/favorite-broccoli-recipes

Rendering:

The tables shown in these wiki pages are made by a data plugin that accumulates facts and performs calculations. FedWiki has explored a number of such data plugins. This one implements a little language, which you can see in these views of the text that lives in the embedded plugins:

On the Italian Broccoli page:

5 (calories) per (garlic clove)
200 (calories) per (bunch of broccoli)
SUM Italian Broccoli (calories)

On the Broccoli Fried With Sesame and Raspberry page:

100 (calories) per (tbsp sesame seed oil)
34 (calories) per (100 grams broccoli)

And:

3 (tbsp sesame seed oil)
SUM (calories)
1 (100 grams broccoli)
SUM Broccoli Fried With Sesame Oil (calories)

On the Favorite Broccoli Recipes page:

Italian Broccoli (calories)

And:

Broccoli Fried With Sesame Oil (calories)

Other plugins implement variations on this little language, and it’s surprisingly easy to create new ones. What I’m especially drawing attention to here, though, is that the lineup of wiki pages forms a left-to-right pipeline. Facts and calculations flow not only downward within a wiki page, but also rightward through a pipeline of wiki pages.
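To show what flowing rightward can mean in code, here is a loose sketch of a pipeline evaluator for a little language like the one above. The grammar is my guess from these examples (rate definitions, quantity lines, SUM lines that can name a result, and name lookups that recall results from earlier pages); the real plugin certainly differs in its details, and the function name evaluatePipeline is mine.

    // Loose sketch of evaluating a lineup as a pipeline. The grammar is guessed
    // from the examples above; this is not the actual data plugin.
    //   "5 (calories) per (garlic clove)"   defines a rate
    //   "3 (tbsp sesame seed oil)"          multiplies a quantity by its rate
    //   "SUM Some Name (calories)"          totals the running values, names the result
    //   "Some Name (calories)"              recalls a result named on an earlier page
    function evaluatePipeline(pages) {
      const named = {};                      // results that flow rightward
      for (const lines of pages) {           // pages, left to right in the lineup
        const rates = {};                    // "per" definitions, local to a page
        let running = [];                    // values awaiting a SUM
        for (const line of lines) {
          let m;
          if ((m = line.match(/^([\d.]+) \((.+?)\) per \((.+)\)$/))) {
            rates[m[3]] = parseFloat(m[1]);
          } else if ((m = line.match(/^([\d.]+) \((.+)\)$/))) {
            running.push(parseFloat(m[1]) * (rates[m[2]] || 1));
          } else if ((m = line.match(/^SUM ?(.*?) ?\((.+)\)$/))) {
            const total = running.reduce((a, b) => a + b, 0);
            if (m[1]) named[m[1]] = total;
            running = [total];
          } else if ((m = line.match(/^(.+) \((.+)\)$/)) && named[m[1]] !== undefined) {
            running.push(named[m[1]]);       // a fact arriving from an earlier page
          }
        }
      }
      return named;
    }

    // e.g. evaluatePipeline([italianBroccoliLines, sesameBroccoliLines, favoritesLines])
    // returns the named totals, with each page able to draw on its predecessors.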

And that pipeline, as we’ve seen, can be composed of pages from one site, or of pages drawn from several sites. I could provide one set of facts, you could provide an alternative set of facts, anyone could build a pipeline that evaluates both. It’s a beautiful way to enable the collaborative production and analysis of data.