When chatter in the mainstream media and in the blogosphere intersects with scientific discourse, I’m always interested in the ways that citations do, or don’t, cross the border between those domains. In 2006, for example, while checking references for a podcast with Steve Burbeck about multicellular computing, I traced a meme about how we humans are really a hybrid of human and bacterial cells. The mainstream vector was a New York Times Magazine story on obesity. It got to the blogosphere by way of a Wired News story. But the original Nature Biotechnology article mentioned in the Wired story was linked nowhere that I could find.

A comment from Gordon Mohr on yesterday’s item about Many Eyes prompted a similar analysis. Gordon asks:

…do the Many Eyes founders consider the statistical paradox that when testing large numbers of hypotheses, *most* recognized ‘statistically significant’ results may in fact be false?

A good discussion of the issue is here:

http://www.marginalrevolution.com/marginalrevolution/2005/09/why_most_publis.html

To answer Gordon’s question: I don’t know. It didn’t come up in our conversation. But let’s look at the conversation surrounding the PLoS Medicine article cited in the blog entry to which Gordon points.

The blog entry itself was widely noticed: it has 31 del.icio.us bookmarks. What about the PLoS Medicine article cited in this popular blog entry? It has only 6 del.icio.us bookmarks.

This is the URL cited by the marginalrevolution blog:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=16060722

It’s not the most canonical form of the article’s URL. A more canonical form would be the base PubMed record:

http://www.ncbi.nlm.nih.gov/pubmed/16060722

That URL has 0 del.icio.us citations. However, now we cross over into the realm of scientific discourse. When you visit that PubMed URL, you’ll discover citations in the PubMed domain:

Comment in:
PLoS Med. 2005 Aug;2(8):e272.
PLoS Med. 2005 Nov;2(11):e361.
PLoS Med. 2005 Nov;2(11):e386; author reply e398.
PLoS Med. 2005 Nov;2(11):e395.
PLoS Med. 2007 Apr;4(4):e168.

There’s another canonical form for the PLoS Medicine article, by the way. It has a Digital Object Identifier (DOI):

http://dx.doi.org/10.1371/journal.pmed.0020124

Interestingly, there is 1 del.icio.us citation for that DOI.

So, what did the PLoS Medicine folks have to say about the claim in the cited August 2005 PLoS Medicine article? Here’s an April 2007 reaction:

The mathematical proof offered for this in the PLoS Medicine paper shows merely that the more studies published on any subject, the higher the absolute number of false positive (and false negative) studies. It does not show what the papers’ graphs and text claim, viz, that the number of false claims will be a higher proportion of the total number of studies published (i.e., that the positive predictive value of each study decreases with increasing number of studies).

I’m not interested here in the claim and counterclaim. I’m interested in the process of discourse, in citation as the engine of that discourse, in the role that canonical identifiers play in citation, and in the disconnect between scientific and mainstream discourse.

It’s all happening on the web, but it’s happening in isolated ghettoes with few points of actual contact. How could we bring those worlds into closer contact?

Here’s one approach that could help. When the citation engines in the blogosphere find references in blog entries to scientific articles on the web, they could resolve those to their most canonical forms: DOIs, PubMed records. And they could make equivalences among those forms. That way, conversation in the blogosphere about a scientific article, and scientific conversation about the same article, would tend to hang together and would be discoverable in the same contexts.
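
To make that concrete, here is a minimal Python sketch of the resolution step. It is an illustration under stated assumptions, not a description of how any existing citation engine works. The function names (canonical_id, group_citations) and the PMID_TO_DOI table are hypothetical; the table is filled in by hand from the URLs in this post, whereas a real service would need some lookup to establish that a PubMed ID and a DOI name the same article. It recognizes only the three URL shapes that appear above.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical equivalence table: PubMed ID -> DOI. A real service would
# have to look this up; here it is hand-filled from the URLs in this post.
PMID_TO_DOI = {"16060722": "10.1371/journal.pmed.0020124"}


def canonical_id(url):
    """Return a canonical key such as 'doi:...' or 'pmid:...', or None."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    pmid = None

    if host in ("dx.doi.org", "doi.org"):
        # DOIs are case-insensitive, so lowercase them for comparison.
        return "doi:" + parts.path.lstrip("/").lower()

    if host == "www.ncbi.nlm.nih.gov":
        match = re.fullmatch(r"/pubmed/(\d+)/?", parts.path)
        if match:
            pmid = match.group(1)
    elif host == "www.pubmedcentral.nih.gov":
        # The older PubMed Central form carries the ID in the query string.
        values = parse_qs(parts.query).get("pubmedid")
        if values and values[0].isdigit():
            pmid = values[0]

    if pmid is None:
        return None  # not a URL shape this sketch recognizes
    doi = PMID_TO_DOI.get(pmid)
    # Prefer the DOI when the equivalence is known; otherwise keep the PMID.
    return "doi:" + doi.lower() if doi else "pmid:" + pmid


def group_citations(urls):
    """Group cited URLs by the article they resolve to."""
    groups = defaultdict(list)
    for url in urls:
        key = canonical_id(url)
        if key is not None:
            groups[key].append(url)
    return dict(groups)
```

Fed the PubMed Central URL, the base PubMed URL, and the DOI URL from this post, group_citations puts all three under a single key, so bookmarks and comments attached to any one form could be counted together. The marginalrevolution URL resolves to nothing, since it is a blog entry rather than an article identifier.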

Why does this matter? Well, the marginalrevolution blog is influential, widely cited in the blogosphere. The entry that cited the PLoS Medicine article was itself widely cited. But the PLoS Medicine reaction to the article is not part of the blog conversation. I had to work really hard to find it, and to include it here.

The conversation-tracking tools used by bloggers should discover scientific discourse related to a scientific article as easily as they discover blog discourse. Conversely, the conversation-tracking tools used by scientists should discover blog discourse as readily as scientific discourse. Public understanding of science would improve, and so would scientific understanding of the public.