The NSF’s DataNet initiative

The NSF is soliciting proposals for a “sustainable digital data preservation and access” network. According to Chris Greer, the NSF will invest 100 million dollars over 5 years in a federation of five organizations that will together create a DataNet that will:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

The scope includes “text, numbers, software, images, video and audio streams, sensor streams — the full range of digital artifacts.”

Because sustainability is the goal, Chris says, this effort does not aim to support domain-specific repositories — for example, in the realm of astronomical data. These efforts haven’t worked out so well, he says. They’ve tended to create long-term dependency on continued NSF funding, and have failed to produce the kinds of network effects that enabled the Internet itself to transcend its original life support system.

My $0.02 is that a sustainable model for digital archiving will be an ecosystem of hosted lifebits services. Both individuals and institutions produce streams of lifebits. Sometimes those streams run separately, sometimes they run together. If I value my personal output highly enough to park it in a personal archive whose access, integrity, and long-term availability guarantees meet the requirements of my institution, then the institution might not need to be responsible for archiving my stuff. Instead it can just syndicate it. Alternatively, if my personal archive doesn’t meet the institution’s requirements, the institution can choose to host, rather than syndicate, those of my bits it cares about.

These options aren’t mutually exclusive. We can, and often will, wind up replicating as well as syndicating. But there’s going to be a ton of virtual capacity controlled by individuals, for example in their open-notebook blogs. Those blogs today don’t provide the sort of virtual capacity that meets institutional requirements. But they can, and they should, and if they did, it’d be a service that individuals would pay for.

Global Research Library 2020

I’m attending GRL2020, where a high-powered group of folks who care about the future of libraries, and of research libraries in particular, has come together to discuss opportunities, risks, and impediments.

The opportunities are abundantly clear to me, but what about risks? The only risk I can think of is maintaining the status quo. For example, the other day I published a screencast and blog writeup about a new IronPython-based spreadsheet called Resolver. It got Slashdotted and attracted more than the usual amount of commentary. Several folks noted, very helpfully, that the notion of a spreadsheet that’s intimately connected to an object-oriented programming environment is not new, and they pointed to various antecedents.

One commenter, John Lopez, wrote:

I see this about once a month: an announcement of something so new that it couldn’t possibly have been done before, yet when I ask if they have done a literature search they look at me like I am speaking in an alien language. Organizations like the ACM and IEEE have a vast troves of information and knowledge, yet membership continues to decline in the traditional professional societies in favor of vendor specific groups (that lack an *interest* in developing institutional knowledge because that doesn’t sell new products).

Much effort is lost duplicating the past.

It’s a fair point, but when I followed his links I landed here:

Full-Text is a controlled feature.

To access this feature:

* Please login with your ACM Web Account.
* Please review the requirements below.

Now this isn’t just a question of open access. Setting aside the question of whether or to what extent peer-reviewed literature is made freely available, there’s a vast new literature that never existed before. We create that literature as we narrate the work that we do, and we create it in an environment that makes it naturally discoverable, linkable, and capable of influencing minds across space and time.

Switching from computer science to biology, here’s a nice example of that sort of narration that I found the other day:

Michael Barton is a PhD student in Bioinformatics at the University of Manchester.

This is blog about my research on gene expression in yeast, and an experiment in open notebook science.

The only real risk I can see is that we’ll fail to establish the equivalent of open notebook science in every professional domain. If we succeed in establishing that norm, though, the future for libraries — and librarians — will be very bright indeed.