The NSF’s DataNet initiative

The NSF is soliciting proposals for a “sustainable digital data preservation and access” network. According to Chris Greer, the NSF will invest $100 million over five years in a federation of five organizations that together will form a DataNet. These organizations will:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

The scope includes “text, numbers, software, images, video and audio streams, sensor streams — the full range of digital artifacts.”

Because sustainability is the goal, Chris says, this effort does not aim to support domain-specific repositories, such as those for astronomical data. Those repositories haven’t worked out so well, he says. They’ve tended to create long-term dependency on continued NSF funding, and have failed to produce the kinds of network effects that enabled the Internet itself to transcend its original life support system.

My $0.02 is that a sustainable model for digital archiving will be an ecosystem of hosted lifebits services. Both individuals and institutions produce streams of lifebits. Sometimes those streams run separately, sometimes they run together. If I value my personal output highly enough to park it in a personal archive whose access, integrity, and long-term availability guarantees meet the requirements of my institution, then the institution might not need to be responsible for archiving my stuff. Instead, it can just syndicate it. Alternatively, if my personal archive doesn’t meet the institution’s requirements, the institution can choose to host, rather than syndicate, those of my bits it cares about.

These options aren’t mutually exclusive. We can, and often will, wind up replicating as well as syndicating. But there’s going to be a ton of virtual capacity controlled by individuals, for example in their open-notebook blogs. Those blogs today don’t provide the sort of virtual capacity that meets institutional requirements. But they can, and they should, and if they did, it’d be a service that individuals would pay for.
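To make the syndicate-or-host choice concrete, here is a minimal sketch of how an institution might decide what to do with one person's archive. Everything in it is an assumption for illustration: the `Guarantees` flags, the `Action` names, and the `plan` function are hypothetical, and real institutional requirements would be far richer than three yes/no checks.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SYNDICATE = "syndicate"  # point at the individual's archive
    HOST = "host"            # keep an institutional copy


@dataclass(frozen=True)
class Guarantees:
    # Hypothetical yes/no flags for the three guarantees named in the post.
    access: bool
    integrity: bool
    long_term_availability: bool

    def meets(self, required: Guarantees) -> bool:
        """True if every guarantee the institution requires is offered."""
        offered = (self.access, self.integrity, self.long_term_availability)
        needed = (required.access, required.integrity,
                  required.long_term_availability)
        return all(have or not need for have, need in zip(offered, needed))


def plan(archive: Guarantees, required: Guarantees,
         replicate: bool = False) -> set[Action]:
    """Decide how an institution treats one personal archive."""
    if not archive.meets(required):
        # The personal archive falls short, so the institution hosts the bits.
        return {Action.HOST}
    actions = {Action.SYNDICATE}
    if replicate:
        # The options are not mutually exclusive: syndicate and keep a copy.
        actions.add(Action.HOST)
    return actions


required = Guarantees(access=True, integrity=True, long_term_availability=True)
blog_today = Guarantees(access=True, integrity=False, long_term_availability=False)
print(plan(blog_today, required))
```

Under these assumptions, today's open-notebook blog lands in the "host" branch, and a personal archive that met all three requirements would be syndicated, with replication as an optional extra.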

2 thoughts on “The NSF’s DataNet initiative”

  1. NSF needs to drive across town and visit the US federal archives ERA (electronic records archive) project staff. They have already spent the 100 million plus ten years of research on the issue.
