My guest for this week’s Innovators show, Randy Julian, founded the bioinformatics company Indigo BioSystems to help modernize the process of drug discovery. The challenge — and opportunity — is partly to standardize the data formats used to represent experimental data, and to locate that data in shared spaces where it can be linked and recombined.
There’s also the crucial issue of reproducibility. One requirement, as Victoria Stodden said in my conversation with her, is to publish not just data but also the code that processes the data, ideally in an environment where data-transforming computation can be replayed and verified. One way Indigo’s system supports that kind of replay is by hosting instances of R, the wildly popular statistical programming system, in the cloud.
Another key requirement for reproducing an experiment, Randy Julian says, is a robust and machine-readable representation of the design of the experiment. If I don’t know what you’re trying to prove, and how you’re trying to prove it, your data are just numbers to me. If I do know those things, I may be able to verify your results. And we may be able to automate more of the work using machine intelligence and machine labor — a vision that also inspires Jean-Claude Bradley, Cameron Neylon, and others to pursue open-notebook science.