Software Forethought

Will Schroeder | Kitware Blog | November 3, 2011

Here's an all-too-common scenario. A bunch of really smart scientists and medical researchers get together. They envision a research program of unprecedented scale. They obtain funding, tens to hundreds of millions of dollars, from academic, commercial, and non-profit enterprises, and from investors and philanthropists. The plans are drawn up, the brick and mortar is under design and soon to be built! And at some point the data starts flooding in...

Oops, there's a problem: what to do about the data? How do we maintain provenance, analyze, visualize, and share it? How are we going to build reproducible software systems and translate our research to application? Well, let's just hire some software folks, they'll take care of it; we'll provide computing budgets for the scientists with which to buy software; we've budgeted a cluster (what more do you want, after all?); we might even hire a computer scientist or two! In the meantime some of the more tech-savvy scientists (never formally trained in software process) will start writing some code. And so it goes; we muddle along.

Just another case of software as an afterthought. I've seen this scenario in action much too often across groups ranging in scale from small research teams to large research institutions. To be blunt (and gentler than many deserve), the results are predictably poor: primitive computing and visualization capabilities, lost data, fractured and incompatible workflows, non-existent software processes, inefficient collaboration methods, and poor science. It makes you want to cry for the waste of talent and resources, not to mention the missed scientific discoveries and the possibility of better health care outcomes.

I think it's time to make software considerations a forethought, a fundamental driver of the scientific process. I'm convinced doing so will unleash a torrent of innovation. For a long time the scientific process, consisting of theory, experiment, and computing (and maybe data-intensive scientific discovery, if you subscribe to the fourth paradigm), has been driven by the experimentalists and theorists, and computing has gone along for the ride. But science is increasingly computationally driven, and I don't think we've reflected this in our thinking, or more importantly in the way we do science. It seems pretty clear to me: it's time to place software and computing front and center in the practice of science, and then work with the theorists and experimentalists to solve their problems using the full potential of computing technology.