Peer-review debate should include software
Richard Padley argues the case for scrutinising software submitted with scholarly articles within the peer-review process
It’s not often that coding errors make the news. But back in April one particular slip-up with formulae in an Excel spreadsheet caused worldwide repercussions. It emerged that Harvard economists Carmen Reinhart and Kenneth Rogoff had made mistakes in the data underlying their influential 2010 paper, ‘Growth in a Time of Debt’, mistakes that appeared to undermine the paper’s main contention: that countries with debt-to-GDP ratios above 90 per cent see markedly slower growth.
The data was not published with the paper, and only came to light when 28-year-old PhD student Thomas Herndon requested the spreadsheets directly from the authors. After thoroughly debugging them, Herndon rubbished the results as lead author of his own paper, ‘Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff’.
This might merely have been embarrassing for Reinhart and Rogoff, were it not for the fact that their paper had been widely cited as underpinning the need for austerity measures by politicians across the developed world – including US congressman Paul Ryan, UK chancellor George Osborne, and European Commission vice-president Olli Rehn. If the researchers had got their sums wrong, then this was very big news indeed.
At a time when scholarly publishing is debating the issue of data being published alongside papers, this makes an interesting test case. Reinhart and Rogoff’s errors could not have been detected by reading the journal article alone, so proper scrutiny in this case ought to have included the dataset.
But I would argue that the terms of the debate should go beyond data: we ought also to be thinking about software. In my view, the Reinhart and Rogoff story makes this clear.
Reproducibility is one of the main principles of the scientific method. Initially, Herndon and his Amherst colleagues found that they were unable to replicate Reinhart and Rogoff’s results. This was what caused them to request the underlying data, resulting in their subsequent discovery of errors.
Three aspects of the methodology used in arriving at the paper’s conclusions gave cause for concern, but the most highly publicised flaw was the Excel coding error – a major part of the lack of reproducibility here. As Mike Konczal showed in the blog post that broke the story, the cell range selected for one of the formulas in the spreadsheet does not extend to the bottom of the column, so it misses some crucial data and skews the final result.
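To make the nature of that mistake concrete, here is a minimal sketch in Python rather than Excel, using purely illustrative numbers (not Reinhart and Rogoff’s actual figures): an average computed over a range that stops a few rows short of the bottom of the column gives a different answer from one computed over the full column.

```python
# Purely illustrative: hypothetical growth rates for five countries,
# not the actual Reinhart-Rogoff data.
growth_rates = [
    ("Country A", -0.5),
    ("Country B",  1.2),
    ("Country C",  2.0),
    ("Country D",  3.1),
    ("Country E",  2.6),  # rows like these are missed if the range stops short
]

full_column = [g for _, g in growth_rates]       # formula range covers every row
truncated   = [g for _, g in growth_rates[:3]]   # formula range stops two rows early

print(sum(full_column) / len(full_column))  # average over all the data: 1.68
print(sum(truncated) / len(truncated))      # skewed average over the short range: 0.9
```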
As the FT was at pains to point out, it might be over-egging things slightly to claim that thousands of people across Europe were thrown out of work simply because two Harvard professors can’t work an Excel spreadsheet. Reinhart and Rogoff have since responded, although critics also found errors in their response.
Interestingly, from the reproducibility point of view, what was flawed in Reinhart and Rogoff’s methodology was not the base data they drew on but their use of the software tool Excel to explore and analyse that data. The methodological flaws Herndon uncovered lay in how that tool had been used. Without access to the same software that Reinhart and Rogoff had employed in reaching their conclusions, Herndon, Ash and Pollin would likely never have got to the bottom of why those results could not be reproduced from the same data set.
It follows that, in an instance like this, not only the data but also the software ought to be open to the scrutiny of peer review. Since everything that involves data nowadays involves software to some degree, the software becomes a central artefact in the presentation of scholarly results.
In our increasingly computer-centric working environment, without the software used to analyse, explore, model, visualise and in other ways draw inferences from base data, we are missing an important part of the picture.
Excel is a universally familiar piece of software (albeit a relatively unsophisticated one), but other, more specialised tools such as MATLAB are routinely used within the scientific community and by researchers in disciplines as diverse as engineering, economics and social science to perform operations on data that result in published science.
MATLAB goes beyond Excel in that its output might not be just a set of numbers (such as Reinhart and Rogoff’s 90 per cent) but an algorithm. In certain academic disciplines, the output – the conclusion, the result – of a given piece of research is frequently an algorithm.
If you want to look under the hood of that algorithm, for the purposes of peer-review scrutiny or reproducibility, you might well need to access the software that produced it. And you might also want to see the algorithm in action.
There are many ways to represent algorithms, including as formulae, as flowcharts and in natural language. However, arguably the best way is to use the programming languages written specifically for that purpose – and programming languages, of course, were created not just to represent algorithms but to actualise them. It is logical, therefore, that MATLAB produces not only algorithms but also executables – i.e. software.
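As a small illustration of that dual role, consider the following sketch (in Python rather than MATLAB, with a hypothetical function name and made-up inputs): the code is at once a description of an algorithm – split growth rates at a debt-to-GDP threshold and average each group – and something that can be run directly.

```python
# A hedged, self-contained sketch: the function is both a precise statement
# of an algorithm and a runnable implementation of it.
from statistics import mean

def mean_growth_by_debt_threshold(records, threshold=0.9):
    """records: iterable of (debt_to_gdp_ratio, growth_rate) pairs."""
    above = [g for d, g in records if d > threshold]
    below = [g for d, g in records if d <= threshold]
    return {
        "above_threshold": mean(above) if above else None,
        "at_or_below_threshold": mean(below) if below else None,
    }

# Hypothetical inputs, purely for illustration.
print(mean_growth_by_debt_threshold([(0.5, 3.0), (0.95, 1.1), (1.2, -0.2)]))
```

Publishing a text description of such an algorithm alone leaves a reviewer unable to check either the logic or its execution; publishing the code makes both available for scrutiny.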
For a good example of an academic field where the output of research is software, just look at IPOL, the research journal of image processing and image analysis. Each of the articles in this online journal has a text description of an algorithm and source code, but also an ‘online demonstration facility’ that allows you to play with the executable in real time. Both text and source code are peer-reviewed.
In launching the journal GigaScience last year, its editors spoke of ‘overseeing the transition from papers to executable research objects’. This represents a view of academic publishing that embraces the reality of an increasingly ‘born digital’ research process.
Clearly, not every item of published research needs to include a piece of software. But if we restrict our vision of scholarly publishing to just articles and data we risk ignoring the other digital bits and pieces that now rightfully belong in the scholarly record – and without which it cannot properly be understood and scrutinised.
It’s clear that the classic functions of scholarly publication – registration, certification, awareness, archiving and reward – will all have to apply to data and software just as they currently apply to textual works.
There is much for publishers to think about here, and I’m aware that I’m raising questions that do not all presently have answers. Indeed, it was encouraging to see some of these themes addressed at the recent Beyond The PDF 2 conference, where projects such as Reprozip were presented as ways of addressing the reproducibility issue. I believe, however, that our view of the scholarly record must be broadened to include software as well as data, as this forms a significant part of research practice today. In this context the practice of publishing plain PDF files as a print analogue will become increasingly antiquated.
Richard Padley is managing director of Semantico. He thanks Dave De Roure for his paper raising these issues in a panel discussion chaired by Padley at the APE 2013 conference