Open Data Spotlight for researchers
Digital Science recently hosted the first in a series of Open Data Spotlight events. Here, we find out what open data means for researchers.
By Nicko Goncharoff, director of publisher relations and head of knowledge discovery
At Digital Science, our aim is to help researchers work as effectively as possible, overcome the myriad challenges they face, and maximise the value of their efforts. As part of our outreach, we recently announced our new Spotlight series of community-based events, themed around some of the pain points researchers experience and the ways they can be addressed.
On 26 February we launched the series with our first event, ‘Open Data For Researchers: the obstacles and the opportunities’, which explored the ‘what’, ‘why’ and ‘how’ of open data, helping researchers understand its benefits and examining concerns about its potential risks.
Several key themes emerged:
- Easy access to data is important to the advancement of science;
- That said, researchers are worried about being scooped if they make their data available before they’ve published in a scholarly journal;
- Data published in data journals or publicly posted is not accorded the impact of a journal article, even though many feel it should be;
- As a result, researchers are concerned they won’t get credit for making their data available or reproducible;
- Even if one does make data reproducible, will anyone do anything with it? How many grants have been awarded to reproduce someone else’s experiment?
- Some data, such as clinical trial data, other medical records, or competitively sensitive outputs, should not be made open.
So while data sharing and reproducibility were generally seen as good in principle, there was a sense that leadership is required to achieve the widespread adoption needed to fulfil the goals of the open data movement.
Ross Mounce, a postdoctoral researcher at the University of Bath, opened the proceedings with an introductory overview of open data, providing a succinct and clear definition of what it really means.
‘Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness),’ he said. Mounce explained his view that opening up data is about making the most of its potential: often, the best thing to do with your data will be thought of by someone else. He also noted that papers with open data have been shown to be cited more often.
Andrew Hufton, managing editor at Scientific Data, Nature Publishing Group’s new open-access publication for descriptions of datasets, gave a brief introduction to data journals and what, in his view, they can offer researchers. He presented three principles which he argued should form the basis of a data journal:
- Data must be well described before others can use it and benefit from it;
- Scientists who share data in a reusable manner deserve credit through citable publication; and
- Quality of data is important.
Amye Kenall, journal development manager for open data at BioMed Central, then spoke about GigaScience, BioMed Central’s online open-access, open-data journal for very large datasets. Her main focus, however, was on a new initiative to bring open contributorship badges to science.
She argued that we need to re-imagine the way we value different research outputs and research skills. As things stand, the article is still seen as the most valuable output. This must change if data sharing is to be encouraged, and the way to achieve that is to ensure researchers get credit for sharing data. Kenall explained how badges are used to classify and recognise different skills within Stack Overflow’s online community and argued that academia badly needs a similar scheme.
Michael Markie, associate publisher at F1000 Research, gave a talk titled ‘Getting the Most Out of Research Data’. He outlined several ways to make data usable and reproducible, stressing the importance of usable, non-proprietary formats, as well as detailed specifications of the methods, software and software parameters needed to generate and analyse the data.
In essence, the more information provided about a dataset, the better. Markie concluded his talk by arguing that the article as we know it needs to change. In his view, the article of the future should be designed to fit how research is actually done, not the other way around.
Alan Hyndman, from Figshare, spoke on ‘The Unforeseeable Benefits of Sharing Data’. Hyndman briefly recounted the Figshare backstory, explaining how the company’s founder, Mark Hahnel, was frustrated at not being able to publish the videos generated as outputs of his research. He wanted to be able to share all of this data, so he created Figshare. Hyndman gave several impressive examples of shared data ending up being used in ways the researchers would never have predicted. For example, files containing 3D scans of the world’s largest dinosaur were uploaded to Figshare, viewed by people all around the world, used with 3D printers and turned into full CGI animations!
Finally, we heard from Tom Pollard, a PhD student at University College London, who spoke about the needs of the research community as he sees them, highlighting some of the big challenges around data sharing, especially in medicine. He explained that valuable clinical data is often neglected, sometimes to the point that it no longer exists. Even when it is archived, it is far from easy to find and reuse, which is a real problem because it is a barrier to medical progress.