Lack of FAIR data ‘slows innovation’
Ted Slater and James Malone
Siloed research data has a wide-reaching impact, write Ted Slater and James Malone
As the volume of data and the number of use cases continue to grow exponentially in the life sciences, the lack of data reusability needs to be addressed urgently.
In May 2018, the EU published a report estimating that not having FAIR (Findable, Accessible, Interoperable and Reusable) research data costs the European economy at least €10.2bn every year. Furthermore, by drawing a rough parallel with the European open data economy, it concluded that the downstream inefficiencies arising from not implementing FAIR could account for a further €16bn in losses annually. Recent Gartner research paints a similar picture in the US, finding that poor data quality has a significant average financial impact on organisations.
As emerging technologies come into play and aspects of AI, such as machine learning (ML), become more widely adopted, data-rich workflows are being unlocked. Researchers can take large sets of historic data and apply them to new problems and new questions, which in turn creates new data uses and demands fresh data management techniques.
This opens up the possibility of a shift in how R&D shares data: from an individual scientist's data being used by one person or team for a single purpose, to data being reused across the entire company, and even the wider industry, to advance innovation. The industry, however, is still playing catch-up to make such sharing a reality.
Covid-19 has shown the urgency of addressing these problems and has provided a wake-up call for many organisations. To tackle the current pandemic, and indeed to respond quickly to any such event, scientists need to be able to access the right data in a usable form as quickly as possible. These data may need to be shared with other life science and biotech companies, and potentially integrated with large-scale real-world evidence (for example, data from self-reporting mobile apps like ZOE in the UK). FAIR data is essential if we are to bring global solutions to this public health crisis, as well as to the others that are sure to come.
Emerging tech has revolutionised how we use data
In recent years, the life sciences industry has suffered an undeniable decline in innovation efficiency, but AI has the power to change this. Drug developers are looking to bring together everything that is known about a problem to build a more accurate and nuanced picture of patients, diseases and medicines. As such, we need new ways to capture and manage these varied data. FAIRification is one such way.
Employing data to build more realistic, multi-dimensional analyses will help researchers better understand diseases and assess how chemical entities behave in biological systems. Data that are structured in line with FAIR principles, and so are interoperable and reusable, will make this approach possible. It’s also an approach that promises to slash drug development times and vastly reduce late-stage failures.
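What "structured in line with FAIR principles" looks like in practice varies by organisation, but a minimal sketch helps make it concrete. The snippet below describes a dataset using schema.org terms in JSON-LD; the identifier, URLs and values are hypothetical placeholders, not references to a real dataset.

```python
# A minimal sketch of FAIR-aligned dataset metadata, expressed with
# schema.org/Dataset terms as JSON-LD. Every value below is a
# hypothetical placeholder, not a real dataset or DOI.
import json

metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    # Findable: a persistent, globally unique identifier
    "identifier": "https://doi.org/10.0000/example-assay-dataset",
    "name": "Example kinase inhibition assay results",
    # Accessible: an explicit licence and a retrieval mechanism
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://example.org/data/assay.csv",
        "encodingFormat": "text/csv",
    },
    # Interoperable and Reusable: variables described in shared terms
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "IC50", "unitText": "nM"}
    ],
}

print(json.dumps(metadata, indent=2))
```

A persistent identifier and an explicit licence address the "findable" and "accessible" principles, while describing measured variables in shared, machine-readable terms supports interoperability and reuse.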
To build such in-depth patient and product profiles, life sciences companies need access to greater volumes of data external to their organisation, including:
• Public domain sources (such as PubMed, ClinicalTrials.gov, FDA), one of which is queried in the sketch after this list;
• Commercial intelligence (such as Sitetrove, Pharmaprojects, Pharmapremia);
• Data provided by contract research organisations (CROs); and
• Real-world evidence (such as electronic health records (EHR), patient self-reporting).
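As a concrete illustration of tapping a public domain source, the sketch below queries PubMed through the NCBI E-utilities API. The search term and result count are arbitrary examples; a production integration would also handle errors and respect NCBI's rate limits.

```python
# A minimal sketch of querying PubMed via the NCBI E-utilities API
# (https://eutils.ncbi.nlm.nih.gov). The search term and retmax are
# illustrative only.
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_ids(term, retmax=5):
    """Return a list of PubMed IDs matching a free-text search term."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    )
    with urllib.request.urlopen(f"{ESEARCH}?{query}") as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

print(pubmed_ids("FAIR data principles life sciences"))
```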
Since 2016, the FAIR Data Principles have been adopted by the European Union (EU), together with a growing number of pharmaceutical companies, research organisations and universities. To accelerate innovation and productivity, more organisations and public bodies will need to follow in their footsteps.
Common challenges to FAIR implementation
While the ideas behind the FAIR principles have been around for some time, implementation in the life sciences industry has been slow, because the path to adoption is neither finite nor predetermined. FAIRification is the long-term overhaul of how data are created and used in an organisation, and this process is continuously influenced by an ever-changing knowledge landscape.
When organisations begin the FAIRification journey, they face some common challenges, including:
• Unstructured legacy data – often data are not tagged, contain haphazard names or identifiers and lack common terminology (see the remediation sketch after this list);
• Data silos and trapped historical data – technologies used in previous research are likely obsolete or no longer supported; often personnel responsible for creating original datasets have moved on, leading to data becoming inaccessible or uninterpretable;
• Scientific complexity – machine-readable representations of biological information can quickly become extremely complex;
• Ontology management – there are multiple competing ontologies and vocabularies, often even in a single organisation, with little standardisation across the industry; and
• Cultural barriers – changing the culture of an organisation can be one of the most challenging tasks; researchers and organisations are typically very protective of even non-proprietary data. Incentivising all parties to play their part in generating high-quality FAIR data will require valuing those efforts as much as the marketable output of the drug development pipeline.
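To make the first two challenges tangible, here is a deliberately small sketch of one remediation step: renaming ad-hoc legacy field names and attaching persistent identifiers drawn from public vocabularies. The record and mapping tables are invented for illustration; real FAIRification relies on ontology services and curation at far greater scale.

```python
# A toy sketch of one FAIRification step: renaming ad-hoc legacy field
# names and attaching persistent public identifiers. The record and
# mapping tables are invented; real pipelines use ontology services.

LEGACY_RECORD = {"cmpd": "aspirin", "tgt": "COX-2", "assay_dt": "03/07/19"}

# Ad-hoc field names mapped to an agreed vocabulary
FIELD_MAP = {"cmpd": "compound", "tgt": "target", "assay_dt": "assay_date"}

# Free-text values mapped to resolvable identifiers: CHEBI:15365 is the
# ChEBI ID for aspirin; P35354 is the UniProt accession for human COX-2
TERM_MAP = {"aspirin": "CHEBI:15365", "COX-2": "UniProt:P35354"}

def fairify(record):
    """Rename fields and attach identifiers so values are interoperable."""
    out = {}
    for key, value in record.items():
        field = FIELD_MAP.get(key, key)
        out[field] = {"label": value, "id": TERM_MAP.get(value)}
    return out

print(fairify(LEGACY_RECORD))
# {'compound': {'label': 'aspirin', 'id': 'CHEBI:15365'}, ...}
```

Even this toy example shows why agreed vocabularies matter: without shared identifiers, "COX-2", "PTGS2" and "prostaglandin-endoperoxide synthase 2" look like three different things to a machine.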
Path to implementation
FAIRification does not happen immediately and comprehensively; making data FAIR is an evolving, progressive process. This is especially true for the pharmaceutical industry, where data production never stops and new knowledge continually reshapes the information landscape for research questions.
However complex FAIRification may seem, it is critical to start the process and allow for an agile, test-and-learn adoption. Helpfully, companies do not need to go it alone. There is a large and growing network of organisations offering assistance, expertise and tools to help FAIRify data. This includes The Pistoia Alliance, a non-profit group advocating for better data sharing in life sciences, which offers a free FAIR toolkit for implementation.
The FAIR movement is not the first attempt at merging data from disparate sources. But it is gathering pace at a time when computer infrastructure, knowledge engineering and data generation are finally where they need to be for firms to transition to powerful analytics, enhanced by a semantic and more comprehensive representation of knowledge.
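The "semantic representation of knowledge" mentioned above usually means expressing facts as linked statements in a knowledge graph. The sketch below assumes the open-source rdflib Python library; it records a single illustrative relationship between aspirin and its COX-2 target using public ChEBI and UniProt identifiers, while the "inhibits" predicate belongs to a made-up example vocabulary.

```python
# A minimal knowledge-graph sketch using rdflib (a third-party library).
# The example.org predicate is hypothetical; the entity IRIs are real
# public identifiers for aspirin (ChEBI) and human COX-2 (UniProt).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

EX = Namespace("https://example.org/vocab/")  # hypothetical vocabulary
aspirin = URIRef("http://purl.obolibrary.org/obo/CHEBI_15365")
cox2 = URIRef("http://purl.uniprot.org/uniprot/P35354")

g = Graph()
g.add((aspirin, RDFS.label, Literal("aspirin")))
g.add((cox2, RDFS.label, Literal("prostaglandin G/H synthase 2")))
g.add((aspirin, EX.inhibits, cox2))  # the semantic link itself

# Serialise as Turtle so humans and machines read the same statements
print(g.serialize(format="turtle"))
```

Because every node carries a resolvable identifier, statements generated by different teams or organisations can be merged without renaming anything, which is precisely the interoperability that FAIR asks for.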
One final key element to remember is that the outcomes of implementing the FAIR Principles will be different for each organisation. Some parts of FAIR may matter more to one group than another; "findable", for example, might be the priority to begin with. But even small steps will help improve data quality and management, and so aid future innovation.
Ted Slater is senior director, product management for PaaS at Elsevier; James Malone is CTO at SciBite