Ensuring data accuracy in research organisations
It recently emerged that 20 per cent of academic papers on gene research that rely on data collated in Excel spreadsheets contain errors, writes Joe Galletta.
This is an eye-opener. It highlights the extent to which the scientific community relies on the Excel spreadsheet to acquire data. More worryingly, academic organisations appear to have minimal, if any, controls for monitoring the accuracy and integrity of data residing in Excel.
While this revelation is shocking, such errors are easily overlooked given the many thousands of data points generated by next-generation gene sequencing programmes. With this volume of data handled in spreadsheets and other end user computing (EUC) applications (databases, modelling tools), scientific research organisations need to establish best practice controls across the lifecycle of critical spreadsheets. These controls prevent systemic errors from proliferating across the EUC landscape and into the more advanced graphical and statistical analysis systems that researchers use.
Applying a monitoring and control framework to spreadsheets and other files across the EUC landscape is imperative. With simple, technology-supported safeguards in place to mitigate the risks posed by Excel, researchers can continue to benefit from the agility and flexibility that EUCs provide.
Visibility fundamental to control
Fundamental to the application of any control and monitoring framework is understanding the EUC landscape. Creating an accurate and up-to-date inventory of EUCs, including information such as ownership, materiality, dependencies and lifecycle stage, gives organisations transparency on the key business processes that rely on these applications. Taking this one step further, ranking EUCs by the financial, organisational and reputational risk they pose allows organisations to focus their ongoing monitoring and control on the applications that matter most.
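As a rough illustration of what such an inventory and ranking might look like, the Python sketch below records the attributes mentioned above and orders entries by a simple score. The `EucRecord` class, the 1 to 5 rating scales, the scoring rule and the file names are all assumptions made for this example, not a prescribed method; a real organisation would define its own criteria.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EucRecord:
    """One inventory entry for an end user computing (EUC) file."""
    path: str
    owner: str
    lifecycle_stage: str   # e.g. "development", "production", "retired"
    materiality: int       # assumed scale: 1 (low) to 5 (high)
    complexity: int        # assumed scale: 1 (low) to 5 (high)
    dependencies: List[str] = field(default_factory=list)

    def risk_score(self) -> int:
        # Hypothetical rule: materiality times complexity,
        # plus one point for every file that depends on this one.
        return self.materiality * self.complexity + len(self.dependencies)


def rank_by_risk(inventory: List[EucRecord]) -> List[EucRecord]:
    """Return the inventory sorted from highest to lowest risk score."""
    return sorted(inventory, key=lambda record: record.risk_score(), reverse=True)


# Illustrative entries only; the file names and owners are invented.
inventory = [
    EucRecord("reagent_orders.xlsx", "admin", "production", 1, 1),
    EucRecord("sequencing_results.xlsx", "lab-a", "production", 5, 4,
              ["stats_model.R", "figures.ipynb"]),
    EucRecord("sample_tracking.accdb", "lab-b", "development", 3, 3,
              ["sequencing_results.xlsx"]),
]

for record in rank_by_risk(inventory):
    print(record.risk_score(), record.path)
```

Sorting on a single score keeps the example short. In practice the financial, organisational and reputational dimensions could be scored separately and weighted.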
With a complete understanding of the EUC landscape, organisations can then embed best practice governance processes so that EUCs are suitably managed. It is essential to monitor critical EUCs over time for changes to their data, logic or structure that are out of the ordinary or simply not acceptable. More importantly, establishing an automated review and/or approval process for such changes helps ensure that they do not cause financial or reputational damage.
In addition to mitigating risk and increasing operational efficiency, having a proper control and governance framework in place helps satisfy regulations such as FDA 21 CFR Part 11, which organisations that collect, process and analyse research information are expected to meet. According to the regulation, organisations must consider the impact that computerised systems might have on the accuracy, reliability, integrity, availability, and authenticity of required records and signatures.
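One way to picture this kind of change monitoring is to fingerprint a spreadsheet's data, logic and structure separately and compare the fingerprints against an approved baseline. The sketch below is a minimal illustration under a stated assumption: the workbook has already been extracted into a plain dictionary of sheets and cells (the extraction step, and the `fingerprint` and `changed_aspects` helpers, are hypothetical and not taken from any particular product).

```python
import hashlib
import json


def _digest(obj) -> str:
    """Stable SHA-256 digest of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fingerprint(snapshot: dict) -> dict:
    """Fingerprint a snapshot of the assumed form
    {sheet name: {cell address: {"value": ...} or {"formula": ...}}}."""
    structure, data, logic = {}, {}, {}
    for sheet, cells in snapshot.items():
        structure[sheet] = sorted(cells)  # which cells exist on which sheet
        data[sheet] = {a: c.get("value") for a, c in cells.items()
                       if c.get("formula") is None}      # input values
        logic[sheet] = {a: c["formula"] for a, c in cells.items()
                        if c.get("formula") is not None}  # formulas
    return {"data": _digest(data), "logic": _digest(logic),
            "structure": _digest(structure)}


def changed_aspects(baseline: dict, current: dict) -> list:
    """List which of data, logic and structure differ from the baseline."""
    old, new = fingerprint(baseline), fingerprint(current)
    return [aspect for aspect in ("data", "logic", "structure")
            if old[aspect] != new[aspect]]


# Illustrative snapshots: someone has narrowed the range of a formula.
baseline = {"Results": {"A1": {"value": 10}, "A2": {"value": 12},
                        "A3": {"formula": "=AVERAGE(A1:A2)"}}}
edited = {"Results": {"A1": {"value": 10}, "A2": {"value": 12},
                      "A3": {"formula": "=AVERAGE(A1:A1)"}}}

print(changed_aspects(baseline, edited))
```

A non-empty result could then be routed into a review or approval step rather than being accepted silently.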
Automation of EUC management processes imperative
Spreadsheets and other EUCs will always play a key role in research activity. Given the large volumes of data involved, manually ensuring data accuracy and integrity is impossible. Adopting automated best practice processes is perhaps the only reliable way to manage EUCs and mitigate their risks. This approach entails identifying critical data files based on their materiality and complexity, inventorying them, and imposing proper controls around the development, maintenance and use of these files in line with both internal policy and industry regulations.
Notably, this approach lets researchers capture and document ‘expert judgement’, i.e. the use of human judgement to alter data sets, so that theoretical calculations align with the real world. Automation of spreadsheet and EUC management is worth exploring. Aside from ensuring the all-important data accuracy and integrity, it relieves researchers of unnecessary, time-consuming manual data management, effort that is undoubtedly better applied to core research activities.
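To make the idea of documenting expert judgement concrete, the sketch below shows one possible way to record a manual override alongside the value it replaces. The `Override` record, the `apply_override` helper and the sample values are assumptions for illustration only. The point is simply that the change and its justification are stored together.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Override:
    """An auditable record of one expert-judgement change."""
    cell: str
    old_value: float
    new_value: float
    author: str
    reason: str
    timestamp: str


def apply_override(data: dict, log: list, cell: str, new_value: float,
                   author: str, reason: str) -> None:
    """Change a value and log who changed it, when, and why."""
    if not reason.strip():
        # Hypothetical policy: undocumented overrides are rejected.
        raise ValueError("an expert-judgement override needs a documented reason")
    log.append(Override(cell, data[cell], new_value, author, reason,
                        datetime.now(timezone.utc).isoformat()))
    data[cell] = new_value


# Illustrative use with invented values.
data = {"B7": 0.82}
audit_log = []
apply_override(data, audit_log, "B7", 0.75, "j.smith",
               "Instrument drift observed during calibration run")
print(data["B7"], audit_log[0].old_value, audit_log[0].reason)
```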
Joe Galletta is sales manager at ClusterSeven