Infrastructure supports access to data
The amount of research data and the possibilities of using it in new ways are growing enormously. Jeremy Sharp of JANET(UK) reveals what today’s researchers require from their infrastructure
As the amount of research data increases, researchers are under growing pressure to access that information in new ways so that it can support further research. At the UK’s research and education network, JANET(UK), we have to make sure that the infrastructure stays ahead of what researchers need.
Research Council policy on data retention means that organisations may be asked to provide the raw data for research papers while they are under review and for up to 10 years after publication, so it’s crucial that the network offers safe storage and access for that work. Bioinformatic data, for example, may need to be time-stamped, either when competing research groups are establishing who made a discovery first, or when the sequence data is an innovation that forms part of a product that may be patented, such as the gene sequence of a low-temperature enzyme for a bio washing powder.
In addition, people don’t want to be tied to a single place to check facts on the web, order a library book or access data they’ve harvested in the field, and they may well want to use personal portable devices to collect and share that data. For this reason we are developing anytime, anyplace access to the network.
Whether monitoring whales when they surface, tracking plant growth, or measuring permafrost depths in the Alps, there are as many applications as there are researchers gathering information from difficult or geographically challenging environments. Scientists are no longer tied to a notepad and pen, and they don’t have to go out into remote locations and plug in their laptops. For example, when a disabled student needed to participate in data collection in the field for an Open University geology course, the difficult terrain was overcome by the ‘Enabling Remote Access’ (ERA) project. This project worked with JANET to provide a wireless link between the student at base camp and an enabler at the rock face, who acted as her hands and eyes and gathered data and samples under her instruction. The base vehicle was then connected by 3G back to resources on JANET for storage and analysis. The new JANET 3G service, available in the UK from June 2011, complements an organisation’s existing eduroam infrastructure, adding mobile broadband data capabilities.
Another trend is that organisations are increasingly using grid computing to help people access information and resources that are too specialised or expensive for a single institution to hold or maintain. In science disciplines such as particle physics and radio astronomy, networks have long been vital in bringing together distributed resources such as computing power, data storage and experimental facilities for researchers. Other areas, such as the life sciences and the arts and humanities, are increasingly using grids. Some disciplines, like natural environment research and bioinformatics, also have very large data repositories and are now developing capabilities for more intensive use of networks to move the data.
Increased network capacity has enabled researchers to distribute data more quickly and efficiently. Scientists from Glasgow working on data from the Large Hadron Collider at CERN need high-speed access to the data from Switzerland via the UK GridPP compute facility. They use JANET Lightpaths to transmit data between CERN and the Rutherford Appleton Laboratory, from where it is distributed to UK universities across JANET for analysis. The data from the experiments conducted at CERN, for example, is being transferred at sustained rates of up to 10Gbit/s.
The 'big science' of today creates a requirement for quicker and more efficient data distribution, but researchers are increasingly also asking JANET to provide dedicated circuits to meet their specific requirements for capacity and network characteristics. These JANET Lightpaths not only protect the privacy of the research data being transferred, but also protect the general teaching, learning and research traffic on JANET from any custom characteristics or unusual traffic types on the dedicated circuit. This requirement to flexibly partition the network for specific research demands an adaptive architecture, which in turn delivers benefits to the general user. For example, it has enabled the deployment of world-leading network capacity without disrupting the service: JANET is the first educational network in the world to incorporate 100Gbit/s links - fast enough to transfer 125 doctoral theses per second if we assume that one thesis is 100MB (megabytes).
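The thesis comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1Gbit = 10^9 bits, 1MB = 10^6 bytes), takes the thesis size to be 100 megabytes, and ignores protocol overhead, so real transfers would be somewhat slower. The function name is hypothetical.

```python
def transfers_per_second(link_gbit_per_s: float, file_megabytes: float) -> float:
    """Idealised number of files of a given size a link can carry each second.

    Assumes decimal units and no protocol overhead (both are simplifications).
    """
    link_bits_per_s = link_gbit_per_s * 1e9   # gigabits per second -> bits per second
    file_bits = file_megabytes * 1e6 * 8      # megabytes -> bytes -> bits
    return link_bits_per_s / file_bits


# A 100Gbit/s link and a 100MB thesis
print(transfers_per_second(100, 100))  # 125.0
```

On the same assumptions, a 10Gbit/s sustained transfer would carry 12.5 such files per second.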
Accessing content outside their own institution remains a key requirement for researchers, whether they are seeking the latest results in their field or moving large amounts of data for processing and analysis. Research infrastructures are also evolving, with new facilities and data repositories becoming available, all interconnected via computer networks of increasing scope and capacity. With more pervasive wireless technologies, researchers can access this ensemble of infrastructures more readily than in the past as they move about within their organisation, or travel elsewhere to meet colleagues. The UK is very well connected to the rest of the world via high-capacity network links into Europe through GEANT, and onwards worldwide, so there is now unprecedented potential for national and international collaboration with truly global access to partners, data and resources.
Jeremy Sharp is head of strategic technologies at JANET(UK)