Can big data benefit society?
What opportunities does big data offer the social sciences? What challenges does it entail? Does it make us better citizens? With larger datasets offering researchers the potential to look at more subtle interactions, big data is becoming increasingly valuable to the social sciences, yet challenges remain. As part of UK funding body ESRC's Festival of Social Science, SAGE and the British Academy held a thought-provoking panel with funders, researchers, civil servants and the media to explore these questions in depth.
The panel’s chair, columnist Polly Toynbee, commented that she herself is a consumer of big data. In fact, it turns out that most of us are. That, said panellist Paul Woobey of the Office for National Statistics (ONS), is the main reason why we now hear ‘big data’ so often in the public arena. Yet he argued that it is an illusion to see big data as a new phenomenon: in specific sectors, such as chemistry and computer technology, big data has long been commonplace. What has changed is both an evolution of what we consider to be ‘big’ and the internet revolution, which has brought big data firmly into public view.
As Harvey Goldstein, professor of social statistics at the University of Bristol, UK, commented, the social sciences are not yet dealing with big data on the scale of, say, the Higgs boson data, but in the future their datasets are going to be very much bigger. The internet has made more data available to anyone who wants to use it, and promoting such openness is even government policy.
As a live example, Farida Vis, research fellow in the social sciences at the University of Sheffield, UK, described a weekend project she had undertaken in the aftermath of Hurricane Sandy. She and a colleague analysed photos posted to Flickr to establish how much of that content was real and how much was fake. Compared with the lengthier timescales that academic research often takes, this sort of research can be done quickly and cheaply. It has real on-the-ground application and the potential to be used for public benefit; Farida commented that it has particular relevance for crisis communication and journalism.
Panellists were quick to ensure the discussion did not focus solely on internet data, however. Paul Boyle, chief executive of the ESRC, commented that although big data is often associated with the internet, the realities are broader. The ESRC supports a huge data infrastructure, underpinned by datasets from a variety of sources, such as longitudinal research studies.
Such big data availability brings with it a range of benefits for the social sciences. As Harvey Goldstein commented, there is the potential for more rational decision making, as there is input from a wider range of sources. Additionally, bigger datasets give researchers the chance to look at more subtle interactions. Quoting Twitter’s founder, Farida Vis supported this by stating that big data enables researchers to “shrink the world” and look at micro interactions.
Big data, however, poses a number of challenges. One of the most pressing is ‘what can we do with it?’ If big data offers a number of opportunities, then to make the most of them we need to understand how to analyse it.
Harvey Goldstein said that all professions need to engage with big data. He pointed to GetStats, a campaign to ensure society has a better understanding of numbers; the campaign’s goal is to ensure that data is not misused or misinterpreted. In particular, he highlighted the fact that, if misused, big data has the potential to facilitate commercial or state control over citizens. He also noted that, to help address misinterpretation, the media need to be trained to better understand how to interpret data and to talk sensibly about these issues.
Another key issue was ‘what should we measure?’ Emphasising the importance of theoretical grounding, Paul Boyle stated that we must engage with data in the right way and for the right reasons. Paul Woobey extended this point by stating that we should not always take data at face value: what you can infer is not necessarily obvious from the data itself.
Big data also brings a whole range of ethical and interpretive issues. There are genuine concerns about privacy and trust: for example, ensuring that data is properly anonymised. How can we demonstrate validity? Is it good data, or not?
Additionally, how can we query this data? Very different analytical techniques are needed, said Paul Woobey. Paul Boyle extended this point by discussing the importance of developing interdisciplinary training for researchers; such training would enable them, he remarked, to link biomedical, social, economic and environmental data, for example. Paul Woobey added that the ONS has brought together methodologists and IT people. Traditionally these groups would not have talked to each other, but this blend is what is needed for big data interpretation.
Both the ESRC and the British Academy are looking at ways to enhance quantitative skills training at undergraduate level. But, as Farida Vis commented, we also need to ensure that qualitative analysis is part of big data research, to give context to what is being explored; she saw negative implications in reducing people to numbers. This, in turn, led on to a further concern: the politics of data. At some point, human interpretations of data will introduce bias, and a clear understanding of what you are looking at is critical if big data analysis is to have real value.
All in all, are we creating more informed citizens? Does more information do us good? From the panel there was a sense that there are some huge benefits and opportunities for the social sciences in big data. But equally there is plenty of bad data, and issues around interpretation. Who controls the data? The challenges around skills, validity and trust will need to be addressed as more and more interpretation of such data is undertaken.
Mithu Lucraft is PR manager at SAGE Publications