BigSurv18 program
Social Science Infrastructure for Big Data
Chair: Mr Christof Wolf (GESIS - Leibniz-Institute for the Social Sciences)
Time: Saturday 27th October, 16:00 - 17:30
Room: 40.002
Survey research enjoys support from various infrastructures in the social sciences. These organizations carry out surveys (e.g. ESS or SHARE), they offer services for sampling, pretesting, data management and data archiving (e.g. CESSDA and its members) and they offer a plethora of training programs (e.g. Summer Schools from Essex, GESIS or ICPSR). With the spread of “Big Data”, e.g. data originating from digitization of everyday life (digital data from mobile devices, from online searches, from the Internet of things, from registers) social science infrastructures face new challenges. What kind of support should they provide for "Big Data"? What do their clients/users expect from them?
In this session we discuss with representatives from social science infrastructures and their users future needs relating to these new data types.
We also welcome presentations from users doing research with "Big Data" and discussing their needs for services.
Big Data at FORS
Mr Brian Kleiner (FORS)
Ms Alexandra Stam (FORS)
Mr Nicolas Pekari (FORS)
Mr Boris Wernli (FORS) - Presenting Author
Mr Georg Lutz (FORS)
FORS is an infrastructure institution and serves the Swiss social science research community. We produce, preserve, and provide national data for secondary analyses, data that can be used to address important research questions within the social sciences.
FORS recognizes that «Big data», in the form of administrative data, social media data, transactional data, and text corpus data, may have significant potential for advancing social research. We also recognize that there are still severe limitations to using such data appropriately in a scientific context, notably concerning the methods available for their analysis, their quality, and their accessibility. If they are used at all, we believe that a critical view should prevail, informed by current social science best practices and expertise.
Most importantly, we believe that at this time «Big data» should supplement, but not replace, traditional methods and data sources in the social sciences. Within this perspective, we see several ways in which we can employ «Big data» in the near future at the service of the research community in Switzerland.
First, FORS can do more to facilitate the use of administrative data for research purposes. For example, we plan to map the existing provision of administrative data from various sources in Switzerland, so that researchers can have an overview of possibilities. This should include information on the procedures required to gain access to such data. In addition, FORS will examine possibilities for enriching FORS datasets by linking them with administrative data from the Swiss Federal Statistical Office, but also from the Swiss social security system or other official sources. We will continue to conduct methodological research based on the register frame used for FORS surveys.
Second, sources such as social media and text corpus data can be used as contextual information for surveys. For data analysis, there are interesting examples of combining survey data and text analysis from social media or other sources where individuals can be linked. It remains to be seen how feasible this is in the context of surveys at FORS, especially in terms of getting the consent of participants.
Third, our data service will do more to solicit, curate, preserve, and disseminate rich and diverse forms of data, including «Big data», that can be used on their own for secondary analyses, or in connection with traditional data sources. This might include databases of different kinds of objects, e.g., job announcements, Twitter feeds, etc.
Finally, another role that FORS can play as a centre of expertise in the social sciences is to provide general guidance to researchers in Switzerland for working appropriately with various sorts of «Big data». Through our own experiences, and by keeping up on current developments, we will be able to advise researchers on identifying potential non-traditional data sources; accessing them; assessing their quality and usefulness in addressing specific research questions; choosing the right methods; and avoiding misinterpretation. To this end, we will continue to train our staff and assess how «Big data» can benefit the social sciences.
Linking Social Survey and Twitter Data - Consent, Operationalisation, Archiving, and Sharing
Mr Luke Sloan (Cardiff University) - Presenting Author
Mr Tarek Al Baghal (ISER)
Mr Curtis Jessop (NatCen Social Research)
Linking social survey and Twitter data provides an opportunity to explore the relationship between the offline and virtual world. From a substantive perspective, linked data allows us to understand the relationship between attitudes and beliefs reported through a formal survey and behaviours and content creation generated online. Methodologically, Twitter offers an opportunity to address issues of item non-response and calibration of novel measures. However, gaining consent and carrying out the data linkage is a complex procedure in which issues of anonymity, security and disclosure all come to the fore. In this presentation we discuss our experiences of data linkage as part of the Understanding Society Innovation Panel 2017, focussing on the procedural aspects of ascertaining and merging data from Twitter. In particular, we discuss what can and can’t be archived and what should, and shouldn’t, be shared due to risk of disclosure.
SOMAR: ICPSR's New Social Media Archive
Ms Margaret C. Levenstein (ICPSR) - Presenting Author
Social media are implicated in many of contemporary society’s most pressing issues, from influencing public opinion, to organizing social movements, to identifying economic trends. Increasing the capacity of researchers to understand the dynamics of such social, behavioral and economic phenomena will depend on reliable, curated, discoverable and accessible social media data. To that end, the Inter-university Consortium for Political and Social Research (ICPSR) is developing a new archive of curated datasets, workflows, and code for use by social science researchers for the empirical analysis of social media platforms, content, and user behavior. The goal is to provide a user-friendly, large-scale, next-generation data resource for researchers conducting data-intensive research using data from social media platforms such as Facebook, Twitter, Reddit, and Instagram. ICPSR is the largest archive of digital social and behavioral science data in the world, with more than 50 years of experience acquiring, preparing, and delivering an ever-evolving variety of data to social scientists. It is a natural home for tackling the novel challenges of social media data and for designing, launching and hosting SOMAR.
SOMAR will bring together social media datasets as a corpus with associated services and resources (e.g., federated search, metadata enhancement, automated workflows) to aid researchers in further interacting with and mining the data. This will enable extension of original findings and the creation of new knowledge, leading to a greater return on the original investment in the data. Thus, SOMAR will encourage more social science based on the analysis of social media data.
An archive for social media data will enable researchers to discover reusable social media datasets, provide a means for evaluating and/or replicating research based on social media data, and enable new insights, longitudinal studies, or comparative analyses that are nearly impossible today. Common, transparent, and reproducible approaches to privacy protection, linkage methodology, and analytical tools for these data will help ensure that research using social media data meets the highest scientific and ethical standards, and therefore gains the legitimacy necessary to advance the underlying science to its full potential. SOMAR will provide data resources and services to researchers across domains and support the development of shared approaches for using social media data in social, behavioral, and economic research. The archive will increase research productivity across a number of research areas including, among others, consumer behavior prediction, the psychological impacts of social media use, and digital archiving.
SOMAR will enable broad dissemination of research to enhance scientific and technological understanding. The technical and financial resources required to collect, manage, and analyze social media make such investigations beyond reach for many interested researchers. SOMAR will democratize access to social media data, lower barriers to reuse of these data, and offer training and guidance across disciplines to encourage the development of a social media research community. As these datasets will be available for use in classrooms, students will benefit by learning to exploit social media as empirical data.
Infrastructure for Digital Behavioral Data in the Social Sciences: The GESIS Perspective
Mr Julian Kohne (GESIS - Leibniz-Institute for the Social Sciences)
Mr Christof Wolf (GESIS - Leibniz-Institute for the Social Sciences) - Presenting Author
We are witnessing ever more data sources (some speak of a data deluge) that could potentially benefit scientific research in general and research on socio-political phenomena, including pressing societal issues, in particular. At the same time, most scholars in the disciplines that traditionally research these questions, such as sociologists, political scientists and (social) psychologists, lack the expertise to collect, prepare and analyse these new data sources effectively and efficiently.
As an infrastructure institution tasked with supporting social research, GESIS aims to lower the barriers faced by social scientists who are interested in working with these new data. In particular, GESIS is focussing on data that can be described as “Digital Behavioral Data” (DBD), i.e. data originating from uses of digital devices and services such as smartphones, fitness trackers, browsers or social media sites.
The GESIS strategy rests on four pillars: Capacity Building, Services for Data Collection, Services for Data Analysis and Data Curation Services. These encompass:
* Capacity Building: Training in (statistical) programming languages, in particular R, Python and Julia, and in analytical techniques such as machine learning, text mining, web scraping or social media mining.
* Data Collection: Provision of tools and services for low-threshold data collection, e.g. libraries, smartphone apps, sensors.
* Data Analysis: Provision of tools and services for low-threshold online data analysis, e.g. Jupyter notebooks.
* Data Curation and Dissemination: Development of guidelines for curating, archiving and disseminating DBD for secondary usage and replication, including development of proper metadata schemata.
Additionally, we plan to investigate and develop best practices concerning ethical standards for research with DBD and to conduct research into the methodological foundations of DBD, in particular their validity and explanatory power, including questions of representativeness and inference.