BigSurv18 program


Crowdsourcing, Causality, and the Issue of Social Trust

Chair: Professor Rene Bekkers (Vrije Universiteit Amsterdam)
Time: Saturday 27th October, 09:00 - 10:30
Room: 40.063

The Gift of Trust

Professor Rene Bekkers (Vrije Universiteit Amsterdam) - Presenting Author


Why are citizens in some countries much more trusting than citizens in other countries? How has trust changed over time? How can trends in trust be explained? What are the consequences of trust for health, wellbeing and the wealth of nations and individuals? How do the answers to these questions depend on the measurement of trust, the survey mode, sampling procedures and other features of the data collection and research design?

Questions on trust have been answered using only a fraction of the data available. The typical study on trust is conducted by a single scholar or a small team of co-authors, and relies on a single data source. This is an inefficient use of the data available on trust. With a growing group of academics, gathered in the Global Trust Research Consortium (GTRC), we are harmonizing data on trust from surveys conducted throughout the world. The ambition of the GTRC is no less than to harmonize all the available survey data on generalized social trust. The consortium is an open collaboration between scientists of all disciplines interested in trust, including political science, sociology, psychology, economics, and communication science. The GTRC provides open access to the results of its research on trust.

In this paper, we present the design of the GTRC and the harmonized trust database (HTD). The HTD is an ex post survey data harmonization (SDH) project. The HTD serves to answer both substantive questions on the correlates, determinants and consequences of trust, and methodological questions on the measurement of trust. The HTD includes an exceptionally large number of observations: currently 2,671,945 observations from 190 surveys in 150 countries, spanning 1948 to 2017. An additional 130 surveys have been identified that await harmonization, and the database is likely to surpass the 4 million mark by the time the ESRA conference is held in October. The current status is posted at https://globaltrustresearch.wordpress.com/status/. The exceptionally large number of observations in the HTD allows researchers to answer questions on trust with more degrees of freedom and statistical power than the typical study relying on a single data source.

The HTD is a Big Data project in the sense that the data on trust were not collected for the purpose of joint analysis. We discuss the challenges of ex post harmonization and their similarities with the challenges of unstructured data in computational social science. Examples of substantive questions that can be answered using the power of the HTD are how trust depends on country characteristics and how trust develops over the life course, taking period and cohort effects into account. Examples of methodological questions that can be answered using the power of the HTD are how trust responses depend on the scale used to measure trust and on the survey mode. We illustrate the potential of the HTD with a mega-analysis, also called integrated data analysis in psychology or analysis of individual patient data in epidemiology, to answer these questions.
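To make the harmonization step concrete, below is a minimal sketch, in Python, of the kind of ex post recoding such a database requires: trust items asked on different response scales are rescaled to a common 0-1 metric before pooling. The survey names, scale ranges, and values are invented for illustration; the HTD's actual recoding rules may differ.

```python
# Minimal sketch of ex post harmonization of generalized-trust items
# measured on different response scales. Survey names, scale ranges, and
# values below are hypothetical, not taken from the HTD.
import pandas as pd

# Each source survey asks a generalized-trust question on its own scale.
sources = [
    # (survey id, country, year, raw trust responses, scale min, scale max)
    ("survey_a", "NL", 2004, [0, 1, 1, 0, 1], 0, 1),    # binary item
    ("survey_b", "US", 2012, [3, 7, 10, 5, 8], 0, 10),  # 11-point item
    ("survey_c", "JP", 1998, [1, 4, 2, 5, 3], 1, 5),    # 5-point item
]

rows = []
for survey, country, year, values, lo, hi in sources:
    for v in values:
        rows.append({
            "survey": survey,
            "country": country,
            "year": year,
            # Linear rescaling to a common 0-1 metric so observations
            # from different instruments can be pooled in one analysis.
            "trust_01": (v - lo) / (hi - lo),
        })

htd = pd.DataFrame(rows)
# A pooled ("mega") analysis can then model trust across all surveys at
# once, with survey and country identifiers as grouping factors.
print(htd.groupby(["survey", "country"])["trust_01"].mean())
```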


Problems in Identifying Causality in Observational Data

Mr Ray Poynter (The Future Place and Nottingham Trent University) - Presenting Author

Big data appears to be the answer to two major concerns relating to survey data. Firstly, survey data have become less and less representative; big data offers, in many cases, the possibility of a census. Secondly, in light of developments in neuroscience and behavioural economics, we are increasingly aware of the frailties of the question-and-answer format. It is now widely held that we are poor witnesses of our own behaviour and motivations. Big data offers behavioural data, the perceived panacea for the weaknesses of the question-and-answer paradigm.

However, big data, and in particular the observation of naturally occurring phenomena, brings with it several challenges relating to identifying and assessing causality. For example, consider HRT and CHD (hormone replacement therapy and coronary heart disease). Since the mid-1980s, many observational studies (for example, the Nurses' Health Study) demonstrated large reductions (e.g. 40%) in CHD among treated women. However, a large randomised controlled trial conducted by the WHI (Women's Health Initiative) refuted most of the health benefits of HRT in the context of CHD. Analysis suggested that in the observational data there was a systematic bias: healthier (and wealthier) women were more likely to be treated, and these women were likely to have better CHD outcomes anyway. The causality that the earlier studies had assumed was that the differences in outcomes were caused by differences in the treatments given. However, to a considerable extent, the differences in outcomes were caused by social and health factors, and these factors also played a large role in determining who received HRT.
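The mechanism described above is easy to reproduce on synthetic data. In the sketch below (Python, with purely illustrative numbers, not estimates from the studies cited), a latent health variable drives both treatment assignment and the outcome: the naive observational contrast shows a large "benefit" even though the true treatment effect is zero, while random assignment does not.

```python
# Sketch of healthy-user bias on synthetic data: a latent "health"
# variable drives both who receives treatment and the outcome, so the
# naive observational contrast overstates the benefit. All numbers are
# illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

health = rng.normal(size=n)                      # latent confounder
# Healthier people are more likely to be treated.
p_treat = 1 / (1 + np.exp(-(health - 0.5)))
treated = rng.random(n) < p_treat
# True treatment effect on the outcome is zero; health alone matters.
outcome = health + rng.normal(size=n)

naive = outcome[treated].mean() - outcome[~treated].mean()
print(f"Naive observational difference: {naive:.2f}")   # clearly > 0

# A randomised design breaks the link between health and treatment.
randomised = rng.random(n) < 0.5
rct = outcome[randomised].mean() - outcome[~randomised].mean()
print(f"Randomised difference: {rct:.2f}")               # approx. 0
```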

Researchers seek to correct for observational biases by controlling for other factors, particularly in longitudinal studies and in MMM models used to evaluate the impact of multi-media campaigns. Techniques include weighting, statistical manipulation, matching, and attempts to construct counterfactuals. However, these methods depend on assumptions about coverage and causality. A realisation of this type of problem is one of the factors that has led to increased interest in Mosteller's Type III errors: errors where the null hypothesis is correctly rejected but for the wrong reason (potentially leading to the wrong recommendations, which Marascuilo and Levin have suggested constitutes a Type IV error).
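As a hedged illustration of one such adjustment, the sketch below applies inverse-probability weighting (one weighting technique of the kind mentioned above, not necessarily the authors' choice) to the same synthetic setup. It recovers the null effect only because the confounder is fully observed and the propensity model is correctly specified; when coverage assumptions fail, as the paragraph above warns, the adjustment can fail silently.

```python
# Sketch of inverse-probability weighting (IPW), continuing the
# synthetic setup above. It works here only because the confounder is
# observed; the assumptions the text warns about are exactly what this
# conditions on.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 100_000
health = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-(health - 0.5)))
outcome = health + rng.normal(size=n)

# Estimate propensity scores from the observed confounder.
ps = LogisticRegression().fit(health.reshape(-1, 1), treated)
p = ps.predict_proba(health.reshape(-1, 1))[:, 1]

# Weight each unit by the inverse probability of its observed treatment.
w = np.where(treated, 1 / p, 1 / (1 - p))
ipw = (np.average(outcome[treated], weights=w[treated])
       - np.average(outcome[~treated], weights=w[~treated]))
print(f"IPW-adjusted difference: {ipw:.2f}")  # close to the true 0
```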
The presentation and paper will highlight the difficulties encountered in the real world with observational data sets and illustrate the challenges associated with causality through the modelling of synthetic data. The analysis will illustrate different treatments of observational data in the context of different underlying patterns in the data. Synthetic data are used for this purpose because the underlying latent structures are known, so the results of the analyses can be compared with the structures that generated them.

The model and dataset will also be made available, so that other researchers can experiment with different underlying structures, and/or different analysis strategies.


Crowdsourced Small Area Estimation: Crowdsourcing and Estimating Safety Perceptions at Neighbourhood Level in London

Final candidate for the monograph

Mr David Buil-Gil (Centre for Criminology and Criminal Justice, University of Manchester) - Presenting Author
Dr Reka Solymosi (Centre for Criminology and Criminal Justice, University of Manchester)
Dr Angelo Moretti (Geography Department, University of Sheffield)


A growing body of social research is applying crowdsourcing techniques to collect data on crime patterns and on attitudes and emotions towards crime and the criminal justice system. Data generated through people's participation in (generally) online platforms serving a variety of functions allow for mapping phenomena at a low spatial level and exploring immediate environmental and social-organisation predictors of crime and associated constructs. In contrast to traditional survey methods, which struggle to capture variability at low spatial and temporal scales in variables of criminological interest, crowdsourced data are useful for mapping phenomena at their precise spatial location and exploring their temporal patterns. However, crowdsourced data have been repeatedly criticised for the bias arising from participants' self-selection and for providing non-representative data. Studies looking into participation inequality have found systematic over-representation of certain groups: men tend to participate more in such activities than women, and employed people, people between the ages of 20 and 50, and those with a college or university degree are all more likely to contribute. Thus, although data provided online by participants allow new exploratory approaches to crime and deviance, the representativeness of such data with respect to the target population may be too limited to warrant confidence in the reliability of analyses produced from them.

In this research we propose an innovative approach to reduce the bias in crowdsourced data and increase the precision of area-level estimates. By applying area-level, model-based small area estimation techniques to crowdsourced data, the bias and low precision of such data might be reduced. Small area estimation techniques help produce estimates with increased reliability by introducing area-level models that borrow strength from related areas, including neighbouring areas and time series. As a starting point to bridge the gap between crowdsourcing and small area estimation techniques, this research makes use of the Place Pulse 2.0 dataset, in which users, shown two images, answer the question “Which place looks safer?”, to produce precise estimates of safety perceptions at a low spatial level in London (England).
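As a rough illustration (a hypothetical sketch, not the authors' pipeline), pairwise Place Pulse-style answers can be turned into a crude direct estimate per area, such as the share of comparisons an area's images win; the small area models described below then serve to stabilise such noisy area-level inputs.

```python
# Hypothetical sketch: turn pairwise "Which place looks safer?" answers
# into a per-area win rate. Area names and votes are invented.
import pandas as pd

# Each row: the areas of the two images shown, and which side was chosen.
votes = pd.DataFrame({
    "area_left":  ["Hackney", "Camden", "Hackney", "Lambeth"],
    "area_right": ["Camden", "Lambeth", "Lambeth", "Camden"],
    "winner":     ["left", "right", "left", "right"],
})

wins, appearances = {}, {}
for _, v in votes.iterrows():
    for side in ("left", "right"):
        area = v[f"area_{side}"]
        appearances[area] = appearances.get(area, 0) + 1
        if v["winner"] == side:
            wins[area] = wins.get(area, 0) + 1

# Win rate is a crude direct estimate of perceived safety per area;
# it is noisy wherever an area appears in few comparisons.
for area in appearances:
    print(area, wins.get(area, 0) / appearances[area])
```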

Innovative calibration and bootstrap procedures will be explored and assessed to weight reports and minimise the bias arising from non-random samples. Then, area-level models will be fitted with covariates from the census and other administrative sources to produce precise small area estimates from the area-level Empirical Best Linear Unbiased Predictor (EBLUP) estimator. Covariates are selected based on the literature on low-level predictors of perceived safety, such as ethnic heterogeneity, crime rates, population density and poverty.
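For readers unfamiliar with the estimator, below is a minimal sketch of an area-level (Fay-Herriot) EBLUP on simulated inputs: each area's direct estimate is shrunk towards a covariate-based prediction, with shrinkage weight gamma_i = sigma_u^2 / (sigma_u^2 + psi_i) set by the relative sizes of the between-area variance and that area's sampling variance. The variable names and the simplified moment-style fit are assumptions for illustration; real applications would use a dedicated SAE package and census covariates.

```python
# Minimal sketch of the area-level (Fay-Herriot) EBLUP on simulated
# inputs. All quantities here are illustrative, not from the study.
import numpy as np

rng = np.random.default_rng(2)
m = 50                                                   # small areas
x = np.column_stack([np.ones(m), rng.normal(size=m)])    # covariates
beta_true = np.array([0.5, 0.3])
u = rng.normal(scale=0.2, size=m)                        # area effects
psi = rng.uniform(0.05, 0.3, size=m)                     # known sampling variances
direct = x @ beta_true + u + rng.normal(scale=np.sqrt(psi))  # direct estimates

# Simplified iterative fit: weighted least squares for beta, then a
# moment-style update of the between-area variance sigma_u^2.
sigma2_u = 0.1
for _ in range(100):
    w = 1 / (sigma2_u + psi)
    beta = np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (w * direct))
    resid2 = (direct - x @ beta) ** 2
    sigma2_u = max(np.mean(resid2 - psi), 0.0)

# Shrinkage: areas with noisy direct estimates lean on the model.
gamma = sigma2_u / (sigma2_u + psi)
eblup = gamma * direct + (1 - gamma) * (x @ beta)
print(eblup[:5])
```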

By presenting an innovative approach that bridges the gap between crowdsourcing and small area estimation methodologies, we are able to reduce the self-selection bias in crowdsourced data and to map safety perceptions precisely at a small area level. Mapping perceived safety at a low spatial level might be useful for targeting interventions aimed at reducing public anxieties and for detecting the environmental features and social-organisation variables associated with them.