BigSurv18 program



Ok Google - Is Siri Busy? Innovating With Mobile Phone Data?

Chair: Professor Annette Jäckle (University of Essex)
Time: Friday 26th October, 11:30 - 13:00
Room: 40.109

Performance and Sensitivities of Home Detection on Mobile Phone Data

Final candidate for the monograph

Mr Maarten Vanhoof (Open Lab, Newcastle University and Orange Labs France) - Presenting Author
Dr Clement Lee (Open Lab, Newcastle University)
Dr Zbigniew Smoreda (Orange Labs France)

Winner of the Student Paper Competition
Large-scale location-based traces, such as mobile phone data, have been identified as a promising data source to complement or even enrich current official statistics. In many cases, home detection is an important prerequisite step for converting the massively gathered data into workable formats. Still, very little research exists on either validation (comparison with ground truth datasets) or uncertainty estimation of home detection practices, at either the user or the nationwide level. In this paper, we present an extensive empirical analysis of the validation of home detection algorithms as performed on a nationwide mobile phone dataset from France. We analyze the validity of 9 different home detection algorithms and assess the magnitude of different sources of uncertainty. Based on 225 different set-ups for the home detection of around 18 million users, we discuss different measures for validation, investigate sensitivity to user parameter choices and observation period restriction, and explore spatio-temporal patterns of home detection in France. Our findings show that the overall nationwide performance of home detection is rather poor, with correlations peaking at only 0.60. Additionally, we show that both time and duration of observation have a clear effect on the performance of home detection on a nationwide scale, that the effect of parameter choice is rather small compared to other uncertainties, and that unknown market shares remain a factor of uncertainty. Our findings and discussion offer welcome insights to other practitioners who wish to apply home detection to similar datasets, or who need an assessment of the challenges and uncertainties related to mobilizing mobile phone data for official statistics.
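
For readers unfamiliar with home detection, the following minimal Python sketch may help. It is an illustration only: the abstract does not say which nine algorithms were compared, so the rule used here (the cell with the most night-time events is taken as home), the record layout (user, cell, hour), the 19:00 to 07:00 night window, and the names detect_homes and pearson are all assumptions. The second function shows the kind of correlation-based validation the abstract refers to, comparing detected home counts with ground truth counts per cell.

```python
from collections import Counter, defaultdict


def detect_homes(records, night_start=19, night_end=7):
    """Assign each user the cell with the most night-time events.

    This "most events at night" rule is an assumed heuristic, not
    necessarily one of the algorithms evaluated in the paper.
    `records` is an iterable of (user, cell, hour) tuples.
    """
    night_counts = defaultdict(Counter)
    for user, cell, hour in records:
        if hour >= night_start or hour < night_end:
            night_counts[user][cell] += 1
    # Iterating over sorted cell ids makes ties resolve deterministically.
    return {
        user: max(sorted(counts), key=counts.__getitem__)
        for user, counts in night_counts.items()
    }


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy records: (user, cell, hour of day)
records = [
    ("u1", "A", 22), ("u1", "A", 23), ("u1", "B", 14),
    ("u2", "B", 2), ("u2", "B", 3), ("u2", "A", 12),
    ("u3", "B", 21),
]
homes = detect_homes(records)
detected_per_cell = Counter(homes.values())
```

Users without any night-time event receive no home in this sketch. How such users are handled is one of the parameter choices whose sensitivity a study of this kind would need to examine.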


More Than Meets the Eyes: Complementing Surveys With Mobile Phone Digital Data Trail

Professor Ivano Bison (University of Trento) - Presenting Author
Mr Mattia Zeni (University of Trento)
Mr Matteo Busso (University of Trento)
Mr Enrico Bignotti (University of Trento)
Professor Fausto Giunchiglia (University of Trento)
Professor Giuseppe Veltri (University of Trento)

In this study, we will present findings from a set of natural experiments in which we employed a mobile app developed to record information collected by participants' mobile devices, such as GPS and ambient sensors, together with behavioral data and self-reported answers. After collecting information on participants' psychological and social characteristics using a questionnaire, we carried out our tests by recording participants' activities by means of a mobile app for a period of two weeks (N = 200), and repeated the data collection twice. The digital trail produced by each participant was recorded by the mobile devices twenty times per second. We focused on an implementation of time diaries that combined both types of data in order to study the time budgeting behavior of students at the University of Trento, Italy. We prompted participants with self-report questions during their daily activities (every 30 minutes they were asked about their whereabouts and activity). From a more general methodological standpoint, we used this information to discuss the calibration of this micro personal data with meaningful human categories regarding space, time, behavior and activities. In summary, we have three levels of analysis: sensor-based data from participants' mobile phones, the prompted self-reported questions, and the metadata about the use that participants make of their devices. Patterns of behavior are identified by means of machine learning classification on all three levels. Based on these results, we will reflect on total survey error and nonsampling error in this data collection method. Our preliminary results suggest some avenues for integration between self-reported survey items and mobile-based behavioral information.


Methodological Implications of Device-Related Error Sources in Integrating Smartphone Sensor Data and Survey Data

Dr Nejc Berzelak (University of Ljubljana, Faculty of Social Sciences) - Presenting Author
Mr Uroš Podkrižnik (University of Ljubljana, Faculty of Social Sciences)
Professor Vasja Vehovar (University of Ljubljana, Faculty of Social Sciences)

Modern smartphones incorporate sensors to collect data about position, orientation, motion and environment. Previous studies have demonstrated that passively collected data about location and motion can also be used effectively for social science research. However, a detailed elaboration of such data collection approaches from the perspective of social science methodology and their integrative placement among survey data collection methods is still lacking. This extends to appropriate consideration of errors arising from sensor data in the context of their integration with survey data. While error sources have been comprehensively elaborated for survey research, particularly by the Total Survey Error framework, systematic efforts to accomplish a consistent conceptualisation for complementary sensor data remain limited.

This paper contributes a critical elaboration of specific error sources in data collection using smartphone sensors to complement survey data. It focuses predominantly on technical aspects that may have important methodological implications for social science research and addresses three main research questions:

1) How may technical characteristics of smartphones and behaviour of research participants in interacting with their devices affect the quality of data relevant for social science research?
2) How can these error sources be placed into the conceptual framework of the Total Survey Error?
3) How can device paradata contribute to better understanding of potential influences of these factors on the data quality?

The elaboration builds on a comprehensive review of the literature from various fields, applying the findings of previous studies to the context of survey research. We further highlight the potential error sources using the findings of a pilot evaluation of a dedicated mobile application for passive collection of location and mobility data. We identify technological and behavioural factors as two broad groups of specific error sources in combining smartphone sensor and survey data. The former stem from technical characteristics of the device itself as well as the performance of the underlying algorithms to process the sensor data. The latter set of errors relates to specific ways in which study participants use and interact with their smartphones, where participants may intentionally or unintentionally affect the data collection performance by altering smartphone features or using the device in an unanticipated way.

We discuss potential biasing effects of specific error sources under different conditions and systematically place them into the framework of the Total Survey Error. This exercise in harmonising the conceptualisation of errors in sensor and survey data is particularly useful for integrative data collection where processed sensor data are considered as a substitute for self-reports.

Finally, we underline the importance of implementing appropriate measures to better understand and monitor the technical environment during the data collection. We evaluate the potential of device paradata in offering insights into variations between devices and specific actions of research participants that may affect the data collection process. The use of paradata allows researchers to regain some locus of control over the data collection or at least identify some of the factors that may compromise the data quality.


Complementing Official Statistics With Mobile Phone Data

Mr Lino Galiana (INSEE) - Presenting Author
Mr Benjamin Sakarovitch (INSEE)
Mr Zbigniew Smoreda (Orange Labs)

Call detail records (CDR) contain information on contacts between people and on the two antennas used to transmit the call or message from the caller to the callee. Phone data are thus a privileged tool for spatial analysis of people's presence and mobility within a territory.

Even with anonymized data, it is possible to determine a likely neighborhood of residence, which can be cross-referenced with official statistics that have a spatial dimension. Linking official sources and big data through their geographic features enables us to assess the representativeness of big data sources as well as to enrich them with official statistics. Big data also act as a good complement to traditional data sources, providing a far larger volume of information than survey data can, as well as more precision than declarative data. Using the timestamps and coordinates of phone interactions, it is possible to define several spatial and temporal granularities when linking phone and official data.

This study uses anonymized 2007 French CDR from Orange customers as well as French geolocated household income data from INSEE official records. Phone data contain identifiers for the individual who called and the one who was called, as well as the GPS coordinates of the antennas used to transmit the call or text message. Using these coordinates, French territory is divided into Voronoi cells, polygons that assign each point in space to its closest antenna. At the level of Voronoi cells, the spatial distribution of the population can be compared between the two sources, and phone data can be used to give a view complementary to official records. Linking geolocated income records and phone data also opens new perspectives on spatial statistics based on phone data.
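
The Voronoi assignment described above amounts to a nearest-antenna lookup, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: coordinates are assumed to be already projected to a planar system so that Euclidean distance is meaningful, and the antenna identifiers, sample points, and function names are hypothetical.

```python
import math


def nearest_antenna(point, antennas):
    """Return the id of the antenna closest to `point`.

    Assigning a point to its nearest antenna is equivalent to locating
    the Voronoi cell that contains it. `antennas` maps id -> (x, y).
    """
    return min(antennas, key=lambda a: math.dist(point, antennas[a]))


def count_by_cell(points, antennas):
    """Count how many points (e.g. geolocated households) fall in each cell."""
    counts = {a: 0 for a in antennas}
    for p in points:
        counts[nearest_antenna(p, antennas)] += 1
    return counts


# Hypothetical planar coordinates (e.g. kilometres in a projected system)
antennas = {"a1": (0.0, 0.0), "a2": (10.0, 0.0)}
households = [(1.0, 1.0), (4.0, -2.0), (9.0, 3.0)]
cell_counts = count_by_cell(households, antennas)
```

Per-cell counts obtained this way from official records could then be set against per-cell counts of phone users. For a large number of antennas, a spatial index would be preferable to this linear scan.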


Mobile Phone Data for Official Statistics: Elements for a Production Framework

Mr C. Alexandru (Romanian National Institute of Statistics (INSSE))
Ms E. Coudin (Institut National de la Statistique et des Etudes Economiques (INSEE))
Mr M. Debusschere (Belgian National Statistical Institute (Statistics Belgium))
Ms M.E. Esteban (Spanish National Statistics Institute (Statistics Spain –INE))
Ms S. Kienzle (Statistisches Bundesamt (DESTATIS))
Mr O. Nurmi (Finnish National Statistical Institute (Statistics Finland))
Mr B. Oancea (Romanian National Institute of Statistics (INSSE); University of Bucharest)
Mr P. Piela (Finnish National Statistical Institute (Statistics Finland))
Mr D. Salgado (Spanish National Statistics Institute (Statistics Spain –INE)) - Presenting Author

Rest of authors
R. Radini - Italian National Institute of Statistics (ISTAT)
B. Sakarovitch - Institut National de la Statistique et des Etudes Economiques (INSEE)
S. Saldaña - Spanish National Statistics Institute (Statistics Spain –INE)
L. Sanguiao - Spanish National Statistics Institute (Statistics Spain –INE)
M. Tennekes - Dutch National Statistics Office (Statistics Netherlands –CBS)
S. Williams - Office for National Statistics (ONS)
M. Zwick - Statistisches Bundesamt (DESTATIS)

Beyond doubt, mobile phone data stand as one of the most promising Big Data sources for the production of official statistics. Accordingly, in the recent ESSnet on Big Data, in which 22 partners of the European Statistical System (ESS) participated, a work package was devoted entirely to access to these data, the development of statistical methodology, the analysis, construction, and implementation of IT tools, and quality issues, with the aim of making this promising information source a regular resource in the production of official statistics.

We offer a summary of the work conducted in this work package, ranging from the intricate issue of accessing diverse forms of mobile phone data (microdata/aggregated data), through setting up an inferential framework that uses aggregated mobile phone data in combination with official data to produce population counts, to the development of some IT tools providing a proof of concept and first analytical results on real data. All of these enter as relevant factors in the quality assessment of the final estimates.

As explained in the results of the ESSnet, although we were able to collect enough real data to conduct the analytical study, access to mobile phone data is still an open question which needs further work within the ESS and the European Union. A first set of conclusions and guidelines for partners of the ESS has been obtained.

We propose to use the two-phase life-cycle model for statistical microdata to describe the generation of mobile phone data, which also allows us to assess errors and ultimately data quality under a common approach. A core data model is also proposed, providing a standard approach for diverse statistical domains. The geographic location of network events (calls, SMS/MMS, Internet connections, pings, ...) gives rise to the spatial attributes of the statistical units and stands as a very important piece of the model. We provide a Bayesian procedure to estimate these geographic locations.

For the inference exercise connecting mobile phone data with target populations of interest, where traditional design-based techniques cannot be used, we have explored the use of hierarchical statistical models, as in ecological sampling, to propose a generic inferential framework for any target population (commuters, resident tourists, inbound tourists, general population, ...). For computational reasons we have also chosen a Bayesian framework to compute posterior distributions and diverse indicators thereof (median, mean, credible intervals, ...).
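
To make the idea of posterior indicators concrete, the sketch below computes a posterior for a population count from an observed device count. It is a deliberately simplified single-level model and not the hierarchical model of the work package: it assumes the observed count n follows a Binomial(N, p) distribution with a known detection probability p (for instance an operator's market share), a flat prior on N up to a hypothetical bound n_max, and a posterior evaluated on a grid. All names and numbers are illustrative assumptions.

```python
import math


def posterior_population(n_observed, p_detect, n_max):
    """Grid posterior for N given n ~ Binomial(N, p) and a flat prior.

    The prior is uniform on the integers n_observed..n_max (an assumption).
    Terms of the likelihood that do not depend on N cancel on normalisation.
    """
    support = range(n_observed, n_max + 1)
    log_weights = [
        math.lgamma(N + 1) - math.lgamma(N - n_observed + 1)
        + (N - n_observed) * math.log1p(-p_detect)
        for N in support
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_weights)
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return {N: w / total for N, w in zip(support, weights)}


def summarise(posterior, level=0.95):
    """Posterior mean, median and equal-tailed credible interval."""
    mean = sum(N * prob for N, prob in posterior.items())
    tail = (1.0 - level) / 2.0
    lower = median = upper = None
    cumulative = 0.0
    for N in sorted(posterior):
        cumulative += posterior[N]
        if lower is None and cumulative >= tail:
            lower = N
        if median is None and cumulative >= 0.5:
            median = N
        if upper is None and cumulative >= 1.0 - tail:
            upper = N
    return {"mean": mean, "median": median, "interval": (lower, upper)}


# Hypothetical example: 300 devices observed, 30% assumed detection rate
posterior = posterior_population(n_observed=300, p_detect=0.3, n_max=3000)
summary = summarise(posterior)
```

With these illustrative inputs the posterior concentrates around 1,000 people. A hierarchical model would additionally treat the detection probability as uncertain and share information across areas, which this sketch does not attempt.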

These proposals are complemented by software tools that implement this methodological framework, showing a proof of concept with both simulated and real data. Finally, we also provide a preliminary quality assessment of the statistical outputs obtained with this framework.