BigSurv18 program







Combining General Population Surveys With Big Data From Activity Trackers or Smartphone Apps

Chair: Mr Peter Lugtig (Utrecht University)
Time: Saturday 27th October, 09:00 - 10:30
Room: 40.004

The goal of this session is to showcase 5 papers that have each tried to use smartphones or activity trackers to enrich large-scale general-population surveys with Big Data. Many of the current examples in the literature have used small samples of volunteers (e.g. students) to test the potential of trackers and smartphones. There are unique challenges to scaling up to general population surveys. The session centers around the following topics:
- Implementation issues: nonresponse, loss in field, technical problems, use of devices

- Issues in collecting, accessing and storing the data for large samples, across the general population.

- Additional value of Big Data in combination with survey data. How do combined data give a better picture of core variables of interest?

Measuring Young People's Physical Activity Using Accelerometers in the UK Millennium Cohort Study

Dr Lisa Calderwood (Centre for Longitudinal Studies, UCL) - Presenting Author
Dr Emily Gilbert (Centre for Longitudinal Studies, UCL)


Measuring physical activity presents methodological challenges for survey research. Most large-scale population-based studies measure physical activity with self-reported data, which are subject to both recall and social desirability bias. Devices that measure physical activity directly can offer a solution to these problems. Activity monitors, also known as accelerometers, can capture a wide range of movements as well as the differing intensity of activities. Increasingly, accelerometers are also being recognised for their ability to measure sedentary activities.

The Millennium Cohort Study (MCS) follows over 19,000 children born in the UK in 2000/1. The sixth sweep of the survey collected data from cohort members when they were 14 years old and included the collection of physical activity data using activity monitors.

This paper presents how the approach to implementing activity monitors at the sixth sweep of MCS was developed. Field interviewers placed wrist-worn accelerometers with respondents during face-to-face visits and asked them to wear the device for two complete days. The feasibility of collecting accelerometer data from 14-year-olds was assessed through a number of qualitative and quantitative methods, and two different devices were compared:

- Prior to piloting, depth interviews were carried out with 14-year-olds and their parents to explore the acceptability of activity monitor data collection (among other survey elements)
- Respondents and interviewers provided feedback as part of the pilot surveys
- Data on accelerometer placement rates, return rates and return times were evaluated
- Office procedures (such as charging, calibrating, downloading data) were assessed
- Respondent compliance data were evaluated

Findings from this development work informed the approach taken at the MCS Age 14 Survey, which resulted in the successful collection of objective physical activity data from nearly 5000 young people, alongside more traditional social survey data. This paper highlights a number of considerations for the implementation of objective physical activity data collection in large-scale social surveys.


Testing the Logistics of the Accelerometer Project in SHARE

Dr Luzia Weiss (Max Planck Institute of Social Law and Social Policy) - Presenting Author
Dr Annette Scherpenzeel (Max Planck Institute of Social Law and Social Policy; TUM Munich)
Mrs Nora Angleys (Max Planck Institute of Social Law and Social Policy)


The Survey of Health, Ageing and Retirement in Europe (SHARE) is a multidisciplinary and cross-national panel database of micro data on health, socioeconomic status and social and family networks of individuals aged 50 or over covering 27 European countries and Israel (www.share-project.org). SHARE’s main aim is to provide data on individuals as they age in order to analyse the process of individual and population ageing in depth. One key area covered by SHARE is health.

Health in large-scale population surveys is typically measured by self-report questions. Such items are likely to suffer from “differential item functioning” (DIF), the interpersonal and intercultural variation in interpreting and using the response categories for the same question. From the first wave (2004) on, SHARE has therefore combined self-reported health with objective health measurements in the form of physical performance measurements, such as grip strength, peak flow and chair stand. Such measurements minimize the DIF of self-reported measures, facilitate comparison across countries, and permit adjustments of self-reported measures of health.
One important factor well known to be correlated with an individual’s health is physical activity. So far, SHARE has measured activity with two self-report items on the frequency of moderate and of vigorous activity, variables that are likely to suffer from DIF. In its eighth wave, SHARE will therefore extend its objective measures by collecting physical activity data using accelerometry.

Accelerometer data will be collected in a subsample of SHARE, in 10 pre-chosen countries, aiming at full activity data from 200 respondents per country. In a first step, this collection of activity data will be tested during SHARE’s Wave 8 pretest, taking place in summer 2018. The data and experience gained will be used to evaluate and optimize the fieldwork procedures. SHARE data are collected in face-to-face interviews administered by trained interviewers at the respondents’ homes. The interviewers will not have the accelerometers with them on the day of the interview, so logistics will include shipment of the accelerometers, self-attachment of the device by the respondents, downloading the data at different survey agencies, and finally linkage of the data to the original interview data, which are collected using Computer Assisted Personal Interviewing (CAPI).

As this project includes 10 different countries (and 11 survey agencies), all data collection has to be harmonized ex ante. This presentation summarizes the experience gained in this setting of a large-scale population survey and points out solutions for harmonizing the collection of accelerometer data in a multi-national setting.


Using GPS Data as Auxiliary Data to Review the Data Quality of a Time Use Survey

In review process for the special issue

Miss Anne Elevelt (Utrecht University) - Presenting Author
Dr Peter Lugtig (Utrecht University)
Dr Vera Toepoel (Utrecht University)
Professor Stijn Ruiter (Netherlands Institute for the Study of Crime and Law Enforcement)
Professor Wim Bernasco (Netherlands Institute for the Study of Crime and Law Enforcement)


Sensor data offer great potential for social scientists interested in studying attitudes and behaviors. Participants carry their smartphone everywhere, enabling scientists to collect GPS data and track, for example, how much participants move around and where they go. These kinds of data are particularly interesting when they can be linked to and compared with other data sources.

As GPS data incorporate information about people’s time use, linking GPS data to self-reported time use surveys can be valuable for understanding how people spend their time. First, we may use GPS data to fill in missing data in the time use diaries. Second, we can get a more detailed view of when, where and for how long activities take place. Finally, we can use both data sources to investigate measurement error.

In 2015, as part of the CILS4EU cohort study (Kalter et al., 2013), a large cohort of adolescents was invited to complete a time use study. Respondents filled out their activities for 10-minute intervals in a smartphone diary-app. On top of this, respondents were asked to share GPS data. This yields a unique and rich database of both time use and time location data.

In this paper, we show how to use sensor (or GPS) data as auxiliary data in time use research. First, we link the GPS and time use data. Second, we use these GPS data to inspect the quality of the time use data. We show, for example, how travel times measured by GPS correspond with self-reported travel episodes. We examine which codes match, and which ones don’t. Third, we explain why and when episodes are more likely to be reported or observed incorrectly. Finally, we discuss the implications of this study for time use research in general. To our knowledge, this is the first study to examine the concordance of real-time GPS data with self-reported time use data and to use GPS data as auxiliary data to review data quality.
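
The comparison of GPS-measured and self-reported travel can be illustrated with a minimal sketch. It assumes that the GPS trace has already been reduced to one travel flag per 10-minute diary slot; the function name, the record layout and the example values are hypothetical and are not taken from the study.

```python
from collections import Counter


def travel_concordance(diary_travel, gps_travel):
    """Cross-tabulate diary-reported and GPS-derived travel per 10-minute slot.

    Both arguments are equal-length lists of booleans, one entry per slot
    (hypothetical layout; how GPS points become a travel flag is not shown).
    """
    if len(diary_travel) != len(gps_travel):
        raise ValueError("both series must cover the same slots")
    table = Counter(zip(diary_travel, gps_travel))
    n = len(diary_travel)
    matches = table[(True, True)] + table[(False, False)]
    return {
        "both": table[(True, True)],
        "diary_only": table[(True, False)],
        "gps_only": table[(False, True)],
        "neither": table[(False, False)],
        "agreement": matches / n if n else float("nan"),
    }


# Made-up example: six consecutive slots for one respondent.
diary = [False, True, True, False, False, False]
gps = [False, True, True, True, False, False]
result = travel_concordance(diary, gps)
print(result)
```

Slots counted under "gps_only" or "diary_only" are the mismatches that a data quality review of this kind would inspect further.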


Quality of Spending Data Collected With a Receipt Scanning App in a Probability Household Panel

Dr Alex Wenz (ISER, University of Essex) - Presenting Author
Professor Annette Jackle (ISER, University of Essex)


The measurement of consumer expenditure is one particular area that could benefit from new data collection opportunities using mobile technologies. Existing surveys use diary methods to collect data on expenditure, which are long and burdensome to complete and rely heavily on the respondent’s ability to recall information. Alternatively, respondents can be provided with a mobile app to scan shopping receipts or report purchases. In this paper, we assess the quality of spending reported by a general population sample using an app, by comparing total and category spending against benchmarks.

N = 2,432 members of the Understanding Society Innovation Panel, a probability household panel in the United Kingdom, were invited to download an app on their mobile device to record their expenditure on goods and services for one month. Participants were asked to scan their shopping receipts, to record their spending in the app, or to report a day without a purchase. They received incentives for downloading and using the app. 10% of sample members used the app at least once in each of the five weeks. Fieldwork was carried out from October 2016 until January 2017. We compare the data reported in the app against benchmark data from the UK Living Costs and Food Survey spending diaries as well as against survey data collected in the Innovation Panel annual interviews.

First, we will gauge the level of total and category spending reported in the app compared to benchmarks. Second, we will assess the importance of offering participants the option to report spending directly in the app instead of scanning receipts: we will examine for which spending categories the ‘direct entry’ option provides data that are more comparable to benchmarks than scanned receipts only. Third, we will investigate which types of participants report spending data that are more comparable to benchmarks, considering socio-demographic characteristics, patterns of mobile device use and financial behaviours.

This paper provides novel evidence on the quality of expenditure data collected using mobile technologies as well as on the scalability of app-based data collection in the context of a probability household panel of the general population.


WHO IS WHO. An Algorithm to Attribute the Device's Navigation to Users Sharing the Same Device

Mr Carlos Ochoa (Netquest) - Presenting Author
Mr Carlos Bort (Netquest)
Mr Miquel Porcar (Netquest)

With the expansion of the Internet and mobile devices, new measurement opportunities are appearing for supplementing survey data, which could make it possible to reduce respondent burden, improve the quality of measurement and/or measure new things.

Passive data collection can take different forms. We focus on the use of a tracking application (a “meter”) installed on the participants’ devices (PC, tablet and/or smartphone) to register their online behavior. The meter provides information about the URLs of the web pages visited by the participants, the time and length of each visit, and ad exposure. If the meter is installed by members of an online access panel, it becomes possible to collect from the same individuals both what they do online (via the meter) and their opinions (via surveys). Thus, it can provide very rich information.

However, metering data collection still faces several challenges. First, the large amount of data generated per respondent makes the analysis complex. Second, an individual’s navigation may be spread across different devices, so a meter would have to be installed on all of the individual’s devices to get the full picture. Finally, the meter can be installed on devices used by multiple users, not only by the panelists.

This last issue is the focus of this presentation. One potential solution is to limit data collection to non-shared devices, but this may hurt representativeness. Another is to add a “login dialog” to the meter, so that each time a user starts a browsing session a pop-up message asks for their identity. However, this risks distorting the passive nature of the data, since participants need to be “active” and identify themselves each time.

With these concerns in mind, we propose a completely different approach: separating individuals’ navigation by looking at the data, using advanced data analysis. The idea is that browsing behavior is a personal trait, something unique that unequivocally identifies each individual the same way a fingerprint does. For instance, suppose Mr. X always starts by opening the newspaper webpage when he connects to the internet on the shared PC at home, while Ms. X always starts by connecting to her Facebook account. In that case, by looking at the first webpage opened in each new session, we could identify the user.
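
The first-webpage example can be written down as a toy rule. This is only a sketch of the intuition in the paragraph above, not the algorithm developed by the authors; the function names, user labels and domains are hypothetical, and it assumes some sessions with known users are available to learn from.

```python
from collections import Counter, defaultdict


def learn_first_pages(labelled_sessions):
    """Map each first-visited page to the user who most often starts there.

    labelled_sessions: list of (user, [urls in visiting order]) pairs.
    """
    counts = defaultdict(Counter)
    for user, urls in labelled_sessions:
        if urls:
            counts[urls[0]][user] += 1
    return {page: users.most_common(1)[0][0] for page, users in counts.items()}


def attribute_session(urls, first_page_owner, default=None):
    """Guess the user of an unlabelled session from its first page only."""
    if not urls:
        return default
    return first_page_owner.get(urls[0], default)


# Made-up training sessions from a shared home PC.
labelled = [
    ("mr_x", ["news.example", "mail.example"]),
    ("ms_x", ["facebook.com", "shop.example"]),
    ("mr_x", ["news.example", "weather.example"]),
]
owners = learn_first_pages(labelled)
guess = attribute_session(["news.example", "shop.example"], owners)
unknown = attribute_session(["video.example"], owners)
print(guess, unknown)
```

A rule this simple leaves any session with an unfamiliar first page unattributed, which is one reason to look at more of the session than its first URL.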

Using data from the Netquest online opt-in panel, where panelists have installed a meter on their devices, we have developed a machine learning technique to separate browsing streams without asking users, preserving the passive nature of the data.

To do so, we artificially mixed navigation streams coming from single-user devices and tried to separate them again using an algorithm based on grouping URLs into sessions, evaluating correlations between URLs within sessions, and mapping those correlations into distances using a Multidimensional Scaling algorithm.
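
The first two steps of such a pipeline can be sketched as follows. The abstract gives no implementation details, so everything here is an assumption made for illustration: sessions are split on a 30-minute inactivity gap, URL correlation is the Pearson correlation of session-presence indicators, and distance is one minus that correlation. The Multidimensional Scaling step is not shown, and all names and example data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def split_sessions(events, gap_seconds=1800):
    """Group (timestamp_seconds, url) events into sessions by inactivity gap."""
    sessions, current, last_t = [], [], None
    for t, url in sorted(events):
        if last_t is not None and t - last_t > gap_seconds:
            sessions.append(current)
            current = []
        current.append(url)
        last_t = t
    if current:
        sessions.append(current)
    return sessions


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def url_distances(sessions):
    """Distance between URL pairs: 1 - correlation of session presence."""
    urls = sorted({u for s in sessions for u in s})
    presence = {u: [1 if u in s else 0 for s in sessions] for u in urls}
    return {
        (a, b): 1.0 - pearson(presence[a], presence[b])
        for a, b in combinations(urls, 2)
    }


# Made-up mixed stream from two users who never browse in the same session.
events = [
    (0, "news.example"), (60, "mail.example"),
    (4000, "facebook.com"), (4100, "shop.example"),
    (9000, "news.example"), (9050, "mail.example"),
    (14000, "facebook.com"), (14060, "shop.example"),
]
sessions = split_sessions(events)
dist = url_distances(sessions)
for pair, d in dist.items():
    print(pair, round(d, 2))
```

In this toy stream, URLs that always appear together end up at distance 0 and URLs that never co-occur at distance 2. A distance table of this kind is the sort of input a Multidimensional Scaling routine would then place in a low-dimensional space.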

The first versions of this algorithm separate URLs with around 90% accuracy. We compared this performance to a “login dialog” solution, showing how the supposed reliability of asking for identities fails in practice.