BigSurv18 program


Wednesday 24th October Thursday 25th October Friday 26th October Saturday 27th October





Big Data Enhancements to Surveys: Methods and Tools

Chair: Mr Niklas M. Loynes (NYU / University of Manchester)
Time: Friday 26th October, 11:30 - 13:00
Room: 40.105

When Behavioral Data Isn't Enough: Mixing Survey, Log, and Usability Data for a Holistic Understanding of User Experience

Dr Jessica Cornick (Facebook) - Presenting Author
Ms Alexandra Sullivan (Facebook)
Ms Lan Guo (Facebook)
Dr Adam Sage (Facebook)

While online services can use routinely logged data to understand when people use their product and which features they use most, that data does not reveal why. To address this, we developed a more holistic framework for measuring people's experience with our tools, coupling routinely logged data with voluntary surveys and with interviews from a qualitative usability study.

We measure three main components: the effectiveness of the tools, the efficiency of the tools, and sentiment (how people feel about their experience with the tools). The in-platform survey provided attitudinal measures of all three components; the routinely logged data provided objective measures of effectiveness and efficiency; and the qualitative usability sessions provided an in-depth understanding of key tasks, together giving us a more holistic understanding of the user experience. This talk explains how we approached each component. Combining data from multiple sources yielded a richer perspective that would have been difficult to obtain from any single data stream.
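The pairing of logged and survey data described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the record types (`TaskLog`, `SurveyResponse`), the 1 to 5 rating scale, and the use of success rate and median time on successful tasks as the objective effectiveness and efficiency measures are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class TaskLog:
    """One logged task attempt (hypothetical schema)."""
    user_id: str
    succeeded: bool
    seconds: float


@dataclass(frozen=True)
class SurveyResponse:
    """Attitudinal ratings from the in-platform survey (hypothetical 1-5 scale)."""
    user_id: str
    effectiveness: int
    efficiency: int
    sentiment: int


def combine(logs, surveys):
    """Pair each survey respondent's ratings with objective measures from their logs."""
    tasks_by_user = {}
    for log in logs:
        tasks_by_user.setdefault(log.user_id, []).append(log)

    combined = {}
    for response in surveys:
        tasks = tasks_by_user.get(response.user_id)
        if not tasks:
            continue  # respondent has no logged tasks, so there is nothing to pair
        success_times = [t.seconds for t in tasks if t.succeeded]
        combined[response.user_id] = {
            # objective effectiveness: share of attempts that succeeded
            "success_rate": len(success_times) / len(tasks),
            # objective efficiency: typical time spent on successful attempts
            "median_seconds_on_success": median(success_times) if success_times else None,
            # attitudinal measures of the same components, plus sentiment
            "rated_effectiveness": response.effectiveness,
            "rated_efficiency": response.efficiency,
            "sentiment": response.sentiment,
        }
    return combined
```

Only users present in both streams appear in the result, which makes the coverage gap between the streams explicit; the qualitative usability interviews are not represented in a join like this.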


Using a Large GPS Dataset to Enhance Survey Matching

Mr Ryan McShane (Southern Methodist University) - Presenting Author


NOAA Fisheries is the United States' federal steward of the nation's ocean resources. To this end, it currently collects data on catch and “effort” in the charter boat mode of the recreational fishing sector. Catch is estimated with the Access Point Angler Intercept Survey (APAIS), in which biologists wait at randomly assigned docks, marinas, and slips and interview all passengers of any charter boat they intercept. The caught fish are inspected, identified, measured, and counted at the passenger level to produce an estimate of catch per unit effort (CPUE). “Effort” is estimated via a costly telephone survey with a low response rate. CPUE and effort are then combined to produce an estimate of total catch per species.
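In its simplest form, ignoring the stratification and design weights a production estimator would use, combining CPUE and effort is a ratio estimator. The notation is ours, not NOAA's: $S$ is the set of intercepted passengers, $c_{is}$ the number of fish of species $s$ caught by passenger $i$, $e_i$ that passenger's effort (for example, one angler trip), and $\hat{E}$ the total effort estimated by the telephone survey.

```latex
\widehat{\mathrm{CPUE}}_s = \frac{\sum_{i \in S} c_{is}}{\sum_{i \in S} e_i},
\qquad
\hat{C}_s = \widehat{\mathrm{CPUE}}_s \times \hat{E}
```

With invented numbers for illustration only: if intercepted passengers caught 120 fish of a species over 80 angler trips, the CPUE estimate is 1.5 fish per angler trip, and an effort estimate of 10,000 angler trips gives an estimated total catch of 15,000 fish.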

In an effort to reduce costs, improve accuracy, and increase timeliness, NOAA Fisheries is experimenting with an alternative data collection procedure in which charter boat captains report the total catch per species at the end of each trip using a GPS-enabled electronic device (currently, Thorium). The captains' reports are used as auxiliary data to the probability sample of intercepts, yielding an estimator similar in form to a capture-recapture estimator. This estimation procedure requires matching intercepted trips to reported ones. Because the intercepted trips are a probability sample, the share of them that match a report allows a reporting rate to be estimated.
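The capture-recapture analogy can be made concrete with the classical Lincoln-Petersen estimator, treating captains' reports as the first "capture" and APAIS intercepts as the second. This is a simplified sketch: it assumes the intercepts behave like a simple random sample of all trips and that reporting is independent of being intercepted, whereas the estimator described in the talk would account for the survey design. The function names are hypothetical.

```python
def reporting_rate(n_intercepted: int, n_matched: int) -> float:
    """Share of intercepted trips that match a captain's report."""
    if n_intercepted <= 0:
        raise ValueError("need at least one intercepted trip")
    if not 0 <= n_matched <= n_intercepted:
        raise ValueError("matches must be between 0 and the number of intercepts")
    return n_matched / n_intercepted


def lincoln_petersen(n_reported: int, n_intercepted: int, n_matched: int) -> float:
    """Estimate total trips: reported trips divided by the reporting rate.

    Written in the classical product form n_reported * n_intercepted / n_matched.
    """
    if n_matched <= 0:
        raise ValueError("no matched trips: the estimator is undefined")
    if n_matched > min(n_reported, n_intercepted):
        raise ValueError("cannot match more trips than were reported or intercepted")
    return n_reported * n_intercepted / n_matched
```

For example, with 400 reported trips, 50 intercepts, and 40 matches, the reporting rate is 0.8 and the estimated total number of trips is 400 * 50 / 40 = 500. Every unmatched intercept lowers the estimated reporting rate, which is why reliable matching matters.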

However, since charter boats may take multiple trips in a day, matching trip reports to APAIS interviews requires a time component. The times reported in the captains' logs are often misleading or erroneous, so the author sought an alternative data source to improve matching. Each vessel produces a GPS position report periodically; to date, there are over 2.5 million such reports.
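Why the time component matters can be seen in a minimal matching sketch. Assuming each trip report carries an estimated shore-arrival time, an intercept is paired with the same vessel's report whose arrival is closest to the interview time, within a tolerance. The record types, the 60-minute tolerance, and the greedy closest-first rule are our assumptions, not the procedure used in the study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TripReport:
    report_id: str
    vessel_id: str
    arrival: datetime  # estimated shore-arrival time for the trip


@dataclass(frozen=True)
class Intercept:
    intercept_id: str
    vessel_id: str
    interviewed_at: datetime


def match_trips(intercepts, reports, tolerance=timedelta(minutes=60)):
    """Pair intercepts with same-vessel reports, closest arrival time first.

    Each intercept and each report is used at most once; pairs whose time gap
    exceeds `tolerance` are never matched.
    """
    reports_by_vessel = {}
    for r in reports:
        reports_by_vessel.setdefault(r.vessel_id, []).append(r)

    # Collect every admissible (gap, intercept, report) pair.
    candidates = []
    for i in intercepts:
        for r in reports_by_vessel.get(i.vessel_id, []):
            gap = abs(i.interviewed_at - r.arrival)
            if gap <= tolerance:
                candidates.append((gap, i.intercept_id, r.report_id))

    # Greedily accept the smallest gaps first.
    matches, used_reports = {}, set()
    for gap, intercept_id, report_id in sorted(candidates):
        if intercept_id in matches or report_id in used_reports:
            continue
        matches[intercept_id] = report_id
        used_reports.add(report_id)
    return matches
```

When one vessel makes two trips on the same day, vessel and date alone cannot tell the trips apart; only the arrival times can, so the quality of the matches depends directly on the quality of the arrival-time estimates.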

We describe how the periodic GPS position data are used to improve estimates of shore arrival time. First, we identify locations at which vessels are stationary for extended periods and which match the NOAA Site Register (sites at which APAIS interviews may take place). Then, we identify trips to and from these locations. From these trips, we can estimate arrival time to within half of the reporting interval, since the arrival must fall between the last position report at sea and the first report in port. Trip duration can also be estimated. For vessels whose GPS reporting is turned off, we can then use previous trip data to improve arrival time estimates. Finally, we explore the relationship between catch and trip duration. We discuss the obstacles posed by the dataset's size and by home base and trip identification. Our findings may extend to other surveys that can be augmented with GPS data.
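The arrival-time logic can be sketched for a single vessel's position reports. This is a simplified illustration under stated assumptions: a ping counts as "in port" when it lies within a hypothetical 0.5 km radius of a registered site (skipping the study's detection of long stationary periods), and a trip is a run of at-sea pings between two in-port pings. Departure and arrival are placed at the midpoint of the two pings that bracket each transition, so each estimate is off by at most half of that interval. The names `Ping` and `segment_trips` and the site coordinates are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    t: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_site(ping, sites, radius_km):
    """Return the id of the closest site within radius_km, or None if at sea."""
    best = None
    for site_id, (lat, lon) in sites.items():
        d = haversine_km(ping.lat, ping.lon, lat, lon)
        if d <= radius_km and (best is None or d < best[0]):
            best = (d, site_id)
    return best[1] if best else None


def midpoint(a, b):
    return a + (b - a) / 2


def segment_trips(pings, sites, radius_km=0.5):
    """Split one vessel's pings into site-to-site trips.

    A trip already under way at the first ping is skipped, because its
    departure cannot be bracketed.
    """
    pings = sorted(pings, key=lambda p: p.t)
    labels = [nearest_site(p, sites, radius_km) for p in pings]
    trips, start = [], None  # start: index of the last in-port ping before leaving
    for k in range(1, len(pings)):
        was_in_port = labels[k - 1] is not None
        now_in_port = labels[k] is not None
        if was_in_port and not now_in_port:
            start = k - 1
        elif not was_in_port and now_in_port and start is not None:
            departure = midpoint(pings[start].t, pings[start + 1].t)
            arrival = midpoint(pings[k - 1].t, pings[k].t)
            trips.append({
                "from": labels[start],
                "to": labels[k],
                "departure": departure,
                "arrival": arrival,
                "duration": arrival - departure,
                "max_arrival_error": (pings[k].t - pings[k - 1].t) / 2,
            })
            start = None
    return trips
```

For example, if a vessel's last at-sea ping is at 11:30 and its first in-port ping is at 12:00, the arrival is estimated as 11:45 with at most 15 minutes of error. At the scale of millions of reports, the linear scan over sites in `nearest_site` would likely need to be replaced by a spatial index.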


A Case Study of Processing Large-Scale Data - A Method to Accomplish Reproducibility

Mrs Inga Brentel (Research Associate at the Institute of Social Science, Heinrich-Heine-University Düsseldorf, Germany) - Presenting Author
Mr Olaf Jandura (Professor at the Institute of Social Science, Heinrich-Heine-University Düsseldorf, Germany)
Mrs Kristi Winters (Research Associate at GESIS Leibniz-Institute for Social Science in Cologne, Germany)


Relevance & Research Question:
Our project attempts to fill a gap in communication studies by creating a harmonized longitudinal dataset (from 1954 onward) on media use in Germany, drawing on the Media-Analysis-Data, which are based on representative surveys of 30,000 respondents each year. The project's relevance lies in making large-scale media use data accessible for academic research with high standards of data documentation. The research question, therefore, is: how can the Media-Analysis-Data, as 'big data', be made accessible for academic research while remaining transparent enough to ensure reproducibility?

Methods & Data:
This paper presents the various theoretical and practical uses of the digital harmonization software CharmStats over the course of this project. The goal of the data processing was to create a scientific use file that sets excellent documentation standards with the help of CharmStats, and to continue the harmonization up to 2009. Using CharmStats, we review the challenges and solutions developed in large-scale data processing as a case study in mass variable harmonization. With more than 1.5 million cases in each of the two harmonized datasets, and almost 30,000 variables covering over 60 years of press media, almost 40 years of radio, and now eight years of online media, the Media-Analysis-Data can be considered the largest dataset on media use in Germany made available to academics. The project's methodological approach can therefore serve as a use case for documenting and harmonizing big data for academic research in a traceable way.

Results:
The project aims to make the complex and labour-intensive procedure for processing large-scale data fully transparent and traceable. CharmStats makes it possible to achieve this goal because it produces syntax for proprietary statistical software for data processing, plus a report for documentation. In the presentation we will describe the steps taken to achieve the project's goals and answer the research question:

1) Finding a structure to work with,
2) Setting standards for data documentation making data processing traceable with CharmStats,
3) Producing a harmonized dataset, and
4) Making the dataset reproducible and, moreover, an accessible and sustainable source for academic research through the Library of Online Harmonization (scheduled for release in 2019).
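The idea of deriving both the data transformation and its documentation from a single mapping, so that the two cannot drift apart, can be sketched in Python. This is not CharmStats, which generates syntax for proprietary statistical software; the `HarmonizationRule` class, the variable names, and the codes below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HarmonizationRule:
    """One target variable and its per-year source mappings (hypothetical)."""
    target: str
    target_labels: dict  # target code -> label
    # year -> (source variable name, {source code: target code})
    sources: dict = field(default_factory=dict)

    def harmonize(self, year, records):
        """Recode one year's records; unmapped or missing codes become None."""
        var, mapping = self.sources[year]
        return [mapping.get(r.get(var)) for r in records]

    def report(self):
        """Document every recode, generated from the same mapping used above."""
        lines = [f"Target variable: {self.target}"]
        for year in sorted(self.sources):
            var, mapping = self.sources[year]
            for src, tgt in sorted(mapping.items()):
                label = self.target_labels[tgt]
                lines.append(f"{year}: {var}={src} -> {self.target}={tgt} ({label})")
        return "\n".join(lines)
```

For example, a rule mapping a 1954 yes/no item and a 2009 three-category item onto one binary target recodes the 2009 codes 3, 1, and 9 to 0, 1, and None; the unmapped code 9 surfaces as missing rather than being silently recoded, and the report lists each source-to-target step per year.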