BigSurv23 Program





Please note that the program is subject to change. Kindly refer to the latest updates provided at the conference website.

All academic events will take place at the Campus of San Francisco de Quito University.

Saturday 28th October


08:00 - 17:00

Registration and Information Desk (Room: DaVinci Hall)

09:00 - 10:30

CONCURRENT SESSIONS D

Session 1: What did the robots hear the humans say? Advances in Coding Survey Open-Ends Using ML Methods

Machine Learning Assisted Autocoding Tools for Improving the Experience of Manual Coding of Real-World Big Text Data
Ms Emily Hadley (RTI International) - Presenting Author

Ms Caroline Kery (RTI International)

Mr Durk Steed (RTI International)

Ms Anna Godwin (RTI International)

Mr Ethan Ritchie (RTI International)

Mrs Donna Anderson (RTI International)

Mr Rob Chew (RTI International)

Mr Peter Baumgartner (RTI International)

Considerations for data quality in open-ended embedded probes across population and methodological subgroups
Mr Zachary Smith (National Center for Health Statistics, Centers for Disease Control and Prevention) - Presenting Author

Dr Kristen Cibelli Hibben (National Center for Health Statistics, Centers for Disease Control and Prevention)

Dr Paul Scanlon (National Center for Health Statistics, Centers for Disease Control and Prevention)

Dr Valerie Ryan (National Center for Health Statistics, Centers for Disease Control and Prevention)

Mr Benjamin Rogers (National Center for Health Statistics, Centers for Disease Control and Prevention)

Dr Travis Hoppe (National Center for Health Statistics, Centers for Disease Control and Prevention)

Dr Kristen Miller (National Center for Health Statistics, Centers for Disease Control and Prevention)

Multi-label classification of open-ended questions with BERT
Professor Matthias Schonlau (University of Waterloo)

Dr Julia Weiß (GESIS)

Mr Jan Marquardt (GESIS) - Presenting Author

Linguistic shifts and topic drift: Building adaptive Natural Language Processing systems to code open-ended responses from multiple survey rounds
Dr Sarah Staveteig Ford (U.S. State Department) - Presenting Author

Dr Jon Krosnick (Stanford University)

Dr Matthew DeBell (Stanford University)

09:00 - 10:30

CONCURRENT SESSIONS D

Session 2: The Methodologists talked with the Data Scientists and it wasn't fair! Here's What to Do About It!

Assessing the downstream effects of training data annotation methods on supervised machine learning models
Mr Jacob Beck (LMU Munich)

Dr Stephanie Eckman (Independent)

Professor Christoph Kern (LMU Munich)

Mr Rob Chew (RTI International) - Presenting Author

Mr Bolei Ma (LMU Munich)

Professor Frauke Kreuter (LMU Munich, University of Maryland)

What If? Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness
Mr Jan Simson (LMU Munich) - Presenting Author

Dr Florian Pfisterer (LMU Munich)

Professor Christoph Kern (LMU Munich)

Applied Strategies for Advancing Racial Equity and Addressing Bias in Big Data Research
Ms Emily Hadley (RTI International) - Presenting Author

Ms Rachel Dungan (Academy Health)

My Training Data May Need a Trainer: Applying Population Representation Metrics to Training Data to Assess Representativity, Machine Learning Model Performance and Fairness
Dr Trent Buskirk (Bowling Green State University) - Presenting Author

Dr Christoph Kern (Ludwig Maximilian University of Munich)

Mr Patrick Schenk (Ludwig Maximilian University of Munich)

09:00 - 10:30

CONCURRENT SESSIONS D

Session 3: Harmonize Your Vocals! Exploring Voice Capture and Processing for Collecting Open-Ended Survey Responses

Open-Ended Survey Questions: A comparison of information content in text and audio response formats
Mrs Camille Landesvatter (MZES, University of Mannheim) - Presenting Author

Mr Paul Bauer (MZES, University of Mannheim)

Innovating web probing: Comparing text and voice answers to open-ended probing questions in a smartphone survey
Dr Jan Karem Höhne (University of Duisburg-Essen) - Presenting Author

Dr Timo Lenzner (GESIS - Leibniz Institute for the Social Sciences)

Dr Konstantin Gavras (Nesto Software GmbH)

API vs. human coder: Comparing the performance of speech-to-text transcription using voice answers from a smartphone survey
Dr Jan Karem Höhne (University of Duisburg-Essen) - Presenting Author

Dr Timo Lenzner (GESIS - Leibniz Institute for the Social Sciences)

Assessing Performance of Survey Questions through a CARI Machine Learning Pipeline
Dr Ting Yan (Westat) - Presenting Author

Mr Anil Battalahalli (Westat)

09:00 - 10:30

CONCURRENT SESSIONS D

Session 4: We Triangulated but Got A Rhombus!? Methods for Improving Insights Based on Data Combined from Multiple Sources

Beware of propensity score matching as a method for the integration of different data sets
Dr Hans Kiesl (Ostbayerische Technische Hochschule Regensburg) - Presenting Author

Dr Florian Meinfelder (Otto-Friedrich-Universität Bamberg)

A Novel Methodology for Improving Applications of Modern Predictive Modeling Tools to Linked Data Sets Subject to Mismatch Error
Dr Brady West (Institute for Social Research, University of Michigan-Ann Arbor) - Presenting Author

Dr Martin Slawski (Department of Statistics, George Mason University)

Dr Emanuel Ben-David (U.S. Census Bureau)

Validating matches of electronically reported fishing trips to investigate matching error
Dr Benjamin Williams (University of Denver) - Presenting Author

Dr Shalima Zalsha (NORC)

Dr Lynne Stokes (Southern Methodist University)

Dr Ryan McShane (Amherst College)

Dr John Foster (NOAA Fisheries)

10:30 - 11:00

Coffee Break

11:00 - 12:30

CONCURRENT SESSIONS E

Session 1: Watch Out - Your Phone Answered My Survey! Processing, Compliance and Estimation Using Data Captured via Smartphone Meters and Wearable Devices

Provide or Bring Your Own Wearable Device? An assessment of compliance, adherence, and representation in a national study.
Dr Heidi Guyer (RTI International) - Presenting Author

Ms Margaret Moakley (RTI International)

Professor Florian Keusch (University of Mannheim)

Professor Bella Struminskaya (Utrecht University)

Continuous Monitoring of Health and Wellness Using Wearable Sensors: New Data Source for Social Science
Dr Dorota Temple (RTI International) - Presenting Author

Dr Meghan Hegarty-Craver (RTI International)

Dr Hope Davis-Wilson (RTI International)

Dr Edward Preble (RTI International)

Dr Jonathan Holt (RTI International)

Dr Howard Walls (RTI International)

Dr David Dausch (RTI International)

Wearables Research and Analytics Platform (WRAP) Demo: Integrating wearables, surveys and monitoring systems
Dr Heidi Guyer (RTI International) - Presenting Author

Dr Vinay Tannan (RTI International)

Mr Ben Allaire (RTI International)

Dr Eric Francisco (RTI International)

11:00 - 12:30

CONCURRENT SESSIONS E

Session 2: Any Kinks in the Links? Exploring Data Linkage and Quality Frameworks for Modern Surveys

Survey Design Considerations for Data Linkage
Professor Sunshine Hillygus (Duke University) - Presenting Author

Professor Kyle Endres (University of Northern Iowa)

Burden, benefit, consent, and control: Moving beyond privacy and confidentiality in attitudes about administrative data in government data collection
Dr Aleia Fobia (US Census Bureau) - Presenting Author

Ms Jennifer Childs (US Census Bureau)

Dr Shaun Genter (US Census Bureau)

A Data Quality Scorecard to Assess a Data Source’s Fitness for Use
Ms Lisa Mirel (NCSES/NSF) - Presenting Author

Dr John Finamore (NCSES/NSF)

Dr Elizabeth Mannshardt (NCSES/NSF)

Dr Julie Banks (NORC)

Dr Don Jang (NORC)

Dr Jay Breidt (NORC)

11:00 - 12:30

CONCURRENT SESSIONS E

Session 3: Don't Know Much About Survey Participation? After this Session You Will!

Does intent to participate in the U.S. Census align with actual participation? The matching of public opinion data with Census data.
Dr Yazmin Garcia Trejo (U.S. Census Bureau) - Presenting Author

Ms Maranda Pepe (U.S. Census Bureau)

Ms Charlene Medou (U.S. Census Bureau)

Ms Jordan Misra (U.S. Census Bureau)

Motivations and barriers to survey participation in a smartphone-based travel app study: Evidence from in-depth interviews in Chile
Mr Ricardo Gonzalez (LEAS at Universidad Adolfo Ibañez) - Presenting Author

Miss Doerte Naber (GESIS - Leibniz Institute for the Social Sciences)

Mr Adolfo Fuentes (University of Edinburgh - LEAS at Universidad Adolfo Ibañez)

Don't Know! Don't Care? We Should! ``Don't Know
Dr Christopher Henry (Bank of Canada) - Presenting Author

Dr Daniela Balutel (York University, Canada)

Dr Kim Huynh (Bank of Canada)

Dr Marcel Voia (Universite d'Orleans, France)

11:00 - 12:30

CONCURRENT SESSIONS E

Session 4: I'm Biased Towards Accuracy! Advances in Evaluating and Adjusting Estimates within Finite Population Frameworks

Rethinking the Test Set: A Finite Population Perspective
Mr Robert Chew (RTI International) - Presenting Author

The Sensitivity of Selection Bias Estimators: A Diagnostic based on a case study and simulation
Mr Santiago Gómez (Vrije Universiteit Amsterdam) - Presenting Author

Mr Dimitris Pavlopoulos (Vrije Universiteit Amsterdam)

Mr Ton De Waal (Statistics Netherlands)

Mr Reinoud Stoel (Statistics Netherlands)

Mr Arnout van Delden (Statistics Netherlands)

Composite Weighting for Hybrid Samples
Dr Mansour Fahimi (Marketing Systems Group) - Presenting Author

“balance” - a Python package for balancing biased data samples
Dr Tal Galili (Meta) - Presenting Author

Dr Tal Sarig (Meta)

Mr Steve Mandala (Meta)

12:30 - 14:00

Group Lunch (University Restaurant) - Lunch tickets available

14:00 - 15:30

CONCURRENT (ORGANIZED) SESSIONS F

Session 1: Bridging Survey and Twitter Data: Methodology and Application

The large volume of opinions people express on social media offers a new lens for measuring public opinion as a supplement to traditional survey-based methods. But systematic differences between surveys and social media (in how the data are collected, processed, and analyzed) mean that there is no one-to-one translation between observations from each method. To make the best use of both types of data in concert, scholars need to better understand how they differ and how to translate between them. This panel compares data on attitudes toward Covid-19 vaccination, economic threat, and schooling from (1) probability-based surveys, (2) linked Twitter posts from a subset of survey respondents who consented to data linkage, and (3) a random sample drawn directly from Twitter. Collectively, these papers will help identify which theoretical gaps between data streams are relatively easy to bridge and which require more scholarly attention.

Chair: Lisa Singh (Georgetown University)

Lurk More (or Less): Differential Engagement in Twitter by Sociodemographics
Dr Lisa Singh (Georgetown University) - Presenting Author

Dr Michael Traugott (University of Michigan)

Dr Nathan Wycoff (Georgetown University)

Conversation Coverage: Comparing Topics of Conversation By Survey Respondents on Twitter and the General Twitter Population
Dr Ceren Budak (University of Michigan) - Presenting Author

Dr Rebecca Ryan (Georgetown University)

Mr Yanchen Wang (Georgetown University)

Can We Gain Useful Knowledge of Public Opinion from Linked Twitter Data? Reweighting to Correct for Consent Bias
Ms Jessica Stapleton (SSRS) - Presenting Author

Mr Michael Jackson (SSRS)

Ms Cameron McPhee (SSRS)

Dr Lisa Singh (Georgetown University)

Dr Trivellore Raghunathan (University of Michigan)

Comparative Topic Analysis of Tweets and Open-Ended Survey Responses on Covid-19 Vaccinations, Financial Threats, and K-12 Education
Dr Joshua Pasek (University of Michigan) - Presenting Author

Dr Leticia Bode (Georgetown University)

Dr Le Bao (Georgetown University)

14:00 - 15:30

CONCURRENT (ORGANIZED) SESSIONS F

Session 2: Leveraging data science and external data sources to adjust for total survey error in health surveys

This session features five talks on how CDC leverages data science and external data sources to adjust for total survey error in health surveys:
1. Validity - Travis Hoppe - Natural Language Processing (NLP) analysis of open-ended probes to identify invalid responses
2. Measurement Error - Morgan Earp - Using regression trees to examine measurement error between self-reported and measured chronic conditions in NHANES to adjust the National Health Interview Survey (NHIS)
3. Coverage Error - Katherine Irimata - Leveraging data science and external data sources to adjust for coverage error in online panels, including the National Center for Health Statistics' (NCHS) Research and Development Survey (RANDS), and to create rapid surveys of health outcomes
4. Nonresponse Error - Jim Dahlhamer - Using trees to enhance nonresponse adjustment of the NHIS
5. Model Based Early Estimates Program - Lauren Rossen - Using lessons learned from the above approaches to create the model based early estimates program for NCHS


Chair: Morgan Earp (US National Center for Health Statistics)

Validity - Natural Language Processing (NLP) analysis of open-ended probes to identify invalid responses
Benjamin Rogers (CDC/DDPHSS/NCHS/DRM) - Presenting Author

Measurement Error - Using regression trees to examine measurement error between self-reported and measured chronic conditions in NHANES to adjust the National Health Interview Survey (NHIS)
Morgan Earp (US National Center for Health Statistics) - Presenting Author

Coverage Error - Leveraging data science and external data sources to adjust for coverage error in online panels including the National Center for Health Statistics' (NCHS) Research and Development Survey (RANDS) and create rapid surveys of health outcomes
Katherine Irimata (CDC/DDPHSS/NCHS/DRM) - Presenting Author

Nonresponse Error - Using trees to enhance nonresponse adjustment of the NHIS
Jim Dahlhamer (CDC/DDPHSS/NCHS/DHIS) - Presenting Author

Model Based Early Estimates Program - Using lessons learned from the above approaches to create the model based early estimates program for NCHS
Lauren Rossen (CDC/DDPHSS/NCHS/DRM) - Presenting Author

14:00 - 15:30

CONCURRENT (ORGANIZED) SESSIONS F

Session 3: Missing Data: The Where, the How, and the Why

Missingness is ubiquitous in surveys. Whether by design or accidental, missing data impedes statistical analyses and hinders generalizability of inferences. Imputation directly models the observed data, and weighting models the probability of a unit being observed: both somehow “learn” from observed data and usually assume that data is missing at random (MAR). When data is missing not at random (MNAR), the missing data mechanism needs to be modeled. This can be complex, rely on unverifiable assumptions and require deep insight into the missing data mechanism, or “How” the data is missing. Strategies for handling MNAR data leverage missing data patterns, or “Where” data is missing, reasons for missingness, or “Why” the data is missing, and external information. Although none are free of assumptions, some approaches can be more realistic and/or flexible than others. The proposed session includes four talks and a discussion from a demographically diverse group of scholars.

Chair: Ali Shojaie (RTI) / Discussant: Rod Little (University of Michigan)

It’s who is missing that matters: Can a nonignorable missingness mechanism explain bias in estimates of COVID-19 vaccine uptake?
Rebecca Andridge (Division of Biostatistics, The Ohio State University College of Public Health) - Presenting Author

Understand, Detect, and Treat Missing Data in Administrative Data
Dan Liao (Senior Research Statistician at RTI International) - Presenting Author

Marcus Berzofsky

Lance Couzens

Approaches for Incorporating Summary Birth History Data in Child Mortality Estimation
Katie Wilson (Department of Biostatistics, University of Washington) - Presenting Author

Likelihood-Based Inference for the Finite Population Mean with Post-Stratification Information Under Non-Ignorable Non-Response
Dr Sahar Zangeneh (RTI International) - Presenting Author

14:00 - 15:30

CONCURRENT (ORGANIZED) SESSIONS F

Session 4: Modernizing the U.S. Census Bureau’s Statistical Foundation through Enterprise Frames

The U.S. Census Bureau has long maintained frame-like data on individuals, households, businesses, and governments to support census and survey operations. However, these data are rarely used for enterprise-wide operations, despite abundant evidence of the value of integrating data to produce new and/or improved statistical products. The agency has established the Frames Program to meet the need for a modernized data infrastructure with a linked universe of information from which samples can be drawn and statistical summaries directly produced. During this session, Census Bureau staff will summarize objectives and achievements of the nascent Frames Program, highlight the evolution of the existing Business, Job, and Geospatial Frames, detail efforts to establish a linkage infrastructure to better leverage these resources, and introduce the new enterprise frame: the Demographic Frame. Three presentations will detail initial assessments of the fitness for use of the Demographic Frame in census and survey taking.

Chair: Victoria Velkoff (U.S. Census Bureau)

Enterprise Frames: Creating the Infrastructure to Enable Transformation
Dr Anthony Knapp (U.S. Census Bureau) - Presenting Author

Dr Lori Zehr (U.S. Census Bureau)

Demographic Frame: Leveraging Person-Level Data to Enhance Census and Survey Taking
Dr Jennifer Ortman (U.S. Census Bureau) - Presenting Author

Using a Demographic Frame to Potentially Enhance the American Community Survey
Ms Deliverance Bougie (U.S. Census Bureau) - Presenting Author

Obtaining Non-Employer Business Owner Data from the Demographic Frame
Mr Michael Ratcliffe (U.S. Census Bureau)

Ms Erica Marquette (U.S. Census Bureau) - Presenting Author

Demographic Frame Evaluation
Ms Aliza Kwiat (U.S. Census Bureau)

Mr Matt Herbstritt (U.S. Census Bureau) - Presenting Author

15:30 - 16:00

Coffee Break

16:00 - 17:00

Special Networking Events

17:00 - 17:45

Closing Remarks