BigSurv23 Program
Please note that the program is subject to change; refer to the conference website for the latest updates.
All academic events will take place on the campus of San Francisco de Quito University.
Saturday 28th October
09:00 - 10:30 | CONCURRENT SESSIONS D
- Machine Learning Assisted Autocoding Tools for Improving the Experience of Manual Coding of Real-World Big Text Data
- Considerations for data quality in open-ended embedded probes across population and methodological subgroups
- Multi-label classification of open-ended questions with BERT
- Linguistic shifts and topic drift: Building adaptive Natural Language Processing systems to code open-ended responses from multiple survey rounds
09:00 - 10:30 | CONCURRENT SESSIONS D
- Assessing the downstream effects of training data annotation methods on supervised machine learning models
- What If? Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness
- Applied Strategies for Advancing Racial Equity and Addressing Bias in Big Data Research
- My Training Data May Need a Trainer: Applying Population Representation Metrics to Training Data to Assess Representativity, Machine Learning Model Performance and Fairness
09:00 - 10:30 | CONCURRENT SESSIONS D
- Open-Ended Survey Questions: A comparison of information content in text and audio response formats
- Innovating web probing: Comparing text and voice answers to open-ended probing questions in a smartphone survey
- API vs. human coder: Comparing the performance of speech-to-text transcription using voice answers from a smartphone survey
- Assessing Performance of Survey Questions through a CARI Machine Learning Pipeline
09:00 - 10:30 | CONCURRENT SESSIONS D
- Beware of propensity score matching as a method for the integration of different data sets
- A Novel Methodology for Improving Applications of Modern Predictive Modeling Tools to Linked Data Sets Subject to Mismatch Error
- Validating matches of electronically reported fishing trips to investigate matching error
10:30 - 11:00 | Coffee Break |
11:00 - 12:30 | CONCURRENT SESSIONS E
- Provide or Bring Your Own Wearable Device? An assessment of compliance, adherence, and representation in a national study
- Continuous Monitoring of Health and Wellness Using Wearable Sensors: New Data Source for Social Science
- Wearables Research and Analytics Platform (WRAP) Demo: Integrating wearables, surveys and monitoring systems
11:00 - 12:30 | CONCURRENT SESSIONS E
- Survey Design Considerations for Data Linkage
- Burden, benefit, consent, and control: Moving beyond privacy and confidentiality in attitudes about administrative data in government data collection
- A Data Quality Scorecard to Assess a Data Source’s Fitness for Use
11:00 - 12:30 | CONCURRENT SESSIONS E
- Does intent to participate in the U.S. Census align with actual participation? The matching of public opinion data with Census data
- Motivations and barriers to survey participation in a smartphone-based travel app study: Evidence from in-depth interviews in Chile
- Don't Know! Don't Care? We Should! "Don't Know
11:00 - 12:30 | CONCURRENT SESSIONS E
- Rethinking the Test Set: A Finite Population Perspective
- The Sensitivity of Selection Bias Estimators: A Diagnostic based on a case study and simulation
- Composite Weighting for Hybrid Samples
- “balance”: a Python package for balancing biased data samples
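The last talk in this session concerns “balance”, Meta's open-source Python package for diagnosing and reweighting biased samples. As a minimal sketch of the workflow its public quickstart documents (the synthetic data and column names below are illustrative, not the presenter's), one can reweight a skewed sample toward a target frame roughly like this:

```python
# Minimal sketch of the "balance" package workflow (assumed from its public
# quickstart, https://import-balance.org); all data below is synthetic.
import numpy as np
import pandas as pd
from balance import Sample

rng = np.random.default_rng(0)

# A biased sample (skews young) and a target frame with an older age mix.
sample_df = pd.DataFrame({
    "id": [str(i) for i in range(1000)],       # balance expects string ids
    "age": rng.normal(35, 10, 1000),
    "outcome": rng.normal(5, 2, 1000),
})
target_df = pd.DataFrame({
    "id": [str(i) for i in range(2000)],
    "age": rng.normal(45, 12, 2000),
})

sample = Sample.from_frame(sample_df, outcome_columns=["outcome"])
target = Sample.from_frame(target_df)

# Fit inverse-propensity weights so the sample's covariates match the target,
# then inspect the weighted outcome estimates.
adjusted = sample.set_target(target).adjust(method="ipw")
print(adjusted.outcomes().summary())
```

A diagnostic such as `adjusted.covars().plot()` (also from the quickstart) compares covariate distributions before and after weighting.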
12:30 - 14:00 | Group Lunch (University Restaurant) - Lunch tickets available |
14:00 - 15:30 | CONCURRENT (ORGANIZED) SESSIONS F
Given the large volume of opinions people express on social media, a new lens exists for measuring public opinion as a supplement to traditional survey-based methods. But systematic differences between surveys and social media, in how they are collected, processed, and analyzed, mean that there is no one-to-one translation between observations from each method. To make the best use of both types of data in concert, scholars need to better understand how they differ and how to translate between them. This panel compares data on attitudes toward Covid-19 vaccination, economic threat, and schooling from (1) probability-based surveys, (2) linked Twitter posts from a subset of survey respondents who consented to data linkage, and (3) a random sample drawn directly from Twitter. Collectively, these papers will help identify which theoretical gaps between data streams are relatively easy to bridge and which require more scholarly attention.
Chair: Lisa Singh (Georgetown University)
- Lurk More (or Less): Differential Engagement in Twitter by Sociodemographics
- Conversation Coverage: Comparing Topics of Conversation By Survey Respondents on Twitter and the General Twitter Population
- Can We Gain Useful Knowledge of Public Opinion from Linked Twitter Data? Reweighting to Correct for Consent Bias
- Comparative Topic Analysis of Tweets and Open-Ended Survey Responses on Covid-19 Vaccinations, Financial Threats, and K-12 Education
14:00 - 15:30 | CONCURRENT (ORGANIZED) SESSIONS F
This session features five talks on CDC's use of data science and external data sources to adjust for total survey error in health surveys.
Chair: Morgan Earp (US National Center for Health Statistics)
- Validity (Travis Hoppe): Natural Language Processing (NLP) analysis of open-ended probes to identify invalid responses
- Measurement Error (Morgan Earp): Using regression trees to examine measurement error between self-reported and measured chronic conditions in NHANES to adjust the National Health Interview Survey (NHIS)
- Coverage Error (Katherine Irimata): Leveraging data science and external data sources to adjust for coverage error in online panels, including the National Center for Health Statistics' (NCHS) Research and Development Survey (RANDS), and to create rapid surveys of health outcomes
- Nonresponse Error (Jim Dahlhamer): Using trees to enhance nonresponse adjustment of the NHIS
- Model-Based Early Estimates Program (Lauren Rossen): Using lessons learned from the above approaches to create the model-based early estimates program for NCHS
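For readers unfamiliar with the tree-based adjustments mentioned in the measurement error and nonresponse talks, the general technique can be sketched as follows. This is a generic illustration on synthetic data, not NCHS's actual pipeline: a shallow classification tree models response propensity from frame variables, and its leaves act as weighting cells.

```python
# Generic sketch of tree-based nonresponse adjustment (illustrative only):
# model response propensity with a shallow tree, then weight respondents
# by the inverse of their leaf's response rate.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 5000
frame = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "urban": rng.integers(0, 2, n),
})
# Hypothetical response behavior: older, urban cases respond more often.
logit = -1.0 + 0.02 * frame["age"] + 0.5 * frame["urban"]
responded = (rng.random(n) < 1 / (1 + np.exp(-logit))).to_numpy()

# A shallow tree keeps the adjustment cells few and interpretable.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=200, random_state=0)
tree.fit(frame, responded)

# Each respondent's adjustment factor is the inverse of the response
# propensity predicted for its leaf; nonrespondents get weight zero.
propensity = tree.predict_proba(frame)[:, 1]
nr_weight = np.where(responded, 1.0 / propensity, 0.0)
print(f"respondent adjustment factors: {nr_weight[responded].min():.2f}"
      f" to {nr_weight[responded].max():.2f}")
```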
14:00 - 15:30 | CONCURRENT (ORGANIZED) SESSIONS F
Missingness is ubiquitous in surveys. Whether it arises by design or by accident, missing data impedes statistical analyses and hinders the generalizability of inferences. Imputation directly models the observed data, and weighting models the probability of a unit being observed: both effectively “learn” from observed data and usually assume that data are missing at random (MAR). When data are missing not at random (MNAR), the missing data mechanism itself needs to be modeled. This can be complex, may rely on unverifiable assumptions, and requires deep insight into the missing data mechanism, or “how” the data are missing. Strategies for handling MNAR data leverage missing data patterns, or “where” data are missing; reasons for missingness, or “why” the data are missing; and external information. Although none are free of assumptions, some approaches can be more realistic and/or flexible than others. The session includes four talks and a discussion from a demographically diverse group of scholars.
Chair: Ali Shojaie (RTI) / Discussant: Rod Little (University of Michigan)
- It’s who is missing that matters: Can a nonignorable missingness mechanism explain bias in estimates of COVID-19 vaccine uptake?
- Understand, Detect, and Treat Missing Data in Administrative Data
- Approaches for Incorporating Summary Birth History Data in Child Mortality Estimation
- Likelihood-Based Inference for the Finite Population Mean with Post-Stratification Information Under Non-Ignorable Non-Response
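As background to this abstract's MAR/MNAR distinction, the standard Rubin formalization (textbook notation, not taken from the session materials) is:

```latex
% Rubin's taxonomy of missing-data mechanisms (standard background notation):
% Y = (Y_obs, Y_mis) is the full data, R the missingness indicator,
% and \phi the parameters of the missingness mechanism.
\[
\text{MAR:}\quad
p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)
  = p(R \mid Y_{\mathrm{obs}}, \phi)
\qquad\qquad
\text{MNAR:}\quad
p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)
  \neq p(R \mid Y_{\mathrm{obs}}, \phi)
\]
```

Under MAR the mechanism drops out of likelihood-based inference; under MNAR it must be modeled explicitly, which is exactly the “how”, “where”, and “why” the talks above address.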
14:00 - 15:30 | CONCURRENT (ORGANIZED) SESSIONS F
The U.S. Census Bureau has long maintained frame-like data on individuals, households, businesses, and governments to support census and survey operations. However, these data are rarely used for enterprise-wide operations, despite abundant evidence of the value of integrating data to produce new and/or improved statistical products. The agency has established the Frames Program to meet the need for a modernized data infrastructure with a linked universe of information from which samples can be drawn and statistical summaries directly produced. During this session, Census Bureau staff will summarize the objectives and achievements of the nascent Frames Program, highlight the evolution of the existing Business, Job, and Geospatial Frames, detail efforts to establish a linkage infrastructure to better leverage these resources, and introduce the new enterprise frame: the Demographic Frame. Three presentations will detail initial assessments of the Demographic Frame's fitness for use in census and survey taking.
Chair: Victoria Velkoff (U.S. Census Bureau)
- Enterprise Frames: Creating the Infrastructure to Enable Transformation
- Demographic Frame: Leveraging Person-Level Data to Enhance Census and Survey Taking
- Using a Demographic Frame to Potentially Enhance the American Community Survey
- Obtaining Non-Employer Business Owner Data from the Demographic Frame
- Demographic Frame Evaluation
15:30 - 16:00 | Coffee Break |
16:00 - 17:00 | Special Networking Events |
17:00 - 17:45 | Closing Remarks |