BigSurv18 program







Peering through Transition Lenses: New Landscapes and Horizons for Survey Research and Social Science

Chair: Dr Zeeshan-ul-hassan Usmani (MiSK Foundation)
Time: Friday 26th October, 09:45 - 11:15
Room: 40.008

The Future Is Now: How Surveys Can Harness Social Media to Address 21st-Century Challenges

Final candidate for the monograph

Dr Amelia Burke-Garcia (Westat) - Presenting Author
Mr Brad Edwards (Westat)
Dr Ting Yan (Westat)


Conducting survey research is increasingly difficult. Researchers face an ever more complex combination of technological and social change, increasing costs and making it more difficult to recruit sampled respondents initially and to retain them for panel studies. The extremely rapid development of communication and information technology and the synergy between them are changing human social interaction and may ultimately render the current survey paradigm (researcher-centered “structured conversation” with a willing randomly-selected respondent) outmoded. We envision a new survey world “that uses multiple data sources, multiple modes, and multiple frames” (Lyberg & Stukel). It leverages social media to make survey research more people-centered, efficient, and relevant. Early research is promising. Hsieh and Murphy examined Twitter as a means to answer research questions that might have been informed by surveys a decade ago. Baker took a broader view of “big data” with a similar approach. Other research has established that social media can reduce attrition in panel studies, increase participant engagement in follow-up protocols, and identify and locate panel members who otherwise would be hard to reach. But a comprehensive review of the ways social media can support, even transform, surveys is lacking. Tourangeau’s framework for defining difficult-to-survey populations is based loosely on survey life cycle stages: challenges in sampling, identifying, reaching, and persuading. One could argue that the whole population in many developed countries has become challenging to survey, but these challenges have been overcome in other, less-developed parts of the world with the ubiquity of mobile phones, internet penetration, social media, and GIS data.
We adapt Tourangeau’s framework to explore social media’s potential for changing survey research, using recent real-world examples:

Sampling Challenges: Given the use patterns and vast amounts of behavioral data on platforms like Google, Twitter, and Facebook/Instagram, social media could be a source for research panels, supporting pinpointing questions for self-report. We compare their value with that of probability survey designs.

Identifying People and “Best” Data Sources: Gaining access to social media data may help augment or replace survey data. Research objectives may be refined to identify data elements that don’t need to meet the same thresholds for data quality and representativeness as surveys, in order to create survey designs that ask respondents questions that only they can – and are willing to – answer. We consider how respondents may be willing to grant access to their social media by “liking” a survey (2016 ANES) and how crowdsourcing can be used to collect survey data more cheaply than traditional survey approaches (mTurk/TurkPrime).

Reaching: Social media targeting can be a singularly effective means for reaching survey respondents. We explore how the basic targeting provided by platforms can be enhanced through audience psychographic data (U.S. Census in Savannah).

Persuading: Social media tools also have great potential for engaging respondents’ interest in survey topics, through ads tailored to respondent demographic/psychographic subgroups and designed to fit specific platforms. We discuss some examples (U.S. Census, other Westat work).


New Paradigms in Online Declarative Data Collection

Dr Kamil Wais (7N) - Presenting Author


Offline research paradigms are still dominant, even in the world of online research techniques. Most online surveys are essentially offline questionnaires converted into more or less advanced online HTML forms. Hence, they are not natively online: not only do they fail to exploit the full potential of Internet technologies, but they are already outdated by their inherited design. Thus, online techniques inherit some undesirable offline characteristics: high burden and low value for respondents, and non-equivalent information exchange between a respondent and a researcher. This is one of the reasons why social researchers usually fail to establish long-term relationships with respondents: the respondents devote their time and share their knowledge, but receive nothing valuable in return. Hence, their intrinsic motivation is low, resulting in long-term trends of decreasing response rates in social studies.

Another reason for decreasing response rates is the prevalent model of the respondent-researcher relationship, which is based on the following heuristic: 1) find potential respondents within a time window that is convenient for a researcher and not necessarily convenient for the respondents; 2) convince them to devote their time to work through a questionnaire, which is often unattractive and time-consuming; 3) convince them to share valuable, often quite private and sensitive information, with someone who will benefit from it; 4) do not share any valuable information with respondents in return; 5) give the respondents as little as possible in return to be as cost-effective as possible; 6) repeat the whole process as many times and for as many respondents as necessary.

In the long run this model is clearly not sustainable, but we social scientists act as if the general population of respondents were infinite or an easily renewable resource. This situation can be seen as the classic "tragedy of the commons," in which a shared resource is spoiled and depleted by the collective actions of actors driven by individual self-interest, which is at odds with the long-term interests of the common good.

We need to seek, and to be willing to accept, new paradigms in social science research techniques that will allow us to develop new online research tools designed to build long-term relationships with respondents.

We could try to achieve this goal by providing respondents with highly customised instant feedback. This approach, based on valuable information exchange, could be implemented as after-question and after-survey feedback in a new type of online survey.

Real-world examples show that even simplistic versions of this approach can be highly successful. Salary surveys, which ask people about their current salary and compare it to the salaries of people similar to them, have proven to be highly scalable and able to collect millions of salary profiles.
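As a rough illustration of the after-survey feedback idea, the following minimal Python sketch compares one respondent's salary with previously stored profiles of similar people. The field names (`job`, `region`, `salary`), the peer definition, and the minimum group size of three are all assumptions made for the example; the abstract does not describe how any particular salary survey works.

```python
from bisect import bisect_left


def share_below(value, peer_values):
    """Percentage of peer values that are strictly below `value`."""
    ordered = sorted(peer_values)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def salary_feedback(response, stored_profiles, min_peers=3):
    """Build an instant feedback message for one respondent.

    Peers are (hypothetically) defined as stored profiles with the same
    job and region; a real tool would need a richer similarity rule.
    """
    peers = [
        p["salary"]
        for p in stored_profiles
        if p["job"] == response["job"] and p["region"] == response["region"]
    ]
    # Avoid showing feedback from a handful of peers (noisy and a privacy risk).
    if len(peers) < min_peers:
        return "Not enough comparable profiles yet to give feedback."
    rank = share_below(response["salary"], peers)
    return (
        f"You earn more than {rank:.0f}% of the {len(peers)} "
        "similar respondents so far."
    )


profiles = [
    {"job": "analyst", "region": "north", "salary": 40},
    {"job": "analyst", "region": "north", "salary": 50},
    {"job": "analyst", "region": "north", "salary": 60},
    {"job": "analyst", "region": "north", "salary": 70},
    {"job": "nurse", "region": "north", "salary": 55},
]
message = salary_feedback(
    {"job": "analyst", "region": "north", "salary": 65}, profiles
)
```

The respondent gets something of value immediately, while the new answer can be added to `stored_profiles` so that the feedback improves for later respondents.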

While thinking about promising potential and possible implementations of this approach in social sciences, we also need to think about possible theoretical, empirical, and technological approaches to this problem, and methodological concerns that should be addressed.


Mixed Methods Approaches to Programmatic Social Science Research in Applied Setting

Dr Kate Johnson-Grey (Google)
Dr Molly Delaney (Google) - Presenting Author

Traditionally, social scientists have relied heavily on surveys for hypothesis generation and testing in the workplace. The challenge in using surveys as a single source of data is that complex phenomena may not be fully captured in a self-report, snapshot format. For example, participants may not have insight into what drives their behavior; responses may be subject to social desirability concerns or sampling biases; and well-written survey items are intentionally narrow in scope and therefore unable to capture unexpected contextual or perspective-based nuances.

As the social world continues to expand into the digital realm, naturally-occurring interactions are increasingly available as a new data source for social science research both within and outside of the workplace. Social scientists have begun incorporating a wider range of automated text analysis tools to take advantage of these burgeoning new data sources with substantial benefit to the field, tackling problems such as how to quantify perceptions of traditional workplace processes (e.g., hiring, performance management, and retention). For example, at Google we have leveraged longitudinal social science survey results and machine learning to identify key drivers of productive teamwork. Companies have used social network analysis to identify how information flows throughout the workplace and where blockages may be getting in the way of innovation. And at Riot Games, researchers quantified their employees’ use of authoritative language in games to address toxic gameplay concerns and drive customer satisfaction.

In this session, we discuss methods for incorporating text analytic, big data techniques and traditional social scientific experimentation into programmatic research design. We outline how these methods have been used for both exploratory hypothesis generation and confirmatory theory testing, and provide guidance for how to select techniques and resources based on the available data and research questions one hopes to address. Finally, we end our discussion with strengths and weaknesses of these techniques for addressing concerns of replicability and generalizability in social science research.


Statistical Inference Aided by Big Data

Mr Masahiko Aida (Civis Analytics) - Presenting Author

Survey data is a powerful and flexible tool for measuring public opinion. The methods for collecting a representative sample and estimating population parameters are well established and understood.

However, executing ideal sampling and data collection is becoming harder each year because of changes in our society. Area sampling was once standard, and people could be reached by canvassing. People responded to mail surveys and to telephone survey solicitations. None of this holds today.

While these changes have made some of the old means of survey research obsolete, we have yet to establish new ones. We do, however, see opportunities emerging. Like many researchers, the author considers the fusion of survey research and so-called big data a potentially promising avenue.

While it is difficult to define big data, for the purposes of this research the author uses the term for a collection of databases that encompasses administrative records, publicly available official statistics, consumer data, voter data, and transaction data. The author's organization calls such data a "basefile," as it serves as a basis for modern analytics.

Use of such databases has become economically feasible thanks to the rapid advancement of cloud-based storage and computing resources such as Amazon AWS. Computing power that used to require a large mainframe or UNIX cluster is now available at each analyst's fingertips at a reasonable cost.

Big data has its own problems. Many sources suffer from coverage problems, as they rarely cover the entire population of interest. The data curation process is often biased: for example, a customer database that systematically lacks data on non-customers is unfit for making generalized claims about all potential customers.

The author believes that by combining the two classes of data (survey data and big data), researchers can draw statistical inferences that take advantage of the strengths of both. Survey data is less biased, and researchers can ask the questions of their choice. Big data, on the other hand, has less variance, and once it is collected, data access is inexpensive. Indeed, efforts to combine cheap data with higher-quality data have been made since the early days of sampling (radio listening study by Hansen, Hurwitz, & Madow, 1953).

This paper reports a proposed end-to-end workflow for statistical inference in the age of big data and discusses the challenges at each step:

(1) Survey data collection
(2) Database management (basefile)
(3) Record linkage of survey data and basefile
(4) Machine learning and synthetic estimates
(5) Inference with survey data and synthetic estimates
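
The five steps above can be sketched end to end on simulated data. The Python example below is only an illustration under stated assumptions, not the author's actual method: the basefile is a hypothetical dictionary with one auxiliary variable per person, record linkage is an exact match on a shared id, the "machine learning" step is a one-variable least-squares fit, and the final inference uses a simple difference estimator (mean of synthetic estimates plus mean survey residual). All names and parameter values are invented for the example.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def run_workflow(seed=0, population=5000, sample_size=200):
    rng = random.Random(seed)

    # (2) Hypothetical basefile: one auxiliary variable per person, keyed by id.
    basefile = {pid: rng.gauss(50, 10) for pid in range(population)}
    # Simulated true outcome; the analyst only sees it for survey respondents.
    truth = {pid: 2.0 + 0.5 * x + rng.gauss(0, 3) for pid, x in basefile.items()}

    # (1) Survey data collection: a simple random sample of the population.
    sampled_ids = rng.sample(sorted(basefile), sample_size)
    survey = {pid: truth[pid] for pid in sampled_ids}

    # (3) Record linkage: exact match on id (real linkage is usually fuzzier).
    linked = [(basefile[pid], y) for pid, y in survey.items() if pid in basefile]
    xs, ys = zip(*linked)

    # (4) Model fitted on linked records, then scored on the whole basefile.
    a, b = fit_line(xs, ys)
    synthetic = {pid: a + b * x for pid, x in basefile.items()}

    # (5) Difference estimator: synthetic mean corrected by survey residuals.
    # With OLS fitted on this same sample the correction is ~0; it matters
    # when the model is fitted on other data or is not OLS with an intercept.
    synthetic_mean = mean(synthetic.values())
    residual_mean = mean(y - synthetic[pid] for pid, y in survey.items())

    return {
        "true_mean": mean(truth.values()),
        "survey_mean": mean(ys),
        "synthetic_mean": synthetic_mean,
        "combined": synthetic_mean + residual_mean,
    }


result = run_workflow(seed=0)
```

Comparing `result["survey_mean"]` and `result["combined"]` against `result["true_mean"]` over many seeds gives a feel for how much variance the auxiliary data removes in this idealized setting, where the basefile has full coverage and linkage is error-free.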