BigSurv20 program



Getting the combination right: Further exploring the use of surveys with other data sources

Moderator: Barry Schouten (jg.schouten@cbs.nl)

Friday 20th November, 11:45 - 13:15 (ET, GMT-5)
8:45 - 10:15 (PT, GMT-8)
17:45 - 19:15 (CET, GMT+1)

Applying the multi-level, multi-Source (MLMS) approach to the 2016 General Social Survey

Dr Tom W Smith (NORC) - Presenting Author
Dr Jaesok Son (NORC)

To more fully understand human society, surveys need to collect and analyze multi-level and multi-source data (MLMS data). Methodologically, the use of MLMS data in general and the augmenting of respondent-supplied information with auxiliary data (AD) from sample frames, other sources, and paradata in particular can notably help to both measure and reduce total survey error. It can be employed to detect and reduce nonresponse bias, to verify interviews, to validate information supplied by respondents, and in other ways. Substantively, MLMS data can greatly expand theory-driven research such as by allowing multi-level, contextual analysis of neighborhood, community, and other aggregate-level effects and by adding in case-level data that either cannot be supplied by respondents or is not as accurate and reliable as information from AD (e.g. health information from medical records vs. recall reports of medical care). Thus, the MLMS approach boosts both the methodological vigor and substantive power of survey research. It is a general framework for conducting and improving survey research.
MLMS starts by retaining information that is part of the sample frame used for the survey. The GSS is a multi-stage, area probability sample of households in the United States with its sample points based on the US decennial Census (DC) and the American Community Survey (ACS). Within the selected sample points, addresses are selected from the US Postal Service’s Delivery Sequence File (DSF), augmented by NORC’s own listing of addresses in sampling points for which the postal listings were not sufficient. These were mostly rural areas where home delivery is limited and therefore home addresses are not available. NORC used the DSF provided by their contractor Valassis. The augmented DSF provides household-level information. The DC and ACS data from the sample frame provide aggregate-level information. Since this Census-based sample frame data set was considerably augmented by additional data from the Census, the combined Census-based data are discussed below.
Supplementing the sample frame data, the addresses and sample points were linked to 12 auxiliary data (AD) sources. The Postal Delivery Sequence Data were available at the household level. There were three commercial databases, all with household-level information and one of these also included some aggregate-level information. (Due to contractual restrictions, the three commercial databases cannot be identified.) The commercial databases were mostly collections of various publicly available information, such as voting records and donations, but they also added some estimates based on their models (e.g. consumption patterns and life styles). The 8 remaining databases only have information at the aggregate level starting in some cases with Census block groups and expanding in some cases to counties or metropolitan areas. Aggregate-level information also included measures such as demographic information and poverty rates in the ACS data, as well as distance between households and various other locations (e.g. sites, facilities) in the EPA/FEMA and Street Pro databases.
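The linkage described above can be pictured with a minimal sketch. It assumes each sampled case carries an address identifier and a Census block group identifier; the field names (`address_id`, `block_group`), the `hh_` and `area_` prefixes, and all records are hypothetical illustrations, not the GSS data structure.

```python
def link_case(case, household_ad, area_ad):
    """Attach household-level and aggregate-level auxiliary data to one sampled case.

    All keys and field names are hypothetical; they are not taken from the GSS.
    """
    linked = dict(case)
    # Household-level AD is keyed by the sampled address.
    for name, value in household_ad.get(case["address_id"], {}).items():
        linked["hh_" + name] = value
    # Aggregate-level AD is keyed by the area containing the address.
    for name, value in area_ad.get(case["block_group"], {}).items():
        linked["area_" + name] = value
    return linked


# Made-up records for illustration only.
case = {"address_id": "A1", "block_group": "BG7", "responded": False}
household_ad = {"A1": {"voter_record": True}}
area_ad = {"BG7": {"poverty_rate": 0.18}}

linked = link_case(case, household_ad, area_ad)
```

Because the auxiliary data attach to the sampled address rather than to the interview, a nonrespondent such as the made-up case above still receives household-level and area-level values, which is what makes comparisons between respondents and nonrespondents possible.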

Measuring migration stocks using traditional and social media data

Dr Arkadiusz Wiśniowski (University of Manchester) - Presenting Author
Dr Dilek Yildiz (Vienna Institute of Demography)
Professor Guy Abel (Asian Demographic Research Institute)
Professor Ingmar Weber (Qatar Computing Research Institute)
Professor Emilio Zagheni (Max Planck Institute for Demographic Research)
Mr Stijn Hoorens (RAND Europe)


Having up-to-date information about the nature and extent of migration within the EU is important for policy making, such as labour market policy or social services. However, timely and reliable statistics on the number of EU citizens residing in or moving across other Member states are difficult to obtain. Official statistics on EU movers are developed by national offices of statistics and published by Eurostat, but they come with a considerable time lag of about two years.
With the rise of the Internet, new data sources offer opportunities to complement traditional sources for EU mobility statistics. In particular, the availability of high quantities of data derived from social media has opened new opportunities. Therefore, we propose a statistical model that integrates data on migrant stocks within the EU from traditional sources such as census, population registers and Labour Force Survey, with new forms of data derived from Facebook. Then, we investigate the potential of the model to facilitate “now-casting”, that is, providing nearly real-time estimates that can serve as early warnings about changes in EU mobility. The model provides measures of uncertainty for the estimates of migrant stocks.
In the model, we assume that the data from each source are measured with a bias and an accuracy specific to that source and to the year of measurement. For instance, we assume that census data are typically the most accurate and have the smallest bias, that Facebook-derived stock data are heavily biased, and that Labour Force Survey data are subject to the largest inaccuracy due to sampling error. We correct for these inadequacies by incorporating informative prior distributions for the model parameters that allow borrowing of information across time and among countries.
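The idea of source-specific bias and accuracy can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes each source reports the log of the true stock plus a known bias and normally distributed noise with a known standard deviation, and it combines the bias-corrected values by precision weighting. The function name and all numbers are hypothetical.

```python
import math


def combine_sources(observations):
    """Precision-weighted estimate of a migrant stock from several sources.

    observations: list of (count, log_bias, log_sd) tuples, where log_bias and
    log_sd are ASSUMED known for each source (in the abstract they are model
    parameters with informative priors, not fixed inputs).
    Returns (estimated stock, standard deviation on the log scale).
    """
    weighted_sum = 0.0
    total_precision = 0.0
    for count, log_bias, log_sd in observations:
        precision = 1.0 / log_sd ** 2
        # Remove the assumed bias, then weight by how precise the source is.
        weighted_sum += precision * (math.log(count) - log_bias)
        total_precision += precision
    mean_log = weighted_sum / total_precision
    return math.exp(mean_log), math.sqrt(1.0 / total_precision)


# Made-up inputs: an accurate census, a noisy survey, a biased social media count.
sources = [
    (100_000, 0.0, 0.02),   # census: unbiased, small error
    (90_000, 0.0, 0.20),    # survey: unbiased, large sampling error
    (60_000, -0.5, 0.10),   # social media: strong undercount, moderate error
]
estimate, log_sd = combine_sources(sources)
```

In this toy setup the combined estimate is dominated by the most precise source, and the combined uncertainty is smaller than that of any single source. A full Bayesian treatment would estimate the biases and variances rather than fix them.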



Identification of emerging skills in job advertisements

Dr Michael Stops (Institute for Employment Research) - Presenting Author
Dr Britta Matthes (Institute for Employment Research)

Digitisation, together with other technological, environmental and social challenges, is expected to further accelerate the transformation of the world of work. The question arises which competencies and skills are needed to cope effectively with the resulting structural changes. A large part of the answer can be derived from information about the most up-to-date requirements for jobs that firms are planning to create in the (near) future. We argue that such information can be found in job advertisements, because they potentially list skill requirements that, among other things, were previously not generally known or were not yet specific to the job title being advertised.
We present our initial analyses of job advertisements from a large German employment website run by the German Federal Employment Agency. To begin with, we rely on existing dictionaries of skill requirements that are used in personnel administration and career advice to systematically evaluate the unstructured texts of job advertisements. Before assigning terms from the text to terms from the dictionaries, we pre-processed both the job advertisement texts and the dictionaries in several ways, e.g., by reducing differently inflected forms of the same word to a common word stem. Terms from the text that we could not assign were manually assessed by an expert group, which classified each term as new, as a known term from the dictionaries, or as not being a skill requirement. We validated the results by conducting a consensual validation procedure. To generate suggestions for this assignment, we used word-embedding procedures alongside other, simpler search strategies.
Our method does allow us to identify “new skills” and to assess their current importance.
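The dictionary-matching step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the suffix-stripping function is a crude stand-in for a real stemmer (and for German-language processing), and the dictionary terms and advertisement text are made up.

```python
import re

# Crude English suffix list; a hypothetical stand-in for a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Lower-case a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_terms(text, dictionary):
    """Separate tokens whose stem is in the skill dictionary from those that are not.

    The unassigned tokens are the candidates that would go to manual review.
    """
    known_stems = {crude_stem(term) for term in dictionary}
    matched, unassigned = [], []
    for token in re.findall(r"[A-Za-z]+", text):
        if crude_stem(token) in known_stems:
            matched.append(token)
        else:
            unassigned.append(token)
    return matched, unassigned


# Made-up dictionary and advertisement text.
matched, unassigned = split_terms("Tested welds, blockchain", ["welding", "testing"])
```

Here "Tested" and "welds" match the dictionary entries after stemming, while "blockchain" remains unassigned and would be passed on for expert classification as a new term, a known term, or not a skill requirement.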



Measuring employment rate using job vacancy indicator

Mr Irfan Sampe (Bank Indonesia)
Mrs Anggraini Widjanarti (Bank Indonesia) - Presenting Author
Ms Arinda Dwi Okfantia (Bank Indonesia)

The way people look for jobs in Indonesia is changing rapidly. The internet is the preferred way for Indonesian job seekers to look for work, as shown by the number of visits to two of the largest online job vacancy sites in Indonesia, which has reached 10.81 million per month. Because job vacancy ad data are publicly available, many countries use them to construct labor indicators, especially for labor demand. Currently, labor indicators in Indonesia are very scarce. They rely solely on the National Labour Force Survey, which is conducted semi-annually and published two and a half months after the survey. It is therefore pivotal to develop timely labor indicators for Indonesia. In this paper, we study employment rates and their interconnection with labor demand, which we extract from job vacancy ad data. Our results show that a job vacancy indicator helps to address the scarcity of labor indicators in Indonesia; specifically, it can be used as a prompt indicator of employment rates in Indonesia.