Communication strategies for the recovery of non-response: the example of surveys on Building Permits - Federica Pellizzaro, Gloria Carpita and Valerio Torcasio (Istat, Italy)

Surveys on Institutions involve different types of respondents with different characteristics: Institutions operating in the social-health and social-care area, and local Institutions such as regions, provinces and municipalities.

Among the surveys on Institutions conducted by Istat, some have an infra-annual frequency (monthly or quarterly) and are defined as "conjunctural surveys". They are characterized by a formal communication process that follows a systematic procedure, from the first contact with the respondent to the recovery of non-response; response is mandatory and, in some cases, failure to respond carries a pecuniary sanction. Two examples of this type of survey are the Survey on Building Permits and the Quick Survey on Building Permits. Both involve the Italian municipalities, which validate online questionnaires every month to make them available to Istat. The surveys collect information on new residential and non-residential building projects and on enlargements of pre-existing buildings, with the aim of producing a set of indicators and variables that are transmitted to Eurostat.

The first contact with the respondent is an informative letter describing the aim of the survey, the mandatory nature of the response, the contacts to turn to in case of problems in filling in the questionnaire, how the information provided will be used, and so on. Despite its formal language and style, the letter stresses how important each participant's response is for the success of the survey.

Since 2019, when the survey was taken over by the Istat Data Collection Directorate, data collection management and communication strategies have evolved. Specifically, the improvement of automated IT monitoring procedures has ensured higher data quality.

One of the most successful organizational innovations is the use of Certified Electronic Mail (PEC) to manage communications, reminders and other notices to the survey units (municipalities), with the aim of minimizing the number of non-responses and late monthly data deliveries. PEC gives the sender certainty that a message has been sent and delivered securely, with the same legal value as a registered letter with acknowledgment of receipt. We worked on drafting the texts of the communications to be sent to respondents and on designing a strict calendar for the mass sending of reminders and warnings via PEC and ordinary mail, aimed at preventing respondent units from failing to respond.

This scheduled communication process ensures more timely data collection and aims to reduce the number of municipalities sanctioned for non-compliance in both surveys. In each scheduled contact, the respondent is invited to send the data for a specific month that have not yet been received, so that Istat can comply with the deadlines laid down by law.

To stimulate participation and help respondents avoid a pecuniary sanction, three notices are sent during the non-response recovery phase, via PEC and to the institutional e-mail address: a reminder shortly before the deadline for the monthly data delivery (7 days before that date), a reminder shortly after the deadline (7 days after it), and a last notice close to the final date for avoiding a penalty (14 days after the deadline).
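The reminder calendar described above can be sketched in a few lines. This is a minimal Python illustration, not the procedure used by Istat: the function name, the dictionary keys and the example deadline are assumptions, and only the three offsets (7 days before, 7 and 14 days after the deadline) come from the text.

```python
from datetime import date, timedelta

# Offsets in days relative to the monthly delivery deadline (from the text).
# The key names are hypothetical labels chosen for this sketch.
REMINDER_OFFSETS = {
    "pre_deadline_reminder": -7,
    "post_deadline_reminder": 7,
    "final_notice": 14,
}

def reminder_schedule(deadline: date) -> dict:
    """Return the date on which each notice would be sent for one deadline."""
    return {
        name: deadline + timedelta(days=offset)
        for name, offset in REMINDER_OFFSETS.items()
    }

# Example with an assumed deadline of 31 March 2023.
schedule = reminder_schedule(date(2023, 3, 31))
for name, send_on in schedule.items():
    print(name, send_on.isoformat())
```

Keeping the offsets in a single table makes it easy to adjust the calendar if the sanction rules change.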

In addition, we adopted a systematic procedure for sending reminders to non-responding municipalities. To implement it, a standardized procedure was developed in SAS (Statistical Analysis System), a command-driven software package for statistical analysis and data visualization. The procedure shows, in real time, which municipalities are in default on a specific date, so that letters can be addressed precisely to the municipalities concerned and can include specific details on the missing data and on the survey.
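As an illustration of how such a default list can be derived, the sketch below compares the municipalities expected to respond with the deliveries received by a given date. It is written in Python rather than SAS, and the municipality codes, data structures and function name are hypothetical.

```python
from datetime import date

def units_in_default(expected_units, deliveries, month, as_of):
    """Return the units with no delivery for `month` received on or before `as_of`.

    expected_units: iterable of municipality codes (hypothetical identifiers).
    deliveries: dict mapping (municipality code, reference month) -> date received.
    """
    delivered = {
        unit
        for (unit, ref_month), received_on in deliveries.items()
        if ref_month == month and received_on <= as_of
    }
    return sorted(set(expected_units) - delivered)

# Invented example data: three municipalities, two deliveries for January 2023.
expected = ["M001", "M002", "M003"]
deliveries = {
    ("M001", "2023-01"): date(2023, 2, 10),
    ("M002", "2023-01"): date(2023, 3, 5),  # arrives after the check date
}
defaulting = units_in_default(expected, deliveries, "2023-01", date(2023, 2, 28))
print(defaulting)
```

The resulting list carries the unit and the missing month, which is the detail a targeted reminder letter needs.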

We worked on the design, implementation and routine management of generalized SAS procedures for preparing, normalizing and standardizing the lists of municipalities for the various mass mailings, for monitoring data collection activities, and for simplifying manual procedures.

In this context, further SAS procedures were created to produce survey monitoring tables for verifying the response rate and to analyse the distribution of responding municipalities relative to the deadlines for sending the data.

As part of this activity, and following the new, complex sanctionability criteria introduced for short-term statistics, we also prepared the lists of units in breach of the obligation to respond. The IT systems and procedures (Oracle and SAS) needed to monitor subsequent editions of the survey were prepared accordingly, and structured output files were produced according to the layouts agreed with the initiative that prepares the assessment files.

The SAS procedures for extracting the lists of defaulting units follow the complex sanction criteria applied in the economic context, based on a dual criterion: compliance with each monthly deadline, allowing a tolerance period, and a maximum admissible cumulative delay over the year. The lists of survey units subject to the sanctioning procedure were then created for the units that failed to transmit the data within the times established by the survey information.
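One possible reading of the dual criterion is sketched below in Python. The text does not give the actual thresholds or state how the two conditions are combined, so the tolerance of 14 days, the annual limit of 60 days and the use of a logical "or" are all assumptions made for illustration.

```python
def is_sanctionable(monthly_delays, tolerance_days=14, max_cumulative_days=60):
    """Apply a dual criterion to one unit's monthly delays (in days).

    monthly_delays: days between each monthly deadline and the actual delivery
    (0 or negative when the data arrived on time).
    tolerance_days and max_cumulative_days are hypothetical thresholds.
    """
    delays = [max(d, 0) for d in monthly_delays]  # early deliveries count as 0
    # Criterion 1: any single month delivered beyond the tolerance period.
    beyond_tolerance = any(d > tolerance_days for d in delays)
    # Criterion 2: total delay over the year above the admissible maximum.
    over_annual_limit = sum(delays) > max_cumulative_days
    return beyond_tolerance or over_annual_limit

print(is_sanctionable([0] * 12))         # always on time
print(is_sanctionable([20] + [0] * 11))  # one month beyond the tolerance
print(is_sanctionable([6] * 12))         # small delays that add up over the year
```

The third case shows why a cumulative limit matters: no single month exceeds the tolerance, yet the unit is systematically late.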

Consequently, this procedure has given excellent results in both the reminder phase and the formal notice phase. Over time we have seen a substantial increase in data sent within the established time, and the entire process, aimed at acquiring the greatest possible amount of information, has led to a net decrease in the number of non-responses and of units sanctioned.

Good communication is not just about providing respondents with information about their obligations and deadlines. Respondents need to be sure that the data they submit really matter for statistics and for the community, and that the survey is carried out by a public entity that will make these data available for development projects in the country.

Improvement data collection for the Italian Road Accident Survey with injured and dead persons - Francovich L, Santorsa M. and Ielpo R. (Istat, Italy)

UNECE Expert Meeting on Statistical Data Collection 2023

12 – 14 June 2023

Data collection improvement for the Italian Road Accident Survey with fatalities and injuries 2022 [1]

Francovich Lisa [2], Istat - Italian National Statistical Institute, [email protected]

Santorsa Maria I. [3], Istat - Italian National Statistical Institute, [email protected]

Ielpo Roberto [4], Istat - Italian National Statistical Institute, [email protected]

Abstract

The reorganization carried out at Istat in 2021 significantly changed the role and functions of the Istat territorial offices, with an important impact on the activities of the Central Directorate for Data Collection (DCRD). It made it necessary to review the data production processes and adapt them to the new organizational context, in particular those processes that had been decentralized to the territory over the years, as was the case for the Road Accidents Survey. In this work we focus on the new data collection methods applied in 2022. The aim is to describe them and to highlight how they can guarantee and improve efficiency in some phases of the process, at a time of transition towards a new organizational model.

Keywords

Data collection, road accidents, process efficiency, quality of statistics, data correction, respondent

1. Introduction

Istat provides the information framework on road accidents in Italy through two surveys. The first is a monthly survey that collects detailed information on road accidents with fatalities and personal injuries, aimed at deepening knowledge of the phenomenon. The second is a quarterly survey, carried out with the collaboration of the Municipal Police in about 200 municipalities throughout the country, that collects summary data on the number of accidents, deaths and injuries in order to produce preliminary estimates on road accidents in urban areas. Both are categorized as surveys of public interest, are included in the National Statistical Program, and entail an obligation to respond for public entities.

This paper focuses on the monthly survey, with the specific objective of describing the measures taken to standardize and automate the quality control of the collected data and the correction activities carried out by re-contacting respondents. To understand these actions, it is important to start from the specific features of the survey and the context in which they took place.

2. The monthly Road Accidents with personal injuries and fatalities Survey and its organizational context

The survey is carried out with the collaboration of the police forces responsible for traffic control

and traffic regulation on the roads, mainly Traffic Police, Carabinieri Stations and Municipal police.

[1] Extended abstract.
[2] Paragraphs 1 and 4.
[3] Paragraphs 2, 3 and 5.
[4] Support in the production of the document and production of tables and graphs.

Based on the definition of road accident [5] established by international standards and adopted in Italy, the road accidents within the survey field are those recorded by a Police Authority, occurring on streets or squares open to public traffic, involving at least one vehicle, and resulting in injuries or fatalities (within 30 days). Road accidents that do not result in fatalities or injuries, that do not occur in public traffic areas, or that do not involve vehicles are therefore excluded from the survey. The survey unit is the single road accident with fatalities or injuries to persons, and the information collected refers to the time when the accident occurred.

For each road accident, the Police Authority that recorded it must transmit to Istat a series of detailed information in order to: locate the accident in time (date and time) and space (municipality, type and name of the road); describe the road characteristics (pavement, road surface, intersection/straight road, presence and type of road signs) and the weather and light conditions; reconstruct the accident dynamics by specifying its nature, the circumstances that presumably caused it, and the type and characteristics of the vehicles involved; and specify the information on the drivers of the vehicles involved (age, sex, nationality, type of driving license) and the consequences for persons (name of the injured or dead and the hospital to which they were taken).

The survey is carried out with the cooperation of ACI (Automobile Club of Italy) and other local organizations in a complex and articulated context. Since 1999, Istat has strengthened its collaboration at local level with provincial (NUTS3 level) and regional (NUTS2 level) authorities that actively participate in the survey phase through special agreements (Memorandum of understanding and Bilateral Conventions). In addition, since 2007 the survey has been progressively decentralized at regional level, involving the Istat Territorial Offices present in all regions (henceforth referred to as UTs) in order to improve the coverage and quality of the collected information. This process concerned the Umbria, Campania, Basilicata, Marche, Molise and Abruzzo regions.

Three organizational models characterized the survey until 2021 (Figure 2):

- Standard flow, with data sent directly by the Municipal Police to Istat, adopted in Valle d'Aosta/Vallée d’Aoste, Sicilia and Sardegna;

- Decentralization to UTs of data collection, as well as of monitoring, control and correction activities (Umbria, Campania, Basilicata, Marche, Molise and Abruzzo);

- Decentralization to Province and Region authorities of data collection and monitoring, adopted in the regions adhering to the Memorandum of understanding (Toscana, Piemonte, Lombardia, Emilia-Romagna, Puglia, Friuli-Venezia Giulia, Veneto, Liguria, Calabria and Lazio) and, under the Bilateral Conventions, in the Autonomous Provinces of Bolzano/Bozen and Trento.

Organizational specificities in the territory have also led to the adoption of a flexible data flow system. At present there are different ways and timings for sending data to Istat (Figure 1): the Traffic Police and the Carabinieri stations use a decentralized model on a national basis, irrespective of the Region or Province agreements with Istat, while the Municipal Police use both the decentralized model and direct data sending to Istat (Standard flow). Carabinieri stations and Municipal Police transmit data monthly, while the Traffic Police transmit data to Istat on a quarterly basis.

[5] International definitions (European Commission, Eurostat, OECD, ECE, etc.) of road accident state that a road accident is “that event in which at least one vehicle is involved and that happened on the road network and that causes fatalities or injuries to people” (Vienna Conference, 1968).

Figure 1 - Data flow system from Police authorities to Istat (standard model and decentralized model)

The channels used for data transmission consist mainly of two data acquisition systems:

- GINO++ (Online Survey Management), dedicated to the Municipal Police. Through this system, police forces can transmit road accident data by entering them in an online data entry form or by uploading the files generated by their management software. The system, in use since 2019, ensures the correctness of the information collected thanks to internal quality and consistency checks that prevent the delivery of partial or incorrect information.

- INDATA portal, addressed to local police corps that use their own management software. Data files are uploaded using a record layout prepared by Istat; since data entry is not controlled, these files need to be reviewed and corrected. The system is used by the Municipal Police forces that have not yet adapted to the new GINO++ standards, by the Carabinieri and by the Traffic Police.

The organizational structure described so far was further modified by a major reorganization that involved Istat and its UTs in September 2021. Starting with the 2022 survey, data collection activities in the regions where the survey was decentralized to the UTs (Abruzzo, Basilicata, Campania, Marche, Molise and Umbria) and in the standard flow regions (Sicilia, Sardegna and Valle d'Aosta/Vallée d’Aoste) passed to the Central Directorate for Data Collection (DCRD), specifically to the 'RD-Road Accidents' working group, which is part of the Data Collection Service for Demographic, Social and Welfare Statistics (RDH). The new organizational framework currently has two macro-areas: the regions adhering to the Memorandum of understanding or special agreements (11 regions in total) and the nine regions centrally managed by DCRD-RDH [6] (Figure 2).

[6] The transfer of the survey to RDH took place gradually, as the data collection activities for a given data year t are carried out from March of year t to May of year t+1, to allow UTs to complete the activities related to the 2021 survey. It began in January 2022 in Abruzzo, Basilicata, Molise, Sardegna, Sicilia and Valle d'Aosta/Vallée d’Aoste, where

Figure 2 – Road Accident Survey: organizational models, before and after Istat 2021 re-organization

3. The Road Accident Survey in the new organizational framework: main objectives and actions

The latest organizational changes required a redefinition of production processes in order to adapt them to the new context. The transition of survey management from many subjects in the territory (UTs) to a single entity (DCRD-RDH) also made it necessary to review the organizational system of the survey, which in the regions with UT decentralization followed different models, with different impacts on the processing of the collected data and on the timing and type of datasets returned to the thematic service (DCSW-SWC) [7].

A reconstruction carried out in collaboration with the thematic service DCSW-SWC showed that in some UTs the decentralization process concerned only the monitoring and recovery of total non-responses from the Municipal Police; in other UTs it involved all the police corps (Municipal Police, Carabinieri, Traffic Police) but only some stages of the process; in others still it covered both aspects, with a positive impact on the quality of the data collected and on the timeliness and coverage of the information produced.

The change in the survey organization was, on the one hand, a necessity and a major challenge, given the high level of quality and efficiency achieved in some regions; on the other hand, it was an important opportunity to standardize and harmonize data collection across a territory of nine regions, with about 2,300 municipalities, an annual average of over 36,500 accidents, and a share of fatal accidents equal to 25% of the value recorded at the national level (Table 1).

RDH was responsible for completing the 2021 survey, and concluded in August 2022 with the handover to DCRD-RDH of the Campania, Marche and Umbria regions for the 2022 survey.

[7] Central Directorate for Data Production - Integrated Service for health, care and welfare system.

Table 1 - Number of road accidents collected in the regions assigned to DCRD-RDH in 2022. Years 2010-2021. Absolute values

Total road accidents

Region | Number of towns | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021
Valle d'Aosta/Vallée d’Aoste | 74 | 370 | 299 | 295 | 315 | 295 | 283 | 285 | 256 | 267 | 313 | 194 | 247
Umbria | 92 | 2.913 | 2.856 | 2.363 | 2.402 | 2.258 | 2.285 | 2.382 | 2.361 | 2.385 | 2.306 | 1.699 | 2.001
Marche | 228 | 6.728 | 6.535 | 5.482 | 5.549 | 5.422 | 5.333 | 5.185 | 5.484 | 5.216 | 5.399 | 3.695 | 4.663
Abruzzo | 305 | 4.099 | 4.058 | 3.671 | 3.603 | 3.429 | 3.217 | 3.037 | 2.946 | 3.145 | 3.160 | 2.205 | 2.729
Molise | 136 | 657 | 639 | 581 | 507 | 511 | 461 | 479 | 510 | 478 | 555 | 378 | 421
Campania | 550 | 11.129 | 10.225 | 9.698 | 9.103 | 9.182 | 9.111 | 9.780 | 9.922 | 9.721 | 10.058 | 7.088 | 9.014
Basilicata | 131 | 1.147 | 1.054 | 949 | 888 | 936 | 936 | 945 | 848 | 979 | 903 | 677 | 918
Sicilia | 390 | 14.255 | 13.283 | 11.790 | 11.823 | 11.366 | 10.864 | 11.067 | 11.056 | 11.019 | 10.702 | 8.053 | 9.943
Sardegna | 377 | 4.206 | 3.785 | 3.472 | 3.664 | 3.492 | 3.537 | 3.508 | 3.425 | 3.461 | 3.633 | 2.479 | 3.200
Total | 2283 | 45.504 | 42.734 | 38.301 | 37.854 | 36.891 | 36.027 | 36.668 | 36.808 | 36.671 | 37.029 | 26.468 | 33.136
Italy | 7904 | 212.997 | 205.638 | 188.228 | 181.660 | 177.031 | 174.539 | 175.791 | 174.933 | 172.553 | 172.183 | 118.298 | 151.875

Road accidents with fatalities

Region | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021
Valle d'Aosta/Vallée d’Aoste | 11 | 9 | 10 | 7 | 13 | 6 | 3 | 7 | 9 | 4 | - | 1
Umbria | 74 | 59 | 48 | 57 | 45 | 59 | 33 | 44 | 43 | 50 | 43 | 52
Marche | 106 | 120 | 95 | 79 | 98 | 92 | 97 | 90 | 86 | 93 | 67 | 81
Abruzzo | 78 | 78 | 86 | 67 | 72 | 77 | 75 | 66 | 73 | 75 | 56 | 73
Molise | 27 | 18 | 17 | 22 | 25 | 21 | 15 | 27 | 12 | 21 | 24 | 15
Campania | 235 | 232 | 229 | 213 | 208 | 215 | 208 | 235 | 193 | 205 | 170 | 203
Basilicata | 45 | 31 | 42 | 20 | 39 | 40 | 40 | 29 | 36 | 26 | 18 | 33
Sicilia | 260 | 247 | 211 | 229 | 192 | 211 | 179 | 197 | 195 | 194 | 155 | 205
Sardegna | 97 | 91 | 90 | 111 | 91 | 103 | 99 | 84 | 99 | 69 | 89 | 86
Total | 933 | 885 | 828 | 805 | 783 | 824 | 749 | 779 | 746 | 737 | 622 | 749
Italy | 3.871 | 3.616 | 3.515 | 3.161 | 3.175 | 3.236 | 3.105 | 3.178 | 3.086 | 2.982 | 2.275 | 2.737

To improve the process, the main task of DCRD-RDH was to ensure a harmonized and standardized data collection process and related activities, overcoming the differences between regions without giving up the good practices adopted in the territory.

Initially, the plan was to use the complete organizational model, which covers the entire data collection process and all the local police corps and which, before 2022, was implemented only in Basilicata, Campania and Umbria. Subsequently, at the request of the DCSW-SWC service, data collection was restricted to the Municipal Police, thus excluding accidents recorded by the Traffic Police and Carabinieri stations. The reasons are essentially linked to the timing of data delivery, given the need of the DCSW-SWC service to bring forward the return by RDH of the so-called "annual consolidated data", that is, the complete data, checked and corrected by contacting respondents [8]. They are also linked to the need to reduce the statistical burden on respondents and to ensure that deadlines are met at the different stages of the process.

Therefore, the commitment of RDH from 2022 concerned the process of data collection from

Municipal Police with the dual objective of ensuring the total coverage of the survey (also using a

dedicated call center service for inbound and outbound activity with respondents) and the quality of

the data collected.

Specifically, the following activities have been undertaken:

- Collection of information from Municipal police;

- Training, assistance and support during data collection and as regards the use of GINO++;

[8] The decentralization of the survey to UTs provided for the complete annual data to be sent by May 31st of the year following the reference year. With the switch of data collection management to the DCRD-RDH service, the DCSW-SWC team asked to bring the data transmission forward to the first half of April of year t+1 and to restrict the quality control and the total coverage control to the accidents reported by the Municipal Police.

- Monitoring and control of the survey total coverage [9];

- Contacts with the local police forces aimed at recovering the total non-response;

- Quality control of the data collected and contact with the Municipal police to correct the

errors found in the data transmitted;

- Final control of the over/under coverage of the phenomenon on the basis of historical data series and other sources, with consequent contact of local police forces in case of significant differences [10].
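The final coverage control in the list above, which compares collected totals with historical series, could look like the following Python sketch. It is only an illustration: the 30% threshold, the use of a simple historical mean as baseline and all names and figures are assumptions, since the text does not say what counts as a "significant difference".

```python
from statistics import mean

def flag_coverage_anomalies(current, history, threshold=0.30):
    """Flag units whose current count deviates from the historical mean.

    current: dict unit -> number of accidents collected this year.
    history: dict unit -> list of counts from previous years.
    threshold: relative deviation treated as significant (hypothetical value).
    """
    flagged = {}
    for unit, past_counts in history.items():
        baseline = mean(past_counts)
        if baseline == 0:
            continue  # no meaningful relative change for an empty baseline
        change = (current.get(unit, 0) - baseline) / baseline
        if abs(change) > threshold:
            flagged[unit] = round(change, 3)
    return flagged

# Invented counts for two municipalities.
history = {"A": [100, 110, 90], "B": [50, 50, 50]}
current = {"A": 60, "B": 52}
flags = flag_coverage_anomalies(current, history)
print(flags)
```

A flagged unit is not necessarily wrong; the flag only marks where a contact with the local police force may be worthwhile.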

The quality control of the collected data consists in verifying that the information required in the questionnaire for each accident is complete and internally consistent. In some cases, Municipal Police forces are contacted to request clarification and to correct and/or complete missing or incorrect information.

This activity concerns the Municipal Police data files transmitted through the INDATA portal, while the data transmitted through GINO++ are excluded from quality control. Although very few Municipal Police forces use INDATA, the proportion of accidents they transmit is still relevant. The analysis of the data collected from Municipal Police forces in the regions followed by RDH in 2022 shows that the INDATA portal is used by just 1% of Municipal Police forces but carries a significant share of the accidents transmitted, equal to 23.5% of the total: over 4,900 accidents transmitted through INDATA by only 21 local police forces, all operating in municipalities with at least 20,000 inhabitants (Table 2).

Table 2 - Number of Municipal Police forces and percentage of road accidents by data transmission system. Year 2022

Region | Municipal Police: GINO++ | Municipal Police: INDATA | Municipal Police: INDATA-GINO++ (a) | Municipal Police: Total | % records transmitted: INDATA | % records transmitted: GINO++ | % records transmitted: Total
Valle d'Aosta/Vallée d’Aoste | 74 | - | - | 74 | - | 100,0 | -
Umbria | 91 | - | 1 | 92 | 1,7 | 98,3 | 100
Marche | 214 | 8 | 3 | 225 | 45,9 | 54,1 | 100
Abruzzo | 304 | 1 | - | 305 | 24,6 | 75,4 | 100
Molise | 136 | - | - | 136 | - | 100,0 | -
Campania | 544 | 4 | 2 | 550 | 12,5 | 87,5 | 100
Basilicata | 128 | 3 | - | 131 | 79,6 | 20,4 | 100
Sicilia | 386 | 4 | 1 | 391 | 25,4 | 74,6 | 100
Sardegna | 376 | 1 | - | 377 | 26,9 | 73,1 | 100
TOTAL | 2.252 | 21 | 7 | 2.281 | 23,5 | 76,5 | 100

(a) Municipal Police forces that migrated to GINO++ during the year.

Data checking and correction is challenging, especially with a long and complex questionnaire like the one used in this survey, in which many variables are subject to dissemination. Specific iterative SAS® procedures were therefore developed to prepare an accurate error map that simplifies the correction activity (carried out by re-contacting the Municipal Police). They aim to:

a) identify duplicate records and records outside the survey field [11];

b) extract records that are incorrect or do not meet minimum quality requirements;

c) identify and describe the errors in each record.
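Step a) can be illustrated with a short Python sketch that separates duplicates and out-of-field records from the rest. It is not the SAS® procedure itself: the field names and the key used to recognise a duplicate (municipality, date, time and road) are assumptions, while the survey field test follows the definition given in the paper (at least one injury or death, at least one vehicle, a public traffic area).

```python
def screen_records(records):
    """Split records into kept, duplicate and out-of-field lists."""
    seen = set()
    kept, duplicates, out_of_field = [], [], []
    for rec in records:
        # Hypothetical duplicate key: same place and moment of the accident.
        key = (rec["municipality"], rec["date"], rec["time"], rec["road"])
        if key in seen:
            duplicates.append(rec)
            continue
        seen.add(key)
        in_field = (
            rec["injured"] + rec["deaths"] > 0
            and rec["vehicles"] >= 1
            and rec["public_road"]
        )
        (kept if in_field else out_of_field).append(rec)
    return kept, duplicates, out_of_field

# Invented records: one valid accident, its duplicate, one with no injuries.
first = {"municipality": "A", "date": "2022-03-01", "time": "10:15",
         "road": "Via Roma", "injured": 1, "deaths": 0,
         "vehicles": 2, "public_road": True}
no_injury = {"municipality": "B", "date": "2022-03-02", "time": "18:40",
             "road": "Via Po", "injured": 0, "deaths": 0,
             "vehicles": 1, "public_road": True}
kept, duplicates, out_of_field = screen_records([first, dict(first), no_injury])
print(len(kept), len(duplicates), len(out_of_field))
```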

[9] In the absence of accidents with fatalities or injuries, the Municipal Police must transmit to Istat a communication of “negative outcome”. The control of total coverage, aimed at reducing total non-response error, is carried out during the data collection phase by monitoring the number of road accidents at municipal level and soliciting missing monthly data.

[10] This check is carried out at the end of the data collection phase by comparing the total number of road accidents (absolute values and percentages) per municipality, per local police corps and per month with the corresponding historical series and with the quarterly survey (number of accidents, deaths and injuries).

[11] Accidents with no fatalities or injuries, with no vehicle involved, or that did not occur in a public traffic area are outside the survey field.

Moreover, to allow the Municipal Police to complete and correct records independently, a specific area in GINO++, separate from the production area, is under construction [12]. Its use would avoid contact between Istat and the Municipal Police for error correction and would allow the correct data to be acquired in real time and securely. The joint use of the SAS® control procedures and the GINO++ correction area would speed up and simplify the whole process through the following steps:

- the SAS® procedures check the data files received on the INDATA portal and extract the

records with errors;

- incorrect data for each accident will be uploaded by Istat in the new GINO++ area dedicated

to correction;

- an email will be sent to local police corps indicating that there is information to be

corrected;

- local police corps will connect to the GINO++ specific area and make the correction by

intervening only on the variables that the system reports as incorrect.

Before the SAS® mapping procedures were developed, it was necessary to select the variables to be corrected by re-contacting the Municipal Police, as well as the errors to take into account. The SAS® mapping procedures cover all the variables of the questionnaire, while the correction activity, in agreement with DCSW-SWC, was restricted to a minimum set of variables: those related to date, time, place, location, nature and presumed circumstances of the accident, vehicles and injured persons. This selection sought a fair compromise in the cost-benefit evaluation, that is, between the need to produce a dataset as complete and 'clean' as possible and the need to reduce the statistical burden on respondents, knowing that re-contacting the local police corps would otherwise generate an excessive response load, with a negative impact on the organization and speed of the survey.

Since the GINO++ correction area is still under development, the correction of the 2022 data during data collection was carried out by re-contacting the Municipal Police. To carry out this activity (which took place at different moments) in the best way, and to facilitate communication with the Municipal Police forces while limiting errors as much as possible, the re-contact work was divided among colleagues, assigning each of them the same municipalities in the different correction rounds and making available to them: the descriptive error map, containing the data identifying the accidents to correct, the variables in error and a description of the errors; the database of accidents to correct; and the Access questionnaire form, where the questionnaire is displayed and corrections can be made.

Re-contacting respondents is challenging and requires professionalism, attention and mastery of both the questionnaire contents and the tools used for data correction. Particular attention was therefore paid to training the colleagues in charge of the correction, who were prepared for the job through a training seminar dedicated to deepening their knowledge of the questionnaire and of the tools, with the help of simulations and practical exercises.

4. The computerized procedure for collected data quality monitoring

In the survey waves before 2022, the control of information completeness and correctness during the data collection phase was set up in some regions with different tools (SAS, Access, Excel), according to the informative and operational needs that emerged locally during data collection. During the 2022 data collection, given the allocation of nine regions to DCRD-RDH, it was necessary to adopt a procedure that would allow the data quality control process to be managed in a more systematic and comprehensive way. The guiding concepts were standardization, simplification and, where possible, automation of all the activities in the process. An error mapping was therefore structured to be as exhaustive and systematic as possible ("internal" error profile, Filippucci C., 2002). Below we present the error mapping and the logical path followed in defining the criteria and methods for re-contacting the Municipal Police.

[12] A separate area in GINO++ is necessary because the GINO++ record layout differs from the record layout of the files transmitted through INDATA, not only in structure (4 csv files are needed in GINO++ to describe one accident) but also in the information collected. In addition, the questionnaire in GINO++ can handle many vehicles involved in a road accident, while the INDATA text file is structured to contain information on a maximum of three vehicles.

The error classification resulting directly from the structure of the questionnaire (Manzari A. 2022

and Istat 2004) and from a priori knowledge on variables is described below:

A. Missing errors: a variable has a missing value. This can happen in two cases: when the variable must be present in all records, and when it is conditional, that is, when it is required only depending on the answer to a filter question.

B. Domain errors: the variable has an ineligible value, that is, one outside the range of its possible values given the answer categories in the questionnaire.

C. Not-due answers (NDA): given a filter variable, an NDA occurs when a question that the filter excludes has nevertheless been answered.

D. Incompatibilities between variables: consistency errors, also called 'conditions of incompatibility between variables'.
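The four error classes can be made concrete with a small rule checker. The Python sketch below is an illustration only: the variable names, code lists and the single rule shown for each class are invented, whereas the actual SAS® procedures apply 367 rules to about 180 variables.

```python
# Hypothetical code lists and required variables, invented for this sketch.
DOMAINS = {"road_type": {1, 2, 3}, "weather": {1, 2, 3, 4}}
REQUIRED = ["road_type", "weather", "n_vehicles"]

def map_errors(record):
    """Return a list of (error class, variable) pairs found in one record."""
    errors = []
    n_vehicles = record.get("n_vehicles")
    # A. Missing: variables that must be present in every record.
    for var in REQUIRED:
        if record.get(var) is None:
            errors.append(("missing", var))
    # A. Missing under condition: second vehicle type is due when 2+ vehicles.
    if n_vehicles is not None and n_vehicles >= 2 and record.get("vehicle_b_type") is None:
        errors.append(("missing", "vehicle_b_type"))
    # B. Domain: value outside the admissible code list.
    for var, allowed in DOMAINS.items():
        value = record.get(var)
        if value is not None and value not in allowed:
            errors.append(("domain", var))
    # C. Not-due answer: second vehicle described although only one is declared.
    if n_vehicles is not None and n_vehicles < 2 and record.get("vehicle_b_type") is not None:
        errors.append(("not_due", "vehicle_b_type"))
    # D. Incompatibility: a collision between vehicles needs at least two vehicles.
    if record.get("nature") == "vehicle_collision" and n_vehicles is not None and n_vehicles < 2:
        errors.append(("incompatibility", "nature/n_vehicles"))
    return errors

bad = {"road_type": 9, "weather": None, "n_vehicles": 1,
       "vehicle_b_type": 3, "nature": "vehicle_collision"}
clean = {"road_type": 1, "weather": 2, "n_vehicles": 2,
         "vehicle_b_type": 3, "nature": "vehicle_collision"}
bad_errors = map_errors(bad)
clean_errors = map_errors(clean)
print(bad_errors)
print(clean_errors)
```

Returning the error class together with the variable name gives the kind of descriptive error map that can be handed to the staff re-contacting the Municipal Police.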

Table 3 displays the 2022 percentage distribution of the accidents that have been mapped, that is, the data acquired through the INDATA portal, by local police corps [13].

Table 3 - Percentage of road accidents sent to DCRD-RDH through INDATA, by region and by local police corps. Year 2022

Region | Traffic police | Carabinieri | Municipal police | Total | Total (% of all regions)
Abruzzo | 27,9 | 53,1 | 18,9 | 100 | 9,3
Basilicata | 17,1 | 51,6 | 31,3 | 100 | 4,4
Campania | 28,9 | 56 | 15 | 100 | 24,1
Marche | 25,1 | 41,4 | 33,5 | 100 | 18,5
Molise | 28,8 | 71,1 | 0 | 100 | 1,6
Sardegna | 19,8 | 58,1 | 22,1 | 100 | 10,7
Sicilia | 25,6 | 36,7 | 37,7 | 100 | 25,1
Umbria | 36,3 | 61,5 | 2,2 | 100 | 5,2
Valle d'Aosta/Vallée d'Aoste | 31,8 | 68,2 | 0 | 100 | 1,1
Total | 26,2 | 48,9 | 24,9 | 100 | 100

The accident record layout contains about 180 variables, several of them logically interdependent14. Writing in SAS® the control rules (error definitions) described above for each variable, we ended up with a considerable number of possible errors (367 in total): 181 missing, 140 domain errors, 7 not-due answers and 39 incompatibilities. The application of these rules to the datasets sent by local police corps leads in theory to two sets of records: those with at least one error and those without any error.

The analysis of the incorrect records by type of error in the nine regions managed by DCRD-RDH is displayed in Table 4, distinguishing between all police corps (Municipal Police, Traffic Police and Carabinieri) and the Municipal Police alone. The table also reports the percentage of duplicated records and of 'off-field observations' (OFO)15.

13 Data on the number of accidents in 2022 are deliberately expressed in percentage values, as they have not yet been disseminated.

14 The high number of variables is due to the fact that information about vehicles, drivers and passengers is repeated for all vehicles involved in the accident, up to a maximum of three.

Table 4 - Percentage of errors, by type of local police corps. Regions assigned to DCRD-RDH. Year 2022

Type of error | All local police corps | Municipal Police
Number of accidents (absolute value) | 19.327 | 4.811
At least 1 missing error | 100% | 100%
At least 1 domain error | 11,90% | 26,80%
At least 1 NDA | 6,50% | 5,70%
At least 1 incompatibility | 26,40% | 10,90%
Duplicated records (absolute value) | 16 | 2
Off-field observations (OFO) | 0,20% | 0,08%
Re-contacts | 49,50% | 42,50%

The quality control of the data was carried out on all the variables of 19,300 accident records, of which approximately 4,800 came from the Municipal Police. The analysis of errors shows that: all the records contain at least one missing error; 11,9% of records have a domain error, with a higher percentage for the Municipal Police (26,8%); at least one NDA error is recorded in 6,5% of all cases; and the presence of at least one incompatibility error is lower for the Municipal Police (10,9%) than overall (26,4%). At the territorial level, Sardegna and Sicilia show a regional peculiarity as regards domain errors. The other errors show no specific particularities. We now restrict the analysis to a subset of variables (those that are destined for dissemination and are of public interest), focusing on the fundamental information on the accident, its dynamics and its consequences. It should be noted that the following data concern all local police corps.

The missing error mapping shows that missing answers in the variables describing the timing of the accident are very few, only 9, all due to missing 'hour' and 'minute' variables. The localization on the territory (the Province and Municipality variables are never missing) and the nature of the accident also have very few missing errors, with less than 1% of answers missing. Missing errors on the 'type of vehicle' variable are also very low for vehicles A and B, 0,24% and 0,87% respectively. Missing errors occur mainly in variables related to the presumed circumstances of the accident: 13,4% of cases for vehicle A and 24,0% for vehicle B/pedestrian or obstacle. Missing values in at least one of the variables related to the driver of vehicle A (age, gender and accident consequences) are present in a negligible percentage, always below 2%; moreover, the simultaneous absence of this information affects 1,6% of cases. In the case of vehicle B, omissions have a greater impact but do not exceed 3,0%. These results, in the opinion of the authors, indicate that controls on the variables province, municipality, time, nature of the accident and the presence of at least one vehicle involved in the accident are basic features of most of the software used by all police corps.

Most NDA errors, which account for 6,5% of the total number of cases, relate to accidents between a moving vehicle and a parked one, where information about the parked vehicle has been entered although it is not required.

Domain errors (12,0%) mainly concern the circumstances of vehicle A (1,2%) or vehicle B (0,5%) and, in third position but with a much lower incidence, the nature of the accident (0,3%). There are no domain errors in the timing or in the localization of the accident. At the regional level, the highest incidence of domain errors is recorded in Sardegna (4,2%) and Sicilia (4,8%).

The most frequent errors are incompatibilities, which affect 26,4% of the records.

The greatest number of inconsistency errors is observed in the variables relating to the presumed circumstances of the accident, in particular the incorrect indication of the circumstances of vehicle B, pedestrian or obstacle depending on the nature of the accident; their percentage ranges from 1,7% to 9,2%. Errors in the compilation of these variables arise because the local police corps management systems often provide neither controls nor guided compilation to help in the form-filling phase and, moreover, because the presumed circumstances section of the questionnaire is less intuitive.

15 Cf. note 11.

The procedures run on the variables related to fatalities and injuries made it possible to identify non-eligible records (OFO): 35 records with no indication of dead or injured persons, concentrated particularly in Abruzzo and Marche. The procedure also identified 23 records with a mismatch between the total number of fatalities and injuries (reported in a specific summary section of the questionnaire) and the same number derived from the variables on the consequences of the accident (dead and injured persons) for drivers, vehicle passengers and pedestrians involved in the accident. The analysis of the Municipal police data also highlighted a specific feature of the town of Messina, where there are twenty records with at least one injured pedestrian but without any indication of the vehicle; after re-contact with the Municipal police, these cases turned out to be OFO.
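The two checks on fatalities and injuries described above can be sketched as follows. This is an illustrative Python sketch, not the authors' SAS® procedure; the field names (`total_dead`, `total_injured`, `persons`, `outcome`) are hypothetical, and the OFO rule shown here is only the "no dead or injured persons" case.

```python
def classify_outcomes(rec: dict) -> list:
    """Flag a record as off-field (OFO) or as having a summary mismatch.

    rec["total_dead"] and rec["total_injured"] stand for the summary section;
    rec["persons"] lists drivers, passengers and pedestrians, each with an
    "outcome" of "dead", "injured" or "unharmed" (hypothetical coding).
    """
    dead = sum(1 for p in rec["persons"] if p["outcome"] == "dead")
    injured = sum(1 for p in rec["persons"] if p["outcome"] == "injured")

    flags = []
    # OFO: neither the summary nor the person-level variables report any casualty.
    if rec["total_dead"] + rec["total_injured"] == 0 and dead + injured == 0:
        flags.append("OFO")
    # Mismatch: the summary totals differ from the totals derived from persons.
    if (rec["total_dead"], rec["total_injured"]) != (dead, injured):
        flags.append("summary_mismatch")
    return flags
```

Keeping the two flags separate lets a record be routed either to exclusion (OFO) or to re-contact (mismatch).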

As stated before, all the records received by Istat through INDATA undergo the error mapping procedure, but the re-contact of respondents for the correction of errors concerned only the Municipal police; in practice it became necessary in 42,4% of those cases, for a total of 2,072 records to be corrected. If we had extended the correction activity to all local police corps, the number of records to be corrected would have risen to over 9,000. The frequency distribution of errors for the accidents that needed corrections, by variable, region and type of error, is displayed in Table 7.

In conclusion, the use of mapping procedures has greatly simplified the process of quality control during data collection, making it possible to quickly identify wrong records and to produce a map of the errors for each record. The error analysis has made it possible to focus attention on the Municipal police, which is the most critical in terms of incidence and type of errors, and to identify systematic errors, allowing a focused re-contact of respondents. The procedures also allowed the production of an operational report (complete and easy to use) for the colleagues involved in the re-contact and error correction phase.
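As an illustration of how a record-level error map could be summarized into a regional frequency distribution of the kind reported in Table 7, here is a minimal Python sketch. The input layout (one `(region, variable, error_type)` tuple per detected error) is an assumption made for the example, not the structure of the authors' operational report.

```python
from collections import Counter, defaultdict


def error_distribution(error_map):
    """Percentage distribution of errors within each region.

    error_map: iterable of (region, variable, error_type) tuples,
    one tuple per detected error (hypothetical layout).
    Returns {region: {(variable, error_type): percent of that region's errors}}.
    """
    counts = defaultdict(Counter)
    for region, variable, error_type in error_map:
        counts[region][(variable, error_type)] += 1

    distribution = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        distribution[region] = {
            key: round(100 * n / total, 1) for key, n in counter.items()
        }
    return distribution
```

Each regional column then sums to roughly 100, up to rounding.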

Table 7 - Frequency of errors found in the variables to be corrected, by region and type of error - Municipal Police (regional columns in percentage values; total errors detected in absolute values and %)

Information about the accident: variables and type of error | Abruzzo | Basilicata | Campania | Marche | Sardegna | Sicilia | Umbria | Total errors detected (absolute values) | Total errors detected (%)

WHEN (date, time)
HOURS (missing/out of domain) | - | - | - | - | - | - | - | 0 | -
DAY (missing/out of domain) | - | - | - | - | - | - | - | 0 | -
MONTH (missing/out of domain) | - | - | - | - | - | - | - | 0 | -

WHERE (location of the accident)
ROAD TYPE (missing) | - | - | - | 0,3 | - | 0,1 | 6,7 | 4 | 0,1
ROAD TYPE (out of domain) | - | - | - | - | - | - | - | 0 | 0,0
ROAD NUMBER* (missing) | - | 12,7 | - | 15,4 | - | 0,9 | 3,3 | 86 | 2,5
STREET NAME (missing) | - | - | - | - | - | - | - | 0 | 0,0
PROGRESSIVE KILOMETERS* (missing) | - | 5,1 | 0,2 | 9,6 | 2,8 | 0,7 | 3,3 | 62 | 1,8

TYPE of accident
NATURE OF THE ACCIDENT (missing) | - | - | 0,2 | 0,6 | - | 3,2 | 6,7 | 63 | 1,9

VEHICLES involved
VEHICLE** A (missing) | - | - | - | - | 0,4 | 1,7 | - | 33 | 1,0
VEHICLE** B (missing) | - | - | 0,2 | 0,3 | 2,8 | 3,8 | 3,3 | 80 | 2,4
CIRCUMSTANCES*** VEHICLE A (missing/domain/incompatibility)* | 48,6 | 44,1 | 46,2 | 33,7 | 29,5 | 45,0 | 23,3 | 1447 | 42,8

CAUSES of the accident
CIRCUMSTANCES*** VEHICLE B (missing) | 37,8 | 20,3 | 35,0 | 16,0 | 4,3 | 30,3 | 10,0 | 930 | 27,5
CIRCUMSTANCES*** PEDESTRIAN/OBSTRUCTION (missing) | 13,5 | 16,9 | 17,8 | 20,1 | 19,6 | 8,2 | 20,0 | 423 | 12,5
CIRCUMSTANCES*** VEHICLE B - intersection (incompatibility) | - | - | - | 2,0 | 38,8 | 0,5 | - | 126 | 3,7
CIRCUMSTANCES*** VEHICLE B - non-intersection (incompatibility) | - | 0,8 | - | 1,7 | 1,8 | 5,0 | 10,0 | 107 | 3,2
CIRCUMSTANCES*** PEDESTRIAN (incompatibility) | - | - | - | - | - | 0,1 | 3,3 | 2 | 0,1
CIRCUMSTANCES*** VEHICLE IMPACT. STOP/TRAIN/OBSTACLE (incompatibility) | - | - | - | 0,3 | - | 0,1 | - | 2 | 0,1
CIRCUMSTANCES*** LISTING/FALL (incompatibility) | - | - | - | - | - | 0,4 | - | 8 | 0,2

CONSEQUENCES of the accident to people
INJURED/OUTCOME (incompatibility between summary and outcome) | - | - | 0,2 | - | - | - | 10,0 | 4 | 0,1
DEATHS/OUTCOMES (incompatibility between summary and outcomes) | - | - | - | - | - | - | - | 0 | 0,0

Total errors (absolute values) | 362 | 118 | 409 | 344 | 281 | 1833 | 30 | 3377 | 100,0
Total incorrect records (absolute values) | 206 | 80 | 275 | 248 | 225 | 1021 | 17 | 2072 | 43,2

* This variable is considered missing for accidents that occurred on motorways, national, regional, or provincial roads.

** For each accident it is possible to enter up to a maximum of 3 vehicles involved (A, B and C), sorted by degree of responsibility in the dynamics of the accident.

*** The presumed circumstances of accidents are intended to explain the accident dynamics. They refer only to two moving vehicles (A and B). In the case of accidents involving a single moving vehicle, they refer to vehicle A and to the pedestrian in case of pedestrian collision; to the parked vehicle/train/obstacle in case of collision; and to the obstacle not collided with in case of sudden slip/braking or fall.

5. Conclusions

The SAS® procedures represent the process innovation implemented in 2022 that made it possible to cope with the workload and the new organization. In fact, the use of these mapping procedures has resulted in a significant reduction in the time and resources involved in the data quality control process and in the correction activities, with positive implications for the results. The advantages obtained in terms of recovered time and resources cannot be determined precisely, because there are no elements of comparison with the past (substantial number of regions to be treated, decentralization across many different territories and offices). But we can certainly say that the use of the mapping procedures allowed the objectives of the activities planned for the 2022 survey to be achieved while respecting the timetable for data delivery to DCSW-SWC, despite the small number of resources allocated to this activity.

Moreover, the reports produced by municipality, and in particular the map of errors, made it possible to identify the critical spots on the territory and to manage re-contacts with respondents in a more flexible and targeted way; they also made it possible to find and adopt specific methods of re-contact for bigger municipalities and for those with many errors. The map of the municipalities with the highest number of errors and of the most frequent errors also makes it possible to plan targeted information and training actions, focusing on the most critical aspects of the questionnaire. The error map could therefore be a useful tool to extend to the re-contact activities for all local police corps, in order to improve data quality, in particular if used in conjunction with the new correction area in GINO++.

Contact with respondents in the data correction phase proved to be very useful: it helps to strengthen the collaborative relationship with the Municipal police and often becomes an occasion for on-the-job training, with the consequence of significantly reducing some types of errors, in particular in municipalities where the accident data entry is always managed by the same person.

Acknowledgement

Thanks to Silvia Bruzzone (ISTAT - Italian National Statistical Institute - Central Directorate for Data Production) for sharing with DCRD-RDH the documentation and programs produced in DCSW-SWC for data control.

Thanks to Angela Albanese, Luigi De Luca, Daniela Lo Nigro, Annalucia Ferrante, Elisabetta Lipocelli and Adriana Pardi (ISTAT - Italian National Statistical Institute - Central Directorate for Data Collection) for their support in data correction activities through contact with the respondents.

References

Filippucci, C. (edited by), Strategie e modelli per il controllo della qualità dei dati, Franco Angeli,

2002.

Riccini Margarucci, E. (edited by), Concord V.1.0 Controllo e correzione dei dati. Manuale utente e

aspetti metodologici, Istat, 2004.

Istat, Linee guida per la qualità dei processi statistici che utilizzano dati amministrativi. Version

1.1, August 2016. https://www.istat.it/it/files/2010/09/Linee-Guida-fonte-amministrativa-v1.1.pdf

Istat, Manuale Concord V. 1.0 Controllo e correzione dei dati: manuale utente e aspetti

metodologici, Roma, Istat, 2004.

Manzari, A., Aspetti generali sulle procedure di controllo e correzione dei dati, Presentation made

in ISTAT on 13-06-2002.

Brancato, G., Boggia, A., Ascari, G., Linee Guida per la Qualità delle Statistiche del Sistema Statistico Nazionale. Ver. 1.0, Istat, March 2018. https://www.istat.it/it/files//2018/08/Linee-Guida-2.5-agosto-2018.pdf

Casale, D. (edited by), CLAG: verso un software generalizzato per l’acquisizione controllata dei

dati via Web e l’organizzazione autonoma e flessibile della rete di rilevazione. Istat, 2010.

An agile approach to direct official surveys - Paola Bosso, Silvana Curatolo and Pasquale Papa (Istat, Italy)


12-14 June 2023

EXPERT MEETING ON STATISTICAL

DATA COLLECTION 2023 -

RETHINKING DATA COLLECTION -

Paola Bosso - Istat Directorate for Data Collection - Speaker

Silvana Curatolo - Istat Directorate for Data Collection

Pasquale Papa - Istat Directorate for Economic Statistics

AN AGILE APPROACH

TO DIRECT OFFICIAL SURVEYS

Why an "agile" approach to official direct surveys

Starting point: “The nature of data collection is bound to change. Using solely primary data collection would be too time-consuming, costly and burdensome to satisfy the increasing demand”. [Salemink I. et al. 2020].

Agile approach:

General features:
• Efficiency and speed
• User orientation
• Cost reduction
• Multidisciplinarity

Operational solutions in survey processes:
• "Once only" approach: interoperability
• Multisource approach
• Application of adaptive survey techniques
• Samples and questionnaires reduction
• Questionnaires optimization
• Process automation-oriented techniques
• Primary role of CAWI technique
• Web portal for users
• Specialized assistance services to respondents

FOCUS: Effectiveness of CAWI technique supported by a centralized contact center service in business surveys

Data quality constraint: TSE paradigm

The role of CAWI in business direct surveys

In CAWI business surveys, ISTAT applies an agile approach. CAWI mode is supported by different tools:

Respondent Web Portal:
• single access point for filling in the questionnaires
• functionalities and services supporting the users

Centralized contact center inbound service:
• information and support to the units, on thematic and non-thematic aspects
• direct support for filling in the questionnaires on request

Centralized contact center outbound service:
• mass reminders to non-respondents (by SMS/email/PEC)
• customized reminders (by phone)
• recovery of missing values (by phone)

Compensate for the disadvantages of single-mode CAWI

CAWI mode has advantages and disadvantages:

✅ Cost ✅ Timeliness ✅ Flexibility in filling in the questionnaires ✅ Absence of interviewer (privacy) ✅ Data quality control

x Coverage x Absence of interviewer (response rate, measurement bias, missing values) x Interruption of the questionnaire

The absence of the interviewers can be partially compensated by an «agile» contact center service:

• support and problem-solving activities (INbound) can improve collaboration with the survey, data quality and the response rate

• reminder and recovery activities (OUTbound) can improve the response rate and the completeness of the collected information

The agile approach in ISTAT: an application (1/2)

Permanent Business Census ed. 2022:

→ about 278.000 units
→ data collection period: 28 Nov 2022 - 31 Mar 2023
→ CAWI with the support of a centralized contact center service

Inbound service - some evidence:

• 32.000 service requests in 4 months
• 60% by phone and 40% by email/PEC
• 56% of access problems
• 90% solved by operators

[Chart: P. Census Enterprises ed. 2022 - Motivation of service requests (SR), %: Access problems, Usability, General information, Event communic., Thematic assistance, Mandatory and penalty, Refuse]

[Chart: % Resolution SR]

[Chart: Service requests by month, Dec 2022 - Mar 2023]

[Chart: Channel used (%): Phone, Mail, PEC]

The agile approach in ISTAT: an application (2/2)

Outbound service - Features:

→ 1 preliminary activity to recover missing telephone numbers
→ 2 telephone reminders to different non-responsive units
→ 1 final telephone reminder to the most important non-responsive units, some of which had been contacted in the previous waves

Outbound service - Some evidence:

Typology | N. units in the list | % units contacted | % responding units after reminder
1st telephone reminder | 35,111 | 28.7 | 39.7
2nd telephone reminder | 36,761 | 30.3 | 44.2
Final telephone reminder | 5,971 | 68.3 | 50.3

The segments of the 2022 permanent business census

For data collection purposes, the sample was divided into 7 segments, according to the following variables: size, previous registration on the web portal, new entry in the web portal, questionnaire type (long or short form):

1. Large economic units at national level (500+ employees) REGISTERED with long questionnaire (1,612 companies)

2. Nationally relevant economic units (250-500 employees) REGISTERED with long questionnaire (2,282 companies)

3. Medium economic units (20-250 employees) REGISTERED with long questionnaire (62,521 enterprises)

4. Small economic units (10-20 employees) REGISTERED with long questionnaire (29,753 companies)

5. Other economic units (10 AND more employees) NOT registered or NEW Portal with long questionnaire (32,394 companies)

6. Micro economic units (less than 10 employees) REGISTERED with short questionnaire (44,054 companies)

7. Micro economic units (less than 10 employees) NOT registered or NEW Portal with short questionnaire (105,788 companies)
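The segmentation rule can be written as a small decision function. The Python sketch below is illustrative only: the slides do not say how boundary values (exactly 20, 250 or 500 employees) are assigned, so lower bounds are assumed inclusive here, and a single `registered` flag is assumed to cover both "not registered" and "new in the Portal" units. The segment sizes are those listed on the slide.

```python
# Segment sizes as listed on the slide (number of units per segment).
SEGMENT_SIZES = {1: 1_612, 2: 2_282, 3: 62_521, 4: 29_753,
                 5: 32_394, 6: 44_054, 7: 105_788}


def census_segment(employees: int, registered: bool) -> int:
    """Assign a unit to one of the 7 data collection segments.

    Assumption: lower bounds are inclusive (e.g. 250 employees -> segment 2);
    registered=False stands for units not registered or new in the Portal.
    """
    if not registered:
        # Long questionnaire from 10 employees upwards, short one below.
        return 5 if employees >= 10 else 7
    if employees >= 500:
        return 1
    if employees >= 250:
        return 2
    if employees >= 20:
        return 3
    if employees >= 10:
        return 4
    return 6


total_units = sum(SEGMENT_SIZES.values())
```

The segment sizes add up to 278,404 units, consistent with the "about 278.000 units" reported for the census.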

Role of the inbound assistance service in the segments of the permanent business

census (1/2)

• About 11% of respondent companies called the service, for an average of 11 minutes of assistance.
• The use of the service especially concerned companies included in segments 5, 7, 1 and 6 (mainly micro and very large enterprises).
• Larger companies that are already registered on the web portal send more thematic requests (content of the questionnaires).

Role of the inbound assistance service in the segments of the permanent business

census (2/2)

To better understand the relationships between the variables response rate, use of the inbound service and segment, a multiple correspondence analysis was carried out.

The analysis identified 3 main groupings:

1. higher service rates and satisfactory response rates (effective service)

2. lower service rates and satisfactory response rates (autonomous businesses)

3. higher service rates and unsatisfactory response rates (partially effective service)

Conclusions

• The centralized contact center service is a useful tool towards an agile data collection process aimed at efficiency and compliance with user needs

• The service is an effective support to the CAWI technique in order to maintain response rates

• Analysing the behavior of the respondents in different segments of the sample makes it possible to optimize the allocation of the available resources

• In general, the larger companies and micro-enterprises find the service most useful. The reasons are, respectively, greater organizational complexity and less familiarity with participating in surveys. For micro-enterprises it is necessary to identify additional tools in order to support participation rates

• The centralized contact center should be integrated with additional customized services (e.g. CATI interview on demand)

• Other aspects contribute to the realization of an agile approach to business direct surveys, in particular the use of alternative sources, the reduction of samples and questionnaire length, and process automation

• Ensuring data quality remains the constraint, in the framework of the TSE (Total Survey Error) paradigm

Appendix

Permanent Business Census ed. 2022 - Response rate by segment of enterprises

Segment 1: 94%; segment 2: 92%; segment 3: 76%; segment 4: 75%; segment 5: 28%; segment 6: 75%; segment 7: 36%

Thanks! PAOLA BOSSO

SILVANA CURATOLO

PASQUALE PAPA


Expert Meeting on Statistical Data Collection, 12 – 14 June 2023

An agile approach to direct official surveys

Paola Bosso1 | ISTAT – Italian National Statistical Institute, Rome, Italy

Silvana Curatolo2| ISTAT – Italian National Statistical Institute, Rome, Italy

Pasquale Papa3 | ISTAT – Italian National Statistical Institute, Rome, Italy

1. Recent trends

A key document expressing the new role of direct surveys was produced by Statistics Netherlands and Statistics Canada [12]. It emphasizes that the methods of carrying out direct business surveys as conceived in recent years are no longer sustainable. Various analyses conducted by Istat [11,13,14] confirm these trends.

In order to address these trends, most NSIs have undertaken complementary strategies. Firstly, these involve the progressive use of alternative sources and the application of data science techniques. In detail, a strategy towards which the main statistical institutes are oriented is that of reducing the role of direct surveys, resorting to them only when strictly necessary and when no alternative sources, administrative or of other types (big data, sensor data, meter data, etc.), are available. At Istat (the Italian national statistical institute), concrete progress has been made in recent years in the use of administrative sources, and some applications are currently under development (e.g. electronic invoicing or fees), while the use of other alternative sources is still not very frequent, as many applications are still at the prototype level. Summarizing these trends, there is a general convergence towards a multisource approach to surveys, in which the direct survey represents only one of the various components that can be used.

An immediate consequence, closely interconnected with the trends described above, is the design of survey processes that are increasingly 'agile' and less invasive towards respondents, oriented to efficiency and timeliness. In this context, the use of simple data collection techniques supported by supplementary services aimed at reducing missing answers and, more generally, the non-sampling component of the error is a particularly effective solution. The design of services to support users involved in the surveys (e.g. professional assistance services, web portals) is a basic element that contributes to increasing the efficiency of survey processes.

Figure 1 shows the trend of human resources employed in conducting the data collection processes of business surveys from the introduction of the centralized data collection model in Istat, in 2016, up to today. In particular, the analysis shows a substantial halving of the resources employed in the data collection processes, falling from over 16 FTE resources in 2017 to around 8 FTE resources in 2023.

Figure 1. Istat human resources employed in data collection for business surveys. Years 2017 - 2023 (Full Time Equivalent=100).

1 Paragraphs: 2, 3, 4.

2 Paragraphs: 5.

3 Paragraphs: 1, 6, 7, 8.

The reduction in available human resources did not only concern the personnel directly employed in the

data collection processes but also that employed in other activities connected to the surveys. The need to

continue to ensure adequate quality levels of the statistical outputs produced in relation to the substantial

reduction in the human resources available has required a considerable effort aimed at increasing the

efficiency of the data collection systems. This effort has made it possible to maintain substantial stability

in response rates in official business surveys over the years.

2. Features of an agile approach to official direct survey data collection

The general characteristics of an 'agile' approach can be traced back to efficiency and speed, user orientation, cost reduction and multidisciplinarity. The approach involves, in turn, a set of operational solutions in the management of direct survey processes, which can be summarized in the following principles: a) "Once only" approach: interoperability; b) Multisource approach; c) Adaptive survey techniques; d) Questionnaires optimization techniques; e) Process automation-oriented techniques; f) Primary role of CAWI technique; g) Design of Web portal for users; h) Design of specialized assistance services for respondents; i) Smaller samples; l) Shorter questionnaires. All the principles listed must be applied with a view to maintaining adequate levels of quality; on this issue the reference is the TSE (Total Survey Error) paradigm [1,2]. In the context thus outlined, the specific focus of this document is the analysis of the effectiveness of the CAWI technique supported by a centralized contact center service in business surveys.

2.1 CAWI technique: advantages and disadvantages

The data collection mode is defined based on the aims of the survey and the characteristics of the target

population, maximizing data quality and minimizing costs.

Choosing a mode or a mix of data collection modes, in a specific field, is therefore a problem of

maximizing quality with the constraint of available resources. There is no ideal data collection mode for

all situations, but the advantages and disadvantages of each of them need to be assessed according to the

specific situation of the survey.

In business surveys, the CAWI mode, used in a single-mode design, represents a good compromise between quality and cost. This mode is in fact well suited to the needs of business surveys, which often require the collaboration of multiple structures and roles within the enterprise. It also guarantees good coverage, given the high level of digitization of enterprises. One critical issue, as highlighted in the literature, remains the absence of an interviewer, which could negatively affect the response rate and the completeness of questionnaires.

The implementation of a Contact Center that supports the CAWI mode with assistance in compiling the questionnaire (inbound) and reminders for the recovery of non-responses and partial responses (outbound) can partially compensate for these disadvantages and promote accurate responses and a better response rate in the absence of the interviewer.

In detail, reminders can be carried out in different modes: mass postal reminders (SMS, PEC or mail) to all units that have not responded by a certain date, or telephone reminders customized for different cases (e.g. partial compilation, no compilation, missing "core" data). The telephone mode is more expensive but highly effective, because in addition to recovering questionnaires (reduction of total non-response) it can also improve the quality and completeness of the information collected (reduction of partial non-response).

3. User orientation as a solution aimed at increasing the efficiency of the data collection

Some technological and organizational solutions were already implemented in Istat in 2016, at the same time as the introduction of the centralized data collection model [3,4,5,7,11]. These solutions now require further development and consolidation, with a view to increasing the efficiency of data collection processes and sustaining survey participation rates. Below are some thematic areas that require special consideration in the specific field of user orientation.

3.1 Statistical web portal

The Business Statistical Portal, with its main functions, is a crucial tool at the service of users involved in business surveys and for ensuring adequate participation rates in those surveys. A single point of access, the possibility of delegation and an updated status of the obligations to be fulfilled are very important services for simplifying the statistical obligations required of users. A few years after it went into production, the system requires an update and an extension of its functionalities, in particular with a view to greater integration with all the other tools used in the data collection of economic surveys. During the years 2022-2023, with the inclusion of the two transport surveys (the Maritime Transport Survey and the Air Transport Survey), previously managed with an independent acquisition system, all the business surveys are included in the Portal, which can now fully play its role. Over the last few months, a process has also been launched to redesign the section of the Portal dedicated to the return of personalized information to the companies involved in the surveys, in order to offset, at least in part, the statistical burden placed on them. It should be remembered that some larger companies are involved in around 20 Istat statistical surveys each year, several of which produce data at infra-annual intervals.

Table 1. Surveys and Authorized Users of the Business Statistical Portal - May 2023.

Surveys included in the Portal | Number of companies authorized to access | Number of registered NSI external users | Number of registered NSI internal users | Number of Register data change reports
72 | 914.493 | 978.588 | 741 | 131.992

4. Centralized contact center service offered in the year 2022: overview

The Inbound service is centralized and multi-channel. It provides standard assistance to all the type of

users (enterprises, organizations, farms, individuals, households) involved in the Istat CAWI surveys (in

a context of unimode or mixed mode design). Assistance can be provided via a toll-free number or via

email and PEC; these occurrences are reported in the survey presentation letter.

The centralized management of the service allows the standardization of the assistance procedures for the

different types of units involved in the Istat surveys. This means a greater efficiency in terms of data

collection times and costs (shorter response times and opportunity to achieve economies of scale) and

better quality of the information provided to users (more efficient control, non-redundant information).

Since 2016, the service has grown significantly in scope and volume and currently supports the data collection of around 90 surveys (recurring, occasional and permanent censuses), included in the National Statistical Program and required by European Regulations.

In 2022 the centralized contact center handled a total of 228.000 service requests (SR). Respondents used

the phone channel in 80% of cases and the email/PEC channel in the remaining 20% (Figure 2). For the

Expert Meeting on Statistical Data Collection12 – 14 June 2023


phone channel, the average call duration was 6 minutes, while the average time to handle an email request was 7 minutes.

Figure 2. Number of SR handled monthly (on the left). Access Channel used by respondent

units (on the right). Years 2021-2022.

In 46,3% of cases (105.495 SR) users were families or individuals involved in socio-demographic surveys (sample surveys and census); in 53,7% of cases (122.505 SR) users were Enterprises, Organizations and Farms involved in business surveys (sample surveys and census).

In more detail, for the socio-demographic surveys most SR are due to the Permanent Census of Population (39,9%), while sample surveys generate a very small contact flow over the year (6,4%). For the business surveys, instead, the main volume is generated by the recurrent economic surveys on enterprises, both short-term and structural (36,5%) (Figure 3).

Figure 3. Volume of SR by type of survey (on the left) and type of respondent units (on the right).

Year 2022

As regards the reason for the contact, the cases can be grouped into 8 macro-classes, listed in decreasing order of frequency:

1. Access problems (54,3%) - Requests relating to difficulties in accessing the site (resetting passwords, lost or forgotten login credentials, etc.);

2. General information (29,1%) - Generic requests for information on the survey, such as topic, use of the data, data collection methods, schedule of data collection, etc.;


3. Event communication (4,7%) - Users make contact to report an event which could compromise their eligibility for the survey or influence the way the questionnaire is compiled;

4. Usability (4,4%) - Requests which highlight usability problems with the data collection system (e.g. visualization of the questionnaire within the system, roles and powers for completing the questionnaire) or with the electronic questionnaire (e.g. non-editable fields, methods of sending the questionnaire and receiving the receipt of successful completion, etc.);

5. Thematic assistance (3,6%) - Information about specific variables of the questionnaire or eligibility aspects;

6. Mandatory and penalty (2,7%) - Specific requests for information about the obligation to reply and the administrative penalty;

7. Mode choice (1,6%) - Requests to complete the questionnaire with the support of a telephone interviewer (CATI), for the surveys which use a mixed CAWI-CATI mode;

8. Refusal (0,3%) - The unit declares that it does not want to cooperate.

Another important aspect concerns the resolution (Figure 4). In 90% of the cases, requests were resolved by contact center operators (I level) and the remaining 10% by Istat referents (II level). The first-level share is expected to improve further over time, as the experience progressively acquired by operators increases the number of requests resolved at the first level. A change of service provider can be a critical factor in this respect.

Figure 4. Reason of SR (on the left). Resolution of SR (on the right). Years 2021-2022

4.1 Focus on the Permanent Business Census

The data collection of the last Permanent Census of Businesses took place from 28 November 2022 until 31 March 2023. It involved 278.000 Italian enterprises. The units involved could contact the inbound service through the toll-free number (active from Monday to Friday from 09.00 to 19.00) or by writing to the assistance e-mail box or PEC. Outside service hours, enterprises could leave their contact details to be called back by an operator within 24 hours.

The census generated a total of 32.822 service requests. The mail/PEC channel is much more widely used for communications and support requests among enterprises than among households: the asynchronous channel accounted for about 40% of requests (Figure 5).


Figure 5. Permanent Census of Enterprises, 2022 edition - Number of SR handled monthly (on the left). Access channel used by enterprises (on the right).

The main reason for assistance requests from enterprises was access problems (54,3%) (Figure 6). The second reason was the usability of the data acquisition system and of the questionnaire, with a much higher share than for the total SRs (28,4% for the Permanent Census of Enterprises vs 3,8% for the total). This gap is probably due to the complexity of completing the questionnaire which, in the case of enterprises, requires the cooperation of several structures and roles, with consequent problems of delegations and permissions to compile. As regards problem solving at I level, instead, the Census of Enterprises showed a value similar to the total (about 90%).

Figure 6. Permanent Census of Enterprises vs Total SR ed.2022 - Reason of SR (on the left) - Resolution of

SR (on the right).

5. The Outbound service

The Outbound service, on the other hand, operates only by telephone, because email and PEC reminders are managed with automated procedures directly by Istat. It mainly involves enterprises and institutions; some surveys on individuals have been introduced only recently. In 2022 the centralized contact center handled about 530,000 telephone reminders, of which about 343,000 were useful contacts, that is, reminders successfully completed or units that had sent the questionnaire before the reminder (Figure 7).


Figure 7. Number of useful contact handled monthly, years 2021-2022 (on the left). Number of useful

contacts vs total number of contacts (on the right), year 2022.

The reminder generally consists of a courtesy phone call informing the unit of its involvement in the survey and of the deadlines for sending the data (basic Outbound). Only for some specific surveys do the operators of the outbound service provide assistance in completing the questionnaire (advanced Outbound).

In 93.4% of cases the units were Enterprises involved in recurrent economic surveys (short-term and structural surveys) and in only 6.6% of cases were they Organizations involved in business, cultural and demographic surveys (Figure 8).

Figure 8. Percent of useful contacts by type of reminder (on the left). Percent of useful contacts by

type of unit (on the right).

For the basic reminder, the average call duration was 3 minutes, while the average duration of an advanced reminder was 10 minutes. Reminders generally last longer for Organizations (2.2) than for Enterprises (1.2).

5.1 Focus on the Permanent Business Census

For the last Permanent Census of Enterprises the basic outbound service was preceded by a large-scale search for the telephone numbers of enterprises that had never registered on the data acquisition system


dedicated to the survey (Business Statistics Portal). The contact center managed to recover more than

8,000 missing telephone numbers out of a list of about 15,000 (54.5 percent).

In total, three different waves of basic telephone reminders were made. The first two waves, of about 35,000 units each, targeted different units across the segments into which the sample was divided. In the third wave, of about 6,000 units, some units (31.7 percent) already contacted in the first two waves were contacted again.

Table 2. Permanent Census of Enterprises 2022, telephone outbound reminders

The data in Table 2 show that the effectiveness of reminders, measured as the number of respondents after the reminder divided by the number of useful contacts, increases as the reminder date approaches the survey due date.
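The two ratios used in this section can be written out as a minimal sketch. The function names are illustrative and not part of any Istat system. The 2022 totals are the approximate figures quoted in Section 5, while the wave figures are hypothetical, because the values of Table 2 are not reproduced in the text.

```python
def useful_contact_rate(useful_contacts: int, total_contacts: int) -> float:
    """Share of reminder calls that ended in a useful contact."""
    if total_contacts <= 0:
        raise ValueError("total_contacts must be positive")
    return useful_contacts / total_contacts


def reminder_effectiveness(respondents_after: int, useful_contacts: int) -> float:
    """Respondents after the reminder divided by useful contacts."""
    if useful_contacts <= 0:
        raise ValueError("useful_contacts must be positive")
    return respondents_after / useful_contacts


# Approximate 2022 totals quoted in Section 5.
rate_2022 = useful_contact_rate(343_000, 530_000)

# Hypothetical wave, for illustration only (not taken from Table 2).
example_effectiveness = reminder_effectiveness(respondents_after=400,
                                               useful_contacts=1_000)

print(f"Useful contact rate, 2022: {rate_2022:.1%}")
print(f"Effectiveness, hypothetical wave: {example_effectiveness:.1%}")
```

Computing the effectiveness separately for each wave would allow the comparison described above, with the ratio expected to rise for waves closer to the due date.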

Figure 9. Useful contacts by segment (percentage values, on the left). Respondents after the

reminder by segment (percentage values, on the right).

Segments number 3, 6 and 7 (see also paragraph 6) recorded the highest number of useful contacts while

segments number 1 and 2 recorded the highest number of questionnaires sent after the reminder (Figure

9). Segments number 3 and 4 also have a number of questionnaires sent after the reminder close to 50

percent, while segment number 5 registers the lowest percentage (27.0 percent).

6. Effectiveness of the centralized contact center service, for segments of the 2022 Permanent

business census sample

For data collection purposes, the sample was divided into 7 segments, according to the following variables: size, previous registration on the web portal, new entry in the web portal, questionnaire type (long or short form). A detailed description of the identified segments follows.

1. Large economic units at national level (500+ employees) registered on the web portal, with long questionnaire (1,612 companies)

2. Nationally relevant economic units (250-500 employees) registered on the web portal, with long questionnaire (2,282 companies)

3. Medium economic units (20-250 employees) registered on the web portal, with long questionnaire (62,521 companies)


4. Small economic units (10-20 employees) registered on the web portal, with long questionnaire (29,753 companies)

5. Other economic units (10 or more employees) not registered on the web portal, or new to the Portal, with long questionnaire (32,394 companies)

6. Micro economic units (fewer than 10 employees) registered on the web portal, with short questionnaire (44,054 companies)

7. Micro economic units (fewer than 10 employees) not registered on the web portal, or new to the Portal, with short questionnaire (105,788 companies).
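The segment definitions above can be read as a simple decision rule. The sketch below is only an illustration under two assumptions that the text does not state: size classes are treated as half-open intervals (for example, 250 to 499 employees for segment 2), and units that are new to the Portal are treated like unregistered ones. The function names are hypothetical.

```python
def assign_segment(employees: int, registered: bool) -> int:
    """Return the segment (1-7) of a unit.

    Assumptions: size classes are half-open intervals, and units new to
    the Portal are passed with registered=False.
    """
    if employees < 0:
        raise ValueError("employees must be non-negative")
    if employees < 10:
        # Micro units receive the short questionnaire.
        return 6 if registered else 7
    if not registered:
        # Units with 10 or more employees, not registered or new to the Portal.
        return 5
    if employees >= 500:
        return 1
    if employees >= 250:
        return 2
    if employees >= 20:
        return 3
    return 4


def questionnaire_type(segment: int) -> str:
    """Segments 1-5 receive the long form, segments 6-7 the short form."""
    if segment not in range(1, 8):
        raise ValueError("segment must be between 1 and 7")
    return "long" if segment <= 5 else "short"


for employees, registered in [(600, True), (12, True), (600, False), (5, False)]:
    segment = assign_segment(employees, registered)
    print(employees, registered, segment, questionnaire_type(segment))
```

Checking registration before size for units with 10 or more employees reflects the fact that segment 5 is defined without an upper size limit.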

Response rates for each segment are reported in the following Figure 10. Particularly low are the

rates observed in segments 5 and 7, respectively at 28 and 36 per cent.
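As a rough illustration of how the segment rates combine, the sketch below weights each segment's response rate by its size. The sizes are those listed in the segment description; the rates are the rounded percentages read from Figure 10, so the overall figure it prints is approximate and is not a value reported in the paper.

```python
# Segment sizes from the segment description (number of units).
SEGMENT_SIZES = {1: 1_612, 2: 2_282, 3: 62_521, 4: 29_753,
                 5: 32_394, 6: 44_054, 7: 105_788}

# Rounded response rates read from Figure 10.
RESPONSE_RATES = {1: 0.94, 2: 0.92, 3: 0.76, 4: 0.75,
                  5: 0.28, 6: 0.75, 7: 0.36}


def overall_response_rate(sizes: dict, rates: dict) -> float:
    """Size-weighted average of the segment response rates."""
    total_units = sum(sizes.values())
    expected_respondents = sum(sizes[s] * rates[s] for s in sizes)
    return expected_respondents / total_units


total_units = sum(SEGMENT_SIZES.values())
overall = overall_response_rate(SEGMENT_SIZES, RESPONSE_RATES)
print(f"{total_units:,} units, approximate overall response rate {overall:.0%}")
```

Because segments 5 and 7 together hold almost half of the units, their low rates weigh heavily on the overall result.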

Figure 10. Response rates by segments of the Permanent business census 2022 (percent values).

Figure 11. Respondents to Permanent business census 2022, with and without assistance, by

segments (percent values).

Figure 11 shows that about 11% of respondent companies called the service for assistance, for 11 minutes on average. The service was used mostly by companies in segments 5, 7, 1 and 6 (mainly micro and very large enterprises), while small and medium-sized enterprises used it less. Furthermore, requests for assistance are more frequent among companies not previously registered on the web portal.

Concerning the content of the service requests by segment, larger companies that are already registered on the web portal record more thematic requests, that is, requests on the content of the questionnaires (43 percent), while non-thematic requests prevail for smaller units, notably those not previously registered on the portal (Figure 12).

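Figure 10 data. Response rate by segment (percent values):

Segment   1     2     3     4     5     6     7
Rate      94%   92%   76%   75%   28%   75%   36%

Figure 11 data. Respondents with and without assistance, by segment (percent values):

Segment               1     2     3     4     5     6     7     Total
With assistance       13%   8%    7%    10%   17%   12%   16%   11%
Without assistance    87%   92%   93%   90%   83%   88%   84%   89%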


Figure 12. Respondents to Permanent business census 2022, by contents of service requests and by

segments (percent values).

To better understand the relationships among response rate, use of the inbound service and segment, a multiple correspondence analysis was carried out. The first two dimensions explain 18.46 percent and 12.93 percent of the inertia respectively (31.39 percent in total).

Figure 13. Analysis of multiple correspondences for segment, assistance and response to the

questionnaire of the Permanent business census 2022.

The analysis identified three main groupings, highlighted in Figure 13. Group 1 contains segments characterized by higher service rates associated with satisfactory response rates (effective service). Group 2 contains segments associated with lower service rates and satisfactory response rates (autonomous businesses). Group 3 contains segments with higher service rates associated with unsatisfactory response rates (unresolved service).

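Figure 12 data. Content of service requests by segment (percent values):

Segment         1     2     3     4     5     6     7     Total
Non thematic    57%   69%   84%   89%   93%   92%   93%   90%
Thematic        43%   30%   16%   10%   6%    8%    7%    10%

Figure 13 plot labels: Group 1, Group 2, Group 3.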


Figure 14. Respondents to Permanent business census 2022, by economic activity sector (percent

values).

The analysis of requests for assistance by respondents to the 2022 Permanent Business Census by macro sector of economic activity shows a greater tendency to request assistance among companies in the service sector (13 percent) than in the industry sector (10 percent) (Figure 14). The difference may reflect both the smaller average size of companies in the service sector and the greater experience of industrial companies in participating in surveys conducted by the National statistical institute.

Figure 15. Respondents to Permanent business census 2022, by legal form (percent values).

Figure 15 shows that the use of inbound assistance is limited for the legal forms Partnership and Capital

company while it tends to be more significant for Individual entrepreneurs, freelancers and self-employed

persons and for cooperative companies. The other legal forms have a limited weight in the Permanent

Census sample.

Figure 14 data. Respondents with and without assistance, by economic activity sector (percent values):

Sector                Industry   Services   Total
With assistance       10%        13%        11%
Without assistance    90%        87%        89%

Figure 15 data. Respondents with and without assistance, by legal form (percent values):

Legal form                                                                With   Without
1.1 Individual entrepreneur, freelancer and self-employed person         14%    86%
1.2 Partnerships                                                          9%     91%
1.3 Capital company                                                       9%     91%
1.4 Cooperative Society                                                   12%    88%
1.5 Private law consortium and other forms of cooperation
    between companies                                                     20%    80%
1.6 Economic public body, special company and public service company     18%    82%
1.9 Company or private entity established abroad not otherwise
    classifiable which carries out an activity                            11%    89%


7. Other innovative solutions to increase the efficiency of the data collection system

7.1 Use of alternative sources

Data collection through direct statistical surveys presents growing problems of sustainability, therefore it

must be increasingly agile and characterized by adequate participation rates and high quality standards.

This need implies a new approach to survey processes aimed at minimizing the burden on respondents by

resorting to direct surveys only when strictly necessary and at the same time reducing the amount of

information requested and more generally the effort required of the respondent. This approach implies

maximizing the use of sources alternative to direct surveys on businesses. In the field of business surveys, the new sources can be traced back to two main types: a) information collected for administrative purposes; b) other information available mainly in the form of big data (in particular data from web platforms, meter data, sensor data, etc.). As regards the first type, the activity has been aimed at acquiring new sources and using them operationally in direct surveys, working with the administrations that hold them in coordination with the various players involved. The activity includes an experimental phase, currently in progress, of analysis based on comparison with traditional sources from direct surveys. The phase of effective use in survey processes, replacing traditional sources where possible, will follow. In this context, particular attention is paid to the operational use of the electronic invoicing and fees data available from the Italian Revenue Agency. In Istat,

the systematic acquisition of these sources is underway according to the timescales necessary for business

surveys. The activity presents various obstacles both at a regulatory level with the need to establish supply

agreements with supplier bodies, and at a technical level as it is necessary to identify the most suitable

environments for archiving the data received, and at a methodological level as it is necessary to confirm

the possibility of integrating data from direct surveys with data from administrative sources. Another

aspect to manage is the protection of personal data acquired through administrative sources. Once fully operational, these sources can significantly lighten the economic and structural surveys on the turnover of both industry and services and those on retail sales.

With regard to the second type, the first objective is to identify the information available and the contexts of possible operational application in the field of economic surveys. This involves techniques for extracting the new sources and making them usable for the purposes of economic surveys. Here too the objective, in a first phase, is to start an experimental activity that combines the information available from direct surveys with that obtained from new sources, in order to evaluate its quality and responsiveness to information needs. The activity is attributable to the path identified in

Istat by the Roadmap for the production of Trusted Smart Statistics (TSS) [8, 16]. The Roadmap is a

strategic document that guides the implementation of operational programs prepared annually for the

production of new statistical products made with Big Data sources, typically through the use of new

technologies and methodologies. The activity is mainly oriented towards the dimensions of efficiency,

due to the automated integration of data sources and flows and the reduction of the statistical burden on

respondents. For the aforementioned objectives, the application of web scraping techniques and other

forms of web intelligence is also foreseen for the acquisition of information present on the web and useful

for the purposes of business surveys. A first example of application of these techniques concerns the

retrieval of information on multinational companies to be used for the reconstruction of production chains.

Other applications concern the automation of the coding processes of the products realized by companies

or of the economic activity sector, through the acquisition of the information available on the websites

and the use of machine learning techniques.

Finally, further fields of interest in order to implement the use of alternative sources concern the

application of interoperability techniques for the real-time acquisition of information available in a

structured form at other National Statistical System institutions and System-to-System (S2S)

communication for the digital acquisition of data available in the so-called smart industries (Industry 4.0).


7.2 Process automation techniques

Given the downward trend in human resources employed in data collection processes, the availability of increasingly trained and skilled human resources and the development of new technologies, in particular those connected to Artificial Intelligence (AI) and Web Intelligence (WI), as well as the opportunities offered by the development of Industry 4.0, the application of automation techniques to survey processes has become an increasingly important aspect in the design of statistical business surveys [15].

Examples of applications currently under development in Istat concern the automation of some repetitive phases of the assistance service offered to respondents, as well as automated procedures for coding the products made by businesses and the sector of economic activity to which they belong. Also of particular interest are techniques aimed at acquiring information on the structure of multinational companies and its changes over time, as well as on the reconstruction of production chains.

8. Conclusions: the prospects for the realization of an agile approach to direct official surveys

The centralized inbound and outbound contact center service is a key element in the convergence towards an agile data collection process aimed at efficiency and compliance with user needs.

The service is an effective support to the CAWI technique as it helps to sustain response rates even in the

case of decreasing human resources employed in the data collection processes. In the specific case of the

2022 Permanent business census, 11% of respondents required the inbound service, for a total of about

11 minutes of average assistance. The share of requests for assistance varies by segment of the sample. In

segments 1 and 6 the role of inbound assistance was effective as it was accompanied by high response

rates. In segments 5 and 7 relatively high service demands were associated with very low response rates.

In these cases the service helped but was not enough, and should be accompanied by complementary solutions. In general, the service tends to be most useful for larger companies, characterized by greater organizational complexity and a greater number of surveys, and for micro-enterprises, which are less accustomed to participating in surveys and less equipped. For the latter, however, additional tools need to be identified in order to support participation rates.

The adoption of a CAWI technique supported by a contact center service is only one component of the convergence towards an agile approach to official direct surveys of companies. In particular, the

continuing trend of decreasing human resources, the development of new technologies and the

increasingly pressing need to reduce the burden on respondents will open new challenges for producers

of official statistics. The prospect is the convergence towards a multi-source approach to data collection

aimed at making the most of the availability of alternative sources to those from direct surveys. In

particular, the primary sources will be those of an administrative nature, those in the form of big data and

those coming from the web. In this context, a greater role will be played by data science techniques aimed

at adequately exploiting these sources. Fundamental to achieving these objectives is the exploitation of

process automation technologies, in particular artificial intelligence and web intelligence. In this context,

direct surveys will continue to play an important role, but only in situations where recourse to other sources is not possible. They will also tend to take on a more agile form, with smaller samples, shorter questionnaires, automation of data collection and simpler collection techniques, such as CAWI with specialist services to support the user.


9. References

[1] Groves R.M. and Lyberg L. (2010), Total Survey Error: Past, Present, and Future. Public Opinion Quarterly, Volume 74, Issue 5, Pages 849–879, https://doi.org/10.1093/poq/nfq065.

[2] Biemer P. (2010), Total survey error: design, implementation, and evaluation. Public Opinion Quarterly, Volume 74, Issue 5, Pages 817–848.

[3] Rivais L., St-Denis M., Lensen S. (2013), Centralising data collection at Statistics Canada. Seminar on Statistical Data Collection, UNECE - Conference of European Statisticians.

[4] Saraiva dos Santos P., Moreira A. (2013), Creating a data collection department: Statistics Portugal's experience. Seminar on Statistical Data Collection, UNECE - Conference of European Statisticians.

[5] Snijkers G., Haraldsen G., Jones J. and Willimack D. (2013), Designing and conducting business surveys. John Wiley & Sons.

[6] Schouten B., Calinescu M. and Luiten A. (2013), What are adaptive survey designs? Survey Methodology, Volume 39, Number 1, Statistics Canada, Code 12-001-X, June 2013.

[7] Istat (2016), Istat’s modernisation programme,

https://www.istat.it/it/files//2011/04/IstatsModernistionProgramme_EN.pdf (accessed 10 June 2023).

[8] European Statistical System Committee (2018), Bucharest Memorandum on Official Statistics in a Datafied Society (Trusted Smart Statistics). Available at https://ec.europa.eu/eurostat/web/european-statistical-system/-/dgins2018-bucharest-memorandum-adopted (accessed 10 June 2023).

[9] Binci S., Monetti F., Papa P. (2019), Centralised data collection: effects of a new administrative penalties provision procedure in business short-term surveys. EESW19, 6th European Establishment Statistics Workshop, 24-27 September 2019, Bilbao.

[10] Bavdaž M., Snijkers G., Sakshaug J. W., Brand T., Haraldsen G., Kurban B., & Willimack D. K.

(2020). Business data collection methodology: Current state and future outlook. Statistical Journal of the

IAOS, 36(3), 741-756.

[11] Bellini G., Monetti F., Papa P. (2020), The impact of a centralized data collection approach on response rates of economic surveys and data quality: the Istat experience. Statistika: Statistics and Economy Journal, Vol. 100 (1), 2020, ISSN 1804-8765 (Online), ISSN 0322-788X (Print).

[12] Salemink I., Dufour S., Van der Steen M. (2020), A vision on future advanced data collection. Statistical Journal of the IAOS 36 (2020) 685–699, DOI 10.3233/SJI-200658, IOS Press.

[13] Binci S., Monetti F., Papa P. (2022), Data collection strategies in business short-term official surveys: a balance between legal obligation and awareness. Workshop BDCM 2022 – Sixth International Workshop on Business Data Collection Methodology, 13-15 June 2022, Oslo, Norway.

[14] Bellini G., Bianchi G., Di Paolo G.G., Papa P. (2022), Towards a selective automation process of assistance to the survey units included in business surveys. Workshop BDCM 2022 – Sixth International Workshop on Business Data Collection Methodology, 13-15 June 2022, Oslo, Norway.

[15] Bruni, R.; Bianchi, G.; Papa, P. Hyperparameter Black-Box Optimization to Improve the Automatic

Classification of Support Tickets. Algorithms, 2023, 16, 46. https://doi.org/10.3390/a16010046.

[16] Istat, Smart statistics from big data. Available at https://www.istat.it/en/analysis-and-products/smart-statistics-from-big-data (accessed 10 June 2023).

Data collection strategy on an elusive population: technique, process design, monitoring indicators - Linda Porciani, Monica Perez, Federico De Cicco, Eugenia De Rosa, Francesca Inglese (Istat, Italy)


Data collection strategy on an elusive population:

technique, process design and monitoring indicators

Monica Perez, Linda Porciani, Federico De Cicco – ISTAT|Directorate for Data Collection

Francesca Inglese–ISTAT|Directorate for Methodology and statistical process design

Eugenia De Rosa –ISTAT|Directorate for Social statistics and welfare

ISTAT|Italian National Institute of Statistics

12 - 14 June 2023

UNECE Expert Meeting on Statistical Data Collection 2023

Outline

ISTAT-UNAR project “Labour discrimination against LGBT+ people and diversity policies in enterprises”

• Mixed method (quantitative-qualitative) and multiperspective approach (stakeholders, enterprises, LGBT+ people)

• Different surveys and different target groups of LGBT+ people based on respondents’ self-identification

Experimentation of the Respondent Driven Sampling Technique (RDS)

• Survey Design: Sampling Technique and Web-questionnaire
• Process design (2 step) and privacy concern
• Monitoring indicators
• Lessons learnt

• Provide insights on labour discrimination against LGBT+ people

• Hard to reach an invisible population: reticence of some people and underreporting of discriminatory phenomena

• Generalizability of results and sampling challenges: representative surveys on the LGBT+ population are difficult to carry

out mainly due to the absence of lists of people whose sexual orientation and/or gender identity are known

Surveys targeted at LGBT+ people: the challenges

Different data collection strategy for different targets within the LGBT+ population

1. Survey on individuals who are/have been in a Civil Union (same-sex couples, over 21,000 people), 2020-2021

2. Survey on LGB people who have never been in a civil union, through Web Respondent Driven Sampling and a convenience sample (more than a thousand LGB respondents), January-May 2022

3. Survey on trans and non-binary people, through a non-probabilistic sample, in progress

Respondent Driven Sampling (RDS)| motivations

RDS strategy is helpful to reach the hidden population

1. The sampling strategy has a probabilistic approach

It combines the snowball technique with a (probabilistic) mathematical model (Salganik and Heckathorn, 2004; Volz and Heckathorn, 2008)

2. The sample is based on the social network of individuals of the target population

• It starts with a convenience sample
• It is respondent-driven: at each wave, respondents drive the next sampling wave by selecting other individuals from the target population
• Through many sampling waves the dependence of the final sample on the initial sample is reduced
• The RDS sample inclusion probabilities are estimated assuming that the sampling process is a Markov chain

3. RDS permits inference

It allows inference about the network structure and estimation on the target population
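One common way to turn the Markov-chain assumption into estimates is the inverse-degree weighting proposed by Volz and Heckathorn (2008), in which each respondent is weighted by the reciprocal of the reported size of his or her personal network. The sketch below only illustrates that idea; it is not presented as the procedure used in the survey, and the function name and sample data are hypothetical.

```python
from typing import Iterable, Tuple


def inverse_degree_proportion(sample: Iterable[Tuple[bool, int]]) -> float:
    """Estimate a population proportion from (has_trait, degree) pairs.

    Each respondent is weighted by 1/degree, on the assumption that
    people with larger networks are more likely to be recruited.
    """
    weighted_with_trait = 0.0
    weighted_total = 0.0
    for has_trait, degree in sample:
        if degree <= 0:
            raise ValueError("degree must be positive")
        weight = 1.0 / degree
        weighted_total += weight
        if has_trait:
            weighted_with_trait += weight
    if weighted_total == 0.0:
        raise ValueError("sample is empty")
    return weighted_with_trait / weighted_total


# Hypothetical respondents: (has the trait of interest, network size).
hypothetical_sample = [(True, 2), (False, 4), (True, 4), (False, 8)]
estimate = inverse_degree_proportion(hypothetical_sample)
print(f"Weighted estimate: {estimate:.3f} (unweighted share would be 0.500)")
```

The weighting matters only if recruitment chains are long and varied enough for the Markov-chain assumption to be plausible, which is what the process indicators are meant to check.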


Web-based-Respondent Driven Sampling Survey

• Formative study

• Fifty LGBT+ associations throughout the national territory identified the first respondents (“seeds”) belonging to the target population

• Seeds must have some characteristics defined by Istat researchers (sex, age, sexual orientation, geographical area)

• Respondents play an active role in recruiting new respondents who belong to the target population and to their network of relationships

• A convenience sample as an exit strategy

Respondent Driven Sampling (RDS)| survey design

Each respondent invites 4 other potential respondents

Respondent Driven Sampling (RDS)| on the field

First OS national survey adopting RDS

Phase 1. Collaboration Istat – Associations
50 LGBT associations start the sampling actions. Each sends the link to 10 first respondents («seeds») belonging to the target population.

Phase 2. Active role of the seeds
Seeds invite 4 other respondents: at the end of the questionnaire, seeds share the link with 4 other potential respondents.

Phase 3. Active role of the respondents
Each respondent invites 4 other potential respondents: at the end of the questionnaire, respondents share the link with 4 other potential respondents.

Phase n. Active role of the respondents
At the end of the questionnaire, respondents share the link with 4 other potential respondents.

Expected «seeds»: 500

Expected «respondents» in phase n: 500 × 4^(n-1)
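The expected growth of the sample under full propagation can be sketched as follows. The sketch assumes that the expression above means 500 × 4^(n-1), that is, 500 seeds in phase 1 and every unit recruiting exactly 4 new respondents in each later phase; the function names are illustrative.

```python
def expected_new_units(seeds: int, invites: int, phase: int) -> int:
    """Units expected to enter the sample in a given phase (phase 1 = seeds)."""
    if phase < 1:
        raise ValueError("phase must be 1 or greater")
    return seeds * invites ** (phase - 1)


def expected_cumulative_units(seeds: int, invites: int, phases: int) -> int:
    """Total units expected after the given number of phases."""
    return sum(expected_new_units(seeds, invites, p)
               for p in range(1, phases + 1))


for phase in range(1, 4):
    print(phase,
          expected_new_units(500, 4, phase),
          expected_cumulative_units(500, 4, phase))
```

These figures are an upper bound: any seed or respondent who does not pass the link on reduces every later phase of that chain.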

A Web Survey on an elusive population: a focus on indicators to manage data collection process

M. Perez, L. Porciani, F. De Cicco, A. Vitalini

A system of indicators able to monitor

• the strength of «seed» and network propagation [IND. A]

o No. of seeds active / non-active
o No. of respondents active / non-active
o No. of chains created

• the typology of the respondents [IND. B]

o Sex
o Sexual orientation
o Age group
o Participation in associations

The effectiveness of the method and the duration of the survey depend on the propagation capacity of the network.

If for any reason a participant decides not to "propagate" because he/she becomes discouraged, loses confidence, loses referrals, etc., that node does not produce offspring and the network reduces its propagation effectiveness, limiting the achievement of a satisfactory sample.

Process Indicators| the key role in a RDS survey


Process Indicators| the findings

Start DC 26 Jan

End DC 31 May

[Chart: Respondents, Seeds and Active seeds over the data collection period. Plotted values, in order: Respondents 29, 590, 725, 990, 1177; Seeds 20, 250, 307, 500, 607; Active seeds 6, 83, 108, 149, 171. Vertical axis from 0 to 1400.]

[IND. A]

• Low activity of LGBT associations: 38 out of 50

• Low participation of the «seeds»: 64% of the expected number

• Low activity of seeds

• 6.4 active seeds against the expected 10

• 62% of seeds without propagation

• 2.4 respondents per seed (versus the expected 4)


Process Indicators| the findings

Start DC 26 Jan

End DC 31 May

Respondents profile

• 61.0% males

• 78.7% homosexual

• 55.6% in the 18-34 age group

• 38.5% participate in association life

• Seeds and other respondents are similar

• Homogeneity of the respondents

• Low distance from the first association

[IND. B]

From RDS to SNOWBALL

RDS does not allow inference to be made!

[IND. A + IND. B]

Change strategy

26 April • For opening link in LGBT associations webpage

Istat – LGBT Associations meeting

• For sharing in progress results

• For supporting the DC strategy

• For sharing the DC change

LESSON LEARNED


• Experimental procedures to manage privacy issues: two-step model; request for the respondent’s email address

→ improving procedures and data collection tools to engage distrustful people

• Limited knowledge of the initial respondents (seeds): no chance of training the seeds; no incentives for the respondents

→ improving communication and training, incentives, choice of seeds by researchers

• Is the recruitment of possible LGB respondents an indelicate operation?

→ improving communication on the privacy-by-design approach

• Are the LGB networks too limited or fragmented, even at a territorial level?

→ improving the monitoring indicators on network propagation

• Other surveys based on RDS

→ studying the possibility of applying RDS to other populations (foreigners?)

A System-to-system Data Communication Channel for a Multi-technique Data Collection Process: the Case of Italian Agricultural Census - Claudia Fabi and Maura Giacummo (Istat, Italy)


A SYSTEM-TO-SYSTEM DATA COMMUNICATION CHANNEL FOR MULTITECHNIQUE SURVEYS: THE CASE OF ITALIAN AGRICULTURAL CENSUS

Istat | Central Directorate for Data Collection & Central Directorate for Information Technology

Online, 12 - 14 June 2023

UNECE Expert Meeting on Statistical Data Collection 2023

Claudia Fabi – Maura Giacummo

o The context: the data collection design for the Agricultural Census

o The data communication system: issues and solutions

o Results

o Conclusions

Index


The data collection design for the 7th General Census of Agriculture was based on an integrated, fully digital system, which made it possible to adopt three different survey techniques simultaneously:

The Data Collection Design for the 7th Agricultural Census

o CATI (Computer assisted telephone interviewing), in both «inbound» and «outbound» techniques;

o CAWI (Computer assisted web interviewing);

o CAPI (Computer assisted personal interviewing).


The data collection process was based on SGI, an online management system developed by Istat, with many features dedicated to the networks involved, in order to monitor, follow and evaluate the work in progress with respondents.

The CATI software for telephone interviews was based on a different system, developed and used by a contact center outsourcer, which supported the data collection process by providing nearly 400 telephone interviewers.

It was necessary to develop an asynchronous data flow to merge contacts and outcomes from the two management systems.

The Data Collection Design for the 7th Agricultural Census

The census list included approximately 1,700,000 units, identified through Administrative Registers in the agricultural sector. Before the start of the survey, the entire list was divided into two subgroups, each intended for a pre-assigned survey technique: CATI or CAPI.

However, the pre-assignment was not strictly binding but preferential. The criteria for pre-assignment were mostly based on the presence or absence of one or more telephone numbers. All respondents could also choose to participate through one of the two techniques available on individual initiative.

The Data Collection Design for the 7th Agricultural Census

Schedule of data collection activities: from 7th January 2021 to 30th July 2021

Techniques available on individual initiative: CAWI, Inbound CATI

Pre-assigned techniques: CAPI, Outbound CATI

As mentioned before, the survey ran on two distinct IT architectures: the first, developed by Istat, was dedicated to the CAPI and CAWI techniques; the second, developed by an external outsourcer, to the CATI techniques.

Consequently, it was necessary to design a communication channel between the two IT systems that would keep the results and the status of questionnaire compilation updated, almost on a daily basis.

The efficiency and punctuality of these operations were the key to the success of the field survey, ensuring that the simultaneous data collection techniques could effectively be interchangeable.

A System-to-System Data Communication Tool: the Project


The design process of an automated data exchange system, through files in a predefined format, coming

from SGI to the CATI system and vice versa, lasted about 6 months before the start of the survey.

The design specifically concerned the following issues:

a) scheduling of data exchange frequency: twice a day, in the morning and after lunch time;

b) exchange file format: ASCII format, .txt delimited by “pipe”;


c) content of the exchange files: a minimal content, nearly 10 variables, needed to merge contacts and outcomes from SGI to CATI and vice versa, rebuilding the contact history of every farm/respondent in both Data Collection Management Systems;

d) nomenclature of exchange files: strictly binding, with an alphanumeric nomenclature that identifies the direction of the transmission (from CATI to SGI, or from SGI to CATI) and the date and time of the outcomes included in every file;

e) quality control strategies for exchange files and identification of anomalous records;

f) recovery plan, in case of failure of the exchange procedures.

One of the most important goals in system-to-system data communication is to guarantee data quality and to find the right trade-off between quality checks and quality data.

The system managed three types of check:

• 1st level: to evaluate whether to accept or reject the entire file (types of controls: correctness of the file name or structure);

• 2nd level: to exclude non-compliant records and accept the compliant ones (some types of controls: questionnaire completed with another technique, correctness of the identification code, coherence of the date with the submission time window and the name of the file);

• 3rd level: to manage the creation of the file to send back to the outsourcer, only if the 1st level checks were passed.

A System-to-System Data Communication Tool: Data Quality


There was also a recovery plan that guaranteed a quick restore of data in case of errors.

An automatic system sent emails to a control team, composed of Istat and outsourcer personnel, in case of critical failure events (such as empty or incorrect deliveries).

The system was designed to repeat the processing of any incorrect delivery, either entirely or partially, and to recover the acquisition of more than one failed delivery simultaneously.

Processed files and logs were stored in a dedicated file system.

A System-to-System Data Communication Tool: the Recovery Plan


Results: Daily records exchange

[Chart: number of records exchanged daily, weekly ticks from 08/01/2021 to 23/07/2021; vertical axis from 0 to 90,000 records]

The amount of information exchanged daily can help to size the architecture and the data space needed to apply a similar procedure to other surveys.

Over 7 months, 412 files were exchanged, containing over 6 million records.

On average, 10,000 records were processed daily, with peaks of 80,000 records during particularly intense fieldwork periods.

Results: Data Quality

Only three files were rejected at the first level of check; the outsourcer quickly replaced them with new files.

The second level of check found 45,351 errors (0.8%):

• 73% came from a change of technique: these were not true errors but discrepancies due to the lack of perfect synchronization between techniques

• 27% were true errors that led to record rejection (0.2% of the 6 million records exchanged).

All true errors derive from the following data coherence issues:

- internal date not compatible with the date in the file name;

- outcome codes not compatible with the technique (e.g., an outbound outcome code for an inbound attempt, etc.).

Results: CATI to CAPI transitions during the fieldwork

The farms pre-assigned to the CATI technique were 550,000 (32% of the census list). Among them, only 282,536 interviews were completed using the CATI technique, slightly over 50%.

The others preferred to fill in the questionnaire with another technique: around 16% with CAWI and 24% with CAPI.

Thanks to the daily data exchange flow to the CAPI network, it was possible to recover 24% of the former CATI interviews, which otherwise would have been lost.


Despite the limitation of being only an approximation of real-time synchronization, the asynchronous update ensured a satisfactory smoothness in the data collection process of both the CAPI and CATI networks, while offering respondents wide discretion to use autonomous or assisted compilation tools.

This integration constituted an unprecedented innovation for the multi-technique surveys carried out by Istat, allowing a truly concurrent multi-technique approach.

Conclusions


Over time, the continuous implementation of modules and functional structures for managing Istat surveys will likely lead to the availability of a fully integrated Management System. This System will be available not only to Istat users but also to outsourcers, who will be required to operate on it in perfect synchronization with other techniques and data collection networks.

Thanks!

CLAUDIA FABI | [email protected]

MAURA GIACUMMO | [email protected]

UNECE 23 – Topic 1: “Process Automation and Efficiency”

A System-to-System Data Communication Channel for a Multi-technique Data Collection Process: the Case of Italian Agricultural Census

Claudia Fabi, ISTAT, Rome, Italy – [email protected]

Maura Giacummo, ISTAT, Rome, Italy – [email protected]

Index

1. The reference context: the 7th General Census of Agriculture

2. Design of the System-to-System data communication channel

2.1. The structure of the interchange files

2.2. Data quality control

2.3. Failure recovery plan

3. Results

4. Conclusions

1 Paragraphs 1., 2., 2.1., 4. 2 Paragraphs 2.2., 2.3., 3., 4.

1. The reference context: the 7th General Census of Agriculture

Between January and July 2021, the seventh General Census of Agriculture was carried out, the last one before the transition to the Permanent Census for the agricultural sector as well. The only link with tradition, however, was the inclusion in the survey of all the farms that complied with the definition harmonized by Eurostat (see footnote 3). This census, in fact, was strongly innovative in the data collection phase, the survey design, the data collection networks involved and the techniques used to fill in the questionnaires.

Specifically, for the first time in an Italian census survey, the respondent had a wide choice of how to complete the questionnaire, thanks to the simultaneous availability of different survey techniques: CAPI, CATI (both inbound and outbound) and CAWI.

The census list, i.e. the Reference Universe for the survey, included approximately 1,700,000 units drawn from Administrative Registers, some of them provided by Entities external to Istat. Before starting the survey, the census list was divided into two subsamples, each pre-assigned to a specific survey technique: CATI or CAPI. However, the pre-assignment was not strictly binding but preferential: it served to facilitate the organization of the CAPI and CATI networks, so that they could plan their work on the basis of a predictable workload.

Furthermore, from the start of the survey, all respondents were able to choose to fill in the questionnaire through one of the open access techniques:

- CAWI: by self-compiling the questionnaire on a web application developed by Istat;

- Inbound CATI: by requesting a telephone interview through the Istat toll-free number or by sending an SMS or a WhatsApp message to a dedicated SIM.

Table 1.1 – Data collection design for the 7th Agricultural Census

Therefore, from the first day of the survey until the end of the data collection, respondents had the opportunity to connect to a dedicated Istat website using their own username and password to fill in the questionnaire, or to call the toll-free number and fix an appointment for a telephone interview, at their convenience. In the meantime, however, the CAPI and CATI networks started working to reach those farms that had not already taken steps independently to participate. Interviewers began to contact respondents, scheduling appointments and proceeding with interviews.

Furthermore, the success of the CATI technique is strongly correlated with the quality (completeness and currency) of the telephone numbers in the list. To avoid an increase in non-respondent farms, the data collection design provided for transferring from CATI to CAPI the farms that were unreachable by telephone, in particular those that did not answer over a long period (more than 30 contacts without response) or whose available telephone numbers proved to be incorrect or non-existent. This transition from CATI to CAPI was ongoing throughout the survey, both to avoid losing census units for reasons related to the quality of the source list, and to allow the CAPI network to receive new farms in time to try to find them on the territory.

3 See EU Regulation 2018/1091 (art. 2 paragraph a) for the definition of farm included into census survey.

Schedule of data collection activities: from 7th January 2021 to 30th July 2021

Techniques available on individual initiative: CAWI, Inbound CATI

Pre-assigned techniques: CAPI, Outbound CATI

Figure 1.2 summarizes the subjects and instruments involved in the survey. The box at the top left represents the IT architecture supporting the survey activities: specifically, a synergy between SGI, Istat's Survey Management System; PANDA, the data acquisition system; and Microstrategy, the monitoring and reporting system.

Figure 1.2 – Scheme of the multi-technique design for the 7th General Census of Agriculture

In the lower part of the figure, the subjects involved in the data collection networks, divided between

the CAPI network - supported by the Agricultural Assistance Centers on the territory, and the CATI

network - centralized in a contact center external to Istat.

The two management systems (SGI with the CAWI and CAPI techniques, and the outsourcer's

management and data acquisition system, with the CATI inbound and outbound techniques) were

independent. The first aim was the design of a communication module between the two Systems, so

that the work of the interviewers could be as synchronized as possible, even if in fact operating on

separate platforms.

The synchronization process kept the archives containing the survey results updated for both IT structures, CAWI-CAPI and CATI. This made it possible, for example, to avoid contacting again the farms that had already chosen to fill in the questionnaire with the CAWI technique, even if they belonged to the subsample assigned to the CATI technique. Similarly, the CAPI-assigned farms that chose to call the toll-free number and book an interview with a telephone interviewer were reported to the CAPI network, to prevent them from being further disturbed by face-to-face interviewers.

It is easy to understand that effective data exchange between the two software architectures was the key to the entire census operation. Without an effective, timely and functional synchronization between data collection Systems, within a few days the CAPI and CATI subsamples would have been affected by duplications. CAWI and CATI inbound respondents would have been subjected to continuous contact attempts even after having filled in their questionnaires with other techniques.

Furthermore, it is precisely through this system-to-system data communication channel that the farms

unreachable by telephone were reassigned to the CAPI technique. In this way, the data collection

process gained a further possibility of optimizing the use of survey techniques, attempting where

possible the telephone interview first, certainly faster and less expensive, and, secondly, switching to

the CAPI technique, only when the intervention of face-to-face interviewers was really necessary.

2. Design of the System-to-System data communication channel

In the start-up phase, great attention was dedicated to the design, construction and testing of a bidirectional communication flow between the two Systems. The communication, based on ASCII files with shared encodings, ran twice a day to complete the synchronization operations, at pre-set times and without any need to interrupt data collection operations during the update.

An element of complexity was represented by the need to show the results of the CATI survey in the

web application, SGI, acquiring a series of basic information related to the daily telephone contacts

made by the CATI interviewers. At the same time, the CATI System had to be constantly aligned

with the data coming from the activities carried out by the CAPI network and from the online

compilations carried out directly by the respondents.

Graph 2.1 – System to System data communication scheme adopted for the survey

The design process of an automated data exchange system, through files in a predefined format,

coming from SGI to the CATI system and vice versa, lasted about 6 months before the start of the

survey.

The design specifically concerned the following issues:

a) scheduling of data exchange frequency;

b) exchange file format;

c) content of the exchange files;

d) nomenclature of exchange files;

e) quality control strategies for exchange files and identification of anomalous records;

f) recovery plan, in case of failure of the exchange procedures.

To avoid a system slowdown, the twice-daily data exchange was scheduled at hours when less data traffic was expected on the Systems, considering that the interviewers' daily work would probably produce on the order of tens of thousands of records per day. The data exchange hours were set as follows:

- 1st synchronization: 06:00-08:00 a.m.

- 2nd synchronization: 01:00-03:00 p.m.

2.1. The structure of the interchange files

In designing the structure and content of the interchange files, the objective was simplicity, completeness and non-redundancy, including only the information indispensable for synchronizing the management Systems. This kept the time required for automatic data transmission between the two systems to a minimum, by reducing the size of each scheduled transmission.

The following table shows the list and characteristics of the transmitted variables.

Table 2.2 – The structure of the interchange files

Variable name | Description | Notes

PROGR_REC | Progressive number of the record | Identification code of the record in the current file

COD_IDENTIFICATIVO | Identification code for the farm | Identification code of the farm included in the census list: the identification code was assigned before the start of the survey and did not admit duplications

FLAG_CATI | CATI pre-assignment flag | Indicates whether the farm was pre-assigned to the CATI technique

STATO | Questionnaire status in Istat SGI |

ESITO_CHIU | Definitive outcome | Indicates whether the transmitted outcome was definitive, meaning that the farm should not be contacted anymore (“0” by default, “1” means “no more contact”)

ESITO_OUT | CATI outbound outcome in detail | Outcome code in detail: this code was intended to be used to update the CATI outbound database or vice versa

ESITO_IN | CATI inbound outcome in detail | Outcome code in detail: this code was intended to be used to update the CATI inbound database or vice versa

DESC_ESITO | Outcome description | Textual description of the outcome code

DATA | Date (day and time) | Day and time at which the outcome was recorded

TECNICA | Outcome technique | A code that identifies the data collection technique with which the outcome was recorded

The coding of each variable made it possible to link the information on each contact unambiguously to the farm's own "history" of contacts, integrating it chronologically and recording, for each attempt, the data collection technique used.
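As an illustration of the record layout in Table 2.2, the following minimal sketch reads a pipe-delimited interchange file into dictionaries keyed by variable name. The column order, the absence of a header row and the sample values are assumptions made for the example; the paper does not specify them.

```python
import csv
import io
from typing import Dict, Iterator, List

# Variable names from Table 2.2; the column order is an assumption.
FIELDS: List[str] = [
    "PROGR_REC", "COD_IDENTIFICATIVO", "FLAG_CATI", "STATO", "ESITO_CHIU",
    "ESITO_OUT", "ESITO_IN", "DESC_ESITO", "DATA", "TECNICA",
]


def read_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per pipe-delimited line, keyed by the Table 2.2 names."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} fields, got {len(row)}"
            )
        yield dict(zip(FIELDS, row))


# Made-up sample line, for illustration only.
sample = "1|F000123|1|3|0|12||No answer|2021-01-15 10:42|OUT\n"
records = list(read_records(sample))
```

Keeping every value as a string leaves the interpretation of codes and dates to the later quality checks.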

The need to make information on contact attempts and their outcomes mutually interchangeable also led to the design of outcome code tables that are as consistent as possible across techniques (that is, a coding scheme in which the same code means the same outcome in each technique). This simplified the interpretation of the file content and drastically limited the need for additional recoding after data transmission.

Particular attention was also dedicated to the non-trivial aspects of nomenclature, which would allow

the automatic interpretation of the file content by the acquisition and synchronization batches.

Specifically, the name of each file included:

- two initial letters identifying the direction of the exchange, specifically the letter “C” to mean the

outbound and inbound CATI technique (therefore the outcomes recorded by the outsourcer) and the

letter “S” to mean the Istat management system, for the CAWI and CAPI techniques. Therefore a file

with a name starting with “CS” is a file that contains CATI outcomes intended for updating SGI,

while a file with a name starting with “SC” is a file that contains CAWI/CAPI outcomes intended to

update the CATI system;

- an explicit reference to the date, in the format “yyyymmdd”, corresponding to the day on which the

outcomes contained in the file occurred;

- a letter identifying the file produced and transmitted during the night "M" and the file produced in

the early afternoon "P".

This nomenclature never produces duplicate names, so files could never be overwritten. It was also easy to identify and recover any processing tranches that the Systems had not performed automatically due to technical problems.
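The naming rule can be sketched as a small parser. The paper gives the three components (direction, date, "M" or "P") but not the exact layout, so the underscore separators and the `.txt` extension below are hypothetical, as are the names `ExchangeFile`, `parse_name` and `build_name`.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Hypothetical layout: <direction>_<yyyymmdd>_<slot>.txt
NAME_RE = re.compile(r"^(CS|SC)_(\d{8})_([MP])\.txt$")


class ExchangeFile(NamedTuple):
    direction: str  # "CS" = CATI outcomes for SGI, "SC" = CAWI/CAPI outcomes for CATI
    day: date       # day on which the outcomes in the file occurred
    slot: str       # "M" = file produced during the night, "P" = early afternoon


def parse_name(name: str) -> ExchangeFile:
    """Split a file name into its components, rejecting unexpected names."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    direction, ymd, slot = match.groups()
    day = datetime.strptime(ymd, "%Y%m%d").date()  # raises ValueError on impossible dates
    return ExchangeFile(direction, day, slot)


def build_name(f: ExchangeFile) -> str:
    """Rebuild the file name from its components."""
    return f"{f.direction}_{f.day:%Y%m%d}_{f.slot}.txt"
```

Because direction, day and slot together identify a delivery, two distinct deliveries can never share a name under this scheme.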

The transmission of files was also automated, through sFTP batch jobs that copied the files produced into predetermined destination folders, which the automated synchronization procedures of the respective systems pointed to.

2.2. Data quality control

The policies implemented to control the quality of the data provided by the external outsourcer were based on three levels that intervened at different points of the process. In detail:

- 1st level of check: this level evaluated the file received from the external outsourcer as a whole and determined the acceptance or rejection of the entire delivery;

- 2nd level of check: this level analysed every single record contained in each received file and, in case of failure, excluded non-compliant records and accepted the compliant ones;

- 3rd level of check: this level prevented the generation of the return synchronization file when the received file did not pass the 1st level checks.

In general, the strategy chosen was to minimize discrepancies, reducing them to critical situations

only, without compromising the quality of the transmitted information. The following are the controls

performed at each of the three levels described above.

The first level of check performed the following controls:

- Correctness of the file name according to the predetermined and expected nomenclature;

- Correctness of the file structure according to the predefined and expected record layout.
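A first-level check of this kind could look like the sketch below, which accepts or rejects a delivery as a whole. The file-name pattern is the same hypothetical layout used for illustration elsewhere, the field count follows the ten variables of Table 2.2, and treating an empty delivery as a rejection is an assumption.

```python
import re
from typing import List, Tuple

NAME_RE = re.compile(r"^(CS|SC)_\d{8}_[MP]\.txt$")  # hypothetical naming layout
N_FIELDS = 10                                        # variables listed in Table 2.2


def first_level_check(file_name: str, lines: List[str]) -> Tuple[bool, str]:
    """Accept or reject a whole delivery; returns (accepted, reason)."""
    if not NAME_RE.match(file_name):
        return False, "file name does not follow the agreed nomenclature"
    if not lines:
        return False, "empty delivery"
    for i, line in enumerate(lines, start=1):
        if len(line.rstrip("\n").split("|")) != N_FIELDS:
            return False, f"line {i}: wrong number of fields"
    return True, "ok"
```

A single malformed line rejects the entire file here, which matches the all-or-nothing nature of the first level.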

For the control of each record, it was possible to identify two additional categories of problems: one attributed to an overlap of response techniques and the other resulting from mapping errors or inconsistent information. The first case is a discrepancy not related to any error but simply linked to the possibility that, in the absence of perfect synchronization between Systems, the respondent could complete the questionnaire using more than one technique. The second type derives from errors attributable to incorrect content in the file received from the external outsourcer. In detail, the second level included the following controls:

- Presence of a farm for which the questionnaire was already completed using a different

technique;

- Correctness of the exchange identification code;

- Coherence of the data sent by the external outsourcer;

- Consistency between the sent outcome and the compilation technique declared by the external

outsourcer (inbound or outbound);

- Compatibility of the record's processing date with the data submission time window.
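The record-level controls listed above can be sketched as a filter that keeps compliant records and sets aside the others with a reason. Several details are assumptions made for the example: the technique codes "IN" and "OUT", timestamps starting with "YYYY-MM-DD", and a submission window simplified to "same day as the file".

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, str]


def reject_reason(rec: Record, known_ids: Set[str],
                  completed_elsewhere: Set[str], file_day: date) -> Optional[str]:
    """Return why a record is non-compliant, or None if it passes every control."""
    farm = rec["COD_IDENTIFICATIVO"]
    if farm not in known_ids:
        return "unknown identification code"
    if farm in completed_elsewhere:
        return "questionnaire already completed with another technique"
    # Hypothetical technique codes: "IN" = inbound CATI, "OUT" = outbound CATI.
    if rec["TECNICA"] == "IN" and rec["ESITO_OUT"]:
        return "outbound outcome code on an inbound attempt"
    if rec["TECNICA"] == "OUT" and rec["ESITO_IN"]:
        return "inbound outcome code on an outbound attempt"
    try:
        rec_day = date.fromisoformat(rec["DATA"][:10])  # assumes "YYYY-MM-DD ..." timestamps
    except ValueError:
        return "unreadable date"
    if rec_day != file_day:
        return "date outside the submission window"
    return None


def second_level_check(
    records: Iterable[Record], known_ids: Set[str],
    completed_elsewhere: Set[str], file_day: date,
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into accepted ones and (record, reason) rejections."""
    accepted: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for rec in records:
        reason = reject_reason(rec, known_ids, completed_elsewhere, file_day)
        if reason is None:
            accepted.append(rec)
        else:
            rejected.append((rec, reason))
    return accepted, rejected
```

Recording the reason with each rejected record makes it possible to separate technique overlaps from true errors afterwards.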

Finally, the third level of check managed the decision for the generation of the return synchronization

file. The creation of the file, intended to update the CATI system, stopped if the first level blocked

the input file. Blocking the generation of the file was necessary, if the received file was rejected

during first level of check, to avoid synchronizations that did not take into account the received

information (and thus the outcomes). In fact, the return file would have been formally correct but

lacking in terms of completeness and level of updating since it could not consider the outcomes of

the attempts recorded by the CATI technique on the previous day.

The external outsourcer sent the outcomes of all contact attempts made on the farms, to allow all the work performed to be archived and monitored. This raised a further issue, because a file could plausibly contain multiple records for the same farm, one for each telephone attempt made by the CATI network on the previous day. In this case, the procedure synchronized the databases with the information from the most recent attempt and archived the other contact attempts in the contact history. The tables updated were the same ones used by SGI. Since these operations were carried out while keeping the application online and without interrupting the work of the survey networks, it was necessary to manage data concurrency. The figure below shows the expected workflow with the data and tables.

Graph 2.3 – Data Workflow
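The rule of synchronizing on the most recent attempt and archiving the others can be sketched as follows. The example assumes that the DATA field holds timestamps that sort correctly as text (for instance "YYYY-MM-DD HH:MM"); the function name `split_latest` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_latest(records: Iterable[Record]) -> Tuple[Dict[str, Record], List[Record]]:
    """Return (latest record per farm, older attempts to archive in the history)."""
    latest: Dict[str, Record] = {}
    history: List[Record] = []
    for rec in records:
        farm = rec["COD_IDENTIFICATIVO"]
        current = latest.get(farm)
        # On equal timestamps the record appearing later in the file wins.
        if current is None or rec["DATA"] >= current["DATA"]:
            if current is not None:
                history.append(current)
            latest[farm] = rec
        else:
            history.append(rec)
    return latest, history
```

Only the records in `latest` would update the questionnaire status, while `history` preserves every attempt for monitoring.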

2.3. Failure recovery plan

The adoption of a system-to-system data communication tool, based on a synchronization protocol, required a recovery plan that guaranteed a quick restore of data in case of errors.

The first action taken was to activate alerts for critical failure events (such as empty or incorrect deliveries), using an automatic system that sent emails to a control team composed of Istat and external outsourcer personnel. This gave operational staff real-time notification of the issues encountered, without the need to access logging systems or monitoring applications.

Furthermore, all control procedures were designed to repeat the processing of any incorrect delivery, either entirely or partially.

If it was necessary to recover the acquisition of more than one failed delivery simultaneously, it was

planned to process them sequentially, starting from the oldest and moving on to the most recent one,

updating the database with the latest information and archiving all the others. All files sent by the

external outsourcer, once processed, were stored in a dedicated file system.
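As a rough sketch of the recovery rule just described, the function below replays several deliveries from the oldest to the most recent, so that the latest information ends up in the database and earlier values are archived. The data structures (a dictionary of deliveries keyed by date, a dictionary standing in for the database) are assumptions made for illustration only, and empty deliveries are simply reported back instead of triggering an email alert.

```python
from datetime import date


def replay_deliveries(deliveries, database, archive):
    """Apply deliveries oldest first so the most recent information wins.

    `deliveries` maps a delivery date to a list of (farm_id, outcome) pairs.
    Empty deliveries are not applied; their dates are returned as failures.
    """
    failed = []
    for delivery_date in sorted(deliveries):
        records = deliveries[delivery_date]
        if not records:
            failed.append(delivery_date)  # candidate for a critical failure alert
            continue
        for farm_id, outcome in records:
            if farm_id in database:
                # Archive the value that is about to be overwritten.
                archive.append((farm_id, database[farm_id]))
            database[farm_id] = outcome
    return failed
```

Sorting by delivery date before applying anything is what guarantees that an older file can never overwrite newer information, whatever the order in which the files are recovered.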

3. Results

The two main purposes of the communication and synchronization architecture were to

update the outcomes of the agricultural companies' questionnaires with all the attempts made, and to

reassign to the CAPI technique those farms that could not be reached through the CATI

technique due to issues with the quality of telephone contacts in the census list.

Let's start by reporting the amount of information exchanged daily. These data can be useful to evaluate

the size of the architecture and the dedicated data space if a similar procedure were to be reused for

another survey.

During the survey, 412 files were exchanged, regarding over 6 million records. The average number

of records processed daily was around 10,000, with peaks of 80,000 records during particularly

intense fieldwork periods. The graph below shows the daily trend of the quantity of records

exchanged, in sending and receiving files.

Graph 3.1 – Number of records processed daily by the System

Regarding data quality, the first-level check rejected only three files, which the outsourcer

replaced very quickly with new files.

The records rejected by the second-level check during the processing phase amounted to a total of 45,351. The records

attributed to a change of technique were 33,065. These, corresponding to 73% of the total, indicated that

the discrepancies were not due to processing errors but rather to questionnaires completed using other

techniques. True errors accounted for 27% of the total number of records rejected and were due to two

specific data coherence issues:

- the operation date was not compatible with the date entered in the file name;

- the outcomes were not compatible with the technique, depending on an inconsistency between the

transmitted outcome code and the data collection technique (e.g., an outbound outcome code for an

inbound attempt, etc.).
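The two coherence checks listed above could look roughly like the sketch below. Several details are assumptions, since the source does not specify them: the file-name pattern (`outcomes_YYYYMMDD.csv`), the outcome codes, and the reading of "not compatible" as "operation date later than the date in the file name".

```python
import re
from datetime import date, datetime

# Hypothetical outcome codes allowed for each CATI direction (not from the source).
ALLOWED_OUTCOMES = {
    "OUTBOUND": {"OUT_COMPLETED", "OUT_NO_ANSWER"},
    "INBOUND": {"IN_COMPLETED", "IN_INFO_REQUEST"},
}


def date_from_file_name(file_name: str) -> date:
    """Read the delivery date from a hypothetical name like 'outcomes_20210108.csv'."""
    match = re.search(r"\d{8}", file_name)
    if match is None:
        raise ValueError(f"no date found in file name: {file_name}")
    return datetime.strptime(match.group(0), "%Y%m%d").date()


def coherence_errors(file_name: str, operation_date: date,
                     technique: str, outcome: str) -> list[str]:
    """Return the names of the coherence checks that a record fails."""
    errors = []
    # Assumption: an operation dated after the delivery file is incompatible.
    if operation_date > date_from_file_name(file_name):
        errors.append("date")
    # The outcome code must belong to the technique that generated the attempt.
    if outcome not in ALLOWED_OUTCOMES.get(technique, set()):
        errors.append("technique")
    return errors
```

Returning the list of failed checks, rather than a single flag, makes it easier to tell the two error types apart when rejected records are recovered.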

This result is encouraging because it demonstrates that the real-time synchronization issues between

techniques did not pose a significant problem. Furthermore, the errors derived from coherence checks

rather than non-existent data in the outcomes or identification codes. This allowed for quick recovery

of the rejected records.

[Graph 3.1 axes: records processed per day, 0 to 90,000 (vertical); weekly dates from 08/01/2021 to 23/07/2021 (horizontal)]

The achievement of this result was the outcome of an intensive startup effort aimed at making the

external outsourcer as independent as possible in managing its own outcomes and configurations.

Regarding the synchronization efforts that facilitated the transition of agricultural companies assigned

to CATI technique but not reachable by phone, to the CAPI technique, the following data give a

comprehensive overview of the role of CATI technique in the Agricultural Census.

The initial sample consisted of 1,699,942 farms. The units selected for the CATI technique based on

the previously mentioned criteria and delivered to the external outsourcer before the start of the survey

amounted to 550,000, representing 32% of the census list.

At the end of the survey, interviews completed using the CATI technique were 282,536, slightly over

50% of the assigned units. The remaining units moved towards other techniques based on conscious

choices by respondents, who opted to complete the questionnaire via self-administered CAWI or by

referring to the CAPI network.

The number of farms with incorrect or non-existent phone contacts was particularly high, with

116,890 companies, approximately 21% of those assigned to the CATI technique. This represents a

significant amount that strongly affected the effectiveness of the CATI technique throughout the

survey period. However, thanks to the possibility of synchronization between management systems,

concurrent multi-technique utilization was made possible, optimizing the almost "real-time" use of

techniques and, most importantly, avoiding abandoning units with incorrect phone contacts even if

initially assigned to the CATI technique. This ensured the opportunity to mitigate the impact of the

poor quality of phone contacts in the census list, allowing agricultural companies to participate in the

survey using other techniques immediately and in a completely transparent manner for the

respondents.
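The percentages quoted in this section can be re-derived from the counts given in the text. The short sketch below does only that arithmetic; the constant and function names are illustrative.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


CENSUS_LIST = 1_699_942      # farms in the initial list
ASSIGNED_TO_CATI = 550_000   # units delivered to the external outsourcer
COMPLETED_BY_CATI = 282_536  # interviews completed using CATI
BAD_PHONE_CONTACTS = 116_890 # incorrect or non-existent phone contacts

cati_share_of_list = share(ASSIGNED_TO_CATI, CENSUS_LIST)
cati_completion = share(COMPLETED_BY_CATI, ASSIGNED_TO_CATI)
bad_contact_share = share(BAD_PHONE_CONTACTS, ASSIGNED_TO_CATI)
```

These values (about 32.4%, 51.4% and 21.3%) agree with the rounded figures of 32%, "slightly over 50%" and "approximately 21%" reported above.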

The graph below illustrates the distribution of units initially assigned to the CATI technique: as

shown, in 16% of cases, respondents opted for self-administered CAWI, while another 24% were

captured through the CAPI network.

Graph 3.2 – Effective response technique of CATI-assigned farms

[Pie chart legend: CATI OUTBOUND; closed without filled questionnaire; CAPI; CATI INBOUND; CAWI; No compiled questionnaire. Shares shown: 51%, 8%, 24%, 1%, 16%, 0%]

4. Conclusions

Despite the limitation of being only an approximation of real-time synchronization, the

asynchronous update, scheduled to occur automatically at predetermined times without the need to

stop data collection operations during the update, ensured that data collection ran

smoothly for both the CAPI and CATI networks, while offering respondents wide discretion to

use autonomous or assisted compilation tools.

In general, both the possibility of establishing computerized dialogue between different Systems and

the ability to acquire data from any physical location where the interviewer is located represent forms

of adaptive evolution of survey instruments. They are increasingly necessary as data collection

activities must meet the respondents' needs, their preferences for one communication channel with

Istat over another, their availability of time, and their geographical distribution. These adaptations are

essential for successfully maintaining contact with the respondents and obtaining their indispensable

cooperation.

The integration, although it did not allow for the real-time import of CATI contact outcomes or the

immediate export of CAPI contact attempts or self-completion accesses, constituted an unprecedented

innovation for the multi-technique surveys carried out by Istat. These surveys are typically designed

to allow either sequential or concurrent multi-technique approaches but on predefined and non-

permeable subsets of the population.

Unfortunately, the "near real-time" synchronization, while representing an important technological

and organizational innovation for census surveys at Istat, is still an approximation of what would

constitute the optimal approach for synchronous multi-technique surveys. The optimal approach

would involve centralizing technical and operational management in a single computerized

instrument developed by Istat.

Over time, the continuous implementation of modules and functional structures for managing Istat

surveys will likely lead to the availability of a fully integrated Management System. This System will

be available not only to Istat users but also to outsourcers who will be required to operate on it in

perfect synchronization with other techniques and data collection networks.

For the 7th General Agricultural Census, a single System architecture was not yet available. This

certainly resulted in a greater deployment of human and technological resources to compensate for

the lack of complete synchronization between the various systems. However, it marked the beginning

of a direction towards the surveys of the future.

Data collection methods to produce new enterprise variables using new data sources - Francesco Scalfati, Gianpiero Bianchi, Sergio Salamone (Istat, Italy)


Data collection methods to produce new enterprise variables using new data sources

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

UNECE Expert meeting on Statistical Data Collection

(12 -14 June 2023)

Scalfati F., Bianchi G., Salamone S.

ISTAT (Italy)

https://statswiki.unece.org/x/MADUE

Outline

Objective

Data Collection strategy

Data Collection Process

Experimental results

Conclusions

Objective

The aim of this work is to produce a

statistical framework able to extract

detailed information on the innovative

capacity of enterprises and to produce new

statistical variables by means of a data

analytics approach.


Data Collection strategy (1/2)

This work combines multiple sources (big data,

survey data and registers) in order to produce

indicators that provide the profile of the enterprises.

In particular, the identification of the patenting

enterprises allows linking them to the structural

characteristics and provides additional dimensions

available for this goal.

The data source used is the most complete and

up-to-date database on patents published by the

European Patent Office (EPO), which acquires data

from the EPO's master bibliographic database. The

target data are the patents published by applicants

based in Italy.


Data Collection strategy (2/2)

The reference population for the planned statistical

output is the set of active enterprises available from the

Italian National Business Register (ASIA).

The proposed approach for collecting statistical

information on the innovative capacity of enterprises

acquires European patent publications in text

format using APIs and web scraping techniques.

It integrates the extracted information with

statistical registers and surveys and produces new

statistical output by using text mining and machine-

learning techniques.


Data Collection Process


Data characteristics

The procedure collects the following macro

variables:

❑ Name of the applicant, owner and inventor

❑ Localization information on the residence of the three subjects

❑ Type of patent

❑ Date of publication of the patents

❑ Patent filing date

❑ IPC code (International Patent Classification)

All data collected refers to the geographic origin of the

applicant/owner (country of residence).


Data integration

The integration step is based on a record linkage procedure to

match micro-data on patent applications from the EPO

server with the data available from the Italian Official

Business Register (ASIA).

Availability of data on an annual basis is a prerequisite for

the subsequent integration phase.

For the match between the two sources it is necessary to

know the year of publication of the patent to identify

whether the company was active in the reference year.

The data collection procedure must extract complete

information, without duplicates, in order to allow unambiguous identification.
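The integration requirements above (no duplicates, match by applicant, enterprise active in the publication year) can be illustrated with a deliberately simple sketch. The slides do not describe the actual linkage method, so the exact match on a normalised name, the field names and the list of legal-form suffixes are all assumptions made for illustration.

```python
import re

LEGAL_FORMS = {"SPA", "SRL", "SNC", "SAS"}  # illustrative list of suffixes to ignore


def normalize(name: str) -> str:
    """Crude name normalisation: upper-case, drop punctuation and legal forms."""
    upper = name.upper().replace(".", "")
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", upper)
    return " ".join(t for t in cleaned.split() if t not in LEGAL_FORMS)


def link_patents(patents, register):
    """Link patent records to register units active in the publication year.

    patents: dicts with 'publication_no', 'applicant', 'year' (hypothetical fields).
    register: dicts with 'enterprise_id', 'name', 'active_years' (hypothetical fields).
    """
    by_name = {normalize(unit["name"]): unit for unit in register}
    seen = set()
    links = []
    for patent in patents:
        if patent["publication_no"] in seen:
            continue  # skip duplicate publications
        seen.add(patent["publication_no"])
        unit = by_name.get(normalize(patent["applicant"]))
        if unit is not None and patent["year"] in unit["active_years"]:
            links.append((patent["publication_no"], unit["enterprise_id"]))
    return links
```

A production record linkage would typically add approximate string comparison and address information; the sketch only shows why the publication year and the absence of duplicates matter for an unambiguous match.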


Experimental results

In the case study, 8,000 URLs were extracted from the

EPO database.

Each record is composed of about 40 variables:

proponent (applicant, owner, inventor), personal data,

type of patent, patent features, references, claims.

The data refer to Italian patents.

The procedure acquired the related patents from the

European Publication Server.

Some output indicators: rate of proponents, rate of

patents, territorial distribution, thematic distribution.


Conclusions

The innovative capacity of enterprises and institutions

can be described with indicators that provide the profile of the

enterprises.

Identifying the patenting enterprises makes it possible to produce new

information by linking them with structural and economic

characteristics.

This automatic approach reduces the burden on

enterprises.

Patent statistics are effective proxies for measuring and

monitoring innovative activities spread across a territory.

It is a difficult task because it extracts text from websites

and uses text mining and machine learning techniques

to produce new statistical variables in a reasonable time.


Contacts: Scalfati Francesco ([email protected]) Bianchi Gianpiero ([email protected]) Sergio Salamone ([email protected])


First analysis on reorganizing the data collection process of the Italian business account survey - Viviana De Giorgi, Valeria Tomeo, Alberto Canonico (Istat, Italy)


Expert Meeting on

Statistical Data

Collection

12 – 14 June 2023

First analysis on reorganizing the data collection process of the business accounts survey

Viviana De Giorgi1, Valeria Tomeo2, Alberto Canonico3

Draft – Extended abstract

The Business Accounts Survey conducted by Istat underwent significant changes in its data

collection process during 2021. Previously, the survey relied on two Excel questionnaires, one for

the Large Enterprises (LE) census survey, targeting enterprises with 250 or more persons employed,

and the other for the Small and Medium-sized Enterprises (SME) survey, which sampled enterprises

with fewer than 250 persons employed. Istat transitioned from this Excel-based data collection

approach to a Computer-Assisted Web Interviewing (CAWI) method, utilizing a web-based

questionnaire.

As a basic part of the process, the survey administrators, in collaboration with subject matter

experts, reorganized the questionnaire items, in compliance with the internal user requirements

and the relevant European regulations for structural business statistics. Specifically, the

collaboration focused on the questions related to the income statement, employment,

personnel costs and investments, aiming to make the details of principal variables more consistent

with the requirements. Previously unconsidered details were added, especially for the SME survey,

and some definitions were modified. Furthermore, the content of the two questionnaires, LE and

SME, was harmonized in terms of definitions, labels used, and coding structures. Indeed, this

solution made it possible to use the same edit process for both surveys, eliminating the need for

separate applications that operated on different systems. This resulted in significant time and

resource savings. Additionally, the ability to implement the same rules symmetrically for both

surveys further simplified the process.

Efforts were also made to define the underlying rules for questionnaire items, which was an

innovative approach compared to the Excel questionnaire, where rules were applied only to

collected data. This was achieved by introducing tooltips to aid in understanding and completing

the questionnaire online, incorporating checking rules to identify and address blocking and

non-blocking errors, and designing sections tailored to specific types of enterprises. In order to assist

respondents during the survey completion process and enhance the user experience, 209 tooltips

have been implemented. The questionnaire comprised a total of 76 coherence checks aimed at

ensuring the consistency of the responses provided by the participating enterprises. Within this set

of checks, 52 were implemented as blocking measures, meaning that the failure to meet the

required criteria led to the interruption of the questionnaire completion process. The remaining 24

coherence checks were non-blocking, allowing respondents to continue with the questionnaire even

if inconsistencies were detected. These checks provide a preliminary verification of the responses

and may require further analysis and rectification. The LE survey, involving 3,991 enterprises for the

reference year 2021, received back 3,418 questionnaires, out of which 531 had non-blocking errors.

The SME survey, which sampled 77,604 enterprises in 2021, received back 33,613 questionnaires,

out of which 5,855 had non-blocking errors.

1 Istat - Directorate for Economic Statistics – [email protected]

2 Istat - Directorate for Economic Statistics – [email protected]

3 Istat - Directorate for Data Collection – [email protected]
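The distinction between blocking and non-blocking checks can be sketched as below. The two rules are hypothetical examples, since the 76 checks of the questionnaire are not listed in the abstract, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    passes: Callable[[dict], bool]
    blocking: bool


def validate(answers: dict, checks: list[Check]) -> tuple[bool, list[str], list[str]]:
    """Return (can_submit, failed blocking checks, failed non-blocking checks)."""
    failed = [check for check in checks if not check.passes(answers)]
    blocking = [check.name for check in failed if check.blocking]
    warnings = [check.name for check in failed if not check.blocking]
    # Submission is interrupted only when at least one blocking check fails.
    return (not blocking, blocking, warnings)


# Hypothetical rules, for illustration only.
CHECKS = [
    Check("personnel costs require persons employed",
          lambda a: a["personnel_costs"] == 0 or a["persons_employed"] > 0,
          blocking=True),
    Check("investments do not exceed turnover",
          lambda a: a["investments"] <= a["turnover"],
          blocking=False),
]
```

A questionnaire that fails only non-blocking checks can still be submitted, which is how returned questionnaires can carry non-blocking errors as reported above.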

Another example of improvement is the possibility to append the data from both surveys and use

them together. In the past, with the old questionnaires, this procedure required a lot of work and

was complex to carry out. However, thanks to the new organization of items and the uniformity of

data collection, it has become possible to combine the data from both surveys more efficiently and

seamlessly. For example, respondents to the SME survey with more than 250 persons employed (no

longer within the scope of the SME survey) can be included in the dataset of respondents from the

LE survey.

Additionally, considerable emphasis was placed on efficiently managing support tickets and on regularly

updating Frequently Asked Questions (FAQs) to address technical issues and thematic inquiries.

The entire process not only ensured consistency in data treatment between the two surveys but

also resulted in improved data quality.

To evaluate the effectiveness of the new data collection system and its impact on data quality,

several indicators will be analysed. These indicators include accuracy, response rate, and the burden

placed on respondents. By assessing these metrics, Istat aims to determine the overall performance

of the revised data collection process and identify measures for further enhancing data quality.

Keywords: statistical data collection, monitoring data collection, data collection tools, data quality

improvement measures

The use of a multi-mode data collection system and of a web management system for the Italian Population Census: lessons learnt and future challenges - Novella Cecconi and Donatella Zindato (Istat, Italy)


The use of a multi-mode data collection system and of a web management system for the Italian Population Census: lessons learnt and future challenges (working paper)

Novella Cecconi1, Donatella Zindato2

Draft

Abstract

The paper discusses the impact of the use of a mixed-mode data collection system on the organization of the Italian Population Census fieldwork, on dissemination timeliness and on data quality.

The multi-mode data collection system was first introduced on the occasion of the 2011 census in order to increase the spontaneous response rate. Mailed-out paper questionnaires were collected through a multi-mode system, with web collection being introduced for the first time within a concurrent design. The need to manage information coming from different sources (online questionnaires, the Post Offices monitoring system, Municipal Collection Centres (MCCs) and the information required to address under-coverage of the household lists) required a constantly updated web IT system, enabling census staff to follow the status of every questionnaire over time and allowing enumerators to be directed only to non-responding households and potential under-coverage addresses.

Thanks to the use of questionnaire mail-out and of a multi-mode data collection system, the front-office staff was dramatically reduced in comparison to past censuses (about 40%) and greater flexibility was allowed to respondents. Indeed, the so-called ‘spontaneous return’ went very well and web return was much higher than expected. By contrast, the logistics of the mail-out/mail-back process proved very complex, with many points of failure, calling for a totally paperless census, implemented in the combined design of the Permanent Population and Housing Census (PPHC).

The evolution of the multi-mode data collection system and of the web management system is described from their first use for the 2011 census through the first and second cycles of the PPHC. Critical issues and lessons learnt, on which the continuous planning of the PPHC data collection is based, are also discussed.

1. The use of a mixed-mode data collection system in the 2011 Population Census

The mixed-mode data collection system was first introduced at Istat for the 2011 Population Census in order to reduce municipalities’ workload by increasing the self-response rate. The planning of the 2011 census had been based on the need to improve dissemination timeliness while at the same time reducing the statistical burden on respondents, reinforcing the Municipal Census Offices (which in Italy are in charge of the census fieldwork) and containing costs.

The once "door-to-door" census thus became a register-supported census, implemented by means of questionnaires’ mail out to the households enrolled in the municipal population registers. Self-completed questionnaires were then collected by a mixed-mode data collection system allowing households to choose the way in which they preferred to complete and return the questionnaire, with web collection being introduced for the first time within a concurrent design, together with the possibility to return the paper questionnaire at any post office in Italy or to Municipal Collection Centres. A further mode i.e. the targeted recovery of non-response by enumerators was foreseen in the second phase of the enumeration, as concomitant to the ‘spontaneous’ ones which were still available until the end of the enumeration.

Thanks to the use of questionnaire mail-out (instead of delivery by enumerators) and of a multi-mode data collection system (where enumerators were just one of the possible return modes, and hierarchically the last one), the front-office staff was dramatically reduced in comparison to past censuses (about 40%) and greater flexibility was allowed to respondents. Indeed, the self-response or so-called ‘spontaneous return’ went very well and web return was much higher than expected, accounting for one third of the total completed questionnaires. Indeed, although the communication campaign did not place particular emphasis on the web return mode (Istat's "preferred" return mode for obvious reasons of timeliness and data quality), and notwithstanding Italy being at the time the 4th country in Europe by number of people who had never had access to the Internet (37.2% of the population against an average of 22.4%), around 8.5 million households chose to complete the online questionnaire. CAWI was therefore the most used channel (Figure 1), followed by the delivery of the questionnaire to the Municipal Collection Centres (31.7 per cent). The percentage of households who chose to deliver the questionnaire to Post Offices was also significant (22.6 per cent), while only 12.3 per cent of the questionnaires were collected door to door by enumerators.

1 Istat, Data Collection Department

2 Istat, Population and Housing Census Division

Figure 1 - Mixed mode data collection at the 2011 Population Census (percentage values)

If the decision to include the CAWI mode among the possible return options proved to be successful, more generally the flexibility allowed by the adoption of a multi-mode data collection system seemed to be the winning choice. In fact, the percentage of self-response altogether was very high, equal to 82.7 per cent3.

Quite different trends over time could be observed by mode. The absolute peak of CAWI occurred in the first week, while another peak was recorded in the last week of the spontaneous return phase, when the peak for the MCCs was also recorded. A more regular trend over time can be observed for questionnaires recovered by enumerators throughout the second phase until the end of December; this mode became predominant in the first few months of 2012, when (as envisaged by the census plan) the enumeration was still ongoing only in the largest Municipalities.

More generally, the following elements emerged from the analysis of the time trends by mode:

• CAWI was mainly used at the beginning. Those who chose to reply via the internet did so following the arrival of the letter from Istat;

• the delivery of questionnaires at MCCs followed a similar time trend to that of CAWI return, therefore it appeared as the "spontaneous" alternative for those who did not have the possibility to use CAWI (i.e. didn’t have an internet connection or were not familiar enough with internet use to choose to fill in the electronic questionnaire);

• the analysis of the trend of the questionnaires collected by enumerators showed that many Municipalities had chosen to send them into the field before the official start of the non-response follow-up phase.

3 The approximately 4.2 million questionnaires (equal to 17.3 per cent of the total returned questionnaires) counted as "not returned spontaneously" included: questionnaires delivered by enumerators (i.e. questionnaires which, for different reasons, it was not possible to mail out); questionnaires for which at least one contact attempt by the enumerator was recorded in the monitoring system; questionnaires actually collected by the enumerators.

[Figure 1 data: CAWI 33.4; Municipal Collection Centres 31.8; Post Office 22.6; Enumerators 12.2 (vertical axis from 0 to 40)]

Figure 2 - Share of questionnaires by return mode (Municipal Collection Centres, Post Offices, Enumerators, CAWI)

As expected on the basis of the data concerning households ICT access/usage in the different areas of the country, a great regional variability could be observed in the breakdown by return mode (see figure 2). In general, households living in the South and in the Centre of Italy favoured using the web to complete the questionnaire (chosen, respectively, by 38.3 per cent and 32.7 per cent of all households), while in the regions of the North, the preferred mode was the delivery at the MCCs (35.7 per cent). More precisely, the percentage of CAWI ranged from 53.9 per cent in Trentino-Alto Adige4 to 24.5 per cent in Tuscany5. Vice versa, Tuscany and Trentino-Alto Adige were the regions with, respectively, the highest (31.3 per cent) and the lowest (7.1 per cent) percentage of questionnaires delivered to Post Offices. Valle d'Aosta was the region with the highest percentage of questionnaires delivered to the MCCs and Lazio the one with the lowest percentage (18.3 per cent). Finally, Calabria was the region with the highest share of questionnaires collected by the enumerators (24.4 per cent) and Lombardy the one with the lowest share (6.1 per cent).

A significant role was also played by the municipality size (see Figure 3). The web was the preferred mode for households living in small municipalities (37 per cent), especially in the South (46 per cent), while households living in medium-sized and small-medium municipalities mostly opted for returning the questionnaire to Municipal Collection Centres (respectively, 40.2 per cent of households living in municipalities between 5,000 and 20,000 inhabitants and 35.1 per cent of households living in municipalities between 20,000 and 50,000 inhabitants), with higher percentages in northern municipalities (respectively 44.5 per cent and 37.9 per cent). Finally, in the big cities, the delivery of the questionnaire to Post Offices was by far the preferred option (41.6 per cent of households living in municipalities with at least 100,000 inhabitants), probably due to their widespread territorial distribution, which made them the most "sustainable" mode in the largest municipalities.

However, the remarkable differences in the territorial distribution by return mode also have to be explained by taking into account the strong influence of the fieldwork organization put in place by the different Municipal Census Offices (MCOs), since they had a large autonomy in promoting one or the other of the return modes, as completed questionnaires were paid differently according to the return mode6.
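The payment rule detailed in footnote 6 (6 euros for questionnaires returned to MCCs or to enumerators, 4 euros for internet questionnaires, raised to 5.50 euros when the municipality's CAWI share exceeds 25% of completed questionnaires) can be written as a small calculation. The function names are illustrative, and questionnaires returned through Post Offices are left out because the footnote does not state how they were paid.

```python
from decimal import Decimal

RATE_MCC_OR_ENUMERATOR = Decimal("6.00")
RATE_CAWI_BASE = Decimal("4.00")
RATE_CAWI_BONUS = Decimal("5.50")
CAWI_THRESHOLD = Decimal("0.25")


def cawi_rate(cawi_count: int, completed_total: int) -> Decimal:
    """Per-questionnaire rate for web returns, given the municipality's CAWI share."""
    if completed_total == 0:
        return RATE_CAWI_BASE
    cawi_share = Decimal(cawi_count) / Decimal(completed_total)
    return RATE_CAWI_BONUS if cawi_share > CAWI_THRESHOLD else RATE_CAWI_BASE


def payment(mcc_or_enumerator: int, cawi: int, completed_total: int) -> Decimal:
    """Amount due for MCC/enumerator returns plus web returns (Post Office returns excluded)."""
    return (RATE_MCC_OR_ENUMERATOR * mcc_or_enumerator
            + cawi_rate(cawi, completed_total) * cawi)
```

With 200 completed questionnaires, of which 100 were returned to the MCC or to enumerators, raising web returns from 20 to 60 moves the CAWI share above the threshold and the web rate from 4 to 5.50 euros, which illustrates the incentive the MCOs had to promote one mode over another.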

As far as the "choice" of the return mode is concerned, the difference between Italian and foreign households also appears to be relevant (with less than 24 per cent of households whose members are all foreigners completing the questionnaire on the web, while the share of questionnaires collected by enumerators is much higher than that recorded for Italian households - 28.7 per cent versus 11.1 per cent - most likely precisely because of linguistic difficulties)7.

Analysing the breakdown by return mode and presence of elderly household members, a higher use of MCCs can be noted for multi-person households whose members are all elderly (36.3 per cent), along with a lesser use of the ‘enumerator mode’ (8 per cent).

Despite the higher propensity of youngsters to use the internet, a share of CAWI returns just 3 percent higher than the average is recorded for households with at least one member aged below 30, probably because completing the census questionnaire is considered a task reserved to adults.

Instead, the presence of at least one high educational qualification in the household seems to have an impact on the choice of the CAWI mode (chosen in 39 per cent of cases). The difference is even more relevant in the North (37.1 per cent against 28.9 per cent), where the overall percentage of households who chose the CAWI mode was lower than the national average.

4 In this region a different enumeration strategy was adopted, i.e. paper questionnaires were only available on request to the enumerator.

5 The regions where the CAWI share was higher than the national average were, in order, Sardinia (44.9 per cent), Molise (41.3 per cent), Puglia (40.6 per cent), Campania (40.5 per cent), Calabria (39.9 per cent), Friuli-Venezia Giulia (39.2 per cent), Abruzzo (38.4 per cent), Lazio (38.1 per cent), Marche (35.7 per cent) and Basilicata (34.2 per cent).

6 Questionnaires were paid differently according to the return mode: questionnaires returned to MCCs or to enumerators were paid 6 euros, while internet questionnaires were paid 4 euros. In order to encourage MCOs to promote the CAWI mode, Istat would pay 5.50 euros instead of 4 for each questionnaire returned via the internet if the overall CAWI percentage in the municipality exceeded 25 per cent of the completed questionnaires. Therefore, in some small municipalities respondents who chose to return the questionnaire to the MCC were invited to fill in the online questionnaire, with the assistance of MCC operators.

7 Despite specific actions targeting foreign citizens usually resident in Italy (Gallo et al., 2014) and the availability of translations of facsimiles of the questionnaire in 17 foreign languages, insufficient linguistic mastery represented an obstacle to the autonomous completion (and therefore to the spontaneous return) of the questionnaire. In fact, for logistical reasons, the electronic questionnaire could only be completed in Italian (or in German or Slovenian, languages of the linguistic minorities protected by law).


This preliminary evidence was also confirmed by the micro-level analysis carried out to identify the household profiles with a greater propensity for web response8.

Figure 3 - Breakdown by return mode and municipality size

2. Response management in the 2011 census

The need to manage information coming from different sources (online questionnaires, the Post Offices monitoring system, MCCs and the information required to address under-coverage in the household lists) required a constantly updated web-based IT system (SGR, according to the Italian acronym). The system enabled census staff to follow the status of every questionnaire over time and allowed enumerators to be directed only to non-responding households and to addresses with potential under-coverage.

Indeed, managing a modular and flexible strategy was a big challenge. On the one hand, it helped solve problems that traditionally had a great impact on the census process and negatively affected the timeliness of data dissemination. On the other hand, the introduction of this new strategy implied a higher level of complexity and a multiplication of risk factors9.

Such a web management system was in fact crucial to the conduct and success of the entire census, as a complete instrument that guided and supported census operators during all the survey phases. It was designed to provide the different users of the system with: (i) up-to-date information at different aggregation levels, down to the single questionnaire; (ii) a tool for cooperative working, guided through a forced workflow over the questionnaire life cycle.

Accessible online to all of the different levels of census staff, it enabled the status of every individual questionnaire to be followed in almost real time, thus allowing the targeted recovery of missing questionnaires. The availability of constantly updated information on the status of each questionnaire enabled enumerators to be directed only to households to which the questionnaire had been sent but not yet returned.

8 See Zindato, 2017.

9 See Benassi et al., 2013.

Furthermore, it was designed to automate back-office work and to guarantee flexibility in the organization of fieldwork within each Municipal Census Office. Municipal Census Office managers had to assign an organisational role and a system profile to every user and allocate enumeration areas to enumerators. Each census office could thus freely decide how to distribute work, both in assigning enumeration areas to enumerators and in assigning back-office work to operators. A hierarchical organisation could also be defined by setting dependency relationships between coordinators and other staff, and between enumerators and coordinators.

Finally, as it was also a monitoring system, SGR made it possible to produce census progress reports. The web-based census management and monitoring system was part of a general strategy aimed at minimising errors, reducing organizational workload and holding down costs.
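The core idea of this section, i.e. tracking the status of every questionnaire so that enumerators are sent only to households that have not yet responded, can be illustrated with a minimal Python sketch. It is not a description of SGR itself: the class and function names (`Questionnaire`, `record_return`, `follow_up_list`, `progress_report`) and the two-state status model are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SENT = "sent"          # delivered to the household, not yet returned
    RETURNED = "returned"  # received through any channel (web, Post Office, MCC, enumerator)


@dataclass
class Questionnaire:
    household_id: str
    enumeration_area: str
    status: Status = Status.SENT


def record_return(registry: dict, household_id: str) -> None:
    """Mark a questionnaire as returned, whatever the return channel."""
    registry[household_id].status = Status.RETURNED


def follow_up_list(registry: dict, areas_by_enumerator: dict, enumerator: str) -> list:
    """Households still to be visited in the areas assigned to one enumerator."""
    areas = areas_by_enumerator.get(enumerator, set())
    return sorted(
        q.household_id
        for q in registry.values()
        if q.status is Status.SENT and q.enumeration_area in areas
    )


def progress_report(registry: dict) -> dict:
    """Percentage of returned questionnaires in each enumeration area."""
    totals, returned = {}, {}
    for q in registry.values():
        totals[q.enumeration_area] = totals.get(q.enumeration_area, 0) + 1
        if q.status is Status.RETURNED:
            returned[q.enumeration_area] = returned.get(q.enumeration_area, 0) + 1
    return {area: 100.0 * returned.get(area, 0) / n for area, n in totals.items()}


# Hypothetical example: three households in two enumeration areas.
registry = {
    q.household_id: q
    for q in [Questionnaire("H1", "A"), Questionnaire("H2", "A"), Questionnaire("H3", "B")]
}
record_return(registry, "H1")
print(follow_up_list(registry, {"enum-01": {"A"}}, "enum-01"))  # ['H2']
print(progress_report(registry))  # {'A': 50.0, 'B': 0.0}
```

The sketch also shows why timely updates matter: if a return is recorded late, the household stays in the follow-up list and an enumerator may be sent to collect a questionnaire that has already come back.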

3. Towards a totally paperless census: the 15th Population Census Pilot Survey

As already mentioned, the need to reduce municipalities’ workload and the burden on respondents had called for the use of new data collection techniques and new territorial instruments meant to improve the coverage and quality of the enumeration. However, the innovations designed for the 2011 Italian census were not sufficient to achieve a stable and enduring balance between census costs and benefits. In fact, costs remained high and too concentrated in time, while the use of administrative data did not live up to the potential offered by the Italian context. Moreover, highly detailed geographic data were still supplied only every ten years, so census data continued to become outdated quickly.

Furthermore, the logistics of the mail-out/mail-back process (which was outsourced) proved very complex, with many points of failure, calling for a totally paperless census. The main problems were the long time needed to move tens of millions of paper questionnaires and the huge space needed to store them, and the high number of addresses from the municipal population registers that had not been recognised, which made it necessary to deliver about 2 million questionnaires via enumerators. Furthermore, the monitoring system was not updated in real time on the questionnaires delivered to the Post Offices (which amounted to almost a quarter of the total completed questionnaires); as a result, the management of the fieldwork was problematic, as enumerators would be sent to collect questionnaires that had already been returned.

These critical issues added to those mentioned above concerning the ever-decreasing sustainability of the traditional census (understood as a universal and simultaneous field enumeration). For all of these reasons, a completely different approach seemed necessary, based on a sequence of operations and surveys designed ad hoc so as to build a complete information system producing specified census outputs at given times. The new census strategy would be “rolling” (it would later become the Permanent Census) and combine a greater use of administrative sources with sample surveys rotating over a multi-year period, so as to avoid big “one shot” activities and sunk costs. More precisely, two ad hoc sample surveys would be conducted annually:

• a C-sample short-form-only survey, designed as a statistical test of the size of the coverage error of municipal population registers, to determine the usual resident population;

• a D-sample rolling survey, designed to collect in the field information on variables that cannot be replaced by administrative data, in order to produce the hypercube required by the EU Regulation on Population and Housing Censuses.

A Pilot Survey was conducted in 2015 in order to test the data collection techniques and the fieldwork organization to be adopted in the new Permanent Census. As to the data collection techniques, the main objective of the Pilot Survey was to test the sustainability of a totally paperless enumeration, i.e. a mixed-mode data collection strategy based only on the use of electronic questionnaires.


The C-sample was based on a capture-recapture methodology, with the first capture being represented by the population registers and the second capture being conducted as a door-to-door enumeration with the CAPI technique, while the D-sample would be conducted with a mixed-mode technique. More precisely, the C-sample survey was conducted by interviewers with hand-held devices such as tablets and laptop computers, which were used both for accessing enumeration area maps and address lists and as a means of data capture. The D-sample survey, on the other hand, was conducted via a multimode data collection system where several paperless return modes were concurrently offered to respondents: self-response by CAWI, the possibility of calling a toll-free number and completing the questionnaire via telephone interview, or the possibility of going to Municipal Survey Centres to fill out the questionnaire at an internet station or ask for a CAPI interview. So CAWI was again used within a concurrent design, in order to avoid the use of paper questionnaires (related to many critical issues of the 2011 strategy) and at the same time provide respondents with a choice of spontaneous return modes which would minimize recourse to enumerators (thus minimizing costs and municipalities’ workload). Unlike in the 2011 census, no paper questionnaire would be mailed out to households, which would receive only a letter with the questionnaire login details.
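The capture-recapture logic behind the C-sample can be illustrated with the simplest dual-system estimator (Lincoln-Petersen). This is only a sketch: the text does not state which estimator Istat actually used, the function names are hypothetical, the numbers in the example are invented, and register over-coverage is ignored.

```python
def lincoln_petersen(n_register: int, n_field: int, n_matched: int) -> float:
    """Estimate the true population size from two independent 'captures'.

    n_register: persons listed in the population register (first capture)
    n_field:    persons found by the door-to-door enumeration (second capture)
    n_matched:  persons found in both sources
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is needed")
    return n_register * n_field / n_matched


def register_coverage_rate(n_field: int, n_matched: int) -> float:
    """Share of field-enumerated persons who also appear in the register.

    Under the independence assumption this estimates the register's coverage.
    """
    if n_field <= 0:
        raise ValueError("the field enumeration must contain at least one person")
    return n_matched / n_field


# Invented figures for one enumeration area, for illustration only.
estimate = lincoln_petersen(n_register=950, n_field=900, n_matched=855)
coverage = register_coverage_rate(n_field=900, n_matched=855)
print(round(estimate), round(coverage, 2))  # 1000 0.95
```

In this invented example the register lists 950 persons, but the estimated population is about 1,000, i.e. the register would cover roughly 95 per cent of the usually resident population.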

The main mode innovation with respect to the 2011 mixed-mode system concerned the possibility for households to call the contact centre not only for information and clarification but also to ask to be interviewed (so-called inbound CATI10). During the first phase, households could complete the questionnaire by themselves, go to the Municipal Collection Centres or call the contact centre. During the second phase, the non-response follow-up would be conducted by phone (as far as possible11) or in the field by enumerators. Four different combinations of modes were tested, and some of these also included non-response follow-up by enumerators, who would carry out the interview in the field using a portable device (tablet or laptop).

This entailed the need for the Municipal Census Offices to guarantee assistance to the respondents at the MCCs for the entire duration of the survey and the need for the coordinators/enumerators/back office operators to be constantly updated through the monitoring system. It was therefore necessary to implement an integrated data acquisition and production process management system, within a Bring Your Own Device strategy. The management system was in fact the evolution of the management system developed for the 2011 Census.

The D-sample Pilot Survey was conducted on a sample of 148 municipalities, including those where the lowest percentages of total self-response and of CAWI return had been recorded at the 2011 Census. The results were partially different from those of 2011. CAWI was still the preferred mode, but to a much greater extent, accounting for almost half of the completed questionnaires (49.2 per cent). Of the total CAWI questionnaires, almost 75 per cent were self-completed by the households without any assistance, while 17.7 per cent were completed with the help of relatives or friends, and 7.7 per cent with the help of the Municipal Collection Centres.

The breakdown by mode was only partially comparable, as the Post Office mode was present only in 2011 and the inbound CATI mode only in 2015 (see figure 4). However, a remarkable increase in the CAWI percentage can be noted (even when allowing for the increase in households' use of the Internet since the Census). This is a positive result, especially considering the absence of the massive communication campaign put in place for the Census12. On the other hand, the percentage of questionnaires completed via the MCCs had decreased (though, given that it was a sample survey, it was difficult for the municipalities to organize widespread centres across their territory). Finally, the share of questionnaires collected by enumerators had grown slightly (13.5 per cent versus 11.8 per cent).

10 The term CATI is used loosely, for the sake of brevity, with reference to the telephone interviews carried out by operators of the contact centre managed by Istat or by the MCC operators of a few municipalities involved in the Pilot Survey. In fact, no specific CATI acquisition system was implemented but, in order to maximize the spontaneous return rate, operators were trained to carry out telephone interviews using the application developed for CAWI.

11 Only about 20 per cent of landline phone numbers were available at Istat.

12 For the Pilot Survey only local communication events were foreseen, to be organized by the municipalities.


As for the new return mode (the inbound CATI), it accounted for 10 per cent of the total completed questionnaires. Focus groups were organized among the Istat personnel who had provided the contact centre service, from which useful indications emerged both on the use of the toll-free number as a possible return mode and on the usability of the electronic questionnaire. In particular, the toll-free number seemed to be a viable return mode especially for the elderly and for those who were not familiar with new technologies (as was also confirmed by the analysis of the characteristics of the respondents, which is briefly reported further down in the text).

Figure 4 – Breakdown by return mode at 2011 Census, 2015 Pilot Survey, and at 2018 and 2022 PPHC waves

The territorial profile of the web respondents appeared significantly different from that observed in 2011. In fact, unlike in the 2011 Census, during the Pilot Survey the greatest use of the CAWI mode was recorded in the municipalities of the North, and especially of the North-East (where households chose CAWI in 32.6 per cent of cases, compared to a national average of 23.1 per cent, while in the Islands the percentage of households who used CAWI was just 17 per cent). In general, the highest response rates via the web were recorded in the regions where the use of new technologies was more widespread13, even though the differences in the total response rate by municipality, as well as the strategy (combination of modes) assigned to each municipality, should be taken into account when evaluating the results.

Through a multivariate analysis (multiple correspondence analysis and cluster analysis), it was possible to characterize the households who had used the toll-free number to fill in the questionnaire. The micro-level analysis of the characteristics of the households who had chosen the CAWI option (instead of the CATI interview or the MCC) confirmed to some extent the results of the 2011 analyses, especially with regard to education as a crucial factor in the choice of the internet. More generally, households showing a higher propensity to use the web mode for completing the questionnaire were those with at least 3 members, at least one elderly member, at least one member with a medium or high level of education and a reference person who normally used the web.

In summary, the results of the Pilot Survey confirmed the feasibility of the paperless choice and, at the same time, the need for a mixed-mode system offering a variety of possible return modes in order to reach different household profiles. In fact, the share of households choosing the CAWI option (either independently or with support), albeit growing, did not allow CAWI to be envisaged as the sole return mode. Inter alia, the so-called inbound CATI, i.e. the mode allowing households to call the toll-free number and be interviewed, appeared as a viable alternative (if financially sustainable)14 for a varied universe of households (mainly socio-economically disadvantaged households but also well-off families made up of both elderly and young people).

13 See Eurostat, 2006.

[Figure 4: bar chart. Return modes: CAWI, MCCs, Post Office, Enumerator, Inbound CATI. Years: 2011, 2015, 2018, 2022. Vertical axis: 0 to 60. Data labels as extracted, in order: 33,4; 31,7; 22,6; 12,3; 49,2; 29,0; 13,5; 10,0; 49; 13,8; 30,8; 1,5; 3,7; 1,2; 48,2; 8,9.]

4. From the ‘door to door’ field enumeration to the Permanent Population Census

After the first Pilot Survey conducted in 2015 and a second one conducted in 2017, the rolling strategy of the new Population census was further tuned, in accordance with Istat’s modernization strategy, which places the integrated system of statistical registers at the core of statistical production. The role of field surveys in this system is to support registers, in the broad sense of assessing their quality and adding information that is missing, incomplete or of insufficient quality. The Population Census thus became the Permanent Population and Housing Census (PPHC), at the core of which is the Statistical Population Register (RBI according to the Italian acronym), whose main sources are the local population registers of Italian municipalities (administrative local registers). Together with the Register of Addresses and with the thematic registers on education and employment, RBI provides the basis for producing population census results, while two sample surveys (Area survey and List survey) are conducted annually in self-representative municipalities and every 4 years, according to a rotation scheme, in smaller ones, to evaluate and correct the coverage errors of RBI and to collect data for variables not available (or only partially available) from the registers15.

The Area survey, still conducted as a door-to-door enumeration with the CAPI technique like the previous C-sample, is currently being redesigned after the 2020 move to a fully register-based count estimation. No longer used to measure and correct the Population register, it will be aimed at providing a measure of the error of that estimation (whereas at the beginning of the first cycle of the PPHC it was used within a capture-recapture model for direct estimates of the coverage errors of RBI). The redesign will concern the survey methodology, the sampling frame and the data collection techniques. The List survey, which represents the evolution of the D-sample survey, is based on a sample of households drawn from the population register and is conducted through a mixed-mode data collection system. In the first phase (spontaneous response only), the mixed-mode system includes self-response by CAWI, the possibility of going to the MCCs either to use an internet station or to be interviewed (CAPI), and the inbound CATI mode. The second phase includes the same return options plus the non-response follow-up by enumerators. The CAWI mode saw a slight increase during the first cycle of the PPHC (2018-2021).

The management of such a complex and diverse enumeration strategy requires a very flexible web management system that guides and supports census operators during all the survey phases. This system has been developed as an evolution of the management system developed for the 2011 census, and generalized to become the management system of Istat social surveys.

Even though the breakdown by mode is only partially comparable with the past mixed-mode systems, we can observe that the share of households choosing to use the CAWI option is still close to half of the total responding households (see figure). We also note a further decrease in the percentage of households going to the MCCs, either to receive assistance in completing the electronic questionnaire or use the MCC internet station, or to receive a CAPI interview.

In Figure 4 the distribution by return mode at the 2011 census is compared with those at the 2015 Pilot Survey and in the first years of the first (2018) and the second (2022) cycle respectively. Altogether, while the CAWI share is more or less stable, it is self-response (i.e. independent response not needing a field follow-up) that shows a setback, thus requiring a stronger front-office fieldwork effort in order to keep the required high response rate (the share of questionnaires completed by CAPI interview performed by enumerators has therefore risen from 12.3 per cent at the 2011 census to 33.4 per cent in the 2022 List survey). Furthermore, if we look at the CAWI share along the four-year series16 since the start of the PPHC (see figure 5), after slightly increasing during the first 3 years (but not as much as could have been expected in 2021, given the widespread use of digital technologies imposed by the pandemic), it clearly decreases in 2022. A relevant decrease can also be observed in the share of households going to MCCs.

14 Given the experimental purpose of the survey, the service was provided internally by Istat, by non-professional operators and only during working hours. Therefore its financial sustainability should be assessed in the case of a professional service, outsourced and operating for longer hours (as would likely be the case in the actual census).

15 A detailed description of the surveys is provided in Falorsi (2017).

Figure 5 - Breakdown by return mode and PPHC wave

RETURN MODE       2018    2019    2021    2022
CAWI              49,0    50,0    51,9    48,2
MCCs              13,8    13,0    12,1     8,9
INBOUND CATI17     1,5     1,3     2,3     2,8
CATI               3,7     2,8     7,6     6,8
ENUMERATOR        30,8    32,4    26,2    33,4
OTHER18            1,2     0,6       -       -
TOTAL            100,0   100,0   100,0   100,0
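The figures in Figure 5 can be checked and compared with a few lines of Python. This is only a sketch: the data are transcribed from the table, the blank 'OTHER' cells for 2021 and 2022 are treated as zero (an assumption), and the helper names are hypothetical.

```python
WAVES = (2018, 2019, 2021, 2022)

# Percentage of completed questionnaires by return mode (Figure 5).
SHARES = {
    "CAWI": (49.0, 50.0, 51.9, 48.2),
    "MCCs": (13.8, 13.0, 12.1, 8.9),
    "INBOUND CATI": (1.5, 1.3, 2.3, 2.8),
    "CATI": (3.7, 2.8, 7.6, 6.8),
    "ENUMERATOR": (30.8, 32.4, 26.2, 33.4),
    "OTHER": (1.2, 0.6, 0.0, 0.0),  # blank cells treated as 0 (assumption)
}


def column_totals(shares: dict) -> list:
    """Sum of all modes for each wave; should be close to 100 up to rounding."""
    return [round(sum(row[i] for row in shares.values()), 1) for i in range(len(WAVES))]


def change(shares: dict, mode: str, start: int, end: int) -> float:
    """Change in a mode's share between two waves, in percentage points."""
    row = shares[mode]
    return round(row[WAVES.index(end)] - row[WAVES.index(start)], 1)


print(column_totals(SHARES))                   # [100.0, 100.1, 100.1, 100.1]
print(change(SHARES, "CAWI", 2021, 2022))      # -3.7
print(change(SHARES, "MCCs", 2018, 2022))      # -4.9
print(change(SHARES, "ENUMERATOR", 2018, 2022))  # 2.6
```

The column totals differ from 100 only by rounding. The CAWI share drops by 3.7 percentage points between 2021 and 2022, and over the whole series the MCC share falls by 4.9 points while the enumerator share rises by 2.6 points.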

This trend reversal should be related to a number of factors, among which is the lack of a massive communication campaign like those traditionally put in place for traditional censuses. Even in the PPHC, most communication resources were invested in the launch of the new strategy in 2018, with progressively fewer funds available in the other years of the cycle (as the census becomes one of many household sample surveys). Another major difference between the 2011 ‘traditional’ census and the yearly sample survey of the PPHC is related to the scope of the survey, which involves only a sample of households in each municipality and thus does not justify, for the Municipal Census Offices, the organization of the widespread network of Municipal Collection Centres that played a quite significant role in reaching the high share of autonomous response achieved in 2011. Finally, mention should be made of the ever-growing respondent burden and respondent fatigue, which represent a challenge for official statistics, as keeping a high response rate is crucial to quality.

As to the territorial differences, a distribution by return mode similar to the one observed at the 2015 Pilot Survey has been recorded all along the PPHC 4-year series, with huge differences in the share of CAWI among Italy’s regions (see Figure 6). More precisely, the percentage of households having self-completed the questionnaire online ranges from well above the national average (as high as the 62.5 per cent registered in Lombardy in 2022) to a low of about half the national average (the 24.5 per cent recorded in Calabria in the same year). Furthermore, it is worth noting that the distance between the two extremes is increasing (the two figures were 60.8 per cent and 29.3 per cent respectively in 2018). These data should also be analysed in the light of the total response rate, which, although still quite high, is decreasing more in some regions than in others. These dynamics should be further analysed at the micro level, but some micro-level studies already performed confirm that education has a crucial impact on the choice of the return mode, as do variables such as the household composition by citizenship or the type of municipality where the household lives (according to the ‘inner area’ variable)19.
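The widening regional gap described above can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text and the national CAWI shares from Figure 5; the dictionary layout and function names are illustrative assumptions.

```python
# CAWI share (per cent of responding households) by year.
CAWI = {
    2018: {"national": 49.0, "Lombardy": 60.8, "Calabria": 29.3},
    2022: {"national": 48.2, "Lombardy": 62.5, "Calabria": 24.5},
}


def gap(year: int) -> float:
    """Distance in percentage points between the two extreme regions."""
    return round(CAWI[year]["Lombardy"] - CAWI[year]["Calabria"], 1)


def ratio_to_national(year: int, region: str) -> float:
    """Regional CAWI share as a fraction of the national share."""
    return round(CAWI[year][region] / CAWI[year]["national"], 2)


print(gap(2018), gap(2022))                 # 31.5 38.0
print(ratio_to_national(2022, "Calabria"))  # 0.51
```

The distance between Lombardy and Calabria grows from 31.5 to 38.0 percentage points between 2018 and 2022, and in 2022 the Calabrian share is about half (0.51) of the national one.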

16 The break in the series is due to the Covid-19 pandemic and the subsequent suspension of all household surveys to be held in 2020.

17 In the PPHC this is not an official return mode, i.e. it is not advertised as a possible return option, but in exceptional cases (households unable both to access the CAWI and to travel to the MCCs) households are interviewed on the phone.

18 ‘Other’ refers to a small share of cases in which, due to poor functioning of the devices or of the internet connection, enumerators were allowed or obliged to write down the answers on paper and later enter them into an electronic questionnaire.

19 See Bussola M. et al., 2023.


Figure 6 – Breakdown by region and return mode at the 2018 and 2022 PPHC waves

5. Lessons learned

The need for cost-effective, timely and accurate census data, along with changes in technology and in society, has guided the transition from the traditional ‘door-to-door’ census to the PPHC. This transformation reflects the extent to which digital data are changing the routines of production and use of statistics20.

The sustainability of a totally paperless strategy was tested through the experimental surveys conducted in 2015. Unsurprisingly, although the web spontaneous return rate was growing (accounting for almost half of the total completed questionnaires), it was still far from ensuring a successful enumeration unless complemented by other return modes. A similar mixed-mode system has been implemented for the List survey data collection, within a totally new census strategy based on the integration of administrative and survey data (the PPHC).

From the respondent’s point of view, the mixed-mode system is certainly welcome (and more and more expected) as it allows greater flexibility, but it entails organizational challenges and requires continuous technical support for the different fieldwork levels and an accurate training strategy, in order to reduce non-sampling errors as much as possible. In the context of the PPHC, the organization of training plays a crucial role and has undergone dramatic changes, due both to Istat’s modernization (and its transformation from the stovepipe to the matrix organizational model) and to the advent of digital technologies and their potential (e.g. the use of distance learning versus face-to-face training).

The new census strategy allows a significant reduction of census costs, of respondents’ burden and of the organizational impact on municipalities (that are responsible for the census fieldwork) but to further achieve these goals, tailored communication strategies need to be put in place in order to raise the CAWI response rate.

The PPHC first cycle started in 2018 and ended in 2021 (with a suspension of the field surveys in 2020 due to the pandemic). The second cycle, which started in 2022, still shows high response rates, but a slight decrease in the web spontaneous return rate, notwithstanding the expectations of a further increase due to the ‘forced’ digitalisation undergone by many during the pandemic period and the boost of remote and smart working.

20 See Aragona and Zindato, 2016.

Figure 6, data for the 2018 wave (column groups in the original header: Spontaneous response, Non-response follow up)

REGION                           CAWI  CAWI at MCCs  CATI at MCCs  INBOUND CATI  CATI  ENUMERATOR  OTHER
Italia                           49,0           2,4          11,5           1,5   3,7        30,8    1,2
Piemonte                         55,0           4,4          14,6           2,0   4,3        19,1    0,8
Valle d'Aosta                    52,7           3,4           9,0           0,7   5,4        28,5    0,3
Lombardia                        60,8           2,5          12,6           1,4   3,6        18,3    0,8
Provincia autonoma di Bolzano    53,7           6,4          12,6           0,9   2,4        23,2    0,8
Provincia autonoma di Trento     58,9           1,2           5,0           0,6   1,2        32,8    0,2
Veneto                           57,3           0,9           8,1           1,1   2,5        29,4    0,7
Friuli-Venezia Giulia            55,9           1,6          11,8           1,3   2,1        26,8    0,6
Liguria                          54,1           2,6          10,0           1,6   3,9        26,5    1,3
Emilia-Romagna                   54,2           1,1           9,4           1,7   3,6        27,9    2,1
Toscana                          53,4           2,1           8,2           1,6   4,5        26,4    3,8
Umbria                           52,1           3,0          12,7           2,3   6,1        22,2    1,6
Marche                           51,6           1,8          13,0           1,7   3,5        27,3    1,0
Lazio                            52,1           2,1          12,5           2,4   3,7        26,6    0,6
Abruzzo                          46,4           1,9          10,6           1,1   3,5        35,7    0,8
Molise                           40,5           2,8          11,1           0,7   2,9        41,5    0,4
Campania                         34,3           2,1          15,6           1,5   3,8        42,0    0,6
Puglia                           42,6           3,7           8,5           0,9   3,5        39,8    1,0
Basilicata                       38,3           0,4          10,4           0,8   2,4        47,4    0,4
Calabria                         29,3           3,0          12,2           1,2   3,7        49,3    1,3
Sicilia                          32,7           3,0          12,7           1,7   4,4        44,6    0,9
Sardegna                         41,8           1,3          15,0           1,2   4,1        35,7    0,8

Figure 6, data for the 2022 wave (column groups in the original header: Spontaneous response, Non-response follow up)

REGION                           CAWI  CAWI at MCCs  CATI at MCCs  INBOUND CATI  CATI  ENUMERATOR
Italia                           48,2           0,5           8,4           2,8   6,8        33,4
Piemonte                         53,7           1,0          11,8           3,1   8,5        21,9
Valle d'Aosta                    51,4           1,9           7,2           1,9   8,4        29,2
Lombardia                        62,5           0,5          11,3           3,1   7,8        14,8
Provincia autonoma di Bolzano    58,0           1,7           8,4           1,2   5,1        25,6
Provincia autonoma di Trento     53,2           0,5           4,5           1,4   5,4        35,0
Veneto                           56,9           0,4           7,0           2,7   6,5        26,5
Friuli-Venezia Giulia            56,8           0,5           7,4           2,6   5,5        27,3
Liguria                          51,5           0,6           7,6           3,7   6,7        29,9
Emilia-Romagna                   54,5           0,3           6,1           4,1   8,7        26,3
Toscana                          51,4           0,3           5,4           3,4   8,4        31,1
Umbria                           51,8           0,3           6,6           4,3  10,4        26,5
Marche                           49,7           0,4           9,0           3,8   7,6        29,5
Lazio                            50,6           0,8           8,5           3,7   6,3        30,2
Abruzzo                          44,1           0,8           9,4           1,8   5,6        38,2
Molise                           34,9           0,4           6,1           1,5   5,6        51,4
Campania                         34,0           0,5           9,3           1,7   4,1        50,3
Puglia                           40,6           0,5           6,1           2,0   4,9        45,9
Basilicata                       38,2           1,0           9,9           1,5   6,0        43,4
Calabria                         24,5           0,7           7,9           1,8   5,1        59,9
Sicilia                          28,3           0,2           7,9           2,3   5,9        55,3
Sardegna                         39,3           0,4           9,3           2,2   4,9        43,9

Analyses performed on the process data show peculiar territorial patterns in the choice of the different return modes and the impact of several individual variables such as education or citizenship of the household members. These findings will be further investigated in order to design adaptive survey strategies and to tackle different population targets with tailored communication campaigns.

While the CAWI response rate specifically could benefit from such tailored strategies addressing the respondent segments potentially most prone to the use of digital technologies, different strategies could be put in place to enhance the overall spontaneous response rate. To this end, the feasibility of adopting the inbound CATI (as tested in the 2015 Pilot Survey) as an alternative return mode to CAWI should be explored, as it could successfully reach an important share of those left behind by the digital divide.

Finally, the need to redesign the questionnaire in order to reduce the mode effect should be taken into account: it was designed on the assumption of self-administration (as was, in fact, the case up to the 2011 Census), but due to the changes in the enumeration strategy it is increasingly being administered by enumerators.

References

Aragona, B., and D. Zindato. 2016. “Counting People in the Data Revolution Era: Challenges and Opportunities for Population Censuses.” International Sociological Review 26(3).

Benassi F., L. Cassata, G. Sindoni and D. Zindato. 2013. “Tales from the 2011 Italian Population Census. The use of a multi-mode data collection system: lessons learnt and future challenges”. Paper presented at the 5th Conference of the European Survey Research Association (ESRA), Ljubljana, 15-19 July.

Benassi, F., Bruno, M., Giacummo, M., Silipo, M., Vaste, G., and D. Zindato. 2014. Managing census complexity through highly integrated web systems, Istat, Rivista di Statistica Ufficiale, n. 3. http://www.istat.it/it/files/2015/03/Art.3_Managing-census-complexity.pdf.

Bernardini, A., Chieppa, A., Cibella, N., Solari, F., Zindato D. 2022. Evolution of the Italian Permanent Population Census. Lessons learnt from the first cycle and the design of the Permanent Census beyond 2021, Twenty-fourth Meeting of the Group of Experts on Population and Housing Censuses, Geneva, Switzerland, 21-23 September 2022, https://unece.org/statistics/documents/2022/07/working-documents/evolution-italian-permanent-population-census.

Bussola M., Cecconi N., Donati E., Porciani L. 2023. Respondents and non respondents to population and housing census: some strategies for data collection design in the era of low response rate and high response burden, paper presented at the conference LIX Riunione Scientifica SIEDS, Napoli, 25th May.

Cassata, L. and M.T. Tamburrano. 2011. The 15th Population Census Pilot Survey: how the register driven census changes the enumerators role in: S. Migani and M. Costa (Eds.) “Statistics in the 150 years from Italian unification”, Serie Ricerche n. 10, Università di Bologna, Dipartimento di Scienze Statistiche “Paolo Fortunati”, Bologna, http://amsacta.unibo.it/3202/1/Quaderni_2011_10_SIS2011_BookofShortPaper.pdf (April 2013).

Cecconi, N., and F. Cecconi. 2016. “Il profilo delle famiglie intervistate da Numero Verde. Un’analisi macro e micro dei questionari compilati.” Paper presented at the seminar “Verso il censimento permanente della popolazione: le rilevazioni sperimentali C-sample e D-sample 2015”, Rome, Istat, 26 January.

Dillman, D.A., Smyth, Jolene D., and Leah Melani Christian. 2009. Internet, Mail and Mixed-Mode Surveys: The Tailored Design Method, 3rd edition. Hoboken, NJ: John Wiley.


Eurostat. 2006–2016. Annual survey on ICT (Information and Communication Technologies) usage in households and by individuals. http://ec.europa.eu/eurostat/web/digital-economy- andsociety/publications/news.

Falorsi, S.: Census and Social Surveys Integrated System, 19th Meeting of the Group of Experts on Population and Housing Censuses, Geneva, Switzerland, 4-6 October (2017), https://unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.41/2017/Meeting-Geneva- Oct/WP23_ENG.pdf.

Gallo G., Paluzzi E., Benassi F., Ferrara R. 2014. “The 2011 Italian experience towards supported Census for measuring migration.” Economic Commission for Europe, Experience from 2010 round censuses for measuring migration, Working paper 7, 1–9.

Gallo, G. and Zindato, D. (2018). Annex H. Italy case study, in UNECE, Guidelines on the Use of Registers and Administrative Data for Population and Housing Censuses, Geneva, https://unece.org/guidelines -use- registers-and-administrative-data-population-and-housing-censuses-0.

Gallo, G. and Zindato, D. (2021). Italy: The combined use of survey and register data for the Italian Permanent Population Census count in UNECE,, Guidelines for Assessing the Quality of Administrative Sources for Use in Censuses (endorsed by the 69th plenary session of the Conference of European Statisticians), https://unece.org/statistics/publications/CensusAdminQuality.

Istat. 2009. A new strategy for the 2011 Italian Population Census. Product innovations and the compliance with CES Recommendations, UNECE/CES Group of Experts on Population and Housing Censuses, Twelth Meeting, Geneva, 28-30 October, http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.41/2009/5.e.pdf (April 2013).

Istat. 2012. Lessons learned from use of registers and geocoded databases in population and housing census, Sixtieth plenary session, Paris, 6-8 June, http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2012/22-IP_Italy.pdf (April 2013).

Tininini, L., and T. Virgillito. 2013. “The Design of the Online Questionnaire of the Italian Population Census.” NTTS - Conferences on New Techniques and Technologies for Statistics, Brussels, March 5–7.

Virgillito, A., and L. Tininini. 2012. “The Web-based Data Collection in the Italian Population and Housing Census.” Meeting on the Management of Statistical Information Systems, MSIS. http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.50/2012/18_Italy.pdf.

Zindato D (ed.), 2017, Dalla rilevazione ‘porta a porta’ alla rilevazione paperless: il Censimento della Popolazione, in Istat, L’utilizzo della tecnica CAWI nelle indagini su individui e famiglie, https://www.istat.it/it/files/2017/09/Lutilizzo-della-tecnica-Cawi.pdf.

Online recruitment on social media to reach and engage distrustful people - Simona Cafieri (Istat, Italy)

Online recruitment on social media to reach and engage distrustful people

Istat | Directorate for Communication

Simona Cafieri

UNECE EXPERT MEETING ON STATISTICAL DATA COLLECTION, June 13th 2023

Outline

01 New opportunities for survey research

02 How to enroll survey participants?

03 Results of an experimental survey

04 Social media recruitment compared to traditional sampling approaches

05 Limits and opportunities of social media surveys

Background

In many countries NIS are facing decreasing response rates and increasing survey costs.

Alternative sampling and recruiting approaches are usually needed, including non-probability and online sampling.

Data collection is more complex when rare or hard-to-reach populations are to be sampled and surveyed.

Because of the massive popularity of online social networks, data about the users and their communication offer unprecedented opportunities to examine how human society functions at scale.

Social media for research purposes

• They represent a growing portion of the general population

• They allow the recruitment of rare and hard-to-reach populations

• Growing share of respondents participating via mobile devices

• Ads on social media platforms are rather inexpensive compared with ads elsewhere on the web

• Reducing the rate of dropouts between recruitment and actual survey participation

• Large amount of meta-information available on these platforms

Twitter

• Is one of the social media platforms that social scientists rely on to conduct research

• Allows access to its data via several APIs, which allow qualitative and quantitative research to be conducted with its members

• With more than 400 million monthly active users who post 500 million tweets per day, it is a huge database, both in number of users and amount of data, for conducting large-scale studies of human behavior

Facebook

• Populations can be defined according to characteristics automatically assigned to a user by an algorithm, based on the person’s interactions with the social

• Key demographic data (gender, age, etc.) can be used to define target populations

• Is based on mutual relations (connected people are referred to as “friends”)

• Recruitment through Facebook facilitated diversity, with participants varying in socioeconomic status, geographical location, educational attainment, and age

Experimental survey

• To assess the use of social media platforms as an alternative recruitment tool for studying the hard-to-reach (LGBT+) population, an experimental survey was designed

• A team from Federico II University, with an Istat intern and the DiverCityNaples association, initiated an online convenience sample for which participants were recruited via social media

• The questionnaire was programmed with its design optimized for mobile devices

Experimental survey

• Recruitment using social media by joining existing community notice board groups (no-cost option)

• Enabled snowball sampling, where users could like, share, and circulate the social media post and questionnaire link among others

• A recruiting campaign was launched on June 3, 2022 and closed on July 2, 2022

• Ads were shown on the Timeline

• Ad sets were used to address different subgroups within our target population

• Each ad was accompanied by a caption and a short text informing the user about our survey and encouraging them to take part in it and click on the questionnaire link

Survey target population and social media population

Experimental survey: results

The average number of questionnaires for one respondent

The average completion time for the survey was 14 minutes.

Type of device used by respondents: Smartphone 77%, PC 18%, Tablet and other 5%

Social media as channels to reach respondents: Twitter, Facebook

Survey results across the 30-day fieldwork period

• Facebook provides meta-information about the performance of a campaign, including the total number of individuals reached through an ad on a given day

Experimental survey

• In 2021 Istat carried out the survey on employment discrimination against LGBT+ people, addressed to all individuals living in Italy who, according to the Municipal Registers (LAC), were in a civil union on 1 January 2020

• The survey was carried out using the CAWI (Computer Assisted Web Interviewing) technique and involved the self-completion of an online questionnaire

• The sample size did not allow for regional comparisons and subgroup analyses, such as the living conditions of “rainbow families”

• In order to learn about potential coverage error, key features of the composition of the social media sample were compared with those of the Istat “traditional” survey

Experimental survey: results

LGBT+ survey (values in %)

                        Social media survey    Istat survey
Age
  18-34                        72.2                14.7
  35-49                        19.9                41.7
  50+                           7.9                43.6
Nationality
  Italian                      74.3                92.2
  Other                        25.7                 7.8
Gender
  Male                         28.1                28.3
  Female                       59.3                40.8
  Other                        12.6                30.9
School education
  Low                          10.2                34.8
  Medium                       34.1                34.4
  High                         55.7                29.2
Children in household
  Yes                          25.4                 7.8
  No                           74.6                91.2

Comparison of the demographic composition of both surveys:

• The social media sample was much younger

• The high average educational level in the social media sample
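
The comparison in the table can be summarised numerically. The sketch below is only an illustration, not the authors' method: it takes the age shares from the table and computes the percentage-point gap per category plus a dissimilarity index (half the sum of absolute gaps), which is a summary measure introduced here as an assumption. The function and variable names are hypothetical.

```python
# Age shares (%) transcribed from the comparison table.
social_media = {"18-34": 72.2, "35-49": 19.9, "50+": 7.9}
istat = {"18-34": 14.7, "35-49": 41.7, "50+": 43.6}


def point_differences(sample, reference):
    """Percentage-point gap (sample minus reference) for each category."""
    return {k: round(sample[k] - reference[k], 1) for k in sample}


def dissimilarity_index(sample, reference):
    """Half the sum of absolute gaps between the two distributions.

    This summary measure is an assumption of this sketch; the slides only
    compare the shares side by side.
    """
    return round(sum(abs(sample[k] - reference[k]) for k in sample) / 2, 1)


age_gaps = point_differences(social_media, istat)
age_dissimilarity = dissimilarity_index(social_media, istat)
print(age_gaps)
print(age_dissimilarity)
```

A large positive gap for the 18-34 group is the numerical counterpart of the remark that the social media sample was much younger.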

Outline

• «More females»

• «Survey was aimed at recruiting rainbow families»

• Feedback from the respondents

• The non-probabilistic sample is biased.

• «Willingness of participants to be re-interviewed in the future or to take part in a panel study»

• High rates of re-participation minimize the potential for nonresponse bias in the analyses of survey data collected in subsequent waves.

Conclusions

• Traditional recruitment methods can be combined with low-cost internet-mediated recruitment methods for a multi-modal recruitment strategy

• Recruiting survey respondents via Twitter or Facebook can offer a convenient and accessible approach

• It’s essential to be aware of the potential biases and limitations associated with this method

• Successful outcomes resulting from respondent-focused strategies

• Increased response rates and more inclusive data sets through respondent involvement

Thank you!

The mobile response as determinant factor in mixed-device Cawi: the case of an Istat survey on students - Luciano Fanfoni, Sabrina Barcherini, Serena Liani and Fabio Massimo Rottino (Istat, Italy)

THE MOBILE RESPONSE AS DETERMINANT FACTOR IN MIXED-DEVICE CAWI: THE CASE OF AN ISTAT SURVEY ON STUDENTS

UNECE Expert Meeting on Statistical Data Collection – Rethinking Data Collection

13 June 2023

Sabrina Barcherini, Istat | Data Collection Directorate

Luciano Fanfoni, Istat | Information Technology Directorate; LimeSurvey | Italian Community Leader

Fabio Massimo Rottino, Istat | Demographic Statistics and Population Census Directorate

Serena Liani, Istat | Data Collection Directorate

Mobile devices as a resource for web surveys

Case study

o Survey on behaviors, attitudes and plans of people aged between 11 and 18

o Self-completed web questionnaire

o LimeSurvey for designing a responsive questionnaire

o A link to the login page in advance letters and reminders

o 100,000 survey units

Designing a responsive web questionnaire

Devices used by respondents

40,700 completed questionnaires

Desktop: 20,772 questionnaires (51.0%)

Smartphone: 17,661 questionnaires (43.4%)

Tablet: 2,267 questionnaires (5.6%)

913 partially compiled questionnaires

Failures to log in, multiple accesses and multiple devices

24% of the internet clients who visited the login page (56,404) did not reach the first page of the questionnaire

2.7% of the submitted questionnaires (40,700) were completed with more than one access

Only a few dozen respondents made the first attempt with one device and the last with another type of device

Questionnaire break-off

                                            Completed and      Not completed
                                            not completed
Device
  Desktop or laptop                            21,072               1.4%
  Smartphone or tablet                         20,541               3.0%
Citizenship
  Italian citizens                             29,979               1.5%
  Foreign citizens                             11,634               3.9%
Order of school
  Secondary lower school                       15,110               2.6%
  Secondary upper school (general)             14,580               1.5%
  Secondary upper school (vocational)          11,923               2.6%
Total                                          41,613               2.2%

Data quality check: the impact of data correction

                                            All questions      Only grid questions
Device
  Computer (desktop/laptop)                     1.6%                 2.1%
  Mobile (smartphone/tablet)                    2.1%                 2.7%
Citizenship
  Italian citizens                              1.1%                 1.1%
  Foreign citizens                              3.8%                 5.7%
Order of school
  Secondary lower school                        2.1%                 3.1%
  Secondary upper school (general)              1.2%                 1.4%
  Secondary upper school (vocational)           2.3%                 2.8%
Total                                           1.8%                 2.4%

The upcoming edition of the survey

Improvements

o Simplified login credentials

o QR code in advance letters and reminders

o Questionnaire in 8 other languages

o Close monitoring of the device used

Thank you!

LUCIANO FANFONI

SABRINA BARCHERINI

SERENA LIANI

FABIO MASSIMO ROTTINO

2023 UNECE Expert Meeting on Statistical Data Collection: 'Rethinking Data Collection' online (12 - 14 June 2023)

Title:

The mobile response as determinant factor in mixed-device Cawi: The case of an Istat survey on students

Authors:

Sabrina Barcherini (ISTAT - Data Collection Directorate)

Serena Liani (ISTAT - Data Collection Directorate)

Luciano Fanfoni (ISTAT - Information Technology Directorate)

Fabio Massimo Rottino (ISTAT - Demographic Statistics and Population Census Directorate)

Speaker:

Luciano Fanfoni

Extended abstract:

The widespread use of mobile devices has changed web surveys, enabling access to a wider respondent pool that includes children and teenagers. The case study discussed in this contribution is the Istat survey on behaviors, attitudes, and plans of people aged between 11 and 18. Due to the Covid-19 pandemic, the 2021 edition of this survey had to make substantial changes to the data collection process and questionnaire design compared to the previous edition. The self-completed web questionnaire was the only survey mode. Respondents, or their parents if they were minors, were sent advance and reminder letters that included the login page link and credentials.

When designing the questionnaire, we took into account the possibility of access and completion with mobile devices. It was important to keep the questionnaire short and to simplify and shorten the question wording. The questionnaire consisted of only five sections, each with about a dozen questions and some branches; the completion time was about 15 minutes.

In addition, we took care of how questions are displayed on mobile devices. We used the LimeSurvey open-source software (Community Edition installed on an Istat web server), which allows designing a responsive questionnaire and is therefore useful for adapting questions to mobile devices. For example, it allows grid questions, displayed horizontally, to be transformed into single questions displayed vertically on mobile devices to improve usability.

Out of a sample of 100,000 survey units, 40,700 questionnaires were collected; among these, 51% were accessed using desktop or laptop, 43.4% through smartphones, and only 5.6% through tablets.
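
As a quick check of these figures, the shares can be recomputed from the counts reported on the slides for the same survey (20,772 desktop, 17,661 smartphone and 2,267 tablet questionnaires). This is a minimal sketch; the variable names are illustrative and not part of the survey's processing system.

```python
# Completed questionnaires by device, as reported on the slides.
completed_by_device = {"desktop": 20_772, "smartphone": 17_661, "tablet": 2_267}
sample_size = 100_000  # survey units in the sample

total_completed = sum(completed_by_device.values())

# Share of each device among completed questionnaires, in percent.
shares = {
    device: round(100 * count / total_completed, 1)
    for device, count in completed_by_device.items()
}

# Completed questionnaires as a percentage of the sample.
completion_rate = round(100 * total_completed / sample_size, 1)

print(total_completed, shares, completion_rate)
```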

Respondents seem to have encountered difficulties at the login page: 24% of the internet clients who visited the login page (56,404) did not reach the first page of the questionnaire. Furthermore, around 1,000 respondents who submitted the questionnaire completed it after making more than one access attempt. Only a few dozen respondents started answering with one device and finished with another.

We analyzed the data to see whether there was an association between questionnaire break-off and the device used. We found a low break-off rate (2.2%), with a higher propensity among those who used mobile devices (3.0%). However, this effect is smaller than the one observed for the subgroup of foreign respondents and close to the one observed for those who attend a secondary lower school or a vocational secondary upper school.
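
The break-off rates can be reproduced from the counts on the slides. The sketch below assumes that the "Completed and not completed" column gives the number of started questionnaires and that the slide's "Desktop" count corresponds to desktop or laptop; the break-off rate is then the share of started questionnaires that were not completed. The function name is hypothetical.

```python
def break_off_rate(started, completed):
    """Percentage of started questionnaires that were not completed."""
    return round(100 * (started - completed) / started, 1)


# Started questionnaires by device group (break-off slide).
started = {"desktop_or_laptop": 21_072, "smartphone_or_tablet": 20_541}

# Completed questionnaires by device group (device slide);
# smartphone and tablet counts are summed.
completed = {
    "desktop_or_laptop": 20_772,
    "smartphone_or_tablet": 17_661 + 2_267,
}

rates = {group: break_off_rate(started[group], completed[group]) for group in started}
overall = break_off_rate(sum(started.values()), sum(completed.values()))

print(rates, overall)
```

Under these assumptions the 913 partially compiled questionnaires split into 300 on desktop or laptop and 613 on smartphone or tablet.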

To assess the quality of the responses, we analyzed the impact of data checks with deterministic and probabilistic imputation. By comparing the initial and final datasets, the checks' impact was calculated for each questionnaire as the ratio of the number of cells that changed after the imputation to the total number of cells. For mobile device respondents the percentage of imputed cells is higher than for desktop or laptop respondents (2.1% vs 1.6%). The percentage is higher among foreign students (3.6%), while it is relatively consistent across school orders. Additionally, data quality was notably lower in grid questions across all analyzed groups.
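
The impact measure described above can be sketched as follows. This is an illustration under assumptions, not the production procedure: each questionnaire is represented as a dictionary mapping hypothetical variable names to values, before and after editing, and the group figure is taken as the simple mean of the per-questionnaire percentages.

```python
from statistics import mean


def imputation_impact(raw, edited):
    """Percentage of cells whose value changed between raw and edited record."""
    if raw.keys() != edited.keys():
        raise ValueError("raw and edited records must have the same variables")
    changed = sum(1 for name in raw if raw[name] != edited[name])
    return 100 * changed / len(raw)


def mean_impact(record_pairs):
    """Average impact over (raw, edited) pairs, e.g. for one device group."""
    return mean(imputation_impact(raw, edited) for raw, edited in record_pairs)


# Toy example with made-up values: one missing answer is imputed.
raw_record = {"q1": 2, "q2": None, "q3": 5, "q4": 1}
edited_record = {"q1": 2, "q2": 3, "q3": 5, "q4": 1}

impact = imputation_impact(raw_record, edited_record)
group_impact = mean_impact([(raw_record, edited_record), (raw_record, raw_record)])
print(impact, group_impact)
```

Restricting the dictionaries to the cells of grid questions would give the corresponding "only grid questions" figure.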

The next edition, scheduled for autumn 2023, is currently being designed. Although the Covid-19 pandemic has ended, the positive results achieved mean that both the survey design and the LimeSurvey software will remain the same. Some improvements are planned to ease questionnaire access and completion, regardless of the device used. The login credentials for accessing the questionnaire will be simplified and a QR code will be included in the advance letter for direct access to the questionnaire without the need to manually enter username and password. The questionnaire will be translated into 9 languages. Furthermore, there will be more thorough monitoring of the respondent's device usage. The aim is to enhance the overall user experience and ensure a smoother and easier data collection process.