
Italy

The mobile response as determinant factor in mixed-device Cawi: the case of an Istat survey on students - Luciano Fanfoni, Sabrina Barcherini, Serena Liani and Fabio Massimo Rottino (Istat, Italy)


THE MOBILE RESPONSE AS DETERMINANT FACTOR IN MIXED-DEVICE CAWI: THE CASE OF AN ISTAT SURVEY ON STUDENTS

UNECE Expert Meeting on Statistical Data Collection – Rethinking Data Collection

13 June 2023

Sabrina Barcherini (Istat | Data Collection Directorate)

Luciano Fanfoni (Istat | Information Technology Directorate; LimeSurvey | Italian Community Leader)

Fabio Massimo Rottino (Istat | Demographic Statistics and Population Census Directorate)

Serena Liani (Istat | Data Collection Directorate)

Mobile devices as a resource for web surveys


Case study

o Survey on behaviors, attitudes and plans of people aged between 11 and 18

o Self-completed web questionnaire

o LimeSurvey for designing a responsive questionnaire

o A link to the login page in advance letters and reminders

o 100,000 survey units

Designing a responsive web questionnaire


Devices used by respondents

40,700 completed questionnaires

o Desktop: 20,772 questionnaires (51.0%)

o Smartphone: 17,661 questionnaires (43.4%)

o Tablet: 2,267 questionnaires (5.6%)

913 partially completed questionnaires

Failures to log in, multiple accesses and multiple devices

o 24% of the internet clients who visited the login page (56,404) did not reach the first page of the questionnaire

o 2.7% of the submitted questionnaires (40,700) were completed with more than one access

o Only a few dozen respondents made the first attempt with one device and the last with another type of device

Questionnaire break-off

Group                                  Completed and not completed   Not completed
Device
  Desktop or laptop                    21,072                        1.4%
  Smartphone or tablet                 20,541                        3.0%
Citizenship
  Italian citizens                     29,979                        1.5%
  Foreign citizens                     11,634                        3.9%
School level
  Lower secondary school               15,110                        2.6%
  Upper secondary school (general)     14,580                        1.5%
  Upper secondary school (vocational)  11,923                        2.6%
Total                                  41,613                        2.2%

Data quality check: the impact of data correction

Group                                  All questions   Only grid questions
Device
  Computer (desktop/laptop)            1.6%            2.1%
  Mobile (smartphone/tablet)           2.1%            2.7%
Citizenship
  Italian citizens                     1.1%            1.1%
  Foreign citizens                     3.8%            5.7%
School level
  Lower secondary school               2.1%            3.1%
  Upper secondary school (general)     1.2%            1.4%
  Upper secondary school (vocational)  2.3%            2.8%
Total                                  1.8%            2.4%

The upcoming edition of the survey

o Simplified login credentials

o QR code in advance letters and reminders

o Questionnaire in 8 other languages

o Close monitoring of the device used

Improvements


2023 UNECE Expert Meeting on Statistical Data Collection: 'Rethinking Data Collection' online

(12 - 14 June 2023)

Title:

The mobile response as determinant factor in mixed-device Cawi:

The case of an Istat survey on students

Authors:

Sabrina Barcherini (ISTAT - Data Collection Directorate)

Serena Liani (ISTAT - Data Collection Directorate)

Luciano Fanfoni (ISTAT - Information Technology Directorate)

Fabio Massimo Rottino (ISTAT - Demographic Statistics and Population Census Directorate)

Speaker:

Luciano Fanfoni

Extended abstract:

The widespread use of mobile devices has changed web surveys, opening access to a wider respondent pool that includes children and teenagers. The case study discussed in this contribution is the Istat survey on behaviors, attitudes, and plans of people aged between 11 and 18. Due to the Covid-19 pandemic, the 2021 edition of this survey had to make substantial changes to the data collection process and the questionnaire design compared with the previous edition. The self-completed web questionnaire was the only survey mode. Respondents, or their parents if they were minors, were sent advance and reminder letters that included the login page link and credentials.

When designing the questionnaire, we took into account the possibility of access and completion with mobile devices. It was important to keep the questionnaire short and to simplify and shorten the wording of the questions. The questionnaire consisted of only five sections, each with about a dozen questions and some branches; the completion time was about 15 minutes.

In addition, we took care over how questions are displayed on mobile devices. We used the open-source LimeSurvey software (Community Edition, installed on an Istat web server), which allows designing a responsive questionnaire and is therefore useful for adapting questions to mobile devices. For example, it allows grid questions, which are displayed horizontally, to be transformed into single questions displayed vertically on mobile devices to improve usability.

Out of a sample of 100,000 survey units, 40,700 questionnaires were collected; of these, 51.0% were accessed from a desktop or laptop, 43.4% from smartphones, and only 5.6% from tablets.
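The device shares quoted above follow directly from the questionnaire counts reported on the slides. A minimal Python sketch of that arithmetic is given here as a consistency check; the variable and function names are illustrative and are not part of the survey's tooling.

```python
# Completed questionnaires by device, as reported on the slides.
completed_by_device = {
    "desktop": 20_772,
    "smartphone": 17_661,
    "tablet": 2_267,
}


def device_shares(counts):
    """Return each device's share of completed questionnaires, in percent."""
    total = sum(counts.values())
    return {device: round(100 * n / total, 1) for device, n in counts.items()}


shares = device_shares(completed_by_device)
print(shares)
```

The three counts add up to the 40,700 completed questionnaires, so the shares can be recomputed from the counts alone.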

Respondents seem to have found the login page quite challenging: 24% of the internet clients who visited the login page (56,404) did not reach the first page of the questionnaire. Furthermore, around 1,000 respondents who submitted the questionnaire completed it after more than one access. Only a few dozen respondents started answering with one device and finished with another.

We analyzed the data to see whether questionnaire break-off was associated with the device used. We found a low break-off rate (2.2%), with a higher propensity among those who used mobile devices (3.0%). However, this effect is smaller than the one observed for the subgroup of foreign respondents (3.9%) and close to the rates for those who attend a lower secondary school or a vocational upper secondary school (2.6% each).
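The overall break-off rate can be reproduced from the counts on the slides. The sketch below assumes that the rate is the number of partially completed questionnaires divided by all started questionnaires (completed plus partially completed); the source does not spell out the definition, but this one matches the reported totals. The function name is illustrative.

```python
def breakoff_rate(completed, not_completed):
    """Share of started questionnaires that were not completed, in percent."""
    started = completed + not_completed
    return 100 * not_completed / started


# Figures from the slides: 40,700 completed and 913 partially completed questionnaires.
overall = breakoff_rate(completed=40_700, not_completed=913)
print(f"{overall:.1f}%")
```

The same function can be applied to any subgroup (device, citizenship, school level) once its completed and partial counts are known.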

To assess the quality of the responses, we analyzed the impact of data checks with deterministic and probabilistic imputation. By comparing the initial and final datasets, the impact of the checks was calculated for each questionnaire as the ratio of the number of cells that changed after imputation to the total number of cells. The percentage of imputed cells is higher for mobile device respondents than for desktop or laptop respondents (2.1% vs 1.6%). This gap is more significant among foreign students (3.6%), while it is relatively consistent across different school levels. Additionally, lower data quality was noteworthy in grid questions across all analyzed groups.
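The impact measure described above (changed cells over total cells, per questionnaire) can be sketched in a few lines of Python. The record layout (one dict per questionnaire, `None` for a missing answer), the function names, and the toy data are assumptions made for illustration; how the authors aggregated the per-questionnaire ratios into group percentages is not stated, so the simple mean used here is also an assumption.

```python
def imputation_impact(initial, final):
    """Ratio of cells changed by imputation to total cells, for one questionnaire.

    `initial` and `final` map question codes to values (hypothetical layout).
    """
    if initial.keys() != final.keys():
        raise ValueError("initial and final records must have the same cells")
    changed = sum(1 for cell in initial if initial[cell] != final[cell])
    return changed / len(initial)


def mean_impact(pairs):
    """Average per-questionnaire impact over a group of (initial, final) pairs."""
    impacts = [imputation_impact(i, f) for i, f in pairs]
    return sum(impacts) / len(impacts)


# Toy data, invented for illustration only.
before = {"q1": 1, "q2": None, "q3": 3, "q4": 2}
after = {"q1": 1, "q2": 2, "q3": 3, "q4": 2}
untouched = {"q1": 4, "q2": 1, "q3": 2, "q4": 2}

print(imputation_impact(before, after))
print(mean_impact([(before, after), (untouched, untouched)]))
```

Restricting the dicts to the cells of grid questions would give the "only grid questions" variant of the same measure.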

The next edition, scheduled for autumn 2023, is currently being designed. Despite the end of the Covid-19 pandemic, and given the positive results achieved, both the survey design and the LimeSurvey software will remain the same. Some improvements are planned to ease questionnaire access and completion, regardless of the device used. The login credentials for accessing the questionnaire will be simplified, and a QR code will be included in the advance letter for direct access to the questionnaire without the need to enter the username and password manually. The questionnaire will be translated into 9 languages. Furthermore, the respondents' device usage will be monitored more thoroughly. The aim is to enhance the overall user experience and ensure a smoother and easier data collection process.

Knowing by collecting data: A circular process. The case of the Istat’s Surveys on Antiviolence centers and Shelters for women victims of violence - Francesco Gosetti, Alessandra Battisti, Federica Pellizzaro and Lucilla Scarnicchia (Istat, Italy)


Francesco Gosetti, Alessandra Battisti, Federica Pellizzaro, Lucilla Scarnicchia

Istat

Knowing by collecting data: A circular process.

The case of the Istat’s Surveys on Antiviolence centers and Shelters for women victims of violence.

Collecting data on violence against women is not an easy task, given the sensitive and multi-faceted nature of the phenomenon. Istat, in collaboration with the National Department for Equal Opportunities, developed an integrated information system that adopts a multi-source approach to collect, analyze and publish data from different sources, including administrative ones.

Within this framework, and in line with the 3P Model (Prevention, Protection and Prosecution) of the Istanbul Convention, a pivotal role is played by data on the services offered by specialized services for women victims of violence, such as antiviolence centers and shelters. These data do not solely highlight the “protection side” of the phenomenon; they are also valuable for exploring its characteristics. To this end, Istat, together with representatives of Antiviolence Centers and Shelters, associations, and national and regional institutions, implemented a set of dedicated surveys. This paper highlights the main procedural and methodological steps that led to the design of the surveys and the management of data collection. In this regard, particular attention is devoted to the role of continuous collaboration and communication with respondents in updating the content-related and methodological aspects of the surveys.

Simplifying the respondent’s task: the effectiveness of the questionnaire usability optimization in the Permanent Census of Population and Housing - Barbara Lorè, Sabrina Barcherini, Katia Bontempi, Manuela Bussola and Simona Rosati (Istat, Italy)


Evaluating computer-assisted questionnaire usability: the case of Permanent Census of Population and Housing

UNECE Expert Meeting on Statistical Data Collection

12 – 14 June 2023

Istat | Directorate for Data Collection

Barbara Lorè, Sabrina Barcherini, Katia Bontempi, Manuela Bussola, Simona Rosati

The case study: the Permanent census of population and housing

SAMPLE

Since 2018 the Census no longer involves all Italian households, but only a sample every year: about 1,400,000 resident households in 2,800 Italian municipalities

DATA COLLECTION DESIGN

A sequential mixed-mode data collection design, starting with CAWI and adding CAPI after a month. CAWI remains available until the end of data collection

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

CA QUESTIONNAIRE

The questionnaire consists of:

• the list of the household members

• individual forms to collect information on each member

• a household section to collect the characteristics of the dwelling

Heads of the municipal census offices debriefing (year 2018)

2,503 Heads of the Municipal Census Offices (HMCO) reported at least one difficulty on questionnaire usability from fieldwork staff (88.9%) and from households (79.5%)

Almost 40% of the difficulties were related to the final submission of the questionnaire, and around 10% were about the transition from one part to another

[Chart: HMCO feedback (% value), with one series for difficulties from fieldwork staff and one for difficulties from households. Final submission: 41.1 and 38.1. Transition from one part to another: 9.7 and 13.3.]

Help Desk Tickets (year 2019)

2,276 out of approximately 48,000 tickets were about the functionality and usability of the CA questionnaire:

3/4 were about the final submission

1/4 concerned other issues:

o access to individual forms

o filling in the members list

o partial data saving

[Chart: usability tickets by issue. Final submission 76.7%; access to individual form 13.2%; filling in the member list 5.7%; partial data saving 4.4%.]

Usability improvements: simplified questionnaire navigation

[Screenshots: questionnaire navigation menu, 2019 vs 2021]

o The navigation menu guides the completion.

o The color changes as the completion progresses and new sections are activated.

o The gray line gradually turns blue

o A dedicated button for the submission

[Menu sections shown: GUIDE TO COMPLETING THE SURVEY; FAMILY LIST AND INDIVIDUAL SURVEY; HOUSING UNIT INFORMATION; FINAL SUMMARY AND SUBMISSION]

[Screenshots: navigation buttons, 2019 vs 2021]

o Buttons have been graphically standardized and made more intuitive (BACK, SAVE, CONTINUE)

Usability improvements: guided completion (final submission)

[Screenshots: final submission page, 2019 vs 2021]

o Lighter graphic

o Standardized button

o Optimization of redundant text

Usability improvements: guided completion (access to individual forms)

[Screenshots: access to individual forms, 2019 vs 2021]

o Simplified access to the individual forms

o Simplified operations to add or delete family members' forms

o Fewer buttons and more intuitive labels

Fieldwork staff debriefing and help desk tickets analysis (year 2021)

Almost all fieldworkers (8,207 respondents) reported no difficulties with any of the usability aspects of the CA questionnaire

o Access to the preview of the questionnaire: 98.5%

o Warning prompts: 96.8%

o Questionnaire navigation: 95.3%

o Tooltip use: 98.0%

o Filling in the family members list: 98.4%

o Access to the individual forms: 98.5%

o Navigation menu: 97.3%

o Final submission: 97.9%

o Visualization of the sections summary: 98.6%

Only 500 out of the 100,000 tickets collected by the CC were related to completion difficulties

Respondent feedback questionnaire (years 2019, 2021)

Main results 2019 vs 2021

o The percentage of households responding in CAWI increased from 51.4% in 2019 to 53.1% in 2021 (from 44.5% to 45.7% if elderly)

o The percentage of households in which the reference person or another person belonging to the household completes the questionnaire increased from 86.9% in 2019 to 87.7% in 2021 (from 57.9% to 60.5% if elderly)

o The percentage of households not needing any help at all (not from friends or relatives, not from the help desk, not from Municipal Census Offices, etc.) rose from 78.2% in 2019 to 79.9% in 2021 (from 50.0% to 52.0% if elderly)

Difficulties reported in 2021

o Difficulty with the final submission: 8.6%

o Difficulty with the navigation: 21.7%

o Difficulty with some of the questions: 24.4%

o Difficulty accessing the questionnaire: 42.1%

Next steps

From 19.1% in 2019 to 23.3% in 2021

In the future...

o Questionnaire redesign for smartphone completion

o Usability testing - mouse tracking


Respondents and non-respondents to population and housing census: some strategies for data collection design in the era of low response rate and high response burden - Linda Porciani, Manuela Bussola, Novella Cecconi and Elena Donati (Istat, Italy)


RESPONDENTS AND NON RESPONDENTS TO POPULATION AND HOUSING CENSUS. SOME STRATEGIES FOR DATA COLLECTION DESIGN IN THE ERA OF LOW RESPONSE RATE AND HIGH RESPONSE BURDEN.

EXPERT MEETING ON STATISTICAL DATA COLLECTION – RETHINKING DATA COLLECTION

12-14 June 2023, online

Istat | Central Directorate for Data Collection

Manuela Bussola, Novella Cecconi, Elena Donati, Linda Porciani

The Permanent Population and Housing Census (PPHC) in Italy

2 RESPONDENTS AND NON RESPONDENTS TO POPULATION AND HOUSING CENSUS | BUSSOLA M, CECCONI N, DONATI E, PORCIANI L

AIMS OF THE ANALYSIS

1. Finding some of the reasons behind the decrease in the (CAWI) response rate

2. through the description of the profile of respondents / non respondents

3. to acquire elements for designing adaptive organizational and communication strategies towards respondents

4. in order to increase the response rate, especially the web response rate

Why a decision tree model?

The method produces a meaningful classification through homogeneous predictors, starting from a very heterogeneous population such as the PPHC sampled households.

How was the decision tree model applied?

1. PPHC sampled households were divided into three macro aggregations (sampled, respondents and non respondents)

2. Each subpopulation is the object of a specific decision tree model

3. The classification algorithm is CHi-squared Automatic Interaction Detection (CHAID)

CHAID is a multiple segmentation technique based on the χ² test.

A contingency table (χ² and p-value) was calculated for each explanatory variable (by category) against the dependent variable. The minimum p-value (p-min) of a specific attribute Xi is compared with the stop value (the significance threshold α; other stopping rules are the maximum dimension of the tree, the maximum number of levels, or the minimum frequency in a node):

- p-min < α: Xi is included in the tree as a split;

- p-min > α: Xi is an attribute of a leaf.
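The split rule above can be illustrated with a short, self-contained Python sketch. It is a simplification under stated assumptions, not the authors' implementation: it only computes the Pearson χ² test for each candidate predictor and picks the one with the smallest p-value, whereas full CHAID also merges similar categories and adjusts the p-values for multiple comparisons. The contingency tables are invented, and all function names are hypothetical.

```python
from math import erfc, exp, gamma, sqrt


def chi2_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 2 (even) or df = 1 (odd), then add two degrees at a time.
    k, sf = (2, exp(-x / 2)) if df % 2 == 0 else (1, erfc(sqrt(x / 2)))
    while k < df:
        sf += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return sf


def best_split(tables, alpha=0.05):
    """Return (predictor, p_min) if p_min < alpha, otherwise (None, p_min) for a leaf."""
    p_values = {name: chi2_sf(*chi2_statistic(t)) for name, t in tables.items()}
    name = min(p_values, key=p_values.get)
    p_min = p_values[name]
    return (name if p_min < alpha else None), p_min


# Invented counts: rows are predictor categories, columns are (respondent, non respondent).
tables = {
    "citizenship": [[90, 10], [30, 20]],
    "altitude": [[60, 15], [60, 15]],
}
chosen, p_min = best_split(tables)
print(chosen, p_min)
```

In this toy example the response rate differs across the citizenship categories but not across the altitude categories, so citizenship is selected as the split; applying `best_split` again within each resulting branch would grow the tree level by level.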

A Decision tree model for the profile of respondents/non respondents

POPULATION AND DIMENSIONS OF ANALYSIS

CATEGORIES

o HOUSEHOLD: Socio-demographic characteristics

o TERRITORY: Features of the places where households live

o FIELD WORK ORGANIZATION: Internal organization of territorial census offices (Municipality)

o Sampled households* (n. 939,588)

o Respondent households (n. 855,295)

o Non-respondent households (n. 84,293)

* Excluding off-target households (deceased households, households that moved, ...)
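As a quick consistency check on the household counts above, the following Python lines recompute the response and non-response rates. This is only an arithmetic sketch with illustrative variable names; the results agree with the "Yes 91% / No 9%" root node of the tree of sampled households.

```python
# Household counts from the slide (off-target households already excluded).
sampled = 939_588
respondents = 855_295
non_respondents = 84_293

# The two groups should partition the sampled households.
assert respondents + non_respondents == sampled

response_rate = 100 * respondents / sampled
non_response_rate = 100 * non_respondents / sampled
print(f"respondents: {response_rate:.1f}%, non respondents: {non_response_rate:.1f}%")
```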

SELECTED VARIABLES

HOUSEHOLD

o Citizenship

o Education level (the highest)

o Generation

o No. of members

o Profession (most skilled member)

o Household by generation

o Household composition by citizenship

TERRITORY

o Territorial aggregation

o Region

o Province

o No. of inhabitants

o Altitude area

o Urbanization degree

o Inner Areas

FIELD WORK ORGANIZATION

o Statistical Office

o Field workload

o Participation in the Census (once a year/yearly)

o Trained interviewers

Other cross variables: Internet use | Trust in institutions | Efficiency of postal reminders

A Decision tree model for the profile of respondents/non respondents

PRELIMINARY ANALYSIS

OF VARIABLES

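The tree model of sampled households

THE MOST SIGNIFICANT VARIABLE: HOUSEHOLD BY CITIZENSHIP

[Figure: CHAID tree of sampled households. Root node, Respondents (%): Yes 91%, No 9%. The first split is household by citizenship, with three branches: Italian household; Italian + non-Italian; non-Italian household. Branch shares as extracted: 91.7%, 4.9%, 3.4%. Respondent / non-respondent percentages per branch as extracted: 92.5 / 7.5; 75.1 / 24.9; 74.4 / 25.6.]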

The tree model of respondents

THE MOST SIGNIFICANT VARIABLE: EDUCATION LEVEL

[Figure: CHAID tree of respondents. Root node, Data collection techniques: CAWI 48.7%, CAPI 41.7%, CATI 9.6%. The first split is education level, with three branches: medium, high, low. Branch shares as extracted: 24.5%, 33.1%, 42.3%. Percentages per branch as extracted: 64.5, 7.7, 27.8; 51.0, 9.6, 39.4; 34.0, 11.0, 55.0.]

The tree model of non respondents

THE MOST SIGNIFICANT VARIABLE: INNER AREAS

[Figure: CHAID tree of non respondents. Root node, Contacted by interviewer: NO 66.7%, YES 23.3%. The first split is Inner Areas, with four branches: Belt + Intermediate; Single; Remote + Ultraremote; Multi. Branch shares as extracted: 50.3%, 6.8%, 40.2%, 2.7%. Percentages per branch as extracted: 16.4 / 83.6; 20.2 / 79.8; 31.4 / 68.6; 28.2 / 71.8.]

Adaptive respondents/non respondents strategies in the survey process

PROFILES | POSSIBLE STRATEGIES

• 1 member Non Italian

• Living in Single-Municipality Service Center

o Improving the collaboration with local foreigner associations

o Starting the data collection from 1-member households and non-Italian households

o Reducing the field workload through the improvement of self-enumeration

SELF ENUMERATION

• High educational level

• Living in Single-Municipality Service Center

• High professional status

INTERVIEWER

• Low educational level

• Living in remote areas

NON RESPONDENTS

• Italian households

• Low field workload

RESPONDENTS

o Improving the communication campaign in remote areas, focused on self-enumeration

o Introducing an additional technique, such as a smart questionnaire (e.g. through a QR code in the information letter)

• Living in Single- and Multi-Municipality Service Centers

• High field workload

o Reducing the field workload through the improvement of self-enumeration

o Making available the mobile phone numbers

o Involving municipal census offices to customize the communication campaign at the local level

NOT CONTACTED

• Living in Belt areas

• Low field workload

CONTACTED


Simplifying the respondent’s task: the effectiveness of the questionnaire usability optimization in the Permanent Census of Population and Housing - Barbara Lorè, Sabrina Barcherini, Katia Bontempi, Manuela Bussola and Simona Rosati (Istat, Italy)

Languages and translations
English

Evaluating computer-assisted questionnaire usability: the case of Permanent Census of Population and Housing

UNECE Expert Meeting on Statistical Data Collection

12 – 14 June 2023

Istat | Directorate for Data Collection

Barbara Lorè, Sabrina Barcherini, Katia Bontempi, Manuela Bussola, Simona Rosati

The case study: the Permanent census of population and housing

2

SAMPLE

Since 2018 the Census no longer involves all the Italian households, but

only a sample every year: about 1,400,000 resident households in 2,800

Italian municipalities

DATA COLLECTION DESIGN

A sequential mixed-mode data collection design, starting with CAWI

and including CAPI after a month. CAWI remains available until the

end

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

CA QUESTIONNAIRE

The questionnaire consists of:

• the list of the household members

• individual forms to collect information on each member

• a familiar section to collect the characteristics of the dwelling

Heads of the municipal census offices debriefing (year 2018)

3

HMCO feedback (% value)

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

2503 Heads of the Municipal Census

Offices (HMCO) reported at least one

difficulty on questionnaire usability from

fieldwork staff (88.9%) and from

households (79.5%)

Almost 40% of the difficulties were

related to the final submission of the

questionnaire, and around 10% were

about the transition from one part to

another

41.1

9.7

38.1

13.3

Final submission Transition from one part to

another

Difficulties from fieldwork staff

Difficulties from households

Help Desk Tickets (year 2019)

4

2,276 out of approximately

48,000 tickets, were about the

functionality and usability of the CA

questionnaire:

3/4 were about the final submission

1/4 concerned other issues:

o access to individual forms

o filling in the members list

o partial data saving Final

submission

76.7%

Accesso to

individual form

13.2%

Filling of the

member list

5.7%

Partial data saving

4.4%

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

Usability improvements: simplified questionnaire navigation

5

2019

2021

The navigation menu

guides the

completion.

The color changes as

the completion

progresses and new

sections are activated.

The gray line

gradually turns blue A dedicated button for the

submission

GUIDE TO

COMPLETING

THE SURVEY

FAMILY LIST AND

INDIVIDUAL SURVEY

HOUSING UNIT FINAL

INFORMATION

SUMMARY AND

SUBMISSION

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

2019

2021

Buttons have been graphically

standardized and made more

intuitive

BACK SAVE CONTINUE

6

Usability improvements: guided completion (final submission)

2019

2021 Lighter graphic

Standardized button

Optimization of

redundant text

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

CONTINUE

7

Usability improvements: guided completion (access to individual forms)

2019

2021

Simplified access to the

individual forms

Simplified operations to

add or delete family

members forms

Less buttons and more

intuitive labels

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

8

Fieldwork staff debriefing and help desk tickets analysys (year 2021)

Access to the preview of the questionnaire

98.5%

Warning prompts

96.8%

Questionnaire navigation 95.3%

Tooltip use

98.0%

Filling in the family members

list

98.4%

Access to the individual forms

98.5%

Navigation menu

97.3%

Final submission 97.9%

Visualization of the sections

summary

98.6%

Almost all fieldworkers (8207 respondents) reporeted no

difficulties with any of the usability aspects of the CA

questionnaire

Only 500 out of

the 100.000 tickets

collected by the

CC were related

to completion

difficulties

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

Respondent feedback questionnaire (years 2019, 2021)

9

Difficulties reported in 2021Main results 2019 vs 2021

The percentage of households responding in

CAWI has increased from 51.4 % in 2019 to

53.1% in 2021

from 44.5% to 45.7% if elderly

The percentage of households in which the

reference person or another person

belonging to the household completes the

questionnaire has increased from 86.9% in

2019 to 87.7% in 2021

from 57.9% to 60.5% if elderly

The percentage of households not needing

any help at all (not from friends or relatives,

not from help desk, not from Municipal Census

Offices, etc.) has risen from 78.2% in 2019 to

79.9% in 2021

from 50.0% to 52.0% if elderly

Difficulty with the final submission

8.6%

Difficulty with the navigation

21.7%

Difficulty with some of the questions

24.4%

Difficulty to access the questionnaire

42.1%

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

10

Next steps

From 19.1%

in 2019

to

23.3% in

2021

In the future...

Questionnaire redesign for

smartphone completion

Usability testing - mouse tracking

EVALUATING COMPUTER-ASSISTED QUESTIONNAIRE USABILITY | BARBARA LORÈ, SABRINA BARCHERINI, KATIA BONTEMPI, MANUELA BUSSOLA, SIMONA ROSATI

  • Diapositiva 1: Evaluating computer-assisted questionnaire usability: the case of Permanent Census of Population and Housing
  • Diapositiva 2: The case study: the Permanent census of population and housing
  • Diapositiva 3: Heads of the municipal census offices debriefing (year 2018)
  • Diapositiva 4: Help Desk Tickets (year 2019)
  • Diapositiva 5: Usability improvements: simplified questionnaire navigation
  • Diapositiva 6: Usability improvements: guided completion (final submission)
  • Diapositiva 7: Usability improvements: guided completion (access to individual forms)
  • Diapositiva 8: Fieldwork staff debriefing and help desk tickets analysys (year 2021)
  • Diapositiva 9: Respondent feedback questionnaire (years 2019, 2021)
  • Diapositiva 10: Next steps
  • Diapositiva 11: Thanks for listening!

Respondents and non-respondents to population and housing census: some strategies for data collection design in the era of low response rate and high response burden - Linda Porciani, Manuale Bussola, Novella Cecconi and Elena Donati (Istat, Italy)

Languages and translations
English

RESPONDENTS AND NON RESPONDENTS TO POPULATION AND HOUSING CENSUS. SOME STRATEGIES FOR DATA COLLECTION DESIGN IN THE ERA OF

LOW RESPONSE RATE AND HIGH RESPONSE BURDEN.

EXPERT MEETING ON STATISTICAL

DATA COLLECTION – RETHINKING

DATA COLLECTION

12-14 June 2023, online

Istat | Direzione centrale per la raccolta dati

Manuela Bussola, Novella Cecconi, Elena Donati, Linda Porciani

The Permanent Population and Housing Census (PPHC) in Italy

2 RESPONDENTS AND NON RESPONDENTS TO POPULATION AND HOUSING CENSUS | BUSSOLA M, CECCONI N, DONATI E, PORCIANI L

AIMS OF THE ANALYSIS

Finding some of the reasons behind the decrease of (CAWI) response rate

through the description of the profile of respondents / non respondents

to acquire elements for designing adaptive organizational and communication strategies

to respondents

in order to increase the response rate, especially web response rate

1

2

3

4

OS

QUALITY

CRITERIA

1

3

24

3 RESPONDENTS AND NON RESPONDENTS TO POPULATION AND HOUSING CENSUS | BUSSOLA M, CECCONI N, DONATI E, PORCIANI L

Why a decision tree model?

The method yields a meaningful classification through homogeneous predictors, starting from a very heterogeneous population such as the PPHC sampled households.

How was the decision tree model applied?

1. PPHC sampled households were divided into three macro aggregations (sampled, respondents and non-respondents)

2. Each subpopulation is the object of a specific decision tree model

3. The classification algorithm is CHi-squared Automatic Interaction Detection (CHAID)

CHAID is a multiple segmentation technique based on the χ² test.

A contingency table (χ² and p-value) was calculated for each explanatory variable (by modality) and the dependent variable. For a specific attribute Xi, the minimum p-value (pmin) is compared with the stop value α (that is, the maximum dimension of the tree, the maximum number of levels or the minimum frequency in a node), with two possible outcomes:

- pmin < α: Xi is an included modality;

- pmin > α: Xi is an attribute of a leaf.
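The split rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the chi-square test and the comparison of the smallest p-value with a significance threshold, and leaves out the category merging and multiplicity adjustment that a full CHAID procedure applies. The toy records, the field names (`citizenship`, `altitude`, `responded`) and the threshold `alpha=0.05` are assumptions made for illustration.

```python
import math
from collections import Counter

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)                 # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))      # closed form for df = 1
    while a < df / 2.0 - 1e-9:                   # recurrence up to df / 2
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_test(records, predictor, target):
    """Pearson chi-square statistic and p-value of predictor vs target."""
    cells = Counter((r[predictor], r[target]) for r in records)
    rows = Counter(r[predictor] for r in records)
    cols = Counter(r[target] for r in records)
    n = len(records)
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, chi2_sf(stat, df)

def best_split(records, predictors, target, alpha=0.05):
    """Return the predictor with the smallest p-value and whether to split."""
    p_values = {x: chi2_test(records, x, target)[1] for x in predictors}
    best = min(p_values, key=p_values.get)
    return best, p_values[best], p_values[best] < alpha

# Hypothetical toy data: response differs by citizenship, not by altitude.
counts = {("Italian", "yes"): 90, ("Italian", "no"): 10,
          ("Non-Italian", "yes"): 60, ("Non-Italian", "no"): 40}
records = [
    {"citizenship": cit, "responded": resp,
     "altitude": "plain" if i % 2 == 0 else "mountain"}
    for (cit, resp), n in counts.items() for i in range(n)
]
best, p_min, split = best_split(records, ["citizenship", "altitude"], "responded")
```

In this toy case the predictor with the smallest p-value is used to split the node; if no predictor fell below the threshold, the node would become a leaf.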

A Decision tree model for the profile of respondents/non respondents


HOUSEHOLD

Socio-demographic characteristics

TERRITORY

Features of living places of households

FIELD WORK ORGANIZATION

Internal organization of territorial census offices (Municipality)

CATEGORIES

Sampled households* (n. 939,588)

Respondent households (n. 855,295)

Non-respondent households (n. 84,293)

A Decision tree model for the profile of respondents/non respondents

* Excluding off-target households (deceased households, households that moved, etc.)

POPULATION AND DIMENSIONS OF ANALYSIS

SELECTED VARIABLES

HOUSEHOLD
- Citizenship
- Education level (the highest)
- Generation
- No. of members
- Profession (most skilled member)

TERRITORY
- Territorial aggregation
- Region
- Province
- No. of inhabitants
- Altitude area
- Urbanization degree
- Inner Areas

FIELD WORK ORGANIZATION
- Statistical Office
- Field workload
- Participation in Census (once a year/yearly)
- Trained interviewers

Other cross variables: Internet use | Trust in institutions | Efficiency of postal reminders

Household by generation

Household composition by citizenship

A Decision tree model for the profile of respondents/non respondents

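PRELIMINARY ANALYSIS OF VARIABLES

[Figure: decision tree of sampled households. Root node, respondents: Yes 91%, No 9%. First split by household citizenship into three nodes (Italian household; Italian + non-Italian; non-Italian household). Node shares as extracted: 91.7%, 4.9%, 3.4%. Respondent / non-respondent percentages as extracted: 92.5 / 7.5; 75.1 / 24.9; 74.4 / 25.6.]

THE MOST SIGNIFICANT VARIABLE: HOUSEHOLD BY CITIZENSHIP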

The tree model of sampled households

[Figure: decision tree of respondents. Root node, data collection techniques: CAWI 48.7%, CAPI 41.7%, CATI 9.6%. First split by education level into three nodes (high, medium, low). Node shares as extracted: 24.5%, 33.1%, 42.3%. Percentages within nodes as extracted: 64.5, 7.7, 27.8; 51.0, 9.6, 39.4; 34.0, 11.0, 55.0.]

The tree model of respondents

THE MOST SIGNIFICANT VARIABLE: EDUCATION LEVEL

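[Figure: decision tree of non-respondents. Root node, contacted by interviewer: NO 66.7%, YES 23.3%. First split by Inner Areas class into four nodes (Belt + Intermediate; Single; Multi; Remote + Ultraremote). Node shares as extracted: 50.3%, 6.8%, 40.2%, 2.7%. Percentages within nodes as extracted: 16.4 / 83.6; 20.2 / 79.8; 31.4 / 68.6; 28.2 / 71.8.]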

The tree model of non respondents

THE MOST SIGNIFICANT VARIABLE: INNER AREAS

Adaptive respondent/non-respondent strategies in the survey process

PROFILES | POSSIBLE STRATEGIES

• 1 member Non Italian
• Living in Single-Municipality Service Center

- Improving the collaboration with local associations of foreigners
- Starting the data collection from 1-member households and non-Italian households
- Reducing the field workload through the improvement of self-enumeration

SELF ENUMERATION

• High educational level
• Living in Single-Municipality Service Center
• High professional status

INTERVIEWER

• Low educational level
• Living in remote areas

NON RESPONDENTS

• Italian households
• Low field workload

RESPONDENTS

- Improving the communication campaign in remote areas, focused on self-enumeration
- Improving an additional technique, such as a smart questionnaire (i.e. through a QR code in the informative letter)

• Living in Single and Multi Single-Municipality Service Center
• High field workload

- Reducing the field workload through the improvement of self-enumeration
- Making available the mobile phone numbers
- Involving municipality census offices to customize the communication campaign at the local level

NOT CONTACTED

• Living in Belt areas
• Low field workload

CONTACTED

Thanks

MANUELA BUSSOLA

NOVELLA CECCONI

ELENA DONATI

LINDA PORCIANI

Defining Products with Transaction Data: Aggregation Methods and Assessment of Their Impact, Italy

Languages and translations
English

DEFINING PRODUCTS WITH TRANSACTION DATA: AGGREGATION METHODS AND ASSESSMENT OF THEIR IMPACT

Geneva, 2023, June 7-9

UNECE CPI Expert Group meeting

Alessandro Brunetti, Istat (Italy)

Stefania Fatello, Istat (Italy)

Tiziana Laureti, University of Tuscia (Italy)

Federico Polidoro, World Bank

o Scanner data to estimate Italian inflation: the state of play

o Product definition and the relaunches problem

o The MARS approach

o The case study for the experimental application of MARS

o Results: Product match, product homogeneity and scores

o Results: GEKS Törnqvist on different stratifications

o Concluding remarks

Outline

Scanner data to estimate Italian inflation: the state of play

o Since 2018, Istat has been using scanner data for grocery products (excluding fresh food) to compile CPIs.

o In 2023, scanner data cover 4,238 outlets (483 hypermarkets, 1,577 supermarkets, 588 discounts, 1,066 outlets with a surface between 100 and 400 sq. m. and 569 specialist drug). These outlets belong to the main 21 RTCs and cover the entire national territory (agreement with RTCs, and Nielsen cooperation).

o Since 2020, a dynamic approach has been adopted, considering all the GTINs matched in two consecutive months within each outlet and ECR4 market (representative of elementary aggregates).

o Within the dynamic approach, a procedure that manages the issue of relaunches has been implemented and is used in the current production process, but only a few relaunches are detected.

o During the last three years, we carried out empirical research on the use of multilateral methods (ML), whose results have been presented at the meetings of the dedicated Eurostat task force, at the UNECE expert group meetings and finally at the last Ottawa Group meeting.

o The idea is to introduce the ML method into the production of CPIs in the near future.

Product definition and the relaunches problem

o Product specification has been recognized as a critical step that can strongly influence the performance of different methods for transaction data (Lamboray, 2022). While scanner data help reduce lower-level substitution bias, other biases can appear because products are too tightly or too broadly defined.

o Tightly specified products may cause a bias, as new and disappearing products in the two comparison periods are not considered in a matched price index (De Haan and Krsinich, 2014).

o Broadly specified products may cause a bias, as the underlying transactions that make up the individual product may not be of the same quality, i.e. unit value bias (Dalén, 2017).

AIMS:

o This research tests, in an experimental way, the use of MARS (Chessa 2016, 2021) on Italian scanner data, to find a compromise between homogeneity and stability over time, and to see whether the issue of relaunches can be better managed in this way.

o The GEKS-Törnqvist multilateral matched method is also applied to compile indices and to analyse the impact of different groupings of GTINs.

The MARS approach

o Several ways of partitioning a set of GTINs exist, each of which may have a different impact on product match and homogeneity and consequently on price changes.

o MARS combines a measure of product match:

\[ \mu_t^K = \frac{\sum_{k \in K_{0,t}} q_t^k}{\sum_{i \in G_t} q_{i,t}} \]

and a measure of product homogeneity:

\[ R_t^K = \frac{\sum_{k \in K} q_t^k \left(\bar{p}_t^k - \bar{p}_t\right)^2}{\sum_{i \in G_t} q_{i,t} \left(p_{i,t} - \bar{p}_t\right)^2} \]

where $K_{0,t}$ is the set of products that are sold both in a fixed base month 0 (December of the previous year) and in a second month t, while $G_t$ is the set of items sold in month t.

R squared and degree of product match are thus combined as follows to evaluate and rank item partitions:

\[ M_t^K = \mu_t^K R_t^K, \qquad 0 \le M_t^K \le 1 \]
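As a rough illustration of how the product match, the homogeneity measure and the MARS score defined above interact, the following sketch computes them for one month under a given partition. It assumes that each product's price is its unit value (turnover divided by quantity) and that the overall mean price is quantity-weighted; the item records, field names and group labels are hypothetical.

```python
def mars_score(items, base_products):
    """Return (match, r_squared, score) for one month and one partition.

    items: list of dicts with 'product', 'price', 'qty' for month t.
    base_products: set of products also sold in the base month.
    """
    total_q = sum(i["qty"] for i in items)
    mean_p = sum(i["price"] * i["qty"] for i in items) / total_q

    # Aggregate items into products: total quantity and turnover.
    qty, turnover = {}, {}
    for i in items:
        k = i["product"]
        qty[k] = qty.get(k, 0.0) + i["qty"]
        turnover[k] = turnover.get(k, 0.0) + i["price"] * i["qty"]

    # Product match: quantity share of products also sold in the base month.
    match = sum(q for k, q in qty.items() if k in base_products) / total_q

    # Homogeneity: between-product share of the quantity-weighted price variance.
    between = sum(qty[k] * (turnover[k] / qty[k] - mean_p) ** 2 for k in qty)
    total = sum(i["qty"] * (i["price"] - mean_p) ** 2 for i in items)
    r_squared = between / total if total > 0 else 1.0

    return match, r_squared, match * r_squared

# Hypothetical month t: two cheap items grouped as product A, one dearer item as B.
items = [
    {"product": "A", "price": 1.0, "qty": 10},
    {"product": "A", "price": 1.0, "qty": 10},
    {"product": "B", "price": 3.0, "qty": 20},
]
tight = mars_score(items, base_products={"A"})

# The broadest partition puts every item into a single product X.
broad_items = [dict(i, product="X") for i in items]
broad = mars_score(broad_items, base_products={"X"})
```

In this toy example the fine partition is perfectly homogeneous but only half matched, while the single broad product is fully matched but explains none of the price variance, which is the trade-off the score is meant to rank.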

The case study for the experimental application of MARS

o Three product aggregates have been selected for the present exercise:

• Rice; Chocolate; Products for the hygiene of the body.

o Data come from outlets in the province of Rome* and refer to the period December 2020 to April 2023.

o Transaction data are first aggregated by outlet type (hypermarkets, supermarkets, discounts and specialist drug) across chains and locations.

o The other dimensions taken into account to define homogeneous groups of GTINs are:

• Brand; ECR markets; packaging volume (grams, centilitres).

* 127 outlets are included in the sample for year 2023: 57 supermarkets; 26 outlets with surface between 100 and 400 sq. m.; 19 discounts; 9 hypermarkets; 16 specialist drug.

o First, we tried to assess the impact of the different dimensions by testing whether they significantly affect price levels. To this aim, regression models were used:

\[ \ln p_i^t = \alpha + \sum_k \beta_k x_{ki} + \sum_h \gamma_h y_{hi} + \sum_j \delta_j z_{ji} + \sum_s \eta_s v_{si} + \sum_t \theta_t w_t + u_i^t \]

where $p_i^t$ is the price of the reference $i$ in period $t$ and $x, y, z, v, w$ are dummies for brand, outlet type, market, packaging volume and time.

o As a second step, we consider different stratifications of GTINs, corresponding to different groups of products:

S0) the most narrowly defined groups of products. In this case, each group consists of a single GTIN per outlet type (the single item is identified by the combination of a GTIN and an outlet type).

The case study for the experimental application of MARS

Data description

S1) In the second case, strata (homogeneous groups of items) are defined by market, brand and size: for each stratum, the price (quantity) is the unit value (total quantity) calculated over all the GTINs of the same market, brand and packaging volume.

S2) the most broadly defined groups of products, which correspond to a stratification of GTINs by market and packaging volume.
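The three stratifications can be thought of as the same aggregation run with different grouping keys. The sketch below is only an illustration under that reading: it computes, for each stratum, the unit value (turnover divided by quantity) and the total quantity. The transaction records and field names (`gtin`, `outlet_type`, `market`, `brand`, `volume`, `turnover`, `quantity`) are hypothetical.

```python
def unit_values(transactions, key_fields):
    """Group transactions by key_fields; return {key: (unit_value, quantity)}."""
    turnover, quantity = {}, {}
    for t in transactions:
        key = tuple(t[f] for f in key_fields)
        turnover[key] = turnover.get(key, 0.0) + t["turnover"]
        quantity[key] = quantity.get(key, 0) + t["quantity"]
    return {k: (turnover[k] / quantity[k], quantity[k]) for k in turnover}

# Hypothetical monthly data: three GTINs of 1 kg rice sold in supermarkets.
transactions = [
    {"gtin": "g1", "outlet_type": "supermarket", "market": "rice",
     "brand": "b1", "volume": 1000, "turnover": 20.0, "quantity": 10},
    {"gtin": "g2", "outlet_type": "supermarket", "market": "rice",
     "brand": "b1", "volume": 1000, "turnover": 30.0, "quantity": 10},
    {"gtin": "g3", "outlet_type": "supermarket", "market": "rice",
     "brand": "b2", "volume": 1000, "turnover": 10.0, "quantity": 10},
]

s0 = unit_values(transactions, ("gtin", "outlet_type"))        # one GTIN per outlet type
s1 = unit_values(transactions, ("market", "brand", "volume"))  # market, brand and size
s2 = unit_values(transactions, ("market", "volume"))           # market and size only
```

Moving from S0 to S2 reduces the number of strata (three, two and one in this toy case) while mixing GTINs with different prices into the same unit value.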

The table shows the number of markets, brands, different packaging volumes and GTINs for the three products explored in our exercise. It also shows the number of strata that correspond to the alternative definitions of product groups.

                          Rice   Chocolate   Hygiene products
ECR markets                 12          15                  8
Brands                      92         340                387
Packaging volumes           16         192                 65
GTINs*                     397       1,347              2,253
GTINs per outlet type*     798       2,664              4,184
Strata S1*                 604       1,588              1,688
Strata S2*                 143         715                345

* average (Dec. 2020 - Apr. 2023)

Results: Product match, product homogeneity and scores

For rice, in 2021 the degrees of product match for options S0 and S1 are relatively close to each other. However, in 2022, the decline of the S0 line is significantly larger than that of S1.

The broadest definition of product groups implies a sharp increase in the degree of heterogeneity.

Results: Product match, product homogeneity and scores

As a result, S0 seems to be the better option in 2021, but this is less evident for 2022.

Over the two years, the score of S2 remains far below those of the other two options.

Results: Product match, product homogeneity and scores

For personal hygiene products the degree of product match of S0 and S1 drops sharply (especially in 2022). Quite the opposite, product match of S2 is almost equal to 100% in both years.

However, concerning S1, product homogeneity is almost equal to 100% in both years.

Results: Product match, product homogeneity and scores


• In this case, the S1 option dominates the alternative stratifications.

• As for rice, the score of S2 remains far below those of the other two options.

Results: Product match, product homogeneity and scores

• Moving from S0 to S1, the increase of heterogeneity is very limited. Even option S2 exhibits quite a high degree of product homogeneity.

• In this case, the use of item clustering seems to have clear advantages: while strongly reducing the number of strata (groups), it keeps a high level of homogeneity.

Results: Product match, product homogeneity and scores

In the case of chocolate, the degree of homogeneity remains above 90%: for that reason, the S2 MARS score is relatively close to those of S0 and S1.

Results: GEKS Törnqvist on different stratifications

In the case of rice, the higher degree of heterogeneity of S2 seems to produce an upward bias in the annual rates of change of the indices.

Results: GEKS Törnqvist on different stratifications

The same conclusion seems to hold for personal hygiene products. The differences between the stratifications adopted (S0, S1 and S2) are not negligible.

Results: GEKS Törnqvist on different stratifications

Even if, in the case of chocolate, S2 introduces an upward bias in the annual rates of change of the index, the differences are less pronounced.
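For readers unfamiliar with the index formula behind these comparisons, the following is a minimal sketch of a GEKS-Törnqvist index computed on matched products over a fixed window. It is a textbook form, not the production code: the window length, the absence of any splicing or window-linking step and the toy prices and quantities are assumptions for illustration.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist index on the products present in both periods."""
    matched = [k for k in p0 if k in p1]
    e0 = {k: p0[k] * q0[k] for k in matched}
    e1 = {k: p1[k] * q1[k] for k in matched}
    t0, t1 = sum(e0.values()), sum(e1.values())
    log_index = sum(
        0.5 * (e0[k] / t0 + e1[k] / t1) * math.log(p1[k] / p0[k])
        for k in matched
    )
    return math.exp(log_index)

def geks_tornqvist(prices, quantities, base, current):
    """GEKS index: geometric mean over all link periods of the window."""
    n = len(prices)
    log_sum = 0.0
    for link in range(n):
        log_sum += math.log(tornqvist(prices[base], quantities[base],
                                      prices[link], quantities[link]))
        log_sum += math.log(tornqvist(prices[link], quantities[link],
                                      prices[current], quantities[current]))
    return math.exp(log_sum / n)

# Hypothetical window of three months; every price doubles between month 0 and 2.
prices = [{"a": 1.0, "b": 2.0}, {"a": 1.5, "b": 3.0}, {"a": 2.0, "b": 4.0}]
quantities = [{"a": 10, "b": 5}, {"a": 8, "b": 6}, {"a": 7, "b": 7}]
index_0_2 = geks_tornqvist(prices, quantities, base=0, current=2)
```

In this sketch the keys of each dictionary stand for products, so running it on S0, S1 or S2 strata only changes what counts as a product, which is how the stratification can affect the resulting index.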

Conclusive remarks and perspectives

o The results confirm that MARS is a useful tool for finding a compromise between tightly and broadly specified products, avoiding the bias that can derive from these two extreme approaches, in particular when there are new and disappearing products in the two comparison periods.

o If the variables that identify the homogeneous groups are correctly detected, both match and homogeneity seem to be obtained. It may be worth considering calculating $M_t^K$ (the combination of R squared and degree of product match) as a weighted product that gives greater importance to the homogeneity side.

o The variables that identify the homogeneous groups should be refined. In particular, beyond brand and ECR market (and outlet type), different classes of product size should be considered, and not only the size of the products as is.

o This refinement should help better manage the problems of relaunches of grocery products and the issues related to shrinkflation, specifically when new packages are offered in parallel with the surviving old ones, which weakens the capacity to detect relaunches.

o These will be the main points of Istat's next research programme in view of the implementation of multilateral methods for CPI compilation.

References

o Chessa, A. (2016). A new methodology for processing scanner data in the Dutch CPI. EURONA 1/2016.

o Chessa, A. G. (2021). A Product Match Adjusted R Squared Method for Defining Products with Transaction Data. Journal of Official Statistics, 37(2), 411-432.

o Dalén, J. (2017). Unit Values and Aggregation in Scanner Data – Towards a Best Practice. Fifteenth Meeting of the International Working Group on Price Indices, Eltville am Rhein, May.

o De Haan, J., & Krsinich, F. (2014). Transaction data and the treatment of quality change in nonrevisable price indices. Journal of Business & Economic Statistics, 32(3), 341-358.

o Lamboray, C. (2022). What impact does product specification have on a Fisher price index? Paper prepared for the 17th Meeting of the Ottawa Group on Price Indices, 7-10 June 2022, Rome, Italy.


Compilation of Italian HICP by Different Groups of Households

Languages and translations
English

COMPILATION OF ITALIAN HICP BY DIFFERENT GROUPS OF HOUSEHOLDS

Geneva, 2023, June 7-9

UNECE CPI Expert Group meeting

Ilaria Arigoni, Istat (Italy)

Alessandro Brunetti, Istat (Italy)

Valeria de Martino, Istat (Italy)

Federico Polidoro, World Bank

Outline

• Inflation in Italy and in the Euro area in 2022-2023: an overview

• Current Istat methodology to compile HICP by five groups of households

• Changing from expenditure to income the variable to identify the groups of households: main outcomes

• The impact of inflation on the income-based groups of households

• Characteristics of the households in the extreme groups and comparison between their distributions in the five groups by expenditure and by income

• Is it enough to focus on the weights to measure the actual impact of inflation on the poorest people?

• Some concluding remarks and perspectives

Inflation in Italy and in the Euro area in 2022-23: an overview

• 2022, as well as the final part of 2021, was characterized, in Italy, in the European Union (EU) and in the world, by a sharp increase in the rates of change of consumer price indices, which are slowly decreasing in the first part of 2023

• Italian inflation measured by the HICP rose from +1.0% in July 2021 (+2.2% in the Euro area) to +12.5% in November 2022 (+10.1% in the Euro area), slowing down respectively to +7.0% and +8.7% in April 2023.

• Given the impact of energy prices on the sharp rise and on the recent slowdown of inflation, the overall HICP excluding energy kept accelerating (reaching +7.9% in the EA and +6.9% in Italy in March 2023) and started declining only in April 2023 (+7.4% in the EA; +6.7% in Italy)

• Yearly rates of change of food prices are still very high (in April 2023 +13.5% in the EA, +11.0% in Italy)

Inflation in Italy and in the Euro area in 2022-23: an overview

Figure 1. HICP indices and annual rates of change. Italy and Euro area. 2016 – 2023. Percentage values

[Figure: monthly series from January 2016 to April 2023. Euro area m/m-12 (left axis); Italy m/m-12 (left axis); Euro area index; Italy index. Data labels as extracted: 7.0 and 8.7 (annual rates of change); 123.13 and 121.4 (indices).]

Current Istat methodology to compile HICP by five groups of households

• Since 2005, Istat has been compiling and disseminating a measure of the impact of inflation on five groups of households of equal size, ordered by their spending power (from the lowest, in the first group, to the highest, in the fifth), which is used as a proxy of their income conditions.

• Consumer price indices are compiled considering the different structure of consumption expenditure of each group of households (summarized in the system of weights).

• HICPs by population subgroup are “satellite” indices of the HICP: they share the set of basic information (basket of products and elementary price data) and the methodology of the Italian HICP, but they differ from each other in the system of weights used for their calculation.
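The role of the weights can be made concrete with a small sketch in which two groups share the same sub-indices and differ only in their weights. It assumes a plain weighted arithmetic mean of sub-indices and leaves out chain-linking and the other steps of actual HICP compilation; the two aggregates, the index levels and the per-million weights are hypothetical.

```python
def group_index(weights, sub_indices):
    """Weighted arithmetic mean of sub-indices with group-specific weights."""
    total = sum(weights.values())
    return sum(weights[k] * sub_indices[k] for k in weights) / total

# Hypothetical sub-indices shared by all groups (same basket, same prices).
sub_indices = {"energy": 150.0, "other": 105.0}

# Hypothetical weights per million: the first group spends relatively more on energy.
weights_first = {"energy": 200_000, "other": 800_000}
weights_fifth = {"energy": 100_000, "other": 900_000}

index_first = group_index(weights_first, sub_indices)
index_fifth = group_index(weights_fifth, sub_indices)
gap = index_first - index_fifth
```

With identical prices, the gap between the two group indices comes entirely from the different weight given to the aggregate whose price rose most.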


• Weights for the five subgroups of households are based on HBS data

• To estimate the weights, consumption expenditures are equivalized using an appropriate equivalence scale (the Carbonaro scale), which accounts for economies of scale and makes the expenditures of households of different sizes comparable by expressing them in terms of a two-member household

• Households, ordered by equivalent consumption expenditure, are split at specific cut-point values into five groups of equal size (equivalent-expenditure fifths)

• In a situation of perfect equality, each fifth would account for 20% of the total expenditure of all households; in fact, in 2021, the equivalent expenditure of the last fifth was about 5 times that of the first fifth (inequality measure on the expenditure side).

Current Istat methodology to compile HICP by five groups of households


Figure 2. Expenditure weights by 5 households groups and main special aggregates in 2023 (HBS year 2022)


Current Istat methodology to compile HICP by five groups of households

• The methodology adopted using households’ expenditure data has been transferred to households’ income data derived from HBS

• In this case, to identify the groups, households’ income data are equivalized using the OECD-modified equivalence scale

• Households, ordered by equivalent income, are split at specific cut-point values into five groups of equal size (equivalent-income fifths), from the poorest (the first) to the wealthiest (the fifth)

• The weights are then estimated using the expenditure data of each income group of households

• The interquintile ratio between the fifth and the first income group is equal to 3.9, whereas the fifth group spends 2.07 times what the first group spends (in 2021)
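The equivalization and grouping steps for the income-based fifths can be sketched as follows. The OECD-modified scale assigns 1.0 to the first adult, 0.5 to each further member aged 14 or over and 0.3 to each child under 14. Everything else is an assumption for illustration: the household records are hypothetical and the fifths are formed on unweighted ranks, whereas a survey-based computation would normally use sampling weights.

```python
def oecd_modified_scale(members_14_plus, children_under_14):
    """OECD-modified equivalence scale for one household."""
    return 1.0 + 0.5 * (members_14_plus - 1) + 0.3 * children_under_14

def equivalent_income(household):
    scale = oecd_modified_scale(household["adults"], household["children"])
    return household["income"] / scale

def assign_fifths(households):
    """Rank households by equivalent income and cut them into five equal groups."""
    ranked = sorted(households, key=equivalent_income)
    n = len(ranked)
    return {h["id"]: (rank * 5) // n + 1 for rank, h in enumerate(ranked)}

# Hypothetical households: h1..h9 are single adults, h10 is a couple with two children.
households = [
    {"id": f"h{i}", "income": 10.0 * i, "adults": 1, "children": 0}
    for i in range(1, 10)
]
households.append({"id": "h10", "income": 100.0, "adults": 2, "children": 2})

fifths = assign_fifths(households)
```

In this toy case the household with the highest total income ends up in the third fifth once its income is divided by its scale of 2.1, which shows why the choice of scale affects group membership.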

From expenditure to income the variable to identify the groups of households: main outcomes


Figure 3. Expenditure and income weights. First group of households. Main special aggregates (2022, HBS 2021)


From expenditure to income the variable to identify the groups of households: main outcomes

[Figure 3: bar chart comparing expenditure weights and income weights of the first group across the main special aggregates (processed food incl. alcohol and tobacco; unprocessed food; energy; non-energy industrial goods; services related to housing; services related to communication; services related to recreation; services related to transports; services miscellaneous). Data labels as extracted: 219,419; 112,662; 145,527 and 207,524; 100,205; 115,613.]

From expenditure to income the variable to identify the groups of households: main outcomes

Figure 4. Expenditure and income weights. Fifth group of households. Main special aggregates (2022, HBS 2021)

[Figure 4: bar chart comparing expenditure weights and income weights of the fifth group across the main special aggregates (processed food incl. alcohol and tobacco; unprocessed food; energy; non-energy industrial goods; services related to housing; services related to communication; services related to recreation; services related to transports; services miscellaneous). Data labels as extracted: 115,474; 49,340; 67,376 and 124,251; 53,387; 79,555.]

• The structure of weights is similar, but not identical, between the groups identified by expenditure data and those identified by income data

• Specifically:

✓ For the first group (low income/low expenditure), the weights of the aggregates affected by the highest consumer price increases in 2022 decrease in relative terms (by 1.19 percentage points for unprocessed food, by almost 3 p.p. for energy)

✓ For the fifth group (high income/high expenditure), conversely, the weights of the aggregates affected by the highest consumer price increases in 2022 increase in relative terms (by 0.88 percentage points for unprocessed food, by almost 1.22 p.p. for energy)

From expenditure to income the variable to identify the groups of households: main outcomes


The impact of inflation on the income-based groups of households

Figure 5. Inflation impact on the first and the fifth (by expenditure and income) groups of households. All-item index. M/M-12 rate of change, January 2018 – December 2022

[Figure: four monthly series from February 2018 to December 2022 (income weights first group; income weights fifth group; expenditure weights first group; expenditure weights fifth group). Data labels as extracted: 14.5, 10.7, 17.9, 9.9.]

The impact of inflation on the poorest group of households in Italy

• The differences in the weights between the extreme groups, whether defined in terms of equivalent expenditure or of equivalent income, do not produce a gap in the impact of inflation between the first and the fifth group in the years when inflation is relatively low (2018 – 2020)

• As soon as price increases become heterogeneous across product aggregates, with energy prices growing at a much higher rate than other aggregates, followed by food products, the inflation gap between the first and the fifth group (in terms of both expenditure and income) starts widening (end 2020)

• Given the differences in the structure of weights of the two extreme groups, the gap between the two groups identified by expenditure data gradually becomes wider than that between the two groups identified by income data (in December 2022 it is equal to 8 p.p. in the first case and to 3.8 p.p. in the second case)

Characteristics of the households in the extreme groups and comparison between their

distributions in the five groups by expenditure and by income

• In 2021, among households in the first expenditure fifth (but not in the first income fifth):

✓ about 72% of the households have from 2 to 4 members

✓ 28.8% of the households have 2 members; these are mainly couples without children in which the reference person is elderly (14.6%) and retired from work (95.0%)

✓ households of 3 or 4 members are mainly couples with two children

• In 2021, among households in the first income fifth (but not in the first expenditure fifth):

✓ there are mainly one-member households (37.1%), followed by households of 2 members (22.9%)

✓ a fifth of the households (54.7% of all single-member households) are elderly people living alone, retired from work (49.8%) or inactive in another condition (other than retired) (46.3%)

✓ compared with households in the first expenditure fifth but not in the first income fifth, the household types over-represented in the first income fifth are mainly single people (1.8 times) and single-parent families (1.3 times)

Characteristics of the households in the extreme groups and comparison between their

distributions in the five groups by expenditure and by income

Table 1. Cross distribution of households by expenditure fifths (rows) and income fifths (columns). Absolute values and percentages. 2021

For each expenditure fifth the table reports four lines: number of households, percentage of all households, row percentage and column percentage.

Exp fifth  Measure        Income 1    Income 2    Income 3    Income 4    Income 5        Total
1          Households    2,692,049   1,375,267     699,709     297,137     137,507    5,201,670
           % of total        10.35        5.29        2.69        1.14        0.53        20.00
           Row %             51.75       26.44       13.45        5.71        2.64
           Column %          51.75       26.44       13.45        5.71        2.64
2          Households    1,219,493   1,518,011   1,261,158     785,691     415,794    5,200,147
           % of total         4.69        5.84        4.85        3.02        1.60        19.99
           Row %             23.45       29.19       24.25       15.11        8.00
           Column %          23.44       29.19       24.24       15.10        7.99
3          Households      710,222   1,149,206   1,363,005   1,182,498     798,416    5,203,348
           % of total         2.73        4.42        5.24        4.55        3.07        20.01
           Row %             13.65       22.09       26.19       22.73       15.34
           Column %          13.65       22.10       26.20       22.73       15.35
4          Households      417,855     796,851   1,127,863   1,523,926   1,334,976    5,201,472
           % of total         1.61        3.06        4.34        5.86        5.13        20.00
           Row %              8.03       15.32       21.68       29.30       25.67
           Column %           8.03       15.32       21.68       29.30       25.67
5          Households      162,527     361,463     750,225   1,412,514   2,514,584    5,201,313
           % of total         0.62        1.39        2.88        5.43        9.67        20.00
           Row %              3.12        6.95       14.42       27.16       48.35
           Column %           3.12        6.95       14.42       27.15       48.35
Total      Households    5,202,147   5,200,798   5,201,960   5,201,766   5,201,277   26,010,000
           % of total        20.00       20.00       20.00       20.00       20.00       100.00

Characteristics of the households in the extreme groups and comparison between their

distributions in the five groups by expenditure and by income

16 UNECE CPI Expert Group meeting 2023, June 7-9

• In 2021:

✓ 21.8% of the first fifth of households by expenditure falls in the last three fifths of households by income

✓ 24.8% of the first fifth of households by income falls in the last three fifths of households by expenditure

✓ 24.5% of the top fifth of households by expenditure falls in the first three fifths of households by income

✓ 26.0% of the top fifth of households by income falls in the first three fifths of households by expenditure
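The four shares above can be recomputed from the household counts in Table 1. The sketch below is only an illustration: the variable and function names are ours, and the row and column totals are derived by summing the cells, so they may differ by a household or two from the printed totals because of rounding in the published table.

```python
# Household counts from Table 1 (rows: expenditure fifths 1-5, columns: income fifths 1-5).
counts = [
    [2692049, 1375267, 699709, 297137, 137507],
    [1219493, 1518011, 1261158, 785691, 415794],
    [710222, 1149206, 1363005, 1182498, 798416],
    [417855, 796851, 1127863, 1523926, 1334976],
    [162527, 361463, 750225, 1412514, 2514584],
]

row_totals = [sum(row) for row in counts]        # households per expenditure fifth
col_totals = [sum(col) for col in zip(*counts)]  # households per income fifth


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# First expenditure fifth falling in income fifths 3-5.
exp1_in_inc345 = pct(sum(counts[0][2:]), row_totals[0])
# First income fifth falling in expenditure fifths 3-5.
inc1_in_exp345 = pct(sum(row[0] for row in counts[2:]), col_totals[0])
# Fifth expenditure fifth falling in income fifths 1-3.
exp5_in_inc123 = pct(sum(counts[4][:3]), row_totals[4])
# Fifth income fifth falling in expenditure fifths 1-3.
inc5_in_exp123 = pct(sum(row[4] for row in counts[:3]), col_totals[4])

print(exp1_in_inc345, inc1_in_exp345, exp5_in_inc123, inc5_in_exp123)
```

The first and third figures are row percentages (shares of an expenditure fifth), the second and fourth are column percentages (shares of an income fifth).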


• The heterogeneous allocation of households between the two groupings is at the basis of the differences in the structure of weights

• In the first fifth of households by income there are households belonging to groups of

households by expenditure from the second (23.44%) to the fifth (3.12%)

• In the first fifth of households by expenditure there are households belonging to groups of households by income from the second (26.44%) to the fifth (2.64%)

• This means that the breakdown of expenditure in the first group of households by income differs from that of the first group of households by expenditure: it brings in consumption patterns typical of households that spend larger amounts, reducing the relative weight of food and energy products. The opposite holds for the fifth group of households by income

• In 2022 this brings the inflation lines of the two extreme groups by income closer together (specifically lowering that of the poorest) than those of the two extreme groups by expenditure
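A minimal sketch of the mechanism described above, assuming group inflation is approximated by a weighted average of component price changes. All figures are hypothetical and chosen only to show the direction of the effect; they are not Istat estimates, and the function name is ours.

```python
# Hypothetical year-on-year price changes (%) by broad component.
price_change = {"food": 9.0, "energy": 50.0, "other": 3.0}

# Hypothetical expenditure weights (shares summing to 1) for two definitions
# of the "poorest" group. The income-based group has lower food and energy weights.
weights = {
    "first fifth by expenditure": {"food": 0.30, "energy": 0.15, "other": 0.55},
    "first fifth by income": {"food": 0.26, "energy": 0.12, "other": 0.62},
}


def group_inflation(group_weights, changes):
    """Weighted average of component price changes for one household group."""
    return sum(group_weights[c] * changes[c] for c in changes)


inflation_exp = group_inflation(weights["first fifth by expenditure"], price_change)
inflation_inc = group_inflation(weights["first fifth by income"], price_change)

for name, value in (("expenditure", inflation_exp), ("income", inflation_inc)):
    print(f"first fifth by {name}: {value:.2f}%")
```

With lower weights on food and energy, the income-based group shows lower measured inflation even though both groups face the same component price changes.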

Focusing only on the weights to measure the actual impact of inflation on the poorest?


• Until now, in Italy, the analysis estimating the differentiated impact of inflation on groups of households broken down by economic condition has focused on the structure of weights

• In 2022, government support to poor households (identified on the basis of their income or other indicators of their economic condition) for energy products (in particular, electricity and gas) was extensive and took the form of price reductions

• This was taken into account in compiling the consumer price indices for energy products, which are weighted means of different inflation profiles (with weights given by the number of households that benefited from the government support)

• The aggregate consumer price indices for energy products are used as such to estimate the impact of overall inflation on the poorest and on the richest groups of households

• Should we start considering different inflation profiles in addition to different weight structures?

• Moreover, how should we consider the impact of government support to households on the weights of different groups, given the traditional time lag in the estimation of weights?
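The weighted mean mentioned above can be illustrated with a small sketch. The index levels and household counts below are hypothetical, and the function name is ours; the point is only that a single blended index hides the two underlying price profiles.

```python
def blended_index(index_supported, index_unsupported, n_supported, n_unsupported):
    """Mean of two price indices weighted by the number of households on each profile."""
    total = n_supported + n_unsupported
    return (n_supported * index_supported + n_unsupported * index_unsupported) / total


# Hypothetical energy price indices (base = 100) for households with and
# without the government price reduction, and hypothetical household counts (millions).
supported_index = 120.0
unsupported_index = 160.0
aggregate = blended_index(supported_index, unsupported_index, 4.0, 16.0)

print(f"aggregate energy index: {aggregate:.1f}")
print(f"gap between aggregate and supported profile: {aggregate - supported_index:.1f}")
```

If the aggregate index is applied as such to the poorest group, the inflation attributed to supported households is overstated by the gap printed in the last line.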

Some concluding remarks and perspectives


• These results weaken the capacity of the extreme groups of households by expenditure to act as a proxy for the poorest and the wealthiest households

• The estimates of the impact of inflation on different groups of households are interesting and encourage further analysis, specifically:

✓ Of the socio-demographic and socio-economic characteristics of the household groups by income

✓ Of the relationship between income and expenditure in the different groups

✓ Considering different inflation profiles for the two extreme groups, to complement the approach based exclusively on weights

✓ Further refining the work on the weights, given the effects of government support to the poorest households for energy products on the structure of expenditure

• Starting the dissemination of an experimental statistic to open the debate (in 2024?)

• The new framework regulation on social statistics, which will harmonize HBS in the EU under a common legal umbrella from 2026, will enhance the possibility of using HBS data to compare the impact of inflation on the different groups of households across European countries

Thank you

Ilaria Arigoni, Istat (Italy)

Alessandro Brunetti, Istat (Italy)

Valeria de Martino, Istat (Italy)

Federico Polidoro, Istat (Italy)

Defining products with transaction data: aggregation methods and assessment of their impact, Italy


DEFINING PRODUCTS WITH TRANSACTION DATA: AGGREGATION METHODS

AND ASSESSMENT OF THEIR IMPACT

Geneva, 2023, June 7-9

UNECE CPI Expert Group meeting

Alessandro Brunetti, Istat (Italy)

Stefania Fatello, Istat (Italy)

Tiziana Laureti, University of Tuscia (Italy)

Federico Polidoro, World Bank

Outline

o Scanner data to estimate Italian inflation: the state of play

o Product definition and the relaunches problem

o The MARS approach

o The case study for the experimental application of MARS

o Results: Product match, product homogeneity and scores

o Results: GEKS Tornqvist on different stratifications

o Concluding remarks

o References

Scanner data to estimate Italian inflation: the state of play

o Since 2018, Istat has been using scanner data for grocery products (excluding fresh food) to compile CPIs.

o In 2023, scanner data cover 4,238 outlets (483 hypermarkets, 1,577 supermarkets, 588 discount stores, 1,066 outlets with a surface between 100 and 400 sq. m and 569 specialist drugstores). These outlets belong to the main 21 RTCs and cover the entire national territory (agreement with RTCs and cooperation with Nielsen).

o Since 2020, a dynamic approach has been adopted: each month a sample of GTINs is selected within each outlet and ECR4 market (representative of elementary aggregates).

o Within the dynamic approach, a procedure that manages relaunches has been implemented and is used in the current production process, but only a few relaunches are detected.

o Over the last three years, we have carried out empirical research on the use of multilateral methods; the results were presented at the meetings of the dedicated Eurostat task force, at the UNECE expert group meetings and, most recently, at last year's Ottawa Group meeting.

o The idea is to introduce multilateral (ML) methods into the production of CPIs in the near future.

Product definition and the relaunches problem


o Product specification has been recognized as a critical step that can strongly influence the performance of different methods for transaction data (Lamboray, 2022). While scanner data help reduce lower-level substitution bias, other biases can appear because products are specified too tightly or too broadly.

o Tightly specified products may cause a bias because new and disappearing products in the two comparison periods are not considered in a matched price index (De Haan and Krsinich, 2014).

o Broadly specified products may cause a bias because the underlying transactions that make up an individual product may not be of the same quality, i.e. unit value bias (Dalén, 2017).

o This research tests experimentally the use of MARS (Chessa 2016, 2021) on Italian scanner data, to find a compromise between homogeneity and stability of groups of products over time, and to see whether relaunches can be managed better in this way.

o The GEKS-Törnqvist multilateral matched method is also applied to compile indices.
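For readers less familiar with the method, the sketch below shows a textbook GEKS-Törnqvist index over a fixed window. It is a generic illustration, not Istat's production code: the function names and the toy data are ours, every pair of periods is assumed to share at least one matched product, and no window splicing or extension is applied.

```python
import math


def tornqvist(pa, qa, pb, qb):
    """Bilateral Törnqvist index from period a to period b over matched products."""
    matched = [i for i in pa if i in pb]
    va = {i: pa[i] * qa[i] for i in matched}  # expenditure in period a
    vb = {i: pb[i] * qb[i] for i in matched}  # expenditure in period b
    total_a, total_b = sum(va.values()), sum(vb.values())
    log_index = sum(
        0.5 * (va[i] / total_a + vb[i] / total_b) * math.log(pb[i] / pa[i])
        for i in matched
    )
    return math.exp(log_index)


def geks_tornqvist(prices, quantities):
    """GEKS index series (period 0 = 1) from per-period dicts product -> price/quantity."""
    n = len(prices)
    series = []
    for t in range(n):
        log_sum = 0.0
        for link in range(n):
            # Chain period 0 to period t through every link period in the window.
            p0_link = tornqvist(prices[0], quantities[0], prices[link], quantities[link])
            plink_t = tornqvist(prices[link], quantities[link], prices[t], quantities[t])
            log_sum += math.log(p0_link * plink_t)
        series.append(math.exp(log_sum / n))  # geometric mean over all links
    return series


# Toy data: every price doubles, then triples, relative to period 0.
prices = [{"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0}, {"a": 3.0, "b": 6.0}]
quantities = [{"a": 10.0, "b": 5.0}, {"a": 8.0, "b": 6.0}, {"a": 7.0, "b": 7.0}]
series = geks_tornqvist(prices, quantities)
print([round(x, 4) for x in series])
```

Because each bilateral index uses only products sold in both periods, the way products are defined determines which price changes enter the calculation.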


The MARS approach

o Several ways of partitioning a set of GTINs exist, each of which may have a different impact on product match and homogeneity, and consequently on price changes.

o MARS (Chessa, 2021) combines a measure of product match:

$$\mu_t^K = \frac{\sum_{k \in K_{0,t}} q_t^k}{\sum_{i \in G_t} q_{i,t}}$$

and a measure of product homogeneity:

$$R_t^K = \frac{\sum_{k \in K} q_t^k \left(\bar{p}_t^k - \bar{p}_t\right)^2}{\sum_{i \in G_t} q_{i,t} \left(p_{i,t} - \bar{p}_t\right)^2}$$

where $K_{0,t}$ is the set of products sold both in a fixed base month 0 (December of the previous year) and in a second month $t$, while $G_t$ is the set of items sold in month $t$. Here $q_t^k$ and $\bar{p}_t^k$ are the quantity sold and the unit value of product $k$, and $\bar{p}_t$ is the quantity-weighted mean price over all items in $G_t$.

R squared and the degree of product match are then combined as follows to evaluate and rank item partitions:

$$M_t^K = \mu_t^K R_t^K$$
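The score above can be computed for any candidate partition with a few lines of code. The sketch below is our own illustration under simple assumptions: items are passed as (group, price, quantity) tuples for month t, the set of groups also sold in the base month is supplied by the caller, and R squared is set to 1 when all item prices are identical (a convention chosen here, not taken from the slides).

```python
from collections import defaultdict


def mars_score(items, matched_groups):
    """Return (product match, R squared, MARS score) for one month.

    items: list of (group, price, quantity) tuples for month t.
    matched_groups: groups that were also sold in the base month (K_{0,t}).
    """
    total_q = sum(q for _, _, q in items)
    mean_p = sum(p * q for _, p, q in items) / total_q  # quantity-weighted mean price

    group_q = defaultdict(float)
    group_value = defaultdict(float)
    for g, p, q in items:
        group_q[g] += q
        group_value[g] += p * q

    # Degree of product match: share of quantities in groups sold in both months.
    mu = sum(q for g, q in group_q.items() if g in matched_groups) / total_q

    # R squared: between-group price variation over total item price variation.
    between = sum(q * (group_value[g] / q - mean_p) ** 2 for g, q in group_q.items())
    total = sum(q * (p - mean_p) ** 2 for _, p, q in items)
    r2 = between / total if total > 0 else 1.0  # assumption: identical prices -> 1

    return mu, r2, mu * r2


# Toy example: two items grouped into product "A", plus products "B" and "C".
items = [("A", 1.0, 10.0), ("A", 1.2, 10.0), ("B", 2.0, 10.0), ("C", 3.0, 10.0)]
matched = {"A", "B"}  # "C" was not sold in the base month
mu, r2, score = mars_score(items, matched)
print(round(mu, 3), round(r2, 3), round(score, 3))
```

Broader groups tend to raise the match term and lower R squared, which is the trade-off the score is meant to balance.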

The case study for the experimental application of MARS


o Three products have been selected for the present exercise:

• Rice; Chocolate; Products for the hygiene of the body.

o Data come from outlets in the province of Rome* and refer to the period December 2020 to April 2023.

o Transaction data are first aggregated by outlet type (across chains and locations), so the individual outlet dimension is outside the scope of the present analysis.

o The other dimensions taken into account to define homogeneous groups of GTINs are:

• Brand; ECR markets; packaging volume (grams, centilitres).

* 127 outlets are included in the sample for year 2023: 57 supermarkets; 26 outlets with a surface between 100 and 400 sq. m; 19 discount stores; 9 hypermarkets; 16 specialist drugstores.

o First, we tried to assess the impact of the different dimensions by testing whether they significantly affect price levels. To this aim, regression models were used:

$$\ln p_i^t = \alpha + \sum_k \beta_k x_k + \sum_h \gamma_h y_h + \sum_j \delta_j z_j + \sum_s \eta_s v_s + \sum_t \theta_t w_t + u_i^t$$

where $p_i^t$ is the price of reference $i$ in period $t$, and $x$, $y$, $z$, $v$, $w$ are dummies for brand, outlet type, market, packaging volume and time, respectively.

o As a second step, we consider different stratifications of GTINs, corresponding to different groups of products:

S0) the most narrowly defined groups of products. In this case, each group consists of a single GTIN per outlet type (the single item is identified by the combination of a GTIN and an outlet type).
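The log-price regression on dummy variables described above can be sketched in plain Python. This is a simplified illustration, not the model actually estimated: only brand and time dummies are included (outlet type, market and packaging volume would be added in the same way), the data are synthetic and noise-free, the helper names are ours, and the design matrix is assumed to have full column rank.

```python
import math


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(obs, brands, periods):
    """Intercept plus one dummy per non-reference brand and non-reference period."""
    return (
        [1.0]
        + [1.0 if obs["brand"] == b else 0.0 for b in brands[1:]]
        + [1.0 if obs["period"] == t else 0.0 for t in periods[1:]]
    )


# Synthetic prices: brand B is exp(0.5) times dearer, period 1 is exp(0.1) times dearer.
brands, periods = ["A", "B"], [0, 1]
observations = [
    {"brand": b, "period": t, "price": 2.0 * math.exp(0.5 * (b == "B") + 0.1 * t)}
    for b in brands
    for t in periods
]
X = [design_row(o, brands, periods) for o in observations]
y = [math.log(o["price"]) for o in observations]
alpha, beta_b, theta_1 = ols(X, y)
print(round(alpha, 4), round(beta_b, 4), round(theta_1, 4))
```

A dimension whose dummy coefficients are jointly significant affects price levels, which argues for keeping it when defining homogeneous groups.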


Data description


S1) In the second case, strata (homogeneous groups of items) are defined by market, brand and size: for each stratum, the price (quantity) is the unit value (total quantity) calculated over all the GTINs of the same market, brand and packaging volume.

S2) the broadly defined groups of products, which correspond to a stratification of GTINs according to market and packaging volume.
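The three stratifications differ only in the key used to group transactions before computing unit values. A minimal sketch, assuming each transaction record carries the fields named below (the field names and sample records are hypothetical, and the keys follow the wording of the definitions above):

```python
from collections import defaultdict

# Grouping keys for the three stratifications.
S0 = ("gtin", "outlet_type")
S1 = ("market", "brand", "volume")
S2 = ("market", "volume")


def unit_values(transactions, key_fields):
    """Return {stratum key: (unit value, total quantity)} for one period."""
    turnover = defaultdict(float)
    quantity = defaultdict(float)
    for tr in transactions:
        key = tuple(tr[f] for f in key_fields)
        turnover[key] += tr["turnover"]
        quantity[key] += tr["quantity"]
    # Unit value = total turnover divided by total quantity in the stratum.
    return {k: (turnover[k] / quantity[k], quantity[k]) for k in turnover}


# Hypothetical records for one month.
transactions = [
    {"gtin": "111", "outlet_type": "supermarket", "market": "white rice",
     "brand": "X", "volume": 1000, "turnover": 20.0, "quantity": 10.0},
    {"gtin": "222", "outlet_type": "supermarket", "market": "white rice",
     "brand": "X", "volume": 1000, "turnover": 30.0, "quantity": 10.0},
    {"gtin": "333", "outlet_type": "hypermarket", "market": "white rice",
     "brand": "Y", "volume": 1000, "turnover": 10.0, "quantity": 10.0},
]

s0 = unit_values(transactions, S0)
s1 = unit_values(transactions, S1)
s2 = unit_values(transactions, S2)
print(len(s0), len(s1), len(s2))
```

Moving from S0 to S2 merges items with different prices into one unit value, which is where unit value bias can enter.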

The table shows the number of markets, brands, different packaging volumes and GTINs for the three products explored in our exercise. It also shows the number of strata that correspond to the alternative definitions of product groups.

                          Rice   Chocolate   Hygiene products
ECR markets                 12          15                  8
Brands                      92         340                387
Packaging volumes           16         192                 65
GTINs*                     397       1,347              2,253
GTINs per outlet type*     798       2,664              4,184
Strata S1*                 604       1,588              1,688
Strata S2*                 143         715                345

* average (Dec. 2020 - Apr. 2023)

Results: Product match, product homogeneity and scores

• For rice, in 2021 the degrees of product match for options S0 and S1 are relatively close to each other. In 2022, however, the decline of the S0 line is significantly larger than that of S1.

• The broadest definition of product groups implies a sharp increase in the degree of heterogeneity.

• As a result, S0 seems to be the better option in 2021, but this is less evident for 2022.

• Over the two years, the score of S2 remains far below those of the other two options.

• For hygiene products, the degree of product match of S0 and S1 drops sharply (especially in 2022). By contrast, the product match of S2 is almost equal to 100% in both years.

• In this case too, the broadest definition of product groups implies a sharp increase in the degree of heterogeneity.

• In this case, the S1 option dominates the alternative stratifications.

• As for rice, the score of S2 remains far below those of the other two options.

• For chocolate, moving from S0 to S1 the increase in heterogeneity is very limited. Even option S2 exhibits quite a high degree of product homogeneity.

• In this case, the use of clustering seems to have clear advantages.

• In the case of chocolate, the degree of homogeneity remains above 90%; for that reason, the S2 MARS score is relatively close to those of S0 and S1.

Results: GEKS Tornqvist on different stratifications

• In the case of rice, the higher degree of heterogeneity seems to produce an upward bias in the annual rates of change of the indices.

• The same conclusion seems to hold for hygiene products for the body. The differences are even larger than in the previous case.

• Even if, in this case, S2 introduces an upward bias in the annual rates of change of the index, the differences are less pronounced.

Concluding remarks

o The results confirm that MARS is a useful tool for finding a compromise between the measures of product match and product homogeneity.

o Nevertheless, the evidence from this preliminary experimental use of MARS suggests that the bias related to heterogeneity plays a major role in affecting price changes.

o Next steps:

✓ Carry out a sensitivity analysis by varying the weights of the two measures that contribute to the score;

✓ Analyze the role of packaging volumes in the definition of homogeneous product groups. Classes of product size should be considered, and not only the product sizes as they are.

References

o Chessa, A. G. (2016). A new methodology for processing scanner data in the Dutch CPI. EURONA 1/2016.

o Chessa, A. G. (2021). A Product Match Adjusted R Squared Method for Defining Products with Transaction Data. Journal of Official Statistics, 37(2), 411-432.

o Dalén, J. (2017). Unit Values and Aggregation in Scanner Data: Towards a Best Practice. Fifteenth Meeting of the International Working Group on Price Indices, Eltville am Rhein, May.

o De Haan, J., & Krsinich, F. (2014). Transaction data and the treatment of quality change in nonrevisable price indices. Journal of Business & Economic Statistics, 32(3), 341-358.

o Lamboray, C. (2022). What impact does product specification have on a Fisher price index? Paper prepared for the 17th Meeting of the Ottawa Group on Price Indices, 7-10 June 2022, Rome, Italy.
