Developing reproducible analytical pipelines for the transformation of consumer price statistics: rail fares, UK

At the Office for National Statistics, we are transforming our consumer price statistics by introducing alternative data sources such as scanner data. This paper discusses the process of developing a new production system that can integrate these data into our consumer price index production. First, we discuss the infrastructure choices made for the project, including the choice of platform and language, and how the infrastructure has been designed to aid research while ensuring an efficient and user-friendly production system. We then discuss how best practice guidance for reproducible analytical pipelines (for example, on code structure, testing, version control and code deployment) has been implemented in the project. Finally, we focus on more specific areas within the end-to-end system. These include the steps taken for data engineering, including the standardisation of data. We also cover the choices we have made to implement these within our existing production round, including the steps needed on an annual basis to reset the basket and allow for the introduction of new consumption segments.