
Identifying and mitigating misclassification: A case study of the Machine Learning lifecycle in price indices with web-scraped clothing data, Canada

While the application of supervised Machine Learning (ML) to automate the classification of alternative data for official price indices has been widely demonstrated, the impact of misclassification across the ML lifecycle, from initial annotation of the training data to retraining models in response to data drift, remains understudied in the literature. To help National Statistical Offices apply ML at production scale, our research provides an empirical case study of how misclassification can arise at the major stages of an ML lifecycle, how it affects elementary price indices, and how it can be mitigated through model retraining or validation processes.