Obtaining prices and product characteristics from the web allows statistical institutes to automate data collection and to improve temporal coverage, as well as representativeness if more products are taken into account in the Consumer Price Index (Eurostat 2020). STATEC has been collecting prices and product characteristics in bulk, via an API, from a major national e-commerce website for household appliances and consumer electronics for over one year, and expects to collect data from another similar website in the near future. STATEC currently uses only some of the collected prices, to duplicate a manual price collection. The aim of this paper is to present our analysis of index compilation methods that can exploit the full database, which contains over 9000 price quotes per month for various household appliances and consumer electronics. We explain how the assortment and price change dynamics lead us to conclude that the multilateral GEKS-Jevons method is appropriate for some product categories, including small household appliances, but not for all. We propose to improve the representativeness of the sample by taking into account only products currently in stock and excluding those that are out of stock even though customers can still order them online. We expect to introduce the multilateral GEKS-Jevons method for these products in 2024. We also show that for some product categories, especially large consumer electronics, a systematic downward trend in prices combined with high churn rates requires explicit quality adjustment methods. We test several price imputation methods, such as hedonic linear regression based and tree based price imputation methods. We conclude that the random forest based price imputation method provides the most accurate results.
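For readers unfamiliar with the GEKS-Jevons method named above, the following is a minimal sketch of its textbook form, not STATEC's implementation. It assumes the standard definition: the bilateral Jevons index is the unweighted geometric mean of price ratios over products present in both periods, and the GEKS index between two periods is the geometric mean, over every link period in the window, of the product of the two bilateral indices through that link. The function names, period labels, product identifiers and prices are all hypothetical.

```python
import math


def jevons(prices_a, prices_b):
    """Bilateral Jevons index from period a to period b.

    Unweighted geometric mean of price ratios over the products
    observed in both periods (the matched sample).
    """
    matched = prices_a.keys() & prices_b.keys()
    if not matched:
        raise ValueError("no matched products between the two periods")
    log_sum = sum(math.log(prices_b[k] / prices_a[k]) for k in matched)
    return math.exp(log_sum / len(matched))


def geks_jevons(prices, base, current):
    """GEKS-Jevons index from `base` to `current`.

    Geometric mean, over every link period l in the window, of
    jevons(base, l) * jevons(l, current).
    """
    periods = list(prices)
    log_sum = sum(
        math.log(jevons(prices[base], prices[l]) * jevons(prices[l], prices[current]))
        for l in periods
    )
    return math.exp(log_sum / len(periods))


# Hypothetical toy window: three months, with products entering and leaving.
prices = {
    "2023-01": {"kettle_a": 40.0, "toaster_b": 30.0},
    "2023-02": {"kettle_a": 38.0, "toaster_b": 30.0, "blender_c": 60.0},
    "2023-03": {"kettle_a": 38.0, "blender_c": 54.0},
}

index_jan_to_mar = geks_jevons(prices, "2023-01", "2023-03")
print(f"GEKS-Jevons 2023-01 -> 2023-03: {index_jan_to_mar:.4f}")
```

Because each bilateral Jevons index satisfies the time reversal test, the resulting GEKS index is transitive: chaining the index from January to February with the index from February to March gives the same result as the direct January to March index. This is the property that makes multilateral methods attractive for assortments with high product churn. The sketch ignores practical issues such as window length and splicing when the window is extended.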