HLG-MOS projects are an important mechanism for accelerating international collaboration on emerging technologies and ideas. Project ideas are generated through horizon scanning throughout the year and discussed at the annual HLG-MOS modernization workshops. Projects normally run for a fixed time frame of 1-2 years.
Participation is open to staff from statistical organisations and others interested in official statistics. Please contact the UNECE secretariat if you wish to participate in a project.
- Generative AI (2024-25)
CONTEXT
The capabilities of artificial intelligence (AI) have made a significant leap forward in the last few years with the advance of large language models (LLM) that can process natural language and generate texts, and there is a growing recognition of the transformative potential of LLMs in the statistical community.
Responding to the increasing interest, two HLG-MOS modernization groups (the Blue Skies Thinking Network and the Applying Data Science and Modern Methods Group) started an initiative to draft a white paper on LLMs in the context of official statistics, which was completed in a relatively short period of four months. The paper (https://unece.org/sites/default/files/2023-11/HLG2023%20LLM%20Paper.pdf) explored the opportunities and implications of LLMs for official statistics and the associated risks, and provided recommendations and strategic considerations.
Building on the LLM white paper, the project aims to further investigate the potential of generative AI, a broader category of advanced AI systems that encompasses LLMs and other capabilities (e.g., image generation); to examine the strategic considerations that arise when statistical organizations want to use generative AI effectively and responsibly (e.g., governance, open models); and to identify opportunities to co-develop concrete solutions.
PROJECT OBJECTIVES
The project will start with initial scoping, after which the following three main activities are planned:
- Sharing use cases, experiences and lessons learned. The scope of use cases is not limited to the production area but goes beyond to include other corporate areas such as HR and finance. This activity will help statistical organizations in prioritizing areas most promising and feasible for official statistics;
- Co-development of solution(s) on areas that are of common interest for many statistical organizations (e.g., prompt-engineering, co-piloting, chatbots, enhanced web searches); and
- Compiling practices and concrete recommendations based on the first and second activities as well as the LLM white paper. It is essential to focus on a few key themes that are particularly relevant and important to statistical organizations (e.g., confidentiality, security, quality assurance). The activity could also include the development of common protocols containing requirements from the official statistics perspective, which can then be used for engagement with technology companies.
OUTPUT:
- Statistical Open-Source Software (2024)
CONTEXT
Given the increasing need to become more open, transparent and efficient, many statistical organizations are undergoing a transition from traditional proprietary software to open-source software. This transition, however, has challenges concerning support, maintenance, training, sharing conditions and legal aspects. The topic of open-source was discussed at the 71st CES Plenary Session and the CES Bureau has asked HLG-MOS to work on the topic.
The purpose of the Statistical Open-Source Software (SOSS) project is to develop a better common understanding of the pros and cons, as well as the dos and don’ts, of moving towards a more comprehensive use of open-source software for official statistics production, with the aim of making it a cornerstone of that production.
PROJECT OBJECTIVES
After a preliminary activity on scoping, the project aims to work on:
- Generic aspects of the systematic use of open source-based approaches for official statistics, covering issues such as the organization of maintenance, support and training; standards and principles; legal aspects and liabilities/responsibilities; licensing models and fair distribution of costs; community building, communication and external engagement (e.g., the scientific community and private sector); and the incubation process (from ideation to production). The user perspective (users of existing open-source solutions) and producer perspective (producers of new open-source solutions) may require different emphasis with regard to these aspects; and
- Analysis of concrete open source-related use cases in the data collection, analysis and processing, and dissemination domains. The use cases to be covered can be determined separately from the above work package or jointly. Findings from the analysis can be used to support the top-down work on open-source technology defined above by suggesting concrete open-source technology and approaches.
- Cloud for Official Statistics (2023)
CONTEXT
Many NSOs are adopting Cloud approaches to varying degrees. Cloud adoption can directly contribute to modernising statistical production and complements themes explored previously or currently under the HLG, such as Big Data, privacy-preserving techniques and governance; it may also have synergies with current work on Data Science and Machine Learning. NSOs will benefit from informed approaches.
PROJECT OBJECTIVES
The objective is to develop a set of guidelines and recommendations, across multiple themes, to assist each statistical organisation on their cloud adoption journey. The initial five themes that were identified are:
- A common set of considerations relating to the procurement of Cloud services, covering areas such as intellectual property, migration to another provider, vendor lock-in, exit strategy, and terms and conditions.
- Understanding the behavioural nudges needed to adopt Cloud. This theme will review indigenous/minority people’s perspectives on Cloud, public perception, data sovereignty, the challenges of convincing an organisation’s executive board to approve the use of Cloud services, and the impact of Cloud use on the official statistics brand.
- The types of Cloud service models and services that exist, and which are suitable for which organisations in which context. Topics for consideration include Infrastructure as a Service, Platform as a Service, Software as a Service, Hybrid Cloud, Public Cloud and Private Cloud.
- The security and privacy considerations relating to the use of Cloud that may enhance or inhibit its adoption across statistical organisations.
- The skillsets needed for the utilisation of Cloud. Topics for review will include the staff retraining efforts needed, the challenge for public sector organisations in a competitive marketplace for Cloud skills, and how knowledge could be shared between organisations.
OUTPUT: Cloud for official statistics
- Data Governance for Interoperability Framework (2022-23)
CONTEXT
With statistical organisations increasingly engaging with new data sources and accelerating efforts in sharing and re-using data, the governance and management of data have become crucial. Interoperability across different data assets and metadata can greatly facilitate data exchange and help statistical organisations address new data needs (e.g., through data integration). Unfortunately, a large part of the information in many statistical organisations is managed and governed in silos, making it semantically and syntactically non-interoperable. Not enough work has been done on the governance side (e.g., policy, process, capability), which is indispensable for institutionalising interoperability across the entire organisation.
PROJECT OBJECTIVES
The main goal of the Statistical Data Governance Framework project is to produce a document describing a reference framework containing the main elements needed to implement a governance program focused on achieving data interoperability. This framework will provide the ability to create, exchange, and use data while preserving its meaning and context independently from a given system or a set of systems. The purpose of this project is to develop a framework describing a set of data governance elements, recommendations, and guidelines to achieve statistical information interoperability.
The main output of the project will be a document containing as its main sections:
- Glossary of core terms that could facilitate the communication and collaboration in the fields of data governance and interoperability
- A framework describing a set of data governance elements required to achieve statistical interoperability, including (but not limited to): organisational elements; data and metadata management; business and legal considerations; data quality; data analysis and dissemination needs; documentation and transparency; capabilities, culture and skills; and information technology aspects
- Recommendations and guidelines on how to start achieving interoperability in statistical organisations and national statistical systems; in particular how we can apply the existing models and standards (e.g., GSBPM, GSIM, CSPA, SDMX, DDI, CSDA, COOS, FAIR) to achieve interoperability
OUTPUT: Data Governance Framework for Statistical Interoperability
- ModernStats Carpentries (2023)
CONTEXT
The ModernStats Carpentries project draws from the lessons learned in the context of the 2022 Meta Academy project (see the short Annex explaining the main takeaways from the Meta Academy project, including a definition of the gaps and potential capabilities that would make for a Meta Academy, and how the Carpentries initiative is well positioned to support them).
PROJECT OBJECTIVES
The purpose of the project is to pilot a partnership with the Carpentries organization to create the ModernStats Carpentry. The Carpentries are a non-profit organization, registered in the US, funded by membership and workshop fees, and grants from donors. Their vision is to be the “leading inclusive community teaching data and coding skills.” In order to engage with the Carpentries, the HLG-MOS and/or member organizations will need to pay a membership fee; however, in the context of a ModernStats Carpentry, participating organisations could organise as many trainings as they wish (within a national context or in the context of a cross-national initiative) at no fee. In addition, all Carpentries contents and training materials (data samples, codes, documents, etc.) are open and free under a CC license, stored on the open GitHub platform, and can be reused at no cost.
The Carpentries business model addresses several of the needs identified in the Meta Academy project in the following ways:
- A common understanding of the training needs, and a shared methodology or pedagogic approach to create learning content.
- A forum or community for ‘academy managers’ or ‘trainers’.
- A forum and method to ensure training content and delivery evolve with the industry.
WP1 will focus on repurposing existing Carpentries content for select key personas within statistical agencies as well as exploring how to put traditional official statistics courses into the Carpentries framework. WP2 will explore membership, collaboration and organizational models between the HLG-MOS and the Carpentries.
OUTPUT
- Meta-Academy for the Modernization of Official Statistics (2022)
CONTEXT
Moving from innovation to implementation remains a major challenge. The purpose of the Meta-Academy for the Modernisation of Official Statistics is to remove barriers to the co-creation of training and the reuse of content at an international level, which will ultimately unleash the creation and use, at scale, of open digital assets to boost the upskilling that National Statistical Offices (NSOs) need for modernization.
PROJECT OBJECTIVES
This project intends to raise the standards of virtual learning on topics that are necessary for the modernization of statistics but are missing from, or inconsistently covered by, academic, commercial or in-house offerings. The meta-academy project sets out to create a benchmark to better map existing initiatives and offerings in order to better coordinate efforts, reduce duplication and fill in training gaps. This project will facilitate the sharing of skills strategies, catalogues of contents and pedagogical artefacts, and more generally good practices and standards in that space, so that opportunities for reuse or co-creation in learning capabilities can be more easily and more systematically spotted and leveraged by all NSOs.
- WP1: Benchmarking
- WP2: Co-create capacity building content
- WP3: Finalizing the framework for virtual learning, co-creating and reusing content
OUTPUT
- Input Privacy-preserving Techniques (2020-22)
CONTEXT
The 2021 project on input privacy-preserving techniques (IPPT) proved that such techniques can play an important role in making external data sources accessible when there are confidentiality concerns. This allows for analysing or integrating external data sources and producing statistics without revealing the microdata to the external partner. It was concluded that continued collaboration was needed to further develop the experiments performed, to better understand the environment that is required for IPPT, and to get a better understanding of the methodological challenges.
PROJECT OBJECTIVES
The objective is to expand and continue the existing collaboration between the participating organizations and to further explore and broaden the applicability of input privacy preservation techniques. This will allow NSOs to become part of, or to lead, data ecosystems by allowing the use of private data between NSOs and, more generally, between organizations.
- WP1. Deepening practical experiments
- WP2. Document use cases and provide guidelines for implementation
- WP3. Create user community
OUTPUT: Input Privacy-Preservation for Official Statistics Project outcome (wiki)
- Synthetic Data Guide (2021)
CONTEXT
Data has become a valuable commodity, providing information for statisticians, economists, and data scientists to generate more timely and granular insights. National statistical offices (NSOs) are striving to provide greater transparency and openness and so are looking to expand the safe sharing of data, expertise and best practices both internally and with external partners. In addition, different types of users are increasingly searching for quality data sets to support testing, evaluation, education and development purposes. These aspects provide more value to users and bring the need to uphold data integrity and confidentiality to the forefront.
The demands for timely, integrated data compiled from ever-growing sources of increased complexity, along with the unequivocal commitment to trusted data protection, call for a modernized, interoperable approach to mobilizing these large and complex data sources. Synthetic data can be a solution for providing rich data while respecting integrity and confidentiality imperatives.
PROJECT OBJECTIVES
The ‘Practical Guide to Synthetic Data’ project sets out to develop a hands-on guide for creating and using synthetic data, primarily geared towards data protection and disclosure control. The target audience of this guide includes NSOs as well as their clients such as academia, the private sector and the general public. The guide will focus on how to use synthetic data in practical applications, considerations for implementation, and important aspects to share with users. This guide can serve as the foundation for future standards as synthetic data is more broadly adopted within NSOs and by their users.
The project is divided into four work packages, with the scoping work already completed through the Working Group on Synthetic Data.
- WP1: Use cases for synthetic data
- WP2: Recommended methods for creating synthetic data
- WP3: Utility and Disclosure Risk Measures
- WP4: Experimenting with the recommendations
OUTPUT: Synthetic Data for Official Statistics - A Starter Guide
- Machine Learning (2019-20)
CONTEXT
Interest in the use of Machine Learning (ML) for official statistics is growing rapidly. For the processing of some secondary data sources (including administrative sources, big data and the Internet of Things) it seems essential to look into the opportunities offered by modern ML techniques, while ML techniques might also offer added value for primary data, as illustrated in the ML position paper mentioned above. Although ML seems promising, there is only limited experience with concrete applications in the UNECE statistical community, and some issues relating to e.g. quality and transparency of results obtained from ML still have to be solved. The second year of the Machine Learning Project
PROJECT OBJECTIVES
Based on mutual interest and building on existing national developments, the objective of the project is to advance the research, development and application of machine learning (ML) techniques to add value (relevance, timeliness, quality, efficiency) to the production of official statistics. To achieve this objective, the Machine Learning (ML) project will aim, in year two, to:
- Report on the various Pilot Studies to demonstrate the value-added of ML.
- Identify and share best practices in the implementation of ML techniques.
- Share knowledge, tools and best practices on implementing the ML techniques, and how National Statistical Organisations (NSOs) are organized to move them quickly to the production processes.
- Propose quality framework components for evaluating ML processes and the statistics produced using them, and bridge the gap between these components and those in existing frameworks.
OUTPUT:
- Strategic Communication (2018-19)
CONTEXT
Within the context of today’s ever-changing data environment, many statistical organizations are in the process of developing or reviewing their strategic objectives and their business models – leading to the articulation or a review of their mission and/or vision statements. More and more statistical organizations are involved in government-wide data strategy formulation. For statistical organizations to become strategic partners in the development of a national data strategy and for the successful development of a solid business model or the transition to a new business model, the vision must resonate with staff at all levels. For mission and vision statements to resonate with employees, staff need to be engaged.
PROJECT OBJECTIVES
The objective of the Strategic Communication Framework Project is to guide statistical offices in the development of a strategic approach to protect, enhance and promote the organization’s reputation and brand. Phase 2 of the Project will build on the experience and momentum gained in Phase 1 and will focus on developing a strategic approach to internal communications and stakeholder management/analysis in support of two priority topics for 2019 identified by HLG-MOS - Communicating our value and Setting the vision. It will also explore the experience of national statistical organizations in the development of government-wide data strategies in support of a third HLG priority – National Data Strategies.
The project will focus on:
- Developing organizational vision and strategic staff engagement strategies
- Developing effective stakeholder engagement management strategies
- Statistical organizations’ engagement in government-wide data strategies
- Data Architecture (2017-18)
CONTEXT
Statistical organisations deal with many different data sources, each with its own set of characteristics. Statistical organisations need to find, acquire and integrate data from both traditional and new types of data sources at an ever-increasing pace and under ever stricter budget constraints, while taking care of security and data ownership. The 2017 HLG-MOS Data Architecture project developed the first version of the Common Statistical Data Architecture (CSDA). This Reference Architecture is a template for NSOs in the development of their own Enterprise Data Architectures. The project will focus on providing a more robust version of the Common Statistical Data Architecture as a result of validation against a number of use cases and integration with the outcomes from other related groups. It will also provide guidance on implementing the architecture.
PROJECT OBJECTIVES
The objectives of this project are:
- To complete the development of the Common Statistical Data Architecture, testing the reference architecture defined in 2017 against other use-cases
- To apply and validate the Data Architecture against the outcomes from other groups like UN-GWG, Data Integration project and groups working on statistical ontologies.
- To provide guidelines to support statistical organisations in using the Common Statistical Data Architecture.
- Implementing ModernStats Standards (2016)
CONTEXT
HLG-MOS has been jointly developing common models and vocabularies to prevent each organization from developing its own, using different vocabularies for the same concepts. Linked open metadata provides the next step. Instead of each organization having to maintain and update its individual vocabularies, these would be made available and managed in a centralized way. This not only reduces costs but also prevents discrepancies in structural and reference metadata and semantic heterogeneity.
PROJECT OBJECTIVES
The main objective of the project is to demonstrate the usefulness of linked metadata for the statistical community and to acquire hands-on experience in that field. It is proposed to fulfil this objective by constructing two concrete examples of linked metadata-based information systems: one aimed at improving the way that we disseminate core structural metadata, the other at supporting the advancement of the HLG vision by creating a harmonized and semantically enhanced information system grouping the main CSPA models and standards in a coherent and machine-actionable form. This will be achieved through three Work Packages:
- WP 1: Build a dissemination system for core structural metadata
- WP 2: Build an information system supporting the HLG vision
- WP 3: Project evaluation and sustainability plan
OUTPUT: Modernisation Maturity Model and the Roadmap for Implementing Modernstats Standards (wiki)
- Data Integration (2016-17)
CONTEXT
There are many new opportunities created by data sources such as Big Data and Administrative data. These sources have the potential to provide more timely, more disaggregated statistics at higher frequencies than traditional survey and census data.
It is clear that NSOs are challenged by the capacities needed to incorporate new data sources in their statistical production process while at the same time companies have appeared exploiting these new sources to provide alternative statistics. If official statistics can't find an answer to this, we are at risk of losing our unique position. We can, however, join forces and keep or even increase our value proposition by providing relevant, reliable and comparable data of high quality. NSOs are particularly well placed to integrate data from various sources and to use them to satisfy the needs of policy makers and other partners for data. It is thus time to intensify our efforts and commence working on it within the framework of an HLG project.
PROJECT OBJECTIVES
For 2017, the project proposes to develop an online, adaptive, practical guide to Data Integration for Official Statistics that supports successful data integration projects, using lessons learnt within the project and in related work. It also proposes to undertake more joint experiments in high-priority areas of practical interest. The project has identified a number of areas where working together should bring faster results than working alone. The following activities were identified:
- WPA: Develop an online, adaptive, practical guide to Data Integration for Official Statistics
- WP0-5: Further work on joint experiments in priority areas:
  - WP0: Data sets for common approaches
  - WP1: Integrating Survey and Administrative Sources
  - WP2: New data sources (such as big data) and traditional sources
  - WP3: Integrating geospatial and statistical information
  - WP4: Micro-Macro integration
  - WP5: Validating Official Statistics
- Align approaches for applying new data sources to integrated price measurement (WP0 and WP2)
- Create synthetic datasets for sandbox experiments (WP0)
- Develop practical guidance for integrating survey, administrative data and big data (including case studies) (WP1 and WP2)
- Develop practical guidance on integrating geospatial and statistical information (WP3)
- Develop practical guidance on using additional sources to validate official statistics (WP5)
OUTPUT
- Big Data (2014-15)
CONTEXT
To build on the momentum gained during the 2014 project, a common shared Sandbox computing environment was proposed to engage in collaborative research activities using various Big Data sources. Continuing the experiments started in 2014 will make it possible to consolidate technical skills and to test the production of multi-national statistics based only on Big Data sources in a common environment.
PROJECT OBJECTIVES
The main goals of the 2015 project are:
- Publish a set of international statistics based on Big Data, before the end of the year
- Conclude 2014 experiments on the sandbox
- Test new models of partnership
OUTPUT
- CSPA Implementation (2014-15)
CONTEXT
A review of the 2014 CSPA project identified the governance and support of technical implementation as a significant area for improvement. The AWG is therefore proposing an HLG project for 2015 that would expand the governance and support offered by the AWG to cover implementation, and establish a Technical Coordination Committee to support NSIs and NSOs that are developing or implementing CSPA-compliant statistical services.
PROJECT OBJECTIVES
The project has three main objectives:
- To extend the governance and support offered by the AWG to the implementation of CSPA-compliant statistical services.
- To establish and maintain a new Technical Coordination Committee which will provide full technical guidance to implementing organizations and put in place technical implementation communities.
- To facilitate the transitioning of CSPA governance from HLG project governance arrangements to the Modernization Committee for Production and Methods; this is currently an identified risk.
OUTPUT: CSPA (wiki)
- Previous projects
- Frameworks and Standards (2013)
- CSPA Development (2013)
- GSIM Development (2012)