Germany

Who-to-whom matrices (European Central Bank)

  1. Who-to-whom concept
  2. Who-to-whom: main data sources
  3. Compilation of who-to-whom: two cases
  4. Who-to-whom balancing in practice
  5. Main features euro area/national who-to-whom tables
  6. Data access and visualisation
  7. Exercise 1 & 2
Languages and translations
English

www.ecb.europa.eu ©

Who-to-whom matrices

Pierre Sola European Central Bank

Workshop on Financial Accounts

9 to 11 October 2023, Brussels

This document should not be reported as representing the views of the European Central Bank (ECB). The views expressed are those of the authors and do not necessarily reflect those of the ECB.

Overview

1 Who-to-whom concept

2 Who-to-whom: main data sources

3 Compilation of who-to-whom: two cases

4 Who-to-whom balancing in practice

5 Main features of euro area/national who-to-whom tables

6 Data access and visualisation

7 Exercises

1. Who-to-whom concept

Financial accounts basic data show total assets and liabilities by sector, instrument by instrument:

The basic table has one row per instrument and, under both ASSETS and LIABILITIES, one column per sector (S1, S11, S12K, S124, S12O, S128, S129, S13, S1M, S2).

Instruments (rows):
F1 Monetary gold and SDRs
F2 Currency and deposits
F3 Debt securities
F4 Loans
F51 Equity
F52 Investment fund shares
F62 Life insurance
F6O Standardized guarantees
F6N Pension schemes
F7 Derivatives
F8 Other accounts, trade credit

Sectors (columns): S1: all resident sectors; S11: non-financial corporations; S12K: monetary financial institutions; S124: investment funds; S12O: other financial institutions; S128: insurance corporations; S129: pension funds; S13: general government; S1M: households and non-profit institutions serving households; S2: rest of the world

For each instrument, the sum of assets held by all sectors is equal to the sum of liabilities (in stocks and flow data)

With who-to-whom data, positions and flows (transactions, revaluations, others) are broken down by counterpart sectors

Schematic who-to-whom table (Liabilities -> Assets): both the rows and the columns list the sectors S11, S12, S13, S1M and S2, followed by a Total.

Columns break down a sector’s liabilities by counterparty.

Rows break down its assets.
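The row and column reading of the table can be sketched in a few lines of Python. This is only an illustration: the amounts are made up (they are not ECB data), the five sectors follow the schematic table, and the helper name `claim` is hypothetical.

```python
import numpy as np

# Illustrative (made-up) amounts for a single instrument, not ECB data.
sectors = ["S11", "S12", "S13", "S1M", "S2"]

# Rows: creditor (holder) sector; columns: debtor (issuer) sector.
w2w = np.array([
    [ 5.0, 20.0,  3.0,  0.0, 10.0],   # assets of S11
    [60.0, 30.0, 40.0, 50.0, 25.0],   # assets of S12
    [ 4.0,  8.0,  2.0,  0.0,  6.0],   # assets of S13
    [10.0, 70.0, 12.0,  0.0, 15.0],   # assets of S1M
    [15.0, 35.0, 20.0,  0.0,  0.0],   # assets of S2
])

assets = dict(zip(sectors, w2w.sum(axis=1)))       # row totals
liabilities = dict(zip(sectors, w2w.sum(axis=0)))  # column totals


def claim(holder, issuer):
    """Amount that `holder` holds as a claim on `issuer`."""
    return w2w[sectors.index(holder), sectors.index(issuer)]


print("S12 claims on S11:", claim("S12", "S11"))
print("Total assets of S12:", assets["S12"])
print("Total liabilities of S11:", liabilities["S11"])
print("Grand totals:", sum(assets.values()), sum(liabilities.values()))
```

Because every cell is at once an asset of its row sector and a liability of its column sector, the two grand totals are equal by construction.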


• In other words, who-to-whom data identify creditors (=holders) and debtors (=issuers) simultaneously.

• They therefore provide a complete overview of sectoral interlinkages for the entire economy, consistent with macroeconomic aggregates.

• Only resident counterpart sectors are identified, i.e. non-resident counterparts are aggregated into a single sector (which has some drawbacks in the context of globalisation).


How useful is who-to-whom?

• It adds analytical value to the accounts by showing the relations between sectors (e.g. MFI lending to NFCs).

• In 2009, the International Monetary Fund (IMF) and the Financial Stability Board (FSB) issued the report The Financial Crisis and Information Gaps, which explored information gaps and made proposals for strengthening data collection (IMF and FSB, 2009).

• This initial Data Gaps Initiative (DGI-I), endorsed by the G-20, comprised 20 recommendations focusing on three key statistical domains: i) the build-up of risks in the financial sector; ii) international financial network connections; and iii) vulnerabilities to shocks.


Example of use in monetary analysis: financing of non-financial corporations

[Figure: Role of financing-providing sectors in the external financing of euro area NFCs, by type of financial instrument (left panel: debt securities, loans, listed shares) and as a share in total euro area NFC liabilities (right panel). Sectors shown: MFIs, NFCs, other OFIs, IFs, ICPFs, RoW, other. Percent; annual data; 2013 to 2019. Sources: ECB (EEA) and ECB calculations.]

Data collection perspective

• A new dimension: in business accounting, the institutional sector of the counterparty is not specified.

• From a compilation point of view, this is a further challenge, but also an opportunity to enhance quality.

• For who-to-whom to be feasible, source data need to keep track of the sector of the counterparty.

Example: transactions in long term debt securities

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S                66.2   -79.0    -1.4    79.1     0.7     0.0   184.5     0.0   388.3   638.4
S11       1.5     2.9    -6.4     0.0    -0.3     0.2     0.0     5.9     0.0    -0.8     0.0
S12K    394.2    72.6    -4.7     0.0     3.6    -1.2     0.0   341.7     0.0   -17.7    -0.1
S124    365.5    19.7    24.8     0.0    32.9     1.5     0.0   -36.1     0.0   322.7     0.0
S12O     50.9    -2.5     0.0     0.0    35.5     0.1     0.0   -24.7     0.0    42.5     0.0
S128     19.8    13.0   -14.4     0.0    -2.4     0.7     0.0     1.4     0.0    21.5     0.0
S129     69.2     1.2     5.7     0.0    -1.3     0.0     0.0    38.4     0.0    25.2     0.0
S13     -26.9    -0.8    -0.7     0.0    -7.8    -0.5     0.0   -14.2     0.0    -2.9     0.0
S1M     -91.7    -6.7   -65.5     0.0    -5.9    -0.1     0.0   -11.3     0.0    -2.2     0.0
S2     -144.2   -33.2   -17.8    -1.4    24.8     0.0     0.0  -116.6     0.0     0.0
SX      638.3     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    -0.1

Rows: holder sectors (column S: total assets). Columns: issuer sectors (row S: total liabilities). SX: discrepancy.

S12K: MFIs including Eurosystem; S12O: other financial sub-sectors; S1M: households and non-profit institutions serving households

2. Who-to-whom: main data sources

Loans (F4) and deposits (F2M) can be obtained to a large extent from who-to-whom data reported by banks.

Actively traded securities, i.e. F3 debt securities, F511 listed shares and F52 investment fund shares, may be obtained from securities holdings statistics and securities issues statistics.

Currently, most EU countries show no who-to-whom data for other instruments, including:
- F21 Currency
- F6 Insurance, pensions and standardized guarantee schemes
- F7 Financial derivatives
- F8 Other accounts receivable/payable

3. Compilation of who-to-whom: two cases

Case 1: full information on bilateral links

Totals are the simple sum of the components: in the who-to-whom table (rows and columns S11, S12, S13, S1M, S2 and Total), each Total equals the sum of the bilateral cells in its row or column.

This is mostly the case in euro area accounts for deposits and loans.

Case 2: no full information on bilateral links

Sometimes only totals are known for some rows or columns

and/or totals do not come from the same source as components

and/or some bilateral links are missing

=> Need to estimate


This is the case in the euro area accounts for (i) short-term and long-term debt securities, (ii) listed shares, and (iii) investment fund shares.

In particular:

- Full details matching with totals are available only for the MFI sector

- Totals are available for Government assets and liabilities

- Details (from NCB reporting, generally based on Securities Holdings Statistics) are available for other components but they do not necessarily match with the other available totals


4. Who-to-whom balancing in practice

Starting point: already balanced, but still possibly wrong!

CASE 1: totals are the simple sum of the interior

Loans: wrong version

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S               110.3     0.0   -20.5  -139.2    -4.7     5.4    78.3    53.2  -213.7  -131.1
S11     -20.6    13.8     0.0    -0.1    -9.6     0.2     0.0    -1.1     0.1   -24.0     0.0
S12K     20.6   116.0     0.0    -4.6   -70.4    -4.3    -1.3    55.2    48.9  -118.9     0.0
S124    -17.0    -0.9     0.0     0.9    -6.7     0.0     0.0     3.1     1.4   -14.8     0.0
S12O   -117.4     6.2     0.0    -2.9   -76.9     1.8     6.5     1.2     1.3   -54.6     0.0
S128      1.7     0.5     0.0     0.0    -0.6     2.9     0.1    -1.2     0.8    -0.8     0.0
S129      1.4     0.3     0.0     0.0     4.5     0.0     0.0    -4.4     0.3     0.7     0.0
S13      41.2    17.8     0.0     0.1     2.2     0.2     0.1    21.7     0.4    -1.4     0.0
S1M       1.3     1.1     0.0     0.0     0.5    -0.1     0.0    -0.1    -0.1     0.0     0.0
S2      -42.2   -44.5     0.0   -14.0    17.8    -5.4     0.0     3.8     0.0     0.0
SX     -131.1     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

Be cautious in particular with:

- inter-company loans
- sector allocation from each data source (esp. where estimations are made)
- instrument allocation, e.g. trade credits versus loans

Loans: improved version

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S               145.8     0.0   -22.4  -125.9    -6.4     5.4    82.8    58.1  -152.6   -15.1
S11      16.2    -6.6     0.0     0.1   -23.4     0.3     0.1    -0.9     0.1    46.5     0.0
S12K     15.7   114.3     0.0    -4.5   -70.8    -3.6    -1.3    58.5    49.0  -125.9     0.0
S124      2.6    -0.6     0.0     1.0    -3.9     0.0     0.0     2.8     3.3     0.0     0.0
S12O    -89.9    -2.0     0.0    -3.5   -24.3     0.3     6.5     0.9     3.2   -70.9     0.0
S128      4.0     2.8     0.0    -0.1     0.3     2.0     0.1    -1.1     1.8    -1.8     0.0
S129      1.7     0.3     0.0     0.0     4.5     0.0     0.0    -4.3     0.3     0.8     0.0
S13      40.4    17.9     0.0     0.1     0.4     0.1     0.1    22.9     0.3    -1.3     0.0
S1M       0.7     0.4     0.0     0.0     0.4    -0.1     0.0     0.0     0.1     0.0     0.0
S2       -6.6    19.1     0.0   -15.3    -9.1    -5.4     0.0     4.1     0.0     0.0
SX      -15.1     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

S S11 S12K S124 S12O S128 S129 S13 S1M S2

S 18.5 23.0 1.0 -59.1 -0.5 0.0 63.5 0.0 69.5 Liab:116.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 1.4 0.0 0.0 35.7 0.0 30.4 -35.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -81.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 -11.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX Total assets: 55.7 1.2 3.0 0.0 11.2 -0.4 0.0 0.1 0.0 0.0

Sum of interior components: 101.0

Starting point: components and totals from various sources

CASE 2: totals and interior have different sources

Debt securities
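The gaps in the debt securities table above can be reproduced with a short script. This is a sketch under one assumption inferred from the figures: each SX cell is read as the reported total minus the sum of the interior cells. Because the inputs are rounded to one decimal, a computed gap can differ from the printed SX value by 0.1 to 0.2.

```python
import numpy as np

sectors = ["S11", "S12K", "S124", "S12O", "S128", "S129", "S13", "S1M", "S2"]

# Interior of the debt securities table (rows: holders, columns: issuers).
interior = np.array([
    [-0.8, -0.1, 0.0,   0.9,  0.0, 0.0,  0.4, 0.0, -1.0],  # S11
    [ 6.8, 16.3, 0.0,   1.4,  0.0, 0.0, 35.7, 0.0, 30.4],  # S12K
    [ 8.2,  8.7, 0.0,   2.5, -0.2, 0.0, -0.9, 0.0, 19.4],  # S124
    [ 1.1, -3.5, 0.0, -87.8,  0.2, 0.0,  5.7, 0.0, 14.3],  # S12O
    [ 0.1,  2.8, 0.0,   6.3, -0.1, 0.0,  7.0, 0.0,  4.0],  # S128
    [ 0.6,  0.6, 0.0,   0.2,  0.0, 0.0,  1.5, 0.0,  1.6],  # S129
    [ 0.1,  0.8, 0.0,  -1.4,  0.0, 0.0, -3.1, 0.0, -0.8],  # S13
    [-0.3, -4.7, 0.0,  -1.2,  0.0, 0.0, -2.5, 0.0,  1.5],  # S1M
    [ 1.6, -0.9, 1.0,   8.9,  0.0, 0.0, 19.7, 0.0,  0.0],  # S2
])

# Totals reported by separate sources (column S and row S of the table).
asset_totals = np.array([0.9, 54.9, 37.8, -81.0, 20.0, 4.5, -4.3, -7.3, 30.4])
liability_totals = np.array([18.5, 23.0, 1.0, -59.1, -0.5, 0.0, 63.5, 0.0, 69.5])

# Gap = reported total minus sum of the bilateral cells.
row_gap = dict(zip(sectors, np.round(asset_totals - interior.sum(axis=1), 1)))
col_gap = dict(zip(sectors, np.round(liability_totals - interior.sum(axis=0), 1)))

print("Sum of interior components:", round(float(interior.sum()), 1))
print("Asset-side gaps by holder:", row_gap)
print("Liability-side gaps by issuer:", col_gap)
```

The largest asset-side gaps appear for S12K and S12O, which is where the later balancing steps concentrate.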

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -59.1 -0.5 0.0 63.5 0.0 69.5 116.0

S11 0.9

S12K 54.9

S124 37.8

S12O -81.0

S128 20.0

S129 4.5

S13 -4.3

S1M -7.3

S2 30.4

SX 55.7

First step: balancing total assets and liabilities by sector

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 0.0 0.0 0.0 -20.0 0.0 0.0 0.0 0.0 -15.0 -35.0

S11 0.0

S12K 0.0

S124 0.0

S12O 20.0

S128 0.0

S129 0.0

S13 0.0

S1M 0.0

S2 0.0

SX 20.0

First step: balancing total assets and liabilities by sector – 3 adjustments

First step (continued): the table after the 3 adjustments to the sector totals

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -79.1 -0.5 0.0 63.5 0.0 54.5 81.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 1.4 0.0 0.0 35.7 0.0 30.4 -35.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -61.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 9.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX 75.7 1.2 3.0 0.0 -8.8 -0.4 0.0 0.1 0.0 -15.0
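The arithmetic of the first step can be checked as follows. The figures are the grand totals and the three adjustments shown in the tables above; the variable names are only illustrative.

```python
# Grand totals before balancing, as printed on the starting table.
total_liabilities = 116.0
total_assets = 55.7

# The three first-step adjustments to the sector totals.
liability_adjustments = {"S12O": -20.0, "S2": -15.0}
asset_adjustments = {"S12O": 20.0}

new_liabilities = total_liabilities + sum(liability_adjustments.values())
new_assets = total_assets + sum(asset_adjustments.values())

gap_before = total_liabilities - total_assets
gap_after = new_liabilities - new_assets

print("Totals after step 1:", round(new_liabilities, 1), round(new_assets, 1))
print("Gap before and after:", round(gap_before, 1), round(gap_after, 1))
```

The three adjustments bring the totals to 81.0 and 75.7, so the gap between total liabilities and total assets shrinks from 60.3 to 5.3. The remainder is closed in the later steps.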


S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 0.0 0.0 0.0 -20.0 0.0 0.0 0.0 0.0 -15.0 -35.0

S11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S12K 0.0 0.0 0.0 0.0 -15.0 0.0 0.0 0.0 0.0 -15.0 30.0

S124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S12O 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0

S128 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S129 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S1M 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

SX 20.0 0.0 0.0 0.0 -5.0 0.0 0.0 0.0 0.0 0.0

Second step: adjustments within the matrix, where the main gaps are identified (the adjustments are shown in the table above)

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -79.1 -0.5 0.0 63.5 0.0 54.5 81.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 -13.6 0.0 0.0 35.7 0.0 15.4 -5.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -61.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 9.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX 75.7 1.2 3.0 0.0 6.2 -0.4 0.0 0.1 0.0 0.0

Outcome of the second step: only small gaps remain (table above)

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S -0.1 0.0 0.0 -0.7 0.0 0.0 0.0 0.0 -1.1 -1.9

S11 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.6 -1.3

S12K 0.0 0.0 0.0 0.0 -0.8 0.0 0.0 -4.1 0.0 -0.8 5.7

S124 1.4 0.3 0.3 0.0 0.2 0.0 0.0 0.8 0.0 -0.1 0.0

S12O 0.1 0.2 0.4 0.0 7.1 -0.4 0.0 0.8 0.0 1.0 -9.0

S128 0.7 0.5 0.6 0.0 -0.1 0.0 0.0 1.3 0.0 -1.6 0.0

S129 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 -0.1 0.0

S13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.1

S1M 0.0 0.0 0.3 0.0 -0.1 0.0 0.0 -0.1 0.0 -0.1 0.0

S2 1.1 0.1 1.4 0.0 -0.9 0.0 0.0 0.5 0.0 0.0

SX 3.4 -1.2 -3.0 0.0 -6.2 0.4 0.0 -0.1 0.0 0.0

Third step: automated adjustments to close the remaining discrepancies (the adjustments are shown in the table above)

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.4 23.0 1.0 -79.7 -0.5 0.0 63.5 0.0 53.4 79.1

S11 0.9 -0.7 0.0 0.0 1.0 0.0 0.0 1.1 0.0 -0.4 0.0

S12K 54.9 6.8 16.3 0.0 -14.4 0.0 0.0 31.6 0.0 14.6 0.0

S124 39.2 8.5 9.0 0.0 2.7 -0.2 0.0 -0.1 0.0 19.3 0.0

S12O -60.9 1.3 -3.1 0.0 -80.7 -0.2 0.0 6.5 0.0 15.3 0.0

S128 20.8 0.5 3.4 0.0 6.2 -0.1 0.0 8.3 0.0 2.4 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.6 0.0 1.5 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.0 0.0 -0.8 0.0

S1M -7.3 -0.3 -4.4 0.0 -1.3 0.0 0.0 -2.7 0.0 1.5 0.0

S2 31.4 1.7 0.5 1.0 8.0 0.0 0.0 20.2 0.0 0.0 0.0

SX 79.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Outcome of the third step: balanced who-to-whom table (shown above)

Remark: some algorithms can also be used to balance matrices – with some caution
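As an illustration of what such an algorithm can look like, the sketch below applies a least-squares additive adjustment. This is an assumption chosen for simplicity, not the procedure used for the euro area accounts. The function name `balance_additive` and the 3x3 numbers are hypothetical. An additive rule is used because who-to-whom transactions can be negative, which proportional scaling methods handle poorly.

```python
import numpy as np


def balance_additive(interior, row_totals, col_totals):
    """Least-squares additive balancing (illustrative only).

    Returns the matrix closest to `interior` (smallest sum of squared
    changes) whose row sums equal `row_totals` and whose column sums
    equal `col_totals`.
    """
    x = np.asarray(interior, dtype=float)
    r = np.asarray(row_totals, dtype=float)
    c = np.asarray(col_totals, dtype=float)
    if not np.isclose(r.sum(), c.sum()):
        raise ValueError("row and column totals must share one grand total")
    n_rows, n_cols = x.shape
    row_gap = r - x.sum(axis=1)   # what each row still misses
    col_gap = c - x.sum(axis=0)   # what each column still misses
    total_gap = row_gap.sum()     # equals col_gap.sum()
    adjustment = (row_gap[:, None] / n_cols
                  + col_gap[None, :] / n_rows
                  - total_gap / (n_rows * n_cols))
    return x + adjustment


# Made-up 3x3 example (not ECB data).
interior = np.array([
    [ 1.0,  2.0, 0.5],
    [-0.5,  3.0, 1.0],
    [ 2.0, -1.0, 0.0],
])
row_totals = np.array([4.0, 3.0, 1.5])   # target asset totals
col_totals = np.array([3.0, 4.0, 1.5])   # target liability totals

balanced = balance_additive(interior, row_totals, col_totals)
print(np.round(balanced, 2))
```

The caution in the remark applies here too: this rule spreads the gaps over every cell, including cells that should stay at zero or that come from reliable sources, so an automated step of this kind is best reserved for small residual discrepancies.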

CASE 3: balancing price/other changes

Listed shares

Consider ratio price change/initial position

But take into account possible wrong allocation of volume changes

PRICE/OTHER CHANGES

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S               716.9    58.7     0.0   123.9    21.3     0.0     0.0     0.0   440.4 1,361.2
S11     181.4   176.2     1.2     0.0     2.7     1.3     0.0     0.0     0.0     0.0     0.0
S12K     18.0     7.2     2.4     0.0     3.3     0.9     0.0     0.0     0.0     4.2     0.0
S124    485.1    96.4     8.0     0.0     6.0     5.5     0.0     0.0     0.0   369.2     0.0
S12O     63.8    35.9     7.3     0.0    18.2     1.4     0.0     0.0     0.0     1.0     0.0
S128     19.4     9.9     0.5     0.0     1.8     0.8     0.0     0.0     0.0     6.3     0.0
S129     31.2     5.8     0.3     0.0     0.6     0.2     0.0     0.0     0.0    24.3     0.0
S13      41.1    34.1     2.4     0.0     1.2     0.4     0.0     0.0     0.0     3.0     0.0
S1M     131.1    74.4     9.1     0.0     4.9    10.4     0.0     0.0     0.0    32.3     0.0
S2      390.2   276.9    27.7     0.0    85.4     0.2     0.0     0.0     0.0     0.0     0.0
SX    1,361.2     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

POSITIONS

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2
S             4,986.7   386.1     0.0 1,037.3   177.4      NA     0.2      NA 2,958.3
S11   1,514.0 1,157.7    29.5     0.0    54.8     9.7      NA     0.0      NA   105.2
S12K    247.4   137.1    19.6     0.0    27.2     7.9      NA     0.0      NA   232.1
S124  3,195.6   764.7    72.0     5.0    68.8    37.8      NA     0.0      NA 2,168.0
S12O    867.1   223.8    24.5     0.0   114.1     5.3      NA     7.8      NA   388.6
S128    182.9   101.6     4.6     0.0    14.6    20.1      NA     0.0      NA    54.5
S129    217.9    49.8     2.3     0.0     3.9     1.6      NA     0.2      NA   167.3
S13     295.9   240.4    25.0     0.0     9.4     3.3      NA     0.0      NA    16.0
S1M     891.2   570.8    53.1     0.0    84.2    31.0      NA     0.0      NA   174.6
S2    2,729.4 1,830.5   132.4     0.0   548.3    23.1      NA     0.1      NA

PERCENTAGES

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2
S                14.4    15.2     0.0    11.9    12.0      NA     0.0      NA    14.9
S11      12.0    15.2     3.9      NA     4.9    13.5      NA   -62.3      NA     0.0
S12K      7.3     5.3    12.0      NA    12.1    11.3      NA    18.8      NA     1.8
S124     15.2    12.6    11.0     0.0     8.7    14.6      NA     0.4      NA    17.0
S12O      7.4    16.0    29.7      NA    16.0    25.9      NA     0.0      NA     0.3
S128     10.6     9.8    12.0      NA    12.0     4.2      NA      NA      NA    11.6
S129     14.3    11.7    12.4      NA    14.2    15.0      NA     0.0      NA    14.5
S13      13.9    14.2     9.5      NA    12.5    12.3      NA     0.0      NA    18.9
S1M      14.7    13.0    17.1      NA     5.8    33.7      NA      NA      NA    18.5
S2       14.3    15.1    20.9      NA    15.6     1.0      NA     0.0      NA
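The ratio check can be sketched for one column of the tables above, here listed shares issued by S11, by holder sector. The 5 percentage point threshold is an arbitrary assumption for illustration, not a rule taken from the slides.

```python
holders = ["S11", "S12K", "S124", "S12O", "S128", "S129", "S13", "S1M", "S2"]

# Listed shares issued by S11 (column S11 of the tables above), by holder.
price_change = [176.2, 7.2, 96.4, 35.9, 9.9, 5.8, 34.1, 74.4, 276.9]
position = [1157.7, 137.1, 764.7, 223.8, 101.6, 49.8, 240.4, 570.8, 1830.5]

# Aggregate ratio for the column, from the totals row (716.9 / 4,986.7).
aggregate_pct = 100 * 716.9 / 4986.7

THRESHOLD = 5.0  # percentage points; hypothetical cut-off

ratios = {h: 100 * pc / pos
          for h, pc, pos in zip(holders, price_change, position)}
flagged = [h for h, r in ratios.items() if abs(r - aggregate_pct) > THRESHOLD]

for h in holders:
    mark = "  <-- check" if h in flagged else ""
    print(f"{h:5s} {ratios[h]:5.1f}%{mark}")
print(f"Column aggregate: {aggregate_pct:.1f}%")
```

Since all holders of the same shares face the same price movements, a ratio far from the column aggregate (here S12K, at about 5% against 14%) suggests that the cell deserves a closer look, for example for a volume change booked as a price change.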

5. Main features euro area/national who-to-whom tables

Data types: stocks, transactions, other changes

Holder residency: euro area and 27 EU countries

Holder sectors: 11 to 12 sectors (central banks are only available for some instruments)

Issuer residency: euro area / non-euro area

Issuer sectors: 10 to 11 sectors for euro area issuers; no sector detail for non-euro area issuers

Instruments: securities (except unlisted shares), loans and deposits

Series length: 13Q4 to 23Q1 (securities); 99Q1 to 23Q1 (loans and deposits)

Timeliness: T+120 (securities, euro area accounts); T+102 (country data); T+94 (deposits and loans, euro area accounts)

6. Data access and visualisation

Who-to-whom data lead to a significant increase in data volume. Statisticians and institutions therefore need to develop data visualisation tools to help users.

Quarterly press release on euro area economic and financial developments by institutional sector: full release, Annex Table 2.2 (for households) and Table 3.2 (for non-financial corporations): http://www.ecb.europa.eu/press/pr/stats/ffi/html/index.en.html

Euro Area Accounts Report in SDW: http://sdw.ecb.europa.eu/reports.do?node=1000005335

7. Exercise 1

Which questions may be answered by who-to-whom data?

• Did Government lend much to NFCs in 2020?

• Did Government issue a large amount of debt securities?

• Did NFCs obtain much financing from non-banking financial institutions?

• Did Households increase their deposits in 2020?

7. Exercise 2

Transactions in long-term debt securities

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2
S                64.0   -78.4    -1.2    84.3     0.7     0.0   186.1     0.0   363.0
S11     -10.2     3.0    -6.4     0.0    -1.5     0.2     0.0     1.1     0.0    -6.6
S12K    410.1    70.8    -4.1     0.0    32.5    -1.0     0.0   332.9     0.0   -21.0
S124    348.0    16.0    31.6     0.0    30.9     1.5     0.0   -47.2     0.0   315.1
S12O     32.6     2.7     1.3     0.0    33.8    -0.1     0.0   -42.1     0.0    36.9
S128     10.7     8.3   -24.8     0.0    -1.8     0.8     0.0    11.9     0.0    16.3
S129     70.6     1.2     5.6     0.0    -1.2     0.0     0.0    39.8     0.0    25.1
S13     -26.5    -1.0    -0.7     0.0    -7.1    -0.4     0.0   -15.3     0.0    -1.9
S1M     -80.9    -3.1   -79.7     0.0    -1.1    -0.3     0.0     4.1     0.0    -0.8
S2     -135.9   -34.1    -1.3    -1.2    -0.3     0.0     0.0   -99.1     0.0     0.0

S11: NFCs; S12K: banking sector including central bank; S124: investment funds; S12O: other financial sub-sectors; S128: insurance corporations; S129: pension funds; S13: government; S1M: households and non-profit institutions serving households; S2: RoW

Questions on the table in the previous slide:

• Which sectors have been the main net buyers of debt securities, and which were the (euro area) sectors from whom they bought?

• Conversely, which sectors have been net sellers of debt securities over this period?

• Have net sellers sold to net buyers?

• How much was issued over this period by euro area residents?

• How large were the purchases of euro area debt securities by euro area residents?

Answers:

• Banks (including central banks) have purchased high amounts of securities, issued mainly by government.

• Investment funds (S124) purchased a lot of long term debt securities, mainly issued by non-euro area residents (S2)

• Conversely, non-euro area investors and households (S1M) have been net sellers of debt securities over this period.

• Non-residents sold mainly government debt securities, while households sold mainly securities issued by the banking sector.

• However, we do not know which sectors “transacted” with whom

• Total net issues of debt securities by euro area residents reached EUR 255 billion, while net purchases by residents of securities issued by euro area residents reached EUR 391 billion
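The two totals in the last answer can be recomputed from the interior of the Exercise 2 table. In this sketch the sums are taken over the bilateral cells, so they can differ from the printed totals by a few tenths because of rounding; the variable names are illustrative.

```python
import numpy as np

sectors = ["S11", "S12K", "S124", "S12O", "S128", "S129", "S13", "S1M", "S2"]

# Interior of the Exercise 2 table (rows: holders, columns: issuers), EUR billion.
t = np.array([
    [  3.0,  -6.4,  0.0, -1.5,  0.2, 0.0,   1.1, 0.0,  -6.6],  # S11
    [ 70.8,  -4.1,  0.0, 32.5, -1.0, 0.0, 332.9, 0.0, -21.0],  # S12K
    [ 16.0,  31.6,  0.0, 30.9,  1.5, 0.0, -47.2, 0.0, 315.1],  # S124
    [  2.7,   1.3,  0.0, 33.8, -0.1, 0.0, -42.1, 0.0,  36.9],  # S12O
    [  8.3, -24.8,  0.0, -1.8,  0.8, 0.0,  11.9, 0.0,  16.3],  # S128
    [  1.2,   5.6,  0.0, -1.2,  0.0, 0.0,  39.8, 0.0,  25.1],  # S129
    [ -1.0,  -0.7,  0.0, -7.1, -0.4, 0.0, -15.3, 0.0,  -1.9],  # S13
    [ -3.1, -79.7,  0.0, -1.1, -0.3, 0.0,   4.1, 0.0,  -0.8],  # S1M
    [-34.1,  -1.3, -1.2, -0.3,  0.0, 0.0, -99.1, 0.0,   0.0],  # S2
])

# S2 (rest of the world) is the last row and column; all others are residents.
net_issues_by_residents = t[:, :-1].sum()        # resident issuers, all holders
resident_purchases_of_resident = t[:-1, :-1].sum()  # resident holders and issuers

net_purchases = dict(zip(sectors, np.round(t.sum(axis=1), 1)))

print("Net issues by euro area residents:", round(float(net_issues_by_residents)))
print("Purchases by residents of resident-issued securities:",
      round(float(resident_purchases_of_resident)))
print("Net purchases by holder sector:", net_purchases)
```

The difference between the two totals is the net sales by non-residents (S2) of securities issued by euro area residents, which resident holders absorbed.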


Russian

Who-to-whom matrices

Pierre Sola, European Central Bank

Workshop on Financial Accounts

9 to 11 October 2023, Brussels

This document should not be reported as representing the views of the European Central Bank (ECB). The views expressed are those of the authors and do not necessarily reflect those of the ECB.

Overview

1 Who-to-whom concept

2 Who-to-whom: main data sources

3 Compilation of who-to-whom: two cases

4 Who-to-whom balancing in practice

5 Main features of euro area/national who-to-whom tables

6 Data access and visualisation

7 Exercises

1. Who-to-whom concept

Financial accounts basic data show total assets and liabilities by sector, instrument by instrument. The table has one row per instrument and, under both ASSETS and LIABILITIES, one column per sector (S1, S11, S12K, S124, S12O, S128, S129, S13, S1M, S2).

Instruments (rows):
F1 Monetary gold and SDRs
F2 Currency and deposits
F3 Debt securities
F4 Loans
F51 Equity
F52 Investment fund shares
F62 Life insurance
F6O Standardized guarantees
F6N Pension schemes
F7 Financial derivatives
F8 Other accounts, trade credits and advances

Sectors (columns): S1: all resident sectors; S11: non-financial corporations; S12K: monetary financial institutions; S124: investment funds; S12O: other financial institutions; S128: insurance corporations; S129: pension funds; S13: general government; S1M: households and non-profit institutions serving households; S2: rest of the world

For each instrument, the sum of assets held by all sectors is equal to the sum of liabilities (in stock and flow data).

With who-to-whom data, positions and flows (transactions, revaluations, others) are broken down by counterpart sector.

Schematic who-to-whom table (Liabilities -> Assets): both the rows and the columns list the sectors S11, S12, S13, S1M and S2, followed by a Total.

Columns break down a sector's liabilities by counterparty. Rows break down its assets.

• In other words, who-to-whom data identify creditors (=holders) and debtors (=issuers) simultaneously.

• They therefore provide a complete overview of sectoral interlinkages for the entire economy, consistent with macroeconomic aggregates.

• Only resident counterpart sectors are identified, i.e. non-resident counterparts are aggregated into a single sector (which has some drawbacks in the context of globalisation).

How useful is who-to-whom?

• It adds analytical value to the accounts by showing the relations between sectors (e.g. MFI lending to NFCs).

• In 2009, the International Monetary Fund (IMF) and the Financial Stability Board (FSB) issued the report The Financial Crisis and Information Gaps, which explored information gaps and made proposals for strengthening data collection (IMF and FSB, 2009).

• This initial Data Gaps Initiative (DGI-I), endorsed by the G-20, comprised 20 recommendations focusing on three key statistical domains: i) the build-up of risks in the financial sector; ii) international financial network connections; and iii) vulnerabilities to shocks.

Example of use in monetary analysis: financing of non-financial corporations

[Figure: Role of financing-providing sectors in the external financing of euro area NFCs, by type of financial instrument (left panel) and as a share in total euro area NFC liabilities (right panel). Sectors shown: MFIs, NFCs, other OFIs, IFs, ICPFs, RoW, other. Percent; annual data; 2013 to 2019. Sources: ECB (EEA) and ECB calculations.]

Presentation notes: EEA: European Economic Area; IFs: investment funds; OFIs: other financial intermediaries; ICPFs: insurance corporations and pension funds; RoW: rest of the world.

Data collection perspective

• A new dimension: in financial (business) accounting, the institutional sector of the counterparty is not specified.

• From a compilation point of view, this creates additional difficulties, but also an opportunity to enhance quality.

• For who-to-whom to be feasible, source data need to contain information on the sector of the counterparty.

Example: transactions in long-term debt securities

            S     S11    S12K    S121    S12T    S124    S12O    S128    S129     S13     S1M      S2
S                66.2   -79.0     0.0   -79.0    -1.4    79.1     0.7     0.0   184.5     0.0   388.3
S11       1.5     2.9    -6.4     0.0    -6.4     0.0    -0.3     0.2     0.0     5.9     0.0    -0.8
S12K    394.2    72.6    -4.7     0.0    -4.7     0.0     3.6    -1.2     0.0   341.7     0.0   -17.7
S121    696.7    63.2    61.8     0.0    61.8     0.0    59.6     0.3     0.0   496.5     0.0    15.4
S12T   -302.5     9.4   -66.5     0.0   -66.5     0.0   -56.0    -1.5     0.0  -154.8     0.0   -33.1
S124    365.5    19.7    24.9     0.0    24.9     0.0    32.9     1.5     0.0   -36.1     0.0   322.7
S12O     50.9    -2.5     0.0     0.0     0.0     0.0    35.5     0.1     0.0   -24.7     0.0    42.5
S128     19.8    13.0   -14.4     0.0   -14.4     0.0    -2.4     0.7     0.0     1.5     0.0    21.5
S129     69.2     1.2     5.7     0.0     5.7     0.0    -1.3     0.0     0.0    38.4     0.0    25.2
S13     -26.9    -0.8    -0.7     0.0    -0.7     0.0    -7.8    -0.5     0.0   -14.2     0.0    -2.9
S1M     -91.7    -6.7   -65.5     0.0   -65.5     0.0    -6.0    -0.1     0.0   -11.3     0.0    -2.2
S2     -144.2   -33.2   -17.8     0.0   -17.8    -1.4    24.8     0.0     0.0  -116.6     0.0     0.0

S12K: MFIs including Eurosystem; S12O: other financial sub-sectors; S1M: households and non-profit institutions serving households

2. Who-to-whom: main data sources

Loans (F4) and deposits (F2M) can be obtained to a large extent from who-to-whom data reported by banks.

Actively traded securities, i.e. F3 debt securities, F511 listed shares and F52 investment fund shares, may be obtained from securities holdings statistics and securities issues statistics.

Currently, most EU countries show no who-to-whom data for other instruments, including:
- F21 Currency
- F6 Insurance, pensions and standardized guarantee schemes
- F7 Financial derivatives
- F8 Other accounts receivable/payable

3. Compilation of who-to-whom: two cases

Case 1: full information on bilateral links

Totals are the simple sum of the components: in the who-to-whom table (rows and columns S11, S12, S13, S1M, S2 and Total), each Total equals the sum of the bilateral cells in its row or column.

This is mostly the case in euro area accounts for deposits and loans.

Case 2: no full information on bilateral links

Sometimes only totals are known for some rows or columns

and/or totals do not come from the same source as components

and/or some bilateral links are missing

=> Need to estimate

This is the case in the euro area accounts for (i) short-term and long-term debt securities, (ii) listed shares, and (iii) investment fund shares.

In particular:

- Full details matching with totals are available only for the MFI sector

- Totals are available for Government assets and liabilities

- Details (from NCB reporting, generally based on Securities Holdings Statistics) are available for other components, but they do not necessarily match with the other available totals

4. Who-to-whom balancing in practice

Starting point: already balanced, but still possibly wrong!

CASE 1: totals are the simple sum of the interior

Loans: wrong version

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S               110.3     0.0   -20.5  -139.2    -4.7     5.4    78.3    53.2  -213.7  -131.1
S11     -20.6    13.8     0.0    -0.1    -9.6     0.2     0.0    -1.1     0.1   -24.0     0.0
S12K     20.6   116.0     0.0    -4.6   -70.4    -4.3    -1.3    55.2    48.9  -118.9     0.0
S124    -17.0    -0.9     0.0     0.9    -6.7     0.0     0.0     3.1     1.4   -14.8     0.0
S12O   -117.4     6.2     0.0    -2.9   -76.9     1.8     6.5     1.2     1.3   -54.6     0.0
S128      1.7     0.5     0.0     0.0    -0.6     2.9     0.1    -1.2     0.8    -0.8     0.0
S129      1.4     0.3     0.0     0.0     4.5     0.0     0.0    -4.4     0.3     0.7     0.0
S13      41.2    17.8     0.0     0.1     2.2     0.2     0.1    21.7     0.4    -1.4     0.0
S1M       1.3     1.1     0.0     0.0     0.5    -0.1     0.0    -0.1    -0.1     0.0     0.0
S2      -42.2   -44.5     0.0   -14.0    17.8    -5.4     0.0     3.8     0.0     0.0
SX     -131.1     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

Be cautious in particular with:

- inter-company loans
- sector allocation from each data source (esp. where estimations are made)
- instrument allocation, e.g. trade credits versus loans

Loans: improved version

            S     S11    S12K    S124    S12O    S128    S129     S13     S1M      S2      SX
S               145.8     0.0   -22.4  -125.9    -6.4     5.4    82.8    58.1  -152.6   -15.1
S11      16.2    -6.6     0.0     0.1   -23.4     0.3     0.1    -0.9     0.1    46.5     0.0
S12K     15.7   114.3     0.0    -4.5   -70.8    -3.6    -1.3    58.5    49.0  -125.9     0.0
S124      2.6    -0.6     0.0     1.0    -3.9     0.0     0.0     2.8     3.3     0.0     0.0
S12O    -89.9    -2.0     0.0    -3.5   -24.3     0.3     6.5     0.9     3.2   -70.9     0.0
S128      4.0     2.8     0.0    -0.1     0.3     2.0     0.1    -1.1     1.8    -1.8     0.0
S129      1.7     0.3     0.0     0.0     4.5     0.0     0.0    -4.3     0.3     0.8     0.0
S13      40.4    17.9     0.0     0.1     0.4     0.1     0.1    22.9     0.3    -1.3     0.0
S1M       0.7     0.4     0.0     0.0     0.4    -0.1     0.0     0.0     0.1     0.0     0.0
S2       -6.6    19.1     0.0   -15.3    -9.1    -5.4     0.0     4.1     0.0     0.0
SX      -15.1     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

S S11 S12K S124 S12O S128 S129 S13 S1M S2

S 18.5 23.0 1.0 -59.1 -0.5 0.0 63.5 0.0 69.5 Liab:116.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 1.4 0.0 0.0 35.7 0.0 30.4 -35.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -81.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 -11.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX Total assets: 55.7 1.2 3.0 0.0 11.2 -0.4 0.0 0.1 0.0 0.0

Sum of inner components: 101.0

4. Who-to-whom balancing in practice

Starting point: components and totals come from different sources

EXAMPLE 2: totals and inner components have different sources

Debt securities
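When totals and inner components come from different sources, each row carries an unallocated residual, which the debt securities table records in column SX. A minimal sketch for row S12K follows; the variable names are illustrative assumptions.

```python
# Row S12K of the debt securities table: the total comes from one source,
# the inner components (counterpart sectors S11 ... S2) from another.
total_s12k = 54.9
inner_s12k = [6.8, 16.3, 0.0, 1.4, 0.0, 0.0, 35.7, 0.0, 30.4]

# The unallocated residual is the value the table shows in column SX.
residual = round(total_s12k - sum(inner_s12k), 1)
print(residual)
```

The result reproduces the SX entry of -35.7 for S12K, the largest row gap in the table and therefore a natural place to start investigating.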

4. Who-to-whom balancing in practice

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -59.1 -0.5 0.0 63.5 0.0 69.5 116.0

S11 0.9

S12K 54.9

S124 37.8

S12O -81.0

S128 20.0

S129 4.5

S13 -4.3

S1M -7.3

S2 30.4

SX 55.7

First step: balancing total assets and liabilities by sector

4. Who-to-whom balancing in practice

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 0.0 0.0 0.0 -20.0 0.0 0.0 0.0 0.0 -15.0 -35.0

S11 0.0

S12K 0.0

S124 0.0

S12O 20.0

S128 0.0

S129 0.0

S13 0.0

S1M 0.0

S2 0.0

SX 20.0

First step: balancing total assets and liabilities by sector – 3 adjustments

4. Who-to-whom balancing in practice
First step: balancing total assets and liabilities by sector – 3 adjustments

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -79.1 -0.5 0.0 63.5 0.0 54.5 81.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 1.4 0.0 0.0 35.7 0.0 30.4 -35.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -61.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 9.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX 75.7 1.2 3.0 0.0 -8.8 -0.4 0.0 0.1 0.0 -15.0
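The first step touches only the sector totals. The sketch below applies the three adjustments from the slides (S12O liabilities -20.0, S2 liabilities -15.0, S12O assets +20.0) and recomputes the overall gap; the dictionary layout is an illustrative assumption.

```python
# Totals before the first step, taken from the debt securities table.
liab_totals = {"S12O": -59.1, "S2": 69.5}  # issuer-side totals that are adjusted
asset_totals = {"S12O": -81.0}             # holder-side total that is adjusted
total_liabilities, total_assets = 116.0, 55.7

# The three adjustments shown on the slides.
liab_adj = {"S12O": -20.0, "S2": -15.0}
asset_adj = {"S12O": 20.0}

for sector, adj in liab_adj.items():
    liab_totals[sector] = round(liab_totals[sector] + adj, 1)
for sector, adj in asset_adj.items():
    asset_totals[sector] = round(asset_totals[sector] + adj, 1)

total_liabilities = round(total_liabilities + sum(liab_adj.values()), 1)
total_assets = round(total_assets + sum(asset_adj.values()), 1)
gap = round(total_liabilities - total_assets, 1)
print(liab_totals, asset_totals, total_liabilities, total_assets, gap)
```

The gap between total liabilities and total assets falls from 60.3 to 5.3, which is left for the later steps.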



S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 0.0 0.0 0.0 -20.0 0.0 0.0 0.0 0.0 -15.0 -35.0

S11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S12K 0.0 0.0 0.0 0.0 -15.0 0.0 0.0 0.0 0.0 -15.0 30.0

S124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S12O 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0

S128 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S129 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S1M 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

S2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

SX 20.0 0.0 0.0 0.0 -5.0 0.0 0.0 0.0 0.0 0.0

4. Who-to-whom balancing in practice
Second step: adjustments within the matrix, where the main gaps are identified

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.5 23.0 1.0 -79.1 -0.5 0.0 63.5 0.0 54.5 81.0

S11 0.9 -0.8 -0.1 0.0 0.9 0.0 0.0 0.4 0.0 -1.0 1.3

S12K 54.9 6.8 16.3 0.0 -13.6 0.0 0.0 35.7 0.0 15.4 -5.7

S124 37.8 8.2 8.7 0.0 2.5 -0.2 0.0 -0.9 0.0 19.4 0.0

S12O -61.0 1.1 -3.5 0.0 -87.8 0.2 0.0 5.7 0.0 14.3 9.0

S128 20.0 0.1 2.8 0.0 6.3 -0.1 0.0 7.0 0.0 4.0 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.5 0.0 1.6 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.1 0.0 -0.8 0.1

S1M -7.3 -0.3 -4.7 0.0 -1.2 0.0 0.0 -2.5 0.0 1.5 0.0

S2 30.4 1.6 -0.9 1.0 8.9 0.0 0.0 19.7 0.0 0.0

SX 75.7 1.2 3.0 0.0 6.2 -0.4 0.0 0.1 0.0 0.0

4. Who-to-whom balancing in practice
Result of the second step: only small gaps remain
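The second-step adjustments can be traced for row S12K, where the cells against S12O and S2 are each reduced by 15.0. A minimal sketch, with illustrative variable names:

```python
sectors = ["S11", "S12K", "S124", "S12O", "S128", "S129", "S13", "S1M", "S2"]
total = 54.9
row = dict(zip(sectors, [6.8, 16.3, 0.0, 1.4, 0.0, 0.0, 35.7, 0.0, 30.4]))

# Second-step adjustments to row S12K, as shown in the adjustment table.
for counterpart in ("S12O", "S2"):
    row[counterpart] = round(row[counterpart] - 15.0, 1)

# Recompute the unallocated residual (column SX) after the adjustment.
residual = round(total - sum(row.values()), 1)
print(row["S12O"], row["S2"], residual)
```

The residual shrinks from -35.7 to -5.7, matching the SX entry for S12K in the table after the second step.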


S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S -0.1 0.0 0.0 -0.7 0.0 0.0 0.0 0.0 -1.1 -1.9

S11 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.6 -1.3

S12K 0.0 0.0 0.0 0.0 -0.8 0.0 0.0 -4.1 0.0 -0.8 5.7

S124 1.4 0.3 0.3 0.0 0.2 0.0 0.0 0.8 0.0 -0.1 0.0

S12O 0.1 0.2 0.4 0.0 7.1 -0.4 0.0 0.8 0.0 1.0 -9.0

S128 0.7 0.5 0.6 0.0 -0.1 0.0 0.0 1.3 0.0 -1.6 0.0

S129 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 -0.1 0.0

S13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.1

S1M 0.0 0.0 0.3 0.0 -0.1 0.0 0.0 -0.1 0.0 -0.1 0.0

S2 1.1 0.1 1.4 0.0 -0.9 0.0 0.0 0.5 0.0 0.0

SX 3.4 -1.2 -3.0 0.0 -6.2 0.4 0.0 -0.1 0.0 0.0

4. Who-to-whom balancing in practice
Third step: automatic adjustment to eliminate the remaining discrepancies

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX

S 18.4 23.0 1.0 -79.7 -0.5 0.0 63.5 0.0 53.4 79.1

S11 0.9 -0.7 0.0 0.0 1.0 0.0 0.0 1.1 0.0 -0.4 0.0

S12K 54.9 6.8 16.3 0.0 -14.4 0.0 0.0 31.6 0.0 14.6 0.0

S124 39.2 8.5 9.0 0.0 2.7 -0.2 0.0 -0.1 0.0 19.3 0.0

S12O -60.9 1.3 -3.1 0.0 -80.7 -0.2 0.0 6.5 0.0 15.3 0.0

S128 20.8 0.5 3.4 0.0 6.2 -0.1 0.0 8.3 0.0 2.4 0.0

S129 4.5 0.6 0.6 0.0 0.2 0.0 0.0 1.6 0.0 1.5 0.0

S13 -4.3 0.1 0.8 0.0 -1.4 0.0 0.0 -3.0 0.0 -0.8 0.0

S1M -7.3 -0.3 -4.4 0.0 -1.3 0.0 0.0 -2.7 0.0 1.5 0.0

S2 31.4 1.7 0.5 1.0 8.0 0.0 0.0 20.2 0.0 0.0 0.0

SX 79.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

4. Who-to-whom balancing in practice
Result of the third step: a balanced who-to-whom table

Note: some algorithms can also be used to balance matrices, with due caution
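The slides do not say which algorithm performs the automatic adjustment. As one simple possibility, the sketch below spreads a row's remaining residual over its cells in proportion to their absolute size. This rule is an assumption chosen for illustration: it is not the ECB procedure, and its results differ slightly from the adjustments shown on the slide.

```python
def spread_residual(total, cells):
    """Allocate total - sum(cells) across cells in proportion to their absolute size."""
    residual = total - sum(cells)
    weight = sum(abs(c) for c in cells)
    if weight == 0:
        return list(cells)  # nothing to scale against; leave the row untouched
    return [c + residual * abs(c) / weight for c in cells]

# Row S12O after the second step: total -61.0, residual 9.0 in column SX.
s12o_total = -61.0
s12o_cells = [1.1, -3.5, 0.0, -87.8, 0.2, 0.0, 5.7, 0.0, 14.3]
balanced = spread_residual(s12o_total, s12o_cells)
print([round(c, 1) for c in balanced])
```

This balances a single row only. A full matrix must satisfy row and column totals at once, so a rule like this would have to be iterated over both dimensions, and negative cells make proportional scaling fragile. This is one reason for the caution mentioned in the note.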


4. Who-to-whom balancing in practice
EXAMPLE 3: balancing price/other changes

Listed shares

Take into account the ratio of price change to opening position. But beware of a possible misallocation of volume changes.

S S11 S12K S124 S12O S128 S129 S13 S1M S2 SX
S 716.9 58.7 0.0 123.9 21.3 0.0 0.0 0.0 440.4 1,361.2
S11 181.4 176.2 1.2 0.0 2.7 1.3 0.0 0.0 0.0 0.0 0.0
S12K 18.0 7.2 2.4 0.0 3.3 0.9 0.0 0.0 0.0 4.2 0.0
S124 485.1 96.4 8.0 0.0 6.0 5.5 0.0 0.0 0.0 369.2 0.0
S12O 63.8 35.9 7.3 0.0 18.2 1.4 0.0 0.0 0.0 1.0 0.0
S128 19.4 9.9 0.5 0.0 1.8 0.8 0.0 0.0 0.0 6.3 0.0
S129 31.2 5.8 0.3 0.0 0.6 0.2 0.0 0.0 0.0 24.3 0.0
S13 41.1 34.1 2.4 0.0 1.2 0.4 0.0 0.0 0.0 3.0 0.0
S1M 131.1 74.4 9.1 0.0 4.9 10.4 0.0 0.0 0.0 32.3 0.0
S2 390.2 276.9 27.7 0.0 85.4 0.2 0.0 0.0 0.0 0.0 0.0
SX 1,361.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

POSITIONS
S S11 S12K S124 S12O S128 S129 S13 S1M S2
S 4,986.7 386.1 0.0 1,037.3 177.4 NA 0.2 NA 2,958.3
S11 1,514.0 1,157.7 29.5 0.0 54.8 9.7 NA 0.0 NA 105.2
S12K 247.4 137.1 19.6 0.0 27.2 7.9 NA 0.0 NA 232.1
S124 3,195.6 764.7 72.0 5.0 68.8 37.8 NA 0.0 NA 2,168.0
S12O 867.1 223.8 24.5 0.0 114.1 5.3 NA 7.8 NA 388.6
S128 182.9 101.6 4.6 0.0 14.6 20.1 NA 0.0 NA 54.5
S129 217.9 49.8 2.3 0.0 3.9 1.6 NA 0.2 NA 167.3
S13 295.9 240.4 25.0 0.0 9.4 3.3 NA 0.0 NA 16.0
S1M 891.2 570.8 53.1 0.0 84.2 31.0 NA 0.0 NA 174.6
S2 2,729.4 1,830.5 132.4 0.0 548.3 23.1 NA 0.1 NA

PERCENTAGES
S S11 S12K S124 S12O S128 S129 S13 S1M S2
S 14.4 15.2 0.0 11.9 12.0 NA 0.0 NA 14.9
S11 12.0 15.2 3.9 NA 4.9 13.5 NA -62.3 NA 0.0
S12K 7.3 5.3 12.0 NA 12.1 11.3 NA 18.8 NA 1.8
S124 15.2 12.6 11.0 0.0 8.7 14.6 NA 0.4 NA 17.0
S12O 7.4 16.0 29.7 NA 16.0 25.9 NA 0.0 NA 0.3
S128 10.6 9.8 12.0 NA 12.0 4.2 NA NA NA 11.6
S129 14.3 11.7 12.4 NA 14.2 15.0 NA 0.0 NA 14.5
S13 13.9 14.2 9.5 NA 12.5 12.3 NA 0.0 NA 18.9
S1M 14.7 13.0 17.1 NA 5.8 33.7 NA NA NA 18.5
S2 14.3 15.1 20.9 NA 15.6 1.0 NA 0.0 NA
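The percentages are the price changes divided by the corresponding positions. A minimal sketch for holder sector S11, using three cells from the tables above; the dictionary layout is an illustrative assumption.

```python
# Price changes and positions for holder sector S11 (listed shares):
# the row total, then the cells against issuers S11 and S12O.
price_change = {"total": 181.4, "S11": 176.2, "S12O": 2.7}
position = {"total": 1514.0, "S11": 1157.7, "S12O": 54.8}

# Ratio of price change to position, in percent, rounded as in the table.
ratios = {k: round(100 * price_change[k] / position[k], 1) for k in price_change}
print(ratios)
```

Cells whose ratio lies far from the ratio for the instrument as a whole, such as the -62.3 in row S11, are candidates for a misallocated volume change rather than a genuine price effect.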

5. Main features of euro area/national who-to-whom tables

Data types: stocks, transactions, other changes
Residency of holder: euro area and EU27 countries
Sector of holder: 11-12 sectors (central banks available only for some instruments)
Residency of issuer: euro area / non-euro area
Sector of issuer: 10-11 sectors for euro area issuers; no sector breakdown for non-euro area issuers
Instruments: securities (except unlisted shares), loans and deposits
Series length: 2013Q4 to 2023Q1 (securities); 1999Q1 to 2023Q1 (loans and deposits)
Timeliness: T+120 (securities, euro area accounts); T+102 (country data); T+94 (deposits and loans, euro area accounts)

6. Data access and visualisation

Who-to-whom data lead to a significant increase in the volume of data. This requires statisticians and institutions to develop data visualisation tools that help users.

Quarterly press release on economic and financial developments in the euro area by institutional sector - full release - Annex Table 2.2 (households) and Table 3.2 (non-financial corporations): http://www.ecb.europa.eu/press/pr/stats/ffi/html/index.en.html

6. Data access and visualisation

Euro area accounts report in the ECB Statistical Data Warehouse (SDW): http://sdw.ecb.europa.eu/reports.do?node=1000005335

EURO AREA
Who-to-whom detail

4.1.2 Short-term debt securities broken down by counterpart sector (EUR billions at current prices)

1. Transactions

7. Exercise 1

What questions can who-to-whom data answer?

• Did the government grant a large volume of loans to NFCs in 2020?

• Did the government issue a large amount of debt securities?

• Did NFCs receive significant financing from non-bank financial institutions?

• Did households increase their deposits in 2020?

7. Exercise 2

Transactions in long-term debt securities

S11: NFCs; S12K: banking sector, including the central bank; S124: investment funds; S12O: other financial subsectors; S128: insurance corporations; S129: pension funds; S13: general government; S1M: households and non-profit institutions serving households; S2: rest of the world

S S11 S12K S124 S12O S128 S129 S13 S1M S2
S 64.0 -78.4 -1.2 84.3 0.7 0.0 186.1 0.0 363.0
S11 -10.2 3.0 -6.4 0.0 -1.5 0.2 0.0 1.1 0.0 -6.6
S12K 410.1 70.8 -4.1 0.0 32.5 -1.0 0.0 332.9 0.0 -21.0
S124 348.0 16.0 31.6 0.0 30.9 1.5 0.0 -47.2 0.0 315.1
S12O 32.6 2.7 1.3 0.0 33.8 -0.1 0.0 -42.1 0.0 36.9
S128 10.7 8.3 -24.8 0.0 -1.8 0.8 0.0 11.9 0.0 16.3
S129 70.6 1.2 5.6 0.0 -1.2 0.0 0.0 39.8 0.0 25.1
S13 -26.5 -1.0 -0.7 0.0 -7.1 -0.4 0.0 -15.3 0.0 -1.9
S1M -80.9 -3.1 -79.7 0.0 -1.1 -0.3 0.0 4.1 0.0 -0.8
S2 -135.9 -34.1 -1.3 -1.2 -0.3 0.0 0.0 -99.1 0.0 0.0

7. Exercise 2

Questions on the table on the previous slide:

• Which sectors were the main net purchasers of debt securities, and from which (euro area) sectors did they buy them?

• Conversely, which sectors were net sellers of debt securities during this period?

• Did the net sellers sell to the net purchasers?

• How much was issued by euro area residents during this period?

• How large were purchases of euro area debt securities by euro area residents?

7. Exercise 2

Answers:

• Banks (including central banks) purchased a large amount of securities, issued mainly by government.

• Investment funds (S124) purchased a large amount of long-term debt securities, issued mainly by non-euro area residents (S2).

• Conversely, non-euro area investors (S2) and households (S1M) were net sellers of debt securities during this period.

7. Exercise 2

Answers:

• Non-residents sold mainly government debt securities, while households sold securities issued by the banking sector.

• However, we do not know which sectors "traded" with whom.

• Total net issuance of debt securities by euro area residents reached EUR 255 billion, and net purchases by residents of securities issued by euro area residents reached EUR 391 billion.
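The two totals in the last answer can be reproduced from the exercise table. The sketch below uses row S (net issuance by issuer sector) and the total of row S2 (net purchases by non-residents); the variable names are illustrative assumptions.

```python
# Row S of the exercise table: net issuance by issuer sector.
issuers = ["S11", "S12K", "S124", "S12O", "S128", "S129", "S13", "S1M", "S2"]
net_issuance = dict(zip(issuers, [64.0, -78.4, -1.2, 84.3, 0.7, 0.0, 186.1, 0.0, 363.0]))

# Total of row S2: net purchases by non-residents. The S2-on-S2 cell is 0.0,
# so the whole amount relates to securities issued by euro area residents.
s2_total_purchases = -135.9

issued_by_residents = sum(v for s, v in net_issuance.items() if s != "S2")
bought_by_residents = issued_by_residents - s2_total_purchases
print(round(issued_by_residents, 1), round(bought_by_residents, 1))
```

The results, 255.5 and 391.4, agree with the EUR 255 billion and EUR 391 billion quoted in the answer, up to rounding in the published cells.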


Overview of the AnigeD Project and Potentials of Dataset Synthetization for Official Statistics and Research, DESTATIS

georeferenced data, integrated data, confidentiality procedures, complex datasets, dataset synthetization


UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Overview of the AnigeD Project and Potentials of Data Synthesis for Official Statistics and Research

Yannik Garcia Ritz, Safiyye Aydin, Jannek Mühlhan, Markus Zwick (Federal Statistical Office, Germany)

Abstract

Statistical Disclosure Control for integrated and georeferenced data is a new challenge for statistical institutes. New digital

data in combination with traditional data offer many new analysis possibilities. Moreover, these complex datasets are

usually georeferenced in a very fine-grained way. Traditional confidentiality procedures reach their limits here. Destatis,

the German Federal Statistical Office, is working together with various universities on the further development of existing

procedures in order to ensure the protection of individuals even for complex data. The lecture will present the first results

of the project "Anonymization for integrated and georeferenced Data" (AnigeD) funded by the German Ministry of

Research.

As part of AnigeD, Destatis also investigates the synthesis of datasets from population and economic statistics, to examine their potential for data provision to the research community in response to the growing scientific interest in official data. Confidentiality requirements lead to several measures that protect respondent units. Consequently, there is a trade-off between the level of anonymity and the analytic potential of the datasets provided, which may not fully satisfy the needs of the scientific community. Current research on data synthesis regards synthetic datasets as a possible solution to this problem because of their artificial nature. In the research presented here, an official statistics dataset is synthesized and evaluated with regard to its analytical utility and its level of confidentiality. Furthermore, an evaluation of the ease of use of certain methods of providing synthetic datasets is presented.


1 Introduction

Data-based information plays a central role in politics, business, science and public life. With digitization and the

exponential growth of stored data, as well as new analytical methods such as machine learning, the possibilities

for evidence-based decision making have expanded and evolved significantly.

The COVID-19 crisis highlighted that many valuable data sets exist in principle, but are often held in a

decentralized manner in different silos by different actors, whether in companies or public institutions. At the

same time, advances in big data, also referred to as non-traditional data, have shown that the greatest value comes

especially when different non-traditional data sources are combined with traditional data, such as surveys and

administrative data. Individual data sets are often only pieces of a puzzle, unable to paint a complete picture.

A key challenge in integrating disparate data sets from different data custodians is the protection of personal

privacy and trade secrets within organizations. This currently hinders both the wider use of data as a product and

the use of integrated data in policy advice and scientific research. Methods for anonymization and statistical

confidentiality face the challenge of finding a compromise. On the one hand, they need to protect the information

of the data subjects, while on the other hand, the chosen methods should still offer sufficient analysis and

information potential for the anonymized data. Anonymization and confidentiality of individual data go hand in

hand with information reduction.

In the past it has been shown that common anonymization strategies for individual data in economic statistics led

to de facto or absolutely anonymized data sets, which were severely limited for scientific analyses due to the

reduced or even distorted information potential. Anonymization and pseudonymization of data, which limits the

risk of detection to an acceptable level while preserving sufficient analytical potential, is therefore essential for

wider use and value creation.

The AnigeD competence cluster is part of the "Research Network Anonymization for Secure Data Use" of the

German Federal Ministry of Education and Research (BMBF) within the framework of the Federal Government's

IT security research program "Digital. Secure. Sovereign". It is funded by the European Union –

NextGenerationEU. The thematic focus, which is supported by various research strands, is the further and new

development of strategies for the protection of personal and company-related data when using complex integrated

data sets. Not only the integration of different data via direct identifiers or probabilities is relevant, but also the

integration and linking of data via regional information in the form of georeferencing.

The AnigeD competence cluster is divided into the following research areas

• Formalization of substantive criteria for the success of anonymization provided by the legal system.

• Anonymization through synthetic data

• Anonymization of georeferenced data

• Evaluation of anonymized data according to formal criteria.

• Open software tools for anonymization


This paper presents insights from the research area of anonymization through data synthesis. By synthesizing parts of the Structure of Earnings Survey (SES) 2018, a non-georeferenced database, it aims to gain further insights into the potential of data synthesis for anonymizing and providing official data.

Previous research of Loske & Wolfanger (2019), Hafner & Lenz (2011) dealt with the synthesis of official data

structural files, while Templ (2017) synthesized simulated data of the SES 2014. In contrast to the mentioned

prior research approaches, the present work deals with the partial data synthesis of the on-site material of the

company and employee datasets of the Structure of Earnings Survey 2018.

In addition to increasing the anonymity of respondent units through data synthesis, the potentially provided

synthetic data needs to comply with high-quality requirements placed on official data (Zwick, 2016). Thus, the

generated partially synthetic data material is evaluated concerning the attained global and analytic utility. Finally,

this paper will provide a weighted evaluation of the data synthesis with respect to the anonymization and analysis

potentials.

2 Background

Official data products and statistics face a steady increase in demand from the scientific community and the public in general (Allin, 2021). Anonymizing data in such a way that the remaining information does not allow any conclusions to be drawn about individual data subjects (be they persons, households or companies), but still contains sufficient information potential, is a core concern of every data producer, whether private or public. In addition to various legal regulations (EU GDPR (DSGVO), BDSG, BStatG), the quality of the data products is of particular importance. Methods for anonymization or statistical confidentiality have to resolve a conflict of objectives. On the one hand, the information provided by the data subjects must be protected; on the other hand, the procedures must be chosen in such a way that the anonymized data still have sufficient potential for analysis or information. Anonymizing individual data and guaranteeing their confidentiality generally involves a loss of information. The Federal Statistical Office already has extensive experience in anonymizing large amounts of data (e.g., Ronning et al. (2005), Hundepool et al. (2012), Templ (2017)). In general, the data provided are anonymized to a greater or lesser extent. Where fewer anonymization measures are applied, access to the data is made more restrictive (Rothe, 2015).

The demand for greater availability and transparency of data, while maintaining confidentiality and data

protection, can only be met by innovative methods of data processing, preparation and delivery. The use of

classical anonymization methods reaches its limits with increasing complexity and number of data usage requests.

Synthetic data offer opportunities to optimize the aspects of anonymization because respective measures can be

integrated into the synthesis process (Drechsler & Haensch, 2023).


For on-site access, the full data material, except for direct identifiers, is provided to the scientific community.

However, scientific data users must use the data either physically (safe center usage) or virtually (remote

execution) at the providing institution. In contrast, off-site data can be used in the scientific institution of the

contractor (Rothe, 2015).

The methodology of synthetic data generation is based on the principles of multiple imputation (Rubin, 1993).

However, instead of only estimating values to replace missing values, false declarations etc., estimations are used

to replace some or all variables of the original dataset (Little, 1993). Thus, there is a methodological distinction

between the concepts of full (Rubin, 1993) and partial synthesis (Little, 1993). Only sensitive variables or

variables which increase the reidentification risks are synthesized as part of partial synthesis. Little (1993) argues

that focusing only on sensitive or reidentification risk increasing variables should prevent the analysis quality

from being reduced too much by the estimation character of the synthesis approach. The possible, integrable

anonymization measures can make an important contribution to balancing the protection of the respondents and

ensuring of the analysis potential (Drechsler & Haensch, 2023).
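The idea of partial synthesis can be illustrated with a deliberately small sketch: only the sensitive variable is replaced by draws from a model fitted to the original data, while all other variables stay as observed. The linear model, the variable names and the toy records below are assumptions made for illustration. They are not the method or the data used in the project, and a proper multiple-imputation approach would also draw the model parameters and release several synthetic datasets.

```python
import random

def partially_synthesize(records, predictor, sensitive, seed=0):
    """Replace only the sensitive variable by draws from a fitted linear model."""
    rng = random.Random(seed)
    xs = [r[predictor] for r in records]
    ys = [r[sensitive] for r in records]
    n = len(records)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    # The residual standard deviation drives the noise in the synthetic draws.
    resid_var = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    sigma = resid_var ** 0.5
    synthetic = []
    for r in records:
        draw = rng.gauss(intercept + slope * r[predictor], sigma)
        synthetic.append({**r, sensitive: draw})  # other variables stay as observed
    return synthetic

# Toy records, invented for illustration: tenure in years and monthly earnings.
noise = [30, -45, 10, 60, -20, -35, 25, 5, -50, 20]
original = [{"tenure": t, "earnings": 2000 + 150 * t + e}
            for t, e in zip(range(1, 11), noise)]
synthetic = partially_synthesize(original, "tenure", "earnings")
print(synthetic[0])
```

Because the non-sensitive variable is untouched, analyses that use only that variable are unaffected, which is the argument for partial synthesis attributed to Little (1993) above.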

The synthetic data should reflect the structure and relationships of the original data as closely as possible. At the same time, the level of anonymity should be higher than that of the original on-site material (Reiter, 2023). The Statistical Offices of the Federation and the Federal States are obliged to protect the respondent units according to Section 16 (1) of the Federal Statistics Act. At the same time, they must comply with the scientific privilege derived from Section 16 (6) of the Federal Statistics Act. Thus, the Statistical Offices of the Federation and the Federal States founded the Research Data Centers (RDCs) in 2001 to enable scientific access to official data (Zühlke, Zwick, & Scharnhorst, 2003).

As part of the AnigeD project and competence cluster, one working package deals with the assessment of the

supply potential of synthetic on-site material. At the international level, there are already first applications by the

national statistical authorities of New Zealand, Canada, Scotland and the United States of America, among

others.1 The use of synthetic data to anonymize personal data has found wider application so far, as documented

in the literature mentioned above (Burnett-Isaacs et al., 2021).

The cluster builds on previous research that has addressed, among other things, the de facto anonymity of economic statistics. In the case of economic statistics data, these methods are sometimes limited by oligopolistic market structures, and there have been few applications for georeferenced data. Georeferenced data offer new possibilities for merging heterogeneous data. According to § 10 section 3 BStatG, individual statistical data with regional information can be integrated at hectare level. This allows for detailed regional information, but here too only a few anonymization approaches have been developed for such integrated data. In this respect, AnigeD is expected to provide new insights that will be of great interest, especially for the commercial use of the data.

Concrete preliminary work has been done in the area of mobile phone signal data in recent years. Since 2017, the Federal Statistical Office has been researching possible applications of mobile phone signal data in official statistics (Hadam, Schmid, & Simm, 2020). Within this framework, several studies have been carried out on different application purposes and quality aspects. This has resulted in several modular software packages for geolocation, deduplication and aggregation of activities (see 'Mobile network data' of the ESSnet Big Data I and II project).

1 Burnett-Isaacs et al. (2021)

In addition, the European Statistical System is working on the concrete technical implementation of privacy-compliant processing of mobile network data and on process models for cooperation between private data providers and official statistics. The implementation of such a process offers official statistics, and thus also research, society and politics, the possibility of making long-term statements on longitudinal changes in population distribution and mobility - e.g. long-term intra-German migration patterns, analysis of the effects of new forms of work and the development of sustainable means of transport.

Within the framework of the research project "Anonymization of official statistics through synthetic data", three

lines of action are highlighted. The first line of action focuses on exploring the possible uses of synthetic data for

the RDCs of the Statistical Office of the Federation and the Federal States. In this context, methods for the (partly)

automated creation of synthetic datasets will be developed and tested. These synthetic datasets will be used in

various applications, such as data exploration, writing and testing of analysis programs, teaching, and

anonymization of particularly sensitive features and geocoordinates. It will also explore whether synthetic data

can expand the range of data recipients, such as data journalists.

The second line of action looks at the potential of high-quality synthetic datasets for the way public and private data producers work to produce and publish aggregated results. Here we explore whether synthetic or semi-synthetic data can be used directly in the production of results to resolve trade-offs between protecting confidentiality and making statistical results widely and flexibly available.

The third line of action will systematically compare different approaches to synthetic data production. In particular, it will investigate the extent to which the methods developed are also suitable for statistical analyses such as regression analyses, and how statistical approaches can be used in the context of machine learning and vice versa. In addition, existing approaches will be methodologically refined to address possible weaknesses, e.g. in the use of deep learning methods from computer science.

So far, the RDCs do not provide synthetic data. Previous research on the synthesis of official data dealt with data structural files (Loske & Wolfanger, 2019; Hafner & Lenz, 2011) or with simulations of official data (Templ, 2017). Even if the extensive use of synthetic data for the direct production of results is not always possible for quality reasons, there are scenarios where the use of synthetic data offers advantages. For all three lines of action, the extent to which synthetic data generation can be standardized, and the effort involved, are crucial.

The project will also develop privacy-preserving record linkage methods that allow geocoordinates to be stored as Bloom filters in individual records and used for linkage or distance calculations. The security of these methods for encoding geocoordinates will be investigated, especially with regard to the problems of statistical confidentiality caused by enriched datasets.
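To make the Bloom filter idea concrete, the sketch below encodes the grid cells around a coordinate into a Bloom filter and compares two filters with the Dice coefficient, a similarity measure commonly used in Bloom-filter record linkage. The tokenization into 100 m cells, the filter size, the number of hash functions and the example coordinates are all assumptions for illustration; the paper does not describe the encoding the project uses.

```python
import hashlib

def bloom_encode(tokens, size=1024, num_hashes=3):
    """Return the set of bit positions switched on by the tokens."""
    bits = set()
    for token in tokens:
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{k}:{token}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits

def dice(a, b):
    """Dice coefficient of two bit sets; 1.0 means identical filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def grid_tokens(x, y, cell=100, radius=1):
    """Tokens for the grid cell containing (x, y) and its direct neighbours."""
    cx, cy = int(x // cell), int(y // cell)
    return [f"{cx + dx}_{cy + dy}"
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]

a = bloom_encode(grid_tokens(4321050.0, 3210980.0))
b = bloom_encode(grid_tokens(4321120.0, 3210990.0))  # neighbouring cell
c = bloom_encode(grid_tokens(4390000.0, 3290000.0))  # far away
print(dice(a, b), dice(a, c))
```

Nearby points share most of their tokens and therefore most of their bits, so similarity in the encoded space acts as a rough proxy for distance. That same property is what makes the security of such encodings a research question.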

The plan is to align the environment term with typical applications or analysis models for the target data and then balance the two (usually conflicting) goals: maximizing the analysis potential and minimizing the risk of re-identification.

The Chair of Statistics at the Department of Economics of the FU Berlin has developed advanced methods for

the analysis of anonymized georeferenced data in cooperation with the company INWT Statistics. The focus of

anonymization is to reduce the accuracy of the georeferenced data in order to make it difficult or impossible to

identify individual units in a dataset. Nevertheless, the dataset should remain usable for content related

evaluations. This subproject deals with the use of anonymized georeferenced data and the limitations of

anonymization. Statistical methods will be developed that both take into account the anonymization process and

enable typical evaluations of georeferenced data. These procedures will be demonstrated for different application

areas. At the same time, user-friendly open source software will be developed for these applications.
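A common way to reduce the accuracy of georeferenced data is to replace each point by the grid cell that contains it. The sketch below snaps projected coordinates (in metres) to the centre of a square cell. The function name, the cell sizes and the example coordinates are illustrative assumptions, not the procedure developed in the subproject.

```python
import math

def coarsen(x, y, cell=100.0):
    """Replace a point by the centre of the square grid cell that contains it."""
    gx = math.floor(x / cell) * cell + cell / 2
    gy = math.floor(y / cell) * cell + cell / 2
    return gx, gy

fine = coarsen(4321057.3, 3210981.9)            # 100 m (hectare) cells
coarse = coarsen(4321057.3, 3210981.9, 1000.0)  # 1 km cells
print(fine, coarse)
```

Larger cells make individual units harder to identify but blur spatial patterns, which is why the evaluation methods described above need to account for the anonymization step.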

Statistical procedures will be developed that allow for smooth map representations that are not bound to a specific

area system, but are still compatible with the anonymized area values. The aim is to adapt the statistical evaluation

of georeferenced data to the anonymization procedure and to make the use of anonymized georeferenced data

sets more efficient. To this end, adapted statistical estimation procedures will be developed and supported by

open source software to facilitate their use by a wide range of users.

In order to make sound predictions about the capabilities of a potential attacker, a consistent formalization of the

material criteria specified by the legal system is required. To accomplish this legally and technically challenging

task, the DUV (German University of Administrative Sciences Speyer, German: Deutsche Universität für

Verwaltungswissenschaften Speyer) adopts a research approach that measures the extent to which the provision

of a data set increases the likelihood that an attacker will obtain new information about the data subject. This

approach is based on the recognition that any natural person is already exposed to some basic risk from data that

is generally accessible or available to a potential attacker, and that this risk remains even if the entity holding the

data refrains from publishing or sharing it.

Another question that the DUV addresses is how the publication or dissemination of the dataset affects the pre-

existing baseline risk. The DUV's approach is to examine existing proposals for measuring risk shift, taking into

account their compatibility with the legal system and practice. In particular, two approaches will be considered:

Differential Privacy (DP) and GDA Score. However, it is not enough to merely measure the shift of the basic

risk. In a third step, the DUV therefore plans to investigate in more detail the maximum extent to which the basic

risk can be shifted so that the data-holder can legitimately assume that it is only passing on anonymized data.

The software system Diffix will be used as a demonstrator for the processing, evaluation and analysis of the data

within the framework of the research tasks. It is used for the technical implementation of the anonymization

methods developed in the cluster. The aim is to make the best use of Diffix as a stand-alone application and as

part of other programming languages such as Python and R, or anonymization packages, to enable feasible

solutions.

Aims:

AnigeD aims to advance current anonymization methods and to identify and implement new solutions for new

problems. This should not only secure but also extend the current state of data access for science. The methods

developed and researched in the cluster will be made available not only to the project partners involved, but also

to data-holding companies. In this way, the developed and new methods can generate added value for the

companies on the one hand, and expand access to company data for science and official statistics on the other.

The main objective of AnigeD is to secure and expand access to complex data while protecting individual

characteristics, and to create greater legal certainty for practitioners. Given the exponential growth of data

volumes and the increasing complexity of data, especially in the context of georeferencing, current strategies for

protecting individual identifiers are reaching their limits. Therefore, a sub-goal of AnigeD is to secure and expand

the supply of (complex) data for science in the research data network of RDCs.

In addition, existing methods will be further developed in cooperation with companies from the data industry and

made available for commercial purposes. In this way, insights and applications developed for science through

public funding of data access will also be opened up for data-driven business models. At the same time, data

from the companies will be made available for use in science and society, with appropriate protection of feature

carriers and trade secrets.

This paper reports first results from the AnigeD project's research on the potential of synthetic data for the scientific community and for the providing RDCs of the Statistical Offices of the Federation and the Federal States. To this end, the company and the employee file of the Structure of Earnings Survey (SES) 2018 serve as the basis for several synthesis approaches, which are evaluated with regard to the disclosure risks and utility of the generated synthetic data. The following section elaborates on the conceptualization of the synthesis approach and the subsequent assessment of disclosure risks and utility.

3 Conceptualization

Following the argumentation of Little (1993), the concept of partial synthesis is used to synthesize the on-site material of the SES 2018. The SES 2018 comprises a company and an employee dataset, both of which are partially synthesized in the present work. Various statistical techniques and machine learning approaches can be used for data synthesis (Drechsler & Haensch, 2023). The findings of Grinsztajn, Oyallon, & Varoquaux (2022) indicate that Classification And Regression Trees (CARTs) outperform conventional statistical techniques and other machine learning approaches on many occasions. Thus, CARTs are predominantly used to synthesize the two on-site datasets of the SES 2018.

Furthermore, data synthesis enables data providers to apply different smoothing approaches as an anonymization measure for variables with highly skewed distributions. The resulting reduction in estimation accuracy increases the level of anonymity (Nowok, Raab, Dibben, Snoke & van Lissa, 2022; Drechsler & Reiter, 2008). Reiter (2005) identifies several reasons for providing multiple synthetic datasets per original dataset. Drechsler (2009) suggests providing at least as many synthetic datasets per original dataset as the number of original datasets. In the present work, five synthetic datasets each are generated from the original company dataset and the original employee dataset. Hence, the minimal criterion of m ≥ r (Drechsler, 2009) is met.

After the partial data synthesis, the generated partially synthetic datasets are checked for disclosure risks. Here, k-anonymity (Sweeney, 2002) is used as one measure: the number of observations violating 2-anonymity or 3-anonymity (k = 2 or k = 3) and the number of high-risk observations are determined (Templ, 2017). These key measures are reported as ratios, with the corresponding key measures of the respective off-site material as the denominator. For baseline evaluations and to enable comparisons, the same is done for the original on-site material of the company and employee datasets of the SES 2018.

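$$\hat{r}_k = \frac{\hat{p}_k}{1 - \hat{p}_k} \log\left(\frac{1}{\hat{p}_k}\right) \quad \text{for } f_k = 1 \tag{1}$$

$$\hat{r}_k = \frac{\hat{p}_k}{1 - \hat{p}_k} - \left(\frac{\hat{p}_k}{1 - \hat{p}_k}\right)^2 \log\left(\frac{1}{\hat{p}_k}\right) \quad \text{for } f_k = 2 \tag{2}$$

$$\hat{r}_k = \frac{\hat{p}_k}{f_k - (1 - \hat{p}_k)} \tag{3}$$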

Observations are classified as high-risk observations if their estimated individual risk $\hat{r}_k$ is higher than 10 % and larger than the median of $\hat{r}_k$ plus a factor δ times the median absolute deviation of $\hat{r}_k$ (δ ≥ 2; Templ, 2017).
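The risk measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the SES 2018 evaluation: all function names are hypothetical, the estimated ratio that enters equations (1) to (3) is taken as a given input (`p_k`), the branch for equation (3) is assumed to apply whenever f_k > 2, and the median absolute deviation is computed without a scaling constant.

```python
from collections import Counter
from math import log
from statistics import median


def k_anonymity_violations(keys, k):
    """Count records whose combination of key variables occurs fewer than k times."""
    freq = Counter(keys)
    return sum(1 for key in keys if freq[key] < k)


def individual_risk(f_k, p_k):
    """Estimated individual risk following equations (1) to (3).

    f_k: sample frequency of the record's key combination (f_k >= 1).
    p_k: estimated ratio entering the equations, with 0 < p_k < 1.
    Assumption: equation (3) is used for every f_k > 2.
    """
    odds = p_k / (1.0 - p_k)
    if f_k == 1:
        return odds * log(1.0 / p_k)
    if f_k == 2:
        return odds - odds ** 2 * log(1.0 / p_k)
    return p_k / (f_k - (1.0 - p_k))


def high_risk_flags(risks, delta=2.0, floor=0.10):
    """Flag records whose risk exceeds both 10 % and median + delta * MAD."""
    med = median(risks)
    mad = median(abs(r - med) for r in risks)  # unscaled MAD (assumption)
    threshold = med + delta * mad
    return [r > floor and r > threshold for r in risks]
```

Called with a list of key-variable tuples, `k_anonymity_violations(keys, 2)` and `k_anonymity_violations(keys, 3)` give the counts that the text compares between on-site and off-site material; the ratio to the off-site count would be formed afterwards.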

Moreover, the generated partially synthetic data is evaluated regarding the number of expected random matches

as well as the absolute number of true and false matches to the original data (Drechsler & Reiter, 2008).

• Expected Match Risk for a selection based on a random guess:

$$\sum_{j \in T} \left(\frac{1}{c_j}\right) I_j \tag{4}$$

• True Match Rate for true matches of targets $K_j$ among all matches identified within $c_j$ units exemplarily examined:

$$\frac{\sum_{j \in T} K_j}{\sum_{j \in T} (c_j = 1)} \tag{5}$$

• False Match Rate for the share of incorrectly assumed matches within $c_j$ units exemplarily examined:

$$1 - \left(\frac{\sum_{j \in T} K_j}{\sum_{j \in T} (c_j = 1)}\right) \tag{6}$$
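The three match measures in (4) to (6) can be computed directly once the per-target quantities are available. The sketch below is a hedged illustration: the function name is hypothetical, and the interpretation of the inputs is an assumption based on the usual definitions in this literature (c_j as the number of candidate records tied for the best match to target j, I_j = 1 if the true record is among them, K_j = 1 if the match is unique and correct). The term (c_j = 1) is read as an indicator that counts unique matches.

```python
def match_risk_measures(c, i, k):
    """Return (expected match risk, true match rate, false match rate).

    c, i, k: equally long sequences holding c_j, I_j and K_j for every target j in T.
    The rates are None when no target has a unique match (denominator of zero).
    """
    if not (len(c) == len(i) == len(k)):
        raise ValueError("c, i and k must have the same length")

    # Equation (4): sum of I_j / c_j over all targets.
    expected_match_risk = sum(i_j / c_j for c_j, i_j in zip(c, i) if c_j > 0)

    # Denominator of (5) and (6): number of targets with exactly one candidate.
    n_unique = sum(1 for c_j in c if c_j == 1)
    if n_unique == 0:
        return expected_match_risk, None, None

    true_match_rate = sum(k) / n_unique          # equation (5)
    false_match_rate = 1.0 - true_match_rate     # equation (6)
    return expected_match_risk, true_match_rate, false_match_rate
```

For example, four targets with `c = [1, 2, 1, 4]`, `i = [1, 1, 0, 0]` and `k = [1, 0, 0, 0]` give an expected match risk of 1.5 and a true and false match rate of 0.5 each.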

Considering the high quality standards for official data, it is important to evaluate the analytic potential of the generated partially synthetic SES 2018 data in addition to the disclosure risk assessment. Consequently, the generated partially synthetic company and employee datasets are examined with regard to their global and model-specific utility. Variable transformations according to Raghunathan, Lepkowski, Van Hoewyk & Solenberger (2001) are used to ensure compliance with basic logical constraints on variable relationships. Furthermore, descriptive statistics and distributions of the generated partially synthetic and the original data of both files of the SES 2018 are compared to assess global utility. Finally, the propensity mean-squared error (pMSE) is used as a further measure of global utility to rate the similarity of the generated partially synthetic datasets and the original database.

$$pMSE = \frac{1}{m} \sum_{j=1}^{m} \left( \frac{1}{N} \sum_{i=1}^{N} (\hat{p}_i - c)^2 \right) \tag{7}$$
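Equation (7) reduces to a short computation once propensity scores are available. The following sketch assumes that the scores have already been estimated by some classifier that tries to distinguish original from synthetic records in the stacked data, and that c is the share of synthetic records in that stacked data; neither the classifier nor the value of c is specified in the text, and the function name is hypothetical.

```python
def pmse(propensity_scores, c):
    """Mean pMSE over m synthetic datasets, following equation (7).

    propensity_scores: list of m lists, each holding the N estimated
        propensity scores for one stacked original + synthetic dataset.
    c: reference value the scores are compared against (assumed here to be
        the share of synthetic records, e.g. 0.5 for equally sized files).
    """
    per_dataset = [
        sum((p - c) ** 2 for p in scores) / len(scores)
        for scores in propensity_scores
    ]
    return sum(per_dataset) / len(per_dataset)
```

If c = 0.5, the result lies between 0 (the classifier cannot tell the two sources apart) and 0.25 (every record is classified with certainty).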

A model-specific utility evaluation provides deeper insight into the usefulness of the partially synthetic company and employee datasets of the SES 2018. Since many research questions in the scientific community are addressed with several models, a model-specific utility evaluation is all the more important. The confidence interval overlap is a measure of model-specific utility and serves as an indicator of the accuracy of estimates obtained from models estimated on synthetic data (Karr, Kohnen, Oganian, Reiter, & Sanil, 2006). Hence, the present work estimates one exemplary linear and one exemplary logistic regression model each for the partially synthetic company and employee data. These exemplary regression models are used to compute the confidence interval overlap for each coefficient as well as the average confidence interval overlap over all coefficients.

$$J_k = \frac{1}{2} \left[ \frac{U_{over,k} - L_{over,k}}{U_{orig,k} - L_{orig,k}} + \frac{U_{over,k} - L_{over,k}}{U_{synth,k} - L_{synth,k}} \right] \tag{8}$$
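As a worked illustration of equation (8), the sketch below computes the overlap for one coefficient. Two points are assumptions rather than statements from the text: the confidence intervals are taken to be 95 % Wald intervals (estimate ± 1.96 standard errors), and the overlap width is not clipped at zero, so disjoint intervals yield a negative value. Function names are hypothetical.

```python
def wald_ci(coef, std_error, z=1.96):
    """Symmetric confidence interval; z = 1.96 is an assumed 95 % normal quantile."""
    return coef - z * std_error, coef + z * std_error


def ci_overlap(l_orig, u_orig, l_synth, u_synth):
    """Confidence interval overlap J_k following equation (8)."""
    l_over = max(l_orig, l_synth)   # lower bound of the intersection
    u_over = min(u_orig, u_synth)   # upper bound of the intersection
    width = u_over - l_over         # negative if the intervals do not intersect
    return 0.5 * (width / (u_orig - l_orig) + width / (u_synth - l_synth))


# Intercept of the exemplary linear regression on the company data (Table 1).
orig_ci = wald_ci(-97.7958, 36.4197)
synth_ci = wald_ci(-92.9597, 36.4198)
j_intercept = ci_overlap(*orig_ci, *synth_ci)
print(round(j_intercept, 4))
```

Under these assumptions the intercept overlap comes out at about 0.966, in line with the value reported in Table 1, and the unclipped form is consistent with the negative overlaps reported in Tables 3 and 4.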

4 Results

4.1 Disclosure Risk Evaluation

Spline smoothing has proven to be the best approach for the data synthesis of the company data of the SES 2018 with regard to the cost-benefit ratio of disclosure risks and global/model-specific utility. Disclosure risks are evaluated by comparing the k-anonymity key measures and the number of high-risk observations, building on Templ (2017), as described in section 3. Contrary to initial expectations, these key figures increase for the generated synthetic data. However, the increase is a result of the data synthesis itself: the number of real high-risk observations does not increase; rather, the pool of partially synthetic high-risk observations is larger than the respective numbers in the original data. This is more likely to indicate increased protection of confidentiality, since the chance of finding a truly high-risk observation should decrease.

The key measures of Drechsler & Reiter (2008) lend some support to this assumption: they reveal that a random disclosure arises only with a probability of less than 0.1 %. Furthermore, no true match is observed in the generated partially synthetic company data.

In contrast to the company data, the best cost-benefit ratio regarding disclosure risks and global/model-specific data utility is achieved for the partially synthetic employee data if kernel density smoothing is applied to synthesize highly skewed variables (e.g., income-related variables). The data synthesis model that uses spline smoothing leads to a noteworthy underestimation of outliers for the variable gross monthly income. Analogous to the disclosure risk evaluation of the company data, an adapted form of the approach of Templ (2017) is used in the first step. Increases in the ratios of the respective key measures are observed here as well. However, these increases are smaller than those observed for the partially synthetic company data.

In the second step, disclosure risks are further evaluated by examining the expected match risk and the true match rate (Drechsler & Reiter, 2008). In contrast to the synthetic company data, no risk is expected for a random match. The same holds for the true match rate, indicating that all observations considered to match original observations are actually false matches.

4.2 Utility Evaluation

As described in section 3, the utility of the generated partially synthetic data based on the company and employee file of the SES 2018 is assessed both globally and specifically for exemplary regression models. Variable transformations as described by Raghunathan, Lepkowski, Van Hoewyk & Solenberger (2001) ensure that basic boundary value constraints are met, so that the original and the respective synthetic data of the SES 2018 can be compared directly for the global utility assessment. The basic descriptive key measures (mean, median and standard deviation) are reflected well in the generated synthetic company and employee material.

Additionally, data utility is assessed by examining the mean pMSE for the partially synthetic company and employee material of the SES 2018. For the company data, the mean pMSE of 0.1142 lies in the middle of the possible interval, which indicates remaining potential for utility improvement.

An exemplary linear regression model is estimated alongside an exemplary logistic regression model to evaluate the model-specific utility of the generated partially synthetic company data. The exemplary linear regression model estimates potential effects of the craft affiliation of a company, the participation of the public sector in the company’s capital and the number of common working days per week on the company’s number of employees (see Table 1). In the next step, the confidence interval overlap between the synthetic data-based estimates and their counterparts from the original data is computed and averaged over all m = 5 partially synthetic datasets. The average confidence interval overlap equals 85 % over all estimates of the exemplary linear regression. The confidence interval overlap of around 62 % for the explanatory variable “participation of the public sector in the company’s capital” shows that this estimate pulls down the overall mean confidence interval overlap of the exemplary linear regression model.

An exemplary logistic model estimates the effects of several explanatory variables on a previously created binary variable that indicates whether wages are determined primarily on the basis of collective bargaining agreements (see Table 2). The highest confidence interval overlaps are estimated for the coefficients of the explanatory variables “Craft affiliation” (94.81 %) and “Type of corporate entity = Operation of a multi-business enterprise” (93.53 %). However, these two are the only variables that exceed 90 % and approximate the target value of 95 % confidence interval overlap.

Table 1: Comparison of coefficients and confidence intervals of an exemplary linear regression on variable “Number of Employees” with both original and synthesized on-site company dataset of the SES 2018.

| Variable | Original on-site material (company dataset): Coefficient (Std. error) | Synthesized on-site material (company dataset): Coefficient (Std. error) | CI Overlap |
| --- | --- | --- | --- |
| Intercept | -97.7958*** (36.4197) | -92.9597** (36.4198) | 0.9661 |
| Craft affiliation | 20.8993*** (2.8277) | 19.6029*** (2.8277) | 0.8830 |
| Participation of the public sector in company’s capital | 223.6840*** (11.2451) | 207.0839*** (11.2451) | 0.6234 |
| Working days per week | -17.4124*** (6.6388) | -15.5643** (6.6388) | 0.9290 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

* p < 0.10, ** p < 0.05, *** p < 0.01

Table 2: Comparison of coefficients and confidence intervals of an exemplary logit regression on variable “collective bargaining” with both original and synthesized on-site company dataset of the SES 2018.

| Variable | Original on-site material (company dataset): Coefficient (Std. error) | Synthesized on-site material (company dataset): Coefficient (Std. error) | CI Overlap |
| --- | --- | --- | --- |
| Intercept | -0.6124*** (0.17298) | -0.8626*** (0.1730) | 0.6311 |
| Craft affiliation | -0.2445*** (0.01304) | -0.2471*** (0.0130) | 0.9481 |
| Number of Employees | 0.00003*** (0.0000) | 0.00003*** (0.0000) | 0.7657 |
| Working Days per Week | -0.1116*** (0.0336) | -0.0624* (0.0336) | 0.6266 |
| Type of corporate entity = Operation of a multi-business enterprise | 1.5793*** (0.0349) | 1.5705*** (0.0349) | 0.9353 |
| Type of corporate entity = Operation of a multi-country enterprise | 1.5240*** (0.0265) | 1.5723*** (0.0265) | 0.5344 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

* p < 0.10, ** p < 0.05, *** p < 0.01

Consequently, the overall mean confidence interval overlap for the exemplary logistic model, which is used to further evaluate the model-specific utility of the generated synthetic company data, equals 74 %. The confidence intervals of the explanatory variables thus deviate more strongly in the exemplary logistic regression model than in the exemplary linear regression model.

All in all, confidence interval overlaps of 85 % and 74 % suggest a good similarity between the exemplary linear and logistic regressions based on the original and the generated partially synthetic data. Nevertheless, extended tuning of the synthesis models for the company data is expected to yield a further increase, towards confidence interval overlaps close to 95 %.
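The two averages quoted for the company data can be retraced from the per-coefficient overlaps listed in Tables 1 and 2. The snippet below simply averages those published values; the variable names are illustrative.

```python
from statistics import mean

# CI overlaps per coefficient, copied from Table 1 (linear model).
overlap_linear = [0.9661, 0.8830, 0.6234, 0.9290]
# CI overlaps per coefficient, copied from Table 2 (logit model).
overlap_logit = [0.6311, 0.9481, 0.7657, 0.6266, 0.9353, 0.5344]

mean_linear = mean(overlap_linear)
mean_logit = mean(overlap_logit)
print(f"linear: {mean_linear:.1%}, logit: {mean_logit:.1%}")
```

Rounded to whole percentages, these means are 85 % and 74 %, matching the figures in the text.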

In contrast to the partially synthetic company data, the partially synthetic employee data is generated using kernel density smoothing. The resulting partially synthetic employee data is assessed by comparing its descriptive statistics with their counterparts in the original data. This examination reveals that the descriptive key measures as well as the distribution of the monthly gross income are reproduced about as well as the key figures for the company dataset. The same is true for the pMSE (0.1110), which is close to the pMSE of the company dataset. Consequently, further tuning of the data synthesis model could also lead to a further increase in data utility in this case.

Table 3: Comparison of coefficients and confidence intervals of an exemplary linear regression on variable gross hourly income with both original and synthesized on-site employee dataset of the SES 2018.

| Variable | Original on-site material (employee dataset): Coefficient (Std. error) | Synthesized on-site material (employee dataset): Coefficient (Std. error) | CI Overlap |
| --- | --- | --- | --- |
| Intercept | 589.1931*** (2.569) | 606.0611*** (2.56875) | -0.6752 |
| Education | 1.550*** (0.0197) | 1.54555*** (0.01968) | 0.9394 |
| Sex | -3.3940*** (0.0249) | -3.18269*** (0.02488) | -1.1664 |
| Year of Birth | -0.0304*** (0.0012) | -0.0682*** (0.0012) | -6.9942 |
| Year of Entry | -0.2567*** (0.0015) | -0.2279*** (0.0015) | -3.7869 |
| Restriction of term of contract | -1.4092*** (0.0101) | 1.4784*** (0.01008) | -0.7522 |
| Private sector | 0.0716 (0.0542) | 0.2492*** (0.0542) | 0.1643 |
| Company size | -0.0000*** (0.0000) | -0.0000*** (0.0000) | 0.23399 |
| Vocational education | 3.5243*** (0.0124) | 3.3684*** (0.01235) | -0.2990 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

* p < 0.10, ** p < 0.05, *** p < 0.01

Moreover, an exemplary linear regression model is estimated on the generated variable gross hourly income (see Table 3). Additionally, an exemplary logistic regression model is estimated on a binary variable indicating whether the gross hourly income of an employee exceeds the minimum wage (see Table 4). Assessing the model-specific utility reveals that, on average, no confidence interval overlap is observed for either exemplary regression model estimated on the partially synthetic and the original employee data.

5 Discussion

Limitations

The present research on the synthesis potential of the SES 2018 does not check or ensure cross-file references between the company and employee file. It is likely that the respective logical constraints need to be considered in future synthesis models. The same applies to the use case of panel data, since the present work examines only a single survey year. It cannot be ruled out that longitudinal relations are not reflected accurately.

The present research deals with a partial synthesis of the employee dataset and the company dataset of the SES 2018. Consequently, the results are only valid for the examined datasets. It cannot be ruled out that the findings for the partial data synthesis of the SES 2018 files do not generalize to other official surveys, such as the microcensus or DRG.

Table 4: Comparison of coefficients and confidence intervals of an exemplary logit regression on variable “Gross Hourly Wages Above Minimal Wage” with both original and synthesized on-site employee dataset of the SES 2018.

| Variable | Original on-site material (employee dataset): Coefficient (Std. error) | Synthesized on-site material (employee dataset): Coefficient (Std. error) | CI Overlap |
| --- | --- | --- | --- |
| Intercept | 99.76548*** (0.9690) | 84.8586*** (0.9690) | -2.9244 |
| Sex | -0.24396*** (0.01248) | -0.2399*** (0.01248) | 0.9172 |
| Year of Birth | -0.04444*** (0.00049) | -0.0379*** (0.00049) | -2.3949 |
| Education | -0.0408*** (0.0075) | -0.1054*** (0.0075) | -1.1864 |
| Vocational Education | 0.4637*** (0.00736) | 0.5034*** (0.00736) | -0.3762 |
| Restriction of term of contract | -1.93796*** (0.0090) | -1.6336*** (0.0090) | -7.6049 |
| Weekly working hours | -0.16288*** (0.00086) | -0.1277*** (0.00086) | -9.3753 |
| Private sector | 0.2412*** (0.02997) | 0.2035*** (0.02997) | 0.6791 |
| Company size | -0.0000*** (0.0000) | -0.0000*** (0.0000) | 0.5961 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

* p < 0.10, ** p < 0.05, *** p < 0.01

The examination of the partially synthesized SES 2018 datasets only leads to the tentative conclusion that the partial synthesis decreased the risk of deanonymization as measured by k-anonymity and the number of high-risk observations. It is theorized that an increase in the respective key measures after partial data synthesis reflects an increase of the pool of high-risk observations containing values which do not necessarily match the original data. The low expected match risks and true match rates of both partially synthesized company and employee files provide further support for this hypothesis. Nevertheless, it needs to be acknowledged that there is not yet an actual linkage between the estimated key measures.

Looking at the results, it needs to be acknowledged that, for both the employee and company file of the SES 2018, the key measures for disclosure risk and model-specific utility do not meet expectations. Thus, the current results do not provide evidence that the generated partially synthetic data is ready for provision to the scientific community.

In principle, the assessment of all key measures provided in this paper offers only a personal appraisal of the potential of partially synthetic data provision by official statistical offices. It is not to be considered a legal report on how to deal with the provision of synthesized data material.

Research and Practical Implications

It is believed that existing cross-file references may not be accurately reflected in the partially synthesized SES

2018 data. This is to be investigated as part of further work on the third research area of the AnigeD project. If

there are limitations concerning cross-file references, respective constraints need to be integrated into the partial

synthesis models.

Additionally, future research should examine utility and disclosure risk for panel data, which has not yet been covered by the present work. Since the scientific community is often interested in longitudinal research questions, it must be ensured that the respective relations are reflected correctly as well.

Furthermore, future research should test the hypothesis that the increase in high-risk observations after partial data synthesis actually reflects a larger pool of untruthful high-risk observations, indicating lower disclosure risks. As part of this examination, it should also be investigated whether the key measures of Templ (2017) and the key measures of Drechsler & Reiter (2008) could be harmonized.

Since the present work deals only with the evaluation of potentials of data synthesis of the on-site material of

SES 2018, the generalizability of the presented findings for other official surveys should be investigated to

provide lawyers with the knowledge needed for their legal assessment on simplified data access of less

anonymized synthesized data to the scientific community.

In addition, future research should build on the present work. Work should be done to further improve the key figures, which have so far been unsatisfactory in places, so that the publication of synthetic data can be examined by the legal authorities in the future and implemented if appropriate.

Current research results indicate potential to increase confidentiality while keeping the structure of original official survey data by means of partial data synthesis of key and target variables of the SES 2018. However, the closer examination of the utility-specific key measures (pMSE and confidence interval overlap) suggests that the partial data synthesis models need to be tuned before a release of partially synthetic SES 2018 data can be considered. Hyperparameter optimization seems to be a beneficial approach for future data synthesis, enabling a structured search for hyperparameters that maximize the desired utility metrics (Bergstra & Bengio, 2012).

The presented work is not a legal report on the legal possibility of providing easier data access to less conservatively anonymized official data. Even a positive evaluation for the use case of partially synthesized data of the SES 2018 would not permit the use of this approach for other survey years of the SES or for other statistics. Consequently, continuous legal monitoring needs to be implemented as soon as new research insights on the potential of synthesized official data are available.

Conclusion

All in all, the partially synthetic data generated so far cannot be made publicly available because it does not meet the expectations of the scientific community concerning utility. Future research should focus on examining how to further increase the utility of the present partially synthetic on-site material of the SES 2018. Only after that is a legal evaluation of simplified ways of access to the generated partially synthetic SES 2018 data possible. Future research should also deal with further official statistics and panel data to further increase the knowledge of synthesis potentials for official on-site data.

6 Bibliography

Abowd, J., Stinson, M., & Benedetto, G. (2006). Final Report to the Social Security Administration on the

SIPP/SSA/IRS Public Use File Project. Technical Report, US Census Bureau.

Allin, P. (2021, November). Opportunities and challenges for official statistics in a digital society.

Contemporary Social Science, 16(2), pp. 156-169. doi:10.1080/21582041.2019.1687931

Brandt, M., Crößmann, A., & Gürke, C. (2011). Harmonisation of statistical confidentiality in the Federal

Republic of Germany. FDZ_Arbeitspapiere, 34, pp. 1-12.

Caiola, G., & Reiter, J. P. (2010). Random Forests for Generating Partially Synthetic, Categorical Data.

Transactions on Data Privacy, 3, pp. 27-42.

Drechsler, J. (2009). Synthetic datasets for the German IAB Establishment Panel. Joint UNECE/Eurostat work session on statistical data confidentiality (pp. 1-12). Bilbao, Spain.

Drechsler, J. (2011). Multiple imputation in practice—a case study using a complex German establishment

survey. AStA Advances in Statistical Analysis, 95, pp. 1-26. doi:10.1007/s10182-010-0136-z

Drechsler, J., & Haensch, A.-C. (2023). 30 years of synthetic data. arXiv:2304.02107, pp. 1-42.

doi:10.48550/arXiv.2304.02107

Drechsler, J., & Reiter, J. P. (2008). Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data. In J. Domingo-Ferrer, & Y. Saygin (Eds.), Privacy in Statistical Databases. 5262, pp. 227-238. Berlin: Springer. doi:10.1007/978-3-540-87471-3_19

Forschungsdatenzentren der Statistischen Ämter des Bundes und der Länder. (2020a). Metadatenreport. Teil I:

Allgemeine und methodische Informationen zur Verdienststrukturerhebung 2018. Metadatenreport,

Düsseldorf.

Forschungsdatenzentren der Statistischen Ämter des Bundes und der Länder. (2020b). Metadatenreport. Teil II:

Produktspezifische Informationen zur Nutzung der Verdienststrukturerhebung 2018 per On-Site-

Nutzung. Metadatenreport, Wiesbaden. doi:10.21242/62111.2018.00.00.1.1.0

Forschungsdatenzentren der Statistischen Ämter des Bundes und der Länder. (2022). Regelungen zur

Auswertung in den Forschungsdatenzentren der Statistischen Ämter. (F. d. Länder, Ed.) pp. 1-25.

Hadam, S., Schmid, T., & Simm, J. (2020). Kleinräumige Prädiktion von Bevölkerungszahlen basierend auf Mobilfunkdaten aus Deutschland. In B. Klumpe, J. Schröder, & M. Zwick (Eds.), Qualität bei zusammengeführten Daten (pp. 31-48). Wiesbaden: Springer VS. doi:10.1007/978-3-658-31009-7_3

Hafner, H.-P., & Lenz, R. (2011). Some aspects concerning analytical validity and disclosure risk of CART generated synthetic data. Joint UNECE/Eurostat work session on statistical data confidentiality (pp. 1-10). Tarragona, Spain.

Hu, J., & Hoshino, N. (2018). The quasi-multinomial synthesizer for categorical data. International Conference

on Privacy in Statistical Databases (pp. 75-91). Springer.

Karr, A. F., Kohnen, C. N., Oganian, A., Reiter, J. P., & Sanil, A. P. (2006). A Framework for Evaluating the

Utility of Data Altered to Protect Confidentiality. The American Statistician, 60(3), pp. 224-232.

doi:10.1198/000313006X124640

Kitchin, R. (2015). The opportunities, challenges and risks of big data for official statistics. Statistical Journal

of the IAOS, 31, pp. 471-487. doi:10.3233/SJI-150906

Kursa, M., & Rudnicki, W. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11). doi:10.18637/jss.v036.i11

Manrique-Vallier, D., & Hu, J. (2018). Bayesian non-parametric generation of fully synthetic multivariate

categorical data in the presence of structural zeros. Journal of the Royal Statistical Society, 181(3), pp.

635-647.

Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of Input. Statistical Science,

9(4), pp. 538-573.

Nowok, B., Raab, G. M., & Dibben, C. (2016, October). synthpop: Bespoke Creation of Synthetic Data in R.

Journal of Statistical Software, 74(11), pp. 1-26. doi:10.18637/jss.v074.i11

BVerfG, Order of the First Senate of 15 December 1983, 1 BvR 209/83, paras. 1-214.

Pierson, S. (2015). Official statistics principles compared. Statistical Journal of the IAOS, 31, pp. 21-23.

doi:10.3233/SJI-150886

Pistner, M., Slavkovic, A., & Vilhuber, L. (2018). Synthetic data via quantile regression for heavy-tailed and heteroskedastic data. In J. Domingo-Ferrer, & F. Montes (Eds.), International Conference on Privacy in Statistical Databases (pp. 92-108). Springer. doi:10.1007/978-3-319-99771-1_7

Raab, G. M., Nowok, B., & Dibben, C. (2016). Practical data synthesis for large samples. Journal of Privacy

and Confidentiality, 7(3), pp. 67-97. doi:10.29012/jpc.v7i3.407

Reiter, J. P. (2005). Releasing multiply imputed, synthetic public use microdata: an illustration and empirical

study. Journal of the Royal Statistical Society, 168(1), pp. 185-205.

Reiter, J. P. (2023). Synthetic Data: A Look Back and A Look Forward. Transactions On Data Privacy, 16, pp.

15-24.

Rothe, D. (2015, 05). Statistische Geheimhaltung - der Schutz vertraulicher Daten in der amtlichen Statistik -

Teil 1: Rechtliche und methodische Grundlagen. Bayern in Zahlen, pp. 294-303.

Rubin, D. B. (1993). Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9(2), pp. 462-

468.

Schäfer, A., & Gottschall, K. (2015). From wage regulation to wage gap: how. Cambridge Journal of

Economics, 39, pp. 467-496. doi:10.1093/cje/bev005

Statistisches Bundesamt. (2020a). Verdienststrukturerhebung - Niveau, Verteilung und Zusammensetzung der

Verdienste und der Arbeitszeiten abhängiger Beschäftigungsverhältnisse - Ergebnisse für Deutschland -

. Fachserie, 16(1), pp. 1-525.

Templ, M. (2017). Statistical Disclosure Control for Microdata - Methods and Applications in R (1. ed.). Basel:

Springer Cham. doi:10.1007/978-3-319-50272-4

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R.

Journal of Statistical Software, 45(3), pp. 1-67.

van der Voort, H. G., Klievink, A. J., Arnaboldi, M., & Meijer, A. J. (2019, January). Rationality and politics of

algorithms. Will the promise of big data survive the dynamics of public decision making? Government

Information Quarterly, 36(1), pp. 27-38. doi:10.1016/j.giq.2018.10.011

Woo, M.-J., Reiter, J. P., Oganian, A., & Karr, A. F. (2009). Global measures of data utility for microdata.

Journal of Privacy and Confidentiality, 1(1), pp. 111-124.

Zühlke, S., Zwick, M., & Scharnhorst, S. (2001). Die Forschungsdatenzentren der Statistischen Ämter des

Bundes und der Länder. (S. Bundesamt, Ed.) Wirtschaft und Statistik, 10, pp. 906-911.


Zühlke, S., Zwick, M., & Scharnhorst, S. (2003). Die Forschungsdatenzentren der Statistischen Ämter des Bundes

und der Länder. (S. Bundesamt, Ed.) Wirtschaft und Statistik 10, pp. 906-911.

Zühlke, S., Zwick, M., Scharnhorst, S., & Wende, T. (2005). The research data centres of the Federal Statistical

Office and the statistical offices of the Länder. FDZ-Arbeitspapiere, 3, pp. 1-11.

Zwick, M. (2016). Big Data und amtliche Statistik. In B. Keller, H. Klein, & S. Tuschl, Marktforschung der

Zukunft - Mensch oder Maschine? (pp. 157-172). Wiesbaden: Springer Gabler.

doi:https://doi.org/10.1007/978-3-658-14539-2_10

Anonymization for Integrated and Georeferenced Data (AnigeD)

Yannik Garcia Ritz & Jannek Mühlhan

UNECE Expert meeting on Statistical Data Confidentiality 2023

Agenda

(1) Competency cluster AnigeD

(2) Background of Data Synthesis

(3) Synthesis Approach

(4) Evaluation Approach

(5) Evaluation Results

(6) Discussion

[email protected]

destatis.de | UNECE Expert Meeting on Statistical Confidentiality

» Federal Ministry of Education and Research (BMBF) initiates nationwide “Anonymization for Secure Data Use” research network

» Individual research projects and collaborative projects (competency clusters) are funded for three years

» financed by the European Union – NextGenerationEU

Initial situation

29.09.2023, Statistisches Bundesamt (Destatis) - IFEB

[Diagram labels: Information content; Costs; Data protection; Provision of data and statistics]


» AnigeD: Anonymization for Integrated and Georeferenced Data

» Objective: securing and extending access to complex data while observing protection requirements

» Total funding amount: EUR 4.37 million

» Funding period: 11/2022 - 11/2025

» Cluster coordination: Federal Statistical Office (Destatis), Wiesbaden

Competency cluster AnigeD - Anonymization for Integrated and Georeferenced Data


Research partners


Associated clusters

AnoMed

ANONY-MED

AnoMoB

AnigeD

Project partners:

Destatis, FU Berlin, IAB, TH Köln, Speyer University

Associated partners:

DIW, EuroDaT, MPI-SWS,

SMA Development GmbH,

Telekom, Duisburg-Essen University

Other research projects:

AnGer, DARIA, GANGES


» Evaluation of anonymization methods using legal criteria

» Potential of anonymization by synthetic data

» Anonymization of georeferenced data

» Testing of software tools for the efficient analysis and provision of anonymized, georeferenced data

Research priorities


» Comparison of procedures from statistics and informatics

» Systematic evaluation of criteria

» Analysis and methodological refinement of synthesis

procedures

» Evaluation of synthetic data acceptance by the scientific

community

Work package 3


Anonymization by means of synthetic data

Synthetic data in statistics and informatics -

Systematic comparison and methodological

refinement (SynDeStatIk)

Anonymization by means of synthetic data to

provide microdata for the scientific community

Machine learning on anonymization by means of

synthetic data

Basis for assessing synthetic data generation

approaches for statistical confidentiality


» Aim: reducing disclosure risks in provided datasets with fewer restrictions than established measures (suppression, top coding etc.)

» Provide less aggregated microdata to enable better research analyses via less complex modes of data access

» Builds on previous approaches to partial (e.g., Little, 1993) and full synthesis/imputation (Rubin, 1993)

Context of Research on Anonymization Potentials of Data Synthesis


Growing interest in

microdata Confidentiality

considerations


Basic Idea of Data Synthesis


» Aim: generating data which mimics original data regarding distributions, relations etc.

but with lower risks of reidentification

» Idea: apply approach of imputation to critical variables / all variables


Non-natural NA

False declaration

Synthesized Values


Increasing confidentiality

» Mostly CART-based synthesis

» Increasing the default minbucket parameter

leads to tree pruning

» Smoothing for heavily skewed metric

variables

» Spline smoothing

» Kernel density smoothing

Synthesis Approach


Company Dataset (m=5), Employee Dataset (m=5)

R package: synthpop (Nowok, Raab & Dibben, 2016)

» Minimal # of synthetic datasets to be provided: m (=5) ≥ r (=2); Drechsler (2009); Reiter (2008)


Disclosure Risk Evaluation

» Risk Ratios (k-anonymity, high-risk observations)

$\frac{\text{Original On-Site Data}}{\text{Off-Site Data}}$ vs. $\frac{\text{Synthetic On-Site Data}}{\text{Off-Site Data}}$ (Templ, Kowarik & Meindl, 2015)

» Drechsler & Reiter (2008): Expected Match Risk & True Match Rate

Utility Evaluation

» Ensuring logical constraints, comparing descriptive key measures, examining distributions of analytic key variables and pMSE (global utility)

» Examining confidence interval overlaps of exemplary regression models (model-specific utility)

Evaluation Approach


Disclosure Risk Evaluation

Drechsler & Reiter (2008): Expected Match Risk & True Match Rate

• Expected Match Risk: data user randomly selects correct observation

$\sum_{j \in T} \frac{1}{c_j} \cdot I_j$

• True Match Rate: share of truly matched targets w/ matches > 1 in D(m1-m5)

$\frac{\sum_{j \in T} K_j}{\sum_{j \in T} (c_j = 1)}$
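The two measures can be illustrated with a minimal Python sketch. It assumes the usual reading of the symbols in Drechsler & Reiter (2008), which the slide does not spell out: c_j is the number of records with the highest match probability for target j, I_j is 1 if the true match is among them, and K_j is 1 if c_j = 1 and that single record is the true match. The function name and the example values are hypothetical.

```python
def match_risk_measures(c, I, K):
    """Return (expected match risk, true match rate) for a set of targets.

    c[j]: number of best-matching records for target j (assumed definition)
    I[j]: 1 if the true match is among those c[j] records, else 0
    K[j]: 1 if c[j] == 1 and the single best match is the true match, else 0
    """
    # Expected match risk: sum over targets of I_j / c_j
    expected_match_risk = sum(i_j / c_j for c_j, i_j in zip(c, I) if c_j > 0)

    # True match rate: correct unique matches divided by all unique matches
    unique_matches = sum(1 for c_j in c if c_j == 1)
    true_match_rate = sum(K) / unique_matches if unique_matches else 0.0
    return expected_match_risk, true_match_rate


# Hypothetical example with four targets
emr, tmr = match_risk_measures(c=[1, 2, 4, 1], I=[1, 1, 0, 0], K=[1, 0, 0, 0])
print(emr, tmr)
```

In this toy example the first target contributes 1, the second contributes 1/2, and one of the two unique matches is correct.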

Evaluation Approach


Disclosure Risk Evaluation – Risk Ratios

» Contrary to the initial assumption, the ratios increase

» Potential interpretation:

» More unique (synthetic) observations

-> larger pool of risky observations

=> Lower risk of true matches

Evaluation Results


| | Original | Synthesized, spline smoothing | Synthesized, kernel density smoothing |
|---|---|---|---|
| Company Dataset | | | |
| Ratio k=2 violating Obs. (on-/off-site) | 5,977.50 | 7,356.9 | 6,459.9 |
| Ratio k=3 violating Obs. (on-/off-site) | 1,397.00 | 4,884.65 | 4,532.25 |
| Ratio High-Risk Obs. (on-/off-site) | 89.32 | 120.60 | 91.76721 |
| Employee Dataset | | | |
| Ratio k=2 violating Obs. | 1.51 | 2.39 | 2.43 |
| Ratio k=3 violating Obs. | 1.07 | 3.81 | 3.82 |
| Ratio High-Risk Obs. (On-/Off-site) | 4.06 | 1.60 | 1.53 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.


Disclosure Risk Evaluation – Expected Match Risk & True Match Rate

Evaluation Results


| | Synthesized (Spline smoothing) | Synthesized (Kernel density smoothing) |
|---|---|---|
| Company Dataset | | |
| Expect. Match Risk | 0.0520 % | 0.0502 % |
| True Match Rate | 0.0000 % | 0.0000 % |
| False Match Rate | 100.0000 % | 100.0000 % |
| Employee Dataset | | |
| Expect. Match Risk | 0.0000 % | 0.0000 % |
| True Match Rate | 0.0000 % | 0.0000 % |
| False Match Rate | 100.0000 % | 100.0000 % |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

» Minor expected risk for a random match for the company data

» No expected match risk for a random match for the employee data

» No true matches for both data materials


Utility Evaluation – Global Utility

• Ensuring logical constraints & comparing descriptive key measures

Evaluation Results


Original data without Bavaria

Synthetic data without Bavaria [m1]

Synthetic data without Bavaria [m3]

Synthetic data without Bavaria [m5]

Federal state of selection

avg 7.6127 7.5935 7.5941 7.6024

median 7 7 7 7

SD 4.2972 4.2852 4.2863 4.3013

Total Number of Employees per Company

Avg 1338.7238 1377.4767 1350.4678 1336.3058

median 13 13 13 13

SD 12079.7587 12175.1597 11897.5250 11654.4577

Male workers in the company

Avg 59.3187 55.3524 55.5785 55.3697

median 5 5 5 5

SD 391.2943 290.0992 296.3359 272.0953

Female workers in the company

avg 43.4172 40.8809 41.7379 40.6903

median 4 4 4 4

SD 201.6976 165.3280 177.6063 164.0810

Common number of working days per week

Avg 5.0691 5.0691 5.0669 5.0675

median 5 5 5 5

SD 0.3340 0.3338 0.3314 0.3327

Total Number of Employees per Operational Unit

Avg 102.7359 96.2333 97.3165 96.0599

median 12 11 11 11

SD 534.7484 399.1344 413.5876 386.1275

Industry Code

Avg 38.6545 39.4663 39.4978 39.4355

median 14 14 14 14

SD 41.3318 41.6283 41.6528 41.6224

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.

Original data without Bavaria

Synthetic data without Bavaria [m1]

Synthetic data without Bavaria [m3]

Synthetic data without Bavaria [m5]

Sex

Avg 1.4697 1.4696 1.4703 1.4697

median 1 1 1 1

SD 0.4991 0.4991 0.4991 0.4991

Year of Birth

Avg 1973.6222 1973.6273 1973.6244 1973.6249

median 1972 1972 1972 1972

SD 13.0492 13.0436 13.0453 13.04500

Year of entry into the company

Avg 2005.0247 2005.2054 2005.2030 2005.2035

median 2010 2010 2010 2010

SD 12.6139 12.4221 12.4232 12.4226

Gross monthly income

Avg 2989.3089 2988.5670 2987.0762 2985.8921

median 2686 2685 2685 2688

SD 2490.2024 2518.3297 2453.5373 2371.0687

Total earnings for overtime hours

Avg 19.9182 19.4604 19.4583 19.5073

median 0 0 0 0

SD 126.1532 122.3120 122.0371 122.5444

Shift- and night shift credits, weekend and holiday extra charges,

Avg 29.5375 29.8927 29.7160 30.0267

median 0 0 0 0

SD 128.1537 127.4987 126.1038 127.4425

Statutory deductions due to income tax and solidarity surcharge

Avg 532.9553 518.4816 518.3125 517.7757

median 329 327 328 327

SD 875.9017 745.5711 766.2905 709.1242

Statutory deductions due to social insurance

avg 477.2533 484.0925 483.5502 483.7454

median 446 443 443 443

SD 309.9909 391.1970 352.7239 357.0693

Gross yearly income

avg 37903.7149 35862.8038 35844.9144 35830.7048

median 33367 32220 32220 32256

SD 36898.1805 30219.9569 29442.44793 28452.8244

Net monthly income

avg 1985.79994 1985.9929 1985.2135 1984.3709

median 1821 1812 1813 1814

SD 1494.4055 1545.2950 1511.5743 1467.8617

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.


Utility Evaluation – Distributional Examination of Key Variables

• Ensuring logical constraints & comparing descriptive key measures

• Distributions of analytic key variables

Evaluation Results


[Figure panels: company data, employee data]


Utility Evaluation – Global Utility

• Ensuring logical constraints & comparing descriptive key measures

• Distributions of analytic key variables

• Propensity Mean-Squared Error (pMSE)

$pMSE = \frac{1}{m} \sum_{j}^{m} \frac{1}{N} \sum_{i}^{N} (\hat{p}_i - c)^2$

• pMSE interval per definition [0; 0.25]
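As a rough illustration of the formula above, the following Python sketch averages the squared deviations of estimated propensity scores from c over N records and then over m synthetic datasets. It assumes the propensity scores have already been estimated on the stacked original and synthetic data and that c is the share of synthetic records in that stack (0.5 for equally sized datasets, which gives the stated range [0; 0.25]). The function names are hypothetical.

```python
def pmse(propensities, c=0.5):
    """pMSE for one synthetic dataset: mean squared deviation of the
    estimated propensity scores from the synthetic share c."""
    n = len(propensities)
    return sum((p - c) ** 2 for p in propensities) / n


def mean_pmse(propensities_per_dataset, c=0.5):
    """Average the per-dataset pMSE over the m synthetic datasets."""
    m = len(propensities_per_dataset)
    return sum(pmse(p, c) for p in propensities_per_dataset) / m


# Indistinguishable data: every record gets propensity c
print(mean_pmse([[0.5, 0.5, 0.5, 0.5]]))
# Perfectly distinguishable data: propensities are 0 or 1
print(mean_pmse([[0.0, 1.0, 0.0, 1.0]]))
```

The two calls show the boundary cases: scores equal to c give the lower bound, scores of 0 or 1 give the upper bound for c = 0.5.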

Evaluation Results


| | Company Dataset, Spline Smoothing | Company Dataset, Kernel Density Smoothing | Employee Dataset, Spline Smoothing | Employee Dataset, Kernel Density Smoothing |
|---|---|---|---|---|
| pMSE | 0.1142 | 0.1142 | 0.1102 | 0.1110 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States.


Utility Evaluation – Model-Specific Utility (Company Material)

• Examining confidence interval overlaps of exemplary regression models

Evaluation Results


[Figure: coefficient plots for the company data. Panel titles: Coefficients for fit to "Number of Employees/Operational Unit"; Coefficients for fit to "Company Loans are Negotiated Based on Collective Bargaining". Reported mean confidence interval overlaps: 0.85 and 0.74. Coefficient labels: Craft affiliation, Participation of Public Sector, Common Number of Working Days per Week, Employees per Company, Multi-unit company, Multi-country company.]


Utility Evaluation – Model-Specific Utility (Employee Material)

• Examining confidence interval overlaps of exemplary regression models

Evaluation Results


| Variable | Original on-site material (employee dataset): Coefficient (Std. error) | Synthesized on-site material (employee dataset): Coefficient (Std. error) | CI Overlap |
|---|---|---|---|
| Intercept | 589.1931*** (2.569) | 606.0611*** (2.56875) | -0.6752 |
| Education | 1.550*** (0.0197) | 1.54555*** (0.01968) | 0.9394 |
| Sex | -3.3940*** (0.0249) | -3.18269*** (0.02488) | -1.1664 |
| Year of Birth | -0.0304*** (0.0012) | -0.0682*** (0.0012) | -6.9942 |
| Year of Entry | -0.2567*** (0.0015) | -0.2279*** (0.0015) | -3.7869 |
| Restriction of term of contract | -1.4092*** (0.0101) | 1.4784*** (0.01008) | -0.7522 |
| Private sector | 0.0716 (0.0542) | 0.2492*** (0.0542) | 0.1643 |
| Company size | -0.0000*** (0.0000) | -0.0000*** (0.0000) | 0.23399 |
| Vocational education | 3.5243*** (0.0124) | 3.3684*** (0.01235) | -0.2990 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States. * p < 0.10, ** p < 0.05, *** p < 0.01

| Variable | Original on-site material (employee dataset): Coefficient (Std. error) | Synthesized on-site material (employee dataset): Coefficient (Std. error) | CI Overlap |
|---|---|---|---|
| Intercept | 99.76548*** (0.9690) | 84.8586*** (0.9690) | -2.9244 |
| Sex | -0.24396*** (0.01248) | -0.2399*** (0.01248) | 0.9172 |
| Year of Birth | -0.04444*** (0.00049) | -0.0379*** (0.00049) | -2.3949 |
| Education | -0.0408*** (0.0075) | -0.1054*** (0.0075) | -1.1864 |
| Vocational Education | 0.4637*** (0.00736) | 0.5034*** (0.00736) | -0.3762 |
| Restriction of term of contract | -1.93796*** (0.0090) | -1.6336*** (0.0090) | -7.6049 |
| Weekly working hours | -0.16288*** (0.00086) | -0.1277*** (0.00086) | -9.3753 |
| Private sector | 0.2412*** (0.02997) | 0.2035*** (0.02997) | 0.6791 |
| Company size | -0.0000*** (0.0000) | -0.0000*** (0.0000) | 0.5961 |

Source: SES 2018. RDCs of the Statistical Offices of the Federation and the Federal States. * p < 0.10, ** p < 0.05, *** p < 0.01
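The CI overlap column can be illustrated with a short Python sketch. It assumes the interval overlap measure commonly attributed to Karr et al. (2006), computed from 95 % normal confidence intervals; the slides do not state the exact definition or critical value used, so both are assumptions, as is the function name. Under these assumptions the intercept row of the first table gives a value close to the reported -0.6752.

```python
def ci_overlap(est_orig, se_orig, est_syn, se_syn, z=1.96):
    """Average relative overlap of two confidence intervals.

    Returns 1.0 for identical intervals and negative values when the
    intervals do not overlap (assumed definition, see lead-in).
    """
    lo_o, up_o = est_orig - z * se_orig, est_orig + z * se_orig
    lo_s, up_s = est_syn - z * se_syn, est_syn + z * se_syn

    # Length of the intersection; negative if the intervals are disjoint
    overlap = min(up_o, up_s) - max(lo_o, lo_s)

    # Average the overlap relative to each interval's own length
    return 0.5 * (overlap / (up_o - lo_o) + overlap / (up_s - lo_s))


# Intercept row of the first employee model
print(round(ci_overlap(589.1931, 2.569, 606.0611, 2.56875), 4))
```

Negative values, as in most rows of both tables, indicate that the original and synthetic intervals are disjoint even though the point estimates look similar.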

Employee data


Limitations

» Only partial synthesis is tested

» Results are not generalizable to other surveys

» Assessment of Risk-Utility-Ratio not yet satisfactory

» Research results cannot serve as legal report

Discussion


Research Implications

» Replicate studies with a full synthesis approach

» Evaluate utility and risk for longitudinal data (of other surveys)

» Examine further surveys regarding potentials of synthetic data

» Applying hyperparameter tuning to optimize cost-utility-ratio

Discussion


Practical Implications

» If future research is able to improve generated synthetic data regarding the Risk-Utility-Ratio:

» Lawyers need to evaluate possibilities to provide synthetic on-site material via off-site access

» Synthetic data need to be provided as a separate product at first, without opportunities for project-specific processing, until insights are gained

Discussion


Discussion


Conclusion

» Attempt to synthesize and evaluate official German on-site material

» New insights on synthesis of further official surveys

» Encouraging results regarding global utility and disclosure risks

» Improvable results concerning utility


Conclusion

» Two Options:

1. release partially synthetic data tailored to specific research questions of the data users

2. release fully synthetic datasets if follow-up research is able to provide evidence for an

improved model-specific

Discussion


Contact Statistisches Bundesamt

Postal address

65180 Wiesbaden

[email protected]

[email protected]

Functional mailbox

[email protected]

www.destatis.de/contact


Drechsler, J. (2009). Generating multiply imputed synthetic datasets: theory and implementation. (Doctoral dissertation,

Otto-Friedrich-Universität Bamberg, Fakultät Sozial-und Wirtschaftswissenschaften). Bamberg.

Drechsler, J., & Reiter, J. P. (2008). Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification

Disclosure Risks in Partially Synthetic Data. In J. Domingo-Ferrer, & Y. Saygin (Ed.), Privacy in Statistical Databases. 5262,

pp. 227-238. Berlin: Springer. doi:10.1007/978-3-540-87471-3_19

Hafner, H.-P., & Lenz, R. (2011). Some aspects concerning analytical validity and disclosure risk of CART generated

synthetic data. Joint UNECE/Eurostat work session on statistical data confidentiality, (pp. 1-10). Tarragona, Spain.

Loske, J., & Wolfanger, T. (2019). Entwicklung Synthetischer Datenstrukturfiles. Statistische Woche, (p. 113). Trier.

Karr, A. F., Kohnen, C. N., Oganian, A., Reiter, J. P., & Sanil, A. P. (2006). A Framework for Evaluating the Utility of Data

Altered to Protect Confidentiality. The American Statistician, 60(3), pp. 224-232. doi:10.1198/000313006X124640

Sources


Little, R. J. (1993). Statistical analysis of masked data. Journal of Official Statistics, 9(2), pp. 407–426.

Nowok, B., Raab, G. M., & Dibben, C. (2016, October). synthpop: Bespoke Creation of Synthetic Data in R. Journal of

Statistical Software, 74(11), pp. 1-26. doi:10.18637/jss.v074.i11

Order of the First Senate of 15, 1 BvR 209/83 -, paras. 1-214 (BVerfG December 1983).

Reiter, J. P. (2008). Selecting the number of imputed datasets when using multiple imputation for missing data and

disclosure limitation. Statistics & Probability Letters, 78, pp. 15-20.

Rothe, D. (2015). Statistische Geheimhaltung - der Schutz vertraulicher Daten in der amtlichen Statistik - Teil 1:

Rechtliche und methodische Grundlagen. Bayern in Zahlen, pp. 294-303.

Rubin, D. B. (1993). Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9(2),

pp. 462-468.


Templ, M. (2017). Statistical Disclosure Control for Microdata - Methods and Applications in R (1. ed.). Basel:

Springer Cham. doi:10.1007/978-3-319-50272-4

Templ, M., Kowarik, A., & Meindl, B. (2015, October). Statistical Disclosure Control for Micro-Data Using the R Package

sdcMicro. Journal of Statistical Software, 67(4), pp. 1-36. doi:10.18637/jss.v067.i04

Woo, M.-J., Reiter, J. P., Oganian, A., & Karr, A. F. (2009). Global measures of data utility for microdata. Journal of

Privacy and Confidentiality, 1(1), pp. 111-124.

Zühlke, S., Zwick, M., Scharnhorst, S., & Wende, T. (2005). The research data centres of the Federal Statistical

Office and the statistical offices of the Länder. FDZ-Arbeitspapiere, 3, pp. 1-11.


  • Slide 1: Anonymization for Integrated and Georeferenced Data (AnigeD)
  • Slide 2: Agenda
  • Slide 3: Initial situation
  • Slide 4: Competency cluster AnigeD - Anonymization for Integrated and Georeferenced Data
  • Slide 5: Research partners
  • Slide 6: Research priorities
  • Slide 7: Work package 3
  • Slide 8: Context of Research on Anonymization Potentials of Data Synthesis
  • Slide 9: Basic Idea of Data Synthesis
  • Slide 10: Synthesis Approach
  • Slide 11: Evaluation Approach
  • Slide 12: Evaluation Approach
  • Slide 13: Evaluation Results
  • Slide 14: Evaluation Results
  • Slide 15: Evaluation Results
  • Slide 16: Evaluation Results
  • Slide 17: Evaluation Results
  • Slide 18: Evaluation Results
  • Slide 19: Evaluation Results
  • Slide 20: Discussion
  • Slide 21: Discussion
  • Slide 22: Discussion
  • Slide 23: Discussion
  • Slide 24: Discussion
  • Slide 25: Contact
  • Slide 26: Sources
  • Slide 27: Sources
  • Slide 28: Sources

An overview of data protection strategies for individual-level geocoded data, Institute for Employment Research, Germany

individual data, georeferenced data, confidentiality concerns, privacy protection, utility, limited access


UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert meeting on Statistical Data Confidentiality 26–28 September 2023, Wiesbaden

An overview of data protection strategies for individual-level geocoded data

Maike Steffen, Konstantin Körner, Jörg Drechsler

Institute for Employment Research (IAB)

[email protected]

Abstract In response to a growing need for small-scale geographic information in various research areas, data-collecting institutions are increasingly georeferencing individual-level data. However, due to confidentiality concerns, external researchers typically have very limited access to these data if at all, resulting in a substantial loss of informational value. A growing body of literature on data protection strategies for geocoded data attempts to find solutions for the tradeoff between privacy protection and utility preservation of the individual-level data. The purpose of this paper is to systematically collect and review the literature in the field and to offer a classification of existing methods. Various strategies for estimating the utility and the remaining risk of disclosure for the protected data are also discussed.

1 Introduction

Geocoded data have become increasingly relevant in various research areas since they offer insights that can only be acquired by considering spatial context. The granular information enables researchers to include fine geographic patterns and spatial variation of individual characteristics in their analyses. The detailed geographical information facilitates studying such diverse topics as neighborhood effects, mobility patterns, or the spread of diseases, to name only a few of the possible applications. Moreover, geo-coordinates are not subject to changes over time, as is the case with administrative borders, whose changes often hamper longitudinal analyses. Finally, the availability of detailed geographical information makes it easy to merge information from various data sources. However, access to detailed geocoding information is currently limited, as it is well known that detailed geographical information is highly identifying (De Montjoye et al., 2013). To still enable access to this valuable source of information, various strategies have been proposed in the literature to protect confidentiality while maintaining the utility of the collected information. This paper aims to give an overview of these approaches. We also provide an overview of metrics that have been used to assess the disclosure risk and the utility of the protected data. The remainder of the paper is organized as follows. In Section 2, we review the three most popular approaches for protecting geocoded data: aggregation, geographic masking, and data synthesis. In Section 3, we discuss various tools which are used to assess the risk and utility of the protected data. Section 4 concludes the article.

2 Data Protection Strategies

Two general strategies are commonly applied to reduce the risk of disclosure when disseminating data to the public: information reduction and perturbation. Information reduction limits the amount of detail that is available in the data. This can range from discretizing continuous variables (e.g., reporting age in five-year intervals) through coarsening categorical variables (e.g., reporting only the first two digits of a hierarchical classification code such as the NACE code) to removing entire variables. Perturbation approaches try to preserve the level of detail contained in the original data. They reduce the risk of disclosure by slightly altering the microdata on the record level. Examples include noise infusion, top-coding, or swapping. Both strategies are also used when disseminating detailed geo-information. Aggregation as a form of information reduction is probably the most widely adopted strategy to reduce the risk of reidentification. We will review different aggregation strategies in more detail in Section 2.1. The early influential paper by Armstrong et al. (1999) lists two alternative strategies to aggregation that rely on perturbation: affine transformations and geographic masking. Affine transformations are methods that displace, rescale, or rotate the entire vector of original locations. Since they are completely deterministic, these methods are relatively easy to reverse engineer. They also lead to a substantial loss of information since the transformation of the original locations is data independent, and thus spatial clustering effects found in the original data can be destroyed. Furthermore, external geographical information can no longer be linked to the transformed data in a reasonable way (Zandbergen, 2014). For these reasons, these methods have never been widely adopted and we will only review geographic masking in more detail in Section 2.2. In recent years, synthetic data approaches have emerged as another perturbation strategy. 
With synthetic data, original values are replaced with synthetic values drawn from a model fitted to the original data. We will review synthetic data approaches for disseminating detailed geo-information in Section 2.3.

2.1 Aggregation

As discussed earlier, aggregation is the most widely adopted strategy to reduce risks from reidentification. Aggregation does not alter the information, that is, the number of observations per aggregated unit remains accurate and the location of individuals may be coarsened but will not be replaced by fake locations. However, it does lead to a loss of information and thereby reduces the range of applications the data can be used for. Broadly, there are two general aggregation strategies: aggregation within pre-defined areas, such as grid cells or administrative areas, and more spatially flexible microaggregation, which ensures that each aggregation cell contains a predefined number of records. The use of aggregation within pre-defined areas is by far the most commonly adopted approach, and guidelines to assign observations to standardized grid cells have been developed (e.g., INSPIRE, 2014). Using standardized formats comes with the advantage that additional spatial information such as climate, health, or economic data can be easily linked using these grid cells (Klumpe et al., 2020). At the same time, it is a rather inflexible strategy. If the uniformly sized grid cells are sufficiently small, they allow detailed analyses, but may not protect confidentiality adequately in sparsely populated cells. If they are large enough to protect confidentiality even in rural areas, there is a high information loss in urban areas. To address this issue, grid cell sizes can be adapted to the population density (e.g., Lagonigro et al., 2017). This approach, however, renders the linking of external grid cell data more difficult. Some researchers (e.g., Groß et al., 2017, 2020) have proposed to improve the utility of the aggregated data by applying a smoothing function based on kernel density estimators, which randomly reassigns the individuals to point locations within the aggregation cell. This strategy can, for example, be beneficial if the goal is to compute distance measures or to plot the data on a map. Microaggregation techniques allow the size of the aggregation area to be flexibly adapted to the desired level of protection (Domingo-Ferrer and Torra, 2005; Castro et al., 2022). 
Research on microaggregation in the context of geographic data mainly focuses on anonymizing digital trace data (see, e.g., Domingo-Ferrer and Trujillo-Rasua, 2012; Rebollo-Monedero et al., 2011), but the approach has also been adopted to achieve strong privacy guarantees for geocoded data based on the concept of differential privacy (Soria-Comas and Drechsler, 2013). While microaggregation can protect privacy consistently, it creates irregular polygons that are somewhat difficult to interpret and cannot easily be linked to external geographic data.
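The trade-off of fixed grid cells described in this section can be made concrete with a minimal Python sketch. It assigns point coordinates to square cells of a given size, counts records per cell, and flags cells holding fewer than k records. The function names, the projected coordinate units, and the threshold k are illustrative assumptions and are not taken from any of the cited guidelines.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size):
    """Count records per square grid cell.

    points: iterable of (x, y) coordinates in a projected system (assumed)
    cell_size: edge length of a cell in the same units
    """
    counts = Counter()
    for x, y in points:
        # The cell is identified by its integer column and row index
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def sparse_cells(counts, k):
    """Return the cells holding fewer than k records (possibly at risk)."""
    return {cell for cell, n in counts.items() if n < k}


points = [(10, 20), (50, 60), (120, 30), (990, 990)]
counts = aggregate_to_grid(points, cell_size=100)
print(counts)
print(sparse_cells(counts, k=2))
```

Rerunning the example with a larger `cell_size` removes the sparse cells but merges all records into fewer, coarser units, which is the information loss the text refers to.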

2.2 Geographic Masking

Geographic masking relies on randomly displacing the original location to protect confidentiality. A variety of methods have been developed in this field. The simplest form of geographic masking assigns new locations by drawing a circle with fixed radius around the original location and randomly picking a new location on that circle (Zandbergen, 2014). With such a fixed displacement distance, the risk of re-engineering the original locations from the masked data can be relatively high (Zandbergen, 2014), hence random perturbation within a predefined maximum distance from the original location is more commonly used (see Armstrong et al., 1999; Kwan et al., 2004; Zandbergen, 2014; Hampton et al., 2010). This increases the level of protection as the actual displacement distance is unknown to the end user even if the masking approach is disclosed. Various strategies for randomly drawing the displacement distance have been proposed in the literature. One strategy is to use a uniform distribution within the radius of a circle centered on the original value (Armstrong et al., 1999; Zimmerman and Pavlik, 2008). Since this allows for the masked location to be very close or even equal to the original location, an alternative method called donut masking that provides higher confidentiality protection has been suggested (Hampton et al., 2010; Allshouse et al., 2010; Kounadi and Leitner, 2015). This masking method requires a minimum displacement distance in addition to the maximum displacement distance, forming a donut shape around the original location. An alternative approach to increase the displacement distance is N-Rand masking (Wightman et al., 2011), which also uses perturbation within a circle but draws N potential displacement locations. The location that is furthest away from the original location is then selected as the final displacement location. 
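
The perturbation schemes described so far can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation of any cited paper: the function names are hypothetical, coordinates are assumed to be projected (e.g., in metres), and the displaced point is drawn uniformly over the area of the donut, which is only one possible reading of "uniform" in the literature.

```python
import math
import random

def donut_mask(x, y, r_min, r_max, rng):
    """Displace (x, y) by a random distance in [r_min, r_max] in a random direction."""
    # Drawing the squared radius uniformly makes the point uniform over the donut's area.
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + r * math.cos(theta), y + r * math.sin(theta)

def n_rand_mask(x, y, r_max, n, rng):
    """Draw n candidates within the circle and keep the one farthest from (x, y)."""
    candidates = [donut_mask(x, y, 0.0, r_max, rng) for _ in range(n)]
    return max(candidates, key=lambda p: math.hypot(p[0] - x, p[1] - y))

rng = random.Random(42)
masked = donut_mask(1000.0, 2000.0, r_min=50.0, r_max=200.0, rng=rng)
displacement = math.hypot(masked[0] - 1000.0, masked[1] - 2000.0)
```

Setting r_min to zero recovers plain random perturbation within a circle, which is why n_rand_mask can reuse donut_mask for its candidate draws.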
Instead of displacing the original locations within a circle with fixed radius and using a uniform distribution, some authors have suggested drawing the distance and direction of displacement from a bivariate Gaussian probability distribution (Cassa et al., 2006, 2008; Zimmerman and Pavlik, 2008). Compared to drawing from a uniform distribution, using a Gaussian distribution renders a displacement close to the original location more likely and therefore has little effect on spatial clusters (Cassa et al., 2006). Of course, a negative consequence is an increased risk of disclosure as most of the masked locations will be close to the original location. A variant of this method therefore uses a bimodal Gaussian distribution to approximate donut masking (Zandbergen, 2014). Note that, although unlikely, extremely high displacement distances can be drawn from a normal distribution for a small fraction of the locations (Armstrong et al., 1999). If population density in the data varies substantially, perturbation with fixed maximum distance (or fixed variance for the bivariate Gaussian approach) may lead to an unnecessarily large alteration of spatial information in highly populated areas where shorter displacement distances may suffice, and to privacy risks where population density is low and locations should be displaced more. This can be addressed by taking population density into account, such that the radius of the displacement area is larger in less densely populated areas (Kwan et al., 2004; Cassa et al., 2006; Hampton et al., 2010; Lu et al., 2012; Zurbarán et al., 2018). This results in masked data that are more similar to the original data in urban areas while offering a higher level of confidentiality protection in rural areas. With the bivariate Gaussian approach, the variance of the distribution can be set to be inversely proportional to the square of the population density (Cassa et al., 2006). However, as illustrated in Allshouse et al. (2010), using externally provided population density data on an administrative area level as a benchmark, as done for example in Cassa et al. (2006) and Hampton et al. (2010), may not sufficiently protect confidentiality in areas with high population distribution heterogeneity. As a remedy, the authors suggest tripling the displacement distance in areas with heterogeneous population distribution. 
Kounadi and Leitner (2016) argue that, when information is available at the point level, the actual distance to the kth nearest neighbor should be used to determine displacement distance rather than using external population density data at the administrative-area level. In recent years, some authors proposed masking techniques that displace the original locations taking the actual position of the surrounding locations into account, such as Voronoi masking or location swapping (Seidl et al., 2015; Zhang et al., 2017). Voronoi masking, developed by Seidl et al. (2015), is based on Voronoi polygons (Voronoi, 1908), which are shapes built around each single location with boundaries marking half the distance to the next location in any direction. A Voronoi polygon surrounding a point location contains all locations that are closer to this location than they are to any neighboring point locations in the data. In the masking process, each original location is moved to the closest point along the boundaries of its polygon, placing it in the middle between two actual locations. Seidl et al. (2019) find that this decreases map users’ beliefs in being able to re-identify households. The locations are, on average, moved less in areas with higher density of the original points. At the same time, a group of at least two locations that are remote but close to each other will likely be displaced less than would be the case using random perturbation methods, and multiple locations may be relocated to the same masked location. Since many masking approaches do not account for geographic characteristics or whether units exist at the masked location, they may generate unrealistic locations, such as within water bodies or parks. Zhang et al. (2017) propose a location swapping approach to address these concerns. This method draws a circle or donut around the original location with varying distances based on population density. 
Then, the original location is swapped with another location with similar geographic characteristics within the specified area. They find that location swapping yields higher values of k-anonymity (defined in Section 3.1) than random perturbation using the same displacement area. However, we note that when applying random perturbation techniques with a maximum displacement distance, and especially in sparsely populated areas, the actual level of k achieved can be lower than the level implied by commonly applied techniques to measure k and, thus, we generally do not recommend using this measure to assess the level of protection (we will discuss this problem in more detail in Section 3.1). To address the problem with distance-based perturbation techniques, Kounadi and Leitner (2016) propose adaptive areal elimination masking that guarantees a minimum k-anonymity for every location. This method merges predefined shapes, e.g., administrative areas, until the number of locations per polygon is k or higher. The locations are then aggregated or randomly perturbed within each polygon. While this guarantees the desired level of k-anonymity, most polygons will contain (substantially) more than k units and therefore spatial patterns can be altered excessively.

2.3 Synthetic Data

An alternative to the information reduction and masking methods discussed in the previous sections is to replace the true observations with draws from a statistical model, i.e., to generate synthetic data. Such datasets aim to preserve distributional properties and the spatial structure of the original data. Since these patterns are preserved at a much smaller spatial level compared to other anonymization techniques, authors such as Quick et al. (2018); Lawson et al. (2012), and Bradley et al. (2017) argue that synthetic data is able to reduce the risk of ecological fallacies (i.e., misleading inferences from the protected data, see Freedman, 1999). Two general approaches are distinguished in the literature: fully and partially synthetic data. With fully synthetic data (Rubin, 1993), all records in the released data are synthetic. Since synthesizing all variables in a dataset can be challenging for large scale surveys, Little (1993) suggested synthesizing only those variables that are either sensitive or that could be used for re-identification. See Drechsler (2011); Drechsler and Haensch (2023) for a detailed overview on the topic. The approach has also been adopted in recent years for protecting data containing detailed geographical information. Two general strategies can be distinguished in the literature. Several papers do not synthesize the geographical information. Instead, they specifically account for the spatial structure of the data when synthesizing other variables in the dataset to improve the utility of the synthetic data. While these papers focus on protecting sensitive information in the data, i.e., reducing the risk of attribute disclosure, other approaches directly synthesize the geographical information, hence reducing the risk of reidentification. We will separately review the two strategies in the remainder of this section.

2.3.1 Synthesizing non-geographic variables while preserving the spatial information. Sakshaug and Raghunathan (2010) is one of the early papers that specifically adjust common synthesis strategies to preserve the detailed spatial information. The authors propose using mixed effects modeling strategies. Mixed effects synthesis models are a natural way to preserve the geographical clustering effect. These models are especially popular in the literature on small area estimation. The authors later (2014) extended their approach by incorporating area level covariates in the model, which makes it possible to generate synthetic data even for small areas not included in the original sample. Zhou et al. (2010) offer a more rigorous treatment of the spatial information problem by modeling all variables as spatial processes and applying spatial smoothing when modeling the variables. They show that their method introduces bias for non-linear regression models and propose a strategy for choosing the smoothing function to keep this bias small. Yet another synthesis strategy is described in Quick et al. (2018), which uses a differential smoothing synthesizer for locations of home sales in San Francisco. Their approach is a two-step process. First, they model the log-transformed home sale prices using an unrestricted hierarchical model. Second, they identify spatial outliers based on the distances to their nearest neighbors, then fit a restricted hierarchical model to provide additional smoothing for higher protection. In a related approach, Quick and Waller (2018) also use a hierarchical Bayesian model that preserves spatial, temporal, and between age-groups dependencies. They synthesize county-level heart disease deaths to complete public use data in which counts would otherwise be suppressed for units with fewer than 10 cases. More recently, Koebe et al. (2023) suggest publishing two different versions of georeferenced data. 
The first version includes the original location, but all other attributes are synthesized using a Gaussian copula model. The second version omits the geographic identifier, but leaves the other attributes at their original values.

2.3.2 Synthesizing the geographical information. The first successful implementation of geographical synthesis was discussed in Machanavajjhala et al. (2008). The authors propose a strategy for synthesizing the place of living for all individuals working in the U.S. The synthesizer is used to generate the underlying data for an application called OnTheMap provided by the U.S. Census Bureau. This application graphically visualizes commuting patterns on a detailed geographical level. The authors used a Dirichlet/Multinomial model for synthesis and adjusted the Dirichlet priors such that they were able to prove that their synthesizer guaranteed some formal level of privacy called (ε, δ)-probabilistic differential privacy (see Machanavajjhala et al. (2008) for details). However, the multinomial model used in this paper offers low utility if the population sizes or event rates are very heterogeneous. To address this limitation, Quick (2021) suggests relying on Poisson models, which are popular in the disease mapping literature, for differentially private data synthesis. He later extended the approach by incorporating public knowledge to further improve the utility of the synthesizer (Quick, 2022). Another synthesis strategy proposed by Wang and Reiter (2012) is to treat the detailed geocoding information as a continuous variable and use CART models to sequentially synthesize the longitude and latitude of the geocodes. This approach was later compared in Drechsler and Hu (2021) with two other synthesis strategies for the geocodes: using a Dirichlet Process of Mixtures of Products of Multinomials (DPMPM; Si and Reiter, 2013; Hu et al., 2018) and CART models treating the geocoding information as categorical variables. The authors find that the categorical CART models offer the highest utility, but also the highest risk of disclosure. When trying to increase the level of protection, they find it to be more effective to synthesize additional variables instead of aggregating the geocoding information to a higher grid level. Burgette and Reiter (2013) generate a partially synthetic dataset in which they synthesize the location of US census tract identifiers using a Bayesian multinomial model with a group of Dirichlet process priors and a multiple shrinkage prior distribution. This framework is chosen because it shrinks the parameters toward a small number of learned locations, which increases the utility of the data. Paiva et al. (2014) use areal level spatial models (often called disease mapping models in the literature) to synthesize the geographical information. Although they start with exact geographies, their methods require defining fine grids over the spatial domain, then using the conditional autoregressive (CAR) model of Besag et al. (1991) to model the distribution of grid-counts. 
When synthesizing exact geographies, they recommend first to synthesize grid cells for each individual, and second to randomly assign each individual a location within the grid cells. The approach is computationally intensive and can be challenging to apply if the number of categorical variables or the number of levels within the variables is large. The authors also note that their partially synthetic data do not preserve the spatial pattern because the independent draws from the underlying Poisson model can imply that close geographic units in the original data might be far apart in the synthetic data. This caveat is considered by Quick et al. (2015) who extend the spatial modeling process of geo-coordinates using marked point process models, which simultaneously model the location and the variables (Liang et al., 2008; Taddy and Kottas, 2012). Specifically, the authors propose to model the data in three steps: (i) specify multinomial models for the categorical variables in the data, (ii) use a log-Gaussian Cox process to model the geographical location within each cell specified by cross classifying all categorical variables, and (iii) specify a normal regression for continuous variables given the categorical variables and location. The authors point out that estimating this model can be computationally intractable and suggest several steps and simplifying assumptions to reduce the computational burden.
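
To make the Dirichlet/Multinomial idea mentioned at the start of this subsection more concrete, the sketch below draws synthetic grid-cell labels from the posterior predictive distribution of a multinomial model with a symmetric Dirichlet prior. It is a toy illustration only: the function name, the cell labels, and the prior value are hypothetical, and the code makes no claim to reproduce the OnTheMap synthesizer or to satisfy any formal privacy guarantee.

```python
import random

def synthesize_cells(counts, prior, n_synthetic, rng):
    """Draw synthetic cell labels from a Dirichlet-multinomial posterior predictive.

    counts: dict mapping cell label -> observed count.
    prior: hypothetical symmetric Dirichlet pseudo-count added to every cell.
    """
    cells = sorted(counts)
    # A Dirichlet draw is a vector of independent Gamma draws normalised to sum to one.
    gammas = [rng.gammavariate(counts[c] + prior, 1.0) for c in cells]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Given the drawn cell probabilities, sample one cell label per synthetic record.
    return rng.choices(cells, weights=probs, k=n_synthetic)

rng = random.Random(0)
observed = {"cell_A": 40, "cell_B": 9, "cell_C": 1}
synthetic = synthesize_cells(observed, prior=1.0, n_synthetic=50, rng=rng)
```

A larger prior pulls the synthetic distribution toward uniformity across cells, which illustrates in miniature why heterogeneous population sizes make the utility of such a model sensitive to the prior.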

3 Risk and Utility Assessment

Data dissemination always faces two conflicting goals: minimizing the risk of disclosure and maintaining the usefulness of the data. Therefore, it is crucial to always evaluate data protection strategies for both of these dimensions. In this section we review strategies that have been proposed in the literature to measure the utility and the level of protection for geocoded data that underwent some form of disclosure protection.

3.1 Risk Evaluation

The most commonly applied measure for evaluating the disclosure risk of masked geodata is spatial k-anonymity. It is related to the classical definition as proposed by Sweeney (2002), which states that k-anonymity is achieved if a record is indistinguishable from k − 1 other records in the dataset based on a set of prespecified variables (e.g., age, sex, education). Specifically, spatial k-anonymity is reached if a location is indistinguishable from at least k − 1 other locations. However, in practice it is interpreted in many different ways (Cassa et al., 2006; Allshouse et al., 2010; Hampton et al., 2010; Kounadi and Leitner, 2016; Zhang et al., 2017; Hasanzadeh et al., 2020).

There are two main definitions of k-anonymity for masked geodata. First, some researchers define spatial k-anonymity as the number of locations around the original point within a circle with radius equal to the displacement distance (Hampton et al., 2010; Allshouse et al., 2010). The second definition is to measure k-anonymity as the number of locations around the masked location that are within a circle with radius equal to the displacement distance (Lu et al., 2012; Zhang et al., 2017; Hasanzadeh et al., 2020). Note, however, that both approaches can overestimate the level of k when random perturbation within a circle or donut is applied. This can be amplified if the maximum displacement distance depends on the population density (Allshouse et al., 2010) or is determined by the distance to the kth nearest neighbor. To illustrate, imagine one household located in an area with few observations or low population density which borders an urban area. If the displacement radius for this household is chosen to reach a certain level of k-anonymity, its maximum displacement distance will be relatively large, reaching the outer areas of the urban area. A location in the urban area, on the contrary, has many neighbors in close proximity and will thus, taking k-anonymity as the objective, be displaced within a smaller area that does not include all possible displacements of the rural location. In this example, the rural location may be the only one that can be displaced far into the rural area. As a consequence, an ill-intentioned user of the released data can be confident that a masked record in certain rural areas can only stem from one of the few observations in the rural area. Thus, neither counting the cases within a circle around the original point nor counting the cases within a circle around the masked point provides adequate information on how well these points are protected. 
Kounadi and Leitner (2016) empirically demonstrate that to achieve the desired level of k-anonymity for close to 100% of the locations, the maximum distance of displacement needs to be substantially larger than the distance to the kth nearest neighbor. Beyond the (often flawed) risk assessment based on spatial k-anonymity, strategies for measuring the remaining risk of disclosure are surprisingly limited. Some authors discuss general aspects that impact the risk of disclosure. For example, Cassa et al. (2008) point out that risks of reidentification increase when multiple protected versions of the same georeferenced dataset are published. The original locations can then be approximated by averaging the masked locations (assuming the same records can be uniquely identified in the different datasets). The more versions of the data are published, the higher the accuracy of this approximation. As Zimmerman and Pavlik (2008) point out, the risk is particularly high when the locations are labelled or details on the masking approach are disclosed such as the maximum displacement radius. A classical risk assessment strategy that has been used in some applications is to mount a record linkage attack. With these types of attacks, the intruder is assumed to possess some information about the units contained in the database (e.g., age, marital status, and employment status) and uses this information to identify units in the database. Risk measures based on record linkage attacks typically try to estimate how likely it is that such an attack will lead to a correct identification in the protected dataset. In the context of geocoded data, it is typically assumed that one of the attributes that is known to the attacker is the (approximately) exact location of the target record. 
Simulated record linkage attacks have for example been used in Drechsler and Hu (2021) (and implicitly in Koebe et al., 2023) to assess how well the different synthesis strategies protect the geographical information. Drechsler and Hu (2021) use risk measures originally proposed in Reiter and Mitra (2009) to specifically estimate reidentification risks for partially synthetic data. With this approach it is assumed that the attackers possess some background knowledge for a set of target records they wish to identify in the data. Based on this knowledge, they estimate the probability of a match for each unit in the released file. A match is declared for the record that has the highest average matching probability across the synthetic datasets. The risk is evaluated by means of these matches using two different measures. The first one calculates the expected number of correctly declared matches, i.e., the expected match risk. The second one calculates the number of correct unique matches, i.e., the true match rate. Another strategy to evaluate the level of protection specifically for partially synthetic data approaches was used in Quick et al. (2018). The authors focus on spatial outliers in the original data. For those records, they generate a large number of synthetic values by repeatedly drawing from the synthesis model. They then look at histograms of the generated values. If the spatial synthesis model is overfitting, the draws from the model will be centered around the true value with limited variability, potentially indicating an unacceptable risk of disclosure. Using a related idea, Quick et al. (2015) and Quick and Waller (2018) compare synthesized values with the true, confidential values. From a privacy protection perspective, the objective here is to obtain different values. Given that they propose releasing two versions of the same dataset (see Section 2.3), Koebe et al. (2023) measure the risk of correctly re-identifying the sensitive small-area identifiers (zip codes) in the unprotected data without geoinformation using information from the synthetic data. They train random forest models on the dataset in which the geolocations have been protected. The trained model is then run on the original data to predict the locations. The fraction of successful predictions denotes the risk measure.
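
The two counting definitions of spatial k-anonymity discussed in this section can be sketched as follows. The function names and the toy coordinates are hypothetical, and each count includes the record itself when it lies within the radius. As argued above, neither count should be read as the true number of indistinguishable records; the sketch only shows how the commonly reported values are obtained and that the two definitions can disagree.

```python
import math

def count_within(center, points, radius):
    """Number of points whose distance to center is at most radius."""
    return sum(1 for p in points if math.dist(center, p) <= radius)

def k_around_original(original, population, radius):
    # Definition 1: locations within the displacement distance of the original point.
    return count_within(original, population, radius)

def k_around_masked(masked, population, radius):
    # Definition 2: locations within the displacement distance of the masked point.
    return count_within(masked, population, radius)

population = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (300.0, 300.0)]
original = (0.0, 0.0)
masked = (0.0, -80.0)  # the original displaced by 80 units
k_orig = k_around_original(original, population, radius=100.0)
k_mask = k_around_masked(masked, population, radius=100.0)
```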

3.2 Utility Evaluation

While offering a sufficient level of protection should always be the primary goal of any disclosure limitation strategy, it is crucial to also measure its impacts on utility. In the geocoding context, the utility is typically assessed by measuring to what extent the spatial structure of the data is maintained. The list of metrics that is used for this purpose in the literature is almost as large as the disclosure avoidance literature itself. Here, we only focus on the utility assessment based on spatial pattern retention. A more general discussion on utility evaluations can be found for example in Domingo-Ferrer et al. (2012). In the following, we will classify the various approaches into four broad categories: (1) point locations and density measures; (2) cluster analysis; (3) spatial autocorrelation; and (4) land use assessment.

3.2.1 Point Locations and Density Measures. Utility evaluations often start by graphically comparing the population densities of the confidential data and the protected data. A simple approach is to visually compare the locations on a map (e.g., Kwan et al., 2004). However, unless the original data is non-confidential, this approach can only be used internally, as the plots of the original data might spill sensitive information otherwise. A more versatile approach is to estimate the population density using kernel density estimation (Shi et al., 2009; Gatrell et al., 1996). The kernel density estimator creates a smooth density surface which allows to graphically compare the densities of the original and masked data on a heatmap (e.g., Kwan et al., 2004; Zandbergen, 2014). The heatmaps can be used to either visualize the density levels for each dataset separately or to directly display the discrepancies between the two densities. Beyond visualizing the population densities (e.g., Gatrell et al., 1996) the approach can also be used to measure spatial discrepancies in any other variable contained in the data. For example, Seidl et al. (2015) show differences in total warm water consumption among others.
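
A density comparison of this kind can be sketched without any plotting library. The snippet below evaluates a Gaussian kernel density estimate for the original and the masked points on a set of grid nodes and returns the difference surface that a heatmap would display. The function names, the fixed bandwidth, and the grid are assumptions chosen for illustration.

```python
import math

def kde(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from a list of 2-D points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in points
    )

def density_difference(original, masked, grid, bandwidth):
    """Masked-minus-original density at every grid node."""
    return {
        (gx, gy): kde(masked, gx, gy, bandwidth) - kde(original, gx, gy, bandwidth)
        for gx, gy in grid
    }

original = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
masked = [(0.5, 0.2), (1.4, -0.3), (-0.2, 1.6)]
grid = [(float(i), float(j)) for i in range(-2, 4) for j in range(-2, 4)]
diff = density_difference(original, masked, grid, bandwidth=1.0)
```

Positive values in diff mark grid nodes where the masked data suggest a higher density than the original data, negative values the opposite.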

3.2.2 Clustering. Another common approach to evaluate the utility of the protected dataset is to assess whether the data show similar clustering behavior as the original data. A descriptive statistic that is often used to describe clustering in a point pattern is Ripley’s K function (see, e.g., Kwan et al., 2004; Zhang et al., 2017; Quick et al., 2015; Seidl et al., 2015; Drechsler and Hu, 2021). It is defined as the expected number of points within a predefined radius around the location of interest normalized by the average point density across the entire geographical area covered in the data (Ripley, 1976; Kwan et al., 2004). It assesses to which extent a point pattern deviates from spatial homogeneity (Drechsler and Hu, 2021). Based on the K function, the more easily interpretable L function can be computed. It takes values close to zero for homogeneously distributed data, while positive values indicate heterogeneity or clustering. Closely related, the cross-K function and its analog for the L statistic assess the clustering of one point pattern relative to another point pattern, for example the underlying population distribution (Kwan et al., 2004). As an alternative measure, Zhang et al. (2017) apply an average nearest-neighbor analysis to quantify how well the spatial pattern of the original data is preserved. Specifically, they compute a nearest-neighbor index that consists of the average distances from each unit to its nearest neighbor (measured in, e.g., Euclidean or Manhattan distance). An index value similar to that of the original data indicates comparable clustering intensity. In a related approach, Lu et al. (2012) apply a nearest-neighbor index that compares the average distance to the nearest neighbor with the expected distance assuming a uniform distribution of the locations. Values below one indicate clustering. Seidl et al. (2015) use a nearest-neighbor hierarchical clustering analysis to compare the number of clusters on the first level (clusters of individual data points) in the data (see also Levine, 2006; Kounadi and Leitner, 2015). They also compare standard deviational ellipses between the original and the protected data. These ellipses cover the area that is within, say, one or two standard deviations from the center of the cluster (Kounadi and Leitner, 2015). They facilitate understanding the two-dimensional clustering behavior. Another measure to assess clustering and to identify hotspots is the Gi* statistic proposed by Getis and Ord (1992) and Ord and Getis (1995). The Gi* statistic can be used to test the null hypothesis of spatial independence. Rejecting the null hypothesis indicates clustering (Getis and Ord, 1992). Kounadi and Leitner (2015) develop an indicator that combines nearest-neighbor hierarchical clustering and the Gi* statistic. In health research, SaTScan (Kulldorff, 1997) is a popular software tool for disease mapping. It can be used to identify spatial and temporal clustering in the data (Kulldorff et al., 2005). Several authors (Olson et al., 2006; Cassa et al., 2006; Hampton et al., 2010) use the software to compare the sensitivity and specificity of the underlying cluster detection approach run on the original and protected data. Finally, some researchers use the original and masked dots to identify a data-dependent geographical area. The utility of the protected data is assessed by measuring the overlap of this area between the two datasets. For example, Hasanzadeh et al. (2017) propose an approach that compares the similarity of individuals’ frequently visited points. Specifically, they extend the residential points to home areas, where the edges mark locations that are visited frequently. Large overlaps of the home areas of the protected and the confidential data indicate high similarity of individuals’ neighborhoods in both datasets.
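
For readers who want to see how the K and L functions from the beginning of this subsection are computed, the following sketch implements a naive estimator for a single radius. It assumes a common textbook form, K(r) = A / (n(n − 1)) times the number of ordered pairs within distance r, and the centred version L(r) = sqrt(K(r) / π) − r. No edge correction is applied, so the values are biased near the boundary of the study area; the function names and the toy data are hypothetical.

```python
import math

def ripley_k(points, radius, area):
    """Naive estimate of Ripley's K at one radius, without edge correction."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, that lie within the given radius.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    return area * pairs / (n * (n - 1))

def ripley_l(points, radius, area):
    """Centred L function: near zero under homogeneity, positive under clustering."""
    return math.sqrt(ripley_k(points, radius, area) / math.pi) - radius

# Four points packed tightly inside a 100 x 100 study area: a strongly clustered pattern.
cluster = [(50.0, 50.0), (51.0, 50.0), (50.0, 51.0), (51.0, 51.0)]
k_value = ripley_k(cluster, radius=5.0, area=10000.0)
l_value = ripley_l(cluster, radius=5.0, area=10000.0)
```

To assess utility, the same function would be evaluated over a range of radii for both the original and the protected data and the two curves compared.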

3.2.3 Spatial Autocorrelation. While clustering analysis focuses on identifying the number and size of clusters in the data, spatial autocorrelation more generally assesses the spatial dependence in a point pattern. Both approaches are closely related. A prevalent measure for spatial autocorrelation is Moran’s I (e.g., Ord and Getis, 1995; Lu et al., 2012; Seidl et al., 2015). It tests whether the null hypothesis that the spatial autocorrelation is zero can be rejected. If this is the case, spatial autocorrelation can be assumed. Another common measure to compare spatial autocorrelation between datasets is the empirical semivariogram (Matheron, 1963; Quick et al., 2018; Seidl et al., 2015). It visualizes the homogeneity of non-geographic variables as a function of the distance between the locations. An output graph that increases and then flattens with further distance indicates positive spatial autocorrelation.
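
As an illustration, Moran's I can be computed in its standard form, I = (n / W) Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all weights. The sketch below assumes binary distance-band weights (w_ij = 1 if two distinct locations are at most band apart), which is only one of several possible weighting schemes, and it returns the statistic only, not the associated significance test. The function name and data are hypothetical.

```python
import math

def morans_i(points, values, band):
    """Moran's I with binary weights: w_ij = 1 if i != j and d_ij <= band, else 0."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # weighted sum of cross-products of deviations
    w_sum = 0.0  # total weight W
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                num += dev[i] * dev[j]
                w_sum += 1.0
    # Assumes at least one pair of neighbours and non-constant values.
    return (n / w_sum) * num / sum(d * d for d in dev)

# Values increase steadily along a line, so neighbouring locations carry similar values.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 3.0, 4.0]
moran = morans_i(points, values, band=1.0)
```

Computing the statistic on the original and on the protected data with identical weights gives a simple check of whether the spatial dependence in a variable has been retained.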

3.2.4 Land use. Another widely used approach to measure the utility of masked geodata is to compare the geography of the masked point-coordinates with their original counterparts. Quick and Waller (2018) and Zhang et al. (2017) consider, for instance, land cover categories or the proximity to roads. Regarding land cover rates, they compare whether the point-locations are in the same raster of either urban or rural areas. In an optimal scenario, the protected data would have the same share of points in urban areas as the original. Analogously, this applies to the proximity to roads, where the authors measure the closest distance of each point to the next road. The distances are compared using cumulative distribution functions (cdfs). The closer the two cdfs from the original and the protected data, the higher the utility of the protected data. Related works (e.g., Hasanzadeh et al., 2020) also evaluate other geographic characteristics such as the greenness of the surroundings.
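
The cdf comparison for distances to the nearest road can be sketched as below. Summarizing the closeness of the two empirical cdfs by their largest vertical gap (the Kolmogorov-Smirnov distance) is an assumption made here for illustration; the cited papers may compare the curves differently, for example graphically. The function names and the distance values are hypothetical.

```python
def ecdf(sample, x):
    """Share of observations in sample that are less than or equal to x."""
    return sum(1 for s in sample if s <= x) / len(sample)

def max_cdf_gap(original, masked):
    """Largest vertical gap between the two empirical cdfs."""
    # The gap can only change at observed values, so it suffices to check those.
    grid = sorted(set(original) | set(masked))
    return max(abs(ecdf(original, x) - ecdf(masked, x)) for x in grid)

# Hypothetical distances (in metres) from each point to its nearest road.
road_dist_original = [10.0, 20.0, 30.0, 40.0]
road_dist_masked = [15.0, 25.0, 35.0, 45.0]
gap = max_cdf_gap(road_dist_original, road_dist_masked)
```

A gap of zero means the two distance distributions coincide; values closer to one indicate that masking has moved points systematically toward or away from roads.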

4 Conclusion

Broad access to detailed geo-information can enhance the understanding of our society in numerous ways. Thus, it is not surprising that many data disseminating agencies are currently discussing how to provide access to these data for external researchers without compromising the confidentiality of the units contained in the data. Optimizing the trade-off between offering high utility granular information and sufficient data protection has been the subject of various methods for disclosure protection. In this paper, we have reviewed the literature on protection strategies for georeferenced microdata. Its main strands can be divided into coarsening the geo-information, masking it by altering, perturbing, or swapping the original locations, and disseminating synthetic data instead of the original data. We also discussed the different methods that are used to evaluate the risk and utility of the protected data. When assessing the risk of disclosure, we found that many papers rely on different notions of k-anonymity. We discussed a key concern with these notions, namely that for many of the distance-based masking techniques, disclosure risks are underestimated by these procedures as the obtained value of k tends to be much larger than the true number of indistinguishable records. We therefore strongly advise against using spatial k-anonymity in this context. Regarding the utility evaluation, we conclude that there are many useful approaches discussed in the literature and that it would be an interesting avenue for future research to consolidate the plethora of different measures.

References

Allshouse, W. B., M. K. Fitch, K. H. Hampton, D. C. Gesink, I. A. Doherty, P. A. Leone, M. L. Serre, and W. C. Miller (2010). Geomasking sensitive health data and privacy protection: an evaluation using an e911 database. Geocarto international 25(6), 443–452.

Armstrong, M. P., G. Rushton, and D. L. Zimmerman (1999). Geographically masking health data to preserve confidentiality. Statistics in medicine 18(5), 497–525.

Besag, J., J. York, and A. Mollié (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the institute of statistical mathematics 43, 1–20.

Bradley, J. R., C. K. Wikle, and S. H. Holan (2017). Regionalization of multiscale spatial processes by using a criterion for spatial aggregation error. Journal of the Royal Statistical Society Series B: Statistical Methodology 79(3), 815–832.

Burgette, L. F. and J. P. Reiter (2013). Multiple-shrinkage multinomial probit models with applications to simulating geographies in public use data. Bayesian analysis (Online) 8(2).

Cassa, C. A., S. J. Grannis, J. M. Overhage, and K. D. Mandl (2006). A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection. Journal of the American Medical Informatics Association 13(2), 160–165.

Cassa, C. A., S. C. Wieland, and K. D. Mandl (2008). Re-identification of home addresses from spatial locations anonymized by gaussian skew. International journal of health geographics 7, 1–9.

Castro, J., C. Gentile, and E. Spagnolo-Arrizabalaga (2022). An algorithm for the microaggregation problem using column generation. Computers & Operations Research 144, 105817.

De Montjoye, Y.-A., C. A. Hidalgo, M. Verleysen, and V. D. Blondel (2013). Unique in the crowd: The privacy bounds of human mobility. Scientific reports 3(1), 1–5.

Domingo-Ferrer, J., L. Franconi, S. Giessing, E. Nordholt, K. Spicer, P. de Wolf, and A. Hundepool (2012). Statistical Disclosure Control. Wiley Series in Survey Methodology. Wiley.

Domingo-Ferrer, J. and V. Torra (2005). Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery 11, 195–212.

Domingo-Ferrer, J. and R. Trujillo-Rasua (2012). Microaggregation-and permutation-based anonymization of movement data. Information Sciences 208, 55–80.

Drechsler, J. (2011). Synthetic datasets for statistical disclosure control: theory and implementation, Volume 201. Springer Science & Business Media.

Drechsler, J. and A.-C. Haensch (2023). 30 years of synthetic data. arXiv preprint arXiv:2304.02107.

Drechsler, J. and J. Hu (2021). Synthesizing Geocodes to Facilitate Access to Detailed Geographical Information in Large-Scale Administrative Data. Journal of Survey Statistics and Methodology 9(3), 523–548.

Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International Encyclopedia of the social & Behavioral sciences 6(4027-4030), 1–7.

Gatrell, A. C., T. C. Bailey, P. J. Diggle, and B. S. Rowlingson (1996). Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British geographers, 256–274.

Getis, A. and J. K. Ord (1992). The analysis of spatial association by use of distance statistics. Geographical analysis 24(3), 189–206.

Groß, M., A.-K. Kreutzmann, U. Rendtel, T. Schmid, and N. Tzavidis (2020). Switching between different non-hierachical administrative areas via simulated geo-coordinates: a case study for student residents in berlin. Journal of Official Statistics 36(2), 297–314.

Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017). Estimating the density of ethnic minorities and aged people in berlin: multivariate kernel density estimation applied to sensitive georeferenced administrative data protected via measurement error. Journal of the Royal Statistical Society Series A: Statistics in Society 180(1), 161–183.

Hampton, K. H., M. K. Fitch, W. B. Allshouse, I. A. Doherty, D. C. Gesink, P. A. Leone, M. L. Serre, and W. C. Miller (2010). Mapping health data: improved privacy protection with donut method geomasking. American journal of epidemiology 172(9), 1062–1069.

Hasanzadeh, K., A. Broberg, and M. Kyttä (2017). Where is my neighborhood? a dynamic individual-based definition of home ranges and implementation of multiple evaluation criteria. Applied geography 84, 1–10.

Hasanzadeh, K., A. Kajosaari, D. Häggman, and M. Kyttä (2020). A context sensitive approach to anonymizing public participation gis data: From development to the assessment of anonymization effects on data quality. Computers, Environment and Urban Systems 83, 101513.

Hu, J., J. P. Reiter, and Q. Wang (2018). Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis 13(1), 183–200.

INSPIRE (2014). Data specification on geographical grid systems – technical guidelines. Technical Report D2.8.I.2, European Commission.

Klumpe, B., J. Schröder, and M. Zwick (Eds.) (2020). Qualität bei zusammengeführten Daten. Schriftenreihe der ASI - Arbeitsgemeinschaft Sozialwissenschaftlicher Institute. Springer VS Wiesbaden.

Koebe, T., A. Arias-Salazar, and T. Schmid (2023). Releasing survey microdata with exact cluster locations and additional privacy safeguards. Humanities and Social Sciences Communications 10(1), 1–13.

Kounadi, O. and M. Leitner (2015). Spatial information divergence: Using global and local indices to compare geographical masks applied to crime data. Transactions in GIS 19(5), 737–757.

Kounadi, O. and M. Leitner (2016). Adaptive areal elimination (aae): A transparent way of disclosing protected spatial datasets. Computers, Environment and Urban Systems 57, 59–67.

Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics-Theory and methods 26(6), 1481–1496.

Kulldorff, M., R. Heffernan, J. Hartman, R. Assunçao, and F. Mostashari (2005). A space–time permutation scan statistic for disease outbreak detection. PLoS medicine 2(3), e59.

Kwan, M.-P., I. Casas, and B. Schmitz (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica: The International Journal for Geographic Information and Geovisualization 39(2), 15–28.

Lagonigro, R., R. Oller, J. C. Martori, et al. (2017). A quadtree approach based on european geographic grids: reconciling data privacy and accuracy.

Lawson, A. B., J. Choi, B. Cai, M. Hossain, R. S. Kirby, and J. Liu (2012). Bayesian 2-stage space-time mixture modeling with spatial misalignment of the exposure in small area health data. Journal of agricultural, biological, and environmental statistics 17, 417–441.

Levine, N. (2006). Crime mapping and the crimestat program. Geographical analysis 38(1), 41–56.

Liang, S., B. P. Carlin, and A. E. Gelfand (2008). Analysis of minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information. The annals of applied statistics 3(3), 943.

Little, R. J. (1993). Statistical analysis of masked data. Journal of Official Statistics 9(2), 407.

Lu, Y., C. Yorke, and F. B. Zhan (2012). Considering risk locations when defining perturbation zones for geomasking. Cartographica: The International Journal for Geographic Information and Geovisualization 47(3), 168–178.

Machanavajjhala, A., D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber (2008). Privacy: Theory meets practice on the map. In 2008 IEEE 24th international conference on data engineering, pp. 277–286. IEEE.

Matheron, G. (1963). Principles of geostatistics. Economic geology 58(8), 1246–1266.

Olson, K. L., S. J. Grannis, and K. D. Mandl (2006). Privacy protection versus cluster detection in spatial epidemiology. American Journal of Public Health 96(11), 2002–2008.

Ord, J. K. and A. Getis (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical analysis 27(4), 286–306.

Paiva, T., A. Chakraborty, J. Reiter, and A. Gelfand (2014). Imputation of confidential data sets with spatial locations using disease mapping models. Statistics in medicine 33(11), 1928–1945.

Quick, H. (2021). Generating poisson-distributed differentially private synthetic data. Journal of the Royal Statistical Society Series A: Statistics in Society 184(3), 1093–1108.

Quick, H. (2022). Improving the utility of poisson-distributed, differentially private synthetic data via prior predictive truncation with an application to cdc wonder. Journal of Survey Statistics and Methodology 10(3), 596–617.

Quick, H., S. H. Holan, and C. K. Wikle (2015). Zeros and ones: a case for suppressing zeros in sensitive count data with an application to stroke mortality. Stat 4(1), 227–234.

Quick, H., S. H. Holan, and C. K. Wikle (2018). Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing. Journal of the Royal Statistical Society Series A: Statistics in Society 181(3), 649–661.

Quick, H., S. H. Holan, C. K. Wikle, and J. P. Reiter (2015). Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography. Spatial Statistics 14, 439–451.

Quick, H. and L. A. Waller (2018). Using spatiotemporal models to generate synthetic data for public use. Spatial and Spatio-Temporal Epidemiology 27, 37–45.

Rebollo-Monedero, D., J. Forné, and M. Soriano (2011). An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data & Knowledge Engineering 70(10), 892–921.

Reiter, J. P. and R. Mitra (2009). Estimating risks of identification disclosure in partially synthetic data. Journal of Privacy and Confidentiality 1(1).

Ripley, B. D. (1976). The second-order analysis of stationary point processes. Journal of applied probability 13(2), 255–266.

Rubin, D. B. (1993). Statistical disclosure limitation. Journal of Official Statistics 9(2), 461–468.

Sakshaug, J. W. and T. E. Raghunathan (2010). Synthetic data for small area estimation. In J. Domingo-Ferrer and E. Magkos (Eds.), Privacy in Statistical Databases, Berlin, Heidelberg, pp. 162–173. Springer Berlin Heidelberg.

Sakshaug, J. W. and T. E. Raghunathan (2014). Generating synthetic data to produce public-use microdata for small geographic areas based on complex sample survey data with application to the national health interview survey. Journal of Applied Statistics 41(10), 2103–2122.

Seidl, D. E., P. Jankowski, and A. Nara (2019). An empirical test of household identification risk in geomasked maps. Cartography and Geographic Information Science 46(6), 475–488.

Seidl, D. E., G. Paulus, P. Jankowski, and M. Regenfelder (2015). Spatial obfuscation methods for privacy protection of household-level data. Applied Geography 63, 253–263.

Shi, X., J. Alford-Teaster, and T. Onega (2009). Kernel density estimation with geographically masked points. In 2009 17th International Conference on Geoinformatics, pp. 1–4. IEEE.

Si, Y. and J. P. Reiter (2013). Nonparametric bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of educational and behavioral statistics 38(5), 499–521.

Soria-Comas, J. and J. Drechsler (2013). Evaluating the potential of differential privacy mechanisms for census data. In UNECE Work Session on Data Confidentiality.

Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International journal of uncertainty, fuzziness and knowledge-based systems 10(05), 557–570.

Taddy, M. A. and A. Kottas (2012). Mixture modeling for marked poisson processes. Bayesian Analysis 7(2).

Voronoi, G. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. deuxième mémoire. recherches sur les parallélloèdres primitifs. Journal für die reine und angewandte Mathematik (Crelles Journal) 1908(134), 198–287.

Wang, H. and J. P. Reiter (2012). Multiple imputation for sharing precise geographies in public use data. The annals of applied statistics 6(1), 229.

Wightman, P., W. Coronell, D. Jabba, M. Jimeno, and M. Labrador (2011). Evaluation of location obfuscation techniques for privacy in location based information systems. In 2011 IEEE Third Latin-American Conference on Communications, pp. 1–6. IEEE.

Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded health data: assessing geographic masking strategies for individual-level data. Advances in medicine 2014.

Zhang, S., S. M. Freundschuh, K. Lenzer, and P. A. Zandbergen (2017). The location swapping method for geomasking. Cartography and Geographic Information Science 44(1), 22–34.

Zhou, Y., F. Dominici, and T. A. Louis (2010). A smoothing approach for masking spatial data. The Annals of Applied Statistics 4(3), 1451–1475. DOI: 10.1214/09-AOAS325.

Zimmerman, D. L. and C. Pavlik (2008). Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data. Geographical analysis 40(1), 52–76.

Zurbarán, M., P. Wightman, M. Brovelli, D. Oxoli, M. Iliffe, M. Jimeno, and A. Salazar (2018). Nrand-k: Minimizing the impact of location obfuscation in spatial analysis. Transactions in GIS 22(5), 1257–1274.

AN OVERVIEW OF DATA PROTECTION STRATEGIES FOR INDIVIDUAL-LEVEL GEOCODED DATA

UNECE Expert meeting on Statistical Data Confidentiality

Wiesbaden, 26-28 September 2023

Maike Steffen

Konstantin Körner

Jörg Drechsler

// PageSteffen, Körner, Drechsler

BACKGROUND

• More and more geo-referenced data are being collected

– Important for various research areas (e.g., to assess neighborhood effects, mobility patterns)

– Highly identifying, availability for research is limited

– IAB project on geo-referenced data → how to anonymize these data?

• Three main strategies for confidentiality protection

– Aggregation

– Geographic Masking

– Synthetic data

Data Protection Strategies for Geocoded Data 2

AGGREGATION

• Aggregation within pre-defined areas

– Administrative areas

– (Standardized) Grid cells

• Flexible aggregation

– Population-adjusted grid cells

– Microaggregation

Aggregation within pre-defined areas:

Advantage: external data can easily be linked

Drawback: loss of spatial information

Drawback: choice of aggregation level can bias results

Flexible aggregation:

Advantage: more efficient trade-off between confidentiality protection and utility

Drawback: cannot easily be linked to external data

Drawback: harder to interpret
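Aggregation to standardized grid cells can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinates are toy values in metres, the cell sizes are arbitrary, and `aggregate_to_grid` is a hypothetical helper, not a function from any of the cited works.

```python
import math
from collections import Counter

def aggregate_to_grid(points, cell_size):
    """Count points per square grid cell; a cell is identified by its
    integer column and row index."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

# Toy point locations in metres (assumption).
points = [(120.0, 80.0), (130.0, 95.0), (880.0, 40.0), (1510.0, 1490.0)]

coarse = aggregate_to_grid(points, cell_size=1000.0)  # 1 km cells
fine = aggregate_to_grid(points, cell_size=100.0)     # 100 m cells
print(coarse)
print(fine)
```

Running the same points through two cell sizes shows how much the released counts depend on the chosen aggregation level.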

GEOGRAPHIC MASKING

• Deterministic masking approaches

• Random perturbation

– Original locations are randomly displaced

– Different methods to draw maximum or minimum displacement distance

– Possibility to adapt for population density

Widely used, straightforward method

Point-locations as output

No guaranteed level of privacy protection, especially in rural areas or areas with heterogeneous population density

[Figure panels: Displacement within a circle; Donut masking; Gaussian masking; Bimodal Gaussian masking]
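Random perturbation with a minimum and a maximum displacement distance (donut masking) can be sketched as below. This is an illustrative sketch only: the function name `donut_mask`, the radii and the choice to sample uniformly over the area of the ring are assumptions, and other ways of drawing the displacement distance exist. Setting `r_min` to 0 gives displacement within a circle.

```python
import math
import random

def donut_mask(point, r_min, r_max, rng):
    """Displace a point in a random direction by a distance between
    r_min and r_max."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Drawing the squared radius uniformly spreads the masked points
    # evenly over the area of the ring (a design choice of this sketch).
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    x, y = point
    return (x + radius * math.cos(angle), y + radius * math.sin(angle))

rng = random.Random(42)
original = (500.0, 500.0)  # toy coordinates in metres
masked = donut_mask(original, r_min=50.0, r_max=200.0, rng=rng)
print(masked, math.dist(original, masked))
```

The fixed radii in this sketch do not adapt to population density, which is the situation in which the slide notes that no level of privacy protection is guaranteed.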

GEOGRAPHIC MASKING SOMETIMES OFFERS LITTLE PROTECTION

outlier

GEOGRAPHIC MASKING

• Location swapping (Zhang et al., 2017)

– Original location is swapped with another location within a circle or donut

• Adaptive Areal Masking (Kounadi & Leitner, 2016)

– Random perturbation within pre-defined areas with at least 𝑘 location points

– Guarantees a certain level of anonymity

– High alteration of locations

SYNTHETIC DATA

Synthesizing of non-geographic variables

• Account for spatial structure to synthesize non-geographic variables

• Data release

– Detail level of geographic information

– Separate release of 2 data sets (Koebe et al., 2023)

Synthesizing of geographic information

• Aggregated data (Quick, 2021; 2022; Paiva et al., 2014)

• Exact geographic coordinates (Wang & Reiter, 2012; Drechsler and Hu, 2021)

• Fully synthetic data (e.g., Quick et al., 2015)

RISK AND UTILITY ASSESSMENT

RISK ASSESSMENT

• K-anonymity

– Definition: a record must be indistinguishable from at least 𝑘 − 1 other records

– Spatial k-anonymity for masking methods: measure the number of locations within a radius equal to the displacement distance

(1) number of locations around the original point

(2) number of locations around the masked location

– Problems with this measurement

• Alternatives

– Record linkage attacks (Drechsler and Hu, 2021; Quick et al. 2015)

– Assessment of overfitting regarding spatial outliers (Quick et al. 2018)
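The two spatial k-anonymity counts listed above can be computed as in the following sketch. All values are toy assumptions: the locations, the masked position and the radius of 70 are invented for illustration, `spatial_k` is a hypothetical helper, and the count here includes the record's own location.

```python
import math

def spatial_k(center, locations, radius):
    """Number of locations within `radius` of `center`."""
    return sum(1 for loc in locations if math.dist(center, loc) <= radius)

# Three clustered locations and one isolated location (toy data).
locations = [(0.0, 0.0), (10.0, 0.0), (0.0, 15.0), (60.0, 60.0)]
original = (60.0, 60.0)  # the isolated record
masked = (20.0, 10.0)    # hypothetical masked position of that record
radius = 70.0            # assumed displacement distance

k_original = spatial_k(original, locations, radius)  # variant (1)
k_masked = spatial_k(masked, locations, radius)      # variant (2)
print(k_original, k_masked)
```

In this toy setting the two variants give very different values for the same record, which is the kind of discrepancy behind the "problems with this measurement" noted on the slide.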

SPATIAL K-ANONYMITY EXAMPLE

outlier

UTILITY ASSESSMENT

Comparison of original and anonymized data

1. Point locations and density measures

– Distances between original and masked locations

– Heatmaps using Kernel density estimation

2. Clustering

3. Spatial autocorrelation

4. Applied results

CONCLUSION

• Three main strands of confidentiality protecting strategies

• Some common masking techniques do not provide adequate confidentiality protection

• Common risk measures should be carefully evaluated

• Aggregation

– Fixed areas

– Flexible aggregation

• Geographic Masking

– Deterministic methods

– Random noise

– Record swapping

– Other

• Synthetic Data

– Synthesizing of non-geographic variables

– Synthesizing of geographic variables

KEY REFERENCES

Drechsler, J. and J. Hu (2021). Synthesizing Geocodes to Facilitate Access to Detailed Geographical Information in Large-Scale Administrative Data. Journal of Survey Statistics and Methodology 9(3), 523–548.

Koebe, T., A. Arias-Salazar, and T. Schmid (2023). Releasing survey microdata with exact cluster locations and additional privacy safeguards. Humanities and Social Sciences Communications 10(1), 1–13.

Kounadi, O. and M. Leitner (2016). Adaptive areal elimination (aae): A transparent way of disclosing protected spatial datasets. Computers, Environment and Urban Systems 57, 59–67.

Paiva, T., A. Chakraborty, J. Reiter, and A. Gelfand (2014). Imputation of confidential data sets with spatial locations using disease mapping models. Statistics in medicine 33(11), 1928–1945.

Quick, H. (2021). Generating poisson-distributed differentially private synthetic data. Journal of the Royal Statistical Society Series A: Statistics in Society 184(3), 1093–1108.

Quick, H. (2022). Improving the utility of poisson-distributed, differentially private synthetic data via prior predictive truncation with an application to cdc wonder. Journal of Survey Statistics and Methodology 10(3), 596–617.

Quick, H., S. H. Holan, C. K. Wikle, and J. P. Reiter (2015). Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography. Spatial Statistics 14, 439–451.

Quick, H., S. H. Holan, and C. K. Wikle (2018). Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing. Journal of the Royal Statistical Society Series A: Statistics in Society 181(3), 649–661.

Quick, H. and L. A. Waller (2018). Using spatiotemporal models to generate synthetic data for public use. Spatial and Spatio-Temporal Epidemiology 27, 37–45.

Sakshaug, J. W. and T. E. Raghunathan (2010). Synthetic data for small area estimation. In J. Domingo-Ferrer and E. Magkos (Eds.), Privacy in Statistical Databases, Berlin, Heidelberg, pp. 162–173. Springer Berlin Heidelberg.

Sakshaug, J. W. and T. E. Raghunathan (2014). Generating synthetic data to produce public-use microdata for small geographic areas based on complex sample survey data with application to the national health interview survey. Journal of Applied Statistics 41(10), 2103–2122.

Wang, H. and J. P. Reiter (2012). Multiple imputation for sharing precise geographies in public use data. The annals of applied statistics 6(1), 229.

Zhang, S., S. M. Freundschuh, K. Lenzer, and P. A. Zandbergen (2017). The location swapping method for geomasking. Cartography and Geographic Information Science 44(1), 22–34.

Zhou, Y., F. Dominici, and T. A. Louis (2010). A smoothing approach for masking spatial data. The Annals of Applied Statistics 4(3), 1451–1475. DOI: 10.1214/09-AOAS325.

CONTACT

Maike Steffen

[email protected]

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access, DESTATIS Germany

remote access, data access path, microdata, microdata for scientific purposes

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

Hanna Brenzel (Research Data Centre of the Federal Statistical Office)

Katharina Cramer (Research Data Centre of the Statistical Offices of the Federal States)

Volker Güttgemanns (Research Data Centre of the Statistical Offices of the Federal States)

Marcel Mathes (Research Data Centre of the Statistical Offices of the Federal States)

[email protected]

Abstract

The fundamental goal of the Research Data Centre of the Federal Statistical Office and the Research Data Centre of the Statistical Offices of the Federal States (RDC) is not only to provide access to official statistics microdata, but also to continuously improve and adapt this access to the changing needs of empirical science. To meet the broad range of needs of the empirically working scientific community, the RDC have offered different access paths since their founding, through which differently anonymised data products are made available. Now, the RDC are introducing a new remote access prototype system, including a new data product. The access paths differ both in the degree of anonymity of the provided microdata and in the way the data are provided. First, the existing and firmly established data access paths are outlined and their contractual and legal conditions explained. Subsequently, the newly installed remote access prototype and its features and requirements are presented. Provided that the ongoing evaluation phase turns out positive, this data access option will become one more way of data access, operated regularly in its full version from 2024 onwards. The analysis potential of the data provided therein will rank between that of the scientific use files transmitted to the scientific institutions and that of the data provided for on-site analysis at the RDC safe centres. This paper highlights various challenges, such as data protection requirements and legal framework conditions, which must be considered.

1 Introduction

With the establishment of the Research Data Centre (RDC) of the Federal Statistical Office in the fall of 2001 and of the RDC of the statistical offices of the Länder in April 2002, an important cornerstone and a central interface was created between the scientific community and official statistics as data and information service provider.

Together, the RDCs offer the empirically working scientific community a coordinated range of data and services for the scientific use of high-quality microdata from official statistics.

Over time, however, expectations of the RDCs have evolved fundamentally, and stakeholders in politics and scientific communities have been pushing for substantial improvements in data access and data usage capabilities for some time.

Remote access represents an up-to-date and modern way of accessing data and is accordingly demanded by data users. The statistical offices of other European countries (e.g., the Netherlands, France or Finland) can serve as reference benchmarks: they created the legal and technical prerequisites to make their data available to researchers via remote access some time ago. Last but not least, a remote access system is currently being set up at European level by Eurostat.

On the one hand, the establishment of a remote access system, with the investment in a connectable infrastructure, will advance the continuous development of the RDC. By catering to the needs of the scientific community, the status of the RDC as a modern data provider will be consolidated. On the other hand, the currently complex and inefficient system of data access can be streamlined to a uniform and manageable system without limiting the flexibility of the users.

2 Status Quo

The RDC of the Federal Statistical Office, together with the RDC of the statistical offices of the Länder, offer access to more than 3,000 different data products for over 90 statistics for scientific use via different ways of access. They differ both in terms of the anonymity of the accessible data and in the type of data provision. Generally, the existing ways of data access can be divided into two categories, as Figure 1 illustrates. In the case of the so-called "on-site access", the data remain in the secure areas of the statistical offices of the Federation and the Federal States. Since the RDCs can closely control the access to the data and provide output only after a confidentiality check, the data are only weakly anonymized. With "off-site access", on the other hand, users can work with the individual data at their own institutes. Since the output is not checked by the data centers, the individual data have to be anonymized more strongly.

The category “off-site” includes the so-called Public Use Files (PUF), Campus Files (CF) and Scientific Use Files (SUF). “On-site” includes PC workplaces at the RDC, so-called “safe centers”, and remote execution (see the homepage of the RDC, https://www.forschungsdatenzentrum.de/en/access).

Safe centers exist at all locations of both RDC. Researchers can use them to analyse microdata inside the secure premises of the statistical offices. As the individual data are already protected by the regulation of data access and the equipment of the PC workstation, formally anonymous microdata can be provided at the safe centers. Thus, a nationwide infrastructure in Germany is available for these data.

The safe centers are equipped with common statistical programs (Stata, R as well as partly SPSS and SAS) and are completely isolated from the outside. A separate PC workstation with internet connection is available for e-mail communication and internet searches.

In contrast to the safe centers, remote execution does not provide direct access to the microdata. Instead, data structure files are made available that resemble the original material with regard to structure and variable values, but do not permit any analyses in terms of content and do not hold any risk of exposing confidential information. Using these data sets, users can prepare program code in the statistical programs SPSS, SAS, Stata or R. Staff of the statistical offices then apply this code to the original data. The data users receive the results of those analyses after the relevant confidentiality checks.

SUFs are standardized datasets created by the RDC for popular statistics. SUFs offer lower potential for analyses than on-site ways of access, but are designed to be suitable for a large proportion of scientific research projects. Due to the de facto anonymization of microdata, they may be used outside the protected premises of official statistics according to Sect. 16 para. 6 no. 1 BStatG. Due to legal restrictions, SUFs may only be used by researchers who are employed by a research institution that is registered and located in Germany, and they may only be used in Germany. Until recently, the SUFs were sent on DVD to the scientific institution with which the user contract was concluded. Since June 2023, modernization measures allow the institution authorized to use the data to access the SUFs directly via a download portal.

In particular, on-site ways of access entail additional work for both data users and RDC staff. At the same time, the share of data uses via these access paths has steadily increased over time compared to off-site uses. The development of a remote access system therefore pursues the goal of ensuring the technical connectivity to a modern and demand-oriented data provision for the scientific community. With this technology, the increased expectations of the research community for up-to-date data provision can be fulfilled in the long term. In addition, the remote access system holds potential for future innovation by reducing or substituting existing labor-intensive ways of access (reduction of on-site support, of coordination of appointments with users, of coordination and support of remote execution, etc.). Consequently, the scarce resources of the RDC could be invested more efficiently, for example in supporting additional data usage or further developing the data and service offers. At the same time, there is increased potential regarding data parsimony, as it is expected that this system will reduce the number of intermediate results per project that require confidentiality checks. Furthermore, the RDC aim to sustainably strengthen their leading role in the group of German RDC.

Figure 1: Ways of data access at the research data centres (RDC) of the statistical offices of the Federation and the Federal States

3 The Remote Access System

3.1 The technical structure

IT and data security play a crucial role in setting up the remote access system. The aim is to ensure that the remote access system is implemented in compliance with the law while maintaining the required IT security standards.

A virtual desktop infrastructure based on CITRIX was chosen as the IT-architecture. The system components

set up are located in the so-called IDMZ (Internet Demilitarized Zone), in which procedures are operated that

are to be accessible from the Internet. In the IDMZ, a distinction is made between three areas: Access Area

(Pex), Application Area (Pin1) and Data Area (Pin2). These three areas are separated from each other by

firewalls, which only allow approved communication between the neighboring areas within the application. A

so-called transport encryption secures the communication path between the server and the client.

Two-factor authentication and IP whitelisting are implemented as additional IT security measures for the Citrix

solution. Two-factor authentication means that, in addition to the user-specific work accounts protected by a

personal password, a uniquely generated token must be used for each log-in. IP whitelisting allows only

specific IP addresses to gain access to the remote access system. Prior to each authorized use, the IP address of

the respective facility is added to the whitelist. This ensures that unauthorized IP addresses cannot reach the system in the first place. The measure acts as a form of geoblocking and also strengthens protection against possible (automated) attack attempts.
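The combination of IP whitelisting and two-factor authentication described above can be illustrated with a minimal sketch. It is not the RDC's Citrix configuration; the network ranges are documentation addresses chosen for illustration, and the function names `is_allowed` and `may_log_in` are hypothetical.

```python
import ipaddress

# Hypothetical allowlist: one entry per contracted institution.
# The ranges below are reserved documentation addresses, not real facilities.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if the client address lies inside an allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def may_log_in(client_ip: str, password_ok: bool, token_ok: bool) -> bool:
    """All three checks must pass: source address, password, and one-time token."""
    return is_allowed(client_ip) and password_ok and token_ok
```

In this sketch a request from an address outside the list is rejected before any credentials are considered, which mirrors the idea that unlisted addresses never reach the log-in stage.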

In addition, app protection is used to, among other things, prevent the user from taking screenshots of the data.

Remote system access is controlled on a per-user basis by an access management system; only authorized users

are granted access. Within the system, authorizations are limited to the extent required for data analysis. The

creation of user-specific working accounts, which are managed centrally and secured by the user and access

management, ensures that access is only possible to requested data. Each account is linked to a data folder in

which user-specific official microdata are stored by RDC staff.

In addition to the technical measures, a number of contractual-organizational measures are

introduced to increase data protection. Before the data can be accessed, a user contract has to be concluded

between the scientific institution and the responsible statistical office. It is contractually stipulated that up-to-date software, operating system and virus protection are used on the client side when accessing the virtual desktop infrastructure. Re-identification of individual cases is likewise prohibited. The RDC are legally bound to

check all statistical results for statistical confidentiality that were created within the context of scientific

projects based on provided microdata. This serves the protection of data according to section 16 (6) of the

Federal Statistics Law (BStatG). Should individual cases be part of the output then they have to be blocked

consistently across all results of a project. Data users who attempt to re-identify individual cases are liable to prosecution and are excluded from further data use.

In order to ensure that the system is tied to a specific location, its use is contractually established and sanctions

are imposed in the event of violations. In addition, it is contractually stipulated that scientific institutions can be

excluded from using the remote access system or from the possibility of carrying out further research projects

via the RDC in the event of serious violations of the terms of use. In the event of a flagrant breach of contract,

the scientific institutions can also be sanctioned with a penalty payment of up to EUR 20,000.

3.2 Data material in the remote access system

Remote access to formally anonymized data is not feasible within the current legal framework. One possible

way of implementation is to offer remote access for de facto anonymized data with slight modifications, as this

would not require amendment of the law. In this case, the degree of data modification is of utmost relevance: If

the level of anonymization is too high, the data offered will not meet the needs of the scientific community; if

the level of data anonymization is too low, confidentiality can no longer be maintained. The degree of de facto

anonymization therefore largely determines the benefits and coverage of the demand of the scientific

community. In addition, the expected effects on the capacity of the RDC heavily depend on covering as many

of the science community's projects as possible via the remote access system and, in particular, on reducing the

costly uses of remote execution. However, this goal can only be achieved if significantly more data can be

provided via remote access than via the current dissemination path via off-site SUF.

Microdata are described as “de facto anonymous” if it is not possible to completely rule out de-anonymization

but assigning the information to the respective statistical unit “requires unreasonable effort in terms of time,

cost and manpower” (Section 16 (6) of the Federal Statistics Act). According to the Federal Statistics Act,

however, de facto anonymous data may only be used by scientific institutions and only to carry out scientific

projects.

When creating de facto anonymity, the aim is to virtually eliminate the probability of correctly assigning data to

respondents, while preserving the statistical information content as much as possible. Different anonymization

methods can be used for this purpose. Common methods are information reduction (e.g. aggregation, class

formation, censoring) and information modification (e.g. swapping). In order to determine de facto anonymity,

the effort and benefit of deanonymization must be evaluated.
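Two of the information-reduction methods named above, class formation and censoring, can be sketched as follows. This is an illustration only: the variables, the class width of five years and the income cap are assumptions, not values taken from any RDC anonymization concept.

```python
def to_age_class(age: int, width: int = 5) -> str:
    """Class formation: report age only as an interval of the given width."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Censoring: values above the cap are reported as the cap itself."""
    return min(value, cap)


# Hypothetical records; field names and values are made up for illustration.
records = [{"age": 37, "income": 52_000}, {"age": 81, "income": 410_000}]
protected = [
    {"age_class": to_age_class(r["age"]), "income": top_code(r["income"], 150_000)}
    for r in records
]
```

Both functions remove detail without inventing values, which is what distinguishes information reduction from information modification such as swapping.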

De facto anonymity thus does not completely exclude the possibility of re-identification, but weighs its risk in terms of a cost/benefit ratio. Costs for data users primarily include the consequences of actions in violation of the contract. Re-identification is strictly prohibited and punishable by fine or imprisonment (Section 203 StGB). In addition, scientific users must consider further consequences of de-anonymizing the data, such as loss of reputation and loss of access to data of official statistics. This is because users are obligated to maintain the anonymity of the data both by the formal obligation and by the user agreement.

De facto anonymity therefore does not result solely from the remaining information content of the data, but is composed of a triad: 1) modification of the data material, 2) technical/organizational measures, and 3) contractual measures. Whether a microdata set can be described as de facto anonymous therefore also depends on the access conditions. Of crucial importance here is what additional knowledge is available and where the data access takes place. Depending on whether the microdata is used outside or inside the statistical offices, de facto anonymity can be achieved with more (off-site SUF) or less (on-site SUF) severe losses of information.

Figure 2: Technical infrastructure of the remote access system

The de facto anonymity of microdata from official statistics is thus not a fixed quantity, but can be mapped

along a continuum. In principle, it can be stated: The higher the technical and contractual measures, the fewer

anonymization measures need to be taken and the higher the analysis potential of the data.

No technical measures are used for the previous off-site SUFs. De facto anonymity must therefore be achieved solely through the two remaining measures: in addition to the contractual commitment and the commitment of

the users, de facto anonymity is achieved by strongly anonymizing the data material itself. For this purpose, a

statistics-specific anonymization concept is developed for each data material.

With the new remote SUF or on-site SUF, de facto anonymity can be achieved by significantly less

modification of the data. This is justified by the high level of technical measures and the associated possibility

to control the data access. In contrast to off-site SUF, the data is not passed on. It is solely possible to view the

data via a virtual desktop (VDI environment). A so-called "transport encryption" secures the communication

path between the server (sender) and the client (receiver). An exchange between the technical

infrastructure of the data users and the data on the server of the official statistics or a download of the official

data is thus technically impossible. Thus, unauthorized data linkage is impossible and the RDC has a high level

of use control via log files. With regard to the risk of de-anonymization, data access via remote access therefore

reduces many risks compared to the previous off-site SUFs.

3.3 The use of Remote Access

The remote access system, which is currently under construction, will be set up as a classic remote desktop

version. As in the past, scientific institutions that are entitled to use the system in accordance with Section 16

BStatG have to apply for data access. If the application is approved, the researchers are then able to access the

secure area within their scientific institution by using their own hardware. Within the secure area common

statistical software such as RStudio and Stata is available. The major advantage compared to remote execution

is that researchers can see the microdata and do not have to "blindly" program their syntaxes as before (see

Figure 3). By working directly with and being able to view the data, it should be possible to significantly

reduce the number of intermediate results previously generated via remote execution, thus minimizing a very

labor-intensive process step in the RDC. The goal should be that only final outputs are checked for

confidentiality by the RDC staff and will be released. This also supports the principle of data parsimony.

Figure 3: Remote Access at the RDC

Work on setting up such a system began in November 2021. The system is currently in the evaluation phase.

On the one hand, the technical implementation of the system is being tested and its resilience checked using penetration tests. On the other hand, the user-friendliness and the attractiveness of the data material provided are to be examined thoroughly. In a first step, only absolutely anonymous data material was made available via the

system for a selected group of people. In a second step, off-site SUFs will then be made available to power users who have already completed a valid user application with the RDC. The third step will then be to test the

redesigned on-site/remote SUF material. Since the system requires a redesign of all statistics-specific

anonymization concepts, a gradual integration of the existing data products in the RDC is planned. The start will be made with the most requested data product, the microcensus. In order to be able to evaluate the

operating grade of the system appropriately, DRG statistics will be offered as one of the first data products in

the remote access system in addition to the microcensus. If the evaluation of the system is positive, other data

products that are of high demand will follow.


4 BIBLIOGRAPHY

Brenzel, Hanna / Zwick, Markus. An information infrastructure has emerged in Germany – the Research Data

Centre of the Federal Statistical Office. German version published in WISTA | 6 | 2022, p. 54 et seq.

Homepage of the Research Data Centre of the Federal Statistical Office and the Federal States

https://www.forschungsdatenzentrum.de/en

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

UNECE - Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Hanna Brenzel, Katharina Cramer, Volker Güttgemanns, Marcel Mathes, Hariolf Merkle

Agenda

(1) Motivation

(2) Status Quo

(3) The Remote Access System

(4) Outline

destatis.de


26.09.2023Federal Statistical Office (Destatis) 3

Motivation

Remote access for SUF…

• offers convenient data access for scientists from their own scientific institution and thus enables up-to-date and efficient data analysis

• offers the scientific community the opportunity to save travel and waiting time

• contributes sustainably to the further development of the RDC and its range of services

• drives the digitization of procedures and processes

• promotes the awareness of confidentiality requirements by the scientific community and favors faster statistical confidentiality checks, provision and publication of results

Status Quo

| | Remote execution | Safe centres | Off-site Scientific Use Files | Public Use Files/Campus Files |
|---|---|---|---|---|
| Type of use | On-site use | On-site use | Off-site use | Off-site use |
| Degree of data anonymisation | Formally anonymous | Formally anonymous | De facto anonymous | Absolutely anonymous |
| Entitled to use | Research institution | Research institution | Research institution | All |
| Data storage during use | Statistical offices | Statistical offices | Research institution | Any |
| Location of users during use | Any | Location of the RDC | Research institution | Any |

[Slide graphic: arrows labelled "Anonymisation" and "Analysis potential" beneath the table]

Where to go…

| | Remote execution | Safe centres | Remote access | Off-site Scientific Use Files | Public Use Files/Campus Files |
|---|---|---|---|---|---|
| Type of use | On-site use | On-site use | Under construction | Off-site use | Off-site use |
| Degree of data anonymisation | Formally anonymous | Formally anonymous | De facto anonymous | De facto anonymous | Absolutely anonymous |
| Entitled to use | Research institution | Research institution | Research institution | Research institution | All |
| Data storage during use | Statistical offices | Statistical offices | Statistical offices | Research institution | Any |
| Location of users during use | Any | Location of the RDC | Research institution | Research institution | Any |

[Slide graphic: arrows labelled "Anonymisation" and "Analysis potential" beneath the table]

Remote Access System

Technical and organizational measures

Technical measures

» Transport encryption

» Two-factor authentication

» IP whitelisting

» Server-side monitoring/logging/backup

» Operation in high security environment

» Limited scope of use and user-specific working accounts

» BSI-compliant hardware and software administration

Organizational/contractual measures

» Clause on up-to-date software on the hardware of the facilities

» Contractual ban on re-identification

» Exclusion of the institution in case of misconduct

» Clause on location-based access to data

» Access only for eligible institutions according to Section 16 (6) of the Federal Statistics Act

» Fines for breach of contract

Anonymisation measures

» Anonymisation concept and special anonymisation of vulnerable units

Remote Access System

Data material (I)

“de facto anonymity”

» “requires unreasonable effort in terms of time, cost and manpower” (Section 16 (6) of the Federal Statistics Act)

» de facto anonymity is thus not a fixed quantity, but can be mapped along a continuum

» the higher the technical and contractual measures, the fewer anonymization measures need to be taken and the higher the analysis potential of the data

[Slide diagram: de facto anonymity shown as the combination of technical measures, organizational/contractual measures and anonymisation measures]

Remote Access System

Data material (II)

» Significant difference to the current dissemination path via off-site SUF

» With the new remote SUF, de facto anonymity can be achieved by less modification of the data. This is justified by the high level of technical measures and the associated possibility to control the data access

» In contrast to off-site SUF, the data is not passed on. It is solely possible to view the data via a virtual desktop

» Unauthorized data linkage is impossible and the RDC has a high level of use control compared to off-site SUFs

Remote Access at the RDC

[Slide figure] Source: German version published in WISTA | 6 | 2022, p. 54 et seq.

Outline

» Phase 2: Evaluation of the system

» multi-stage application and function tests for various user groups

» penetration, accessibility, load and performance tests

» feasibility and usability of SecureBootSticks

» System dimensioning for the desired number of usage accesses

» Expansion of the offered data materials

Contact Research Data Centre of the Federal Statistical Office

Phone: +49 611 / 75-2420 E-Mail: [email protected]

www.forschungsdatenzentrum.de


An overview of data protection strategies for individual-level geocoded data

data protection strategies, geocoded individual data, georeferencing individual data, confidentiality, access


UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert meeting on Statistical Data Confidentiality 26–28 September 2023, Wiesbaden

An overview of data protection strategies for individual-level geocoded data

Maike Steffen, Konstantin Körner, Jörg Drechsler

Institute for Employment Research (IAB)

[email protected]

Abstract In response to a growing need for small-scale geographic information in various research areas, data-collecting institutions are increasingly georeferencing individual-level data. However, due to confidentiality concerns, external researchers typically have very limited access to these data if at all, resulting in a substantial loss of informational value. A growing body of literature on data protection strategies for geocoded data attempts to find solutions for the tradeoff between privacy protection and utility preservation of the individual-level data. The purpose of this paper is to systematically collect and review the literature in the field and to offer a classification of existing methods. Various strategies for estimating the utility and the remaining risk of disclosure for the protected data are also discussed.

1 Introduction

Geocoded data have become increasingly relevant in various research areas since they offer insights that can only be acquired by considering spatial context. The granular information enables researchers to include fine geographic patterns and spatial variation of individual characteristics in their analyses. The detailed geographical information facilitates studying such diverse topics as neighborhood effects, mobility patterns, or the spread of diseases, to name only a few of the possible applications. Moreover, unlike administrative borders, geo-coordinates do not change over time; border changes often hamper longitudinal analyses. Finally, the availability of detailed geographical information makes it easy to merge information from various data sources.

However, access to detailed geocoding information is currently limited, as it is well known that detailed geographical information is highly identifying (De Montjoye et al., 2013). To still enable access to this valuable source of information, various strategies have been proposed in the literature to protect confidentiality while still maintaining the utility of the collected information. This paper aims to give an overview of the various approaches. We also provide an overview of metrics that have been used to assess the disclosure risk and the utility of the protected data.

The remainder of the paper is organized as follows. In Section 2, we review the three most popular approaches for protecting geocoded data: aggregation, geographic masking, and data synthesis. In Section 3, we discuss various tools which are used to assess the risk and utility of the protected data. Section 4 concludes the article.

2 Data Protection Strategies

Two general strategies are commonly applied to reduce the risk of disclosure when disseminating data to the public: information reduction and perturbation. Information reduction limits the amount of detail that is available in the data. This can range from discretizing continuous variables (e.g., reporting age in five-year intervals) through coarsening categorical variables (e.g., reporting only the first two digits of a hierarchical classification code such as the NACE code) to removing entire variables. Perturbation approaches try to preserve the level of detail contained in the original data. They reduce the risk of disclosure by slightly altering the microdata on the record level. Examples include noise infusion, top-coding, or swapping.

Both strategies are also used when disseminating detailed geo-information. Aggregation as a form of information reduction is probably the most widely adopted strategy to reduce the risk of reidentification. We will review different aggregation strategies in more detail in Section 2.1. The early influential paper by Armstrong et al. (1999) lists two alternative strategies to aggregation that rely on perturbation: affine transformations and geographic masking. Affine transformations are methods that displace, rescale, or rotate the entire vector of original locations. Since they are completely deterministic, these methods are relatively easy to reverse engineer. They also lead to a substantial loss of information since the transformation of the original locations is data independent and thus spatial clustering effects found in the original data can be destroyed. Furthermore, external geographical information can no longer be linked to the transformed data in a reasonable way (Zandbergen, 2014). For these reasons, these methods have never been widely adopted and we will only review geographic masking in more detail in Section 2.2.

In recent years, synthetic data approaches have emerged as another perturbation strategy. With synthetic data, original values are replaced with synthetic values drawn from a model fitted to the original data. We will review synthetic data approaches for disseminating detailed geo-information in Section 2.3.

2.1 Aggregation

As discussed earlier, aggregation is the most widely adopted strategy to reduce risks from reidentification. Aggregation does not alter the information, that is, the number of observations per aggregated unit remains accurate and the location of individuals may be coarsened but will not be replaced by fake locations. However, it does lead to a loss of information and thereby reduces the range of applications the data can be used for. Broadly, there are two general aggregation strategies: aggregation within pre-defined areas, such as grid cells or administrative areas, and more spatially flexible microaggregation, which ensures that each aggregation cell contains a predefined number of records.

Aggregation within pre-defined areas is by far the most commonly adopted approach, and guidelines for assigning observations to standardized grid cells have been developed (e.g., INSPIRE, 2014). Using standardized formats comes with the advantage that additional spatial information such as climate, health, or economic data can be easily linked using these grid cells (Klumpe et al., 2020). At the same time, it is a rather inflexible strategy. If the uniformly sized grid cells are sufficiently small, they allow detailed analyses, but may not protect confidentiality adequately in sparsely populated cells. If they are large enough to protect confidentiality even in rural areas, there is a high information loss in urban areas. To address this issue, grid cell sizes can be adapted to the population density (e.g., Lagonigro et al., 2017). This approach, however, renders the linking of external grid cell data more difficult. Some researchers (e.g., Groß et al., 2017, 2020) have proposed improving the utility of the aggregated data by applying a smoothing function based on kernel density estimators, which randomly reassigns the individuals to point locations within the aggregation cell. This strategy can, for example, be beneficial if the goal is to compute distance measures or to plot the data on a map.

Microaggregation techniques allow the size of the aggregation area to be adapted flexibly to the desired level of protection (Domingo-Ferrer and Torra, 2005; Castro et al., 2022). Research on microaggregation in the context of geographic data mainly focuses on anonymizing digital trace data (see, e.g., Domingo-Ferrer and Trujillo-Rasua, 2012; Rebollo-Monedero et al., 2011), but the approach has also been adopted to achieve strong privacy guarantees for geocoded data based on the concept of differential privacy (Soria-Comas and Drechsler, 2013). While microaggregation can protect privacy consistently, it creates irregular polygons that are somewhat difficult to interpret and cannot easily be linked to external geographic data.
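The trade-off of fixed grid cells can be made concrete with a minimal sketch of aggregation to a regular grid. It assumes projected coordinates in metres, a 1 km cell size and a minimum cell count of 3; these values and the function names are illustrative assumptions, not part of any cited guideline.

```python
import math
from collections import Counter


def grid_cell(x: float, y: float, cell_size: float = 1000.0) -> tuple[int, int]:
    """Return the (column, row) index of the square grid cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def aggregate(points, cell_size: float = 1000.0, min_count: int = 3):
    """Count points per cell; cells below min_count are withheld, not published."""
    counts = Counter(grid_cell(x, y, cell_size) for x, y in points)
    released = {cell: n for cell, n in counts.items() if n >= min_count}
    suppressed = [cell for cell, n in counts.items() if n < min_count]
    return released, suppressed


# Hypothetical coordinates: three points share one cell, one point is isolated.
points = [(100.0, 200.0), (900.0, 950.0), (500.0, 500.0), (1500.0, 200.0)]
released, suppressed = aggregate(points)
```

Here the sparsely populated cell is suppressed entirely, which illustrates why small uniform cells protect rural observations poorly unless further measures, such as larger cells, are applied.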

2.2 Geographic Masking

Geographic masking relies on randomly displacing the original location to protect confidentiality. A variety of methods have been developed in this field. The simplest form of geographic masking assigns new locations by drawing a circle with fixed radius around the original location and randomly picking a new location on that circle (Zandbergen, 2014). With such a fixed displacement distance, the risk of re-engineering the original locations from the masked data can be relatively high (Zandbergen, 2014), hence random perturbation within a predefined maximum distance from the original location is more commonly used (see Armstrong et al., 1999; Kwan et al., 2004; Zandbergen, 2014; Hampton et al., 2010). This increases the level of protection as the actual displacement distance is unknown to the end user even if the masking approach is disclosed. Various strategies for randomly drawing the displacement distance have been proposed in the literature. One strategy is to use a uniform distribution within the radius of a circle centered on the original location (Armstrong et al., 1999; Zimmerman and Pavlik, 2008). Since this allows the masked location to be very close or even equal to the original location, an alternative method called donut masking that provides higher confidentiality protection has been suggested (Hampton et al., 2010; Allshouse et al., 2010; Kounadi and Leitner, 2015). This masking method requires a minimum displacement distance in addition to the maximum displacement distance, forming a donut shape around the original location. An alternative approach to increase the displacement distance is N-Rand masking (Wightman et al., 2011), which also uses perturbation within a circle but draws N potential displacement locations. The location that is furthest away from the original location is then selected as the final displacement location.
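A minimal sketch of donut masking and N-Rand masking follows. It assumes planar coordinates and samples uniformly over the area of the annulus, which is one possible reading of "uniform"; the cited papers may parameterize the draw differently. Setting the minimum distance to zero gives plain random perturbation within a circle. The function names are hypothetical.

```python
import math
import random


def donut_mask(x: float, y: float, r_min: float, r_max: float, rng: random.Random):
    """Displace (x, y) by a distance in [r_min, r_max], uniform over the annulus area."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Drawing the squared radius uniformly makes the point density uniform over the area.
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + radius * math.cos(angle), y + radius * math.sin(angle)


def n_rand_mask(x: float, y: float, r_max: float, n: int, rng: random.Random):
    """Draw n candidates inside the circle and keep the one furthest from (x, y)."""
    candidates = [donut_mask(x, y, 0.0, r_max, rng) for _ in range(n)]
    return max(candidates, key=lambda p: math.dist(p, (x, y)))


rng = random.Random(42)  # fixed seed only to make the illustration reproducible
masked = donut_mask(0.0, 0.0, 50.0, 200.0, rng)
furthest = n_rand_mask(0.0, 0.0, 200.0, 5, rng)
```

Both functions keep the displacement within the maximum distance; donut masking additionally guarantees the minimum distance, while N-Rand only makes larger displacements more likely.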
Instead of displacing the original locations within a circle with fixed radius and using a uniform distribution, some authors have suggested drawing the distance and direction of displacement from a bivariate Gaussian probability distribution (Cassa et al., 2006, 2008; Zimmerman and Pavlik, 2008). Compared to drawing from a uniform distribution, using a Gaussian distribution renders a displacement close to the original location more likely and therefore has little effect on spatial clusters (Cassa et al., 2006). Of course, a negative consequence is an increased risk of disclosure as most of the masked locations will be close to the original location. A variant of this method therefore uses a bimodal Gaussian distribution to approximate donut masking (Zandbergen, 2014). Note that, although unlikely, extremely high displacement distances can be drawn from a normal distribution for a small fraction of the locations (Armstrong et al., 1999).

If population density in the data varies substantially, perturbation with fixed maximum distance (or fixed variance for the bivariate Gaussian approach) may lead to an unnecessarily large alteration of spatial information in highly populated areas where shorter displacement distances may suffice, and to privacy risks where population density is low and locations should be displaced more. This can be addressed by taking population density into account, such that the radius of the displacement area is larger in less densely populated areas (Kwan et al., 2004; Cassa et al., 2006; Hampton et al., 2010; Lu et al., 2012; Zurbarán et al., 2018). This results in masked data that are more similar to the original data in urban areas while offering a higher level of confidentiality protection in rural areas. With the bivariate Gaussian approach, the variance of the distribution can be set to be inversely proportional to the square of the population density (Cassa et al., 2006). However, as illustrated in Allshouse et al. (2010), using externally provided population density data on an administrative area level as a benchmark, as done for example in Cassa et al. (2006) and Hampton et al. (2010), may not sufficiently protect confidentiality in areas with high population distribution heterogeneity. As a remedy, the authors suggest tripling the displacement distance in areas with heterogeneous population distribution.
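The density-adaptive Gaussian variant described above can be sketched as follows. The sketch assumes independent normal offsets in x and y (an isotropic bivariate Gaussian) and a hypothetical proportionality constant `scale`; neither the constant nor the function names come from the cited papers.

```python
import random


def displacement_sd(density: float, scale: float) -> float:
    """Standard deviation such that the variance is proportional to 1 / density**2."""
    return scale / density


def gaussian_mask(x: float, y: float, density: float, scale: float, rng: random.Random):
    """Add isotropic Gaussian noise whose spread shrinks as population density grows."""
    sd = displacement_sd(density, scale)
    return x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd)


rng = random.Random(1)  # fixed seed only to make the illustration reproducible
# Hypothetical densities in persons per square kilometre.
urban = gaussian_mask(0.0, 0.0, density=4000.0, scale=200_000.0, rng=rng)  # sd of 50
rural = gaussian_mask(0.0, 0.0, density=100.0, scale=200_000.0, rng=rng)   # sd of 2000
```

With these assumed values a rural location is displaced on a scale forty times larger than an urban one, which reflects the intended behaviour of displacing more where fewer neighbours provide cover.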
Kounadi and Leitner (2016) argue that, when information is available at the point level, the actual distance to the kth nearest neighbor should be used to determine displacement distance rather than using external population density data at the administrative-area level.

In recent years, some authors proposed masking techniques that displace the original locations taking the actual position of the surrounding locations into account, such as Voronoi masking or location swapping (Seidl et al., 2015; Zhang et al., 2017). Voronoi masking, developed by Seidl et al. (2015), is based on Voronoi polygons (Voronoi, 1908), which are shapes built around each single location with boundaries marking half the distance to the next location in any direction. A Voronoi polygon surrounding a point location contains all locations that are closer to this location than they are to any neighboring point locations in the data. In the masking process, each original location is moved to the closest point along the boundaries of its polygon, placing it in the middle between two actual locations. Seidl et al. (2019) find that this decreases map users’ beliefs in being able to re-identify households. The locations are, on average, moved less in areas with higher density of the original points. At the same time, a group of at least two locations that are remote but close to each other will likely be displaced less than would be the case using random perturbation methods, and multiple locations may be relocated to the same masked location.

Since many masking approaches do not account for geographic characteristics or whether units exist at the masked location, they may generate unrealistic locations, such as within water bodies or parks. Zhang et al. (2017) propose a location swapping approach to address these concerns. This method draws a circle or donut around the original location with varying distances based on population density. Then, the original location is swapped with another location with similar geographic characteristics within the specified area. They find that location swapping yields higher values of k-anonymity (defined in Section 3.1) than random perturbation using the same displacement area. However, we note that when applying random perturbation techniques with a maximum displacement distance, and especially in sparsely populated areas, the actual level of k achieved can be lower than the level implied by commonly applied techniques to measure k and, thus, we generally do not recommend using this measure to assess the level of protection (we will discuss this problem in more detail in Section 3.1).

To address the problem with distance-based perturbation techniques, Kounadi and Leitner (2016) propose adaptive areal elimination masking, which guarantees a minimum k-anonymity for every location. This method merges predefined shapes, e.g., administrative areas, until the number of locations per polygon is k or higher. The locations are then aggregated or randomly perturbed within each polygon. While this guarantees the desired level of k-anonymity, most polygons will contain (substantially) more than k units and therefore spatial patterns can be altered excessively.
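The suggestion to base the displacement distance on the distance to the kth nearest neighbor can be sketched with a brute-force computation. The sketch assumes planar coordinates and a small data set; the function name is hypothetical and a spatial index would be needed for realistic data volumes.

```python
import math


def kth_neighbour_distance(points, index: int, k: int) -> float:
    """Distance from points[index] to its k-th nearest other point in the data."""
    origin = points[index]
    distances = sorted(
        math.dist(origin, p) for i, p in enumerate(points) if i != index
    )
    return distances[k - 1]


# Hypothetical locations: four clustered points and one remote point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 0.0), (50.0, 50.0)]
clustered = kth_neighbour_distance(points, 0, 2)
remote = kth_neighbour_distance(points, 4, 1)
```

In this example the remote point has a far larger neighbor distance than the clustered one, so a rule tied to this distance would displace it further without relying on external density data.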

2.3 Synthetic Data

An alternative to the information reduction and masking methods discussed in the previous sections is to replace the true observations with draws from a statistical model, i.e., to generate synthetic data. Such datasets aim to preserve distributional properties and the spatial structure of the original data. Since these patterns are preserved at a much smaller spatial level compared to other anonymization techniques, authors such as Quick et al. (2018), Lawson et al. (2012), and Bradley et al. (2017) argue that synthetic data is able to reduce the risk of ecological fallacies (i.e., misleading inferences from the protected data, see Freedman, 1999). Two general approaches are distinguished in the literature: fully and partially synthetic data. With fully synthetic data (Rubin, 1993), all records in the released data are synthetic. Since synthesizing all variables in a dataset can be challenging for large scale surveys, Little (1993) suggested synthesizing only those variables that are either sensitive or that could be used for re-identification. See Drechsler (2011) and Drechsler and Haensch (2023) for a detailed overview on the topic. The approach has also been adopted in recent years for protecting data containing detailed geographical information. Two general strategies can be distinguished in the literature. Several papers do not synthesize the geographical information. Instead, they specifically account for the spatial structure of the data when synthesizing other variables in the dataset to improve the utility of the synthetic data. While these papers focus on protecting sensitive information in the data, i.e., reducing the risk of attribute disclosure, other approaches directly synthesize the geographical information, hence reducing the risk of reidentification. We will separately review the two strategies in the remainder of this section.

2.3.1 Synthesizing non-geographic variables while preserving the spatial information. Sakshaug and Raghunathan (2010) is one of the early papers that specifically adjust common synthesis strategies to preserve the detailed spatial information. The authors propose using mixed effects modeling strategies. Mixed effects synthesis models are a natural way to preserve the geographical clustering effect. These models are especially popular in the literature on small area estimation. The authors later (2014) extended their approach by incorporating area level covariates in the model, which allows generating synthetic data even for small areas not included in the original sample. Zhou et al. (2010) offer a more rigorous treatment of the spatial information problem by modeling all variables as spatial processes and applying spatial smoothing when modeling the variables. They show that their method introduces bias for non-linear regression models and propose a strategy for choosing the smoothing function to keep this bias small. Yet another synthesis strategy is described in Quick et al. (2018), which uses a differential smoothing synthesizer for locations of home sales in San Francisco. Their approach is a two-step process. First, they model the log-transformed home sale prices using an unrestricted hierarchical model. Second, they identify spatial outliers based on the distances to their nearest neighbors, then fit a restricted hierarchical model to provide additional smoothing for higher protection. In a related approach, Quick and Waller (2018) also use a hierarchical Bayesian model that preserves spatial, temporal, and between-age-group dependencies. They synthesize county-level heart disease deaths to complete public use data in which counts below 10 would otherwise be suppressed. More recently, Koebe et al. (2023) suggest publishing two different versions of georeferenced data. 
The first version includes the original location, but all other attributes are synthesized using a Gaussian copula model. The second version omits the geographic identifier, but leaves the other attributes at their original values.

2.3.2 Synthesizing the geographical information. The first successful implementation of geographical synthesis was discussed in Machanavajjhala et al. (2008). The authors propose a strategy for synthesizing the place of living for all individuals working in the U.S. The synthesizer is used to generate the underlying data for an application called OnTheMap provided by the U.S. Census Bureau. This application graphically visualizes commuting patterns on a detailed geographical level. The authors used a Dirichlet/Multinomial model for synthesis and adjusted the Dirichlet priors such that they were able to prove that their synthesizer guaranteed some formal level of privacy called (ε, δ)-probabilistic differential privacy (see Machanavajjhala et al. (2008) for details). However, the multinomial model used in this paper offers low utility if the population sizes or event rates are very heterogeneous. To address this limitation, Quick (2021) suggests relying on Poisson models, popular in the disease mapping literature, for differentially private data synthesis. He later extended the approach by incorporating public knowledge to further improve the utility of the synthesizer (Quick, 2022). Another synthesis strategy proposed by Wang and Reiter (2012) is to treat the detailed geocoding information as a continuous variable and use CART models to sequentially synthesize the longitude and latitude of the geocodes. This approach was later compared in Drechsler and Hu (2021) with two other synthesis strategies for the geocodes: using a Dirichlet process mixture of products of multinomials (DPMPM; Si and Reiter, 2013; Hu et al., 2018) and CART models treating the geocoding information as categorical variables. The authors find that the categorical CART models offer the highest utility, but also the highest risk of disclosure. When trying to increase the level of protection, they find it to be more effective to synthesize additional variables instead of aggregating the geocoding information to a higher grid level. Burgette and Reiter (2013) generate a partially synthetic dataset in which they synthesize the location of US census tract identifiers using a Bayesian multinomial model with a group of Dirichlet process priors and a multiple shrinkage prior distribution. This framework is chosen because it shrinks the parameters toward a small number of learned locations, which increases the utility of the data. Paiva et al. (2014) use areal level spatial models (often called disease mapping models in the literature) to synthesize the geographical information. Although they start with exact geographies, their methods require defining fine grids over the spatial domain, then using the conditional autoregressive (CAR) model of Besag et al. (1991) to model the distribution of grid-counts. When synthesizing exact geographies, they recommend first synthesizing a grid cell for each individual and then randomly assigning each individual a location within that cell. The approach is computationally intensive and can be challenging to apply if the number of categorical variables or the number of levels within the variables is large. The authors also note that their partially synthetic data do not preserve the spatial pattern because the independent draws from the underlying Poisson model can imply that close geographic units in the original data might be far apart in the synthetic data. This caveat is considered by Quick et al. (2015) who extend the spatial modeling process of geo-coordinates using marked point process models, which simultaneously model the location and the variables (Liang et al., 2008; Taddy and Kottas, 2012). Specifically, the authors propose to model the data in three steps: (i) specify multinomial models for the categorical variables in the data, (ii) use a log-Gaussian Cox process to model the geographical location within each cell specified by cross classifying all categorical variables, and (iii) specify a normal regression for continuous variables given the categorical variables and location. The authors point out that estimating this model can be computationally intractable and suggest several steps and simplifying assumptions to reduce the computational burden.
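
The Dirichlet/Multinomial idea mentioned at the start of this subsection can be sketched as follows. This is only an illustration of the general mechanism (a Dirichlet prior updated with observed counts, followed by multinomial draws); it does not reproduce the prior calibration of Machanavajjhala et al. (2008) and carries no formal privacy guarantee. The function name, the block labels, and the value of `prior_count` are assumptions made for the example.

```python
import random
from collections import Counter

def synthesize_origins(observed_counts, prior_count, n_synthetic, rng):
    """Draw synthetic residence blocks from a Dirichlet/Multinomial posterior.

    observed_counts: dict mapping block id -> observed number of residents
    prior_count: positive pseudo-count added to every block (assumed value;
        the privacy-driven choice of the prior is not reproduced here)
    """
    blocks = sorted(observed_counts)
    # Posterior Dirichlet parameters: prior pseudo-count plus observed count.
    alphas = [prior_count + observed_counts[b] for b in blocks]
    # A Dirichlet draw is a vector of normalised Gamma(alpha, 1) draws.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw of the synthetic residence blocks.
    draws = rng.choices(blocks, weights=probs, k=n_synthetic)
    return Counter(draws)

rng = random.Random(42)
observed = {"block_a": 40, "block_b": 5, "block_c": 0}  # hypothetical counts
synthetic = synthesize_origins(observed, prior_count=0.5, n_synthetic=45, rng=rng)
```

Because every block receives a positive prior pseudo-count, a block with no observed residents can still appear in the synthetic data, which is one reason why the choice of the prior matters for both utility and protection.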

3 Risk and Utility Assessment

Data dissemination always faces two conflicting goals: minimizing the risk of disclosure and maintaining the usefulness of the data. Therefore, it is crucial to always evaluate data protection strategies for both of these dimensions. In this section we review strategies that have been proposed in the literature to measure the utility and the level of protection for geocoded data that underwent some form of disclosure protection.

3.1 Risk Evaluation

The most commonly applied measure for evaluating the disclosure risk of masked geodata is spatial 𝑘-anonymity. It is related to the classical definition as proposed by Sweeney (2002), which states that 𝑘-anonymity is achieved if a record is indistinguishable from 𝑘 − 1 other records in the dataset based on a set of prespecified variables (e.g. age, sex, education). Specifically, spatial 𝑘-anonymity is reached if a location is indistinguishable from at least 𝑘 − 1 other locations. However, in practice it is interpreted in many different ways (Cassa et al., 2006; Allshouse et al., 2010; Hampton et al., 2010; Kounadi and Leitner, 2016; Zhang et al., 2017; Hasanzadeh et al., 2020).

There are two main definitions of 𝑘-anonymity for masked geodata. First, some researchers define spatial 𝑘-anonymity as the number of locations around the original point within a circle with radius equal to the displacement distance (Hampton et al., 2010; Allshouse et al., 2010). The second definition is to measure 𝑘-anonymity as the number of locations around the masked location that are within a circle with radius equal to the displacement distance (Lu et al., 2012; Zhang et al., 2017; Hasanzadeh et al., 2020). Note, however, that both approaches can overestimate the level of 𝑘 when random perturbation within a circle or donut is applied. This can be amplified if the maximum displacement distance depends on the population density (Allshouse et al., 2010) or is determined by the distance to the 𝑘th nearest neighbor. To illustrate, imagine one household located in an area with few observations or low population density which borders an urban area. If the displacement radius for this household is chosen to reach a certain level of 𝑘-anonymity, its maximum displacement distance will be relatively large reaching the outer areas of the urban area. A location in the urban area, on the contrary, has many neighbors in close proximity and will thus, taking 𝑘-anonymity as the objective, be displaced within a smaller area that does not include all possible displacements of the rural location. In this example, the rural location may be the only one that can be displaced far into the rural area. As a consequence an ill-intentioned user of the released data can be confident that a masked record in certain rural areas can only stem from one of the few observations in the rural area. Thus, neither counting the cases within a circle around the original point nor counting the cases within a circle around the masked point provides adequate information on how well these points are protected. 
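
The two counting rules can be written down in a few lines. The sketch below is illustrative only: the function names and the coordinates (one rural household next to a small urban cluster) are hypothetical, and Euclidean distance on planar coordinates is assumed.

```python
import math

def count_within(center, locations, radius):
    """Number of locations within `radius` of `center` (Euclidean)."""
    return sum(1 for p in locations if math.dist(center, p) <= radius)

def spatial_k(original, masked, population, radius):
    """Return both commonly reported k values for one masked record.

    First value: count around the original point (first definition).
    Second value: count around the masked point (second definition).
    """
    return (
        count_within(original, population, radius),
        count_within(masked, population, radius),
    )

# Hypothetical setting: one rural household and nine urban households.
rural_home = (0.0, 0.0)
urban_homes = [(10.0 + 0.1 * i, 0.0) for i in range(9)]
population = [rural_home] + urban_homes

# The rural record needs a large radius to reach the urban cluster and is
# displaced away from the city in this example.
k_original, k_masked = spatial_k(rural_home, (-8.0, 0.0), population, radius=11.0)
```

Here the count around the original point is 10, although the urban records, which would be displaced by much smaller radii, could never be moved to the masked position. The two counts also differ markedly for the same record.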
Kounadi and Leitner (2016) empirically demonstrate that to achieve the desired level of 𝑘-anonymity for close to 100% of the locations, the maximum distance of displacement needs to be substantially larger than the distance to the 𝑘th nearest neighbor. Beyond the (often flawed) risk assessment based on spatial 𝑘-anonymity, strategies for measuring the remaining risk of disclosure are surprisingly limited. Some authors discuss general aspects that impact the risk of disclosure. For example, Cassa et al. (2008) point out that risks of reidentification increase when multiple protected versions of the same georeferenced dataset are published. The original locations can then be approximated by averaging the masked locations (assuming the same records can be uniquely identified in the different datasets). The more versions of the data are published, the higher the accuracy of this approximation. As Zimmerman and Pavlik (2008) point out, the risk is particularly high when the locations are labelled or details on the masking approach are disclosed such as the maximum displacement radius. A classical risk assessment strategy that has been used in some applications is to mount a record linkage attack. With these types of attacks, the intruder is assumed to possess some information about the units contained in the database (e.g., age, marital status, and employment status) and uses this information to identify units in the database. Risk measures based on record linkage attacks typically try to estimate how likely it is that such an attack will lead to a correct identification in the protected dataset. In the context of geocoded data, it is typically assumed that one of the attributes that is known to the attacker is the (approximately) exact location of the target record. 
Simulated record linkage attacks have for example been used in Drechsler and Hu (2021) (and implicitly in Koebe et al., 2023) to assess how well the different synthesis strategies protect the geographical information. Drechsler and Hu (2021) use risk measures originally proposed in Reiter and Mitra (2009) to specifically estimate reidentification risks for partially synthetic data. With this approach it is assumed that the attackers possess some background knowledge for a set of target records they wish to identify in the data. Based on this knowledge, they estimate the probability of a match for each unit in the released file. A match is declared for the record that has the highest average matching probability across the synthetic datasets. The risk is evaluated by means of these matches using two different measures. The first one calculates the expected number of correctly declared matches, i.e., the expected match risk. The second one calculates the number of correct unique matches, i.e., the true match rate. Another strategy to evaluate the level of protection specifically for partially synthetic data approaches was used in Quick et al. (2018). The authors focus on spatial outliers in the original data. For those records, they generate a large number of synthetic values by repeatedly drawing from the synthesis model. They then look at histograms of the generated values. If the spatial synthesis model is overfitting, the draws from the model will be centered around the true value with limited variability potentially indicating an unacceptable risk of disclosure. Using a related idea, Quick et al. (2015) and Quick and Waller (2018) compare synthesized values with the true, confidential values. From a privacy perspective, the objective here is to obtain different values. Given that they propose releasing two versions of the same dataset (see Section 2.3), Koebe et al. (2023) measure the risk of correctly re-identifying the sensitive small-area identifiers (zip codes) in the unprotected data without geoinformation using information from the synthetic data. They train random forest models on the dataset in which the geolocations have been protected. The trained model is then run on the original data to predict the locations. The fraction of successful predictions denotes the risk measure.
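
The two match-based measures described earlier in this section can be illustrated with a small sketch. It assumes that the averaged match probabilities have already been computed and takes them as input; the probabilities, the function name, and the handling of ties (a tie among c candidates is resolved correctly with probability 1/c) are assumptions of this example and not a reproduction of the code used in the cited papers.

```python
def match_risks(match_probs, true_index):
    """Summarise a simulated record linkage attack.

    match_probs[t][r]: average match probability of released record r
        for target t (hypothetical input, computed elsewhere)
    true_index[t]: position of the record that truly belongs to target t
    Returns (expected match risk, number of correct unique matches).
    """
    expected_match_risk = 0.0
    correct_unique = 0
    for probs, truth in zip(match_probs, true_index):
        best = max(probs)
        candidates = [r for r, p in enumerate(probs) if p == best]
        if truth in candidates:
            # Ties are assumed to be broken at random among the candidates.
            expected_match_risk += 1 / len(candidates)
            if len(candidates) == 1:
                correct_unique += 1
    return expected_match_risk, correct_unique

# Three hypothetical targets and three released records.
probs = [
    [0.7, 0.2, 0.1],  # unique and correct match
    [0.4, 0.4, 0.2],  # two-way tie that contains the true record
    [0.1, 0.3, 0.6],  # unique but wrong match
]
emr, n_unique = match_risks(probs, true_index=[0, 1, 0])
```

Dividing the number of correct unique matches by the number of targets turns the second quantity into a rate.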

3.2 Utility Evaluation

While offering a sufficient level of protection should always be the primary goal of any disclosure limitation strategy, it is crucial to also measure its impacts on utility. In the geocoding context, the utility is typically assessed by measuring to what extent the spatial structure of the data is maintained. The list of metrics that is used for this purpose in the literature is almost as large as the disclosure avoidance literature itself. Here, we only focus on the utility assessment based on spatial pattern retention. A more general discussion on utility evaluations can be found for example in Domingo-Ferrer et al. (2012). In the following, we will classify the various approaches into four broad categories: (1) point locations and density measures; (2) cluster analysis; (3) spatial autocorrelation; and (4) land use assessment.

3.2.1 Point Locations and Density Measures. Utility evaluations often start by graphically comparing the population densities of the confidential data and the protected data. A simple approach is to visually compare the locations on a map (e.g., Kwan et al., 2004). However, unless the original data is non-confidential, this approach can only be used internally, as the plots of the original data might otherwise reveal sensitive information. A more versatile approach is to estimate the population density using kernel density estimation (Shi et al., 2009; Gatrell et al., 1996). The kernel density estimator creates a smooth density surface, which allows a graphical comparison of the densities of the original and masked data on a heatmap (e.g., Kwan et al., 2004; Zandbergen, 2014). The heatmaps can be used to either visualize the density levels for each dataset separately or to directly display the discrepancies between the two densities. Beyond visualizing the population densities (e.g., Gatrell et al., 1996) the approach can also be used to measure spatial discrepancies in any other variable contained in the data. For example, Seidl et al. (2015) show differences in total warm water consumption among others.
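
As a minimal sketch of the density comparison described above, the following code evaluates a Gaussian kernel density estimate on a regular grid for two point sets and returns the cell-wise difference, i.e., the quantity that a discrepancy heatmap would display. The Gaussian kernel, the fixed bandwidth, the grid, and all coordinates are assumptions of this example; the cited studies may use other kernels and bandwidth selectors.

```python
import math

def kde_grid(points, bandwidth, xs, ys):
    """Gaussian kernel density estimate evaluated on a regular grid."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for gy in ys:
        row = []
        for gx in xs:
            kernel_sum = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * kernel_sum)
        grid.append(row)
    return grid

def density_difference(grid_a, grid_b):
    """Cell-wise difference between two density grids of equal shape."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(grid_a, grid_b)]

# Hypothetical original and masked locations on a small grid.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0]
original = [(0.2, 0.1), (1.8, 0.9), (1.9, 0.8)]
masked = [(0.5, 0.4), (1.5, 0.6), (2.0, 1.0)]
diff = density_difference(
    kde_grid(original, 0.5, xs, ys),
    kde_grid(masked, 0.5, xs, ys),
)
```

Values of `diff` close to zero in every cell would indicate that masking left the estimated density surface largely unchanged.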

3.2.2 Clustering. Another common approach to evaluate the utility of the protected dataset is to assess whether the data show similar clustering behavior as the original data. A descriptive statistic that is often used to describe clustering in a point pattern is Ripley’s 𝐾 function (see, e.g., Kwan et al., 2004; Zhang et al., 2017; Quick et al., 2015; Seidl et al., 2015; Drechsler and Hu, 2021). It is defined as the expected number of points within a predefined radius around the location of interest normalized by the average point density across the entire geographical area covered in the data (Ripley, 1976; Kwan et al., 2004). It assesses to which extent a point pattern deviates from spatial homogeneity (Drechsler and Hu, 2021). Based on the 𝐾 function, the more easily interpretable 𝐿 function can be computed. It takes values close to zero for homogeneously distributed data, while positive values indicate heterogeneity or clustering. Closely related, the cross-𝐾 function and its analog for the 𝐿 statistic assess the clustering of one point pattern relative to another point pattern, for example the underlying population distribution (Kwan et al., 2004). As an alternative measure, Zhang et al. (2017) apply an average nearest-neighbor analysis to quantify how well the spatial pattern of the original data is preserved. Specifically, they compute a nearest-neighbor index that consists of the average distances from each unit to its nearest neighbor (measured in, e.g., Euclidean or Manhattan distance). An index value similar to that of the original data indicates comparable clustering intensity. In a related approach, Lu et al. (2012) apply a nearest-neighbor index that compares the average distance to the nearest neighbor with the expected distance assuming a uniform distribution of the locations. Values below one indicate clustering. Seidl et al. (2015) use a nearest-neighbor hierarchical clustering analysis to compare the number of clusters on the first level (clusters of individual data points) in the data (see also Levine, 2006; Kounadi and Leitner, 2015). They also compare standard deviational ellipses between the original and the protected data. These ellipses cover the area that is within, say, one or two standard deviations from the center of the cluster (Kounadi and Leitner, 2015). They facilitate understanding the two-dimensional clustering behavior. Another measure to assess clustering and to identify hotspots is the Gi* statistic proposed by Getis and Ord (1992) and Ord and Getis (1995). The Gi* statistic can be used to test the null hypothesis of spatial independence. Rejecting the null hypothesis indicates clustering (Getis and Ord, 1992). Kounadi and Leitner (2015) develop an indicator that combines nearest-neighbor hierarchical clustering and the Gi* statistic. In health research, SaTScan (Kulldorff, 1997) is a popular software tool for disease mapping. It can be used to identify spatial and temporal clustering in the data (Kulldorff et al., 2005). Several authors (Olson et al., 2006; Cassa et al., 2006; Hampton et al., 2010) use the software to compare the sensitivity and specificity of the underlying cluster detection approach run on the original and protected data. Finally, some researchers use the original and masked dots to identify a data-dependent geographical area. The utility of the protected data is assessed by measuring the overlap of this area between the two datasets. For example, Hasanzadeh et al. (2017) propose an approach that compares the similarity of individuals’ frequently visited points. Specifically, they extend the residential points to home areas, where the edges mark locations that are visited frequently. Large overlaps of the home areas of the protected and the confidential data indicate high similarity of individuals’ neighborhoods in both datasets.
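
Two of the clustering statistics discussed in this subsection are simple enough to sketch directly. The code below gives a naive estimate of Ripley's 𝐾 function without edge correction, the derived 𝐿 function in the centered form L(r) = sqrt(K(r)/π) − r, and a nearest-neighbor index that divides the observed mean nearest-neighbor distance by its expectation under a uniform distribution, 0.5 / sqrt(n / A). The omission of edge corrections, the function names, and the coordinates are assumptions of this example.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction)."""
    n = len(points)
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    return area * pairs / (n * (n - 1))

def ripley_l(points, radius, area):
    """Centered L function: about zero under spatial homogeneity."""
    return math.sqrt(ripley_k(points, radius, area) / math.pi) - radius

def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbour distance over its uniform expectation."""
    n = len(points)
    mean_nn = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return mean_nn / expected

# Hypothetical, strongly clustered pattern in a 10 x 10 study area.
cluster = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
k_value = ripley_k(cluster, radius=0.5, area=100.0)
l_value = ripley_l(cluster, radius=0.5, area=100.0)
nni = nearest_neighbor_index(cluster, area=100.0)
```

For a utility assessment, these statistics would be computed on both the original and the protected data and then compared, for instance over a range of radii.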

3.2.3 Spatial Autocorrelation. While clustering analysis focuses on identifying the number and size of clusters in the data, spatial autocorrelation more generally assesses the spatial dependence in a point pattern. Both approaches are closely related. A prevalent measure for spatial autocorrelation is Moran’s I (e.g., Ord and Getis, 1995; Lu et al., 2012; Seidl et al., 2015). It tests whether the null hypothesis that the spatial autocorrelation is zero can be rejected. If this is the case, spatial autocorrelation can be assumed. Another common measure to compare spatial autocorrelation between datasets is the empirical semivariogram (Matheron, 1963; Quick et al., 2018; Seidl et al., 2015). It visualizes the homogeneity of non-geographic variables as a function of the distance between the locations. An output graph that increases and then flattens with further distance indicates positive spatial autocorrelation.
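
A minimal sketch of the global Moran's I statistic is given below. It assumes binary distance-band weights (two locations are neighbors if they are at most `threshold` apart), which is only one of several common weighting schemes, and it computes the statistic itself without the accompanying significance test. The function name and the data are hypothetical.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with binary distance-band weights.

    Assumes at least one pair of neighbours and non-constant values;
    otherwise the statistic is undefined (division by zero).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                cross_products += dev[i] * dev[j]
                weight_sum += 1
    return (n / weight_sum) * (cross_products / sum(d * d for d in dev))

# Six hypothetical locations on a line, one unit apart.
line = [(float(i), 0.0) for i in range(6)]
i_grouped = morans_i(line, [1, 1, 1, 5, 5, 5], threshold=1.0)
i_alternating = morans_i(line, [1, 5, 1, 5, 1, 5], threshold=1.0)
```

Similar values on neighboring locations yield a positive statistic, whereas the alternating pattern yields a negative one. Comparing the statistic between the original and the protected data indicates whether the spatial dependence has been retained.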

3.2.4 Land use. Another widely used approach to measure the utility of masked geodata is to compare the geography of the masked point-coordinates with their original counterparts. Quick and Waller (2018) and Zhang et al. (2017) consider, for instance, land cover categories or the proximity to roads. Regarding land cover, they compare whether the point locations fall into raster cells of the same type, i.e., urban or rural. In an optimal scenario, the protected data would have the same share of points in urban areas as the original. Analogously, this applies to the proximity to roads, where the authors measure the distance from each point to the nearest road. The distances are compared using cumulative distribution functions (cdfs). The closer the two cdfs from the original and the protected data, the higher the utility of the protected data. Related works (e.g., Hasanzadeh et al., 2020) also evaluate other geographic characteristics such as the greenness of the surroundings.
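
The comparison of the two cdfs can be summarized in a single number. The sketch below uses the largest vertical gap between the two empirical cdfs (the two-sample Kolmogorov-Smirnov distance); this summary is a choice made for the example and is not necessarily the one used in the cited papers. The road distances are hypothetical and assumed to be precomputed.

```python
def ecdf(sample, x):
    """Empirical cdf of `sample` evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def max_cdf_gap(sample_a, sample_b):
    """Largest vertical gap between two empirical cdfs."""
    grid = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in grid)

# Hypothetical distances (in meters) from each point to the nearest road.
road_dist_original = [5, 10, 20, 40]
road_dist_masked = [5, 15, 25, 60]
gap = max_cdf_gap(road_dist_original, road_dist_masked)
```

A gap of zero means that the two distance distributions coincide; larger values indicate that masking moved points systematically closer to or further away from roads.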

4 Conclusion

Broad access to detailed geo-information can enhance the understanding of our society in numerous ways. Thus, it is not surprising that many data disseminating agencies are currently discussing how to provide access to these data for external researchers without compromising the confidentiality of the units contained in the data. Optimizing the trade-off between offering high-utility, granular information and sufficient data protection has been the subject of various methods for disclosure protection. In this paper, we have reviewed the literature on protection strategies for georeferenced microdata. Its main strands can be divided into coarsening the geo-information, masking it by altering, perturbing, or swapping the original locations, and disseminating synthetic data instead of the original data. We also discussed the different methods that are used to evaluate the risk and utility of the protected data. When assessing the risk of disclosure, we found that many papers rely on different notions of 𝑘-anonymity. We discussed a key concern with these notions, namely that for many of the distance-based masking techniques, disclosure risks are underestimated based on these procedures as the obtained value of 𝑘 tends to be much larger than the true number of indistinguishable records. We therefore strongly advise against using spatial 𝑘-anonymity in this context. Regarding the utility evaluation, we conclude that there are many useful approaches discussed in the literature and that it would be an interesting avenue for future research to consolidate the plethora of different measures.

References

Allshouse, W. B., M. K. Fitch, K. H. Hampton, D. C. Gesink, I. A. Doherty, P. A. Leone, M. L. Serre, and W. C. Miller (2010). Geomasking sensitive health data and privacy protection: an evaluation using an e911 database. Geocarto international 25(6), 443–452.

Armstrong, M. P., G. Rushton, and D. L. Zimmerman (1999). Geographically masking health data to preserve confidentiality. Statistics in medicine 18(5), 497–525.

Besag, J., J. York, and A. Mollié (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the institute of statistical mathematics 43, 1–20.

Bradley, J. R., C. K. Wikle, and S. H. Holan (2017). Regionalization of multiscale spatial processes by using a criterion for spatial aggregation error. Journal of the Royal Statistical Society Series B: Statistical Methodology 79(3), 815–832.

Burgette, L. F. and J. P. Reiter (2013). Multiple-shrinkage multinomial probit models with applications to simulating geographies in public use data. Bayesian analysis (Online) 8(2).

Cassa, C. A., S. J. Grannis, J. M. Overhage, and K. D. Mandl (2006). A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection. Journal of the American Medical Informatics Association 13(2), 160–165.

Cassa, C. A., S. C. Wieland, and K. D. Mandl (2008). Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International journal of health geographics 7, 1–9.

Castro, J., C. Gentile, and E. Spagnolo-Arrizabalaga (2022). An algorithm for the microaggregation problem using column generation. Computers & Operations Research 144, 105817.

De Montjoye, Y.-A., C. A. Hidalgo, M. Verleysen, and V. D. Blondel (2013). Unique in the crowd: The privacy bounds of human mobility. Scientific reports 3(1), 1–5.

Domingo-Ferrer, J., L. Franconi, S. Giessing, E. Nordholt, K. Spicer, P. de Wolf, and A. Hundepool (2012). Statistical Disclosure Control. Wiley Series in Survey Methodology. Wiley.

Domingo-Ferrer, J. and V. Torra (2005). Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery 11, 195–212.

Domingo-Ferrer, J. and R. Trujillo-Rasua (2012). Microaggregation-and permutation-based anonymization of movement data. Information Sciences 208, 55–80.

Drechsler, J. (2011). Synthetic datasets for statistical disclosure control: theory and implementation, Volume 201. Springer Science & Business Media.

Drechsler, J. and A.-C. Haensch (2023). 30 years of synthetic data. arXiv preprint arXiv:2304.02107.

Drechsler, J. and J. Hu (2021). Synthesizing Geocodes to Facilitate Access to Detailed Geographical Information in Large-Scale Administrative Data. Journal of Survey Statistics and Methodology 9(3), 523–548.

Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International Encyclopedia of the social & Behavioral sciences 6(4027-4030), 1–7.

Gatrell, A. C., T. C. Bailey, P. J. Diggle, and B. S. Rowlingson (1996). Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British geographers, 256–274.

Getis, A. and J. K. Ord (1992). The analysis of spatial association by use of distance statistics. Geographical analysis 24(3), 189–206.

Groß, M., A.-K. Kreutzmann, U. Rendtel, T. Schmid, and N. Tzavidis (2020). Switching between different non-hierachical administrative areas via simulated geo-coordinates: a case study for student residents in Berlin. Journal of Official Statistics 36(2), 297–314.

Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017). Estimating the density of ethnic minorities and aged people in Berlin: multivariate kernel density estimation applied to sensitive georeferenced administrative data protected via measurement error. Journal of the Royal Statistical Society Series A: Statistics in Society 180(1), 161–183.

Hampton, K. H., M. K. Fitch, W. B. Allshouse, I. A. Doherty, D. C. Gesink, P. A. Leone, M. L. Serre, and W. C. Miller (2010). Mapping health data: improved privacy protection with donut method geomasking. American journal of epidemiology 172(9), 1062–1069.

Hasanzadeh, K., A. Broberg, and M. Kyttä (2017). Where is my neighborhood? a dynamic individual-based definition of home ranges and implementation of multiple evaluation criteria. Applied geography 84, 1–10.

Hasanzadeh, K., A. Kajosaari, D. Häggman, and M. Kyttä (2020). A context sensitive approach to anonymizing public participation GIS data: From development to the assessment of anonymization effects on data quality. Computers, Environment and Urban Systems 83, 101513.

Hu, J., J. P. Reiter, and Q. Wang (2018). Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis 13(1), 183–200.

INSPIRE (2014). Data specification on geographical grid systems – technical guidelines. Technical Report D2.8.I.2, European Commission.

Klumpe, B., J. Schröder, and M. Zwick (Eds.) (2020). Qualität bei zusammengeführten Daten. Schriftenreihe der ASI - Arbeitsgemeinschaft Sozialwissenschaftlicher Institute. Springer VS Wiesbaden.

Koebe, T., A. Arias-Salazar, and T. Schmid (2023). Releasing survey microdata with exact cluster locations and additional privacy safeguards. Humanities and Social Sciences Communications 10(1), 1–13.

Kounadi, O. and M. Leitner (2015). Spatial information divergence: Using global and local indices to compare geographical masks applied to crime data. Transactions in GIS 19(5), 737–757.

Kounadi, O. and M. Leitner (2016). Adaptive areal elimination (AAE): A transparent way of disclosing protected spatial datasets. Computers, Environment and Urban Systems 57, 59–67.

Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics-Theory and methods 26(6), 1481–1496.

Kulldorff, M., R. Heffernan, J. Hartman, R. Assunçao, and F. Mostashari (2005). A space–time permutation scan statistic for disease outbreak detection. PLoS medicine 2(3), e59.

Kwan, M.-P., I. Casas, and B. Schmitz (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica: The International Journal for Geographic Information and Geovisualization 39(2), 15–28.

Lagonigro, R., R. Oller, J. C. Martori, et al. (2017). A quadtree approach based on European geographic grids: reconciling data privacy and accuracy.

Lawson, A. B., J. Choi, B. Cai, M. Hossain, R. S. Kirby, and J. Liu (2012). Bayesian 2-stage space-time mixture modeling with spatial misalignment of the exposure in small area health data. Journal of agricultural, biological, and environmental statistics 17, 417–441.

Levine, N. (2006). Crime mapping and the CrimeStat program. Geographical analysis 38(1), 41–56.

Liang, S., B. P. Carlin, and A. E. Gelfand (2008). Analysis of Minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information. The annals of applied statistics 3(3), 943.

Little, R. J. (1993). Statistical analysis of masked data. Journal of Official Statistics 9(2), 407.

Lu, Y., C. Yorke, and F. B. Zhan (2012). Considering risk locations when defining perturbation zones for geomasking. Cartographica: The International Journal for Geographic Information and Geovisualization 47(3), 168–178.

Machanavajjhala, A., D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber (2008). Privacy: Theory meets practice on the map. In 2008 IEEE 24th international conference on data engineering, pp. 277–286. IEEE.

Matheron, G. (1963). Principles of geostatistics. Economic geology 58(8), 1246–1266. Olson, K. L., S. J. Grannis, and K. D. Mandl (2006). Privacy protection versus cluster detection in spatial

epidemiology. American Journal of Public Health 96(11), 2002–2008. Ord, J. K. and A. Getis (1995). Local spatial autocorrelation statistics: distributional issues and an application.

Geographical analysis 27(4), 286–306. 11

Paiva, T., A. Chakraborty, J. Reiter, and A. Gelfand (2014). Imputation of confidential data sets with spatial locations using disease mapping models. Statistics in medicine 33(11), 1928–1945.

Quick, H. (2021). Generating poisson-distributed differentially private synthetic data. Journal of the Royal Statistical Society Series A: Statistics in Society 184(3), 1093–1108.

Quick, H. (2022). Improving the utility of poisson-distributed, differentially private synthetic data via prior predictive truncation with an application to cdc wonder. Journal of Survey Statistics and Methodology 10(3), 596–617.

Quick, H., S. H. Holan, and C. K. Wikle (2015). Zeros and ones: a case for suppressing zeros in sensitive count data with an application to stroke mortality. Stat 4(1), 227–234.

Quick, H., S. H. Holan, and C. K. Wikle (2018). Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing. Journal of the Royal Statistical Society Series A: Statistics in Society 181(3), 649–661.

Quick, H., S. H. Holan, C. K. Wikle, and J. P. Reiter (2015). Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography. Spatial Statistics 14, 439–451.

Quick, H. and L. A. Waller (2018). Using spatiotemporal models to generate synthetic data for public use. Spatial and Spatio-Temporal Epidemiology 27, 37–45.

Rebollo-Monedero, D., J. Forné, and M. Soriano (2011). An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data & Knowledge Engineering 70(10), 892–921.

Reiter, J. P. and R. Mitra (2009). Estimating risks of identification disclosure in partially synthetic data. Journal of Privacy and Confidentiality 1(1).

Ripley, B. D. (1976). The second-order analysis of stationary point processes. Journal of applied probabil- ity 13(2), 255–266.

Rubin, D. B. (1993). Statistical disclosure limitation. Journal of Official Statistics 9(2), 461–468. Sakshaug, J. W. and T. E. Raghunathan (2010). Synthetic data for small area estimation. In J. Domingo-Ferrer

and E. Magkos (Eds.), Privacy in Statistical Databases, Berlin, Heidelberg, pp. 162–173. Springer Berlin Heidelberg.

Sakshaug, J. W. and T. E. Raghunathan (2014). Generating synthetic data to produce public-use microdata for small geographic areas based on complex sample survey data with application to the national health interview survey. Journal of Applied Statistics 41(10), 2103–2122.

Seidl, D. E., P. Jankowski, and A. Nara (2019). An empirical test of household identification risk in geomasked maps. Cartography and Geographic Information Science 46(6), 475–488.

Seidl, D. E., G. Paulus, P. Jankowski, and M. Regenfelder (2015). Spatial obfuscation methods for privacy protection of household-level data. Applied Geography 63, 253–263.

Shi, X., J. Alford-Teaster, and T. Onega (2009). Kernel density estimation with geographically masked points. In 2009 17th International Conference on Geoinformatics, pp. 1–4. IEEE.

Si, Y. and J. P. Reiter (2013). Nonparametric bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of educational and behavioral statistics 38(5), 499–521.

Soria-Cormas, J. and J. Drechsler (2013). Evaluating the potential of differential privacy mechanisms for census data. In UNECE Work Session on Data Confidentiality.

Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International journal of uncertainty, fuzziness and knowledge-based systems 10(05), 557–570.

Taddy, M. A. and A. Kottas (2012). Mixture modeling for marked poisson processes. Bayesian Analysis 7(2). Voronoi, G. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques.

deuxième mémoire. recherches sur les parallélloèdres primitifs. Journal für die reine und angewandte Mathematik (Crelles Journal) 1908(134), 198–287.

Wang, H. and J. P. Reiter (2012). Multiple imputation for sharing precise geographies in public use data. The annals of applied statistics 6(1), 229.

Wightman, P., W. Coronell, D. Jabba, M. Jimeno, and M. Labrador (2011). Evaluation of location obfuscation techniques for privacy in location based information systems. In 2011 IEEE Third Latin-American Conference

12

on Communications, pp. 1–6. IEEE. Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded health data: assessing geographic masking

strategies for individual-level data. Advances in medicine 2014. Zhang, S., S. M. Freundschuh, K. Lenzer, and P. A. Zandbergen (2017). The location swapping method for

geomasking. Cartography and Geographic Information Science 44(1), 22–34. Zhou, Y., F. Dominici, and T. A. Louis (2010). A smoothing approach for masking spatial data. The Annals of

Applied Statistics 4(3), 1451–1475. DOI: 10.1214/09-AOAS325. Zimmerman, D. L. and C. Pavlik (2008). Quantifying the effects of mask metadata disclosure and multiple

releases on the confidentiality of geographically masked health data. Geographical analysis 40(1), 52–76. Zurbarán, M., P. Wightman, M. Brovelli, D. Oxoli, M. Iliffe, M. Jimeno, and A. Salazar (2018). Nrand-k:

Minimizing the impact of location obfuscation in spatial analysis. Transactions in GIS 22(5), 1257–1274.

13

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

remote access, microdata for scientific purposes, access path

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

Hanna Brenzel (Research Data Centre of the Federal Statistical Office)

Katharina Cramer (Research Data Centre of the Statistical Offices of the Federal States)

Volker Güttgemanns (Research Data Centre of the Statistical Offices of the Federal States)

Marcel Mathes (Research Data Centre of the Statistical Offices of the Federal States)

[email protected]

Abstract

The fundamental goal of the Research Data Centre of the Federal Statistical Office and the Research Data Centre of the Statistical Offices of the Federal States (RDC) is not only to provide access to official statistics microdata but also to continuously improve that access and adapt it to the changing needs of empirical science. To meet the broad range of needs of the empirical research community, the RDC have offered different access paths since their founding, through which differently anonymised data products are made available. The RDC now introduce a new remote access prototype system, including a new data product. The access paths differ both in the degree of anonymity of the microdata provided and in the way the data are provided. First, the existing and firmly established data access paths are outlined and their contractual and legal conditions explained. Subsequently, the newly installed remote access prototype and its features and requirements are presented. Provided that the ongoing evaluation phase turns out positive, this option will become an additional, regularly operated way of data access in its full version from 2024 onwards. The analysis potential of the data provided through it will lie between that of the scientific use files transmitted to scientific institutions and that of the data provided for on-site analysis at the RDC safe centres. This paper highlights various challenges, such as data protection requirements and legal framework conditions, which must be considered.

1 Introduction

With the establishment of the Research Data Centre (RDC) of the Federal Statistical Office in the fall of 2001 and of the RDC of the statistical offices of the Länder in April 2002, an important cornerstone and central interface was created between the scientific community and official statistics in its role as a data and information service provider.

Together, the RDCs offer the empirically working scientific community a coordinated range of data and services for the scientific use of high-quality microdata from official statistics.

Over time, however, expectations of the RDCs have evolved fundamentally, and stakeholders in politics and the scientific community have long been pushing for substantial improvements in data access and data usage capabilities.

Remote access is an up-to-date way of accessing data and is accordingly in demand among data users. The statistical offices of other European countries (e.g., the Netherlands, France or Finland) serve as reference benchmarks: they created the legal and technical prerequisites for making their data available to researchers via remote access some time ago. In addition, Eurostat is currently setting up a remote access system at European level.

On the one hand, establishing a remote access system, with the associated investment in a connectable infrastructure, will advance the continuous development of the RDC. By catering to the needs of the scientific community, the RDC will consolidate their status as a modern data provider. On the other hand, the currently complex and inefficient system of data access can be streamlined into a uniform and manageable system without limiting the flexibility of users.

2 Status Quo

The RDC of the Federal Statistical Office, together with the RDC of the statistical offices of the Länder, offer access to more than 3,000 different data products for over 90 statistics for scientific use via different ways of access. These differ both in the anonymity of the accessible data and in the type of data provision. Generally, the existing ways of data access can be divided into two categories, as Figure 1 illustrates. With so-called "on-site access", the data remain in the secure areas of the statistical offices of the Federation and the Federal States. Since the RDCs can closely control access to the data and release output only after a confidentiality check, the data are only weakly anonymized. With "off-site access", on the other hand, users can work with the individual data at their own institutes. Since the output is not checked by the data centers, the individual data have to be more strongly anonymized.

The category “off-site” includes the so-called Public Use Files (PUF), Campus Files (CF) and Scientific Use Files (SUF). “On-site” includes PC workplaces at the RDC, so-called “safe centers”, and remote execution (see the homepage of the RDC, https://www.forschungsdatenzentrum.de/en/access).

Safe centers exist at all locations of both RDC. Researchers can use them to analyse microdata inside the secure premises of the statistical offices. As the individual data are already protected by the regulation of data access and the equipment of the PC workstation, formally anonymous microdata can be provided at the safe centers. Thus, a nationwide infrastructure is available in Germany for these data.

The safe centers are equipped with common statistical programs (Stata and R, in some cases also SPSS and SAS) and are completely isolated from the outside. A separate PC workstation with an internet connection is available for e-mail communication and internet searches.

In contrast to the safe centers, remote execution does not provide direct access to the microdata. Instead, data structure files are made available that resemble the original material in structure and variable values but do not permit any substantive analyses and carry no risk of exposing confidential information. Using these data sets, users can prepare program code in the statistical programs SPSS, SAS, Stata or R. Staff of the statistical offices then run this code on the original data. The data users receive the results of those analyses after the relevant confidentiality checks.

SUFs are standardized datasets created by the RDC for popular statistics. SUFs offer lower analysis potential than on-site ways of access but are designed to be suitable for a large proportion of scientific research projects. Because the microdata are de facto anonymized, they may be used outside the protected premises of official statistics according to Sect. 16 para. 6 no. 1 BStatG. Due to legal restrictions, SUF may only be used by researchers who are employed by a research institution that is registered and located in Germany, and SUF may only be used in Germany. Until recently, the SUF were sent on DVD to the scientific institution with which the user contract was concluded. Since June 2023, modernization measures have allowed the authorized institution to access the SUF directly via a download portal.

On-site ways of access in particular entail additional work for both data users and RDC staff. At the same time, the share of data uses via these access paths has been increasing steadily relative to off-site uses. The development of a remote access system therefore pursues the goal of ensuring technical connectivity to a modern and demand-oriented data provision for the scientific community. With this technology, the increased expectations of the research community for up-to-date data provision can be fulfilled in the long term. In addition, the remote access system holds potential for future innovation by reducing or replacing existing labor-intensive ways of access (less on-site support, less coordination of appointments with users, less coordination and support of remote execution, etc.). Consequently, the scarce resources of the RDC could be invested more efficiently, for example in supporting additional data usage or in further developing the data and service offers. There is also increased potential for data parsimony, as the system is expected to reduce the number of intermediate results per project that require confidentiality checks. Furthermore, the RDC aim to sustainably strengthen their leading role among the German RDC.

Figure 1: Ways of data access at the research data centres (RDC) of the statistical offices of the Federation and the Federal States

3 The Remote Access System

3.1 The technical structure

IT and data security play a crucial role in setting up the remote access system. The aim is to implement the remote access system in compliance with the law while maintaining the required IT security standards.

A virtual desktop infrastructure based on Citrix was chosen as the IT architecture. The system components are located in the so-called IDMZ (Internet Demilitarized Zone), which hosts procedures that are to be accessible from the Internet. The IDMZ is divided into three areas: Access Area (Pex), Application Area (Pin1) and Data Area (Pin2). These three areas are separated from each other by firewalls, which only allow approved communication between neighboring areas within the application. So-called transport encryption secures the communication path between the server and the client.
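The zone separation described above can be illustrated with a small sketch. Only the zone names (Pex, Pin1, Pin2) come from the text; the ordering-based rule and the function name `is_allowed` are illustrative assumptions, not the actual firewall configuration of the RDC.

```python
# Zone names follow the text; their order models which zones are neighbours.
# The rule itself is a simplified assumption, not the real firewall policy.
ZONES = ["Pex", "Pin1", "Pin2"]  # access area, application area, data area


def is_allowed(src: str, dst: str) -> bool:
    """Permit traffic only between directly neighbouring zones."""
    return abs(ZONES.index(src) - ZONES.index(dst)) == 1


if __name__ == "__main__":
    for src, dst in [("Pex", "Pin1"), ("Pin1", "Pin2"), ("Pex", "Pin2")]:
        print(src, "->", dst, "allowed" if is_allowed(src, dst) else "blocked")
```

Under this rule, a request arriving in the access area can never reach the data area directly; it has to pass through the application area.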

Two-factor authentication and IP whitelisting are implemented as additional IT security measures for the Citrix solution. Two-factor authentication means that, in addition to the user-specific work account protected by a personal password, a uniquely generated token must be used for each log-in. IP whitelisting allows only specific IP addresses to access the remote access system. Prior to each authorized use, the IP address of the respective facility is added to the whitelist. This ensures that unauthorized IP addresses cannot gain access to the system in the first place. It implements geoblocking as a technical measure and strengthens protection against possible (automated) attack attempts.
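A minimal sketch of how the two checks combine is given below. The address ranges are reserved documentation ranges, and the names `ALLOWLIST`, `ip_is_allowed` and `may_log_in` are hypothetical; the real system relies on Citrix components whose configuration is not described in this paper.

```python
import hmac
import ipaddress

# Hypothetical allowlist of institution networks (documentation address ranges).
ALLOWLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def ip_is_allowed(addr: str) -> bool:
    """Return True if the client address lies in an allowlisted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ALLOWLIST)


def may_log_in(addr: str, password_ok: bool, token: str, expected_token: str) -> bool:
    """Require an allowlisted address, a valid password and a matching one-time token."""
    if not ip_is_allowed(addr):
        return False  # rejected before any credentials are evaluated
    # compare_digest avoids leaking information through comparison timing
    return password_ok and hmac.compare_digest(token, expected_token)


if __name__ == "__main__":
    print(may_log_in("192.0.2.15", True, "123456", "123456"))
    print(may_log_in("203.0.113.9", True, "123456", "123456"))
```

The address check runs first, so requests from non-listed addresses are turned away regardless of the credentials they present.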

In addition, app protection is used, among other things, to prevent users from taking screenshots of the data. Remote system access is controlled on a per-user basis by an access management system; only authorized users are granted access. Within the system, authorizations are limited to the extent required for data analysis. The creation of user-specific working accounts, which are managed centrally and secured by the user and access management, ensures that users can access only the data they requested. Each account is linked to a data folder in which RDC staff store the user-specific official microdata.

In addition to the technical measures, a number of contractual and organizational measures are introduced to increase data protection. Before the data can be accessed, a user contract has to be concluded between the scientific institution and the responsible statistical office. It is contractually stipulated that up-to-date software, operating system and virus protection are used on the client side when accessing the virtual desktop infrastructure, and that re-identification of individual cases is prohibited. The RDC are legally bound to check all statistical results created within scientific projects on the basis of the provided microdata for statistical confidentiality. This serves to protect the data in accordance with Section 16 (6) of the Federal Statistics Act (BStatG). Should individual cases be part of the output, they have to be blocked consistently across all results of a project. Data users who plan to re-identify individual cases are liable to prosecution and are excluded from further data use.
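What consistent blocking across all results of a project could look like is sketched below. The paper does not state the RDC's actual checking rules, so the minimum cell count `MIN_COUNT`, the function name `suppress_consistently` and the reading of "consistently" (a cell blocked in one table is blocked in every table) are assumptions made purely for illustration.

```python
# Hypothetical threshold; the actual confidentiality rules are not given in the text.
MIN_COUNT = 3


def suppress_consistently(tables: dict) -> dict:
    """Blank out small cells in every table of a project.

    `tables` maps a table name to a dict of {cell key: case count}.
    """
    # Collect every cell key that is too small in at least one table.
    blocked = {
        key
        for table in tables.values()
        for key, count in table.items()
        if 0 < count < MIN_COUNT
    }
    # Block those keys in all tables, so no table reveals what another hides.
    return {
        name: {key: (None if key in blocked else count) for key, count in table.items()}
        for name, table in tables.items()
    }


if __name__ == "__main__":
    project = {
        "table_1": {"region_A": 1, "region_B": 40},
        "table_2": {"region_A": 5, "region_B": 60},
    }
    print(suppress_consistently(project))
```

In this toy project, `region_A` is blocked in both tables although only the first table contains a small count for it.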

To ensure that use of the system is tied to a specific location, the location of use is contractually established, and sanctions are imposed in the event of violations. In addition, it is contractually stipulated that scientific institutions can be excluded from using the remote access system, or from carrying out further research projects via the RDC, in the event of serious violations of the terms of use. In the event of a flagrant breach of contract, the scientific institutions can also be sanctioned with a penalty payment of up to EUR 20,000.

3.2 Data material in the remote access system

Remote access to formally anonymized data is not feasible within the current legal framework. One possible way of implementation is to offer remote access to de facto anonymized data with slight modifications, as this would not require an amendment of the law. In this case, the degree of data modification is of utmost relevance: if the level of anonymization is too high, the data offered will not meet the needs of the scientific community; if it is too low, confidentiality can no longer be maintained. The degree of de facto anonymization therefore largely determines the benefits of the system and how much of the scientific community's demand it covers. In addition, the expected effects on the capacity of the RDC depend heavily on covering as many of the scientific community's projects as possible via the remote access system and, in particular, on reducing the costly uses of remote execution. However, this goal can only be achieved if significantly more data can be provided via remote access than via the current off-site SUF dissemination path.

Microdata are described as “de facto anonymous” if de-anonymization cannot be completely ruled out but assigning the information to the respective statistical unit “requires unreasonable effort in terms of time, cost and manpower” (Section 16 (6) of the Federal Statistics Act). According to the Federal Statistics Act, however, de facto anonymous data may only be used by scientific institutions and only to carry out scientific projects.

When creating de facto anonymity, the aim is to virtually eliminate the probability of correctly assigning data to respondents while preserving as much of the statistical information content as possible. Different anonymization methods can be used for this purpose. Common methods are information reduction (e.g. aggregation, class formation, censoring) and information modification (e.g. swapping). To determine de facto anonymity, the effort and benefit of de-anonymization must be evaluated.
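The method families named above can be illustrated on toy records. The sketch below shows class formation (grouping ages into bands), censoring in the form of top-coding, and a simple random swap of one variable between records. All function names, the band width, the cap and the records are hypothetical; the statistics-specific anonymization concepts of the RDC are not described in this paper.

```python
import random


def to_age_class(age: int, width: int = 10) -> str:
    """Class formation: replace an exact age by a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Censoring (top-coding): values above the cap are reported as the cap."""
    return min(value, cap)


def swap_values(records: list, key: str, seed: int = 0) -> list:
    """Swapping: permute one variable across records.

    The marginal distribution of the variable is preserved, while its link
    to the other variables of each record is broken.
    """
    rng = random.Random(seed)
    values = [r[key] for r in records]
    rng.shuffle(values)
    return [{**r, key: v} for r, v in zip(records, values)]


if __name__ == "__main__":
    records = [
        {"age": 37, "income": 42_000, "district": "A"},
        {"age": 52, "income": 250_000, "district": "B"},
        {"age": 29, "income": 31_000, "district": "C"},
    ]
    reduced = [
        {**r, "age": to_age_class(r["age"]), "income": top_code(r["income"], 100_000)}
        for r in records
    ]
    print(swap_values(reduced, "district"))
```

Each step removes or perturbs information, which is why the degree of modification has to be balanced against the analysis potential of the resulting file.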

Factual anonymity thus does not completely exclude the possibility of re-identification but weighs its risk in a cost/benefit ratio. Costs for data users primarily include the consequences of actions in violation of the contract. Re-identification is strictly prohibited and punishable by fine or imprisonment (Section 203 StGB). In addition, scientific users must consider consequences such as loss of reputation or loss of access to official statistics data, which they face if they de-anonymize the data. This is because users are obligated to maintain the anonymity of the data both by the formal obligation and by the user agreement.

Factual anonymity therefore does not result solely from the remaining information content of the data but rests on a triad: 1) modification of the data material, 2) technical/organizational measures, and 3) contractual measures. Whether a microdata set can be described as de facto anonymous therefore also depends on the access conditions. Of crucial importance here is what additional knowledge is available and where the data access takes place. Depending on whether the microdata are used outside or inside the statistical offices, de facto anonymity can be achieved with more (off-site SUF) or less (on-site SUF) severe losses of information.

Figure 2: Technical infrastructure of the remote access system

The de facto anonymity of microdata from official statistics is thus not a fixed quantity but can be placed along a continuum. In principle, the stronger the technical and contractual measures, the fewer anonymization measures need to be taken and the higher the analysis potential of the data.

No technical measures are used for the previous off-site SUFs. Factual anonymity must therefore be achieved solely through the two remaining measures: in addition to the contractual commitment and the commitment of the users, de facto anonymity is achieved by strongly anonymizing the data material itself. For this purpose, a statistics-specific anonymization concept is developed for each data material.

With the new remote SUF or on-site SUF, de facto anonymity can be achieved with significantly less modification of the data. This is justified by the high level of technical measures and the associated ability to control data access. In contrast to off-site SUF, the data are not passed on; they can only be viewed via a virtual desktop (VDI environment). So-called "transport encryption" secures the communication path between the server (sender) and the client (receiver). An exchange between the technical infrastructure of the data users and the data on the official statistics server, or a download of the official data, is thus technically impossible. Unauthorized data linkage is therefore impossible, and the RDC have a high level of use control via log files. With regard to de-anonymization, data access via remote access therefore reduces many risks compared to the previous off-site SUFs.

3.3 The use of Remote Access

The remote access system, which is currently under construction, will be set up as a classic remote desktop solution. As in the past, scientific institutions that are entitled to use the system in accordance with Section 16 BStatG have to apply for data access. If the application is approved, the researchers can access the secure area from within their scientific institution using their own hardware. Within the secure area, common statistical software such as RStudio and Stata is available. The major advantage compared to remote execution is that researchers can see the microdata and no longer have to program their syntax "blindly" (see Figure 3). Because researchers work directly with the data and can view them, it should be possible to significantly reduce the number of intermediate results previously generated via remote execution, thus minimizing a very labor-intensive process step in the RDC. The goal is that RDC staff check only final outputs for confidentiality before release. This also supports the principle of data parsimony.

Figure 3: Remote Access at the RDC

Work on setting up such a system began in November 2021. The system is currently in the evaluation phase. On the one hand, the technical implementation of the system is being tested and its resilience checked using penetration tests. On the other hand, the user-friendliness of the system and the attractiveness of the data material provided are to be examined thoroughly. In a first step, only absolutely anonymous data material was made available via the system to a selected group of people. In a second step, off-site SUFs will be made available to power users who already have a valid user application with the RDC. The third step will be to test the redesigned on-site/remote SUF material. Since the system requires a redesign of all statistics-specific anonymization concepts, a gradual integration of the existing data products of the RDC is planned. The start will be made with the most requested data product, the microcensus. In order to evaluate the operating grade of the system appropriately, DRG statistics will be offered as one of the first data products in the remote access system in addition to the microcensus. If the evaluation of the system is positive, other data products that are in high demand will follow.

4 BIBLIOGRAPHY

Brenzel, Hanna / Zwick, Markus. An information infrastructure has emerged in Germany – the Research Data Centre of the Federal Statistical Office. German version published in WISTA | 6 | 2022, p. 54 et seq.

Homepage of the Research Data Centre of the Federal Statistical Office and the Federal States: https://www.forschungsdatenzentrum.de/en

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

remote access, access path, microdata for scientific purposes

Languages and translations
English

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Remote Access for Scientific Use Files – a New Pathway for German Official

Statistics Microdata Access

Hanna Brenzel ( Research Data Centre of the Federal Statistical Office)

Katharina Cramer (Research Data Centre of the Statistical Offices of the Federal States)

Volker Güttgemanns (Research Data Centre of the Statistical Offices of the Federal States)

Marcel Mathes (Research Data Centre of the Statistical Offices of the Federal States)

[email protected]

Abstract

The fundamental goal of the Research Data Centre of the Federal Statistical Office and the Research Data Centre of the

Statistical Offices of the Federal States (RDC) is not only to provide access to official statistics microdata, but also to

continuously improve and adapt the access to the changing needs of empirical science. In order to meet the broad range of

needs of the empirically working scientific community, the RDC have offered different access paths since their founding, through which differently anonymised data products are made available. Now, the RDC come up with a new remote access

prototype system including a new data product. All access paths differ both in terms of the anonymity degree of the

provided microdata as well as in the access way of data provision. At first, existing and firmly established data access

paths are outlined and their contractual and legal conditions explained. Subsequently, the newly installed remote access

prototype and its features and requirements are presented. Provided that the ongoing evaluation phase turns out positive,

this data access option will define one more way of data access operated regularly in its full version from 2024 onwards.

The analysis potential of the data provided therein will classify between the scientific-use files transmitted to the scientific

institutions and the data provided for on-site analysis at the RDC safe centres. This paper highlights various challenges,

such as data protection requirements and legal framework conditions, which must be considered.

2

1 Introduction

With the establishment of the Research Data Centre (RDC) of the Federal Statistical Office in the fall of 2001

and with the RDC of the statistical offices of the Länder in April 2002, an important cornerstone and a central intersection was created between the scientific community and official statistics as data and information service

provider.

Together, the RDCs offer the empirically working scientific community a coordinated range of data and services for the scientific use of high-quality microdata from official statistics.

Over time, however, expectations of the RDCs have evolved fundamentally, and stakeholders in politics and

scientific communities have been pushing for substantial improvements in data access and data usage capabilities for some time.

Remote access represents an up to date and modern way of accessing data and is accordingly demanded by data

users. The statistical offices of other European countries (e.g., the Netherlands, France or Finland) can be

mentioned as reference benchmarks. They have created the legal and technical prerequisites to make their data available to researchers via remote access some time ago. Last but not least, a remote access system is currently

being set up at European level by Eurostat.

On one hand, the establishment of a remote access system - with the investment in a connectable infrastructure - will advance the continuous development of the RDC. By catering towards the needs of the scientific

community, the status of the RDC as a modern data provider will be consolidated. On the other hand, the

currently complex and inefficient system of data access can be streamlined to a uniform and manageable system without limiting the flexibility of the users.

2 Status Quo

The RDC of the Federal Statistical Office and the RDC of the statistical offices of the Länder together offer access to more than 3,000 data products from over 90 statistics for scientific use. The ways of access differ both in the degree of anonymity of the accessible data and in the type of data provision. In general, the existing ways of data access can be divided into two categories, as Figure 1 illustrates. With "on-site access", the data remain in the secure areas of the statistical offices of the Federation and the Federal States. Since the RDCs can closely control access to the data and release output only after a confidentiality check, the data are only weakly anonymized. With "off-site access", on the other hand, users can work with the individual data at their own institutions. Since the output is not checked by the data centers, the individual data have to be anonymized more strongly.

The “off-site” category includes the so-called Public Use Files (PUF), Campus Files (CF) and Scientific Use Files (SUF). The “on-site” category includes PC workplaces at the RDCs, the so-called “safe centers”, and remote execution (see the homepage of the RDC, https://www.forschungsdatenzentrum.de/en/access).

Safe centers exist at all locations of both RDCs. Researchers can use them to analyse microdata inside the secure premises of the statistical offices. As the individual data are already protected by the regulation of data access and by the equipment of the PC workstations, formally anonymous microdata can be provided at the safe centers. A nationwide infrastructure is thus available in Germany for these data. The safe centers are equipped with common statistical programs (Stata and R, and in some cases SPSS and SAS) and are completely isolated from the outside. A separate PC workstation with an internet connection is available for e-mail communication and internet searches.

In contrast to the safe centers, remote execution does not provide direct access to the microdata. Instead, data structure files are made available that resemble the original material in structure and variable values but do not permit any substantive analysis and carry no risk of exposing confidential information. Using these files, users prepare program code in SPSS, SAS, Stata or R. Staff of the statistical offices then run this code on the original data, and the users receive the results after the relevant confidentiality checks.
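The idea of a data structure file can be illustrated with a short sketch. It is not the RDCs' actual procedure, which is not described here; the codebook, the variable names and the value lists are hypothetical. The sketch draws every variable independently at random from a codebook, so the resulting file has the right columns and admissible values for testing program code but carries no substantive information.

```python
import random

# Hypothetical codebook: variable name -> admissible values (illustrative only).
CODEBOOK = {
    "sex": [1, 2],
    "age_class": [1, 2, 3, 4, 5],
    "federal_state": list(range(1, 17)),
}

def make_structure_file(codebook, n_rows, seed=0):
    """Build dummy records with the codebook's columns and admissible values.

    Each variable is drawn independently and uniformly, so any relationship
    between variables in the real data is deliberately absent.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in codebook.items()}
        for _ in range(n_rows)
    ]

dummy = make_structure_file(CODEBOOK, n_rows=5)
for row in dummy:
    print(row)
```

Code that runs without errors on such a file can then be submitted for execution on the original data.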

SUFs are standardized datasets created by the RDCs for frequently requested statistics. They offer lower analysis potential than the on-site ways of access but are designed to be suitable for a large proportion of scientific research projects. Because the microdata are de facto anonymized, they may be used outside the protected premises of official statistics in accordance with Section 16 (6) no. 1 BStatG. Due to legal restrictions, SUFs may only be used by researchers who are employed by a research institution that is registered and located in Germany, and they may only be used within Germany. Until recently, the SUFs were sent on DVD to the scientific institution with which the user contract had been concluded. Since June 2023, the institution authorized to use the data has been able to obtain the SUFs directly via a download portal.

The on-site ways of access in particular entail additional work for both data users and RDC staff. At the same time, the share of data uses via these access paths has been rising steadily relative to off-site uses. The development of a remote access system therefore aims to keep the RDCs technically connected to a modern, demand-oriented form of data provision for the scientific community. With this technology, the increased expectations of the research community can be met in the long term. In addition, the remote access system holds potential for future innovation by reducing or replacing existing labor-intensive ways of access (less on-site support, less coordination of appointments with users, less coordination and support of remote execution, etc.). Consequently, the scarce resources of the RDCs could be used more efficiently, for example to support additional data usage or to develop the data and service offerings further. There is also increased potential for data parsimony, as the system is expected to reduce the number of intermediate results per project that require confidentiality checks. Furthermore, the RDCs aim to sustainably strengthen their leading role among the German RDCs.

Figure 1: Ways of data access at the research data centres (RDC) of the statistical offices of the Federation and the Federal States

3 The Remote Access System

3.1 The technical structure

IT and data security play a crucial role in setting up the remote access system. The aim is to ensure that the remote access system is implemented in compliance with the law while maintaining the required IT security standards.

A virtual desktop infrastructure based on Citrix was chosen as the IT architecture. The system components are located in the so-called IDMZ (Internet Demilitarized Zone), which hosts procedures that are to be accessible from the Internet. Within the IDMZ, three areas are distinguished: the Access Area (Pex), the Application Area (Pin1) and the Data Area (Pin2). These areas are separated from each other by firewalls, which allow only approved communication between neighboring areas within the application. Transport encryption secures the communication path between the server and the client.

Two-factor authentication and IP whitelisting are implemented as additional IT security measures for the Citrix solution. Two-factor authentication means that, in addition to the user-specific work account protected by a personal password, a uniquely generated token must be used for each log-in. IP whitelisting allows only specific IP addresses to reach the remote access system: before each authorized use, the IP address of the respective facility is added to the whitelist, so that unauthorized IP addresses cannot reach the system in the first place. This implements geoblocking as a technical measure and strengthens protection against possible (automated) attack attempts.
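The interplay of the two measures can be sketched as follows. This is a minimal illustration using Python's standard `ipaddress` module, not the configuration of the Citrix solution; the address ranges are documentation-only examples and the function names are hypothetical.

```python
import ipaddress

# Hypothetical whitelist using documentation-only address ranges;
# these are not the addresses of any real facility.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("198.51.100.17/32"),
]

def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip lies inside a whitelisted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed addresses are rejected outright
    return any(address in network for network in ALLOWED_NETWORKS)

def may_log_in(client_ip: str, password_ok: bool, token_ok: bool) -> bool:
    """Grant access only if the IP check and both authentication factors pass."""
    return is_allowed(client_ip) and password_ok and token_ok

print(may_log_in("192.0.2.5", password_ok=True, token_ok=True))
print(may_log_in("203.0.113.9", password_ok=True, token_ok=True))
```

In this sketch a request from an address outside the whitelist is refused even with valid credentials, which mirrors the layered design described above.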

In addition, app protection is used to, among other things, prevent the user from taking screenshots of the data.

Access to the remote system is controlled per user by an access management system; only authorized users are granted access. Within the system, authorizations are limited to the extent required for data analysis. User-specific working accounts, which are managed centrally and secured by the user and access management, ensure that users can access only the data they requested. Each account is linked to a data folder in which RDC staff store the official microdata for that user.

In addition to the technical measures, a number of contractual and organizational measures are introduced to increase data protection. Before the data can be accessed, a user contract has to be concluded between the scientific institution and the responsible statistical office. It is contractually stipulated that up-to-date software, operating system and virus protection are used on the client side when accessing the virtual desktop infrastructure. Re-identification of individual cases is likewise prohibited. The RDCs are legally bound to check all statistical results created in scientific projects on the basis of the provided microdata for statistical confidentiality. This serves the protection of data according to Section 16 (6) of the Federal Statistics Act (BStatG). Should individual cases be part of the output, they have to be blocked consistently across all results of a project. Data users who set out to re-identify individual cases are liable to prosecution and are excluded from further data use.
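A simplified sketch of such a check is shown below. It assumes a minimum-frequency rule with a hypothetical threshold of three cases and reads "consistently" as: a cell blocked in one result table is blocked in every table of the project in which it appears. The rules actually applied by the RDCs are not specified here and may be more elaborate.

```python
MIN_CASES = 3  # hypothetical threshold, not taken from the source

def blocked_cells(table, min_cases=MIN_CASES):
    """Return the cells of one table whose case count is positive but too small."""
    return {cell for cell, n in table.items() if 0 < n < min_cases}

def block_consistently(tables, min_cases=MIN_CASES):
    """Blank out (None) every cell that is too small in any table of the project."""
    blocked = set()
    for table in tables:
        blocked |= blocked_cells(table, min_cases)
    return [
        {cell: (None if cell in blocked else n) for cell, n in table.items()}
        for table in tables
    ]

# Two result tables of the same project, keyed by hypothetical cell labels.
first_run = {"region A": 120, "region B": 2, "region C": 45}
second_run = {"region A": 118, "region B": 4, "region C": 45}

checked = block_consistently([first_run, second_run])
for table in checked:
    print(table)
```

Here "region B" is blanked in both tables, although its count in the second table alone would pass the threshold.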

To ensure that the system is used only at a specific location, the place of use is fixed by contract and violations are sanctioned. It is also contractually stipulated that, in the event of serious violations of the terms of use, scientific institutions can be excluded from using the remote access system or from carrying out further research projects via the RDCs. In the event of a flagrant breach of contract, the scientific institution can additionally be sanctioned with a penalty payment of up to EUR 20,000.

3.2 Data material in the remote access system

Remote access to formally anonymized data is not feasible within the current legal framework. One possible implementation is to offer remote access to de facto anonymized data with only slight modifications, as this would not require an amendment of the law. In this case, the degree of data modification is of utmost relevance: if the level of anonymization is too high, the data offered will not meet the needs of the scientific community; if it is too low, confidentiality can no longer be maintained. The degree of de facto anonymization therefore largely determines the benefit of the system and how much of the scientific community's demand it covers. In addition, the expected effects on the capacity of the RDCs depend heavily on covering as many of the scientific community's projects as possible via the remote access system and, in particular, on reducing the costly uses of remote execution. This goal can only be achieved, however, if significantly more data can be provided via remote access than via the current dissemination path of off-site SUFs.

Microdata are described as “de facto anonymous” if de-anonymization cannot be completely ruled out but assigning the information to the respective statistical unit “requires unreasonable effort in terms of time, cost and manpower” (Section 16 (6) of the Federal Statistics Act). According to the Federal Statistics Act, however, de facto anonymous data may only be used by scientific institutions and only to carry out scientific projects.

When creating de facto anonymity, the aim is to virtually eliminate the probability of correctly assigning data to respondents while preserving as much of the statistical information content as possible. Different anonymization methods can be used for this purpose. Common methods are information reduction (e.g. aggregation, class formation, censoring) and information modification (e.g. swapping). To determine whether de facto anonymity has been achieved, the effort and the benefit of de-anonymization must be weighed against each other.
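The methods named above can be illustrated with a minimal sketch. The functions below are toy versions of censoring (top-coding), class formation and swapping applied to a single variable; the cap, the class width and the swap share are arbitrary assumptions and do not reflect any anonymization concept used by the RDCs.

```python
import random

def top_code(values, cap):
    """Censoring: replace every value above cap with cap."""
    return [min(v, cap) for v in values]

def to_classes(values, width):
    """Class formation: map each value to the lower bound of its class."""
    return [(v // width) * width for v in values]

def swap_values(values, share, seed=0):
    """Swapping: permute the values among a randomly chosen share of records."""
    rng = random.Random(seed)
    result = list(values)
    n_swap = int(len(result) * share)
    positions = rng.sample(range(len(result)), n_swap)
    targets = positions[:]
    rng.shuffle(targets)
    for src, dst in zip(positions, targets):
        result[dst] = values[src]
    return result

incomes = [1200, 3400, 9800, 25000]  # made-up values
ages = [23, 37, 41]                  # made-up values

print(top_code(incomes, cap=10000))
print(to_classes(ages, width=10))
print(swap_values(incomes, share=0.5))
```

Top-coding and class formation reduce information, whereas swapping keeps the distribution of the variable intact and only changes which record carries which value.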

De facto anonymity thus does not completely exclude the possibility of re-identification but places its risk in a cost/benefit ratio. For data users, the costs primarily consist of the consequences of acting in violation of the contract. Re-identification is strictly prohibited and punishable by fine or imprisonment (Section 203 StGB). Scientific users must also consider further consequences of de-anonymizing the data, such as loss of reputation and loss of access to official statistics data. This is because users are obligated to maintain the anonymity of the data both by the formal obligation and by the user agreement.

De facto anonymity therefore does not result solely from the remaining information content of the data but rests on a triad: 1) modification of the data material, 2) technical/organizational measures, and 3) contractual measures. Whether a microdata set can be described as de facto anonymous therefore also depends on the access conditions. Of crucial importance here are what additional knowledge is available and where the data access takes place. Depending on whether the microdata are used outside or inside the statistical offices, de facto anonymity can be achieved with more (off-site SUF) or less (on-site SUF) severe losses of information.

Figure 2: Technical infrastructure of the remote access system

The de facto anonymity of microdata from official statistics is thus not a fixed quantity but can be placed along a continuum. In principle, the stronger the technical and contractual measures, the fewer anonymization measures need to be taken and the higher the analysis potential of the data.

No technical measures are used for the existing off-site SUFs. De facto anonymity must therefore be achieved through the two remaining measures alone: in addition to the contractual commitment and the formal obligation of the users, the data material itself is strongly anonymized. For this purpose, a statistics-specific anonymization concept is developed for each data product.

With the new remote SUF or on-site SUF, de facto anonymity can be achieved with significantly less modification of the data. This is justified by the high level of technical measures and the associated ability to control data access. In contrast to off-site SUFs, the data are not passed on; they can only be viewed via a virtual desktop (VDI environment). Transport encryption secures the communication path between the server (sender) and the client (receiver). An exchange between the technical infrastructure of the data users and the data on the server of the official statistics, or a download of the official data, is thus technically impossible. Unauthorized data linkage is therefore impossible as well, and the RDCs have a high level of control over usage via log files. With regard to the risk of de-anonymization, remote access therefore reduces many of the risks associated with the existing off-site SUFs.

3.3 The use of Remote Access

The remote access system, which is currently under construction, will be set up as a classic remote desktop solution. As before, scientific institutions that are entitled to use the system in accordance with Section 16 BStatG have to apply for data access. If the application is approved, the researchers can access the secure area from within their scientific institution using their own hardware. Common statistical software such as RStudio and Stata is available within the secure area. The major advantage compared to remote execution is that researchers can see the microdata and no longer have to write their code "blind" (see Figure 3). Because researchers work directly with the data and can view them, it should be possible to significantly reduce the number of intermediate results previously generated via remote execution, thus minimizing a very labor-intensive process step at the RDCs. The goal is that only final outputs are checked for confidentiality by RDC staff and released. This also supports the principle of data parsimony.

Figure 3: Remote Access at the RDC

Work on setting up such a system began in November 2021. The system is currently in the evaluation phase. On the one hand, the technical implementation is being tested and its resilience checked using penetration tests. On the other hand, the user-friendliness of the system and the attractiveness of the data material provided are to be examined thoroughly. In a first step, only absolutely anonymous data material was made available via the system to a selected group of people. In a second step, off-site SUFs will be made available to power users who already have a valid user application with the RDC. The third step will be to test the redesigned on-site/remote SUF material. Since the system requires a redesign of all statistics-specific anonymization concepts, the existing data products of the RDCs will be integrated gradually, starting with the most requested data product, the microcensus. So that the operating grade of the system can be evaluated appropriately, the DRG statistics will be offered as one of the first data products in the remote access system alongside the microcensus. If the evaluation of the system is positive, other data products in high demand will follow.


4 BIBLIOGRAPHY

Brenzel, Hanna / Zwick, Markus. An information infrastructure has emerged in Germany – the Research Data Centre of the Federal Statistical Office. German version published in WISTA | 6 | 2022, p. 54 et seq.

Homepage of the Research Data Centre of the Federal Statistical Office and the Federal States: https://www.forschungsdatenzentrum.de/en

Remote Access for Scientific Use Files – a New Pathway for German Official Statistics Microdata Access

remote access,  access path, microdata


UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS

Expert Meeting on Statistical Data Confidentiality

26-28 September 2023, Wiesbaden

Remote Access for Scientific Use Files – a New Pathway for German Official

Statistics Microdata Access

Hanna Brenzel (Research Data Centre of the Federal Statistical Office)

Katharina Cramer (Research Data Centre of the Statistical Offices of the Federal States)

Volker Güttgemanns (Research Data Centre of the Statistical Offices of the Federal States)

Marcel Mathes (Research Data Centre of the Statistical Offices of the Federal States)

[email protected]

Abstract

The fundamental goal of the Research Data Centre of the Federal Statistical Office and the Research Data Centre of the Statistical Offices of the Federal States (RDC) is not only to provide access to official statistics microdata but also to continuously improve this access and adapt it to the changing needs of empirical science. To meet the broad range of needs of the empirically working scientific community, the RDC have offered different access paths since their founding, through which differently anonymised data products are made available. The RDC now present a new remote access prototype system, including a new data product. All access paths differ both in the degree of anonymity of the microdata provided and in the way the data are provided. First, the existing and firmly established data access paths are outlined and their contractual and legal conditions explained. Subsequently, the newly installed remote access prototype and its features and requirements are presented. Provided that the ongoing evaluation phase turns out positive, this data access option will become a further way of data access, operated regularly in its full version from 2024 onwards. The analysis potential of the data provided through it will lie between that of the scientific-use files transmitted to the scientific institutions and that of the data provided for on-site analysis at the RDC safe centres. This paper highlights various challenges that must be considered, such as data protection requirements and legal framework conditions.

2

1 Introduction

With the establishment of the Research Data Centre (RDC) of the Federal Statistical Office in the fall of 2001

and with the RDC of the statistical offices of the Länder in April 2002, an important cornerstone and a central intersection was created between the scientific community and official statistics as data and information service

provider.

Together, the RDCs offer the empirically working scientific community a coordinated range of data and services for the scientific use of high-quality microdata from official statistics.

Over time, however, expectations of the RDCs have evolved fundamentally, and stakeholders in politics and

scientific communities have been pushing for substantial improvements in data access and data usage capabilities for some time.

Remote access represents an up to date and modern way of accessing data and is accordingly demanded by data

users. The statistical offices of other European countries (e.g., the Netherlands, France or Finland) can be

mentioned as reference benchmarks. They have created the legal and technical prerequisites to make their data available to researchers via remote access some time ago. Last but not least, a remote access system is currently

being set up at European level by Eurostat.

On one hand, the establishment of a remote access system - with the investment in a connectable infrastructure - will advance the continuous development of the RDC. By catering towards the needs of the scientific

community, the status of the RDC as a modern data provider will be consolidated. On the other hand, the

currently complex and inefficient system of data access can be streamlined to a uniform and manageable system without limiting the flexibility of the users.

2 Status Quo

The RDC of the Federal Statistical Office, together with the RDC of the statistical offices of the Länder, offer

access to more than 3,000 different data products for over 90 statistics for scientific use via different ways of

access. They differ both in terms of the anonymity of the accessible data and in the type of data provision. Generally, the existing ways of data access can be divided into two categories, as figure 1 illustrates. In the case

of the so-called "on-site access", the data remains in the secure areas of the statistical offices of the Federation

and the Federal States. Since the RDCs can closely control the access to the data and provide output only after

confidentiality check, the data are only weakly anonymized. With the "off-site access," on the other hand, users can work with the individual data at their own institutes. Since the output are not checked by the data centers,

the individual data has to be more anonymized.

The category “off-site” includes the so-called Public Use Files (PUF), Campus Files (CF) and Scientific Use Files (SUF). “On-site” includes PC workplaces at the RDC, so called “safe centers” and remote execution (see

the homepage of the RDC, https://www.forschungsdatenzentrum.de/en/access).

Safe centers exist in all locations of both RDC. These can be used by researchers to analyse microdata inside

the safe premises of the statistical offices. As the individual data are already protected by the regulation of data

access and the equipment of the PC workstation, formally anonymous microdata can be provided at the safe

centers. Thus, a nationwide infrastructure in Germany is available for these data. The safe centers are equipped with common statistical programs (Stata, R as well as partly SPSS and SAS) and

are completely isolated from the outside. A separate PC workstation with internet connection is available for e-

mail communication and internet searches. In contrast to the safe centers, remote execution does not provide direct access to the microdata. Instead, data

structure files are made available that resemble the original material with regard to structure and variable

values, but do not permit any analyses in terms of content and do not hold any risk of exposing confidential

information. Using these data sets, program codes can be prepared by the users using the statistical programs SPSS, SAS, Stata or R. These program codes are applied by staff of the statistical offices to analyse the original

data. The data users receive the results of those analyses after the relevant confidentiality checks.

SUFs are standardized datasets created by the RDC for popular statistics. SUFs offer lower potential for

analyses than on-site ways of access, but are designed to be suitable for a large proportion of scientific research

projects. Due to the de facto anonymization of microdata, they may be used outside the protected premises of official statistics according to Sect. 16 para. 6 nos. 1 BStatG. Due to legal restrictions, SUF may only be used

3

by researchers who are employed by a research institution that is registered and located in Germany. The use of

SUF may only take place in Germany. Until recently, the SUF were sent by DVD to the respective scientific institution with which user contract was concluded. Since June 2023, recent modernization measures now allow

the SUF to be accessed directly via a download portal to the institution authorized to use the data.

In particular, on-site ways of access entail additional work for both data users and RDC staff. At the same time,

the share of data uses via these access paths steadily increases over time compared to off-site uses. The development of a remote access system therefore pursues the goal of ensuring the technical connectivity to a

modern and demand-oriented data provision for the scientific community. With this technology, the increased

expectations of the research community for an up-to-date and modern data provision can be fulfilled in the long term. In addition, the remote access system holds potential for future innovation by reducing or substituting

existing labor-intensive ways of access (reduction of on-site support, reduction of coordination of appointments

with users, reduction of coordination and support of remote execution, etc.). Consequently, the scarce resources of the RDC could be invested more efficiently, for example in supporting additional data usage or further

developing the data and service offers. At the same time, there is increased potential regarding data parsimony,

as it is expected that this system will reduce the number of intermediate results per project that require

confidentiality checks. Furthermore, the RDC aim to sustainably strengthen their leading role in the group of German RDC.

Figure 1: Ways of data access at the research data centres (RDC) of the statistical offices of the Federation and the Federal States

3 The Remote Access System

3.1 The technical structure

IT and data security play a crucial role in setting up the remote access system. The aim is to ensure that the

remote access system is implemented in compliance with the law while maintaining the required IT security

standards.

A virtual desktop infrastructure based on CITRIX was chosen as the IT-architecture. The system components

set up are located in the so-called IDMZ (Internet Demilitarized Zone), in which procedures are operated that

are to be accessible from the Internet. In the IDMZ, a distinction is made between three areas: Access Area

(Pex), Application Area (Pin1) and Data Area (Pin2). These three areas are separated from each other by

firewalls, which only allow approved communication between the neighboring areas within the application. A

so-called transport encryption secures the communication path between the server and the client.

Two-factor authentication and IP whitelisting are implemented as additional IT security measures for the Citrix

solution. Two-factor authentication means that, in addition to the user-specific work accounts protected by a

personal password, a uniquely generated token must be used for each log-in. IP whitelisting allows only

specific IP addresses to gain access to the remote access system. Prior to each authorized use, the IP address of

the respective facility is allowed or added to the whitelist. This ensures that unauthorized IP addresses do not

initially gain access to the system. This implements geoblocking as a technical measure as well as

strengthening protection against possible (automated) attack attempts.

In addition, app protection is used to, among other things, prevent the user from taking screenshots of the data.

Remote system access is controlled on a per user basis by an access management system, only authorized users

are granted access. Within the system, authorizations are limited to the extent required for data analysis. The

creation of user-specific working accounts, which are managed centrally and secured by the user and access

management, ensures that access is only possible to requested data. Each account is linked to a data folder in

which user-specific official microdata are stored by RDC staff.

In addition to the technical measures, a number of technical and contractual-organizational measures are

introduced to increase data protection. Before the data can be accessed, a user contract has to be concluded

between the scientific institution and the responsible statistical office. It is contractually stipulated that up-to-

4

date software, operating system and virus protection are used on the client side when accessing the virtual

desktop infrastructure. As well as, re-identification of individual cases is illicit. The RDC are legally bound to

check all statistical results for statistical confidentiality that were created within the context of scientific

projects based on provided microdata. This serves the protection of data according to section 16 (6) of the

Federal Statistics Law (BStatG). Should individual cases be part of the output then they have to be blocked

consistently across all results of a project. Data users who plan to re-identify individual cases are liable to

prosecution and are expelled from further data uses.

In order to ensure that the system is tied to a specific location, its use is contractually established and sanctions

are imposed in the event of violations. In addition, it is contractually stipulated that scientific institutions can be

excluded from using the remote access system or from the possibility of carrying out further research projects

via the RDC in the event of serious violations of the terms of use. In the event of a striking breach of contract,

the scientific institutions can also be sanctioned with a penalty payment of up to EUR 20,000.

3.2 Data material in the remote access system

Remote access to formally anonymized data is not feasible within the current legal framework. One possible

way of implementation is to offer remote access for de facto anonymized data with slight modifications, as this

would not require amendment of the law. In this case, the degree of data modification is of utmost relevance: If

the level of anonymization is too high, the data offered will not meet the needs of the scientific community; if

the level of data anonymization is too low, confidentiality can no longer be maintained. The degree of de facto

anonymization therefore largely determines the benefits and coverage of the demand of the scientific

community. In addition, the expected effects on the capacity of the RDC heavily depend on covering as many

of the science community's projects as possible via the remote access system and, in particular, on reducing the

costly uses of remote execution. However, this goal can only be achieved if significantly more data can be

provided via remote access than via the current dissemination path via off-site SUF.

Microdata are described as “de facto anonymous” if it is not possible to completely rule out de-anonymization

but assigning the information to the respective statistical unit “requires unreasonable effort in terms of time,

cost and manpower” (Section 16 (6) of the Federal Statistics Act). According to the Federal Statistics Act,

however, de facto anonymous data may only be used by scientific institutions and only to carry out scientific

projects.

When creating de facto anonymity, the aim is to virtually eliminate the probability of correctly assigning data to

respondents, while preserving the statistical information content as much as possible. Different anonymization

methods can be used for this purpose. Common methods are information reduction (e.g. aggregation, class

formation, censoring) and information modification (e.g. swapping). In order to determine de facto anonymity,

the effort and benefit of deanonymization must be evaluated.

De facto anonymity thus does not completely exclude the possibility of re-identification, but weighs its risk in

cost/benefit terms. Costs for data users primarily include the consequences of actions that violate the

contract. Re-identification is strictly prohibited and punishable by fine or imprisonment (Section 203 StGB). In

addition, scientific users must consider consequences such as loss of reputation or loss of access to official statistics data,

which they face if the data are de-anonymized. This is because

users are obligated to maintain the anonymity of the data both by the formal obligation and by the user agreement.

De facto anonymity therefore does not result solely from the remaining information content of the data, but rests on

a triad: 1) modification of the data material, 2) technical/organizational measures, and 3)

contractual measures. Whether a microdata set can be described as de facto anonymous therefore also depends on

the access conditions. Of crucial importance here is what additional knowledge is available and where the data

access takes place. Depending on whether the microdata are used outside or inside the statistical offices, de facto

anonymity can be achieved with more (off-site SUF) or less (on-site SUF) severe losses of information.

Figure 2: Technical infrastructure of the remote access system

The de facto anonymity of microdata from official statistics is thus not a fixed quantity, but can be mapped

along a continuum. In principle, the stronger the technical and contractual measures, the fewer

anonymization measures need to be taken and the higher the analysis potential of the data.

No technical measures are used for the previous off-site SUFs. De facto anonymity must therefore be

achieved solely through the two remaining measures: in addition to the user agreement and the formal obligation of

the users, it is achieved by strongly anonymizing the data material itself. For this purpose, a

statistics-specific anonymization concept is developed for each data product.

With the new remote SUF or on-site SUF, de facto anonymity can be achieved by significantly less

modification of the data. This is justified by the high level of technical measures and the associated ability

to control data access. In contrast to off-site SUFs, the data are not passed on. They can only be viewed

via a virtual desktop (VDI environment). So-called "transport encryption" secures the communication

path between the server (sender) and the client (receiver). An exchange between the technical

infrastructure of the data users and the data on the server of the official statistics, or a download of the official

data, is thus technically impossible. As a result, unauthorized data linkage is impossible and the RDC has a high level

of usage control via log files. Data access via remote access therefore

reduces many of the de-anonymization risks associated with the previous off-site SUFs.

3.3 The use of Remote Access

The remote access system, which is currently under construction, will be set up as a classic remote desktop

solution. As in the past, scientific institutions that are entitled to use the system in accordance with Section 16

BStatG have to apply for data access. If the application is approved, the researchers can then access the

secure area from within their scientific institution using their own hardware. Within the secure area, common

statistical software such as RStudio and Stata is available. The major advantage compared to remote execution

is that researchers can see the microdata and no longer have to write their syntax "blindly" as before (see

Figure 3). Because researchers work directly with the data and can view them, it should be possible to significantly

reduce the number of intermediate results previously generated via remote execution, thus minimizing a very

labor-intensive process step in the RDC. The goal should be that only final outputs are checked for

confidentiality by the RDC staff and then released. This also supports the principle of data parsimony.

Figure 3: Remote Access at the RDC

Work on setting up such a system began in November 2021. The system is currently in the evaluation phase.

On the one hand, the technical implementation of the system is being tested and its resilience checked using penetration tests. On the other hand, the user-friendliness of the system and the attractiveness of the data material provided are

to be examined thoroughly. In a first step, only absolutely anonymous data material was made available via the

system to a selected group of people. In a second step, off-site SUFs will be made available to power users who have already completed a valid user application with the RDC. The third step will then be to test the

redesigned on-site/remote SUF material. Since the system requires a redesign of all statistics-specific

anonymization concepts, a gradual integration of the existing data products in the RDC is planned. The start will be made with the most requested data product, the microcensus. In order to be able to evaluate the

operating grade of the system appropriately, DRG statistics will be offered as one of the first data products in

the remote access system in addition to the microcensus. If the evaluation of the system is positive, other data

products that are of high demand will follow.


4 BIBLIOGRAPHY

Brenzel, Hanna / Zwick, Markus. An information infrastructure has emerged in Germany – the Research Data

Centre of the Federal Statistical Office. German version published in WISTA | 6 | 2022, p. 54 et seq.

Homepage of the Research Data Centre of the Federal Statistical Office and the Federal States

https://www.forschungsdatenzentrum.de/en

PRE/ACCC/2023/203 Germany


Annex_1_GGO_paras47-51_GER_ENG.pdf

34 Rechtsetzung

Verantwortung der Bundesministerin oder des Bundesministers für eilige Vorhaben ihres oder seines Geschäftsbereichs wird hierdurch nicht berührt.

§ 46 Rechtssystematische und rechtsförmliche Prüfung

(1) Bevor ein Gesetzentwurf der Bundesregierung zum Beschluss vorgelegt wird, ist er dem Bundesministerium der Justiz zur Prüfung in rechtssystematischer und rechtsförmlicher Hinsicht (Rechtsprüfung) zuzuleiten.

(2) Bei Übersendung des Entwurfs ist darauf Rücksicht zu nehmen, dass dem Bundesministerium der Justiz bei Entwürfen größeren Umfanges genügend Zeit zur Prüfung und Erörterung von Fragen, die bei der Prüfung nach Absatz 1 anfallen, zur Verfügung stehen muss.

(3) Hat das Bundesministerium der Justiz an der Vorbereitung eines Entwurfs mitgewirkt und ihn hierbei schon der Prüfung nach Absatz 1 unterzogen, kann mit seiner Zustimmung von einer nochmaligen Zuleitung des Entwurfs abgesehen werden.

§ 47 Beteiligung von Ländern, kommunalen Spitzenverbänden, Fachkreisen und Verbänden

(1) Der Entwurf einer Gesetzesvorlage ist Ländern, kommunalen Spitzenverbänden und den Vertretungen der Länder beim Bund möglichst frühzeitig zuzuleiten, wenn ihre Belange berührt sind. Ist in wesentlichen Punkten mit der abweichenden Meinung eines beteiligten Bundesministeriums zu rechnen, hat die Zuleitung nur im Einvernehmen mit diesem zu erfolgen. Soll das Vorhaben vertraulich behandelt werden, ist dies zu vermerken.

(2) Das Bundeskanzleramt ist über die Beteiligung zu unterrichten. Bei Gesetzentwürfen von besonderer politischer Bedeutung muss seine Zustimmung eingeholt werden.

(3) Für eine rechtzeitige Beteiligung von Zentral- und Gesamtverbänden sowie von Fachkreisen, die auf Bundesebene bestehen, gelten die Absätze 1 und 2 entsprechend. Zeitpunkt, Umfang und Auswahl bleiben, soweit keine Sondervorschriften bestehen, dem Ermessen des federführenden Bundesministeriums überlassen.

(4) Bei der Beteiligung nach den Absätzen 1 und 3 ist ausdrücklich darauf hinzuweisen, dass es sich um einen Gesetzentwurf handelt, der von der Bundesregierung noch nicht beschlossen worden ist. Dem Gesetzentwurf können die Begründung und das Vorblatt beigefügt werden.

§ 48 Unterrichtung anderer Stellen

(1) Sollen die Presse sowie andere amtlich nicht beteiligte Stellen oder sonstige Personen Gesetzentwürfe aus den Bundesministerien erhalten, bevor die Bundesregierung sie beschlossen hat, bestimmt das federführende Bundesministerium, bei grundsätzlicher politischer Bedeutung das Bundeskanzleramt, in welcher Form dies geschehen soll.

(2) Wird ein Gesetzentwurf den Ländern, den beteiligten Fachkreisen oder Verbänden beziehungsweise Dritten im Sinne von Absatz 1 zugeleitet, so ist er den Geschäftsstellen der Fraktionen des Deutschen Bundestages, dem Bundesrat und auf Wunsch Mitgliedern des Deutschen Bundestages und des Bundesrates zur Kenntnis zu geben.

(3) Über die Einstellung des Gesetzentwurfs in das Intranet der Bundesregierung oder in das Internet entscheidet das federführende Bundesministerium im Einvernehmen mit dem Bundeskanzleramt und im Benehmen mit den übrigen beteiligten Bundesministerien.

(4) Bei der Unterrichtung nach Absatz 1 bis 3 gilt § 47 Absatz 4 entsprechend.

§ 49 Kennzeichnung und Übersendung der Entwürfe

(1) Gesetzentwürfe sind mit dem Datum und dem Zusatz „Entwurf“ zu versehen. Änderungen gegenüber dem jeweils vorangegangenen Entwurf sind kenntlich zu machen.

(2) Bei der Übersendung ist darzulegen, ob es sich um ein Gesetzgebungsvorhaben handelt, das der Zustimmung des Bundesrates bedarf.

§ 50 Frist zur abschließenden Prüfung

Die Frist zur abschließenden Prüfung des Gesetzentwurfs durch die nach den §§ 44, 45 und 46 Beteiligten beträgt in der Regel vier Wochen. Sie kann verkürzt werden, wenn alle Beteiligten zustimmen. Bei umfangreichen oder rechtlich schwierigen Entwürfen verlängert sich die Frist auf acht Wochen, wenn dies von einem Ressort im Rahmen der Beteiligung nach § 45 beantragt wird.

Abschnitt 4 Behandlung von Gesetzentwürfen durch die Bundesregierung

§ 51 Vorlage an das Kabinett

Werden Gesetzesvorlagen nach Abschnitt 3 der Bundesregierung zum Beschluss vorgelegt, ist im Anschreiben zur Kabinettvorlage unbeschadet des § 22 anzugeben,

1. ob die Zustimmung des Bundesrates erforderlich ist,

2. dass das Bundesministerium der Justiz die Prüfung nach § 46 Absatz 1 bestätigt hat,

3. dass die Anforderungen nach § 44 erfüllt sind,

4. welche abweichenden Meinungen aufgrund der Beteiligungen nach den §§ 45 und 47 bestehen,

5. mit welchen Kosten die Ausführung des Gesetzes Bund, Länder oder Kommunen belastet und ob das Bundesministerium der Finanzen und die in den §§ 44, 45 genannten Stellen ihr Einverständnis erklärt haben,

6. ob der Nationale Normenkontrollrat nach § 45 Absatz 2 zu dem Gesetzentwurf Stellung genommen hat und ob hierzu der Entwurf einer Stellungnahme der Bundesregierung vorliegt,

7. inwieweit im Falle der Umsetzung einer Richtlinie oder sonstiger Rechtsakte der Europäischen Union über deren Vorgaben hinaus weitere Regelungen getroffen werden,

8. ob die Vorlage ausnahmsweise besonders eilbedürftig ist (Artikel 76 Absatz 2 Satz 4 Grundgesetz).

§ 52 Einheitliches Vertreten der Gesetzesvorlagen; Formulierungshilfe für den Deutschen Bundestag und den Bundesrat

(1) Die von der Bundesregierung beschlossenen Gesetzesvorlagen sind vor dem Deutschen Bundestag und dem Bundesrat einheitlich zu vertreten, auch wenn einzelne Bundesministerien eine andere Auffassung hatten.

34 Legislation

This shall not affect the responsibility of the Federal Minister for urgent projects in his or her portfolio.

§ 46 Examination of legal systematics and legal form

(1) Before a bill is submitted to the Federal Government for adoption, it shall be forwarded to the Federal Ministry of Justice for examination as to legal systematics and legal form (legal examination).

(2) When the draft is sent, account shall be taken of the fact that, for longer drafts, the Federal Ministry of Justice must have sufficient time to examine and discuss questions arising during the examination pursuant to paragraph 1.

(3) If the Federal Ministry of Justice has participated in the preparation of a draft and has already subjected it to the examination pursuant to paragraph 1 in the process, the draft need not be forwarded again, provided that Ministry consents.

§ 47 Participation of Länder, central associations of local authorities, expert groups and associations

(1) The draft of a bill shall be forwarded to the Länder, the central associations of local authorities and the representations of the Länder to the Federation as early as possible if their interests are affected. If a participating Federal Ministry is expected to hold a dissenting opinion on essential points, the draft shall be forwarded only in agreement with that Ministry. If the project is to be treated confidentially, this shall be noted.

(2) The Federal Chancellery shall be informed of the participation. In the case of bills of particular political significance, its consent must be obtained.

(3) Paragraphs 1 and 2 shall apply mutatis mutandis to the timely participation of central and umbrella associations and of expert groups existing at the federal level. Unless special provisions exist, the timing, scope and selection shall be left to the discretion of the lead Federal Ministry.


(4) When participation takes place in accordance with paragraphs 1 and 3, it shall be expressly pointed out that the bill in question has not yet been adopted by the Federal Government. The explanatory memorandum and the cover sheet (Vorblatt) may be attached to the bill.

§ 48 Informing other bodies

(1) If the press and other bodies not officially involved, or other persons, are to receive bills from the Federal Ministries before the Federal Government has adopted them, the lead Federal Ministry, or the Federal Chancellery in cases of fundamental political importance, shall determine the form in which this is to be done.

(2) If a bill is forwarded to the Länder, the expert groups or associations involved, or third parties within the meaning of paragraph 1, it shall be brought to the attention of the offices of the parliamentary groups of the German Bundestag, the Bundesrat and, on request, members of the German Bundestag and the Bundesrat.

(3) The lead Federal Ministry shall decide on the posting of the bill on the intranet of the Federal Government or on the Internet in agreement with the Federal Chancellery and in consultation with the other Federal Ministries involved.

(4) Section 47 (4) shall apply mutatis mutandis to information provided in accordance with paragraphs 1 to 3.

§ 49 Marking and sending the drafts

(1) Draft bills shall be marked with the date and the addition "Draft". Amendments to the previous draft shall be indicated.

(2) When sending the bill, it must be stated whether it is a legislative project requiring the consent of the Bundesrat.

§ 50 Deadline for the final examination

The period for the final examination of the bill by those involved under sections 44, 45 and 46 shall generally be four weeks. This period may be shortened if all parties involved agree. In the case of extensive or legally difficult drafts, the period shall be extended to eight weeks if a ministry so requests within the framework of participation pursuant to section 45.

Part 4 Handling of bills by the Federal Government

§ 51 Submission to the Cabinet

If bills under Part 3 are submitted to the Federal Government for adoption, the cover letter to the Cabinet submission shall state, without prejudice to section 22,

1. whether the consent of the Bundesrat is required,

2. that the Federal Ministry of Justice has confirmed the examination pursuant to section 46 (1),

3. that the requirements of section 44 are fulfilled,

4. which divergent opinions exist as a result of the participation pursuant to sections 45 and 47,

5. what costs the implementation of the Act will impose on the Federation, the Länder or local authorities, and whether the Federal Ministry of Finance and the bodies referred to in sections 44 and 45 have declared their consent,

6. whether the National Regulatory Control Council (Nationaler Normenkontrollrat) has issued an opinion on the bill pursuant to section 45 (2) and whether a draft statement by the Federal Government is available in this regard,

7. the extent to which, where a directive or other legal act of the European Union is being implemented, provisions are made that go beyond its requirements,

8. whether the submission is, by way of exception, particularly urgent (Article 76 (2) sentence 4 of the Basic Law).

§ 52 Uniform representation of bills; drafting assistance for the German Bundestag and the Bundesrat

(1) Bills adopted by the Federal Government shall be represented uniformly before the German Bundestag and the Bundesrat, even if individual Federal Ministries held a different view.

Annex_2_ArbZG_paras3-9_GER_ENG.pdf


N

A service of the Federal Ministry of Justice and the Federal Office of Justice - www.gesetze-im-internet.de

Working Hours Act (ArbZG) ArbZG

Date of issue: 06.06.1994

Full citation: "Working Hours Act of June 6, 1994 (BGBl. I p. 1170, 1171), as last amended by Article 6 of the Act of December 22, 2020 (BGBl. I p. 3334)."

Status: Last amended by Art. 6 G v. 22.12.2020 I 3334

Footnote

(+++ Text reference as of: 1.7.1994 +++)

(+++ Implementation of EGRL 104/93 (CELEXNr: 393L0104) cf. art. 4b G v. 24.12.2003 I 3002 +++)

The Act was passed by the Bundestag as Article 1 of the Act of 6.6.1994 I 1170 (ArbZRG). It entered into force on July 1, 1994 in accordance with Art. 21 Sentence 2 of that Act.

Section One General Provisions

§ 1 Purpose of the law

The purpose of this Act is

1. to ensure the safety and health protection of employees in the Federal Republic of Germany and in the exclusive economic zone with regard to the organization of working hours and to improve the framework conditions for flexible working hours, and

2. to protect Sundays and state-recognized holidays as days of rest from work and spiritual upliftment for employees.

§ 2 Definitions

(1) For the purposes of this Act, working time is the time from the beginning to the end of work, excluding rest breaks; periods of work for several employers are to be added together. In underground mining, rest breaks count as working time.

(2) Employees within the meaning of this Act are blue- and white-collar workers and those employed for their vocational training.

(3) For the purposes of this Act, night time means the time from 11 p.m. to 6 a.m., and in bakeries and confectioneries the time from 10 p.m. to 5 a.m.

(4) For the purposes of this Act, night work is any work that involves more than two hours of night time.

(5) Night workers within the meaning of this Act are employees who

1. are normally required to work night shifts due to their work schedule, or

2. perform night work for at least 48 days in a calendar year.

Section Two Working hours and non-working hours per working day

§ 3 Working time of employees

The working time of employees may not exceed eight hours per working day. It may be extended to up to ten hours only if an average of eight hours per working day is not exceeded within six calendar months or within 24 weeks.

§ 4 Rest breaks

Work shall be interrupted by rest breaks fixed in advance of at least 30 minutes if the working time is more than six and up to nine hours, and of at least 45 minutes in total if the working time exceeds nine hours. The rest breaks in accordance with sentence 1 may be divided into periods of at least 15 minutes each. Employees may not be employed for longer than six consecutive hours without a rest break.

§ 5 Rest period

(1) Employees must have an uninterrupted rest period of at least eleven hours after the end of the daily work period.

(2) The duration of the rest period of paragraph 1 may be reduced by up to one hour in hospitals and other facilities for the treatment, care and supervision of persons, in restaurants and other facilities for catering and lodging, in transport companies, in broadcasting as well as in agriculture and animal husbandry, if each reduction of the rest period is compensated for within one calendar month or within four weeks by extending another rest period to at least twelve hours.

(3) By way of derogation from paragraph 1, in hospitals and other facilities for the treatment, nursing and care of persons, reductions in the rest period caused by call-outs during on-call duty that amount to no more than half of the rest period may be compensated for at other times.

(4) (omitted)

§ 6 Night and shift work

(1) The working hours of night and shift workers shall be determined in accordance with established ergonomic findings on the humane organization of work.

(2) The working hours of night workers may not exceed eight hours per working day. It may be extended to up to ten hours only if, by way of derogation from § 3, an average of eight hours per working day is not exceeded within one calendar month or within four weeks. For periods in which night workers within the meaning of Section 2 (5) No. 2 are not required to perform night work, Section 3 Sentence 2 shall apply.

(3) Night workers are entitled to an occupational health examination prior to commencement of employment and at regular intervals of not less than three years thereafter. After reaching the age of 50, night workers are entitled to this examination at intervals of one year. The costs of the examinations shall be borne by the employer, unless he offers the examinations to the night workers free of charge through a company physician or an inter-company service of company physicians.

(4) The employer shall, at the night worker's request, transfer him to a suitable daytime workplace if

a) according to occupational medicine, the continued performance of night work endangers the employee's health, or

b) there is a child under the age of twelve living in the employee's household who cannot be cared for by another person living in the household, or

c) the employee has to care for a relative in severe need of care who cannot be cared for by another relative living in the household,

unless there are urgent operational requirements to the contrary. If, in the opinion of the employer, urgent operational requirements prevent the transfer of the night worker to a daytime workplace suitable for him, the works council or personnel council must be consulted. The works council or personnel council may submit proposals to the employer for a transfer.

(5) In the absence of compensation provisions in collective agreements, the employer shall grant the night worker an appropriate number of paid days off for the hours worked during night time or an appropriate supplement to the gross remuneration to which he is entitled for those hours.

(6) It shall be ensured that night workers have the same access to in-company training and career advancement measures as other employees.

§ 7 Deviating regulations

(1) In a collective bargaining agreement or, on the basis of a collective bargaining agreement, in a works or service agreement, it may be permitted

1. in deviation from § 3,

a) to extend the working time beyond ten hours per working day if the working time regularly and to a considerable extent includes standby duty or on-call duty,

b) to determine a different compensation period,

c) (omitted)

2. in deviation from § 4 sentence 2, to divide the total duration of rest breaks in shift work and transport operations into short breaks of appropriate duration,

3. in derogation of Section 5 (1), to reduce the rest period by up to two hours if the nature of the work so requires and the reduction in the rest period is compensated for within a compensation period to be specified,

4. in deviation from § 6 para. 2,

a) to extend the working time beyond ten hours per working day if the working time regularly and to a considerable extent includes standby duty or on-call duty,

b) to determine a different compensation period,

5. to set the beginning of the seven-hour night period of § 2 para. 3 to the time between 10 p.m. and midnight.

(2) Provided that the health protection of the employees is ensured by an appropriate compensation of time, it may further be permitted in a collective agreement or, on the basis of a collective agreement, in a works or service agreement,

1. in deviation from § 5 Para. 1, to adjust the rest periods during on-call duty to the special features of this duty, in particular to compensate at other times for reductions in the rest period resulting from call-outs during this duty,

2. to adapt the regulations of §§ 3, 5 para. 1 and § 6 para. 2 in agriculture to the tilling and harvesting season as well as to the weather conditions,

3. to adapt the regulations of §§ 3, 4, 5 para. 1 and § 6 para. 2 in the treatment, care and support of persons in accordance with the nature of this activity and the welfare of these persons,

4. to adapt the provisions of Sections 3, 4, 5 (1) and 6 (2) in the case of administrations and establishments of the Federal Government, the Länder, the municipalities and other corporations, institutions and foundations under public law, as well as in the case of other employers who are subject to a collective bargaining agreement applicable to the public service or to a collective bargaining agreement with substantially the same content, to the specific nature of the work performed at such establishments.

(2a) By way of derogation from Sections 3, 5(1) and 6(2), a collective agreement or a works or service agreement based on a collective agreement may permit the working day to be extended beyond eight hours without compensation if the working time regularly and to a considerable extent includes standby duty or on-call duty and special arrangements are made to ensure that the health of the employees is not endangered.

(3) Within the scope of application of a collective bargaining agreement pursuant to subsections 1, 2 or 2a, deviating collective bargaining provisions in the business of an employer who is not bound by a collective bargaining agreement may be adopted by means of a works or service agreement or, if there is no works or staff council, by written agreement between the employer and the employee. If, on the basis of such a collective agreement, deviating provisions can be made in a works or service agreement, use may also be made of this in establishments of an employer not bound by the collective agreement. A deviating collective bargaining provision adopted in accordance with subsection 2 no. 4 shall apply between employers and employees who are not bound by collective bargaining agreements if the application of the collective bargaining provisions applicable to the public service has been agreed between them and the employers predominantly cover the costs of the operation with grants within the meaning of budgetary law.

(4) The churches and the religious societies under public law may provide for the deviations referred to in paragraphs 1, 2 or 2a in their regulations.

(5) In an area in which regulations are not normally made by collective agreement, exceptions within the scope of paragraphs 1, 2 or 2a may be granted by the supervisory authority if this is necessary for operational reasons and the health of the employees is not endangered.

(6) The Federal Government may, by ordinance with the consent of the Bundesrat, permit exceptions within the scope of paragraph 1 or 2, provided that this is necessary for operational reasons and the health of employees is not endangered.

(7) On the basis of a regulation pursuant to subsection 2a or subsections 3 to 5, in each case in conjunction with subsection 2a, the working time may only be extended if the employee has consented in writing. The employee may revoke the consent in writing with six months' notice. The employer may not discriminate against an employee because the employee has not declared his consent to the extension of working hours or has revoked such consent.

(8) If regulations pursuant to paragraph 1 nos. 1 and 4 or paragraph 2 nos. 2 to 4, or such regulations on the basis of paragraphs 3 and 4, are permitted, the working time may not exceed 48 hours per week on average over twelve calendar months. If permission is granted on the basis of paragraph 5, the working time shall not exceed 48 hours per week on average over six calendar months or 24 weeks.

(9) If the working day is extended beyond twelve hours, a rest period of at least eleven hours must be granted immediately following the end of the working time.

§ 8 Dangerous work

The Federal Government may, by ordinance and with the consent of the Bundesrat, restrict working hours beyond Section 3 for individual areas of employment, for certain types of work or for certain groups of employees where particular hazards to the health of employees are to be expected, extend the rest breaks and rest periods beyond §§ 4 and 5, extend the regulations for the protection of night and shift workers in § 6 and limit the possibilities for deviation in accordance with § 7, insofar as this is necessary to protect the health of the employees. Sentence 1 shall not apply to areas of employment and work in establishments subject to mining supervision.

Section Three Sunday and holiday rest

§ 9 Sunday and holiday rest

(1) Employees may not be employed on Sundays and public holidays from 0:00 to 24:00.

(2) In multi-shift operations with regular day and night shifts, the start or end of the Sunday and holiday rest period may be brought forward or back by up to six hours if operations are suspended for the 24 hours following the start of the rest period.

(3) For drivers and co-drivers, the start of the 24-hour Sunday and holiday rest period may be brought forward by up to two hours.

§ 10 Sunday and holiday employment

(1) If the work cannot be performed on working days, employees may be employed on Sundays and public holidays in derogation of § 9

1. in emergency and rescue services as well as in the fire department,

Annex_3_BMWK_PublicParticipation_Deadline_KSG_GER_Redacted.pdf

Annex_4_EC-1367-2006_Art9_ENG.pdf

TITLE III

PUBLIC PARTICIPATION CONCERNING PLANS AND PROGRAMMES RELATING TO THE ENVIRONMENT

Article 9

1. Union institutions and bodies shall provide, through appropriate practical and/or other provisions, early and effective opportunities for the public to participate during the preparation, modification or review of plans or programmes relating to the environment when all options are still open. In particular, where the Commission prepares a proposal for such a plan or programme which is submitted to other Union institutions or bodies for decision, it shall provide for public participation at that preparatory stage.

2. Union institutions and bodies shall identify the public affected or likely to be affected by, or having an interest in, a plan or programme of the type referred to in paragraph 1, taking into account the objectives of this Regulation.

3. Union institutions and bodies shall ensure that the public referred to in paragraph 2 is informed, whether by public notices or other appropriate means, such as electronic media where available, of:

(a) the draft proposal, where available;

(b) the environmental information or assessment relevant to the plan or programme under preparation, where available; and

(c) practical arrangements for participation, including:

(i) the administrative entity from which the relevant information may be obtained,

(ii) the administrative entity to which comments, opinions or questions may be submitted, and

(iii) reasonable time-frames allowing sufficient time for the public to be informed and to prepare and participate effectively in the environmental decision-making process.

4. A time limit of at least eight weeks shall be set for receiving comments. Where meetings or hearings are organised, prior notice of at least four weeks shall be given. Time limits may be shortened in urgent cases or where the public has already had the opportunity to comment on the plan or programme in question.

5. In taking a decision on a plan or programme relating to the environment, Union institutions and bodies shall take due account of the outcome of the public participation. Union institutions and bodies shall inform the public of that plan or programme, including its text, and of the reasons and considerations upon which the decision is based, including information on public participation.

Annex_5_DUH_Statement_KSG_GER_ENG_Redacted.pdf

DUH e.V. is recognized as a non-profit organization. The annual financial statements are subject to voluntary auditing by an independent auditing firm.

Page - 2 - of the statement on the draft KSG.

A deeper legal examination is therefore not necessary. The basic logic and direction of the amendments alone lead to violations of Article 20a of the German Basic Law in conjunction with the Paris Climate Agreement.

Deutsche Umwelthilfe therefore calls on the Federal Cabinet and the members of the German Bundestag to reject this amendment in principle.

We agree to the publication of this statement.

Annex_6_BMUV_Email_GER_ENG_Redacted.pdf

Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection

Regarding your question under point 1: The BMUV involves the Länder, associations and other bodies in all draft environmental legislation (§§ 47 ff. GGO), and did so throughout the last five years (10.5.2017 to 10.5.2022). As a rule, a processing period of four weeks is granted, which can be shortened to two to three weeks, taking into account the scope and complexity of the project. A further shortening of the participation period is possible if there are special reasons for faster processing in individual cases.

Regarding your question under point 3: In all participation procedures, the BMU(V) has involved the business associations concerned (d) and the recognized environmental associations (c). The general public (a) as well as individuals or legal entities (b; e) are not regularly involved. However, the draft bills are published on the BMUV website so that all citizens and companies have the opportunity to comment on the draft. Publication on the Internet is only possible if all other departments and the Federal Chancellery agree. See also Annex 1, House memo "Publication of draft legislation and external comments on the BMU website".

Regarding your question under point 4: The comments received are reviewed by the relevant department. Whether to take up the suggestions contained in a comment is a decision made on a case-by-case basis after careful consideration of all the feedback on the draft bill, including the feedback from the departmental coordination. There is no provision for a backward-looking process of tracking which suggestions from which comments ultimately had an influence on the government draft. However, the comments from the Länder and association hearings are published on the BMUV website if they can be made available in barrier-free form. This gives scientists and interested citizens the opportunity to take this backward look for themselves.

Regarding your question under point 5: The adoption of the findings and recommendations of the Aarhus Convention Compliance Committee (ACCC) in the proceedings ACCC/C/2014/120 against Slovakia was finally confirmed at the 72nd session of the ACCC from 18 to 21 October 2021. This was too late to allow a decision to be taken at the 7th Conference of the Parties to the UNECE Aarhus Convention, which was meeting in parallel. In this respect, the decision under international law on these findings and recommendations is still pending. Accordingly, no consultations have yet taken place within the member states of the European Union or within the German government. For this reason, it is not possible at present to comment on any conclusions that may be drawn for the German legislative system and the involvement of the public.

For the existing system of implementation of Article 8 of the Aarhus Convention, reference is made to the current National Implementation Report of the Aarhus Convention in Germany (2021), which is available as follows: https://www.bmuv.de/fileadmin/Daten_BMU/Download_PDF/Umweltinformation/aarhus_umsetzungsbericht_2021_de_clean_bf.pdf

Please let me know if you feel that your request has not been met. Should you require further information on the procedure or have any other questions, please do not hesitate to contact me.

Remedies

An appeal against this decision may be lodged within one month of notification. The appeal must be lodged with the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection,

Yours sincerely,
on behalf of

signed Dr. Jan Schärlau

Notes on data protection: The personal data you have provided (e.g. name and address) has been or will be processed for the purpose of contacting you and dealing with your request. The legal basis for this is Article 6 (1) (e) of the General Data Protection Regulation in conjunction with Section 3 of the German Federal Data Protection Act. Your data will be stored in accordance with the time limits applicable to the retention of records in the Registry Directive (Registraturrichtlinie), which supplements the Joint Rules of Procedure of the Federal Ministries (GGO). For more information on this and on your rights as a data subject, please refer to the BMUV's data protection statement: www.bmuv.de/datenschutz.

Attachment - Annex 1: House memo "Publication of draft bills and external comments on the BMU website".

Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety

Special House Notice

Unit Z 1 4

Bonn, June 22, 2017

Publication of draft bills and external comments on the BMUB website.

Transparent legislative processes are essential for the acceptance of political decisions. As an open and dialog-oriented ministry, our ministry also serves as a role model when it comes to transparency. For this reason, all draft bills of the BMUB will be published on our website in the future. The same applies to the comments on these draft laws, i.e. those of the Länder, the municipal umbrella organizations, the expert groups and associations involved, as well as other bodies or other persons.

In the future, the following should therefore be observed when working on drafts for laws and legal ordinances:

a. In future, draft bills and ordinances are to be published on the BMUB website when the consultation process with the Länder and associations is initiated. In each individual case, the relevant department shall expressly inform the State Secretary responsible of the intended publication when submitting the bill for the initiation of departmental consultation.

b. As part of the initiation of departmental coordination, the department shall involve the departments concerned in writing in accordance with Section 48 (3) GGO in order to reach agreement with them on the publication of the draft bill and shall obtain the agreement of the Federal Chancellery.

c. The intended publication must be indicated in the letter sent to the Länder and associations for consultation by means of the following text block:

"Please note that the comments you submit will generally be published on our website. This also includes names and other personal data contained in the document. By sending the statement, you agree that the personal data contained in the statement will be published. We ask you to remove from the document any information that you do not agree to be published.

If you object to the publication on the Internet as a whole, the ministry website will merely note that a statement was submitted and who wrote it. Please send us electronically readable documents, if possible as barrier-free PDF documents and as Word files, so that barrier-free access to the documents can be made possible. With the submission, you grant the BMUB the rights of use for any graphics, images, maps and similar material included for publication on the BMUB website for an unlimited period of time."

d. Simultaneously with the initiation of the Länder and association hearings, the specialist unit sends the draft bill in PDF and Word format, together with a brief description of the content and context of the legislative process, to the Public Relations Department (ÖA unit) for publication and informs it of the comment period.

When documents intended for publication are sent to the Public Relations Department, care must be taken to ensure that they are barrier-free. If required, the Public Relations Department will prepare barrier-free documents. For this purpose, all documents must also be submitted in Word format. If the documents contain graphics, images, maps and similar material, proof of the right to use these for publication on the BMUB website for an unlimited period of time must be provided. All documents must be submitted in electronically readable form (scanned versions in particular are not suitable).

e. After the deadline for comments has expired, the department electronically sends the comments approved for publication in PDF format to the Public Relations Department and encloses a list of all parties who have submitted comments. The file names of the comments must allow clear assignment to the list of participants.

f. After the government draft has been passed, the BMUB website refers to the Documentation and Information System of the German Bundestag (DIP) for the further parliamentary procedure.

signed Flasbarth

Annex_7_PowerOfAttorney_Klinger_GER_ENG_Redacted.pdf

Power of attorney

In the matter of the complaint to the Aarhus Compliance Committee concerning participation rights under Art. 8 of the Convention, the attorneys at law Dr. Reiner Geulen, Prof. Dr. Remo Klinger & Dr. Caroline Douhaire LL.M., Karoline Borwieck, David Krebs and Lukas Rhiel are hereby granted power of attorney

1. to conduct legal proceedings (inter alia pursuant to Sections 81 et seq. of the German Code of Civil Procedure), including the power to raise and withdraw counterclaims;

2. for representation in other proceedings (especially before administrative authorities) and in out-of-court negotiations of any kind;

3. for representation and defense in criminal cases and cases involving fines (Sections 302, 374 of the Code of Criminal Procedure), including preliminary proceedings, as well as (in the event of absence) for representation pursuant to Section 411 II of the Code of Criminal Procedure and, with express authorization, also pursuant to Sections 233 I, 234 of the Code of Criminal Procedure, for filing criminal and other motions admissible under the Code of Criminal Procedure and motions pursuant to the Act on Compensation for Criminal Prosecution Measures;

4. to establish and terminate contractual relationships and to issue and receive unilateral declarations of intent (e.g. notices of termination) in connection with the matter named above.

The power of attorney shall apply to all instances and shall also extend to ancillary, subsequent and preliminary proceedings of all kinds (e.g. opposition proceedings, hearings in plan approval proceedings, interim relief in administrative proceedings, attachment and temporary injunction, cost assessment, compulsory execution, intervention, forced sale, forced administration and deposit proceedings as well as insolvency proceedings). It also applies retroactively to all procedural actions taken, including the filing of a lawsuit.

It includes, in particular, the authority to effect and receive service, to transfer the power of attorney in whole or in part to others (sub-power of attorney), to file, withdraw or waive appeals, to settle the legal dispute or out-of-court negotiations by way of settlement, waiver or acknowledgement, to accept money, valuables and documents, in particular also the subject matter of the dispute and the amounts to be reimbursed by the opposing party, the court cashier or other bodies, and to inspect files.

Berlin, 06.07.2023 Jürgen Resch .............................................. ......................................................... (date) (signature and name)

.......................................................................................................... (address)


PRE/ACCC/2023/203 Germany


From: Klinger, Prof. Dr. Remo
Sent: Monday, July 10, 2023 11:23 AM
To: ECE-Aarhus-Compliance <[email protected]>
Cc: Jürgen Resch extern; Claudia Wesemann | DUH; Sascha Müller-Kraenner | DUH; Barbara Metz | DUH; Matthias Walter | DUH; Dorothee Saar | DUH
Subject: New Communication

Dear Sir or Madam,

I am enclosing a new complaint that we are filing on behalf of our client, Deutsche Umwelthilfe e.V., directed against the Federal Republic of Germany. We will send the signed original document by mail.

Yours sincerely,
Professor Dr. Remo Klinger

GEULEN & KLINGER Rechtsanwälte
Professor Dr. Remo Klinger

Tel.

www.geulenklinger.com


GEULEN & KLINGER

Rechtsanwälte

Commerzbank AG in Berlin

IBAN BIC USt-ID Nr.

*Partner of the law firm

Communication to the Aarhus Convention Compliance Committee

I. Information on correspondent submitting the communication

1. This communication is submitted by:

Full name of submitting organisation or person(s): Deutsche Umwelthilfe e.V.

Permanent address:

Telephone:

Fax:

Email:

Contact person authorised to represent the organisation in connection with the communication:

Name: Jürgen Resch

Title: Executive Director

Represented by:

Name: Prof. Dr. Remo Klinger

Title: Lawyer

Email:

Telephone:

Fax:

July 10, 2023

Secretary to the Aarhus Convention Compliance Committee

United Nations Economic Commission for Europe

Environment Division

Palais des Nations

CH-1211 Geneva 10

Switzerland

Dr. Reiner Geulen*

Prof. Dr. Remo Klinger*

Dr. Caroline Douhaire LL.M.

Karoline Borwieck

David Krebs

Lukas Rhiel


II. Party concerned

2. Federal Republic of Germany.

III. Facts of the communication

1. Reasons that lead to this communication

3. The communication concerns the procedure for the participation of associations in the context of the draft amendment to the Climate Protection Act (Klimaschutzgesetz – KSG),1 which did not allow for effective public participation and therefore violated article 8 of the Aarhus Convention (AC).

4. The KSG is the main framework legislation mapping out Germany’s route to climate neutrality. The current German government is planning to amend this Act. In its current form, as one of its main operational instruments, the KSG sets yearly emissions targets for different sectors (so-called sectoral targets).2 If one of the sectors fails to meet its yearly emissions target, it is obliged to develop an emergency programme (Sofortprogramm) within three months.3 This is the main tool of the KSG to ensure that Germany reaches its total greenhouse gas reduction targets as constitutionally required.4 The current German government plans to abolish these emergency programmes, which many environmental associations regard as a clear deterioration of the German climate protection law.

5. There is no urgent need for this amending legislation, since a functional and constitutional KSG is currently in force. The political will to change the law was already expressed in the coalition agreement concluded almost two years ago.5 There are no urgent reasons to do so now. Furthermore, the parliamentary procedure for amending the law will not begin until September and, according to current information, the amendment will not come into force until the beginning of 2024.

6. The draft KSG amendment was sent out on Thursday, June 15, 2023 at 5:28 p.m. by the responsible Federal Ministry for Economic Affairs and Climate Protection (Bundesministerium für Wirtschaft und Klimaschutz – BMWK) to initiate the participation of associations pursuant to § 47 of the Joint Rules of Procedure of the Federal Ministries (Gemeinsame Geschäftsordnung der Bundesministerien – GGO). The communicant, Deutsche Umwelthilfe e.V. (DUH), was involved in this way. The BMWK set the response deadline for the associations as Monday, June 19, 2023 at 10:00 a.m. In Germany, the Working Hours Act (Arbeitszeitgesetz – ArbZG) stipulates a five-day week with a maximum of 8 hours of working time per working day.6 Consequently, the deadline set by the BMWK amounted to one working day (Friday) plus a maximum of two hours (Monday, 8:00 a.m. to 10:00 a.m.).

1 Current KSG: https://www.gesetze-im-internet.de/ksg/.
2 Annex 1 to the KSG.
3 § 8 (1) KSG.
4 Article 20a of the Constitution (Grundgesetz – GG); see: Decision by the German Constitutional Court (Bundesverfassungsgericht – BVerfG) of March 24, 2021 – 1 BvR 2656/18.
5 Coalition Agreement “Mehr Fortschritt wagen”, 2021, p. 55.

7. As the sole reason for this short comment period, the BMWK stated that it was “politically imposed.”

8. In fact, the government had put the draft KSG amendment on the agenda for the cabinet meeting on Wednesday, June 21, 2023.7 As outlined above, however, there was no specific reason for the government to do this. Cabinet meetings happen weekly. Parliament will only put the KSG amendment on its agenda after its summer break in September.8

9. In terms of content, the draft KSG amendment changes the operational tools of the KSG comprehensively. The sectoral emergency programmes are to be eliminated. Policy adjustments due to missed emissions targets are to be required after two years and for all sectors collectively, as opposed to after three months and for each specific sector.

10. The draft KSG amendment is complex and consists of numerous clauses that are open to interpretation.9

11. Due to this extremely short comment period, DUH was unable to provide comprehensive legal feedback. The feedback DUH was able to provide was formulaic, of a purely general political nature and very short. An actual legal statement, which would also address errors or at least misleading formulations in the draft law, was not feasible in the short time available. Effective participation was thus impossible. Therefore, the German government violated its obligation under article 8 (a) of the Aarhus Convention to ensure effective public participation by fixing sufficient time-frames.

12. The legislative procedure for the Climate Protection Act is just one example of the significantly too short comment periods set by the German government. The same applies to a number of other legislative projects. One example is the recent amendment to the environmental provisions of the Road Traffic Act. The documents for the hearing were sent on June 15, 2023 at 1:00 p.m., and the deadline for comments was June 16, 2023 at 3:00 p.m., which left only a few hours. Again, this bill did not contain any provisions of special urgency.

13. The short deadlines vary depending on the ministry responsible. Draft laws from the ministry originally responsible for the environment sometimes have longer deadlines. However, the Federal Ministry for the Economy and Climate is responsible for the climate protection law relevant here. The Road Traffic Act mentioned as an example falls under the Federal Ministry responsible for transport.

6 §§ 3, 9 ArbZG.
7 https://www.bundesregierung.de/breg-de/bundesregierung/bundeskanzleramt/kabinettssitzungen/bundeskabinett-ergebnisse-2197550.
8 https://www.bundestag.de/tagesordnung.
9 Compare the press statement of Minister Habeck: https://www.bmwk.de/Redaktion/DE/Videos/2023/06/230614-pressestatement/video.html.

2. Legal and procedural background in Germany

a. Legislative process

14. In Germany, laws are adopted by the parliament. They can be introduced either from the “midst of parliament,” by the Federal Council (Bundesrat) or by the government.10 In the case of governmental introduction, the government first submits the draft to the Bundesrat and, after receiving its comments, submits it to parliament. Ordinarily, parliament refers the draft to the respective committee and adopts it after three readings.11

15. The government initiates this legislative process after its cabinet has agreed upon the draft legislation. Only after this cabinet meeting is the draft legislation sent to the Bundesrat and parliament.

16. By not acting to put the draft KSG amendment onto the parliamentary agenda before its summer break, the government underlined that the KSG amendment is not currently considered a priority and, in particular, that its adoption is not time-critical.

b. Procedural practice

17. With regard to public participation, the Joint Rules of Procedure of the Federal Ministries (Gemeinsame Geschäftsordnung der Bundesministerien – GGO) lay out the ground rules. The GGO stipulates in § 47:

„§ 47 Beteiligung von Ländern, kommunalen Spitzenverbänden, Fachkreisen und Verbänden

(1) Der Entwurf einer Gesetzesvorlage ist Ländern, kommunalen Spitzenverbänden und den Vertretungen der Länder beim Bund möglichst frühzeitig zuzuleiten, wenn ihre Belange berührt sind. Ist in wesentlichen Punkten mit der abweichenden Meinung eines beteiligten Bundesministeriums zu rechnen, hat die Zuleitung nur im Einvernehmen mit diesem zu erfolgen. Soll das Vorhaben vertraulich behandelt werden, ist dies zu vermerken.

(2) Das Bundeskanzleramt ist über die Beteiligung zu unterrichten. Bei Gesetzentwürfen von besonderer politischer Bedeutung muss seine Zustimmung eingeholt werden.

(3) Für eine rechtzeitige Beteiligung von Zentral- und Gesamtverbänden sowie von Fachkreisen, die auf Bundesebene bestehen, gelten die Absätze 1 und 2 entsprechend. Zeitpunkt, Umfang und Auswahl bleiben, soweit keine Sondervorschriften bestehen, dem Ermessen des federführenden Bundesministeriums überlassen. Die Beteiligung nach Absatz 1 soll der Beteiligung nach diesem Absatz und der Unterrichtung nach § 48 Absatz 1 vorangehen.

(4) Bei der Beteiligung nach den Absätzen 1 und 3 ist ausdrücklich darauf hinzuweisen, dass es sich um einen Gesetzentwurf handelt, der von der Bundesregierung noch nicht beschlossen worden ist. Dem Gesetzentwurf können die Begründung und das Vorblatt beigefügt werden.

(5) Wird zu einer Gesetzesvorlage eine mündliche Anhörung durchgeführt, sind hierzu die kommunalen Spitzenverbände einzuladen, wenn ihre Belange berührt sind. Diesen soll bei der Anhörung vor den Zentral- und Gesamtverbänden sowie den Fachkreisen das Wort gewährt werden.“

10 Article 76 para 1 GG.
11 §§ 75 ff. of the Rules of Procedure of the German Bundestag (Geschäftsordnung des Deutschen Bundestages – GOBT).

English translation (by Deepl.com):

“§ 47 Consultation of Länder, municipal umbrella organisations, expert groups and associations

(1) Draft legislation shall be submitted to the Länder, the central associations of the local authorities and the Länder representations to the Federation as early as possible if their interests are affected. If the opinion of a participating Federal Ministry is likely to differ on essential points, the legislation shall be forwarded only in agreement with that Ministry. If the project is to be treated confidentially, this shall be noted.

(2) The Federal Chancellery shall be informed of the participation. In the case of draft legislation of particular political importance, its consent must be obtained.

(3) Paragraphs 1 and 2 shall apply correspondingly to the timely participation of central and general associations as well as of expert groups existing at federal level. The timing, scope and selection shall be left to the discretion of the lead Federal Ministry, unless special provisions exist. Participation under paragraph 1 shall precede participation under this paragraph and information under section 48(1).

(4) When participating in accordance with subsections (1) and (3), explicit reference shall be made to the fact that the bill in question has not yet been passed by the Federal Government. The bill may be accompanied by the explanatory memorandum and the preliminary sheet.

(5) If an oral hearing is held on a bill, the municipal umbrella organisations shall be invited to attend if their interests are affected. They shall be granted the right to speak at the hearing before the central and general associations as well as the expert groups.”

18. Hence, § 47 (3) GGO determines that draft legislation is to be submitted to relevant central and general associations as well as expert groups as early as possible. Accordingly, this public participation takes place after the participation of the Länder and their representations and associations but before the cabinet meeting.12

12 See § 51 GGO.


19. In response to a question about how the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (Bundesministerium für Umwelt, Naturschutz, nukleare Sicherheit und Verbraucherschutz – BMUV) practices public participation, the BMUV stated in 2022 that, as a rule, a comment period of four weeks is provided for public participation regarding draft environmental legislation. Considering the scope and complexity of the respective draft legislation, this may be shortened to three or two weeks according to the BMUV. A further curtailment of the participation period is possible, according to the BMUV, if there are exceptional reasons for faster processing in individual cases. This exemption clause corresponds to article 9 (4) of Regulation (EC) No 1367/2006, which implements the Aarhus Convention.

20. Consequently, although there is no strict time requirement, both the EU and the German institutions take the stance that effective public participation can only be ensured if the comment period covers several weeks. Furthermore, both maintain that reduced comment periods should only be possible in individual cases and if exceptional reasons exist.

3. Public participation for the draft KSG amendment

21. As presented above, the comment period given to associations with regard to the draft KSG amendment was one working day plus two hours.

22. The sole reason given by the responsible BMWK was that this short time period was “politically imposed.” What this means in concrete terms was not explained.

23. The only possible justification that the communicant was able to identify was that the government had placed this draft on the agenda of the next cabinet meeting. However, this does not imply that it was politically necessary to accelerate the procedure. The actual legislative procedure does not begin until the draft legislation is introduced into the Bundestag. There, however, this draft will not be on the agenda until after the summer break in September. The period between sending the draft KSG amendment to the associations and the next possible Bundestag session where this draft could be on the agenda is therefore two full months, during which cabinet meetings take place weekly. Furthermore, as detailed above, the government has planned to amend the KSG since it formed its coalition almost two years ago; there is no particular reason why it is adopting the draft now, and rushing associations in the process.

24. Moreover, there is currently a functioning KSG in place, which further undermines any alleged urgency.

4. Infringement of AC article 8

25. The Aarhus Convention maintains that public participation in decision-making is not merely a matter of good practice but one of urgency, given ever more complex subject matters. Decision-making requires accurate, comprehensive and up-to-date information, of which the public can be a major source. Effective public participation therefore not only enables people to enjoy their rights but also improves the ability of authorities to carry out their responsibilities. For the public to become an effective and useful part of the decision-making of public authorities, participation requires an open, regular, and transparent process.13

26. The Implementation Guide to the AC emphasizes that “the public input should be capable of having a tangible influence on the actual content of the decision. When such influence can be seen in the final decision, it is evident that the public authority has taken due account of public input.”14

27. Hence, for public participation to be effective, it should be conducted in a way that allows the public authorities to be influenced by the public input.

28. Article 8 of the AC standardises the procedure with regard to the preparation of laws and rules with potential environmental impact.

29. The public participation procedure regarding the draft KSG amendment directly infringes AC article 8 (a), which provides that time-frames for public participation should be “sufficient for effective participation.”

a. Applicability of article 8 to the preparation of legislation

30. AC article 8 refers to public participation “during the preparation by public authorities of executive regulations and/or other generally applicable legally binding rules that may have a significant effect on the environment.”

31. In 2021, the civic association VIA IURIS brought a communication against Slovakia before the Aarhus Convention Compliance Committee (hereinafter: the Committee). There, the Committee found that “preparation of legislation by executive bodies to be adopted by national parliaments” is included in the provision of Art. 8 AC.15 According to the Committee, “nothing in the title or text of article 8 of the Convention [suggests] that it does not include the preparation of legislation by executive bodies to be adopted by national parliaments. On the contrary, although the terms “legislation” and “laws” do not appear in the provision, the wording of article 8 and the ordinary meaning given to its terms nevertheless support the inclusion of legislation and other normative instruments of a similar character.”16 The Committee understands the text of article 8 as a “generic expression intended to cover different kinds of generally applicable legally binding normative instruments, which may be referred to in different ways in different jurisdictions.”17 Furthermore, since AC article 8 specifically adds the term “generally applicable legally binding normative instruments” and does not limit its scope to “executive regulations”, it is also applicable to regulations other than those by the executive branch.18

13 Implementation Guide, p. 85.
14 Implementation Guide, p. 86.
15 ECE/MP.PP/C.1/2021/19, para. 95 ff.
16 ECE/MP.PP/C.1/2021/19, para. 95.
17 ECE/MP.PP/C.1/2021/19, para. 96.

32. In its findings in another case, the Committee had already held that article 8 relates “to any normative acts.”19

33. While AC article 8 is thus applicable to legislation, the Convention is more restrictive with regard to the term “public authority”. According to AC article 2 (2), this does not include bodies or institutions acting in a legislative capacity.

34. However, the Committee understands this limitation as strict and precise, meaning that “it only covers activities by the body or institution with the capacity and power to adopt the legislation.”20 This strict understanding is also in line with applying article 2 (2) as uniformly as possible to the Parties, as their legislative processes likely differ.21 Otherwise, Parties could attempt to exclude the application of the AC by expanding their preparatory processes with several public authorities involved and with no transparency or public participation.22 Such “comprehensive preparatory procedures are perfectly in line with the Convention,” however, “they must not be used to exclude opportunities for members of the public […] to participate.”23

35. Therefore, public authorities, including governments, “do not act in a legislative capacity when engaged in preparing laws until the draft or proposal is submitted to the body or institution that adopts the legislation.”24

36. The BMWK, a German governmental body, sent out the draft KSG amendment before it was submitted to the Bundestag (as detailed above). Hence, it acted within its executive capacity concerning a “generally applicable legally binding rule.” The public participation in question is, therefore, within the scope of article 8.

b. Violation of article 8 (a)

37. AC article 8 (a) states that Parties should take steps to establish “time-frames sufficient for effective participation.” The Implementation Guide explains that the elements regulated in article 8 set forth a “basic procedural framework.”25

38. While article 8 (a) does not set specific time-frames, “the Convention states that the authorities should plan for public participation by fixing their own schedule that is ‘sufficient’ for effective participation.”26 The Implementation Guide illustrates this with legislation that it considers a typical example of this requirement: the Hungarian Act XI of 1987 on Legislation provides that, when setting comment deadlines, the following four factors are to be taken into account: (1) the person giving the opinion should have the opportunity to form a well-based opinion; (2) the opinion must be able to be taken into consideration in the drafting; (3) the size of the draft; and (4) the type of organization giving the opinion.27 This reiterates the need for time-frames that allow for effective participation and underlines that effective participation is made impossible if the deadline does not allow for comprehensive examination.

18 ECE/MP.PP/C.1/2021/19, para. 97.
19 ECE/MP.PP/C.1/2011/6/Add.1, para. 61.
20 ECE/MP.PP/C.1/2021/19, para. 99.
21 ECE/MP.PP/C.1/2021/19, para. 99.
22 ECE/MP.PP/C.1/2021/19, para. 99.
23 ECE/MP.PP/C.1/2021/19, para. 99.
24 ECE/MP.PP/C.1/2021/19, para. 101.
25 Implementation Guide, p. 185.
26 Implementation Guide, p. 121.

39. Therefore, it can be deduced that the public authorities have a duty of care to guarantee effective participation.28

40. Truly effective participation is only possible in a deliberative process.29 Deliberation fosters a collaborative decision-making process which allows “the reconciliation of strong democracy and demanding environmentalism.”30 This is in line with the purpose of the Convention to guarantee a comprehensive decision-making process regarding environmental matters.

41. Finally, the last sentence of article 8 requires the result of the public participation to be considered as far as possible. In this regard, the Implementation Guide clarifies that “this provision establishes a relatively high burden of proof for public authorities to demonstrate that they have taken into account public comments in processes under article 8.”31

42. While it is difficult to gauge the effectiveness of public participation, it is easier to assess ineffectiveness. It is undoubtedly ineffective if the outcome of the participation process appears to be a “foregone conclusion.” Such a “closed mind” is therefore “in principle unlawful under the Aarhus Convention.”32

43. It is thus for the public authorities in question to prove their open mind and demonstrate their consideration of the results of the public participation procedure.

44. In the case of this communication, the time given for public participation was one working day and two hours. A comprehensive examination of the draft by the public plus the writing of an extensive statement within one working day and two hours is impossible. A statement drafted within this period can be general and broad at best. In addition, the time the government took to consider these statements was two days. A comprehensive examination of the submitted statements and an incorporation of these proposals into the text within two days is also impossible. Hence, a time period as given in the process in question indicates a “closed mind” and does not allow for effective participation. This is especially true in the case of the amendment of a critical operational tool within the main framework legislation for climate protection.

27 Implementation Guide, p. 121.
28 ECE/MP.PP/C.1/2021/19, para. 103; Epiney, in: Epiney/Diezig/Pirker/Reitemeyer, Aarhus Konvention, 1. Auflage 2018, Art. 18 AK, marginal no. 4 f.
29 Barritt, The Foundations of the Aarhus Convention: Environmental Democracy, Rights and Stewardship, 2020, p. 153.
30 Barritt, The Foundations of the Aarhus Convention: Environmental Democracy, Rights and Stewardship, 2020, p. 67 f.
31 Implementation Guide, p. 185.
32 Lee, “The Aarhus Convention 1998 and the Environment Act 2021”, in: The Modern Law Review 86 (3), May 2023, p. 756 (782).

IV. Nature of alleged non-compliance

45. This communication concerns the specific case of non-compliance of the Federal Government in its involvement of associations regarding the draft of the Climate Protection Act Amendment. The German government did not make sufficient efforts to allow the public to effectively participate in the process. It thereby violated article 8 (a) of the Aarhus Convention.

V. Relevant provisions of AC

46. Article 8 (a) of the Aarhus Convention.

VI. Use of domestic remedies

47. Public participation within the German legislative process is regulated by § 47 (3) GGO (see above). The GGO is an administrative regulation that as a rule has no external impact.33 This means that legal action is not available. To legally challenge governmental action that is based on an administrative regulation, the regulation in question needs to have developed external impact. This is the case when there is consistent practice, because of the so-called “self-binding of the administration.” Based on the right to equal treatment under article 3 (1) GG, consistent administrative practice creates external impact and, thus, a possible legal claim.34

48. Even though the German government claims that it regularly sets statement deadlines of several weeks,35 this is not the case in practice. For years, associations have complained about deadlines that do not allow for comprehensive examination of the respective draft legislation.36 There is no indication that public participation regarding environmental legislation is dealt with consistently.

49. Consequently, there is no external impact of § 47 (3) GGO, which means that there is no domestic remedy available to challenge the present violation of article 8 of the Convention.

33 Epping, BeckOK GG, as of 15.05.2023, Art. 65 GG, marginal no. 19.3.
34 Kluckert, Die Selbstbindung der Verwaltung nach Art. 3 I GG, in: JuS 2019, p. 536 (537).
35 See the letter of the BMUV provided as a supporting document.
36 For example: https://www.handelsblatt.com/politik/deutschland/einwanderungsreform-bundesregierung-gibt-verbaenden-eine-woche-mehr-zeit-fuer-stellungnahmen/28996260.html (2023), https://www.dggg.de/presse/pressemitteilungen-und-nachrichten/kurze-fristen-bei-verbaendeanhoerungen (2022), https://www.zfk.de/politik/deutschlan/verbaende-kritisieren-kanzleramt-extrem-kurze-anhoerungs-fristen (2019).


VII. International remedies

50. No international remedies were invoked.

VIII. Confidentiality

51. The communicant's information may be made public.

IX. Supporting documents (copies, not originals)

52. Copies of the following documents are supplied in support of the communication:

• Excerpt from the GGO (in German and English) (paragraphs 6, 17-18, 47-49) [annex 1],
• Excerpt from the ArbZG (in German and English) (paragraph 6) [annex 2],
• Email of the BMWK to the associations with the draft KSG legislation, specifying the comment deadline (in German and English) (paragraphs 6-7, 13, 21-23, 36, 44) [annex 3],
• Article 9 of Regulation (EC) No 1367/2006 (paragraph 19) [annex 4],
• Statement submitted by the communicant (in German and English) (paragraphs 11, 44) [annex 5],
• Reply of the BMUV to a UIG request from June 8, 2022 (in German and English) (paragraphs 19, 48) [annex 6],
• Power of attorney for legal representation (in German and English) (paragraph 1) [annex 7].

X. Signature

Prof. Dr. Remo Klinger

Legal representative


Annex_1_GGO_paras47-51_GER_ENG.pdf


Verantwortung der Bundesministerin oder des Bundesministers für eilige Vorhaben ihres oder seines Geschäftsbereichs wird hierdurch nicht berührt.

§ 46 Rechtssystematische und rechtsförmliche Prüfung

(1) Bevor ein Gesetzentwurf der Bundesregierung zum Beschluss vorgelegt wird, ist er dem Bundesministerium der Justiz zur Prüfung in rechtssystematischer und rechtsförmlicher Hinsicht (Rechtsprüfung) zuzuleiten.

(2) Bei Übersendung des Entwurfs ist darauf Rücksicht zu nehmen, dass dem Bundesministerium der Justiz bei Entwürfen größeren Umfanges genügend Zeit zur Prüfung und Erörterung von Fragen, die bei der Prüfung nach Absatz 1 anfallen, zur Verfügung stehen muss.

(3) Hat das Bundesministerium der Justiz an der Vorbereitung eines Entwurfs mitgewirkt und ihn hierbei schon der Prüfung nach Absatz 1 unterzogen, kann mit seiner Zustimmung von einer nochmaligen Zuleitung des Entwurfs abgesehen werden.

§ 47 Beteiligung von Ländern, kommunalen Spitzenverbänden, Fachkreisen und Verbänden

(1) Der Entwurf einer Gesetzesvorlage ist Ländern, kommunalen Spitzenverbänden und den Vertretungen der Länder beim Bund möglichst frühzeitig zuzuleiten, wenn ihre Belange berührt sind. Ist in wesentlichen Punkten mit der abweichenden Meinung eines beteiligten Bundesministeriums zu rechnen, hat die Zuleitung nur im Einvernehmen mit diesem zu erfolgen. Soll das Vorhaben vertraulich behandelt werden, ist dies zu vermerken.

(2) Das Bundeskanzleramt ist über die Beteiligung zu unterrichten. Bei Gesetzentwürfen von besonderer politischer Bedeutung muss seine Zustimmung eingeholt werden.

(3) Für eine rechtzeitige Beteiligung von Zentral- und Gesamtverbänden sowie von Fachkreisen, die auf Bundesebene bestehen, gelten die Absätze 1 und 2 entsprechend. Zeitpunkt, Umfang und Auswahl bleiben, soweit keine Sondervorschriften bestehen, dem Ermessen des federführenden Bundesministeriums überlassen.


(4) Bei der Beteiligung nach den Absätzen 1 und 3 ist ausdrücklich darauf hinzuweisen, dass es sich um einen Gesetzentwurf handelt, der von der Bundesregierung noch nicht beschlossen worden ist. Dem Gesetzentwurf können die Begründung und das Vorblatt beigefügt werden.

§ 48 Unterrichtung anderer Stellen

(1) Sollen die Presse sowie andere amtlich nicht beteiligte Stellen oder sonstige Personen Gesetzentwürfe aus den Bundesministerien erhalten, bevor die Bundesregierung sie beschlossen hat, bestimmt das federführende Bundesministerium, bei grundsätzlicher politischer Bedeutung das Bundeskanzleramt, in welcher Form dies geschehen soll.

(2) Wird ein Gesetzentwurf den Ländern, den beteiligten Fachkreisen oder Verbänden beziehungsweise Dritten im Sinne von Absatz 1 zugeleitet, so ist er den Geschäftsstellen der Fraktionen des Deutschen Bundestages, dem Bundesrat und auf Wunsch Mitgliedern des Deutschen Bundestages und des Bundesrates zur Kenntnis zu geben.

(3) Über die Einstellung des Gesetzentwurfs in das Intranet der Bundesregierung oder in das Internet entscheidet das federführende Bundesministerium im Einvernehmen mit dem Bundeskanzleramt und im Benehmen mit den übrigen beteiligten Bundesministerien.

(4) Bei der Unterrichtung nach Absatz 1 bis 3 gilt § 47 Absatz 4 entsprechend.

§ 49 Kennzeichnung und Übersendung der Entwürfe

(1) Gesetzentwürfe sind mit dem Datum und dem Zusatz „Entwurf“ zu versehen. Änderungen gegenüber dem jeweils vorangegangenen Entwurf sind kenntlich zu machen.

(2) Bei der Übersendung ist darzulegen, ob es sich um ein Gesetzgebungsvorhaben handelt, das der Zustimmung des Bundesrates bedarf.

§ 50 Frist zur abschließenden Prüfung

Die Frist zur abschließenden Prüfung des Gesetzentwurfs durch die nach den §§ 44, 45 und 46 Beteiligten beträgt in der Regel vier Wochen. Sie kann verkürzt werden, wenn alle Beteiligten zustimmen. Bei umfangreichen oder rechtlich schwierigen Entwürfen verlängert sich die Frist auf acht Wochen, wenn dies von einem Ressort im Rahmen der Beteiligung nach § 45 beantragt wird.

Abschnitt 4 Behandlung von Gesetzentwürfen durch die Bundesregierung

§ 51 Vorlage an das Kabinett

Werden Gesetzesvorlagen nach Abschnitt 3 der Bundesregierung zum Beschluss vorgelegt, ist im Anschreiben zur Kabinettvorlage unbeschadet des § 22 anzugeben,

1. ob die Zustimmung des Bundesrates erforderlich ist,

2. dass das Bundesministerium der Justiz die Prüfung nach § 46 Absatz 1 bestätigt hat,

3. dass die Anforderungen nach § 44 erfüllt sind,

4. welche abweichenden Meinungen aufgrund der Beteiligungen nach den §§ 45 und 47 bestehen,

5. mit welchen Kosten die Ausführung des Gesetzes Bund, Länder oder Kommunen belastet und ob das Bundesministerium der Finanzen und die in den §§ 44, 45 genannten Stellen ihr Einverständnis erklärt haben,

6. ob der Nationale Normenkontrollrat nach § 45 Absatz 2 zu dem Gesetzentwurf Stellung genommen hat und ob hierzu der Entwurf einer Stellungnahme der Bundesregierung vorliegt,

7. inwieweit im Falle der Umsetzung einer Richtlinie oder sonstiger Rechtsakte der Europäischen Union über deren Vorgaben hinaus weitere Regelungen getroffen werden,

8. ob die Vorlage ausnahmsweise besonders eilbedürftig ist (Artikel 76 Absatz 2 Satz 4 Grundgesetz).

§ 52 Einheitliches Vertreten der Gesetzesvorlagen; Formulierungshilfe für den Deutschen Bundestag und den Bundesrat

(1) Die von der Bundesregierung beschlossenen Gesetzesvorlagen sind vor dem Deutschen Bundestag und dem Bundesrat einheitlich zu vertreten, auch wenn einzelne Bundesministerien eine andere Auffassung hatten.

This shall not affect the responsibility of the Federal Minister for urgent projects in his or her portfolio.

§ 46 Legal system and legal form examination

(1) Before a bill is submitted to the Federal Government for adoption, it shall be forwarded to the Federal Ministry of Justice for examination of its legal systematics and legal form (legal examination).

(2) When the draft is sent, account shall be taken of the fact that, for larger drafts, the Federal Ministry of Justice must have sufficient time to examine and discuss the issues arising during the examination under paragraph 1.

(3) If the Federal Ministry of Justice has participated in the preparation of a draft and has already carried out the examination under paragraph 1 in the process, the draft need not be forwarded again, provided the Ministry consents.

§ 47 Participation of Länder, central associations of local authorities, expert groups and associations

(1) The draft of a bill shall be forwarded to the Länder, the central associations of local authorities and the representations of the Länder to the Federation as early as possible if their interests are affected. If the opinion of a federal ministry involved is likely to differ on essential points, the bill shall be forwarded only in agreement with that ministry. If the project is to be treated confidentially, this must be noted.

(2) The Federal Chancellery shall be informed of the participation. In the case of draft legislation of particular political significance, its consent must be obtained.

(3) Paragraphs 1 and 2 shall apply mutatis mutandis to the timely participation of central and general associations and of expert groups existing at the federal level. The timing, scope and selection shall be left to the discretion of the lead Federal Ministry, unless special provisions exist.

(4) In the course of participation under paragraphs 1 and 3, it shall be expressly pointed out that the bill has not yet been adopted by the Federal Government. The explanatory memorandum and the preliminary sheet (Vorblatt) may be attached to the bill.

§ 48 Informing other bodies

(1) If the press, other bodies not officially involved, or other persons are to receive draft bills from the Federal Ministries before the Federal Government has adopted them, the lead Federal Ministry or, in matters of fundamental political importance, the Federal Chancellery shall determine the form in which this is to be done.

(2) If a bill is forwarded to the Länder, to the expert groups or associations involved, or to third parties within the meaning of paragraph 1, it shall be brought to the attention of the offices of the parliamentary groups of the German Bundestag, of the Bundesrat and, on request, of members of the German Bundestag and of the Bundesrat.

(3) The lead Federal Ministry shall decide on the posting of the bill on the intranet of the Federal Government or on the Internet in agreement with the Federal Chancellery and in consultation with the other Federal Ministries involved.

(4) § 47 paragraph 4 shall apply mutatis mutandis to information provided under paragraphs 1 to 3.

§ 49 Marking and sending the drafts

(1) Draft bills shall be marked with the date and the addition "Draft". Amendments to the previous draft shall be indicated.

(2) When sending the bill, it must be stated whether it is a legislative project requiring the consent of the Bundesrat.

§ 50 Deadline for the final examination

The period for the final examination of the bill by the parties involved under §§ 44, 45 and 46 is generally four weeks. It may be shortened if all parties involved agree. In the case of extensive or legally difficult drafts, the period is extended to eight weeks if a ministry requests this in the course of its participation under § 45.

Chapter 4 Treatment of bills by the Federal Government

§ 51 Submission to the Cabinet

If bills under Chapter 3 are submitted to the Federal Government for adoption, the cover letter to the Cabinet submission shall state, without prejudice to § 22,

1. whether the consent of the Bundesrat is required,

2. that the Federal Ministry of Justice has confirmed the examination pursuant to § 46 paragraph 1,

3. that the requirements of § 44 are fulfilled,

4. which divergent opinions exist as a result of the participation under §§ 45 and 47,

5. what costs the implementation of the Act will impose on the Federation, the Länder or local authorities, and whether the Federal Ministry of Finance and the bodies referred to in §§ 44 and 45 have declared their consent,

6. whether the National Regulatory Control Council (Nationaler Normenkontrollrat) has issued an opinion on the bill pursuant to § 45 paragraph 2 and whether a draft statement by the Federal Government on that opinion is available,

7. the extent to which, where a directive or other legal act of the European Union is being transposed, provisions going beyond its requirements are adopted,

8. whether the submission is, by way of exception, particularly urgent (Article 76 (2) sentence 4 of the Basic Law).

§ 52 Uniform representation of bills; drafting assistance for the German Bundestag and the Bundesrat

(1) Bills adopted by the Federal Government shall be represented uniformly before the German Bundestag and the Bundesrat, even if individual Federal Ministries took a different view.

Annex_2_ArbZG_paras3-9_GER_ENG.pdf

A service of the Federal Ministry of Justice and the Federal Office of Justice - www.gesetze-im-internet.de

Working Hours Act (ArbZG)

Date of issue: 06.06.1994

Full citation: "Working Hours Act of June 6, 1994 (BGBl. I p. 1170, 1171), as last amended by Article 6 of the Act of December 22, 2020 (BGBl. I p. 3334)."

Status: Last amended by Art. 6 G v. 22.12.2020 I 3334

Footnote

(+++ Text reference as of: 1.7.1994 +++)
(+++ Implementation of the EGRL 104/93 (CELEXNr: 393L0104) cf. art. 4b G v. 24.12.2003 I 3002 +++)

The Act was adopted by the Bundestag as Article 1 of the Act of 6.6.1994 I 1170 (ArbZRG). It entered into force on July 1, 1994 in accordance with Article 21 sentence 2 of that Act.

Section One General Provisions

§ 1 Purpose of the law

The purpose of this Act is

1. to ensure the safety and health protection of employees in the Federal Republic of Germany and in the exclusive economic zone with regard to the organization of working hours and to improve the framework conditions for flexible working hours, and

2. to protect Sundays and state-recognized public holidays as days of rest from work and of spiritual upliftment for employees.

§ 2 Definitions

(1) For the purposes of this Act, working time is the time from the beginning to the end of work, excluding rest breaks; periods of work for several employers are to be added together. In underground mining, rest breaks count as working time.

(2) Employees within the meaning of this Act are blue- and white-collar workers and those employed for their vocational training.

(3) For the purposes of this Act, night time means the time from 11 p.m. to 6 a.m., and in bakeries and confectioneries the time from 10 p.m. to 5 a.m.

(4) For the purposes of this Act, night work is any work that involves more than two hours of night time.

(5) Night workers within the meaning of this Act are employees who

1. are normally required to work night shifts due to their work schedule, or

2. perform night work on at least 48 days in a calendar year.

Section Two Working hours and non-working hours per working day

§ 3 Working time of employees

The working time of employees per working day may not exceed eight hours. It may be extended to up to ten hours only if an average of eight hours per working day is not exceeded within six calendar months or within 24 weeks.
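By way of illustration only, and not as part of the Act, the two limits in § 3 can be sketched as a simple check. The function name and the input format are assumptions made for this sketch: the list holds one entry per working day of a single, already selected reference period (six calendar months or 24 weeks), with 0 entered for working days on which no work was done.

```python
from statistics import mean

DAILY_LIMIT_HOURS = 8.0      # § 3 sentence 1: regular limit per working day
EXTENDED_LIMIT_HOURS = 10.0  # § 3 sentence 2: absolute limit when extended


def complies_with_section_3(hours_per_working_day):
    """Check one reference period against the § 3 limits.

    hours_per_working_day: hypothetical input, one number per working day
    in the reference period (0 for working days without work).
    """
    if not hours_per_working_day:
        return True  # nothing worked, nothing to check
    # No single working day may exceed ten hours.
    if max(hours_per_working_day) > EXTENDED_LIMIT_HOURS:
        return False
    # Extended days are only allowed if the average stays at eight hours or less.
    return mean(hours_per_working_day) <= DAILY_LIMIT_HOURS


print(complies_with_section_3([10, 6, 8]))   # True: average is exactly 8
print(complies_with_section_3([10, 10, 6]))  # False: average exceeds 8
```

The sketch does not choose between the two alternative reference periods and ignores the deviations that § 7 allows.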

§ 4 Rest breaks

Work shall be interrupted by rest breaks fixed in advance, totalling at least 30 minutes if the working time is more than six and up to nine hours, and at least 45 minutes if the working time is more than nine hours. The rest breaks in accordance with sentence 1 may be divided into periods of at least 15 minutes each. Employees may not be employed for longer than six consecutive hours without a rest break.
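The break thresholds in § 4 can be illustrated with a short sketch, which is not part of the Act. The function names and the representation of breaks as a list of minutes are assumptions made for this example. It covers only the minimum total and the 15-minute minimum per break, not the rule that no more than six consecutive hours may be worked without a break.

```python
MIN_SEGMENT_MINUTES = 15  # § 4 sentence 2: each break period at least 15 minutes


def required_break_minutes(working_hours):
    """Minimum total rest break for a given working time in hours (§ 4 sentence 1)."""
    if working_hours > 9:
        return 45
    if working_hours > 6:
        return 30
    return 0


def breaks_sufficient(working_hours, break_segments_minutes):
    """Check whether the breaks taken reach the required total.

    break_segments_minutes: hypothetical input, length of each break in minutes.
    Segments shorter than 15 minutes are not counted towards the total.
    """
    countable = sum(m for m in break_segments_minutes if m >= MIN_SEGMENT_MINUTES)
    return countable >= required_break_minutes(working_hours)


print(required_break_minutes(9))            # 30
print(required_break_minutes(9.5))          # 45
print(breaks_sufficient(8, [15, 15]))       # True
print(breaks_sufficient(8, [10, 10, 10]))   # False: no segment reaches 15 minutes
```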

§ 5 Rest period

(1) Employees must have an uninterrupted rest period of at least eleven hours after the end of the daily work period.

(2) The duration of the rest period of paragraph 1 may be reduced by up to one hour in hospitals and other facilities for the treatment, care and supervision of persons, in restaurants and other facilities for catering and lodging, in transport companies, in broadcasting as well as in agriculture and animal husbandry, if each reduction of the rest period is compensated for within one calendar month or within four weeks by extending another rest period to at least twelve hours.

(3) By way of derogation from paragraph 1, in hospitals and other facilities for the treatment, nursing and care of persons, reductions of the rest period caused by call-outs during on-call duty that do not amount to more than half of the rest period may be compensated for at other times.

(4) (omitted)

§ 6 Night and shift work

(1) The working hours of night and shift workers shall be determined in accordance with established ergonomic findings on the humane organization of work.

(2) The working hours of night workers may not exceed eight hours per working day. It may be extended to up to ten hours only if, by way of derogation from § 3, an average of eight hours per working day is not exceeded within one calendar month or within four weeks. For periods in which night workers within the meaning of Section 2 (5) No. 2 are not required to perform night work, Section 3 Sentence 2 shall apply.

(3) Night workers are entitled to an occupational health examination prior to commencement of employment and at regular intervals of not less than three years thereafter. After reaching the age of 50, night workers are entitled to this examination at intervals of one year. The costs of the examinations shall be borne by the employer, unless he offers the examinations to the night workers free of charge through a company physician or an inter-company service of company physicians.

(4) The employer shall, at the night worker's request, transfer him to a suitable daytime workplace if

a) according to occupational medical findings, the continued performance of night work endangers the employee's health, or

b) a child under the age of twelve lives in the employee's household and cannot be cared for by another person living in the household, or

c) the employee has to care for a relative in severe need of care who cannot be cared for by another relative living in the household,

unless there are urgent operational requirements to the contrary. If, in the opinion of the employer, urgent operational requirements prevent the transfer of the night worker to a daytime workplace suitable for him, the works council or personnel council must be consulted. The works council or personnel council may submit proposals to the employer for a transfer.

A service of the Federal Ministry of Justice and the Federal Office of Justice - www.gesetze-im-internet.de

(5) In the absence of compensation provisions in collective agreements, the employer shall grant the night worker an appropriate number of paid days off for the hours worked during night time or an appropriate supplement to the gross remuneration to which he is entitled for this purpose.

(6) It shall be ensured that night workers have the same access to in-company training and career advancement measures as other employees.

§ 7 Deviating regulations

(1) The following may be permitted in a collective agreement, or in a works or service agreement based on a collective agreement:

1. in deviation from § 3,

a) to extend the working time beyond ten hours per working day if the working time regularly and to a considerable extent includes standby duty or on-call duty,

b) to determine a different compensation period,

c) (omitted)

2. in deviation from § 4 sentence 2, to divide the total duration of rest breaks in shift work and transport operations into short breaks of appropriate duration,

3. in deviation from § 5 para. 1, to reduce the rest period by up to two hours if the nature of the work so requires and the reduction in the rest period is compensated for within a compensation period to be specified,

4. in deviation from § 6 para. 2,

a) to extend the working time beyond ten hours per working day if the working time regularly and to a considerable extent includes standby duty or on-call duty,

b) to determine a different compensation period,

5. to set the beginning of the seven-hour night period of § 2 para. 3 at a time between 10 p.m. and midnight.

(2) Provided that the health protection of the employees is ensured by corresponding compensatory time off, a collective agreement or, on the basis of a collective agreement, a works or service agreement may further permit the following:

1. in deviation from § 5 para. 1, to adjust the rest periods during on-call duty to the special features of this duty, in particular to compensate at other times for reductions in the rest period resulting from call-outs during this duty,

2. to adapt the regulations of §§ 3, 5 para. 1 and § 6 para. 2 in agriculture to the tilling and harvesting season as well as to the weather conditions,

3. to adapt the regulations of §§ 3, 4, 5 para. 1 and § 6 para. 2 in the treatment, care and support of persons in accordance with the nature of this activity and the welfare of these persons,

4. to adapt the provisions of Sections 3, 4, 5 (1) and 6 (2) in the case of administrations and establishments of the Federal Government, the Länder, the municipalities and other corporations, institutions and foundations under public law, as well as in the case of other employers who are subject to a collective bargaining agreement applicable to the public service or to a collective bargaining agreement with substantially the same content, to the specific nature of the work performed at such establishments.

(2a) By way of derogation from Sections 3, 5(1) and 6(2), a collective agreement or a works or service agreement based on a collective agreement may permit the working day to be extended beyond eight hours without compensation if the working time regularly and to a considerable extent includes standby duty or on-call duty and special arrangements are made to ensure that the health of the employees is not endangered.

(3) Within the scope of application of a collective agreement pursuant to subsections 1, 2 or 2a, deviating collective agreement provisions may be adopted in the establishment of an employer who is not bound by a collective agreement by means of a works or service agreement or, if there is no works or staff council, by written agreement between the employer and the employee. If, on the basis of such a collective agreement, deviating provisions can be made in a works or service agreement, use may also be made of this in establishments of an employer not bound by the collective agreement. A deviating collective agreement provision adopted in accordance with subsection 2 no. 4 shall apply between employers and employees who are not bound by collective agreements if the application of the collective agreement provisions applicable to the public service has been agreed between them and the employers predominantly cover the costs of the operation with grants within the meaning of budgetary law.

(4) The churches and the religious societies under public law may provide for the deviations referred to in paragraphs 1, 2 or 2a in their regulations.

(5) In an area in which regulations are not normally made by collective agreement, exceptions within the scope of paragraphs 1, 2 or 2a may be granted by the supervisory authority if this is necessary for operational reasons and the health of the employees is not endangered.

(6) The Federal Government may, by ordinance with the consent of the Bundesrat, permit exceptions within the scope of paragraph 1 or 2, provided that this is necessary for operational reasons and the health of employees is not endangered.

(7) On the basis of a regulation pursuant to subsection 2a or subsections 3 to 5, in each case in conjunction with subsection 2a, the working time may only be extended if the employee has consented in writing. The employee may revoke the consent in writing with six months' notice. The employer may not discriminate against an employee because the employee has not declared his consent to the extension of working hours or has revoked such consent.

(8) If regulations pursuant to paragraph 1 nos. 1 and 4, paragraph 2 nos. 2 to 4 or such regulations are approved on the basis of paragraphs 3 and 4, the working time may not exceed 48 hours per week on average over twelve calendar months. If the approval is based on paragraph 5, the working time shall not exceed 48 hours per week on average over six calendar months or 24 weeks.
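The averaging rule in paragraph 8 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name, the `limit` parameter and the representation of the reference period as a list of weekly totals are hypothetical and not taken from the Act.

```python
def weekly_average_ok(weekly_hours, limit=48.0):
    """Return True if the average weekly working time stays within the limit.

    weekly_hours: total hours worked in each week of the reference period
    (for example 24 weeks); hypothetical input format.
    """
    if not weekly_hours:
        return True  # empty period, nothing to average
    average = sum(weekly_hours) / len(weekly_hours)
    return average <= limit


# 12 weeks at 54 hours followed by 12 weeks at 42 hours average to 48 hours.
period = [54] * 12 + [42] * 12
print(weekly_average_ok(period))
```

Individual weeks may therefore exceed 48 hours as long as shorter weeks within the same reference period bring the average back to the limit.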

(9) If the working day is extended beyond twelve hours, a rest period of at least eleven hours must be granted immediately following the end of the working time.

§ 8 Dangerous work

The Federal Government may, by ordinance and with the consent of the Bundesrat, restrict working hours beyond Section 3 for individual areas of employment, for certain types of work or for certain groups of employees where particular hazards to the health of employees are to be expected, extend the rest breaks and rest periods beyond §§ 4 and 5, extend the regulations for the protection of night and shift workers in § 6 and limit the possibilities for deviation in accordance with § 7, insofar as this is necessary to protect the health of the employees. Sentence 1 shall not apply to areas of employment and work in establishments subject to mining supervision.

Third section
Sunday and holiday rest

§ 9 Sunday and holiday rest

(1) Employees may not be employed on Sundays and public holidays from 0:00 to 24:00.

(2) In multi-shift operations with regular day and night shifts, the start or end of the Sunday and holiday rest period may be brought forward or back by up to six hours if operations are suspended for the 24 hours following the start of the rest period.

(3) For drivers and co-drivers, the start of the 24-hour Sunday and holiday rest period may be brought forward by up to two hours.

§ 10 Sunday and holiday employment

(1) If the work cannot be performed on working days, employees may be employed on Sundays and public holidays in derogation of § 9

1. in emergency and rescue services as well as in the fire department,

Annex_3_BMWK_PublicParticipation_Deadline_KSG_GER_Redacted.pdf

Annex_4_EC-1367-2006_Art9_ENG.pdf

TITLE III

PUBLIC PARTICIPATION CONCERNING PLANS AND PROGRAMMES RELATING TO THE ENVIRONMENT

Article 9

1. ►M1 Union ◄ institutions and bodies shall provide, through appropriate practical and/or other provisions, early and effective opportunities for the public to participate during the preparation, modification or review of plans or programmes relating to the environment when all options are still open. In particular, where the Commission prepares a proposal for such a plan or programme which is submitted to other ►M1 Union ◄ institutions or bodies for decision, it shall provide for public participation at that preparatory stage.

2. ►M1 Union ◄ institutions and bodies shall identify the public affected or likely to be affected by, or having an interest in, a plan or programme of the type referred to in paragraph 1, taking into account the objectives of this Regulation.

3. ►M1 Union ◄ institutions and bodies shall ensure that the public referred to in paragraph 2 is informed, whether by public notices or other appropriate means, such as electronic media where available, of:

(a) the draft proposal, where available;

(b) the environmental information or assessment relevant to the plan or programme under preparation, where available; and

(c) practical arrangements for participation, including:

(i) the administrative entity from which the relevant information may be obtained,

(ii) the administrative entity to which comments, opinions or questions may be submitted, and

(iii) reasonable time-frames allowing sufficient time for the public to be informed and to prepare and participate effectively in the environmental decision-making process.

4. A time limit of at least eight weeks shall be set for receiving comments. Where meetings or hearings are organised, prior notice of at least four weeks shall be given. Time limits may be shortened in urgent cases or where the public has already had the opportunity to comment on the plan or programme in question.

5. In taking a decision on a plan or programme relating to the environment, ►M1 Union ◄ institutions and bodies shall take due account of the outcome of the public participation. ►M1 Union ◄ institutions and bodies shall inform the public of that plan or programme, including its text, and of the reasons and considerations upon which the decision is based, including information on public participation.

Annex_5_DUH_Statement_KSG_GER_ENG_Redacted.pdf

Seite - 2 - der Stellungnahme zum Entwurf des KSG

Eine tiefere juristische Prüfung ist deshalb nicht notwendig. Schon die grundsätzliche Logik und Ausrichtung der Änderungen führt zu Verstößen gegen Artikel 20a GG in Verbindung mit dem Pariser Klimaschutzabkommen.

Die Deutsche Umwelthilfe fordert deshalb das Bundeskabinett sowie die Abgeordneten des Deutschen Bundestags auf, diese Gesetzesnovellierung grundsätzlich abzulehnen.

Mit einer Veröffentlichung dieser Stellungnahme erklären wir uns einverstanden.

DUH e.V. is recognized as a non-profit organization. The annual financial statements are subject to voluntary auditing by an independent auditing firm.

Page - 2 - of the statement on the draft KSG.

A deeper legal examination is therefore not necessary. The basic logic and direction of the amendments alone lead to violations of Article 20a of the German Basic Law in conjunction with the Paris Climate Agreement.

Deutsche Umwelthilfe therefore calls on the Federal Cabinet and the members of the German Bundestag to reject this amendment in principle.

We agree to the publication of this statement.

Annex_6_BMUV_Email_GER_ENG_Redacted.pdf

Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection

Regarding your question under point 1: The BMUV involves the Länder, associations and other bodies in all draft environmental legislation (§§ 47 ff. GGO), and has done so throughout the last five years (10.5.2017 to 10.5.2022). As a rule, a processing period of four weeks is granted, which can be shortened to two to three weeks, taking into account the scope and complexity of the project. A further shortening of the participation period is possible if there are special reasons for faster processing in individual cases.

Regarding your question under point 3: In all participation procedures, the BMU(V) has involved the business associations concerned (d) and the recognized environmental associations (c). The general public (a) as well as individuals or legal entities (b; e) are not regularly involved. However, the draft bills are published on the BMUV website so that all citizens and companies have the opportunity to comment on the draft. Publication on the Internet is only possible if all other ministries and the Federal Chancellery agree. See also Annex 1, House memo: "Publication of draft legislation and external comments on the BMU website".

Regarding your question under point 4: The comments received are reviewed by the relevant department. Whether to take up the suggestions contained in a comment is decided on a case-by-case basis after careful consideration of all the feedback on the draft bill, including the feedback from the interdepartmental coordination. There is no provision for retrospectively tracking which suggestions from which comments ultimately influenced the government draft. However, the comments from the Länder and association hearings are published on the BMUV website if they can be made available in barrier-free form. This gives scientists and interested citizens the opportunity to carry out this retrospective review for themselves.

Regarding your question under point 5: The adoption of the findings and recommendations of the Aarhus Convention Compliance Committee (ACCC) in the proceedings ACCC/C/2014/120 against Slovakia was finally confirmed at the 72nd session of the ACCC from 18 to 21 October 2021. This was too late to allow a decision to be taken at the 7th Conference of the Parties to the UN ECE Aarhus Convention, which was meeting in parallel. In this respect, the decision under international law on these findings and recommendations is still pending. Accordingly, no consultations have yet taken place within the member states of the European Union or within the German government. For this reason, it is not possible at present to comment on any conclusions that may be drawn for the German legislative system and the involvement of the public.

For the existing system of implementation of Article 8 of the Aarhus Convention, reference is made to the current National Implementation Report of the Aarhus Convention in Germany (2021), which is available as follows: https://www.bmuv.de/fileadmin/Daten_BMU/Download_PDF/Umweltinformation/aarhus_umsetzungsbericht_2021_de_clean_bf.pdf.

Please let me know if you feel that your request has not been met. Should you require further information on the procedure or have any other questions, please do not hesitate to contact me.

Remedies

An appeal against this decision may be lodged within one month of notification. The appeal must be lodged with the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection,

Yours sincerely
on behalf of

signed Dr. Jan Schärlau

Notes on data protection: The personal data you have provided (e.g. name and address) has been or will be processed for the purpose of contacting you and dealing with your request. The legal basis for this is Article 6(1)(e) of the General Data Protection Regulation in conjunction with Section 3 of the German Federal Data Protection Act. Your data will be stored in accordance with the time limits applicable to the retention of written records (Schriftgut) under the Registry Directive (Registraturrichtlinie), which supplements the Joint Rules of Procedure of the Federal Ministries (GGO). For more information on this and on your rights as a data subject, please refer to the BMUV's data protection statement: www.bmuv.de/datenschutz.

Attachment - Annex 1: House memo: "Publication of draft bills and external comments on the BMU website".

Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety

Special House Notice
Unit Z 1 4
Bonn, June 22, 2017

Publication of draft bills and external comments on the BMUB website.

Transparent legislative processes are essential for the acceptance of political decisions. As an open and dialog-oriented ministry, our ministry also serves as a role model when it comes to transparency. For this reason, all draft bills of the BMUB will be published on our website in the future. The same applies to the comments on these draft laws, i.e. those of the Länder, the municipal umbrella organizations, the expert groups and associations involved, as well as other bodies or other persons.

In the future, the following should therefore be observed when working on drafts for laws and legal ordinances:

a. In future, draft bills and ordinances are to be published on the BMUB website when the consultation process with the Länder and associations is initiated. In each individual case, the relevant department shall expressly inform the State Secretary responsible of the intended publication when submitting the bill for the initiation of departmental consultation.

b. As part of the initiation of departmental coordination, the department shall involve the ministries concerned in writing in accordance with Section 48 (3) GGO in order to reach agreement with them on the publication of the draft bill and shall obtain the agreement of the Federal Chancellery (BK-Amt).

c. The intended publication must be indicated in the letter sent to the Länder and associations for consultation by means of the following text block:

"Please note that the comments you submit will generally be published on our website. This also includes names and other personal data contained in the document. By sending the statement, you agree that the personal data contained in the statement will be published. We ask you to remove from the document any information that you do not agree to be published.

If you object to publication on the Internet as a whole, the ministry's website will merely note that a statement was submitted and who wrote it. Please send us electronically readable documents, if possible as barrier-free PDF documents and as Word files, so that barrier-free access to the documents can be made possible. With the submission, you grant the BMUB the rights of use for any graphics, images, maps and similar material included for publication on the BMUB website for an unlimited period of time."

d. Simultaneously with the initiation of the Länder and association hearings, the specialist unit sends the draft bill in PDF and Word format, together with a brief description of its content and its place in the legislative process, to the Public Relations (ÖA) unit for publication and informs it of the comment period.

When documents intended for publication are sent to the Public Relations Department, care must be taken to ensure that they are barrier-free. If required, the Public Relations Department will prepare barrier-free documents. For this purpose, all documents must also be submitted in Word format. If the documents contain graphics, images, maps and similar material, proof of the right to use these for publication on the BMUB website for an unlimited period of time must be provided. All documents must be submitted in electronically readable form (scanned versions in particular are not suitable).

e. After the deadline for comments has expired, the department electronically sends the comments approved for publication in PDF format to the Public Relations Department and encloses a list of all parties who have submitted comments. The file names of the comments must allow clear assignment to the list of participants.

f. After the government draft has been passed, the BMUB website refers to the Documentation and Information System of the German Bundestag (DIP) for the further parliamentary procedure.

signed Flasbarth

Annex_7_PowerOfAttorney_Klinger_GER_ENG_Redacted.pdf

Vollmacht

Den Rechtsanwälten und Rechtsanwältinnen Dr. Reiner Geulen, Prof. Dr. Remo Klinger & Dr. Caroline Douhaire LL.M., Karoline Borwieck, David Krebs und Lukas Rhiel

wird hiermit in Sachen

Beschwerde Aarhus Compliance Comitee wegen Beteiligungsrechten nach Art. 8 der Konvention

Vollmacht erteilt

1. zur Prozessführung (u.a. nach §§ 81 ff. ZPO) einschließlich der Befugnis zur Erhebung und Zurücknahme von Widerklagen;

2. zur Vertretung in sonstigen Verfahren (insbesondere vor den Verwaltungsbehörden) und bei außergerichtlichen Verhandlungen aller Art;

3. zur Vertretung und Verteidigung in Strafsachen und Bußgeldsachen (§§ 302, 374 StPO) einschließlich der Vorverfahren sowie (für den Fall der Abwesenheit) zur Vertretung nach § 411 II StPO und mit ausdrücklicher Ermächtigung auch nach §§ 233 I, 234 StPO, zur Stellung von Straf- und anderen nach der Strafprozessordnung zulässigen Anträgen und von Anträgen nach dem Gesetz über die Entschädigung für Strafverfolgungsmaßnahmen;

4. zur Begründung und Aufhebung von Vertragsverhältnissen und zur Abgabe und Entgegennahme von einseitigen Willenserklärungen (z. B. Kündigungen) in Zusammenhang mit der oben unter „wegen...“ genannten Angelegenheit;

Die Vollmacht gilt für alle Instanzen und erstreckt sich auch auf Neben-, Folge- und Vorverfahren aller Art (z. B. Widerspruchsverfahren, Erörterungsterminen im Planfeststellungsverfahren, einstweiliger Rechtsschutz in Verwaltungsverfahren, Arrest und einstweilige Verfügung, Kostenfestsetzungs-, Zwangsvollstreckungs-, Interventions-, Zwangsversteigerungs-, Zwangsverwaltungs- und Hinterlegungsverfahren sowie Insolvenzverfahren). Sie gilt auch rückwirkend für alle vorgenommenen Verfahrenshandlungen, einschließlich der Klageerhebung.

Sie umfasst insbesondere die Befugnis, Zustellungen zu bewirken und entgegenzunehmen, die Vollmacht ganz oder teilweise auf andere zu übertragen (Untervollmacht), Rechtsmittel einzulegen, zurückzunehmen oder auf sie zu verzichten, den Rechtsstreit oder außergerichtliche Verhandlungen durch Vergleich, Verzicht oder Anerkenntnis zu erledigen, Geld, Wertsachen und Urkunden, insbesondere auch den Streitgegenstand und die von dem Gegner, von der Justizkasse oder von sonstigen Stellen zu erstattenden Beträge entgegenzunehmen sowie Akteneinsicht zu nehmen.

Berlin, 06.07.2023 Jürgen Resch
(Datum) (Unterschrift und Name)

(Adresse)

Power of attorney

The attorneys at law Dr. Reiner Geulen, Prof. Dr. Remo Klinger & Dr. Caroline Douhaire LL.M., Karoline Borwieck, David Krebs and Lukas Rhiel

are hereby, in the matter of

Complaint to the Aarhus Compliance Committee because of participation rights under Art. 8 of the Convention,

granted power of attorney

1. to conduct legal proceedings (inter alia pursuant to Sections 81 et seq. of the German Code of Civil Procedure), including the power to raise and withdraw counterclaims;

2. for representation in other proceedings (especially before administrative authorities) and in out-of-court negotiations of any kind;

3. for representation and defense in criminal cases and cases involving fines (Sections 302, 374 of the Code of Criminal Procedure), including preliminary proceedings, as well as (in the event of absence) for representation pursuant to Section 411 II of the Code of Criminal Procedure and, with express authorization, also pursuant to Sections 233 I, 234 of the Code of Criminal Procedure, for filing criminal and other motions admissible under the Code of Criminal Procedure and motions pursuant to the Act on Compensation for Criminal Prosecution Measures;

4. to establish and terminate contractual relationships and to issue and receive unilateral declarations of intent (e.g. notices of termination) in connection with the matter referred to above under "because of ...";

The power of attorney shall apply to all instances and shall also extend to ancillary, subsequent and preliminary proceedings of all kinds (e.g. opposition proceedings, hearings in plan approval proceedings, interim relief in administrative proceedings, attachment and temporary injunction, cost assessment, compulsory execution, intervention, forced sale, forced administration and deposit proceedings as well as insolvency proceedings). It also applies retroactively to all procedural actions taken, including the filing of a lawsuit.

It includes, in particular, the authority to effect and receive service, to transfer the power of attorney in whole or in part to others (sub-power of attorney), to file, withdraw or waive appeals, to settle the legal dispute or out-of-court negotiations by way of settlement, waiver or acknowledgement, to accept money, valuables and documents, in particular also the subject matter of the dispute and the amounts to be reimbursed by the opposing party, the court cashier or other bodies, and to inspect files.

Berlin, 06.07.2023 Jürgen Resch
(date) (signature and name)

(address)