Health Data

What is health data?

Health data generally includes all information that provides insight into a person's health status, such as illnesses, treatments, or lifestyle. This data is collected during doctor visits, in hospitals, or through specialized examinations.

The health data at the Health Data Lab (HDL) consists of billing data from all individuals with statutory health insurance in Germany. This includes, for example, diagnoses, prescribed medications, and information about hospital stays. In the future, data from electronic patient records (ePA) will also be added. The transmission of these records is expected to begin gradually in mid-2025.

All health data at the HDL is pseudonymized. It is delivered to the HDL already in pseudonymized form and never contains directly identifying characteristics, such as a person's name or address.

The Dataset Description

Prospective applicants can use our dataset description to learn more about the data available at the Health Data Lab. It is provided in German.

The Public Use File

The HDL provides a Public Use File for prospective applicants. It has the same structure as the HDL data but is fully anonymized and contains no correlations. Researchers can therefore familiarize themselves with the structure of the research data and begin developing their analysis scripts before or during the application process. The file is provided in German.

Currently, the Public Use File is available for Data Models 1 and 2. It can be accessed via this link.

Graphic: Data available in the Health Data Lab, broken down by sector (Outpatient Sector, Inpatient Sector, Health Data Lab Records, Master Data, Other Sectors). The following details are listed for each sector:

Outpatient Sector: Diagnoses (ICD), procedures (OPS), type of treatment: beginning/end, case costs

Inpatient Sector: Treatment and duration, care, ventilation hours, pre- and post-hospital care services and charges, diagnoses (ICD) and procedures (OPS)

Master Data: Year of birth, sex, health insurance company number, postal code of residence

Other Sectors: Data characteristics on remedies and aids, ambulance services, home healthcare, midwife assistance and digital health applications, ePA

The extended dataset for analyses (currently under construction)

Note on expansion of the dataset

The new HDL is currently being established in phases. Within the framework of the German morbidity-oriented risk structure compensation scheme (Morbi-RSA), the Federal Office for Social Security has already transferred data for the reporting years 2009 to 2015 to the former Data Preparation Centre. The HDL will adopt this dataset. A new, slightly expanded dataset with data from the reporting year 2016 will be the first to be transferred directly to the HDL by the National Association of Statutory Health Insurance Funds. New data fields (in accordance with Section 3 (1) DaTraV) will be added gradually in subsequent years. Structural changes will result in a new data body: in calendar year 2024, the entire dataset as specified by law will be transmitted for the first time, covering the reporting year 2023. This will include the services offered by other service providers as well as further information on inpatient care.