Data Usage at the HDL

Purposes of Use at the Health Data Lab (HDL)

Pseudonymized health data cannot be used for just any purpose. The permitted purposes of use, all of which serve the public interest, are defined by law.

How Data Usage Works at the Health Data Lab (HDL)

The use of data at the HDL follows strict legal regulations. Below, we explain the steps required to use data for research purposes:

1. Application Submission:
Researchers submit their application through the HDL’s application portal. The application must include:

  • Identification of the applicants using an electronic ID card (eID).
  • A detailed description of the research question.
  • Specification of the legal purpose under which the research question falls.

Specification of the specific data required for the research. More information about the available data can be found in the dataset description (provided in German).
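The following minimal Python sketch models the required contents of an application as a record with a completeness check. The class name `Application`, its field names and the check are hypothetical illustrations. They are not the HDL application portal's actual data model or interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical model of an HDL data-use application (names are illustrative only)."""

    applicant_eid_verified: bool  # applicant identified with an electronic ID card (eID)
    research_question: str        # detailed description of the research question
    legal_purpose: str            # legal purpose under which the question falls
    requested_data: list[str] = field(default_factory=list)  # data needed for the research

    def missing_items(self) -> list[str]:
        """Return the names of required items that are absent or empty."""
        missing = []
        if not self.applicant_eid_verified:
            missing.append("eID identification")
        if not self.research_question.strip():
            missing.append("research question")
        if not self.legal_purpose.strip():
            missing.append("legal purpose")
        if not self.requested_data:
            missing.append("requested data")
        return missing


app = Application(
    applicant_eid_verified=True,
    research_question="Proportion of men and women among people with diabetes in Germany",
    legal_purpose="",  # not yet specified
    requested_data=["sex", "diabetes diagnosis"],  # hypothetical variable names
)
print(app.missing_items())  # ['legal purpose']
```

An application that fails such a check would be incomplete before the content review in step 2 even begins.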

2. Application Review:
HDL staff carefully review the application to assess:

  • Whether the research question aligns with the specified legal purpose.
  • Whether the requested data are appropriate for the research question.
  • Whether the test dataset is kept as small as possible to minimize the risk of re-identification.

3. Access to the Secure Processing Environment:

  • Once the application is approved, researchers gain access to a secure processing environment. This environment functions like a virtual computer. Within this space, researchers can access the data approved for their project and the necessary software for analysis.
  • No data can be downloaded from the secure processing environment, and no additional personal data can be introduced into it.

4. Conducting the Analysis:

  • Researchers develop and test their analyses using data subsets. These are smaller extracts of the data and may include different types of data.
  • These analyses are instructions written by the researchers that perform specific statistical evaluations. For example, an analysis might determine the proportion of men and women in a sample of people with diabetes in Germany.
  • Once the analysis instructions are finalized, researchers notify the HDL. The HDL then executes the researchers’ analysis instructions on the complete pseudonymized dataset and reviews the results before transmitting them. Researchers do not have direct access to the full pseudonymized dataset.
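The following minimal Python sketch shows what analysis instructions for the example above might look like. It assumes that records are dictionaries with the hypothetical keys `sex` and `has_diabetes`. This text does not describe the real structure of the HDL data or which software is available in the secure processing environment.

```python
from collections import Counter


def sex_proportions_among_diabetics(records):
    """Percentage of each sex among records flagged as having diabetes.

    Assumes (hypothetically) an iterable of dicts with keys "sex" and "has_diabetes".
    Percentages are rounded to one decimal place.
    """
    counts = Counter(r["sex"] for r in records if r["has_diabetes"])
    total = sum(counts.values())
    if total == 0:
        return {}  # no people with diabetes in the input
    return {sex: round(100 * n / total, 1) for sex, n in counts.items()}


# The researcher develops and tests the function on a small data subset ...
subset = [
    {"sex": "female", "has_diabetes": True},
    {"sex": "male", "has_diabetes": True},
    {"sex": "male", "has_diabetes": True},
    {"sex": "female", "has_diabetes": False},
]
print(sex_proportions_among_diabetics(subset))  # {'female': 33.3, 'male': 66.7}
# ... and HDL staff would run the finalized function on the full pseudonymized dataset.
```

The same function is used on both the subset and the full dataset. This way, the researcher can test the logic without seeing the full data.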

5. Review and Transmission of Results:

  • The results are typically provided in the form of tables. These tables do not contain any personal health data but instead answer statistical questions. Continuing the previous example, the table would show the percentage of men and women among people with diabetes in Germany.
  • HDL staff carefully review the result tables to minimize the risk of re-identification. If everything is in order, the results are sent to the researchers.
  • A new application is required for any further use of the data.
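A common way to reduce re-identification risk in result tables is to flag cells that are based on very few people. The following Python sketch assumes a hypothetical minimum cell count of 5. This text does not state which rules HDL staff actually apply. Real statistical disclosure control usually involves further checks as well, such as secondary suppression.

```python
MIN_CELL_COUNT = 5  # hypothetical threshold, not an HDL rule stated in this text


def review_result_table(cell_counts, min_count=MIN_CELL_COUNT):
    """Split a result table into releasable cells and cells flagged for suppression.

    `cell_counts` maps each table cell (e.g. a category label) to the number
    of people the cell is based on.
    """
    releasable = {k: n for k, n in cell_counts.items() if n >= min_count}
    flagged = sorted(k for k, n in cell_counts.items() if n < min_count)
    return releasable, flagged


# Made-up counts for illustration only
table = {"female": 1200, "male": 1350, "diverse": 3}
ok, flagged = review_result_table(table)
print(ok)       # {'female': 1200, 'male': 1350}
print(flagged)  # ['diverse']
```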

This process ensures that data usage complies with legal requirements and that data protection is maintained.