The data minimisation principle can be interpreted differently from one EU member state to another.
Between 2022 and 2024, the HealthData@EU Pilot project, intended to lay the groundwork for the future European Health Data Space (EHDS), put this principle to the test, in particular in one of its five use cases: developing models that predict the risk of cardiovascular disease from patients’ care pathway histories.
Four countries were involved: Denmark, Finland, Norway, and France, with a node hosted at the University of Bordeaux.
In Norway, concerns were raised about the risk of reidentifying individuals given the quantity and level of detail of the requested variables. Consequently, it was decided to omit exact dates and to reduce the granularity of diagnostic codes.
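The kind of coarsening applied in Norway can be illustrated with a short Python sketch. It assumes diagnoses are coded in ICD-10 and reduces each code to its three-character category (for instance `I21.4` becomes `I21`), while keeping only the year of each event. The record layout, the pseudonymous identifiers and the exact rules are hypothetical: the source does not say how dates were removed or to which level codes were truncated.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CareEvent:
    """One entry in a patient's care pathway (hypothetical schema)."""
    patient_id: str      # hypothetical pseudonymous identifier
    event_date: date
    icd10_code: str      # e.g. "I21.4"


@dataclass(frozen=True)
class MinimisedEvent:
    patient_id: str
    event_year: int      # exact date dropped, only the year is kept
    icd10_category: str  # three-character ICD-10 category, e.g. "I21"


def minimise(event: CareEvent) -> MinimisedEvent:
    """Coarsen a care event: keep the year only and truncate the ICD-10 code."""
    code = event.icd10_code.replace(".", "").strip().upper()
    return MinimisedEvent(
        patient_id=event.patient_id,
        event_year=event.event_date.year,
        icd10_category=code[:3],
    )


events = [
    CareEvent("p001", date(2019, 3, 14), "I21.4"),
    CareEvent("p001", date(2020, 7, 2), "i50.1"),
]
for e in events:
    print(minimise(e))
```

Coarser data lowers the re-identification risk but also removes information (event order within a year, diagnostic subtypes) that a risk model might have used, which is the trade-off the participating countries had to weigh.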
In France, the CNIL (the French data protection authority) judged that the use case did not justify access to data covering the entire population. It asked for the sample to be limited to 12 million individuals, which was still the largest volume the French Health Data Hub had used for a study to date.
Data access difficulties contributed to the extension of the HealthData@EU Pilot project, which was originally planned to run for two years. Beyond the divergent interpretations of the minimisation principle, documentation requirements differed from country to country. The variety of the participants’ legal statuses did not help, nor did the lack of clear processes governing access by certain health bodies to data from outside the health domain (socioeconomic data, for example).
The core infrastructure hosted on AWS
The assessment of the use cases was delivered in December 2024. The architecture document had been completed in November.
With these milestones reached, development of the EHDS continues along a roadmap running to early 2027, with a new release roughly every four months.
The infrastructure must connect three categories of participants:
- The member states, each designating a national contact point (which maintains a national metadata catalog) and appointing one or more bodies responsible for examining requests to access health data
- The EU institutions, bodies and agencies, represented by a European Commission service, the Union Health Data Access Service (UHDAS), which also plays a role in examining access requests
- Other authorised participants (digital infrastructure or research consortia, international organisations…)
At the heart of the EHDS infrastructure is the central platform, which aggregates metadata from the national catalogs. It is hosted on AWS, leveraging, among other services, EFS (file storage), KMS (encryption), ECR (container registry), OpenSearch and DocumentDB.
Access requests are submitted through the central platform using a common form. Exchanges rely on eDelivery, a building block based on the AS4 messaging protocol that is now one of the foundations of Europe’s digital infrastructure.
DCAT-AP, Piveau-Hub, Simpl… European building blocks to structure the EHDS
To harmonise dataset descriptions, an extension of the DCAT-AP specification was developed: HealthDCAT-AP. DCAT-AP is itself based on a W3C standard, the Data Catalog Vocabulary (DCAT), an RDF vocabulary, and underpins a large number of EU data portals. Other DCAT-AP extensions exist as well, for example for statistical data (StatDCAT-AP) and geospatial data (GeoDCAT-AP).
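To give an idea of what such a description looks like, here is a minimal Python sketch that builds a DCAT dataset record as JSON-LD. It uses only core W3C DCAT, Dublin Core and FOAF terms (`dcat:Dataset`, `dct:title`, `dct:publisher`, `dcat:keyword`); the health-specific properties that HealthDCAT-AP adds are not shown, and the dataset IRI, title, description and publisher are hypothetical.

```python
import json

# Namespaces of the W3C DCAT vocabulary, Dublin Core terms and FOAF.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def describe_dataset(iri, title, description, publisher, keywords, lang="en"):
    """Return a minimal DCAT dataset description as a JSON-LD dictionary.

    Only core DCAT / Dublin Core properties are used; the health-specific
    properties added by HealthDCAT-AP are deliberately left out.
    """
    def text(value):
        # Language-tagged literal, so the same field can exist in several languages.
        return {"@value": value, "@language": lang}

    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:identifier": iri,
        "dct:title": text(title),
        "dct:description": text(description),
        "dct:publisher": {"@type": "foaf:Agent", "foaf:name": publisher},
        "dcat:keyword": [text(k) for k in keywords],
    }


# Hypothetical dataset: identifier, title, description and publisher are invented.
dataset = describe_dataset(
    "https://example.org/datasets/cvd-care-pathways",
    "Cardiovascular care pathways",
    "Pseudonymised care pathway histories used to model cardiovascular risk.",
    "Example health data access body",
    ["cardiovascular", "care pathway"],
)
print(json.dumps(dataset, indent=2, ensure_ascii=False))
```

Because every national catalog describes its datasets with the same vocabulary, a central platform can aggregate and search these records without country-specific parsing.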
To ensure interoperability with other European data spaces, another EU-funded component is brought into play: the Simpl middleware. It has already been piloted within a project that linked the EHDS with five other European data spaces (public procurement data, language data, Destination Earth, and more).
Other European building blocks underpinning the EHDS include EU Login (authentication, with Keycloak handling authorisation), eTranslation (machine translation), Europa Analytics, the Corporate Notification Service and the Interoperability Test Bed (conformance testing). The metadata catalog relies on Piveau-Hub, whose interface has been adapted to the guidelines of the Europa Component Library (ECL).
Cost estimation and quality indicators for datasets will be addressed in 2026
Since version 4, released in May 2025, the UI has been multilingual: all static content is available in every official EU language, plus Norwegian and Icelandic.
Version 5 (September 2025) added the ability to request partial access to a dataset. It also introduced a Drupal backend for static content management, a HealthDCAT-AP spec explorer, a dataset description assistant and automated translation of incoming datasets and requests.
Version 6 (January 2026) is due to let the central platform receive status updates on requests. An EU register of access decisions is also in the works, along with the ability to request changes to an access authorisation or to appeal a negative decision.
Version 7 (May 2026) is expected to support time limits on certain request statuses and to calculate the fees associated with requests. It should also include a register of the sanctions and penalties imposed.
A quality and usefulness indicator for datasets is planned for version 8 (September 2026), as is the ability to assign specific authorisation roles within an organisation.
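The time-based status constraints planned for version 7 are not described in detail. As an illustration only, the Python sketch below attaches a maximum duration to some request statuses and flags requests that have exceeded it. The status names, durations and request identifiers are all hypothetical and do not reflect the actual EHDS rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Status(Enum):
    # Hypothetical status names; the real EHDS status list is not given here.
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    DECIDED = "decided"


# Hypothetical maximum time a request may stay in each status.
# None means the status has no time limit.
TIME_LIMITS = {
    Status.SUBMITTED: timedelta(days=30),
    Status.UNDER_REVIEW: timedelta(days=90),
    Status.DECIDED: None,
}


@dataclass
class AccessRequest:
    request_id: str
    status: Status
    status_since: date  # date the request entered its current status


def is_overdue(request: AccessRequest, today: date) -> bool:
    """Return True if the request has exceeded the time limit of its current status."""
    limit = TIME_LIMITS[request.status]
    if limit is None:
        return False
    return today - request.status_since > limit


sample_requests = [
    AccessRequest("REQ-1", Status.UNDER_REVIEW, date(2026, 1, 5)),
    AccessRequest("REQ-2", Status.SUBMITTED, date(2026, 5, 1)),
    AccessRequest("REQ-3", Status.DECIDED, date(2025, 1, 1)),
]
today = date(2026, 5, 15)
print([r.request_id for r in sample_requests if is_overdue(r, today)])  # ['REQ-1']
```

Keeping the limits in a table rather than in the logic means a status can gain, lose or change its deadline without touching the check itself.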
The EHDS central platform is built as a set of microservices exposing REST APIs. OpenSearch indexes the data (a SPARQL query editor is also available on the central platform), PostgreSQL stores statistics and MongoDB keeps information about uploaded files.
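Assuming the catalog metadata exposed to the SPARQL editor follows DCAT, a query for cardiovascular datasets could look like the one below. This Python sketch builds the query and encodes it as an HTTP GET request, as the W3C SPARQL 1.1 Protocol allows (a `query` parameter in the URL). The endpoint URL is hypothetical, and the snippet only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the central platform's actual SPARQL URL is not given here.
SPARQL_ENDPOINT = "https://example.org/sparql"

# List datasets whose keywords mention "cardiovascular" (case-insensitive).
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT DISTINCT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dcat:keyword ?kw .
  FILTER (CONTAINS(LCASE(STR(?kw)), "cardiovascular"))
}
LIMIT 20
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(SPARQL_ENDPOINT, QUERY)
print(url)
```

`DISTINCT` avoids returning the same dataset once per matching keyword; a full-text search index such as OpenSearch would typically serve the same need for end users, while SPARQL suits precise, structured queries.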