SYNTHEMA: Synthetic generation of haematological data over federated computing frameworks
Hematological diseases (HDs) are a large group of disorders resulting from quantitative or qualitative abnormalities of blood cells, lymphoid organs and coagulation factors. Although most of them are rare, the overall number of patients affected by HDs worldwide is substantial, placing a considerable economic burden on healthcare systems and societies.
Aim of the Project
SYNTHEMA is a cross-border hub to develop and validate AI techniques, supported by federated learning, for anonymisation and synthetic data generation in rare hematological diseases.
This project aims to generate reliable, high-quality synthetic data that can shape new virtual patients to further enhance diagnostic capacity, assess treatment options and predict outcomes in rare hematological diseases.
Despite the existence of several collaborative research groups at national and EU level, current clinical approaches are often ineffective, particularly for the rarest conditions, due to the relatively low number of patients per disease and the high number of unconnected clinical entities. SYNTHEMA aims to establish a cross-border data hub in which to develop and validate innovative AI-based techniques for clinical data anonymisation and synthetic data generation (SDG), to tackle the scarcity and fragmentation of data and widen the basis for GDPR-compliant research in rare hematological diseases (RHDs). The project will focus on two representative RHD use cases: sickle-cell disease (SCD) and acute myeloid leukaemia (AML).
Role of Vicomtech
Vicomtech leads WP3, Data anonymisation and synthetic data generation pipelines. It also participates in the design and implementation of three key aspects of the project: the federated learning platform, metrics for evaluating the privacy of the generated synthetic data, and metrics for evaluating its fidelity and utility. Vicomtech will advance Deep Learning technologies applied to the generation of synthetic data, such as Generative Adversarial Networks and Diffusion models. Regarding the metrics, Vicomtech will work on the definition and implementation of innovative metrics, both general and specific to the use cases of the project. Additionally, it will work on visual analytics techniques to visualise two key aspects of the synthetic data: its utility and its privacy.
SYNTHEMA will develop a federated learning (FL) infrastructure, equipped with secure multiparty computation (SMPC) and differential privacy (DP) protocols, connecting clinical centres that contribute standardised, interoperable multimodal datasets with computing centres from academia and SMEs. This framework will be used to train the developed algorithms and perform SMPC-based global model aggregation in a privacy-preserving fashion. The resulting data will be validated for their clinical value, statistical utility and residual privacy risks. The project will develop legal and ethical frameworks to guarantee privacy by design in the collection and processing of health-related personal data and to achieve ethically sound algorithm co-creation.
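As an illustration of how privacy-preserving aggregation of this kind can work in principle, the following is a minimal Python sketch and not the SYNTHEMA implementation. It assumes three simulated clinical centres, a fixed-point encoding, and pairwise additive masks that cancel when the server sums the contributions. All function names, the modulus, the scale and the noise level are hypothetical choices made for this example. A single random generator stands in for the pairwise key agreement a real SMPC protocol would use, and the Gaussian noise is not calibrated to any formal differential privacy budget.

```python
import random

MODULUS = 2**64   # assumed ring size for masked arithmetic
SCALE = 10**6     # assumed fixed-point precision (6 decimal places)


def encode(x):
    """Map a float to a fixed-point integer in the ring Z_MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a ring element back to a signed float."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def clip(update, max_norm):
    """Scale an update so its L2 norm is at most max_norm."""
    norm = sum(x * x for x in update) ** 0.5
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * factor for x in update]


def add_gaussian_noise(update, sigma, rng):
    """Add local Gaussian noise (illustrative, uncalibrated sigma)."""
    return [x + rng.gauss(0.0, sigma) for x in update]


def mask_updates(updates, rng):
    """Add pairwise masks: client i adds m, client j subtracts the same m."""
    n, dim = len(updates), len(updates[0])
    masked = [[encode(x) for x in u] for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                m = rng.randrange(MODULUS)
                masked[i][k] = (masked[i][k] + m) % MODULUS
                masked[j][k] = (masked[j][k] - m) % MODULUS
    return masked


def aggregate(masked):
    """Server side: sum masked vectors (masks cancel) and average."""
    n = len(masked)
    totals = [sum(column) % MODULUS for column in zip(*masked)]
    return [decode(t) / n for t in totals]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical local model updates from three centres.
    local = [[0.5, -1.0], [1.5, 2.0], [-0.5, 0.25]]
    private = [add_gaussian_noise(clip(u, 5.0), 0.01, rng) for u in local]
    print(aggregate(mask_updates(private, rng)))
```

In this sketch the server only ever sees masked vectors, which look random individually, yet their sum equals the sum of the true updates because every mask is added once and subtracted once. Integer arithmetic modulo a fixed ring is used so that the cancellation is exact rather than subject to floating-point error.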
Sector of application
Project outcomes, including pipelines, standards and data, will be made openly available to stakeholders in the healthcare, academic and industry sectors, and will contribute to existing rare disease registries.
The consortium is led by Universidad Politécnica de Madrid, which also leads the design and implementation of the federated learning infrastructure. The rest of the partners are:
SBA RESEARCH GEMEINNUTZIGE GMBH
ALMA MATER STUDIORUM - UNIVERSITA DI BOLOGNA
THE EUROPEAN INSTITUTE FOR INNOVATION THROUGH HEALTH DATA
FUNDACIO HOSPITAL UNIVERSITARI VALL D'HEBRON
HUMANITAS MIRASOLE SPA
UNIVERSITAIR MEDISCH CENTRUM UTRECHT
ASSISTANCE PUBLIQUE HOPITAUX DE PARIS
CHARITE - UNIVERSITAETSMEDIZIN BERLIN
GLSMED LEARNING HEALTH SA
UNIVERSITA DEGLI STUDI DI PADOVA
UNIVERSITY OF SOUTHAMPTON
Horizon Europe Program