France possesses one of the world’s most valuable health datasets, and most Europeans do not know it exists. The Health Data Hub (Plateforme des Données de Santé in French) holds pseudonymized health data on 67 million individuals, near-complete coverage of France’s population, drawn from the Système National des Données de Santé (SNDS). Hospitalizations, prescriptions, diagnoses, medical imaging, biological test results, and causes of death are all linked at the individual level and available for research under strict access protocols.
This asset makes France a potential superpower in AI drug discovery and clinical research. France 2030 has committed €50 million directly to the Hub’s technical infrastructure, with a further €250 million directed to companies and academic institutions building AI applications on top of this data. No comparable publicly accessible national health database exists in the United States, Germany, or the United Kingdom, which makes the Health Data Hub a genuine structural competitive advantage.
Architecture and Governance
The Health Data Hub was created by Law 2019-774 of 24 July 2019 on the organization and transformation of the healthcare system, was formally constituted in November 2019, and became operational in March 2020. Its launch coincided with the arrival of COVID-19 in France, which immediately validated its use case. The Hub is organized as a public interest grouping (groupement d’intérêt public, GIP) under the joint supervision of the Ministry of Health, INRIA, INSERM, and the Caisse Nationale d’Assurance Maladie (CNAM), which administers France’s national health insurance.
The data architecture is layered:
Layer 1: SNDS Core. The historical backbone: pseudonymized data from France’s universal health insurance system covering all reimbursed care since 1992. This includes:
- 1.3 billion medical consultations per year
- 300 million prescriptions annually
- All hospitalizations (Programme de Médicalisation des Systèmes d’Information, PMSI)
- Causes of death from the national death registry
- Billing records from 400,000+ healthcare providers
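Individual-level linkage works because every source carries the same pseudonym for the same person. The sketch below illustrates the idea under stated assumptions: a keyed hash (HMAC-SHA256, with a secret key held by a trusted third party) stands in for the actual SNDS pseudonymization procedure, which may differ, and the record layouts, identifiers, and the `pseudonymize` and `link` helpers are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret key, held by a trusted third party and never by researchers.
SECRET_KEY = b"example-key-held-by-trusted-third-party"


def pseudonymize(national_id: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always maps to the same pseudonym, which is what
    lets records from different sources be linked.
    """
    return hmac.new(SECRET_KEY, national_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link(**sources):
    """Group records from every source under a shared pseudonym."""
    linked = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for national_id, value in records:
            linked[pseudonymize(national_id)][source_name].append(value)
    return linked


# Hypothetical extracts; the linked output is keyed by pseudonym only.
prescriptions = [("id-001", "metformin"), ("id-002", "atorvastatin")]
hospital_stays = [("id-001", "E11 type 2 diabetes")]
deaths = [("id-002", "I21 acute myocardial infarction")]

patients = link(prescriptions=prescriptions, hospital_stays=hospital_stays, deaths=deaths)
for pseudonym, record in patients.items():
    print(pseudonym[:12], dict(record))
```

Because the key never reaches researchers, they can join prescriptions, hospital stays, and deaths for the same person without seeing or recomputing the underlying identifier.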
Layer 2: EHR Integration. Electronic health records from participating hospital groups (AP-HP Paris, HCL Lyon, AP-HM Marseille) add clinical detail not available in insurance claims: vital signs, laboratory values, imaging reports, and pathology findings. As of 2026, six university hospital (CHU) groups contribute EHR data.
Layer 3: Genomic and Biobank Data. In partnership with France Génomique, the Hub is integrating genomic sequencing data from the Plan France Médecine Génomique 2025, which aimed to sequence 235,000 patients. This genotype-phenotype linkage is transformational for drug target identification.
Layer 4: Medical Imaging Repository. A dedicated Medical Imaging Repository (MIR), funded with €10 million from France 2030, stores anonymized CT scans, MRIs, and pathology slides from participating hospitals. By 2025, the MIR contained 6 million imaging studies, making it one of the world’s largest publicly curated medical imaging datasets.
GDPR Compliance: The Microsoft Controversy and Its Resolution
In 2020, the Health Data Hub faced a major governance crisis: La Quadrature du Net and other civil-society groups argued that hosting French health data on Microsoft Azure, the original infrastructure partner, created a pathway for US authorities to access French patient data under US law such as the CLOUD Act. Ruling on a legal challenge in October 2020, the Conseil d’État found the arrangement legally permissible but not ideal, and the Ministry of Health committed to accelerating a transition to European cloud infrastructure.
By 2023, the Hub completed its transition to OVHcloud (French) and Scaleway (French, Iliad Group) for primary data storage, retaining Microsoft Azure only for specific computation workloads under enhanced contractual protections. This resolution positioned the Health Data Hub as a model for European health data sovereignty, directly aligned with France 2030’s digital sovereignty objectives, and gave OVHcloud a commercial case study for its sovereign cloud offering.
Access Protocols and Research Use
Access to Health Data Hub data is granted through a formal application process managed by the Hub itself, which in 2019 took over the role of the former Institut National des Données de Santé (INDS):
- Applications reviewed by an independent ethics and scientific committee (CESREES) and authorized by the CNIL (France’s data protection authority)
- Approval timeline: 8-12 weeks for standard academic research, 4-6 weeks for expedited public health research
- Access modality: secure remote environment (no data download); researchers analyze data within the Hub’s computational environment
- Cost: tiered fees from €0 (public health research) to €250,000/year (commercial pharmaceutical research)
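Under the no-download rule, what leaves the secure environment is aggregate output rather than patient rows. A minimal sketch of such an output check, assuming a hypothetical small-cell rule: any count below 10 is suppressed before release. The threshold, the `release_counts` helper, and the sample data are illustrative and not taken from the Hub’s actual export policy.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Illustrative threshold only; not the Hub's documented export rule.
MIN_CELL_SIZE = 10


def release_counts(categories: Iterable[str], min_cell: int = MIN_CELL_SIZE) -> Dict[str, Optional[int]]:
    """Aggregate row-level values into counts and suppress small cells.

    Only per-category counts are returned; any count below `min_cell`
    is replaced by None (suppressed).
    """
    counts = Counter(categories)
    return {cat: (n if n >= min_cell else None) for cat, n in sorted(counts.items())}


# Row-level diagnoses stay inside the secure environment.
rows = ["diabetes"] * 42 + ["asthma"] * 15 + ["rare disease X"] * 3
print(release_counts(rows))
# → {'asthma': 15, 'diabetes': 42, 'rare disease X': None}
```

Production disclosure control usually goes further, for example suppressing additional cells so that a hidden count cannot be recovered by subtracting the visible ones from a published total.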
As of Q1 2026, the Hub had approved 387 research projects since 2020, spanning oncology (largest category, 31%), cardiovascular disease (18%), rare diseases (15%), infectious disease (12%), and mental health (10%). Thirty-one pharmaceutical companies have active research access agreements, including Sanofi (partner since 2022), Roche, AstraZeneca, and Servier.
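The category shares can be turned into approximate project counts. The five named areas account for 86% of approvals, so roughly 14% fall into other areas; the figures below are rounded estimates derived from the percentages, not counts reported by the Hub.

```python
TOTAL_PROJECTS = 387

# Shares of approved projects by therapeutic area, as stated in the text.
shares = {
    "oncology": 0.31,
    "cardiovascular disease": 0.18,
    "rare diseases": 0.15,
    "infectious disease": 0.12,
    "mental health": 0.10,
}

# Round each share to a whole number of projects; the remainder is "other".
estimates = {area: round(TOTAL_PROJECTS * share) for area, share in shares.items()}
estimates["other (remainder)"] = TOTAL_PROJECTS - sum(estimates.values())

for area, n in estimates.items():
    print(f"{area:>24}: ~{n}")
```

This works out to roughly 120 oncology projects and about 54 outside the five named areas.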
AI Applications: The France 2030 Investment Thesis
France 2030’s €250 million investment in AI health applications built on the Health Data Hub targets three commercial categories:
Diagnostic AI: Companies including Owkin (French-US, Paris/New York), Therapanacea (Paris), and Incepto Medical (Paris) have built diagnostic AI models on Hub data for oncology image interpretation, treatment response prediction, and adverse event prediction. Bpifrance’s i-AI program has funded €85 million in diagnostic AI companies since 2022.
Drug Discovery and Repurposing: The Hub’s population-scale longitudinal data enables pharmacoepidemiological studies that identify drug repurposing opportunities, finding new uses for approved drugs by observing unexpected outcome correlations in real-world data. Sanofi has used Hub data for five such studies; Servier for three. bioMérieux uses Hub infectious disease data for diagnostic algorithm validation.
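A repurposing signal of this kind usually starts as a comparison of outcome rates between patients exposed and not exposed to a drug. A minimal sketch with hypothetical counts: it computes a crude risk ratio with a Wald-type 95% confidence interval on the log scale. This is the simplest possible version; real pharmacoepidemiological studies on claims data adjust for confounding (for example with propensity scores) and for time-varying exposure, which this sketch does not.

```python
import math
from typing import Tuple


def risk_ratio(exposed_events: int, exposed_total: int,
               unexposed_events: int, unexposed_total: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Crude risk ratio with a log-scale Wald confidence interval.

    Returns (rr, lower, upper). No adjustment for confounding is made.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence (risk) data.
    se = math.sqrt(1 / exposed_events - 1 / exposed_total
                   + 1 / unexposed_events - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical cohort: outcome X among users vs non-users of drug D.
rr, lo, hi = risk_ratio(exposed_events=120, exposed_total=20_000,
                        unexposed_events=900, unexposed_total=100_000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these made-up numbers the risk ratio is about 0.67 and the interval (about 0.55 to 0.81) excludes 1: an unexpected protective association worth a properly adjusted study, not evidence of benefit on its own.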
Health Economics: The CNAM uses Hub data for health technology assessment — evaluating whether new drugs and devices justify reimbursement — at a population scale that no other EU country can match. This strengthens France’s negotiating position with pharmaceutical companies on drug pricing.
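Health technology assessment commonly summarizes the trade-off between a new intervention and current care as an incremental cost-effectiveness ratio (ICER): ICER = (C_new - C_ref) / (E_new - E_ref), the extra cost per extra unit of health gained, often measured in quality-adjusted life years (QALYs). A minimal sketch with hypothetical figures; the text does not say which metric the CNAM uses, so the ICER is shown here as a standard illustrative choice rather than a description of French practice.

```python
def icer(cost_new: float, cost_ref: float, effect_new: float, effect_ref: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both options are equally effective")
    return (cost_new - cost_ref) / delta_effect


# Hypothetical per-patient costs (EUR) and QALYs: new drug vs standard care.
ratio = icer(cost_new=48_000, cost_ref=30_000, effect_new=6.2, effect_ref=5.8)
print(f"ICER: EUR {ratio:,.0f} per QALY gained")
```

Negative ratios need care: a drug that costs less and works better and one that costs more and works worse both yield a negative ICER, so assessors look at the signs of both differences, not just the ratio.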
INRIA and INSERM Partnerships
The Hub’s two primary research institution partners have distinct roles:
INRIA (Institut national de recherche en sciences et technologies du numérique, formerly Institut National de Recherche en Informatique et en Automatique) provides computational infrastructure, machine learning methodology, and research on privacy-preserving computation (federated learning, differential privacy). INRIA’s €30 million partnership with the Hub, funded under France 2030, focuses on building next-generation privacy-preserving AI tools that can extract research insights from Hub data without centralizing individual records.
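Federated learning keeps records at each site and shares only model parameters. A minimal sketch of federated averaging (FedAvg) under stated assumptions: three hypothetical sites each hold a small, noiseless synthetic dataset generated from y = 2x + 1, fit a one-variable linear model locally by gradient descent, and send back only their parameters, which a coordinator averages weighted by site size. Nothing here reflects INRIA’s actual tooling; the function names, data, and hyperparameters are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Params = Tuple[float, float]  # (slope, intercept)


def local_update(params: Params, data: List[Point], lr: float, epochs: int) -> Params:
    """Full-batch gradient descent on one site's data (mean squared error).

    Only the updated parameters leave the site; the (x, y) records do not.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(sites: List[List[Point]], rounds: int = 200, lr: float = 0.1, epochs: int = 5) -> Params:
    """Federated averaging: broadcast, train locally, average weighted by site size."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        updates = [(local_update((w, b), data, lr, epochs), len(data)) for data in sites]
        w = sum(p[0] * n for p, n in updates) / total
        b = sum(p[1] * n for p, n in updates) / total
    return w, b


# Three hypothetical sites covering different ranges of x.
site_a = [(k / 10, 2 * k / 10 + 1) for k in range(0, 6)]
site_b = [(k / 10, 2 * k / 10 + 1) for k in range(5, 11)]
site_c = [(k / 20, 2 * k / 20 + 1) for k in range(0, 21)]

slope, intercept = fed_avg([site_a, site_b, site_c])
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because all three sites share one underlying relationship, the averaged model converges to slope 2 and intercept 1, the answer a centralized fit would give. With real, heterogeneous data FedAvg can drift from the centralized solution, and on its own it offers no formal privacy guarantee, which is why it is often combined with differential privacy.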
INSERM coordinates the Hub’s biomedical research agenda — prioritizing which research questions to enable, connecting clinical investigators at France’s CHU hospitals with Hub data infrastructure, and ensuring research outputs are translated into clinical practice guidelines. INSERM’s France 2030 allocation includes €25 million for Hub-related research programs.
Competitive Landscape: UK, Germany, Estonia
UK Biobank and NHS England. The UK’s most comparable assets are UK Biobank (500,000 deeply phenotyped research participants with genomic data) and the national datasets held by NHS England, the closest equivalent to France’s SNDS, which absorbed NHS Digital in 2023. UK Biobank is globally pre-eminent for genetic studies but small by comparison; NHS England’s data covers 60 million people but has historically had more restrictive access. Post-Brexit alignment complexity has slowed the integration of NHS data into European research consortia.
Germany’s NAKO Cohort. The cohort has 200,000 deeply phenotyped participants but is far smaller than the French SNDS. The complexity of Germany’s federal and state-level data protection rules makes national health data integration significantly harder than under France’s centralized model.
Estonia’s X-Road. Often cited as the gold standard for national digital health infrastructure, Estonia’s linked health data system, built on the X-Road data exchange layer, covers all 1.4 million Estonians with impressive longitudinal depth. Its per-capita coverage is unmatched, but France’s 67 million records represent roughly 48 times the absolute scale.
France’s Health Data Hub is the most scientifically powerful general-population health database in continental Europe. The critical challenge remains access speed: the current 8-12 week approval timeline for academic research is too slow for competitive pharmaceutical research, where timing of observational studies can determine patent strategy and regulatory submissions. France 2030’s investment in access infrastructure aims to reduce commercial research turnaround to 4 weeks by 2027.