Matthieu Chaldebas

ML Scientist · Genomics

Building end-to-end ML frameworks to decode how genetic variants drive disease

Professional Experience

Computational Biologist

St. Giles Laboratory of Human Genetics of Infectious Diseases

The Rockefeller University, New York & Imagine Institute, Paris | 2021 – Present

PhD Candidate

2022 – Present

  • Designed and shipped 5ULTRA end-to-end: trained ML models on 28 million gnomAD variants, built a public prediction webserver, and published findings in AJHG (2026).
  • Applied 5ULTRA to identify novel disease-causing mechanisms in cancer (ABI1, NRAS) and infectious disease (TNF, RPSA), demonstrating real-world variant prioritization at scale.
  • Selected to deliver a Platform Presentation, a top-tier oral presentation format, at the ASHG 2025 Annual Meeting in Boston, the field's flagship conference.

Bioinformatics Assistant

2021 – 2022

  • Scaled and maintained variant-calling pipelines across 26,000+ WES/WGS datasets using GATK, VEP, and custom Snakemake workflows on HPC clusters.
  • Performed bulk and single-cell RNA-seq analyses (differential expression, pathway enrichment, Seurat) across multiple disease cohorts.
  • Co-developed AGAIN, a genome-wide intronic splicing variant detector, contributing key features and WES/WGS validation (PNAS, 2023).

Data Scientist Intern

BioMérieux, Lyon, France | 2021 (6 months)

  • Optimized supervised classification models to predict antibiotic resistance from microbial genomic data.
  • Improved model generalization by designing phylogeny-aware cross-validation schemes, reducing overfitting across bacterial clades.

Data Engineer Intern

CEA, Paris-Saclay, France | 2019 (4 months)

  • Engineered data management pipelines for diverse biological datasets, enabling systematic phylogenetic analysis and 3D protein modeling.

Research Intern

Imagine Institute, Paris, France | 2018 (2 months)

  • Executed Western blot and quantitative RT-PCR assays to assess the effects of specific drugs on target gene expression.

Publications & Research Highlights

Collaborative Contributions to High-Impact Studies

Beyond my own primary publications, a significant part of my work at The Rockefeller University has supported the lab's broader scientific portfolio by providing the genomic data infrastructure that underlies discoveries in infectious disease genetics and immunology.

Across cohort studies spanning thousands of patients, I have performed germline variant calling and annotation (WES/WGS via GATK and VEP), bulk and single-cell RNA-seq analysis (differential expression, pathway enrichment, Seurat), and custom pipeline development for disease-specific analyses. This work has directly supported the identification of novel monogenic etiologies of infectious disease and immune dysregulation.

Representative publications supported by this work:

  • Nature — Arias et al., 2024; Chan et al., 2024
  • Science Immunology — Ogishi et al., 2025; Yatim et al., 2026
  • Immunity — Ogishi et al., 2024
  • Genome Medicine — Matuozzo et al., 2023; Conil et al., 2024
  • Journal of Experimental Medicine — Chan et al., 2024
  • The Journal of Clinical Investigation — Arango-Franco et al., 2024

Technical Skills

Machine Learning & Programming

Machine Learning · Python · R · Bash · PHP, HTML & CSS

Genomics & Transcriptomics

WES / WGS Analysis · RNA-seq (Bulk & Single-Cell) · Biostatistics · Proteomics · Structural Biology

Engineering & Infrastructure

Version Control (Git) · Cluster Computing · Snakemake · Docker / Singularity · Conda

Education

PhD Candidate in Bioinformatics

Université Paris Cité, France | 2022 – 2026

M.Sc. in Biotechnology Engineering

Sup'Biotech, Paris, France | 2016 – 2021

Specialization in health data science.

Exchange semester at UCSD.