01

Antimicrobial resistance prediction

MALDI-TOF mass spectrometry · supervised learning · deep neural networks

Antimicrobial resistance is one of the defining clinical challenges of our generation. By the time culture-based diagnostics return a phenotype, the patient has often already been on broad-spectrum empirical therapy for days. The question I work on is whether machine learning, applied to data already produced in the lab (mass spectra, microbiology results, electronic health records), can shorten that loop.

My focus is on MALDI-TOF mass spectrometry as a low-cost, high-throughput substrate for resistance prediction. The signal is rich but noisy, the labels are imbalanced, and the data span heterogeneous clinical sites. The work involves integrating clinical and demographic features, producing calibrated probability outputs, optimising decision thresholds for clinical use, and evaluating models on clinical utility rather than benchmark performance alone.
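To make the threshold step concrete, here is a minimal sketch, not the pipeline used in this work. It assumes the model already outputs calibrated probabilities of resistance, and that a missed resistant isolate costs more than an unnecessary escalation. The function names `expected_cost` and `pick_threshold` and the cost values are illustrative assumptions.

```python
from typing import Sequence, Tuple


def expected_cost(y_true: Sequence[int], probs: Sequence[float],
                  threshold: float, cost_fn: float, cost_fp: float) -> float:
    """Mean misclassification cost when calling 'resistant' at p >= threshold."""
    total = 0.0
    for y, p in zip(y_true, probs):
        predicted_resistant = p >= threshold
        if y == 1 and not predicted_resistant:
            total += cost_fn  # missed resistance (false negative)
        elif y == 0 and predicted_resistant:
            total += cost_fp  # susceptible isolate flagged as resistant
    return total / len(y_true)


def pick_threshold(y_true: Sequence[int], probs: Sequence[float],
                   cost_fn: float = 5.0, cost_fp: float = 1.0) -> Tuple[float, float]:
    """Scan every observed probability as a cut-off and return (threshold, cost).

    The extra candidate float('inf') stands for 'never call resistant'.
    Ties are resolved in favour of the lowest threshold.
    """
    if not y_true or len(y_true) != len(probs):
        raise ValueError("y_true and probs must be non-empty and equal in length")
    candidates = sorted(set(probs)) + [float("inf")]
    best = min(candidates,
               key=lambda t: expected_cost(y_true, probs, t, cost_fn, cost_fp))
    return best, expected_cost(y_true, probs, best, cost_fn, cost_fp)


if __name__ == "__main__":
    # Toy labels (1 = resistant) and hypothetical calibrated probabilities.
    labels = [0, 0, 1, 0, 1, 1]
    scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
    print(pick_threshold(labels, scores, cost_fn=5.0, cost_fp=1.0))
```

In practice the threshold would be chosen on a validation split and the cost ratio agreed with clinicians. Changing the ratio moves the cut-off, which is the point of tying the decision to clinical utility.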

02

Multi-centre data harmonisation

batch-effect correction · ComBat · batch-mixing diagnostics

High-throughput biomedical data does not arrive clean. Different instruments, calibration schedules, reagent lots, and operating procedures introduce systematic variation that machine learning models will happily memorise instead of learning the biology. Cross-site generalisation lives or dies on how seriously we treat batch effects.

I work on harmonisation methods that integrate cleanly into ML pipelines: ComBat-style corrections (and their cross-validation pitfalls), batch-mixing diagnostics that quantify residual structure after correction, and validation strategies that preserve the independence of held-out sites.
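One simple way to quantify residual batch structure, shown here as a hedged sketch and not as the diagnostic used in this work, is to ask how often a sample's nearest neighbours come from its own batch. The function names, the choice of Euclidean distance, and the value of `k` are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def same_batch_fraction(points: Sequence[Sequence[float]],
                        batches: Sequence[Hashable], k: int = 3) -> float:
    """Mean fraction of each sample's k nearest neighbours sharing its batch."""
    n = len(points)
    if n != len(batches) or not 0 < k < n:
        raise ValueError("need matching inputs and 0 < k < number of samples")
    total = 0.0
    for i in range(n):
        # Brute-force neighbour search; fine for a sketch, slow for large n.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        same = sum(batches[j] == batches[i] for j in others[:k])
        total += same / k
    return total / n


def expected_under_mixing(batches: Sequence[Hashable]) -> float:
    """Same-batch fraction expected if batch labels were unrelated to position."""
    n = len(batches)
    counts = Counter(batches)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    # Two toy batches that sit far apart in feature space (poorly mixed).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels = ["A"] * 4 + ["B"] * 4
    print(same_batch_fraction(pts, labels, k=3), expected_under_mixing(labels))
```

A value well above the mixing baseline after correction suggests that batch structure remains. To keep held-out sites independent, any correction parameters should be estimated on training sites only before a diagnostic like this is computed on the test site.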

03

Computational patient phenotyping

unsupervised clustering · survival analysis · multi-state modelling

A diagnostic label rarely captures the full picture of a patient. Within any clinical cohort there are subgroups whose trajectories, treatment responses, and outcomes diverge in ways that simple stratification misses. Unsupervised methods can help surface those subgroups, but only if the resulting clusters are clinically meaningful and statistically reproducible.

My work in this area combines unsupervised learning on heterogeneous patient data with survival analysis and multi-state models, so that discovered phenotypes are anchored to outcomes that matter (event-free survival, transitions between disease states, treatment success). The aim is interpretable subgroups that clinicians can recognise and act on.
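As an illustration of anchoring clusters to outcomes, the sketch below computes a Kaplan-Meier survival curve separately for each cluster. It is a minimal standard-library version for exposition, not the analysis code behind this work. The cluster labels are assumed to come from an earlier unsupervised step, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = event observed, 0 = censored. Subjects censored at time t
    are still counted as at risk at t.
    """
    n_events: Dict[float, int] = defaultdict(int)
    n_leaving: Dict[float, int] = defaultdict(int)
    for t, e in zip(times, events):
        n_leaving[t] += 1
        n_events[t] += int(e)

    at_risk = len(times)
    survival = 1.0
    curve: Curve = []
    for t in sorted(n_leaving):
        d = n_events[t]
        if d > 0:
            survival *= 1.0 - d / at_risk  # product-limit update
            curve.append((t, survival))
        at_risk -= n_leaving[t]
    return curve


def km_by_cluster(times: Sequence[float], events: Sequence[int],
                  clusters: Sequence[Hashable]) -> Dict[Hashable, Curve]:
    """Compute one Kaplan-Meier curve per cluster label."""
    grouped: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(
        lambda: ([], []))
    for t, e, c in zip(times, events, clusters):
        grouped[c][0].append(t)
        grouped[c][1].append(e)
    return {c: kaplan_meier(ts, es) for c, (ts, es) in grouped.items()}


if __name__ == "__main__":
    # Toy follow-up times, event indicators, and cluster assignments.
    t = [1, 2, 2, 3, 4, 5, 6, 8]
    e = [1, 1, 0, 0, 1, 0, 1, 0]
    c = ["p1", "p1", "p1", "p1", "p1", "p2", "p2", "p2"]
    for label, curve in km_by_cluster(t, e, c).items():
        print(label, curve)
```

Curves that separate clearly between clusters are a first hint that a phenotype tracks an outcome. A formal comparison would add a log-rank test or a regression model, and multi-state outcomes need more than a single survival curve.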

04

Genomics & metagenomics analyses

microbial community networks · structural & somatic variants · mutational signatures · CRISPR-Cas9

Microbial community profiles, somatic variant calls, and mutational signatures share a statistical profile: the data are high-dimensional, sparse, and structured. Network-based modelling of microbial communities offers a route to pathogen detection that respects ecological context, while mutational signature analysis turns variant catalogues into hypotheses about underlying biological processes.

I also maintain an interest in computational tools for CRISPR-Cas9 genome editing, particularly methods to compare nuclease activity across clinically relevant contexts. The thread linking these projects is the same: applying careful quantitative methods to genomic data where signal is easily lost in noise.
