Research
Four research threads I am currently pulling. They are connected by a single belief: that quantitative methods, properly applied, can make measurable progress on real biomedical problems.
01
Antimicrobial resistance is one of the defining clinical challenges of our generation. By the time culture-based diagnostics return a phenotype, the patient has often already been on broad-spectrum empirical therapy for days. The question I work on is whether machine learning, applied to data already produced in the lab (mass spectra, microbiology results, electronic health records), can shorten that loop.
My focus is on MALDI-TOF mass spectrometry as a low-cost, high-throughput substrate for resistance prediction. The signal is rich but noisy, labels are imbalanced, and the data spans heterogeneous clinical sites and patient populations. The work involves clinical and demographic feature integration, calibrated probability outputs, threshold optimisation for clinical decision-making, and evaluation grounded in clinical utility rather than benchmark performance.
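To make the threshold-optimisation step concrete, here is a minimal sketch of one common approach: picking the probability cut-off that minimises total misclassification cost when a missed resistant isolate is assumed to be costlier than a false resistance call. The function name `best_threshold`, the 5:1 cost ratio, and the toy data are all hypothetical and chosen only for illustration. In practice the costs would come from clinical input, and the probabilities would need to be calibrated first.

```python
from typing import Sequence, Tuple


def best_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    cost_fn: float = 5.0,  # assumed cost of missing a resistant isolate
    cost_fp: float = 1.0,  # assumed cost of a false resistance call
) -> Tuple[float, float]:
    """Return (threshold, total cost) minimising misclassification cost.

    An isolate is called resistant when its predicted probability >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    # Candidate cut-offs: every observed probability, plus one above 1.0
    # that corresponds to calling nothing resistant.
    candidates = sorted(set(y_prob)) + [1.01]
    best = None
    for t in candidates:
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy labels (1 = resistant) and predicted probabilities, illustrative only.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.2, 0.3, 0.4, 0.7, 0.9]
threshold, cost = best_threshold(y_true, y_prob)
```

The cut-off should be chosen on a validation split rather than on the data used to report performance, otherwise the estimated utility is optimistic.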
Two methodological threads run through this work. The first is cross-site harmonisation: different instruments, calibration schedules, and operating procedures introduce systematic variation that ML models will happily memorise instead of learning the biology, so I work on ComBat-style corrections, batch-mixing diagnostics, and validation strategies that give an honest estimate of how models generalise across sites. The second is generative modelling of mass spectra, to address labelled-data scarcity and broaden the regime in which deep models become viable on clinical MALDI-TOF data.
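One simple validation strategy for the cross-site problem is leave-one-site-out splitting, where each site is held out in turn so that the model is always evaluated on an instrument and protocol it never saw in training. The sketch below shows the idea in plain Python. The function name `leave_one_site_out` and the site labels are hypothetical, and this illustrates the general technique rather than any specific pipeline.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_site_out(
    sites: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out site, train indices, test indices) for each site.

    `sites[i]` is the acquisition site of sample i. No sample from the
    held-out site ever appears in the corresponding training indices.
    """
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


# Toy example: five spectra acquired at three hypothetical sites.
sites = ["A", "A", "B", "C", "B"]
splits = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

A large gap between within-site and held-out-site performance suggests the model is relying on site-specific artefacts. scikit-learn's `LeaveOneGroupOut` provides the same kind of split for use with its cross-validation utilities.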
02
Infectious disease is more than the resistance phenotype of a single isolate. Individual patients carry risk that depends on phenotype, immune status, and clinical trajectory; at the population scale, pathogens evolve, spread, and circulate through environments and communities. The two views share the same quantitative vocabulary: high-dimensional structured data, time-to-event outcomes, and dynamics that resist simple summaries.
At the patient scale, I work on computational phenotyping: unsupervised learning on heterogeneous clinical data combined with survival analysis and multi-state models, so that fragile populations (transplant recipients, critically ill patients) are stratified by outcomes that matter (event-free survival, transitions between disease states, treatment success). At the population scale, the work spans pathogen surveillance through metagenomic monitoring of circulating strains, and epidemic modelling for outbreak dynamics.
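As a small illustration of the time-to-event side, the sketch below implements the Kaplan-Meier estimator, the standard non-parametric estimate of a survival curve under right censoring. It is written in plain Python so that every step is visible. The function name `kaplan_meier` and the toy follow-up data are hypothetical, and a real analysis would use a maintained survival library and report confidence intervals.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, estimated survival probability) at each event time.

    `events[i]` is 1 if the event was observed at `times[i]` and 0 if the
    patient was censored at that time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up times (e.g. months) with two censored patients.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the at-risk count until they leave observation, which is what distinguishes this estimate from a naive proportion of patients who survived.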
03
Genomic data carries signal that is easily lost in noise: structural variants spanning complex regions, somatic mutations diluted by tumour heterogeneity, mutational signatures that need careful deconvolution to be interpretable. Each step from raw reads to a clinically usable variant is a small statistical decision that compounds with the next.
My work in this area builds methods and tooling to support structural and somatic variant discovery from short- and long-read sequencing, and to make mutational signature analysis tractable in clinically relevant genomic contexts.
04
A recurring part of my work is building tools: small, focused, open-source libraries that each address a specific biomedical question and can be reused by others. Some are born inside a specific project and grow into standalone packages; others start as standalone tools designed to plug into pipelines.