Projects - Ettore Rocchi

Flagship ecosystem

MaldiSuite

GitHub ↗ Docs ↗

A Python ecosystem for MALDI-TOF spectral processing and analysis in antimicrobial resistance (AMR) research. It comprises three scikit-learn-compatible packages that chain into an end-to-end clinical AMR pipeline.

MaldiAMRKit

Preprocess

Smoothing, baseline correction, peak detection (including persistent homology), alignment strategies, and flexible binning. Parallelised, scikit-learn-integrated, and unopinionated about which method you choose.
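To make the preprocessing steps concrete, here is a minimal NumPy sketch of smoothing, baseline subtraction, and fixed-width binning on a synthetic spectrum. It does not use the MaldiAMRKit API: the function names, the moving-average smoother, the rolling-minimum baseline, and the 3 Da bin width are all illustrative assumptions.

```python
import numpy as np


def moving_average(intensity, window=5):
    """Smooth with a centred moving average (window is assumed odd)."""
    kernel = np.ones(window) / window
    padded = np.pad(intensity, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def subtract_rolling_min_baseline(intensity, window=51):
    """Estimate the baseline as a rolling minimum and subtract it."""
    half = window // 2
    padded = np.pad(intensity, half, mode="edge")
    baseline = np.array(
        [padded[i:i + window].min() for i in range(intensity.size)]
    )
    return intensity - baseline


def bin_spectrum(mz, intensity, mz_min, mz_max, bin_width=3.0):
    """Sum intensities into fixed-width m/z bins (fixed-length feature vector)."""
    edges = np.arange(mz_min, mz_max + bin_width, bin_width)
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    return binned


# Synthetic spectrum: a flat offset of 10 plus one Gaussian peak at m/z 2050.
mz = np.linspace(2000.0, 2100.0, 101)
intensity = 10.0 + 100.0 * np.exp(-0.5 * ((mz - 2050.0) / 2.0) ** 2)

smoothed = moving_average(intensity, window=5)
corrected = subtract_rolling_min_baseline(smoothed, window=51)
features = bin_spectrum(mz, corrected, mz_min=2000.0, mz_max=2102.0)
print(features.shape)
```

Each step is a plain function on arrays, so any of them can be swapped for another method without touching the rest of the chain.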

github.com/EttoreRocchi/MaldiAMRKit ↗

MaldiBatchKit

Harmonise

Batch-effect correction tailored to MALDI-TOF data, with leakage-safe cross-validation handling and diagnostics that quantify residual structure after correction. Multi-site by design.
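The idea of leakage-safe correction can be sketched as follows: batch parameters are estimated on the training fold only and then applied unchanged to held-out samples. The `BatchCenterer` class below is hypothetical and uses plain per-batch mean centring as a stand-in; it is not the MaldiBatchKit API, and it is far simpler than the methods the package is likely to implement.

```python
import numpy as np


class BatchCenterer:
    """Per-batch mean centring (illustrative stand-in, not a real package API)."""

    def fit(self, X, batch):
        X = np.asarray(X, dtype=float)
        batch = np.asarray(batch)
        # Parameters are learned from the data passed to fit() only.
        self.grand_mean_ = X.mean(axis=0)
        self.batch_means_ = {
            b: X[batch == b].mean(axis=0) for b in np.unique(batch)
        }
        return self

    def transform(self, X, batch):
        X = np.asarray(X, dtype=float).copy()
        batch = np.asarray(batch)
        for b in np.unique(batch):
            if b not in self.batch_means_:
                raise ValueError(f"batch {b!r} was not seen during fit")
            mask = batch == b
            # Shift each batch so its training mean matches the grand mean.
            X[mask] += self.grand_mean_ - self.batch_means_[b]
        return X


# Training fold: two batches with very different offsets.
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 22.0]])
batch_train = np.array([0, 0, 1, 1])

# Held-out fold: corrected with the training-fold parameters, never refitted.
X_test = np.array([[2.0, 3.0], [11.0, 21.0]])
batch_test = np.array([0, 1])

centerer = BatchCenterer().fit(X_train, batch_train)
X_train_corrected = centerer.transform(X_train, batch_train)
X_test_corrected = centerer.transform(X_test, batch_test)
```

Fitting on the full dataset before splitting would let held-out samples influence the batch estimates, which is the leakage this pattern avoids.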

github.com/EttoreRocchi/MaldiBatchKit ↗

MaldiDeepKit

Classify

Deep learning models for MALDI-TOF spectra, with calibrated outputs and clinically oriented evaluation (decision thresholds, cost-sensitive metrics, cross-site validation).

github.com/EttoreRocchi/MaldiDeepKit ↗

Python scikit-learn PyTorch PyPI MALDI-TOF AMR

Clinical ML frameworks

End-to-end frameworks for clinical questions

Self-contained ML frameworks built around a specific clinical question. Each one ships as an installable package with a single configuration file, runs end-to-end from raw inputs to reproducible results and figures, and has been used in peer-reviewed work.

Framework

ResPredAI

GitHub ↗

A reproducible ML framework for predicting antimicrobial resistance from clinical and microbiology data.

Built around nested cross-validation with proper hyperparameter selection, nine model architectures (linear, tree-based, kernel methods, neural networks), probability calibration, and decision-threshold optimization. The whole pipeline is configured via a single .ini file and produces reproducible results, including calibration plots, performance tables, and SHAP-based interpretability reports.
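For readers unfamiliar with the pattern, nested cross-validation can be sketched in plain scikit-learn as below. This is not the ResPredAI code or its configuration format: the synthetic dataset, the logistic-regression pipeline, and the parameter grid are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for clinical features).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Inner loop: selects hyperparameters. Outer loop: estimates performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="roc_auc",
)

# Each outer fold refits the whole search, so the outer test data never
# influences scaling or hyperparameter choice.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(outer_scores.mean())
```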

Used in: Bonazzetti C, Rocchi E, et al. npj Digital Medicine, 2025

Python XGBoost PyTorch CLI PyPI AMR

Framework

phenocluster

GitHub ↗

An unsupervised framework for clinical phenotype discovery with survival and multi-state outcomes.

Combines unsupervised clustering with survival analysis and multi-state trajectory modelling to surface clinically meaningful patient subgroups from heterogeneous cohorts. Designed for prognostic interpretation: every cluster comes with hazard ratios, transition probabilities, statistical reports, and graphical summaries.

Built for: heterogeneous patient cohorts where outcomes are time-to-event or multi-state

Python clustering survival multi-state phenotyping

Standalone tools

Single-purpose libraries & bioinformatic tools

Focused tools that solve one problem well. Some plug into the frameworks above, others stand on their own and are used by collaborators across labs.

combatlearn GitHub ↗

Scikit-learn compatible ComBat batch-effect correction.

Integrates the Johnson (original ComBat), Fortin (neuroComBat), and Chen (CovBat) harmonization methods into scikit-learn pipelines, with leakage-safe cross-validation handling. Plugs into existing workflows without breaking the scikit-learn API contract.

Python scikit-learn PyPI ComBat

nestkit GitHub ↗

Nested cross-validation with calibration, threshold optimization, and statistical tests.

A toolkit for the kind of model evaluation that survives peer review: nested CV with proper hyperparameter selection, probability calibration, decision-threshold tuning, and built-in statistical comparisons between models.
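As an illustration of decision-threshold tuning, the sketch below picks the probability cut-off that maximises Youden's J (sensitivity + specificity - 1). The function name and the choice of Youden's J are assumptions for the example; nestkit may use different criteria and a different interface.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    Predictions are positive when the probability is >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_prob)):
        true_pos = sum(
            1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold
        )
        true_neg = sum(
            1 for t, p in zip(y_true, y_prob) if t == 0 and p < threshold
        )
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy held-out predictions; in a nested CV these would come from inner folds.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
threshold, j_statistic = youden_threshold(labels, probabilities)
```

The threshold should be chosen on data that is separate from the final test fold, otherwise the reported performance is optimistically biased.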

Python scikit-learn nested CV calibration

CATS GitHub ↗

Automated Cas9 PAM-compatibility comparison with ClinVar integration.

A bioinformatic tool for comparing Cas9 nucleases across clinically relevant genomic contexts. Detects overlapping PAM sites between variants and identifies allele-specific targets arising from pathogenic mutations. Published in Frontiers in Genome Editing (2025).
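The notion of an allele-specific target can be sketched with a few lines of standard-library Python: scan the reference and variant sequences for PAM matches and keep the sites that exist only in the variant. This is not the CATS implementation. The function names are hypothetical, only the forward strand is scanned, and the variant is assumed to be a substitution so that positions are comparable.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]",
    "V": "[ACG]",
}


def pam_positions(sequence, pam="NGG"):
    """Return the 0-based start positions of all (overlapping) PAM matches."""
    pattern = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead lets matches overlap, e.g. several NGG sites in 'GGGG'.
    return {m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())}


def allele_specific_pams(reference, variant, pam="NGG"):
    """PAM sites present in the variant allele but absent from the reference."""
    return sorted(pam_positions(variant, pam) - pam_positions(reference, pam))


reference = "ACGTAGCTA"
variant = "ACGTGGCTA"  # A>G substitution at index 4 creates a TGG site

new_sites = allele_specific_pams(reference, variant, pam="NGG")

# Other nucleases are compared by changing the PAM, e.g. NNGRRT for SaCas9.
sa_sites = pam_positions("AAGAAT", pam="NNGRRT")
```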

Python CRISPR genomics bioinformatics

APOBECSeeker GitHub ↗

APOBEC-style mutation identification from multiple sequence alignments.

Detects mutational patterns consistent with APOBEC enzyme activity from multiple sequence alignments, supporting downstream statistical description of mutational signatures.
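A minimal sketch of this kind of detection, assuming the commonly described APOBEC3 preference for C>T changes in a 5'-TC context (seen as G>A in a 5'-GA context when the edit is on the opposite strand). The function name and the exact context rule are assumptions for illustration and do not reflect how APOBECSeeker is implemented.

```python
def apobec_like_sites(reference, sample):
    """List (position, change) for APOBEC-like substitutions.

    Both inputs are rows of the same alignment, so they have equal length.
    Gap characters never match a base, so gapped columns are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")

    reference = reference.upper()
    sample = sample.upper()
    hits = []
    for i, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        # Forward strand: C>T where the preceding reference base is T (TpC).
        if ref_base == "C" and alt_base == "T":
            if i > 0 and reference[i - 1] == "T":
                hits.append((i, "C>T"))
        # Reverse strand: G>A where the following reference base is A (GpA).
        elif ref_base == "G" and alt_base == "A":
            if i + 1 < len(reference) and reference[i + 1] == "A":
                hits.append((i, "G>A"))
    return hits


reference_row = "ATCGGATCA"
sample_row = "ATTGAATCA"  # differs from the reference at indices 2 and 4
sites = apobec_like_sites(reference_row, sample_row)
```

Counting these hits per sample against all C>T and G>A changes gives a simple enrichment figure for downstream statistics.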

Python mutational signatures APOBEC MSA

CAMISIM-BrokenStick GitHub ↗

Broken-stick model extension for metagenomic simulation.

Extends the CAMISIM metagenomic simulator with a broken-stick abundance model and a configurable number of strains, producing synthetic communities with controlled relative-abundance distributions for benchmarking metagenomic pipelines.
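For context, the classic broken-stick model draws relative abundances by breaking a unit-length stick at random points and using the piece lengths as proportions. The sketch below uses only the standard library and is an assumption about the general technique, not a description of how CAMISIM-BrokenStick parameterises it; the function name is hypothetical.

```python
import random


def broken_stick_abundances(n_taxa, seed=None):
    """Relative abundances from a unit stick broken at n_taxa - 1 random points."""
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    rng = random.Random(seed)
    breaks = sorted(rng.random() for _ in range(n_taxa - 1))
    edges = [0.0] + breaks + [1.0]
    pieces = [right - left for left, right in zip(edges, edges[1:])]
    # Rank from most to least abundant, as in a rank-abundance curve.
    return sorted(pieces, reverse=True)


# Ten strains with a fixed seed so the synthetic community is reproducible.
abundances = broken_stick_abundances(10, seed=42)
```

Because the pieces partition the unit interval, the abundances are non-negative and sum to 1 by construction.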

Python metagenomics simulation benchmarking
