# Train Command

The `train` command trains models on the entire dataset, using `GridSearchCV` for hyperparameter tuning, and then saves the best model for each model-target combination.
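The sketch below illustrates the "one tuned model per model-target combination" idea with plain scikit-learn. It is not ResPredAI's implementation: the `MODEL_GRIDS` registry, its parameter grids, the `roc_auc` scoring metric, and the `train_all` helper are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical registry: model code -> (base estimator, hyperparameter grid)
MODEL_GRIDS = {
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [20, 50]}),
}


def train_all(X, targets, n_folds=5, seed=0):
    """Tune every model for every target on the full dataset; keep the best estimator."""
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    best = {}
    for model_name, (estimator, grid) in MODEL_GRIDS.items():
        for target_name, y in targets.items():
            # GridSearchCV clones `estimator`, so reusing it across targets is safe.
            search = GridSearchCV(estimator, grid, cv=cv, scoring="roc_auc")  # metric assumed
            search.fit(X, y)
            # refit=True (the default) refits the winning configuration on all rows.
            best[(model_name, target_name)] = search.best_estimator_
    return best


X, y1 = make_classification(n_samples=200, random_state=0)
y2 = (X[:, 0] > 0).astype(int)  # a second, synthetic binary target
models = train_all(X, {"Target1": y1, "Target2": y2})
```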
## Usage

```bash
respredai train --config <path_to_config.ini> [options]
```
## Options

### Required

| Option | Description |
|---|---|
| `--config`, `-c` | Path to the configuration file (INI format) |

### Optional

| Option | Description |
|---|---|
| `--quiet`, `-q` | Suppress banner and progress output |
| `--models`, `-m` | Override models (comma-separated) |
| `--targets`, `-t` | Override targets (comma-separated) |
| `--output`, `-o` | Override output folder |
| `--seed`, `-s` | Override random seed |
## How It Differs from `run`

| Aspect | `run` | `train` |
|---|---|---|
| Purpose | Evaluate model performance | Train models for cross-dataset validation |
| CV strategy | Nested CV (outer + inner) | Single CV (hyperparameter tuning only) |
| Data split | Multiple train/test splits | Uses the entire dataset |
| Output | Metrics, confusion matrices | Trained model files, ready for evaluation on another dataset |
| Model files | Per-fold models (optional) | Single model per target |
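The difference in CV strategy can be illustrated with generic scikit-learn code (not ResPredAI internals; the dataset, grid, and fold counts are arbitrary). Nested CV scores a fresh tuning run on each outer split, whereas `train`-style tuning runs once on all rows and keeps the refit estimator.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=42)
grid = {"C": [0.01, 0.1, 1.0]}
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=inner)

# run-style: nested CV. cross_val_score clones `search`, so each outer split
# gets its own inner tuning run and a held-out performance estimate.
nested_scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")

# train-style: a single tuning run on every row; the refit model is the deliverable.
final_model = search.fit(X, y).best_estimator_
```

The nested scores estimate generalization; the single fitted model has no held-out estimate of its own, which is why `train` output is meant to be evaluated on another dataset.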
## Configuration Parameters

The `train` command uses the same configuration file as `run`, but ignores some parameters:

| Parameter | Used | Notes |
|---|---|---|
| | Yes | Path to training data |
| | Yes | Target columns to train |
| | Yes | Features to scale |
| | Yes | Used for grouped CV during hyperparameter tuning |
| | Yes | Models to train |
| | Yes | CV folds for hyperparameter tuning |
| | No | Only used by |
| `calibrate_threshold` | Yes | Enables threshold optimization |
| | Yes | Method for threshold optimization |
| | Yes | Objective function for threshold optimization |
| `calibrate_probabilities` | Yes | Enables probability calibration |
| | Yes | Calibration method (`sigmoid` or `isotonic`) |
| | Yes | CV folds for probability calibration |
| | Yes | Random seed for reproducibility |
| | Yes | Parallel jobs |
| | Yes | Output directory |
| | No | Always |
| | Yes | Compression level for saved models |
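When a group column is configured, hyperparameter tuning should keep all rows of one group on the same side of every split. This is a minimal sketch with scikit-learn's `GroupKFold`; whether ResPredAI uses `GroupKFold` or a stratified variant is an assumption, and the synthetic "30 patients, 4 samples each" data is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
n = 120
X = rng.normal(size=(n, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
groups = np.repeat(np.arange(30), 4)  # hypothetical: 30 patients, 4 samples each

cv = GroupKFold(n_splits=5)
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=cv)
search.fit(X, y, groups=groups)  # `groups` is forwarded to the splitter

# Sanity check: no group appears on both sides of any split.
for train_idx, test_idx in cv.split(X, y, groups):
    assert not set(groups[train_idx]) & set(groups[test_idx])
```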
## Output Structure

```text
output_folder/
├── trained_models/
│   ├── LR_Target1.joblib
│   ├── LR_Target2.joblib
│   ├── RF_Target1.joblib
│   ├── ...
│   └── training_metadata.json
└── reproducibility.json   # Reproducibility manifest
```
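The `evaluate` command reads this folder for you. To inspect the output outside the CLI, a sketch like the following could work, assuming each `.joblib` file is the dictionary described under Model Bundle Contents. The `load_trained_models` helper is hypothetical; it keys bundles by their stored `model_name` and `target_name` rather than parsing file names, since target names may contain underscores.

```python
import json
from pathlib import Path

import joblib


def load_trained_models(models_dir):
    """Load training_metadata.json and every *.joblib bundle from a trained_models/ folder."""
    models_dir = Path(models_dir)
    metadata = json.loads((models_dir / "training_metadata.json").read_text())
    bundles = {}
    for path in sorted(models_dir.glob("*.joblib")):
        bundle = joblib.load(path)
        # Use the names stored inside the bundle instead of splitting the file name.
        bundles[(bundle["model_name"], bundle["target_name"])] = bundle
    return metadata, bundles


# Usage (path from the Example Workflow):
# metadata, bundles = load_trained_models("./trained_output/trained_models")
```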
### Model Bundle Contents

Each `.joblib` file contains:

```python
{
    "model": <fitted_classifier>,
    "transformer": <fitted_scaler>,
    "threshold": 0.42,                      # Calibrated threshold
    "hyperparams": {"C": 0.1, ...},
    "feature_names": [...],                 # Original feature names
    "feature_names_transformed": [...],     # After one-hot encoding
    "target_name": "Target1",
    "model_name": "LR",
    "training_timestamp": "2025-12-11T..."
}
```
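The sketch below builds a bundle with a subset of the keys above, saves it with joblib compression, and applies it to data. Several details are assumptions rather than documented behavior: the scaler is modeled as a `ColumnTransformer` that scales continuous columns and passes one-hot columns through, the config's compression level is assumed to map to joblib's `compress` (0 to 9), and `make_bundle` / `predict_with_bundle` are hypothetical helpers.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def make_bundle(X, y, continuous, threshold, target_name, model_name="LR", C=0.1):
    # Hypothetical transformer: scale continuous columns, pass the rest through.
    transformer = ColumnTransformer([("scale", StandardScaler(), continuous)],
                                    remainder="passthrough")
    model = LogisticRegression(C=C, max_iter=1000).fit(transformer.fit_transform(X), y)
    return {
        "model": model,
        "transformer": transformer,
        "threshold": threshold,
        "hyperparams": {"C": C},
        "feature_names_transformed": list(X.columns),
        "target_name": target_name,
        "model_name": model_name,
        "training_timestamp": datetime.now(timezone.utc).isoformat(),
    }


def predict_with_bundle(bundle, X_new):
    X = X_new[bundle["feature_names_transformed"]]  # enforce the training column order
    proba = bundle["model"].predict_proba(bundle["transformer"].transform(X))[:, 1]
    return (proba >= bundle["threshold"]).astype(int)  # stored threshold, not 0.5


X = pd.DataFrame({"age": [25, 40, 61, 33, 58, 47], "sex_M": [1, 0, 1, 0, 1, 0]})
y = np.array([0, 0, 1, 0, 1, 1])
bundle = make_bundle(X, y, ["age"], threshold=0.42, target_name="Target1")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "LR_Target1.joblib"
    joblib.dump(bundle, path, compress=3)  # compression level 0-9
    preds = predict_with_bundle(joblib.load(path), X)
```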
### training_metadata.json

Contains the information needed for evaluation on new data:

```json
{
  "features": ["age", "sex", "category_col"],
  "continuous_features": ["age"],
  "categorical_features": ["sex", "category_col"],
  "targets": ["Target1", "Target2"],
  "feature_names_transformed": ["age", "sex_M", "category_col_A", "..."],
  "feature_dtypes": {"age": "float64", "sex": "object"},
  "training_data_path": "data.csv",
  "training_timestamp": "2025-12-11T..."
}
```
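One way this metadata can be used is to align a new dataset with the training layout before prediction. The following is a minimal sketch assuming pandas `get_dummies` naming (`column_value`), which matches names such as `sex_M` above; the `align_to_training` helper is hypothetical and is not necessarily how `evaluate` encodes data.

```python
import pandas as pd


def align_to_training(df, metadata):
    """Check required columns, one-hot encode, and reorder to the training layout."""
    missing = [c for c in metadata["features"] if c not in df.columns]
    if missing:
        raise ValueError(f"New data is missing training features: {missing}")
    encoded = pd.get_dummies(df[metadata["features"]],
                             columns=metadata["categorical_features"])
    # Categories unseen in training are dropped; training categories absent here become 0.
    aligned = encoded.reindex(columns=metadata["feature_names_transformed"], fill_value=0)
    return aligned.astype(float)


metadata = {
    "features": ["age", "sex", "category_col"],
    "categorical_features": ["sex", "category_col"],
    "feature_names_transformed": ["age", "sex_M", "category_col_A", "category_col_B"],
}
new = pd.DataFrame({"age": [30, 40], "sex": ["F", "M"], "category_col": ["B", "C"]})
X_new = align_to_training(new, metadata)
```

Here the unseen category `C` is silently dropped, which is usually what a fixed-width model requires, although it may hide data drift.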
## Example Workflow

```bash
# 1. Train models
respredai train --config config.ini --output ./trained_output/

# 2. Later, evaluate on new data
respredai evaluate \
    --models-dir ./trained_output/trained_models \
    --data new_patients.csv \
    --output ./evaluation_results/
```
## Calibration

Probability Calibration (if `calibrate_probabilities = true`):

1. Wraps the best estimator in `CalibratedClassifierCV`
2. Fits it on the full dataset with internal CV
3. Saves the calibrated model
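The three steps above map onto standard scikit-learn calls. This sketch uses an arbitrary random-forest grid and `method="sigmoid"` (the config also allows `isotonic`); the data and fold counts are illustrative, not ResPredAI defaults.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, random_state=0)

# Hyperparameter search on the full dataset.
search = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=0),
                      {"max_depth": [3, None]}, cv=5)
best = search.fit(X, y).best_estimator_

# Steps 1-2: wrap the tuned estimator. With an integer cv, CalibratedClassifierCV
# fits clones of it on internal training folds and calibrates on the held-out folds.
calibrated = CalibratedClassifierCV(best, method="sigmoid", cv=5).fit(X, y)
proba = calibrated.predict_proba(X)[:, 1]
```

With the default `ensemble=True`, the calibrated model averages one calibrated classifier per internal fold.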
Threshold Optimization (if `calibrate_threshold = true`):

1. Runs `GridSearchCV` to find the best hyperparameters
2. Selects the method: OOF (< 1000 samples) or CV (>= 1000 samples)
   - OOF: obtains out-of-fold predictions and optimizes the threshold using the configured objective
   - CV: uses `TunedThresholdClassifierCV` for integrated threshold optimization
3. Saves the optimized threshold with the model
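A minimal sketch of the sample-size switch described above, assuming balanced accuracy as the objective and a 0.05 to 0.95 candidate grid (both hypothetical, as is the `optimize_threshold` name). The CV branch requires scikit-learn 1.5 or later, where `TunedThresholdClassifierCV` was introduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def optimize_threshold(estimator, X, y, objective=balanced_accuracy_score,
                       cv=5, seed=0, oof_cutoff=1000):
    """Choose a decision threshold with the OOF or CV strategy based on sample size."""
    if len(y) < oof_cutoff:
        # OOF: each sample is scored by a model that never saw it during fitting.
        splitter = StratifiedKFold(n_splits=cv, shuffle=True, random_state=seed)
        oof = cross_val_predict(estimator, X, y, cv=splitter, method="predict_proba")[:, 1]
        grid = np.linspace(0.05, 0.95, 91)  # hypothetical candidate thresholds
        scores = [objective(y, (oof >= t).astype(int)) for t in grid]
        return float(grid[int(np.argmax(scores))])
    # CV: delegate to scikit-learn (imported lazily; needs scikit-learn >= 1.5).
    from sklearn.model_selection import TunedThresholdClassifierCV
    tuned = TunedThresholdClassifierCV(estimator, scoring=make_scorer(objective), cv=cv)
    return float(tuned.fit(X, y).best_threshold_)


X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
threshold = optimize_threshold(LogisticRegression(max_iter=1000), X, y)
```

On imbalanced data like this, the selected threshold typically moves away from 0.5, which is the point of storing it alongside the model.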
Both calibrations are automatically applied during evaluation.
## See Also

- Evaluate Command - Apply trained models to new data
- Run Command - Full nested CV pipeline for model evaluation
- Create Config Command - Configuration file setup
- Validate Config Command - Validate configuration before training