Train Command#

The train command trains models on the entire dataset using GridSearchCV for hyperparameter tuning, then saves the best model for each model-target combination.

Usage#

respredai train --config <path_to_config.ini> [options]

Options#

Required#

  • --config, -c - Path to the configuration file (INI format)

Optional#

  • --quiet, -q - Suppress banner and progress output

  • --models, -m - Override models (comma-separated)

  • --targets, -t - Override targets (comma-separated)

  • --output, -o - Override output folder

  • --seed, -s - Override random seed

How It Differs from run#

Aspect

run Command

train Command

Purpose

Evaluate model performance

Train models for cross-dataset validation

CV Strategy

Nested CV (outer + inner)

Single CV (only for HP tuning)

Data Split

Multiple train/test splits

Uses entire dataset

Output

Metrics, confusion matrices

Trained model files, ready for evaluation on another dataset

Model Files

Per-fold models (optional)

Single model per target

Configuration Parameters#

The train command uses the same configuration file as run, but some parameters are ignored:

Parameter

Used

Notes

data_path

Yes

Path to training data

targets

Yes

Target columns to train

continuous_features

Yes

Features to scale

group_column

Yes

Used for grouped CV during HP tuning

models

Yes

Models to train

inner_folds

Yes

CV folds for hyperparameter tuning

outer_folds

No

Only used by run for nested CV

calibrate_threshold

Yes

Enables threshold optimization

threshold_method

Yes

Method for threshold optimization

threshold_objective

Yes

Objective function for threshold optimization

calibrate_probabilities

Yes

Enables probability calibration

probability_calibration_method

Yes

Calibration method (sigmoid or isotonic)

probability_calibration_cv

Yes

CV folds for probability calibration

seed

Yes

Random seed for reproducibility

n_jobs

Yes

Parallel jobs

out_folder

Yes

Output directory

[ModelSaving] enable

No

Always true for train (models are always saved)

[ModelSaving] compression

Yes

Compression level for saved models

Output Structure#

output_folder/
├── trained_models/
│   ├── LR_Target1.joblib
│   ├── LR_Target2.joblib
│   ├── RF_Target1.joblib
│   ├── ...
│   └── training_metadata.json
└── reproducibility.json                  # Reproducibility manifest

Model Bundle Contents#

Each .joblib file contains:

{
    "model": <fitted_classifier>,
    "transformer": <fitted_scaler>,
    "threshold": 0.42,                  # Calibrated threshold
    "hyperparams": {"C": 0.1, ...},
    "feature_names": [...],             # Original feature names
    "feature_names_transformed": [...], # After one-hot encoding
    "target_name": "Target1",
    "model_name": "LR",
    "training_timestamp": "2025-12-11T..."
}

training_metadata.json#

Contains information needed for evaluation on new data:

{
    "features": ["age", "sex", "category_col"],
    "continuous_features": ["age"],
    "categorical_features": ["sex", "category_col"],
    "targets": ["Target1", "Target2"],
    "feature_names_transformed": ["age", "sex_M", "category_col_A", "..."],
    "feature_dtypes": {"age": "float64", "sex": "object"},
    "training_data_path": "data.csv",
    "training_timestamp": "2025-12-11T..."
}

Example Workflow#

# 1. Train models
respredai train --config config.ini --output ./trained_output/

# 2. Later, evaluate on new data
respredai evaluate \
    --models-dir ./trained_output/trained_models \
    --data new_patients.csv \
    --output ./evaluation_results/

Calibration#

Probability Calibration (if calibrate_probabilities = true):

  1. Wraps best estimator in CalibratedClassifierCV

  2. Fits on full dataset with internal CV

  3. Saves calibrated model

Threshold Optimization (if calibrate_threshold = true):

  1. Runs GridSearchCV to find best hyperparameters

  2. Selects method: OOF (< 1000 samples) or CV (>= 1000 samples)

  3. OOF: Gets out-of-fold predictions, optimizes threshold using configured objective

  4. CV: Uses TunedThresholdClassifierCV for integrated threshold optimization

  5. Saves optimized threshold with the model

Both calibrations are automatically applied during evaluation.

See Also#