Create Config Command#

The create-config command generates a template configuration file that you can customize for your data.

Usage#

respredai create-config <output_path.ini>

Options#

Required#

  • output_path - Path where the template configuration file will be created

    • Must end with .ini extension

    • Parent directory must exist or be creatable

    • File will be overwritten if it already exists

Description#

This command creates a ready-to-use configuration template with all required sections pre-populated and inline comments explaining each parameter.

The generated template follows the INI format required by the run command.

Generated Template#

The command creates a file with the following structure:

[Data]
data_path = ./data/my_data.csv
targets = Target1,Target2
continuous_features = Feature1,Feature2

[Pipeline]
# Available models: LR, MLP, XGB, RF, CatBoost, TabPFN, RBF_SVC, Linear_SVC, KNN
models = LR,XGB,RF
outer_folds = 5
inner_folds = 3
outer_cv_repeats = 1
calibrate_threshold = false
threshold_method = auto
calibrate_probabilities = false
probability_calibration_method = sigmoid
probability_calibration_cv = 5
# confidence_level = 0.95
# n_bootstrap = 1000

[Reproducibility]
seed = 42

[Log]
# Verbosity levels: 0 = no log, 1 = basic logging, 2 = detailed logging
verbosity = 1
log_basename = respredai.log

[Resources]
# Number of parallel jobs (-1 uses all available cores)
n_jobs = -1

# [Uncertainty]
# Miscoverage rate for conformal prediction (default 0.1 = 90% coverage)
# alpha = 0.1

[Preprocessing]
ohe_min_frequency = 0.05

# [Imputation]
# method = none  # none, simple, knn, or iterative
# strategy = mean  # For simple: mean, median, most_frequent
# n_neighbors = 5  # For knn
# estimator = bayesian_ridge  # For iterative: bayesian_ridge or random_forest

[ModelSaving]
# Enable model saving for resuming interrupted runs
enable = true
# Compression level for saved models (1-9, higher = more compression but slower)
compression = 3

[Output]
out_folder = ./output/

# [Metadata]
# temporal_column = collection_date
# group_column = PatientID
# subgroup_columns = Ward,Specimen

# [Validation]
# Validation strategy: cv (default), temporal (prospective-style), or both
# validation_strategy = cv
# temporal_split_date = 2023-01-01
# temporal_split_ratio = 0.8

Customization Steps#

After generating the template, customize it for your data.

Note

Optional parameters can be disabled by commenting out the line with #. Empty values (e.g., group_column =) are treated as absent.

1. Update Data Section#

[Data]
data_path = ./path/to/your/data.csv
targets = AntibioticA,AntibioticB
continuous_features = Feature1,Feature3,Feature4

[Metadata]
# group_column = PatientID  # Optional
# subgroup_columns = Ward,Specimen  # Optional
  • data_path: Path to your CSV file

  • targets: Comma-separated list of target columns (binary classification)

  • continuous_features: Features to scale with StandardScaler (all others are one-hot encoded)

The [Metadata] section holds columns that describe sample context but are not used as features:

  • group_column (optional): Column name for grouping multiple samples from the same patient/subject to prevent data leakage

  • subgroup_columns (optional): Comma-separated column names for defining subgroups for stratified performance evaluation

  • temporal_column (optional): Column with dates for temporal (prospective-style) validation
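The splitting behavior that group_column enables can be sketched with scikit-learn's GroupKFold. This is an illustration of the principle (all samples from one patient stay in the same fold), not respredai's internal implementation:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def group_aware_folds(X, y, groups, n_splits):
    """Yield (train_idx, test_idx) pairs with whole groups kept together."""
    return list(GroupKFold(n_splits=n_splits).split(X, y, groups=groups))

# Toy data: 8 samples from 4 patients (two samples each).
X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
groups = np.array(["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"])

folds = group_aware_folds(X, y, groups, n_splits=4)
# No patient ever appears in both train and test of the same fold.
for tr, te in folds:
    assert set(groups[tr]).isdisjoint(groups[te])
```

Without grouping, two samples from the same patient could land on opposite sides of a split and leak patient-level signal into the test fold.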

Configuring Subgroup Analysis#

Subgroup analysis evaluates model performance separately for each distinct value of one or more categorical columns, helping identify disparities across clinical subgroups (e.g., ward, specimen type, species).

[Metadata]
group_column = PatientID
subgroup_columns = Ward,Specimen
  • Each column listed in subgroup_columns must exist in the input CSV.

  • Subgroup columns are automatically removed from the feature matrix — they are used for stratified evaluation only, not as predictive features.

  • Multiple columns can be specified (comma-separated); each is analyzed independently.

What subgroup analysis produces:

For each model–target–subgroup combination, a CSV is saved under <out_folder>/subgroup_analysis/<target>/<model>_<subgroup_column>_subgroup.csv containing:

  • One row per unique subgroup value

  • Columns: Subgroup, N (sample count), Prevalence (class 1 rate), plus all standard metrics (Precision, Recall, F1, MCC, AUROC, VME, ME, FOR, etc.)

  • Subgroups with fewer than 10 samples are flagged with a warning

Subgroup,N,Prevalence,Precision (0),Precision (1),...,AUROC,VME,ME,FOR
ICU,142,0.35,0.81,0.62,...,0.78,0.22,0.10,0.15
General,310,0.18,0.88,0.45,...,0.72,0.40,0.05,0.08
ER,89,0.28,0.79,0.55,...,0.74,0.30,0.12,0.11

Note

group_column and subgroup_columns serve different purposes: group_column controls cross-validation splitting (keeping all samples from the same patient in the same fold to prevent data leakage), while subgroup_columns only affects post-hoc metric stratification. They can overlap — for instance, group by PatientID while analyzing performance by Ward.
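Conceptually, subgroup analysis is a per-group metric computation over the out-of-fold predictions. The following sketch (with hypothetical column names, and only two of the metrics listed above) shows the shape of the computation:

```python
import pandas as pd
from sklearn.metrics import matthews_corrcoef, recall_score

# Hypothetical per-sample predictions joined with a 'Ward' metadata column.
df = pd.DataFrame({
    "Ward":   ["ICU", "ICU", "ICU", "General", "General", "General"],
    "y_true": [1, 0, 1, 0, 0, 1],
    "y_pred": [1, 0, 0, 0, 1, 1],
})

rows = []
for ward, g in df.groupby("Ward"):
    rows.append({
        "Subgroup": ward,
        "N": len(g),
        "Prevalence": g["y_true"].mean(),
        "Recall": recall_score(g["y_true"], g["y_pred"]),
        "MCC": matthews_corrcoef(g["y_true"], g["y_pred"]),
    })
subgroup_metrics = pd.DataFrame(rows)  # one row per subgroup value
```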

2. Select Models#

[Pipeline]
models = LR,RF,XGB,CatBoost

Use respredai list-models to see all available models.

3. Configure Cross-Validation#

outer_folds = 5  # For model evaluation
inner_folds = 3  # For hyperparameter tuning
  • outer_folds: Number of folds for performance evaluation

  • inner_folds: Number of folds for GridSearchCV hyperparameter tuning
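The two settings correspond to nested cross-validation: an inner GridSearchCV loop for tuning wrapped in an outer loop for evaluation. A minimal scikit-learn sketch of the structure (not respredai's actual pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=42)

# Inner loop: hyperparameter tuning (inner_folds = 3).
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=3),
)

# Outer loop: unbiased performance estimate (outer_folds = 5).
scores = cross_val_score(inner, X, y, cv=StratifiedKFold(n_splits=5))
mean_score = scores.mean()
```

Because tuning happens strictly inside each outer training fold, the outer test folds never influence hyperparameter selection.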

4. Configure Threshold Optimization (Optional)#

calibrate_threshold = true
threshold_method = auto
threshold_objective = youden
vme_cost = 1.0
me_cost = 1.0
  • calibrate_threshold: Enable decision threshold optimization

    • true: Calibrate threshold using the specified objective

    • false: Use default threshold of 0.5

  • threshold_method: Method for threshold optimization (only used when calibrate_threshold = true)

    • auto: Automatically choose based on sample size (OOF if n < 1000, CV otherwise)

    • oof: Out-of-fold predictions method - aggregates predictions from all CV folds into a single set, then finds one global threshold across all concatenated samples

    • cv: TunedThresholdClassifierCV method - calculates optimal threshold separately for each CV fold, then aggregates (averages) the fold-specific thresholds

    • Key difference: oof finds one threshold on all concatenated OOF predictions (global optimization), while cv finds per-fold thresholds then averages them (fold-wise optimization then aggregation)

  • threshold_objective: Objective function for threshold optimization

    • youden: Maximize Youden’s J statistic (Sensitivity + Specificity - 1) - balanced approach

    • f1: Maximize F1 score - balances precision and recall

    • f2: Maximize F2 score - prioritizes recall over precision (reduces VME at potential cost of increased ME)

    • cost_sensitive: Minimize weighted error cost using vme_cost and me_cost

  • vme_cost / me_cost: Cost weights for cost-sensitive threshold optimization

    • VME (Very Major Error): Predicted susceptible when actually resistant

    • ME (Major Error): Predicted resistant when actually susceptible

    • Higher vme_cost relative to me_cost will shift threshold to reduce false susceptible predictions
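The youden objective under the oof method can be sketched in a few lines: pool the out-of-fold probabilities, then pick the single threshold that maximizes J = TPR - FPR. This is an illustration of the objective, not respredai's implementation:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_proba):
    """Return the threshold maximizing Youden's J = TPR - FPR (= Se + Sp - 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_proba)
    return thresholds[np.argmax(tpr - fpr)]

# Hypothetical out-of-fold probabilities pooled across all CV folds.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_oof = np.array([0.1, 0.3, 0.4, 0.55, 0.7, 0.9])

t = youden_threshold(y_true, y_oof)
```

The cv method would instead run this search once per fold and average the resulting thresholds.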

5. Configure Repeated Cross-Validation (Optional)#

outer_cv_repeats = 3
  • outer_cv_repeats: Number of times to repeat outer cross-validation (default: 1)

    • 1: Standard (non-repeated) cross-validation

    • >1: Repeated CV with different random shuffles for more robust estimates

    • Total iterations = outer_folds × outer_cv_repeats

    • Example: 5 folds × 3 repeats = 15 total train/test iterations
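Repeated outer CV corresponds to scikit-learn's RepeatedStratifiedKFold; this sketch confirms the iteration count for 5 folds × 3 repeats:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

y = np.array([0, 1] * 20)  # 40 samples, balanced classes
X = np.zeros((40, 2))      # dummy features; only the split matters here

# outer_folds = 5, outer_cv_repeats = 3 → 15 train/test iterations,
# each repeat using a different random shuffle of the data.
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
n_iterations = sum(1 for _ in rskf.split(X, y))
assert n_iterations == 5 * 3
```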

6. Configure Probability Calibration (Optional)#

calibrate_probabilities = true
probability_calibration_method = sigmoid
probability_calibration_cv = 5
confidence_level = 0.95
n_bootstrap = 1000
  • calibrate_probabilities: Enable post-hoc probability calibration

    • true: Apply CalibratedClassifierCV to calibrate predicted probabilities

    • false: Use uncalibrated probabilities (default)

    • Applied after hyperparameter tuning and before threshold tuning

  • probability_calibration_method: Calibration method

    • sigmoid: Platt scaling - fits logistic regression (default, works well for most cases)

    • isotonic: Isotonic regression - non-parametric (requires more data)

  • probability_calibration_cv: CV folds for calibration (default: 5)

    • Internal cross-validation used by CalibratedClassifierCV

    • Must be at least 2

  • confidence_level: Confidence level for bootstrap CIs (default: 0.95)

    • Must be between 0.5 and 1.0

  • n_bootstrap: Number of bootstrap resamples for CIs (default: 1000)

    • Must be at least 100

Note: Calibration diagnostics (Brier Score, ECE, MCE, reliability curves) are always computed regardless of this setting.
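The three probability-calibration settings map onto scikit-learn's CalibratedClassifierCV. A minimal sketch (the base model and data are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
base = RandomForestClassifier(n_estimators=50, random_state=0)

# Platt scaling (probability_calibration_method = sigmoid) with
# 5 internal CV folds (probability_calibration_cv = 5).
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)[:, 1]  # calibrated P(class 1)
```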

7. Configure Imputation (Optional)#

[Imputation]
method = none
strategy = mean
n_neighbors = 5
estimator = bayesian_ridge
  • method: Imputation method

    • none: No imputation (default, requires complete data)

    • simple: SimpleImputer from scikit-learn

    • knn: KNNImputer for k-nearest neighbors imputation

    • iterative: IterativeImputer (MissForest-style when paired with the random_forest estimator)

  • strategy: Strategy for SimpleImputer (only used when method = simple)

    • mean, median, or most_frequent

  • n_neighbors: Number of neighbors for KNNImputer (only used when method = knn, default: 5)

  • estimator: Estimator for IterativeImputer (only used when method = iterative)

    • bayesian_ridge (default) or random_forest
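The three non-trivial methods correspond directly to scikit-learn imputers. A sketch on a tiny matrix with missing values:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# method = simple, strategy = mean (column mean fills each gap)
simple = SimpleImputer(strategy="mean").fit_transform(X)
# method = knn (n_neighbors = 2 here, given only 3 rows)
knn = KNNImputer(n_neighbors=2).fit_transform(X)
# method = iterative with the default BayesianRidge estimator
iterative = IterativeImputer(random_state=0).fit_transform(X)

assert not np.isnan(simple).any()
```

Note the explicit enable_iterative_imputer import: IterativeImputer is still marked experimental in scikit-learn and is unavailable without it.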

8. Configure Conformal Prediction (Optional)#

[Uncertainty]
alpha = 0.1
  • alpha: Miscoverage rate for conformal prediction (0-0.5)

    • Default: 0.1 (90% target coverage)

    • Controls the width of prediction sets: lower alpha → wider sets, higher coverage

  • How it works: CV+ conformal prediction with Mondrian (class-conditional) coverage

    • Nonconformity score: s(x, y) = 1 - p̂(y | x)

    • Separate q_hat thresholds per class — critical for AMR class imbalance

    • A class is included in the prediction set if 1 - p̂(class | x) <= q_hat[class]

    • Prediction sets: {S} (susceptible only), {R} (resistant only), or {S, R} (uncertain)

    • Finite-sample, distribution-free coverage guarantees per class

    • CV+ guarantee: 1 - 2*alpha worst-case (typically closer to 1 - alpha in practice)

  • Output: prediction CSV includes prediction_set_size (1 = certain, 2 = uncertain) and metrics CSV includes conformal diagnostics (empirical coverage, fraction uncertain, average set size)
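The set-construction rule above is simple enough to state in a few lines. This sketch uses hypothetical per-class q_hat values; in practice they come from the CV+ calibration step:

```python
def mondrian_prediction_set(proba, q_hat):
    """Include class c iff 1 - p(c|x) <= q_hat[c] (class-conditional rule)."""
    return [c for c in (0, 1) if 1.0 - proba[c] <= q_hat[c]]

# Hypothetical calibrated thresholds, one per class (Mondrian).
q_hat = {0: 0.55, 1: 0.65}

assert mondrian_prediction_set({0: 0.9, 1: 0.1}, q_hat) == [0]      # certain {S}
assert mondrian_prediction_set({0: 0.2, 1: 0.8}, q_hat) == [1]      # certain {R}
assert mondrian_prediction_set({0: 0.5, 1: 0.5}, q_hat) == [0, 1]   # uncertain {S, R}
```

A size-2 set flags a sample on which the model cannot commit to either label at the requested coverage level.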

9. Configure Preprocessing (Optional)#

[Preprocessing]
ohe_min_frequency = 0.05
  • ohe_min_frequency: Minimum frequency for categorical values in OneHotEncoder

    • Categories appearing below this threshold are grouped into an “infrequent” category

    • Values in (0, 1): proportion of samples (e.g., 0.05 = at least 5% of samples)

    • Values >= 1: absolute count (e.g., 10 = at least 10 occurrences)

    • Set to 0, omit, or comment out to disable (keep all categories)

    • Useful for reducing noise from rare categorical values and preventing overfitting

10. Adjust Resources#

[Resources]
n_jobs = -1  # Use all cores
  • -1: Use all available CPU cores

  • 1: No parallelization (useful for debugging)

  • N: Use N cores

11. Configure Model Saving#

[ModelSaving]
enable = true
compression = 3
  • enable: Set to true to save trained models after each fold (required for resuming interrupted runs)

  • compression: 1-9 (1=minimal compression, 3=balanced, 9=maximum)
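The 1-9 scale matches the compress argument of joblib, the usual serialization backend for scikit-learn models; whether respredai uses joblib internally is an assumption here. A sketch of the trade-off:

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.joblib")
    # compress=3 is a balanced default; 9 yields smaller files but slower I/O.
    joblib.dump(model, path, compress=3)
    reloaded = joblib.load(path)

# The round-tripped model makes identical predictions.
assert (reloaded.predict(X) == model.predict(X)).all()
```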

12. Set Output Location#

[Output]
out_folder = ./results/

The folder will be created if it doesn’t exist.

13. Configure Validation Strategy (Optional)#

[Metadata]
# temporal_column = collection_date

[Validation]
validation_strategy = cv
# temporal_split_date = 2023-01-01
# temporal_split_ratio = 0.8
  • temporal_column (in [Metadata]): Name of the date/time column for temporal splitting

    • Required when validation_strategy is temporal or both

  • validation_strategy: Validation approach (default: cv)

    • cv: Standard nested cross-validation only

    • temporal: Temporal (prospective-style) validation only

    • both: Run both CV and temporal validation

  • temporal_split_date: Cutoff date in ISO format (e.g., 2023-01-01)

    • Train set: dates before cutoff; test set: dates on or after cutoff

    • Mutually exclusive with temporal_split_ratio

  • temporal_split_ratio: Fraction of data for training by sorted date order

    • Must be between 0 and 1 (exclusive)

    • Mutually exclusive with temporal_split_date

Note: When group_column is configured in [Metadata], temporal splitting assigns entire groups based on the group’s latest date to prevent data leakage.
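The group-aware temporal split described in the note can be sketched as follows; the function and column names are illustrative, not respredai's API:

```python
import pandas as pd

def temporal_group_split(df, date_col, group_col, cutoff):
    """Assign whole groups to train/test by each group's latest date (sketch)."""
    latest = df.groupby(group_col)[date_col].transform("max")
    return df[latest < cutoff], df[latest >= cutoff]

df = pd.DataFrame({
    "PatientID": ["P1", "P1", "P2", "P3"],
    "collection_date": pd.to_datetime(
        ["2022-06-01", "2023-02-01", "2022-09-01", "2023-03-01"]),
})

train, test = temporal_group_split(
    df, "collection_date", "PatientID", pd.Timestamp("2023-01-01"))

# P1's latest sample falls after the cutoff, so BOTH P1 rows go to test,
# even though one was collected before the cutoff date.
assert set(test["PatientID"]) == {"P1", "P3"}
assert set(train["PatientID"]) == {"P2"}
```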

See Also#