Feature Importance Command#

The feature-importance command extracts and visualizes feature importance or coefficients from trained models across all outer cross-validation iterations.

Usage#

respredai feature-importance --output <output_folder> --model <model_name> --target <target_name> [options]

Options#

Required#

  • --output, -o - Path to the output folder containing trained models

    • Must be the same folder used in the run command

    • Must contain a models/ subdirectory with saved model files

    • Example: ./output/ or ./out_run_example/

  • --model, -m - Model name to extract importance from

    • Must match one of the models trained in the pipeline

    • Examples: LR, RF, XGB, CatBoost, Linear_SVC

    • Case-sensitive

  • --target, -t - Target name to extract importance for

    • Must match one of the targets from the training pipeline

    • Example: Target1, Ciprofloxacin_R

    • Case-sensitive

Optional#

  • --top-n, -n - Number of top features to display (default: 20)

    • Features are ranked by absolute importance

    • Range: 1 to total number of features

    • Example: --top-n 30 for top 30 features

  • --no-plot - Skip generating the barplot

    • Only the CSV file is created

    • Useful for batch processing or server environments

  • --no-csv - Skip generating the CSV file

    • Only the plot is created

    • Useful if you only need visualizations

  • --seed, -s - Random seed for SHAP reproducibility

    • Ensures reproducible SHAP values across runs

    • Only affects models using SHAP fallback

Supported Models#

The command uses native importance when the model provides it and falls back to SHAP otherwise:

Native Importance (Primary)#

Linear Models (Coefficients)

  • LR (Logistic Regression) - Uses coefficient values

  • Linear_SVC (Linear SVM) - Uses coefficient values

Tree-Based Models (Feature Importances)

  • RF (Random Forest) - Uses Gini importance

  • XGB (XGBoost) - Uses gain-based importance

  • CatBoost - Uses feature importance scores

For tree-based models, importance values are always non-negative, so they indicate how strongly a feature is used but not the direction of its effect.

SHAP Fallback#

For models without native importance/coefficients, SHAP (SHapley Additive exPlanations) values are computed as a fallback:

  • MLP (Multi-Layer Perceptron) - Uses KernelExplainer

  • RBF_SVC (RBF SVM) - Uses KernelExplainer

  • TabPFN - Uses KernelExplainer

SHAP values are computed on the test fold of each outer CV iteration and aggregated across folds. The mean absolute SHAP value represents feature importance.

Note: SHAP computation with KernelExplainer can be slow for large datasets.
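As a rough illustration of the aggregation described above, the sketch below computes the mean absolute SHAP value per feature within each fold and then averages those per-fold values. The SHAP matrices are made-up numbers and the helper name `fold_mean_abs_shap` is hypothetical; it assumes every fold is weighted equally, which may differ from what the command actually does.

```python
from statistics import mean

def fold_mean_abs_shap(shap_rows):
    """Mean |SHAP| per feature for one fold (shap_rows is samples x features)."""
    n_features = len(shap_rows[0])
    return [mean(abs(row[j]) for row in shap_rows) for j in range(n_features)]

# Made-up SHAP values: 2 outer folds, 2 test samples each, 2 features.
fold_1 = [[0.5, -0.1], [-0.3, 0.1]]
fold_2 = [[0.2, 0.3], [-0.4, -0.1]]

# One vector of mean |SHAP| per fold ...
per_fold = [fold_mean_abs_shap(fold) for fold in (fold_1, fold_2)]

# ... then average across folds (assumption: equal weight per fold).
mean_abs_shap = [mean(fold[j] for fold in per_fold) for j in range(2)]
print(mean_abs_shap)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out, which is why the resulting score carries no sign.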

Output Files#

The command generates files in the following structure:

output_folder/
└── feature_importance/
    └── {target}/
        ├── {model}_feature_importance.csv         # Native importance (if available)
        ├── {model}_feature_importance.png
        ├── {model}_feature_importance_shap.csv    # SHAP importance (fallback)
        └── {model}_feature_importance_shap.png

File names carry a _shap suffix when SHAP is used instead of native importance.
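For scripts that pick up the results, the layout above can be turned into a small path helper. This is only a sketch: `expected_outputs` is a hypothetical function, and it assumes the target and model names appear in the paths exactly as passed on the command line.

```python
from pathlib import Path

def expected_outputs(output_folder, target, model, shap=False):
    """Return the (csv, png) paths implied by the documented layout."""
    suffix = "_shap" if shap else ""
    base = Path(output_folder) / "feature_importance" / target
    stem = f"{model}_feature_importance{suffix}"
    return base / f"{stem}.csv", base / f"{stem}.png"

# MLP has no native importance, so the SHAP file names are expected.
csv_path, png_path = expected_outputs("./output", "Target1", "MLP", shap=True)
print(csv_path.as_posix())
```

Checking `csv_path.exists()` before reading is a cheap way to detect whether a run used `--no-csv` or has not been executed yet.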

CSV File Format (Native)#

For models with native importance:

Column               Description
-------------------  -------------------------------------------------------
Feature              Feature name
Mean_Importance      Mean importance across folds (signed for linear models)
Std_Importance       Standard deviation across folds
Abs_Mean_Importance  Absolute mean importance (used for ranking)
Mean±Std             Formatted string with mean ± std

CSV File Format (SHAP)#

For models using SHAP fallback:

Column         Description
-------------  -------------------------------------
Feature        Feature name
Mean_Abs_SHAP  Mean absolute SHAP value across folds
Std_Abs_SHAP   Standard deviation across folds
Mean±Std       Formatted string with mean ± std

Features are sorted by importance (absolute mean value).

Across all folds:

  • Calculate mean importance for each feature

  • Calculate standard deviation (uncertainty measure)

  • Rank features by importance
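The three steps above can be sketched in a few lines of plain Python. The coefficients are made-up, the column names follow the native CSV format, and the use of the sample standard deviation and the four-decimal `Mean±Std` formatting are assumptions rather than documented behavior.

```python
from statistics import mean, stdev

def summarize(per_fold):
    """per_fold: list of {feature: importance} dicts, one per outer CV fold."""
    rows = []
    for feature in per_fold[0]:
        values = [fold[feature] for fold in per_fold]
        m, s = mean(values), stdev(values)  # sample std is an assumption
        rows.append({
            "Feature": feature,
            "Mean_Importance": m,
            "Std_Importance": s,
            "Abs_Mean_Importance": abs(m),
            "Mean±Std": f"{m:.4f} ± {s:.4f}",  # formatting is an assumption
        })
    # Rank by absolute mean so large negative coefficients are not buried.
    return sorted(rows, key=lambda r: r["Abs_Mean_Importance"], reverse=True)

# Made-up coefficients from three folds of a linear model.
folds = [
    {"age": 0.2, "prior_abx": -0.9, "ward_icu": 0.5},
    {"age": 0.1, "prior_abx": -1.1, "ward_icu": 0.7},
    {"age": 0.3, "prior_abx": -1.0, "ward_icu": 0.6},
]
table = summarize(folds)
top_n = table[:2]  # equivalent in spirit to --top-n 2
print([row["Feature"] for row in top_n])
```

Ranking on the absolute mean while reporting the signed mean is what lets a strongly negative coefficient appear at the top of the list with its sign intact.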

Plot Color Coding#

The barplot uses different colors to indicate importance type:

Method                     Color   Meaning
-------------------------  ------  -------------------------------------
SHAP                       Orange  Mean absolute SHAP value
Native (tree-based)        Blue    Feature importance (always non-negative)
Native (linear, positive)  Red     Positive coefficient
Native (linear, negative)  Green   Negative coefficient

Error bars show standard deviation across CV folds.

Examples#

Basic Usage#

Extract top 20 features for Logistic Regression on Target1:

respredai feature-importance --output ./output --model LR --target Target1

Custom Number of Features#

Show top 5 features:

respredai feature-importance -o ./output -m RF -t Target2 --top-n 5

Multiple Models#

Extract importance for multiple models (run separately):

respredai feature-importance -o ./output -m LR -t Target1
respredai feature-importance -o ./output -m RF -t Target1
respredai feature-importance -o ./output -m XGB -t Target1
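When the list of models is long, the separate invocations can be generated instead of typed. The following sketch only builds and prints the command lines; `build_commands` is a hypothetical helper, and it uses nothing beyond the flags documented above.

```python
import shlex

def build_commands(output, models, target, top_n=None):
    """Build one feature-importance argument list per model."""
    commands = []
    for model in models:
        argv = ["respredai", "feature-importance",
                "-o", output, "-m", model, "-t", target]
        if top_n is not None:
            argv += ["--top-n", str(top_n)]
        commands.append(argv)
    return commands

commands = build_commands("./output", ["LR", "RF", "XGB"], "Target1")
for argv in commands:
    # Print a copy-pasteable command line; to execute it instead,
    # pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids quoting problems if a target name contains spaces or shell metacharacters.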

CSV Only (No Plot)#

Generate only the CSV file for automated analysis:

respredai feature-importance -o ./output -m LR -t Target1 --no-plot
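As one example of automated analysis, the CSV can be read with the standard library and filtered on fold-to-fold stability. The CSV content below is made-up and merely follows the native column names; `stable_features` and its `max_ratio` threshold are hypothetical and not part of respredai.

```python
import csv
import io

# Made-up CSV content using the native-importance column names.
sample = io.StringIO(
    "Feature,Mean_Importance,Std_Importance,Abs_Mean_Importance,Mean±Std\n"
    "prior_abx,-1.0,0.1,1.0,-1.0000 ± 0.1000\n"
    "ward_icu,0.6,0.1,0.6,0.6000 ± 0.1000\n"
    "age,0.2,0.15,0.2,0.2000 ± 0.1500\n"
)

def stable_features(handle, max_ratio=0.5):
    """Keep features whose std across folds is small relative to |mean|."""
    kept = []
    for row in csv.DictReader(handle):
        abs_mean = float(row["Abs_Mean_Importance"])
        std = float(row["Std_Importance"])
        if abs_mean > 0 and std / abs_mean <= max_ratio:
            kept.append(row["Feature"])
    return kept

selected = stable_features(sample)
print(selected)
```

For a real run, replace `sample` with a file handle such as `open(path, newline="", encoding="utf-8")`, where `path` points at the generated CSV.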

Plot Only (No CSV)#

Generate only the visualization:

respredai feature-importance -o ./output -m RF -t Target1 --no-csv

See Also#