Feature Importance Command
==========================

The ``feature-importance`` command extracts and visualizes feature importance or coefficients from trained models across all outer cross-validation iterations.

Usage
-----

.. code-block:: bash

   respredai feature-importance --output --model --target [options]

Options
-------

Required
~~~~~~~~

- ``--output, -o`` - Path to the output folder containing trained models

  - Must be the same folder used in the ``run`` command
  - Must contain a ``models/`` subdirectory with saved model files
  - Example: ``./output/`` or ``./out_run_example/``

- ``--model, -m`` - Model name to extract importance from

  - Must match one of the models trained in the pipeline
  - Examples: ``LR``, ``RF``, ``XGB``, ``CatBoost``, ``Linear_SVC``
  - Case-sensitive

- ``--target, -t`` - Target name to extract importance for

  - Must match one of the targets from the training pipeline
  - Example: ``Target1``, ``Ciprofloxacin_R``
  - Case-sensitive

Optional
~~~~~~~~

- ``--top-n, -n`` - Number of top features to display (default: 20)

  - Features are ranked by absolute importance
  - Range: 1 to total number of features
  - Example: ``--top-n 30`` for top 30 features

- ``--no-plot`` - Skip generating the barplot

  - Only CSV file will be created
  - Useful for batch processing or server environments

- ``--no-csv`` - Skip generating the CSV file

  - Only plot will be created
  - Useful if you only need visualizations

- ``--seed, -s`` - Random seed for SHAP reproducibility

  - Ensures reproducible SHAP values across runs
  - Only affects models using SHAP fallback

Supported Models
----------------

The command uses native importance when available, with SHAP as fallback:

Native Importance (Primary)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Linear Models (Coefficients)**

- **LR** (Logistic Regression) - Uses coefficient values
- **Linear_SVC** (Linear SVM) - Uses coefficient values

**Tree-Based Models (Feature Importances)**

- **RF** (Random Forest) - Uses Gini importance
- **XGB** (XGBoost) - Uses gain-based importance
- **CatBoost** - Uses feature importance scores

For tree-based models importance values are always positive.

SHAP Fallback
~~~~~~~~~~~~~

For models without native importance/coefficients, SHAP (SHapley Additive exPlanations) values are computed as a fallback:

- **MLP** (Multi-Layer Perceptron) - Uses KernelExplainer
- **RBF_SVC** (RBF SVM) - Uses KernelExplainer
- **TabPFN** - Uses KernelExplainer

SHAP values are computed on the test fold of each outer CV iteration and aggregated across folds. The mean absolute SHAP value represents feature importance.

Note: SHAP computation with KernelExplainer can be slow for large datasets.

Output Files
------------

The command generates files in the following structure:

::

   output_folder/
   └── feature_importance/
       └── {target}/
           ├── {model}_feature_importance.csv        # Native importance (if available)
           ├── {model}_feature_importance.png
           ├── {model}_feature_importance_shap.csv   # SHAP importance (fallback)
           └── {model}_feature_importance_shap.png

Files have ``_shap`` suffix when SHAP is used instead of native importance.

CSV File Format (Native)
~~~~~~~~~~~~~~~~~~~~~~~~

For models with native importance:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Column
     - Description
   * - ``Feature``
     - Feature name
   * - ``Mean_Importance``
     - Mean importance across folds (signed for linear models)
   * - ``Std_Importance``
     - Standard deviation across folds
   * - ``Abs_Mean_Importance``
     - Absolute mean importance (used for ranking)
   * - ``Mean±Std``
     - Formatted string with mean ± std

CSV File Format (SHAP)
~~~~~~~~~~~~~~~~~~~~~~

For models using SHAP fallback:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Column
     - Description
   * - ``Feature``
     - Feature name
   * - ``Mean_Abs_SHAP``
     - Mean absolute SHAP value across folds
   * - ``Std_Abs_SHAP``
     - Standard deviation across folds
   * - ``Mean±Std``
     - Formatted string with mean ± std

Features are **sorted by importance** (absolute mean value).
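
The native CSV columns described above can be illustrated with a small sketch. This is not the tool's actual implementation: the function name ``aggregate_importance``, the example feature names and values, the number format of ``Mean±Std``, and the use of the population standard deviation are all assumptions made for illustration.

.. code-block:: python

   from statistics import mean, pstdev


   def aggregate_importance(fold_values, feature_names, top_n=20):
       """Summarize per-fold importance values (hypothetical helper).

       fold_values: one list of importance values per outer CV fold,
       each aligned with feature_names.
       """
       rows = []
       for j, name in enumerate(feature_names):
           per_fold = [fold[j] for fold in fold_values]
           m = mean(per_fold)
           # Population standard deviation is an assumption; the tool
           # may use the sample standard deviation instead.
           s = pstdev(per_fold)
           rows.append({
               "Feature": name,
               "Mean_Importance": m,
               "Std_Importance": s,
               "Abs_Mean_Importance": abs(m),
               "Mean±Std": f"{m:.4f} ± {s:.4f}",
           })
       # Rank by absolute mean so large negative coefficients rank high.
       rows.sort(key=lambda r: r["Abs_Mean_Importance"], reverse=True)
       return rows[:top_n]


   # Hypothetical coefficients from three outer folds for three features.
   folds = [
       [0.50, -1.20, 0.10],
       [0.70, -1.00, 0.00],
       [0.60, -1.10, 0.20],
   ]
   summary = aggregate_importance(folds, ["age", "ward", "prior_abx"], top_n=2)
   for row in summary:
       print(row["Feature"], row["Mean±Std"])

Ranking on the absolute mean keeps the sign in ``Mean_Importance``, so a strongly negative coefficient is still listed among the top features.
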
Across all folds:

- Calculate **mean importance** for each feature
- Calculate **standard deviation** (uncertainty measure)
- Rank features by importance

Plot Color Coding
~~~~~~~~~~~~~~~~~

The barplot uses different colors to indicate importance type:

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Method
     - Color
     - Meaning
   * - SHAP
     - Orange
     - Mean absolute SHAP value
   * - Native (tree-based)
     - Blue
     - Feature importance (always positive)
   * - Native (linear, positive)
     - Red
     - Positive coefficient
   * - Native (linear, negative)
     - Green
     - Negative coefficient

Error bars show standard deviation across CV folds.

Examples
--------

Basic Usage
~~~~~~~~~~~

Extract top 20 features for Logistic Regression on Target1:

.. code-block:: bash

   respredai feature-importance --output ./output --model LR --target Target1

Custom Number of Features
~~~~~~~~~~~~~~~~~~~~~~~~~

Show top 5 features:

.. code-block:: bash

   respredai feature-importance -o ./output -m RF -t Target2 --top-n 5

Multiple Models
~~~~~~~~~~~~~~~

Extract importance for multiple models (run separately):

.. code-block:: bash

   respredai feature-importance -o ./output -m LR -t Target1
   respredai feature-importance -o ./output -m RF -t Target1
   respredai feature-importance -o ./output -m XGB -t Target1

CSV Only (No Plot)
~~~~~~~~~~~~~~~~~~

Generate only the CSV file for automated analysis:

.. code-block:: bash

   respredai feature-importance -o ./output -m LR -t Target1 --no-plot

Plot Only (No CSV)
~~~~~~~~~~~~~~~~~~

Generate only the visualization:

.. code-block:: bash

   respredai feature-importance -o ./output -m RF -t Target1 --no-csv

See Also
--------

- :doc:`run-command` - Train models with nested CV and save model files
- :doc:`train-command` - Train models on entire dataset for cross-dataset validation
- :doc:`create-config-command` - How to create the configuration file